Dumpster Diving: Practical Debugging Tactics
Problem resolved with a 7-character revision update
A recent development challenge sent me deep into the supporting libraries, frameworks, and build systems that underlie my OSS project (DepanFx). Many of these debugging activities involve dumpster diving - opening a software component’s packaging in order to understand the details.
It's a fascinating activity, where you get to see a lot of other people's code. You have to be prepared for lots of indirection. I’ve been gobsmacked by something so obvious I forgot to see it.
This debugging example is more routine, with dead ends and probing investigation. The resolution is subtle and has a bit of a surprise finish. The details of the problem aren’t overly specific. The need to look into the internals of an application’s underlying components is an essential debugging skill.
What’s Gone Wrong
After learning some details of Java’s JPMS module system, it was time to move on to the read side of serialization. Just as things were looking good, the application threw an exception as part of the XML read initialization. The exception indicates that the implementation for xmlpull
interface was not available.
Caused by: org.xmlpull.v1.XmlPullParserException: caused by: org.xmlpull.v1.XmlPullParserException: resource not found: /META-INF/services/org.xmlpull.v1.XmlPullParserFactory make sure that parser implementing XmlPull API is available
Solving this required chasing a few dead ends and a good bit of “dumpster diving”. For software debugging, dumpster diving is opening up a software component’s packaging and digging into the internals. The goal is to gain some understanding of the component’s interaction with other software elements. Sometimes this involves cracking open a Jar file to inspect the packaged metadata. Sometimes this involves stepping into platform code to understand the behavior. Sometimes it's digging through volumes of debug dump data.
In the end, it was a bug in the build tooling. That bug was fixed last month. The fix was simply to upgrade one of the build plugins from version 0.0.14 to 0.1.0.
Theory 1: It’s The Service
My earlier work with Java modules resolved some questions about the module packaging of XStream. This required using module-aware components with a defined automatic module name to bridge the gap between full JPMS modules and XStream’s need to stay in the unnamed module.
Based on the error, access to the valid mxparser
was failing. Common culprits for unavailable access errors are missing references for indirect dependencies and missing uses. The symptoms suggested a missing service reference in one of the module definitions.
I was able to check that the relevant libraries had the correct module and service definitions. Most importantly among the dependent libraries was mxparser. This Jar file contains the desired implementation of the XmlPull API.
A bit of simple dumpster diving on the mxparser Jar file provided the details of the component’s structure. It is published as a non-modular component, with an automatic module name specified in the MANIFEST.MF file
. As a non-modular component, the service implementations defined in the META_INF/services/
should be visible to the XmlPull component. But this wasn’t working.
The use of services was a new challenge for modular Java components. Before JPMS, software components could use services from any provider. Getting the right implementation can require a direct build dependency. With JPMS, modules need to define the services that they use. These JPMS obligations lead to a requires statement and a uses statement in the module-info file.
But this service use was coming from a module-aware component that was specifically intended to bridge the gap. The indirect dependency via XStream might need to be explicit versus transitive. Since the service use originates in a non-modular component, there is no (compile-time) mechanism to add the service definition.
The missing references theory led to several added build dependencies. These were added to various build definitions. A trial run added the service definition in the modular components that surrounded the XmlPull use.
Both surprising and disappointing, these efforts did nothing to solve the problem. The XmlPullParserException occurred for every attempt to read from a file.
Back To Simplicity
The earlier work for XStream integration used a simple three-component application, Xdemo. The lessons learned from XDemo for XStream integration have been applied to the larger application, DepanFX. DepanFX currently consists of twelve components. DepanFX includes both the Spring Boot and JavaFX frameworks. There are lots of possible confounding variables.
The simpler XDemo application would provide an easier environment to work out the correct set of dependency declarations. The unmarshal()
method in the TreeConverter
needed a bit of work, and the main()
method needed to conditionally perform a load()
.
This was an objectively small change, about 40 lines of commit change. This change added a file load capability to the XDemo application. The relative size impact on the application was larger. Adding those lines to the original 240 lines was a x1.17 size increase.
Once these changes were integrated and behaving, I fired up the XDemo application in the debugger. I was expecting to have the same exception, in an environment that would be easier to work with.
The XDemo application worked fine. The unmarshal()
method in the TreeConverter
needed a bit of work, but the discovery of mxparser
for the XmlPull service worked perfectly.
This was another surprise. Something in the complexity of the DepanFX structure was leading to the problem. Time for some serious dumpster diving.
Dumpster Diving XStream
Loading a file in the smaller XDemo works as expected, but this action fails in the larger DepanFX fails. This information is very constructive. The immediate debugging goal becomes very explicit. We need to answer the question “Where does the behavior differ?”
The XDemo and DepanFX applications use the XStream facade in a similar fashion. The fromXML()
method expects an open and accessible java.io.Reader
. It should provide a deserialized object after successful completion. Instead, the use in DepanFX results in the thrown exception. What causes this difference?
Stepping into the fromXML()
method takes the debugger into a cluster of methods with similar names and different arguments. It’s a common debugging experience to rattle through a couple of interface layers before getting to the core function. One area for attention is the repackaging of supplied parameters as partial results for an internal response.
In this case, we quickly come to a createReader()
call that repackages the supplied reader.
Stepping into this code takes the application into the XmlPull implementation. Here we find a Singleton pattern for the XmlPullParserFactory
. The creation of the XmlPullParserFactory
is lazy. The public createDefaultParser()
method is synchronized to ensure proper initialization of the XmlPullParserFactory
.
Continuing the step-into adventure, the construction of the XmlPullParserFactory
leads to a newInstance()
method. A quick review of the debug context from the IDE shows that the class is the one from the XmlPull Jar file. This confirms that some of the dependency configuration is correct.
Debugging further in the DepanFX usage, the XmlPullParserFactory
fails when it tries to load a resource. The newInstance()
method relies on the classloader’s getResourceAsStream()
to locate a service definition file named /META-INF/services/org.xmlpull.v1.XmlPullParserFactory
.
This behavior is consistent with the Java specification for service definitions, but it does not use the ServiceLoader
API defined since the Java 1.6 specification. Granted, the ServiceLoader API is quite awkward. The reasons for this custom service lookup may vary, including subtle behavioral differences, the awkwardness of the supplied API, and legacy support for pre-Java 1.6 applications.
Isolating the point of failure seems helpful, but there are a handful of unresolved context questions. The code is relying on a context
variable for the resource lookup. That variable starts as null
, but winds up being the class of the XmlPullParserFactory
. Does the XmlPullParserFactory
class provide any other ways to configure the result?
Repeating the dumpster dive in the XDemo application confirms that this is the location where the behaviors split. In the XDemo application, the resource is located and provides the fully-qualified class name (FQCN) for the desired io.github.xstream.mxparser.MXParser
instance.
Dumpster Diving The Classloader
Why are two similarly structured applications having different results when they try to load a resource? Debugging this will require digging into the classloader.
Dumpster diving into a classloader can be a daunting task. The Java classloaders often have thousands of classes, especially for applications that depend on large platforms and libraries. The data structures are opaque, and often include lazy load elements that confound simple inspection. This adventure turns out more simply and the results are more direct.
As an additional challenge, you may need to disable or change the way your debugger handles step filters. Since there should never be any bugs in the JVM components, many debuggers disable stepping into the JVM methods. This includes the classloader’s methods such as getResourceAsStream()
.
For VSCode, you will want to ensure that skipping the JDK and the ClassLoader are not on the list of classes to skip. You can check the settings.json
file to confirm that skipClasses
behavior is disabled.
"java.debug.settings.stepping.skipClasses": [
// "$JDK", "java.lang.ClassLoader",
],
The getResourceAsStream()
method goes right to work. It checks if the module for the class instance has a name. Based on that choice, it follows a couple of different execution paths.
In the XDemo case, the XmlPullParserFactory
class is in the unnamed module and uses the system classloader to obtain the resource. In the DepanFX case, the XmlPullParserFactory
class is in a named module. As a named module, it can only access its own resources and the lookup fails.
One striking detail is that the module for the XmlPullParserFactory
is named xmlpull
. This is the name the JPMS would assign for a simple Jar file that happens to be on the module path. Somehow, the DepanFX environment is causing the XmlPull component to be placed on the module path.
Dumpster Diving Application Startup
To confirm the theory that the XmlPull Jar file is winding up the module path, we need to see the actual Java executable command line. With gradle, this can be accomplished by adding the --debug
flag with a run target. The --debug
option produces a lot of output, but it confirms the diagnosis.
This shows an abbreviated version of the debug information from the XDemo application.
[INFO] [org.gradle.process.internal.DefaultExecHandle] Starting process 'command 'C:\Program Files\Eclipse Adoptium\jdk-17.0.7.7-hotspot\bin\java.exe''. Working directory: XdemoApp Command: C:\Program Files\Eclipse Adoptium\jdk-17.0.7.7-hotspot\bin\java.exe -Dfile.encoding=windows-1252 -Duser.country\=US -Duser.language=en -Duser.variant -cp LIB\com.thoughtworks.xstream\xstream\1.4.20\e2315b8b2e95e9f21697833c8e56cdd9c98a5ee\xstream-1.4.20.jar;LIB/xmlpull\xmlpull\1.1.3.1\2b8e230d2ab644e4ecaa94db7cdedbc40c805dfa\xmlpull-1.1.3.1.jar --module-path XDEMO\XdemoApp\build\libs\XdemoApp.jar;XDEMO\XdemoTreeXstream\build\libs\XdemoTreeXstream.jar;XDEMO\XdemoTree\build\libs\XdemoTree.jar;LIB\io.github.x-stream\mxparser\1.2.2\476fb3b3bb3716cad797cd054ce45f89445794e9\mxparser-1.2.2.jar --module xdemo.app/demo.xdemo.XdemoApp test_202308181735.out
The important detail here is the -cp
section (it follows after -Duser.variant
). Here the XStream and XmlPull Jar files are included on the classpath.
For the DepanFX application, the debug information for the run command is very different. Most importantly, there is only a module path(--module-path
) parameter and no classpath (-cp
) command line parameter. Both the XStream and the XmlPull Jar files are incorrectly on the module path.
2023-10-03T11:17:52.251-0500 [INFO] [org.gradle.process.internal.DefaultExecHandle] Starting process 'command 'C:\Program Files\Eclipse Adoptium\jdk-17.0.7.7-hotspot\bin\java.exe''. Working directory: DepanFxApp Command: C:\Program Files\Eclipse Adoptium\jdk-17.0.7.7-hotspot\bin\java.exe --add-modules javafx.controls,javafx.fxml --module-path DEPAN_FX\*;LIB\com.google.guava\guava\31.1-jre\60458f877d055d0c9114d9e1a2efb737b4bc282c\guava-31.1-jre.jar;LIB\SPRING\*;LIB\net.rgielen\javafx-weaver-core\1.3.0\9abbf9c67198929b2709d0ceb773af277e1fa46b\javafx-weaver-core-1.3.0.jar;LIB\GUAVA\*;LIB\com.google.code.findbugs\jsr305\3.0.2\25ea2e8b0c338a877313bd4672d3fe056ea78f0d\jsr305-3.0.2.jar;LIB\org.checkerframework\checker-qual\3.12.0\d5692f0526415fcc6de94bb5bfbd3afd9dd3b3e5\checker-qual-3.12.0.jar;LIB\com.google.errorprone\error_prone_annotations\2.11.0\c5a0ace696d3f8b1c1d8cc036d8c03cc0cbe6b69\error_prone_annotations-2.11.0.jar;LIB\com.google.j2objc\j2objc-annotations\1.3\ba035118bc8bac37d7eff77700720999acd9986d\j2objc-annotations-1.3.jar;LIB\jakarta.annotation\jakarta.annotation-api\2.1.1\48b9bda22b091b1f48b13af03fe36db3be6e1ae3\jakarta.annotation-api-2.1.1.jar;LIB\org.yaml\snakeyaml\1.33\2cd0a87ff7df953f810c344bdf2fe3340b954c69\snakeyaml-1.33.jar;LIB\ch.qos.logback\logback-classic\1.4.11\54450c0c783e896a1a6d88c043bd2f1daba1c382\logback-classic-1.4.11.jar;LIB\org.apache.logging.log4j\log4j-to-slf4j\2.20.0\d37f81f8978e2672bc32c82712ab4b3f66624adc\log4j-to-slf4j-2.20.0.jar;LIB\org.slf4j\jul-to-slf4j\2.0.7\a48f44aeaa8a5ddc347007298a28173ac1fbbd8b\jul-to-slf4j-2.0.7.jar;LIB\org.slf4j\slf4j-api\2.0.7\41eb7184ea9d556f23e18b5cb99cad1f8581fc00\slf4j-api-2.0.7.jar;LIB\com.thoughtworks.xstream\xstream\1.4.20\e2315b8b2e95e9f21697833c8e56cdd9c98a5ee\xstream-1.4.20.jar;LIB\org.springframework\spring-jcl\6.0.11\c9b16cdb6d4874ba4118fcdd4b0335f6278b378\spring-jcl-6.0.11.jar;LIB\io.github.x-stream\mxparser\1.2.2\476fb3b3bb3716cad797cd054ce45f89445794e9\mxparser-1.2.2.jar;LIB\ch.qos.logback\logback-core\1.4.11\2f9f280219a9922a74200eaf7138c4c17fb87c0f\logback-core-1.4.11.jar;LIB\org.apache.logging.log4j\log4j-api\2.20.0\1fe6082e660daf07c689a89c94dc0f49c26b44bb\log4j-api-2.20.0.jar;LIB\xmlpull\xmlpull\1.1.3.1\2b8e230d2ab644e4ecaa94db7cdedbc40c805dfa\xmlpull-1.1.3.1.jar;LIB\JAVAFX\**\javafx-fxml-20-win.jar;LIB\JAVAFX\**\javafx-controls-20-win.jar;LIB\JAVAFX\**\javafx-graphics-20-win.jar;LIB\JAVAFX\**\javafx-base-20-win.jar --patch-module depanfx.app\=DEPAN_FX\DepanFxApp\build\resources\main --module depanfx.app/com.pnambic.depanfx.DepanFxApplication -Dfile.encoding=windows-1252 -Duser.country\=US -Duser.language=en -Duser.variant --module depanfx.app/com.pnambic.depanfx.DepanFxApplication
There are a lot of other changes here, and several subtle details.
There is a new
--add-modules javafx.controls,javafx.fxml
parameter and values. This suggests some interaction with the JavaFX components.A change of placement for the
--module-path
parameter, previously located after the-Duser.variant
parameter.The
--module
parameter is now specified twice with the same value in each occurrence.There is a newly added
--patch-module
parameter.Lots of added dependencies. Many indirect dependencies and many from the Spring platform components.
The JavaFX Jar files are unique for the platform, with a
win
segment in the file name.
It seems that many unidentified actors are mucking with the command line contents of the run command.
Dumpster Diving The JavaFX Plugin
Now we need to figure out how the build system is changing the application launch process. In this case, the usual suspects are the build plugins. We might have to check them all. The presence of the --add-modules
parameter indicates some interactions with the JavaFX plugin. That’s as good a place to start as any.
The documentation for the org.openjfx.javafxplugin
is fairly thin. The primary discussion of its behavior is the one-line
Simplifies working with JavaFX 11+ for gradle projects.
This is not uncommon for Open Source Software. If you need details, the code provides them.
The code for the plugin is fairly simple, with five files and a few hundred lines of code. The plugin itself is only 128 lines [JavaFxPlugin]. The plugin implements the apply()
method with a couple of private helper methods.
The interesting part is at the end of the apply()
method. The JavaFX plugin searches its project for tasks named run
that are associated with the application
plugin. If it finds them, it alters the classpath and writes the module path. It also introduces an --add-modules
option. This is the smoking gun that focuses attention on the JavaFX plugin.
The plugin's internal putJavaFXJarsOnModulePathForClasspathApplication()
method encapsulates a good bit of complexity. The filtering of the classpath and the construction of the module path seemed symmetric, but a clear understanding of the consequences was difficult based on an inspection of the code.
Git’s blame feature provides a link to the code’s author, a possible source of clarity about the code’s intent. It also reveals that Git is showing code that is only a few weeks old.
A bit more research into the release and migration notes provides this detail for the JavaFX gradle plugin version 0.1.0
In the previous version, the classpath/module path was rewritten. This is no longer the case.
Included in the major 0.1.0 release is the fix for this problem. Their Issue #133. Now solved, at least from my perspective.
Since XDemo had no dependency on the JavaFX plugin, the Java plugin’s construction of the classpath was unchanged and everything worked. DepanFX’s dependency on the 0.0.14 version was bringing in broken code that incorrectly changed the classpath.
In the end, it was a bug in the build tooling. I resolved the problem with an upgrade to one of the build plugins.
It’s a bit like the parable of the old engineer and the hammer. This was a 7-character change from version 0.0.14 to 0.1.0. You merely need to know which set of characters to change. After that change, everything worked fine.
Discussion
I never blame the tooling. Over and over, I’ve learned that I might misunderstand or disagree with the compiler, the virtual machine, or the operating system. But blaming an erroneous behavior on the tooling is almost always an erroneous behavior.
Until it isn’t. In this case, the plugin did the wrong thing. It was very nice to resolve this issue by updating the version number to a significant release.
There are a lot of secrets in the build process. This is kind of a fact of life. It is still annoying when one tool messes with another tool’s careful work and gets it wrong. There are lots of details and angles in any significant application, but it is still extremely difficult to discover the causes of undesirable behavior. It is easy to say that better documentation would help, but who has the time to write up all those details and keep everyone informed about changes? These are daunting practical tasks.
Conclusion
In the end, this debugging activity involved a variety of dumpster diving activities
Dive into the Jar file structure to ensure that the correct metadata is being provided.
Dive into the internal implementation of the XStream platform.
Dive into the internal implementation of the Classloader’s get resource behaviors.
Dive into lengthy debug outputs.
Dive into the mechanics of gradle’s application run task.
Dive into the internal implementation of the JavaFX plugin.
All manner of tools are required for these exploratory tasks.
Access to the internals of Jar files. The
jar -xf
command will work, but this is easier if the IDE or file browser can open and inspect these zip-structured files.A debugger that can step into methods, including external and platform libraries and platform libraries. This is easier with an IDE and source code for the libraries.
Access to the source code of plugins. For this problem, the GitHub repository was a practical solution.
Debugging build and dependency problems can be a challenging problem. Solving these problems can require poking into the internals of the platform. It was a relief that this problem was solved with a simple version update.