Exploring Java's Module System and Gradle's Integration

The Riddle of Automatic Modules

One of the riddles that arose while trying to use XStream in a modular Java (JPMS) application was the question of what makes a jar into a module.

In hindsight, the answer is clear: A jar is a module if it is placed on the JVM’s modulepath.

Despite this answer’s apparent simplicity and clarity, a couple of tricky bits remain.

  • How does a developer direct a jar onto the module path?

  • What happens to dependencies that are not modules?

No amount of reading the Java Language Specification or the Java Virtual Machine Specification will answer these questions. The answers lie within Gradle.

They are obscured by the opaque, if helpful, behavior of JPMS-compliant software development tools. As it turns out, Gradle (version >= 7.0.0) sorts a project’s dependency jars into modular and non-modular jars.

Working With Modules

Let’s consider the development of a modern Java application. In addition to adopting some modern GUI libraries (JavaFX), the application is taking advantage of the Java Platform Module System (JPMS). Compared to early versions of Java (pre-Java 9), the module system provides stronger encapsulation guarantees based on explicit dependencies and explicit exports.

With non-modular Java projects, using most external libraries requires nothing more than adding them to the dependencies block of the project’s build.gradle file.

dependencies {
  implementation "com.google.guava:guava:${guavaVersion}"
  implementation 'org.springframework.boot:spring-boot-starter'
  implementation 'org.springframework.boot:spring-boot-autoconfigure'
  implementation "org.slf4j:slf4j-api:${slf4jVersion}"
}

When a library is added to a project’s dependencies as an implementation dependency, the library becomes available at compilation and at runtime. Before modular Java projects, this meant that all of the implementation libraries were added to the classpath at runtime.

With a modular Java project, referencing these external libraries from code in a dependent module requires an additional step. The module name that goes with the external jar must also be added to a dependent module’s module-info file.

Once the external jar’s module name is declared as a module dependency in the module definition, the dependent module’s code can use the external jar’s exposed packages.

This leads to module requirement declarations that are somewhat redundant with respect to the build definition.

    requires com.google.common;
    requires org.slf4j;
    requires spring.boot;
    requires spring.boot.autoconfigure;
    requires spring.context;

If the module name isn’t part of the library’s direct documentation, a bit of dumpster diving into the jar generally does the trick. Unpacking the jar will normally reveal a module-info file or an Automatic-Module-Name property in the MANIFEST.MF file. From there, you’re golden.
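The dumpster dive can also be scripted. The following is a minimal sketch, not part of the original project: the class name JarModuleNameProbe and its methods are made up for illustration, and it only uses the standard java.util.jar API. It assumes that an explicit module shows up as a module-info.class entry at the root of the jar, so it does not look inside the versioned directories of a multi-release jar.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Optional;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper: reports what a jar reveals about its module name.
public class JarModuleNameProbe {

    /** Returns the Automatic-Module-Name main attribute, if the manifest declares one. */
    public static Optional<String> automaticModuleName(Manifest manifest) {
        if (manifest == null) {
            return Optional.empty(); // the jar has no MANIFEST.MF at all
        }
        return Optional.ofNullable(
                manifest.getMainAttributes().getValue("Automatic-Module-Name"));
    }

    /** Checks the jar root for module-info.class, then falls back to the manifest. */
    public static String describe(Path jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            if (jar.getEntry("module-info.class") != null) {
                return "explicit module (the name is declared in module-info.class)";
            }
            return automaticModuleName(jar.getManifest())
                    .map(name -> "Automatic-Module-Name: " + name)
                    .orElse("no module information");
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java JarModuleNameProbe <jar>...");
        }
        for (String arg : args) {
            System.out.println(arg + " -> " + describe(Path.of(arg)));
        }
    }
}
```

Running it against the jars in a project’s lib directory gives a quick overview of which dependencies carry module information. The JDK’s jar tool offers a similar check through its --describe-module option.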

But what happens when a dependent library does not have either one of these properties?

Adding a Non-Modular Library

The XStream library is a sweet little system for serializing Java objects. It is easy to use and supports XML serialization of many Java classes. XStream can serialize a broad range of objects, not just ones that conform to the JavaBeans conventions. In particular, objects do not need a no-arg constructor.

A generic class to serialize an object to XML with XStream can be as simple as a single line of code.

package demo.xdemo.tree.xstream;

import java.io.IOException;
import java.io.OutputStreamWriter;

import com.thoughtworks.xstream.XStream;

public class SimplePersist<T> {

  public void save(OutputStreamWriter dst, T item) throws IOException {
    new XStream().toXML(item, dst);
  }
}

However, getting this kind of generic serialization to work can be a tricky business. Complex serialization cases can involve deep traversals through an object’s dependency graph. This dependency graph can flow through many packages and can depend on JVM internal types. Many of the encountered objects are serialized via reflection APIs. Successful serialization can require access to unexpected packages and classes.

To support rich and complex object dependency graphs, the XStream developers have chosen to remain non-modular. As the language specification clarifies, the rules for unnamed modules are designed to maximize their interoperation with other modules.

Java Language Specification
7.7.5 Unnamed Modules
An unnamed module reads every observable module

By remaining non-modular, XStream avoids forcing every dependent project to become modular.

So what happens when we add the XStream dependency to the Gradle build? This line will add XStream to the compile and runtime dependencies.

  implementation "com.thoughtworks.xstream:xstream:${xstreamVersion}"

The next trick is to reference this library in the module-info files for the dependent project. For this, we need to figure out the module name. But the XStream jar contains neither a module-info file nor an Automatic-Module-Name property. What is the module name for the XStream jar?

The Rabbit Hole of Module Categories

Now that the XStream jar has been added to our modular project’s dependency set, it is easy to anticipate the next step. The module name for XStream’s jar needs to be added to the module-info file as a dependency.

A variety of sources, including Google searches, describe how the module name is extracted from a jar.

A popular Stack Overflow answer is typical:

  • If the jar has a module-info file, use the module name given there.

  • If the jar has an Automatic-Module-Name property in its MANIFEST.MF file, use the module name given there.

  • Otherwise, infer a module name from the jar’s filename. The inference rules are precise (ModuleFinder.of()), but can be summarized as throwing away the type and version suffixes.
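The filename rule in the last bullet can be sketched in a few lines. This is an approximation of the derivation described in the ModuleFinder.of() documentation, not the JDK’s own code: the class and method names are invented for illustration, and the sketch skips the validation that the JDK performs on the resulting name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: approximates the automatic module name derived from a jar file name.
public class InferredModuleName {

    // First hyphen that is followed by digits and then a dot or the end of the name.
    private static final Pattern VERSION_START = Pattern.compile("-(\\d+(\\.|$))");

    public static String fromJarFileName(String jarFileName) {
        // Drop the ".jar" type suffix.
        String name = jarFileName.endsWith(".jar")
                ? jarFileName.substring(0, jarFileName.length() - 4)
                : jarFileName;

        // Drop everything from the start of the version string onward.
        Matcher version = VERSION_START.matcher(name);
        if (version.find()) {
            name = name.substring(0, version.start());
        }

        // Non-alphanumerics become dots, repeated dots collapse, edge dots are trimmed.
        return name.replaceAll("[^A-Za-z0-9]", ".")
                   .replaceAll("\\.{2,}", ".")
                   .replaceAll("^\\.|\\.$", "");
    }

    public static void main(String[] args) {
        System.out.println(fromJarFileName("xstream-1.4.20.jar"));   // prints xstream
        System.out.println(fromJarFileName("xmlpull-1.1.3.1.jar"));  // prints xmlpull
    }
}
```

Note that this name only comes into play when the jar actually sits on the module path.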

For XStream, the current release is delivered in the xstream-1.4.20.jar file. It’s easy to infer that the module name could be xstream. Adding that name as a required module is a hard fail. The pattern for allowing reflection to modularized components like JavaFX does not work.

    requires xstream;  // BAD: Will not work !!!
    …
    opens demo.xdemo.tree.simple to xstream;

Adding the module name inferred from the ModuleFinder.of() specification does not work for either requirements or reflection.

Opaque And Confusing

One naive mental model for Gradle builds is that adding an implementation dependency for an external jar leads to the jar being added to both the compile and runtime classpaths. Given past exposure to non-modular Java projects, this is a reasonable understanding of Gradle’s mechanics.

There are few obvious violations of this model for a new project that uses explicit module definitions. Once the module names for external jars are discovered, a modular project can successfully reference any of its dependent modules by including them in both the build file and the module-info file. Unmodularized components become inaccessible, but this is readily fixed by adding the missing module-info files.

Since many popular libraries have converted to modules or use the Automatic-Module-Name property, their use via a simple implementation dependency is consistent with the model of a single set of external jars on the classpath.

The successful discovery of the modules that are added to the dependency definition, and the assumed equivalence of dependencies and the classpath, can lead to a flawed theory of module discovery. Modular applications launched by Gradle appear to discover modules in the jars that are expected to be on the classpath.

The naive model has a hard fail with XStream. If modules are discovered on the classpath, then XStream’s module name should be xstream.

🤔
The naive assumption is supported by some sections of the Gradle 8.2.1 documentation. For example, the Java Plugin page does not mention modules. All of the configurations (e.g. implementation) are described in terms of their impact on the project’s classpath. Some of this should probably be updated.

A Better Model

The naive model that a Gradle implementation dependency results in a jar on the classpath is not correct for modular Java projects. A JPMS-aware JVM (e.g. the java executable) expects modular jars to be listed on the modulepath, a new and separate construct from the pre-JPMS classpath.

The modulepath is similar to the classpath. Like the classpath, the modulepath contains a set of jars. Like the classpath, the set of jars on the modulepath defines the types and values that are available to other software components. The key difference is that the JVM enforces strong encapsulation rules for components on the modulepath. The encapsulation rules protect internal components from outside meddling.

Gradle partitions the set of jars listed as implementation dependencies into modular and non-modular categories. The modular jars are placed on the modulepath. All of the non-modular components are included in the JVM via the classpath.

Gradle uses a “module-aware” heuristic to determine which jars are placed on the modulepath. A jar is module-aware if it includes a module-info file or its MANIFEST.MF file has an Automatic-Module-Name property. All other jars are non-modular and are placed on the classpath.
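The heuristic is easy to model. The sketch below is not Gradle’s implementation; it only restates the rule from the paragraph above in Java, using a made-up JarInfo type whose fields stand in for facts that would normally be read from each jar. The three sample jars are the XStream-related jars from the demo application.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical model of the "module-aware" split; not taken from Gradle's sources.
public class ModuleAwarePartition {

    /** Minimal facts about a dependency jar (illustrative only). */
    public static final class JarInfo {
        final String fileName;
        final boolean hasModuleInfo;
        final String automaticModuleName; // null when the manifest has no such attribute

        public JarInfo(String fileName, boolean hasModuleInfo, String automaticModuleName) {
            this.fileName = fileName;
            this.hasModuleInfo = hasModuleInfo;
            this.automaticModuleName = automaticModuleName;
        }

        boolean isModuleAware() {
            return hasModuleInfo || automaticModuleName != null;
        }
    }

    /** Key true holds the module path jars, key false holds the classpath jars. */
    public static Map<Boolean, List<String>> partition(List<JarInfo> jars) {
        return jars.stream().collect(Collectors.partitioningBy(
                JarInfo::isModuleAware,
                Collectors.mapping(jar -> jar.fileName, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<JarInfo> jars = List.of(
                new JarInfo("xstream-1.4.20.jar", false, null),
                new JarInfo("xmlpull-1.1.3.1.jar", false, null),
                new JarInfo("mxparser-1.2.2.jar", false, "io.github.xstream.mxparser"));

        Map<Boolean, List<String>> split = partition(jars);
        System.out.println("module path: " + split.get(true));
        System.out.println("classpath:   " + split.get(false));
    }
}
```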

The separation of dependencies into modular and non-modular components is a common convention for many JPMS-aware development tools. Both Gradle and Visual Studio Code exhibit these module-aware behaviors. I suspect that a module-aware heuristic is used in Maven, Eclipse, and NetBeans.

Once these details of Gradle’s internal behaviors are exposed, the different categories of modules that are described in the Java Language Specification can be described in practical terms.

  1. If the jar is on the modulepath and it has a module-info file, use the module name given there.

  2. If the jar is on the modulepath and its MANIFEST.MF file has an Automatic-Module-Name property, use the module name given there.

  3. If the jar is on the modulepath and it has no explicit name, infer a module name from the jar’s file name.

  4. If the jar is on the classpath, its packages become part of an unnamed module.

Note that the module-aware heuristic guarantees that only module-aware jars are ever placed on the modulepath. In practice, only cases 1, 2, or 4 occur with Gradle-packaged applications. Even though the Java Language Specification defines a behavior for the handling of simple jars on the module path (case 3), this never happens in practice.

Coupled with the understanding that JPMS-aware JVMs use a separate modulepath, this explains the odd behaviors of the XStream components. Since its library does not satisfy the module-aware heuristic, the XStream code is placed into an unnamed module. Because an unnamed module has no name, a named module cannot declare a dependency on it, so none of the XStream code is available to named modules.
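Which module a class ended up in can be checked at runtime with the standard Class.getModule() API. The following probe is a small sketch for experimentation; the class name ModuleProbe is made up and is not part of the demo project.

```java
// Hypothetical probe: reports whether a class lives in a named or an unnamed module.
public class ModuleProbe {

    public static String describe(Class<?> type) {
        Module module = type.getModule();
        return module.isNamed()
                ? "named module " + module.getName()
                : "unnamed module";
    }

    public static void main(String[] args) {
        // JDK classes always belong to named platform modules.
        System.out.println("String      -> " + describe(String.class));
        // The result for this class depends on how the program is launched.
        System.out.println("ModuleProbe -> " + describe(ModuleProbe.class));
    }
}
```

When the probe is compiled without a module-info file and launched with -classpath, its own class should be reported as part of an unnamed module, which is the same situation that the XStream classes are in.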

RTFM or The Gory Details

Workarounds are available, but first let’s look more closely at the gory details. Although the Java Plugin page seems a bit out of date, other sections confirm the improved model of Gradle and JVM interaction.

The Application Plugin page contains some of the details in its section Building applications using the Java Module System. The primary point here is to use the mainModule property to run the application in a JVM that is JPMS-aware and JPMS-enforcing. The details of modular Java support are actually delegated to a discussion on the Java Library Plugin page.

The Java Library Plugin page provides numerous details about Java modules in its section Building Modules for the Java Module System. In this section, the existence of the java.modularity.inferModulePath variable reveals much of the underlying mechanisms.

Gradle will automatically put a Jar of your dependencies on the module path, instead of the classpath, if … [the Jar is module-aware]

The subsection Using libraries that are not modules is explicit about the behavior:

A third case are traditional libraries that provide no module information at all …. Gradle puts such libraries on the classpath instead of the module path. The classpath is then treated as one module (the so called unnamed module) by Java.

The Java Library Plugin page confirms the model that Gradle separates the single set of implementation dependencies into classpath and modulepath properties for the JVM. It also confirms that jars in an unnamed module are not available to named modules. It also describes the workaround for applications that need to use non-module-aware jars (section Building an automatic module).

Caught Red Handed

This behavior can be confirmed with some deep dives into the application launch scripts.

Gradle’s application plug-in assembles the various components of an application and builds the JVM launch command that executes the program. If an application project is based on modular Java (i.e. has a non-null application.mainModule), the launch script prepares both a classpath and modulepath. Implementation dependencies that look like modules are added to the module path. Other implementation dependencies are added to the classpath.

These definitions occur near line 155 in most versions of the Unix launch script, around line 70 for the Windows .bat variation.

CLASSPATH=$APP_HOME/lib/xstream-1.4.20.jar:$APP_HOME/lib/xmlpull-1.1.3.1.jar

MODULE_PATH=$APP_HOME/lib/…:$APP_HOME/lib/…:$APP_HOME/lib/mxparser-1.2.2.jar

Curiously, the transitive dependencies from XStream get split between modular and non-modular components. The indirect dependency mxparser (with Automatic-Module-Name: io.github.xstream.mxparser) has been updated to be JPMS-aware.

Examining the actual application launch command further confirms the separation of implementation dependencies into classpath and modulepath segments.

If the application defines a mainClassName, it is launched as a non-modular application.

application {
  mainClassName = "demo.xdemo.XdemoApp"
}

This can be seen when the launch script is adapted to echo the actual launch command.

$ ./bin/XdemoApp "test.out_`date +%Y%m%d%H%M`" bin
exec java -classpath …/xdemo/XdemoApp/build/install/XdemoApp/lib/XdemoApp.jar;…/xdemo/XdemoApp/build/install/XdemoApp/lib/XdemoTreeXstream.jar;…/xdemo/XdemoApp/build/install/XdemoApp/lib/XdemoTree.jar;…/xdemo/XdemoApp/build/install/XdemoApp/lib/xstream-1.4.20.jar;…/xdemo/XdemoApp/build/install/XdemoApp/lib/mxparser-1.2.2.jar;…/xdemo/XdemoApp/build/install/XdemoApp/lib/xmlpull-1.1.3.1.jar demo.xdemo.XdemoApp test.out_202308071433 bin

$ cat test.out_202308071433
<demo.xdemo.tree.simple.SimpleTreeModel>
  <containers>
    <string>bin</string>
  </containers>
  <documents/>
</demo.xdemo.tree.simple.SimpleTreeModel>

When the application is revised to use a module-aware definition, the results are very different.

application {
  mainClass = "demo.xdemo.XdemoApp"
  mainModule = "xdemo.app"
}

Now, running the application clearly shows that a separate modulepath property is being used with the JVM.

$ ./bin/XdemoApp "test.out_`date +%Y%m%d%H%M`" bin
exec java -classpath …/xdemo/XdemoApp/build/install/XdemoApp/lib/xstream-1.4.20.jar;…/xdemo/XdemoApp/build/install/XdemoApp/lib/xmlpull-1.1.3.1.jar --module-path …/xdemo/XdemoApp/build/install/XdemoApp/lib/XdemoApp.jar;…/xdemo/XdemoApp/build/install/XdemoApp/lib/XdemoTreeXstream.jar;…/xdemo/XdemoApp/build/install/XdemoApp/lib/XdemoTree.jar;…/xdemo/XdemoApp/build/install/XdemoApp/lib/mxparser-1.2.2.jar --module xdemo.app/demo.xdemo.XdemoApp test.out_202308071429 bin
…
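The shape of the two launch commands echoed above can be summarized in code. This is a sketch of the observed command structure only, not the logic of Gradle’s start scripts; the class LaunchCommandSketch is hypothetical, and the short lib/ paths in main stand in for the elided install paths.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two launch command shapes; not Gradle's script logic.
public class LaunchCommandSketch {

    /** Pass a null mainModule to build the non-modular form of the command. */
    public static List<String> build(List<String> classpath, List<String> modulePath,
                                     String mainModule, String mainClass) {
        List<String> command = new ArrayList<>();
        command.add("java");
        if (!classpath.isEmpty()) {
            command.add("-classpath");
            command.add(String.join(File.pathSeparator, classpath));
        }
        if (mainModule == null) {
            // Non-modular launch: every jar is on the classpath.
            command.add(mainClass);
        } else {
            // Modular launch: module-aware jars move to the module path.
            command.add("--module-path");
            command.add(String.join(File.pathSeparator, modulePath));
            command.add("--module");
            command.add(mainModule + "/" + mainClass);
        }
        return command;
    }

    public static void main(String[] args) {
        List<String> plainJars = List.of("lib/xstream-1.4.20.jar", "lib/xmlpull-1.1.3.1.jar");
        List<String> moduleJars = List.of("lib/XdemoApp.jar", "lib/mxparser-1.2.2.jar");
        System.out.println(String.join(" ",
                build(plainJars, moduleJars, "xdemo.app", "demo.xdemo.XdemoApp")));
    }
}
```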

Once again, the evidence confirms that a project's implementation dependencies are separated into classpath and modulepath based on the module-aware heuristic. Regardless, there needs to be a mechanism for projects with explicitly named modules to use code provided by simple jars.

Workarounds

Regardless of the access rules enforced by the modular-aware JVM, the reason for including XStream as an implementation dependency is to use the functionality provided in the jar. Somehow, there needs to be a path for code in a named module to access the behaviors from an unnamed module.

The Gradle Java Library Plugin page provides a summary of the steps to connect named modules with unnamed modules. The basic workaround is another component (e.g. a Gradle project) that bridges the gap between the named modules and simple jars.

Gradle Java Library Plugin section Using libraries that are not modules

.. if you cannot avoid to rely on a library without module information, you can wrap that library in an automatic module as part of your project.

This bridge project is non-modular in the sense that it does not have a module-info file. The absence of a module-info file allows the code in this project to access any visible components in all modules, whether named or unnamed modules.

If the bridge component could be placed on the modulepath, code in the named modules could access the code in the bridge component. However, the Gradle launch environment only places module-aware jars on the module path. In order to make the bridge component accessible to other named modules, it needs to be made module-aware without providing a module-info file.

Gradle Java Library Plugin section Building an automatic module

.. an automatic module can be used as an adapter between your real modules and a traditional library on the classpath.

With the module-aware heuristic for categorizing application dependencies, adding an Automatic-Module-Name property to the MANIFEST.MF file marks a project as module-aware. With Gradle, the Automatic-Module-Name property for a project is defined with a few lines of configuration settings. In the demo application, the subproject for the bridge component provides a module name.

jar {
  manifest {
    attributes 'Automatic-Module-Name': 'xdemo.tree.xstream'
  }
}

After these settings are added to the bridge project’s build.gradle definition, the project’s code is available to dependent modular projects. Once the expected requires xdemo.tree.xstream; statement is added to their module-info file, code in the named modules can create an instance of the SimplePersist class and call its save() method.

Remaining Problems

The use of an automatic module to wrap the code in the unnamed module makes it possible for modular code to access the capabilities of non-modular jars. The modular components can only access the non-modular components that are wrapped by the bridge projects. This provides a safe and robust pattern for module projects to access the code in non-modular jars.

The bridge project uses wrapper classes, like SimplePersist, to hide the XStream types from named modules. For the simplest wrappers, the exposed behaviors can simply delegate to the internal field. Other wrappers might add a semantic layer that is tailored to the application. Wrapper classes in a bridge project work well to allow a modular project to call non-modular code.

However, there are problems if the non-modular jar expects to use reflection on any of the data passed to it. A JPMS-enforcing JVM will only allow deep reflection for types that are defined in open packages. The Java module system provides an opens directive that lets a named module grant reflective access to specific packages. A qualified opens directive cannot be used to open a named module’s package to XStream, since XStream is in an unnamed module and has no module name to reference.

One of the most commonly used converters in XStream uses reflection to discover the persistable properties of objects and types. In a modular application, the lack of access to an object’s class structure leads immediately to trouble serializing the model data.

$ ./bin/XdemoApp "test.out_`date +%Y%m%d%H%M`" bin
Exception in thread "main" com.thoughtworks.xstream.converters.ConversionException: No converter available
---- Debugging information ----
message             : No converter available
type                : demo.xdemo.tree.simple.SimpleTreeModel
converter           : com.thoughtworks.xstream.converters.reflection.ReflectionConverter
message[1]          : Unable to make field private final java.util.List demo.xdemo.tree.simple.SimpleTreeModel.containers accessible: module xdemo.tree does not "opens demo.xdemo.tree.simple" to unnamed module @358c99f5
-------------------------------
        at com.thoughtworks.xstream.core.DefaultConverterLookup.lookupConverterForType(DefaultConverterLookup.java:88)
        at com.thoughtworks.xstream.XStream$1.lookupConverterForType(XStream.java:478)
        at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:49)
        at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:44)
        at com.thoughtworks.xstream.core.TreeMarshaller.start(TreeMarshaller.java:83)
        at com.thoughtworks.xstream.core.AbstractTreeMarshallingStrategy.marshal(AbstractTreeMarshallingStrategy.java:37)
        at com.thoughtworks.xstream.XStream.marshal(XStream.java:1303)
        at com.thoughtworks.xstream.XStream.marshal(XStream.java:1292)
        at com.thoughtworks.xstream.XStream.toXML(XStream.java:1265)
        at xdemo.tree.xstream/demo.xdemo.tree.xstream.TreePersist.save(TreePersist.java:24)
        at xdemo.app/demo.xdemo.XdemoApp.main(XdemoApp.java:19)
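The same kind of refusal can be reproduced without XStream. The sketch below is a stand-alone probe, not code from the demo project: the class ReflectionProbe and its nested Sample type are invented for illustration. It assumes Java 17 or later with default JVM flags, where the java.lang package is not opened to code on the classpath.

```java
import java.lang.reflect.Field;
import java.lang.reflect.InaccessibleObjectException;

// Hypothetical probe: tries deep reflection on a private field and reports the outcome.
public class ReflectionProbe {

    /** A class in the probe's own module, with a private field to reflect on. */
    public static final class Sample {
        private final String secret = "hidden";
    }

    public static boolean canMakeAccessible(Class<?> type, String fieldName) {
        try {
            Field field = type.getDeclaredField(fieldName);
            field.setAccessible(true); // refused when the package is not open to the caller
            return true;
        } catch (InaccessibleObjectException e) {
            System.out.println(e.getMessage());
            return false;
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Same module as the caller: access is expected to be granted.
        System.out.println(canMakeAccessible(Sample.class, "secret"));
        // java.base does not open java.lang by default: access is expected to be refused.
        System.out.println(canMakeAccessible(String.class, "value"));
    }
}
```

Under those assumptions, the message printed for the String field should have the same "does not opens ... to unnamed module" form as the XStream failure above, because the probe plays the role that XStream’s ReflectionConverter plays in the demo.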

Workarounds for reflection with inaccessible types are more challenging. Fortunately, these cases are rare. Of the few external projects that remain non-modular, even fewer rely on reflection for their proper operation. Unfortunately, XStream is one of the few external jars that face this challenge.

For the XStream demo project, it is straightforward to add a converter class for the TreeModel type. The custom converter can use the public API for TreeModel to handle the serialization of concrete instances. Deserialization of these files for the TreeModel type would rely on some integration with the TreeModelBuilder class.

Summary

The JPMS standard defines a powerful and effective means to define and control intermodular dependencies. The effective management of these dependencies helps control unforeseen interactions among software components in large systems.

JPMS was new with Java 9, and the next LTS (Long Term Support) release, Java 21, is expected in late 2023. Many common libraries are now available in module-aware packages. Whether that means a module-info file or just an updated MANIFEST.MF, module support can almost be taken for granted.

Except when it isn’t. Components that deal heavily with introspection and have many dependents among non-modular applications can find the modular access rules limiting. For XStream, this requires staying outside the modular platform. No amount of research or dumpster diving provides a module name for the XStream component, blocking access from modular software components.

This combination of modular and non-modular components can be confusing. Historical behaviors, along with opaque and confusing documentation, can easily mislead a developer who is new to JPMS. Documented behaviors turn out to be impossible, and elements of the Java toolchain (Gradle, IDEs) are now checking the internal structure of dependent jars. Gradle’s heuristic for separating module-aware components does the trick, but it is a hidden sleight-of-hand trick.

Regardless of Gradle’s helpful trickery, the modular components require a bridge component to access the non-modular components. The bridge component follows the pattern of an automatic module built as a non-modular component. The jar cannot contain a module-info file, and its MANIFEST.MF file must include an Automatic-Module-Name attribute.

For XStream, the bridge component makes some architectural sense. Serialization should be separate from data models. The use of a non-modular component to provide serialization just forces the application into a good pattern.