Alternatives for naming automatic modules (#AutomaticModuleNames)

mark.reinhold
Mon Apr 3 16:29:15 UTC 2017

Assume, for the purpose of this note, that we decide to retain automatic
modules more or less as-is.  How should the automatic-module name of an
existing JAR file be computed?

The fundamental problem here is that we're trying to infer a sensible
name from very little information, so we have to do the best we can with
what we've got.

                                  * * *

Robert Scholte and Brian Fox suggest computing the automatic-module name
of a JAR-file artifact from the Maven group and artifact identifiers in
the `` file of the artifact itself, when available.  They
observe that 94% of the artifacts in the Maven Central repository contain
such a file, since Maven inserts them by default.  They conclude that
this approach would both "nearly eliminate" module-name conflicts and,
via the default effect, establish what they consider to be a desirable
pattern for module names [1].

I argued previously that module names neither need be nor should be
globally unique, since modules are more abstract than the artifacts that
define them [2].  Not everyone found that persuasive, however, so here
are some further thoughts on the Scholte/Fox approach, backed up by data.

  (a) That 94% of the artifacts in Central contain `` files
      does not imply that this approach would be applicable to 94% of the
      artifacts commonly used in applications today.  Fewer than half of
      the artifacts in the most popular projects in Central, according to
      three different "top 100" surveys [3][4][5], contain such files
      (see [6] for a list of these projects).  The ratio doesn't change
      if you examine just the artifacts with the highest version number
      for each project, so it's not the case that only newer artifacts
      have this file.  It appears, then, that many popular projects are
      not built with Maven, or if they are then the default insertion of
      the `` file is disabled.

  (b) Scholte and Fox don't say exactly how module names should be formed
      from group-id/artifact-id pairs.  The obvious method is just to
      concatenate them and then replace non-identifier characters with
      periods.  This can, however, result in very long module names with
      redundant tokens.  Here are a few examples from testing this method
      against the popular artifacts mentioned above (from the "verbose"
      column in [6]):




      Not only are these module names annoyingly verbose, but some module
      authors would strongly object to such names being established for
      their own modules via the default effect.

  (c) A more sophisticated method would use heuristics to produce shorter
      module names.  A trick that works surprisingly well, at least for
      these artifacts, is simply to scan the name computed in (b) from
      left to right and remove redundant tokens (the "automagic" column
      in [6]).  This leads to module names that are often more readable,
      but they're sometimes quite different from the original coordinates.
      Any similar heuristic is likely to suffer from the same problem, and
      any such approach is likely to be rejected by module authors who
      prefer a different style of name.

  (d) Making the Java SE Platform Specification depend upon the behavior
      of a particular build tool is inappropriate on principle, no matter
      how popular that tool.  If we were to adopt this kind of approach
      then the naming algorithm would have to be demoted to the realm of
      the "host system", in the parlance of the JLS [7], and therefore
      established not by standard but only by convention.

  (e) Maven may be the most popular Java build tool today, but it's not
      the only popular such tool.  Gradle, for one, is fairly popular
      yet it does not generate `` files, at least not by
      default.  Even if this naming algorithm were merely a convention
      it would be questionable -- though perhaps defensible -- for the
      JDK to establish a convention that is, essentially, specific to
      one build tool.
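Methods (b) and (c) can be sketched in a few lines of Java.  This is an illustration under stated assumptions, not the code used to produce [6]: "non-identifier characters" is assumed to mean characters for which `Character.isJavaIdentifierPart` is false, and "remove redundant tokens" is read as keeping only the first occurrence of each dot-separated token.  The class name `CoordinateNames` and the three sample coordinates are chosen here for illustration; they are not rows taken from [6].

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class CoordinateNames {
    // Method (b): concatenate group-id and artifact-id, then turn every
    // non-identifier character into a period (assumed reading of the text).
    public static String verbose(String groupId, String artifactId) {
        StringBuilder sb = new StringBuilder();
        for (char c : (groupId + "." + artifactId).toCharArray()) {
            sb.append(Character.isJavaIdentifierPart(c) ? c : '.');
        }
        return sb.toString()
                 .replaceAll("\\.{2,}", ".")     // collapse runs of periods
                 .replaceAll("^\\.|\\.$", "");   // trim a leading or trailing period
    }

    // Method (c), one possible heuristic: scan left to right and drop any
    // token that has already appeared earlier in the name.
    public static String automagic(String verboseName) {
        Set<String> tokens = new LinkedHashSet<>(); // keeps first-seen order
        for (String token : verboseName.split("\\.")) {
            tokens.add(token); // a repeated token is simply ignored
        }
        return String.join(".", tokens);
    }

    public static void main(String[] args) {
        String[][] coords = {
            {"org.slf4j", "slf4j-api"},
            {"org.apache.commons", "commons-lang3"},
            {"com.fasterxml.jackson.core", "jackson-databind"},
        };
        for (String[] c : coords) {
            String v = verbose(c[0], c[1]);
            System.out.println(v + " -> " + automagic(v));
        }
    }
}
```

Running `main` prints `org.slf4j.slf4j.api -> org.slf4j.api`, `org.apache.commons.commons.lang3 -> org.apache.commons.lang3`, and `com.fasterxml.jackson.core.jackson.databind -> com.fasterxml.jackson.core.databind`.  The last line shows the trade-off described in (c): the shorter name drops a token that the artifact's author deliberately repeated, so it no longer lines up cleanly with the original coordinates.  Neither method checks that each token is a legal Java identifier (e.g., one that starts with a digit).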

Considering all these issues, I don't think the Scholte/Fox approach is
viable.  Available data suggests that it would help with fewer than half
of the most popular projects in use today.  It would result in module
names that are either annoyingly verbose or not so obviously related to
their artifacts' original coordinates, and in either case objectionable
to some module authors.  It would, finally, raise non-trivial issues in
terms of standardization and convention.

                                  * * *

Stephen Colebourne argues that the best name for a module, whether it's
automatic or explicit, is the name of its "highest" package [8].  We
could, in theory, scan a JAR file, determine its highest package, and
use that as the name of the corresponding automatic module.  (The module
system must scan the JAR file anyway, in order to identify its packages
and service providers, so this would be no less efficient in terms of
I/O operations.)

This would seem to be an attractive option, but when tested against the
popular artifacts mentioned above it fares only a little better than the
Scholte/Fox approach.  Only about 70% of those artifacts have a single
"highest" package.  The remainder have multiple such packages, and
there's no obvious way to disambiguate them.
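A minimal sketch of the "highest package" test, assuming one plausible reading of Colebourne's definition: a package in the JAR is "highest" if no other package in the same JAR encloses it.  Under that reading an artifact qualifies only when exactly one such package exists.  The class `HighestPackages` and the sample entry names are hypothetical, and the sketch works on a list of entry names rather than opening a real JAR.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class HighestPackages {
    // Collect package names from JAR entry names such as "org/joda/time/DateTime.class".
    public static Set<String> packagesOf(List<String> entryNames) {
        Set<String> pkgs = new TreeSet<>();
        for (String e : entryNames) {
            if (!e.endsWith(".class") || e.startsWith("META-INF/")) {
                continue; // skip resources and metadata / versioned entries
            }
            int slash = e.lastIndexOf('/');
            if (slash > 0) { // top-level classes and module-info.class have no '/'
                pkgs.add(e.substring(0, slash).replace('/', '.'));
            }
        }
        return pkgs;
    }

    // A package is "highest" if no other package in the set is an enclosing prefix of it.
    public static Set<String> highest(Set<String> pkgs) {
        Set<String> result = new TreeSet<>();
        for (String p : pkgs) {
            boolean nested = false;
            for (String q : pkgs) {
                if (p.startsWith(q + ".")) { // false when q equals p
                    nested = true;
                    break;
                }
            }
            if (!nested) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> single = List.of(
            "org/joda/time/DateTime.class",
            "org/joda/time/format/DateTimeFormat.class",
            "META-INF/MANIFEST.MF");
        List<String> split = List.of(
            "com/example/api/Api.class",
            "com/example/impl/Impl.class");
        System.out.println(highest(packagesOf(single))); // [org.joda.time]
        System.out.println(highest(packagesOf(split)));  // [com.example.api, com.example.impl]
    }
}
```

The second case is the ambiguous one: both packages are equally "high", and nothing in the class files says which should name the module.  To try this on a real artifact, the entry names could be obtained with `new java.util.jar.JarFile(path).stream().map(java.util.jar.JarEntry::getName)`.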

                                  * * *

The least-bad method that we've found to compute the name of an automatic
module is to do a simple transformation of its JAR-file name.  This has
been in the JPMS specification and the prototype RI for over a year now.
This method is not ideal, but the concerns raised about it so far can be
addressed effectively by tools and documentation or else are really
concerns about all kinds of modules, whether automatic or explicit.
Herewith a summary of those concerns, and responses to them.

  (1) It's difficult for the user of an automatic module to figure out
      the name of the module.

This is contrary to my experience, having presented this method to a wide
variety of developers at many conferences over the last few years.  They
immediately understand how it works after being shown an example or two.
This concern applies at least as well to any other method for naming
automatic modules, and in any case I expect the IDEs eventually to offer
completion hints for all the observable modules on a project's module
path, whether automatic or explicit.
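For readers who want the "example or two" in executable form, here is a sketch of the JAR-file-name transformation.  It follows the rule as later documented for `java.lang.module.ModuleFinder.of` in Java SE 9 (drop `.jar`, cut at the first hyphen followed by a version-like digit sequence, map non-alphanumerics to periods, collapse and trim periods); the draft specification current when this note was written may differ in detail.  The class name `AutomaticModuleName` is hypothetical, and the sketch neither parses the version nor validates the resulting name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AutomaticModuleName {
    // A hyphen followed by digits that end at a '.' or at the end of the name
    // marks the start of the version string.
    private static final Pattern VERSION_START = Pattern.compile("-(\\d+(\\.|$))");

    public static String fromJarFileName(String fileName) {
        // 1. Drop the ".jar" suffix.
        String name = fileName.endsWith(".jar")
                ? fileName.substring(0, fileName.length() - ".jar".length())
                : fileName;
        // 2. Keep only what precedes the first version-like hyphen.
        Matcher m = VERSION_START.matcher(name);
        if (m.find()) {
            name = name.substring(0, m.start());
        }
        // 3. Non-alphanumerics become '.', runs of dots collapse, end dots are trimmed.
        return name.replaceAll("[^A-Za-z0-9]", ".")
                   .replaceAll("\\.{2,}", ".")
                   .replaceAll("^\\.|\\.$", "");
    }

    public static void main(String[] args) {
        String[] examples = {"foo-bar.jar", "foo-bar-1.2.3-SNAPSHOT.jar", "commons-lang3-3.5.jar"};
        for (String f : examples) {
            System.out.println(f + " -> " + fromJarFileName(f));
        }
    }
}
```

This prints `foo-bar.jar -> foo.bar`, `foo-bar-1.2.3-SNAPSHOT.jar -> foo.bar`, and `commons-lang3-3.5.jar -> commons.lang3`.  On JDK 9 or later the result can be cross-checked against the JDK's own rule via `ModuleFinder.of(path)` in `java.lang.module`.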

  (2) A module name computed by this method was not chosen by the author
      of the component.

A module name produced by this method almost always corresponds clearly
to the JAR file's Maven artifact identifier.  It therefore is a name that
was chosen by the author, though for a somewhat different purpose, i.e.,
as an artifact name rather than a module name.

  (3) Module names computed by this method do not follow the popular
      reverse domain-name convention [9].

This concern applies to all kinds of modules, not just automatic modules.
It also assumes agreement with the reverse domain-name convention; many
developers prefer that convention but some do not, and it is not the
place of a module system, a build tool, or an artifact repository to
enforce such conventions.

  (4) A module name computed by this method is based not upon intrinsic
      properties of the module's class-file content but, rather, the more
      ephemeral name of the artifact in which it's packaged.

True.  A method that uses only intrinsic information found in the classes
defined in the module would be preferable in principle.  We have not yet,
however, found such a method.

  (5) Suppose that I publish my own explicit module, and its declaration
      requires some other module, and that module is automatic since it's
      really just a JAR-file component that hasn't yet been modularized.
      I'm being rude to the author of that component: If my module is
      popular then this could establish the automatic-module name as the
      module name of their component via the default effect, and they
      might prefer a different name.

Yes, that would be rude.  We should strongly advise module authors not
to publish explicit modules that depend upon components that have not
yet been modularized.  It's perfectly fine to create and use such modules
in a limited fashion, e.g., within a team or organization as part of a
long-term modularization effort.  Such modules should not, however, be
published to Maven Central or any similar public repository [a].

  (6) It's risky to require an automatic module in my own explicit
      module.  If the author of the underlying JAR-file component
      explicitly modularizes it later on then they might choose a name
      that's different from the automatic name, and they might choose
      not to export all of its packages, and then I'll have to change
      my module, and anyone who uses my module might have to upgrade.

This concern applies to all kinds of modules, not just automatic modules.
The author of an explicit module could, at any time, change the module's
name or its set of exported packages.  Such changes may be more likely
for automatic modules until they're explicitly modularized, but that's
just one more reason to treat automatic modules as a transitional tool
rather than a long-term solution, and never to publish an explicit module
that depends upon automatic modules.

  (7) If the names of two automatic modules conflict then they cannot
      both be placed on the same module path.

A simple workaround for this problem is for the user of such modules to
rename one of the relevant JAR files.
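The conflict and its workaround can be illustrated with a small sketch that groups JAR file names by derived module name and reports any name claimed by more than one file.  The naming rule is a compact restatement of the Java SE 9 `ModuleFinder` rule (an assumption, as above), and the class `NameConflicts` and the file names `core-1.0.jar`, `core-2.3.jar`, and `acme-core-2.3.jar` are hypothetical, standing in for two unrelated projects that both call their artifact "core".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameConflicts {
    private static final Pattern VERSION_START = Pattern.compile("-(\\d+(\\.|$))");

    // Compact form of the JAR-file-name rule (assumed to match Java SE 9).
    public static String moduleName(String jarFileName) {
        String name = jarFileName.replaceFirst("\\.jar$", "");
        Matcher m = VERSION_START.matcher(name);
        if (m.find()) {
            name = name.substring(0, m.start());
        }
        // A run of non-alphanumerics becomes a single '.', then end dots are trimmed.
        return name.replaceAll("[^A-Za-z0-9]+", ".").replaceAll("^\\.|\\.$", "");
    }

    // Map each derived module name to the files that produce it; keep only collisions.
    public static Map<String, List<String>> conflicts(List<String> jarFileNames) {
        Map<String, List<String>> byName = new TreeMap<>();
        for (String f : jarFileNames) {
            byName.computeIfAbsent(moduleName(f), k -> new ArrayList<>()).add(f);
        }
        byName.values().removeIf(files -> files.size() < 2);
        return byName;
    }

    public static void main(String[] args) {
        // Two unrelated JARs derive the same name "core".
        System.out.println(conflicts(List.of("core-1.0.jar", "core-2.3.jar", "util-4.1.jar")));
        // Renaming one file ("acme.core") removes the collision.
        System.out.println(conflicts(List.of("core-1.0.jar", "acme-core-2.3.jar", "util-4.1.jar")));
    }
}
```

The first call prints `{core=[core-1.0.jar, core-2.3.jar]}` and the second prints `{}`: renaming a single file is enough to let both JARs share one module path.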


More information about the jpms-spec-experts mailing list