Advice + proposals regarding automodule naming
forax at univ-mlv.fr
Tue Jan 17 22:11:11 UTC 2017
I fully agree with you that Maven cannot use automatic modules.
Automatic modules have weird naming rules, export everything and declare no dependencies themselves*, so they are useless if you already have a trove of information like the Maven POM.
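To illustrate why a name derived from the file name is fragile, here is a minimal sketch. It is a simplified approximation of the filename-based derivation discussed in this thread, not the JDK's actual algorithm, and the jar file names are hypothetical.

```java
public class AutoModuleNameSketch {

    // Simplified approximation (an assumption, not the JDK rule):
    // drop ".jar", cut everything from the first "-<digit>" onward,
    // then turn each run of non-alphanumeric characters into a dot.
    public static String deriveName(String jarFileName) {
        String base = jarFileName.endsWith(".jar")
                ? jarFileName.substring(0, jarFileName.length() - 4)
                : jarFileName;
        String withoutVersion = base.replaceFirst("-\\d.*$", "");
        return withoutVersion.replaceAll("[^A-Za-z0-9]+", ".");
    }

    public static void main(String[] args) {
        // Hypothetical artifacts from two unrelated groupIds that share an
        // artifactId: the groupId never appears in the file name.
        String first = deriveName("core-1.0.jar");   // e.g. org.first:core:1.0
        String second = deriveName("core-2.3.jar");  // e.g. org.second:core:2.3
        System.out.println(first + " vs " + second
                + " -> same name: " + first.equals(second));
    }
}
```

Since the groupId is not part of the file name, both jars end up with the same derived name, whatever the exact stripping rules are.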
In my opinion, the real question is not how to map existing Maven artifacts to Java modules, but rather
how Maven 4 artifacts are mapped to Java modules, and then how to make the transition from Maven 3 artifacts to Maven 4 artifacts as smooth as possible.
Here is my take on what a Maven 4 artifact could be:
- a Maven 4 artifact can only depend on other Maven 4 artifacts (and there is some way to see a Maven 3 artifact as a Maven 4 artifact if the POM is simple enough),
- a Maven 4 artifact does not allow split packages (a lot of Maven 3 artifacts use split packages because it's a cool way to do an after-the-fact modularisation
without changing the name of the module),
- the information about a Maven 4 artifact is taken from both the module-info and the POM
(version is in the POM, exported packages are in the module-info, ...).
Once you have the precise rules, it will be easier to see how to map a Maven 3 artifact to a Maven 4 one and what the compatibility rules are.
* except if you want to play with configurations that mix modulepath and classpath, but these kinds of configurations are really hard to debug.
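As a rough sketch of the third point in the list above (artifact info taken from both the POM and the module-info), a tool could assemble a module descriptor with the java.lang.module.ModuleDescriptor builder. The module name, version, package and required module below are hypothetical values chosen for illustration; how a real build tool would obtain them is not specified here.

```java
import java.lang.module.ModuleDescriptor;

public class SyntheticDescriptorSketch {

    // Combines data from two sources: the version (as it would come from the
    // POM) and the exports/requires (as they would come from the module-info).
    public static ModuleDescriptor describe(String moduleName, String pomVersion,
                                            String exportedPackage, String requiredModule) {
        return ModuleDescriptor.newModule(moduleName)
                .version(pomVersion)
                .exports(exportedPackage)
                .requires(requiredModule)
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical artifact data, for illustration only.
        ModuleDescriptor d = describe("com.example.lib", "1.0.0",
                "com.example.lib.api", "java.logging");
        System.out.println(d.name() + "@" + d.rawVersion().orElse("?"));
        d.exports().forEach(e -> System.out.println("exports " + e.source()));
        d.requires().forEach(r -> System.out.println("requires " + r.name()));
    }
}
```

Unlike an automatic module, a descriptor built this way is explicit: it exports only the listed package and names its dependencies.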
----- Original Message -----
> From: "Robert Scholte" <rfscholte at apache.org>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: jpms-spec-experts at openjdk.java.net, "Brian Fox" <brianf at sonatype.com>
> Sent: Tuesday, 17 January 2017 13:04:08
> Subject: Re: Advice + proposals regarding automodule naming
> Hi Rémi,
> In the end every non-jdk.* and non-java.* module in the module-info will
> be a dependency in your buildtool descriptor. Such a module must match
> exactly one versionless dependency, or conflictId as we call it, which is
> in general the groupId + artifactId (type and classifier are not relevant
> for this story).
> By ignoring the groupId a module can be referred to by multiple dependencies. So
> we can expect collisions. For that reason Brian did a quick scan over
> Maven Central to count the number of duplicate artifactIds.
> Here's the artifactIds with 100+ groupIds:
> maven_artifact_id  count(DISTINCT maven_group_id)  count(maven_group_id)
> library            391                             6854
> core               312                             8188
> common             142                             5084
> ui                 138                             1414
> In theory I could have a Maven project with 391 'library'-jars on the
> classpath without any problem. And as long as they are direct dependencies
> I have control over this by simply not adding 'library' as requirement to
> module-info. The issues start when different 'library'-jars are transitive
> dependencies and when they are marked as required in the module-info file
> of my direct or transitive dependencies.
> Developers of the 'library'-jars cannot use library as the module name and
> are forced to pick another name. As developer of my project in the end I
> decide which versions of dependencies are used. If the 'library'-jar gets
> a different module name and my dependency is still referring to the old
> module name, the project can't be built.
> What I expect is that developers are forced to remove the requirements
> from their module-info because of the mentioned issues. So instead of
> increasing the number of requirements it will be reduced. For that reason we
> say either use a unique module name from the beginning (GA) or wait until
> a dependency has its own module name before adding it as requirement.
> As far as I know this is the first time the JDK/JRE decides (proposes) a
> name for an entity based on another entity. There are no relations between
> method-, class-, or package-names and there doesn't have to be a relation
> between the module name and the filename, so please don't try to do so.
> On Mon, 16 Jan 2017 16:44:03 +0100, Remi Forax <forax at univ-mlv.fr> wrote:
>> Hi Robert,
>> the problem with automatic modules is more general than just the name,
>> automatic modules also create a flat hierarchy which doesn't map well
>> with the Maven artifact descriptor.
>> I wonder why you want Maven to use automatic modules, or said
>> differently, Maven has a lot of information about the artifact, why do
>> you want to forget all this information when fetching a Maven artifact?
>> I think that one problem is that you do not want to create a
>> module-info.class from the Maven POM and insert it into the jar because
>> it will change the artifact*.
>> This kind of module is supported by jigsaw under the name of synthetic
>> modules. A synthetic module is a module with a module descriptor not
>> created by javac but by another tool.
>> In my opinion, automatic modules are interesting when you have a jar that
>> does not come from Maven Central but comes from an ad-hoc build tool and
>> will be considered as a leaf of the dependency DAG.
>> Otherwise, for an existing module system, using a synthetic module seems to
>> be a better idea.
>> * given you have also the problem of split packages, you also need a way
>> to merge several artifacts into one modular jar because it's the easy
>> way to solve the split package problem.
>> ----- Original Message -----
>>> From: "Robert Scholte" <rfscholte at apache.org>
>>> To: jpms-spec-experts at openjdk.java.net
>>> Cc: "Apache Maven Dev" <dev at maven.apache.org>
>>> Sent: Monday, 16 January 2017 10:37:08
>>> Subject: Advice + proposals regarding automodule naming
>>> This is a message from Robert Scholte and Brian Fox. We both have been
>>> talking about this topic for several weeks with other Maven developers and
>>> came to the conclusion that we should warn the jigsaw team about their
>>> current approach regarding auto modules. We will share our experiences,
>>> thoughts and conclusions, and will suggest two proposals.
>>> Traditionally, the Java ecosystem has been very mature in terms of
>>> and namespacing. The reverse fqdn introduced into the java package was a
>>> great choice to ensure classes don’t conflict. Popular build tools such as
>>> Maven and nearly all those that followed built upon that this key
>>> with the introduction of “GroupId” also using the fqdn as part of the
>>> to ensure the coordinates were properly namespaced.
>>> We’ve seen some ecosystems diverge from this leading to new challenges
>>> that ultimately had to be reversed. A great example can be seen in the
>>> “tragic mistake from npm creators”, which was to launch without a
>>> namespace concept. Eventually, NPM started running out of useful names and
>>> had to backtrack to introduce “scopes”, which is really just a namespace.
>>> The real problem here is that the major change in namespace was
>>> baked in after several years of momentum without it. It’s taken a long
>>> time for tooling and best practice to catch up to scopes and in the
>>> interim, people have been left with a dual mode, some namespaced, some not
>>> namespaced situation that has created chaos.
>>> The real issue at hand here as we consider behaviors in the jigsaw
>>> automodule revolves around two well studied concepts.
>>> The most important is the “Default effect”, which states that
>>> the default behavior will become the most prominent best practice. A
>>> default that uses a filename to generate a very short, un-namespaced
>>> module id effectively sets the behavior to create generic names that
>>> eventually conflict...exactly what we’ve seen in npm.
>>> Additionally, the switching costs introduced in overcoming a default
>>> un-namespaced module id to one with a unique namespace is also
>>> once you consider all the potential users. This is why API change is
>>> and changing the module id after the fact from the default is
>>> an API change.
>>> The second principle at hand is the “Principle of least astonishment”. We
>>> want to find a default that doesn’t violate what most users would expect
>>> to be the most obvious. One could argue the current auto module
>>> doesn’t violate this principle, but it’s important to consider alternate
>>> suggestions in this light.
>>> First, let’s explore the potential downsides if the default effect takes
>>> hold with the currently generated auto module id. In Apache Maven, the
>>> artifact id is the part of the coordinate that generates the filename.
>>> This means that com.somecompany:artifact:version will become
>>> artifact-version.jar, which would result in automodule id “artifact”.
>>> Armed with this understanding, what does an analysis of the Maven
>>> ecosystem have to say about potential conflicts in the automodule id?
>>> If we ignore the groupid and version of all the components in the Maven
>>> Central repository, we end up with over 13,500 (7% of the total
>>> group:artifact combinations) conflicts. This does not consider conflicts
>>> across other repositories, or within customer portfolios yet it is
>>> telling. Conflicts will happen. In some cases, the number of conflicts for
>>> the same common names is well above 100. The list of conflicts as of
>>> October, 2016 can be seen here. 
>>> At this point, hopefully we’ve made the case for at least establishing a
>>> default module id that
>>> 1. Uses namespaces to minimize id conflicts when possible
>>> 2. Leverages the default effect to create a de facto best practice
>>> 3. Follows the principle of least astonishment
>>> We have two potential proposals that solve these goals.
>>> Proposal 1: Leverage existing coordinates when available.
>>> Maven is inarguably the most popular build system for Java components,
>>> with Maven Central being the default and largest repository of Java
>>> components in the world. By default, every jar built by Maven
>>> automatically gets a simple properties file inserted into it with its
>>> unique coordinates. Now, not every jar in Central was built with Maven,
>>> however 94% of them were, as we can find the pom.properties file in
>>> 1,806,023 of the 1,913,561 central components. Talk about the default
>>> effect in action!
>>> It’s further important to recognize that given a jar with a
>>> declaring coordinates, it means that the project itself has chosen those
>>> coordinates as their own name. In other words, this is how they refer to
>>> themselves, even if other consumers may not be using Maven directly.
>>> If automodule were able to peek inside a jar and generate the default id
>>> using the groupid and artifactid present in the file, this would nearly
>>> eliminate all instances of id conflict because a significant portion of
>>> the Java ecosystem is in fact built with Maven. Additionally, the fact
>>> that 1.8 million (and counting) modules would have namespace as the
>>> default behavior means we’ve taken a huge step in setting the best
>>> practice of picking module ids with a namespace. Additionally, since the
>>> project itself has chosen these coordinates and uses them as their
>>> distribution mechanism, this follows the principle of least astonishment
>>> to consumers regardless of their chosen build system. Finally, since all
>>> of the above are true, it’s unlikely the project would need to migrate to
>>> a new module id when they adopt jigsaw natively, thus avoiding an API
>>> switching cost for their users.
>>> Proposal 2: Drop automodules
>>> Right now Jigsaw tries to calculate a module name solely based on the name
>>> of the jar file, which now already causes issues. Besides the fact that
>>> the module name is not guaranteed unique compared with its Maven
>>> coordinate, there are extra transformations which make it even less
>>> guaranteed that it is unique; e.g. dashes are replaced by dots (which are
>>> both valid artifactId characters), and in some cases numbers and the
>>> characters following them are stripped off. For artifacts like
>>> jboss-servlet-api_4.0_spec it makes sense, however we already see issues
>>> here where commons-lang, commons-lang2 and commons-lang3 get the same
>>> module name,
>>> even though they have different artifactIds and contain different
>>> packages. Choosing different artifactIds and packages was a very wise
>>> decision because it made it possible that these jars could live next to
>>> each other. Removing that separation by the authors is a very unwise
>>> Another known example is the jsrNNN jars, which now all get jsr as the
>>> module name.
>>> It is highly unlikely there is one single rule to capture all the use
>>> cases and which always results in a module name we can work with.
>>> For that reason the other proposal is to simply drop automodules. Don’t
>>> try to come up with a name for unnamed jars. It might look like the
>>> feature of automodules makes migrating easier because every dependency
>>> will get a name so you can complete your module-info for all requirements, but
>>> we expect that once Jigsaw gets up to speed the invalid module names will
>>> actually block further development due to name collisions or forced
>>> renaming by transitive modular jars.
>>> The advantage of this proposal is that library builders are not forced to
>>> keep the proposed module name in order to maintain backwards compatibility
>>> with the default. Instead library builders can pick a more suitable
>>> module name. The modular system doesn’t allow the same package to be
>>> exported by multiple jars (and automodules export every package). Library
>>> builders can fix this in their new jars, however if end users would
>>> require both jars because they were specified as requirements in
>>> transitive jars, you cannot compile this project. There’s just no
>>> dependency-excludes like Maven has, because “requires” in the
>>> really means requires. Dropping automodules will prevent these kinds of
>>> issues, because a package can only be exported by a named module.
>>> Sure, this means that for end users they cannot refer to every jar in
>>> their module-info. But at least if they add a “requires” to their
>>> module-info, they can ensure that it’ll always refer to the intended
>>> modular jar. With build tools like Maven the chance of missing artifacts
>>> on the classpath has already been reduced a lot. In general builds have
>>> become quite stable, so we don’t expect that developers will translate
>>> dependencies to the module-info file, especially if we warn them about
>>> possible consequences of depending on automodules. Only referring to
>>> modules and even a single “requires” is already a gain. There’s no need
>>> to try to speed this up and give the developer the false impression that
>>> it’ll keep working when upgrading to real modular jars. Focus should be on
>>> the target, not on the path to reach it.
>>> Dropping the automodules will prevent a lot of discussions about what is
>>> the correct way to select a module name and will give the responsibility
>>> for the name back to the place where it belongs: the developer.
>>> The fact that so much of the npm ecosystem is effectively not-namespaced
>>> has actually created potential build time malware injection possibilities.
>>> If I know a package in use by a company through log analysis, bug report
>>> analysis etc, I could go register the same name in the default repo with a
>>> very high semver and know that it’s very likely this would be picked up
>>> over the intended internally developed module because there’s
>>>  https://en.wikipedia.org/wiki/Default_effect_(psychology)
>>>  https://en.wikipedia.org/wiki/Principle_of_least_astonishment
>>>  http://openjdk.java.net/jeps/261 #Risk and assumptions
> >> https://firstname.lastname@example.org/msg06623.html