Looking ahead: proposed Hg forest consolidation for JDK 10
goetz.lindenmaier at sap.com
Wed Oct 12 13:57:02 UTC 2016
Hi Erik and Joe,
I added more comments inline below :)
But to summarize, without discussing my personal work setup:
The planned setup
- is beneficial if your work spans the sub-repos
- imposes overhead if the work concentrates in one sub-repo.
It moves effort from
keeping repos in sync / doing spanning changes
to
managing development setups/builds (branches, local clones, CONF).
Also, I really think it brings Mercurial closer to its limits, and you need to
work around these. The basic idea of git and Mercurial was to quickly
clone, edit, submit, discard, not to set up a local master repo and
be your own SCM admin.
> -----Original Message-----
> From: Erik Helin [mailto:erik.helin at oracle.com]
> Sent: Wednesday, October 12, 2016 14:16
> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
> Cc: joe darcy <joe.darcy at oracle.com>; jdk9-dev at openjdk.java.net
> Subject: Re: Looking ahead: proposed Hg forest consolidation for JDK 10
> Hi Goetz,
> thanks for looking through the JEP and providing us with feedback! Please
> see my replies inline.
> On 2016-10-12, Lindenmaier, Goetz wrote:
> > Hi Joe,
> > thanks for your detailed answer. Unfortunately it
> > doesn't dispel my concerns.
> > > Hi Goetz,
> > >
> > > On 10/11/2016 2:30 AM, Lindenmaier, Goetz wrote:
> > > > Hi,
> > > >
> > > > I see several problems with this approach.
> > > >
> > > > 1.) Mercurial already has problems scaling with the current repositories.
> > > > This will get worse with bigger repos. E.g. 'hg diff' takes
> > > > 14 secs on jdk, but only 2 secs on jaxp:
> > > > jdk: ~90000 files, 15000 changes, hg diff takes 14 secs
> > > > jaxp: ~12000 files, 1000 changes, hg diff takes 2 secs
> > >
> > > By its nature, hg diff needs to walk the directory tree so a bigger tree
> > > will generally be slower.
> > Yes, and that's bad!
> > > Doing a diff on a particular subdirectory, say
> > > for hotspot, should have comparable performance as today.
> > The use case of hg diff is to find what was changed. Obviously, if I only
> > do it on the subdir, I might miss something.
> Are you ever running `hg tdiff` today? If not, then
> how do you know you are not missing sending out part of your patch for
> review?
> If you are comfortable doing `hg diff` in only the hotspot repository
> today, then I don't see how running `hg diff hotspot` would make you
> less comfortable.
I don't know about hg tdiff.
I must run hg diff on the full repo to make sure there is no
bogus code somewhere. I don't need to diff jdk if I only
make a webrev for hotspot.
But hg diff is only one example of a command that I already find slow.
Also, I don't want to optimize my personal work setup, I just want
to mention possible pain points. Still, hints on how to do something
more efficiently are always welcome :)
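As a rough back-of-the-envelope check of the timings quoted in this thread (jdk: ~90000 files, 14 secs; jaxp: ~12000 files, 2 secs), the sketch below computes the average `hg diff` cost per tracked file. It assumes the cost grows linearly with the number of files, which is a simplification, and the helper name `per_file_ms` is made up for illustration.

```python
# Timings quoted earlier in this thread: (tracked files, seconds for `hg diff`).
measurements = {
    "jdk": (90_000, 14.0),
    "jaxp": (12_000, 2.0),
}


def per_file_ms(files: int, seconds: float) -> float:
    """Average time per tracked file in milliseconds (hypothetical helper)."""
    return seconds * 1000.0 / files


# Assumption: cost is proportional to the number of tracked files.
costs = {name: per_file_ms(files, secs) for name, (files, secs) in measurements.items()}

for name, cost in costs.items():
    print(f"{name}: {cost:.3f} ms per file")
```

Both repositories come out at roughly 0.16 ms per file, which fits the observation that the diff time tracks the size of the tree being walked.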
> > > The fsmonitor extension,
> > > https://www.mercurial-scm.org/wiki/FsMonitorExtension, could help in
> > > this case too.
> > >
> > > > 2.) Cloning the repo does not scale.
> > > > Cloning the root repo and calling get_source.sh takes 20 min.
> > > > I usually only clone the root repo and hotspot. This only
> > > > takes 3 min.
> > > > I don't think merging the repos will improve the 20 mins.
> > > > On the contrary, as cloning the jdk repo takes most of the time,
> > > > and the others run in parallel, cloning an even bigger repo
> > > > will be slower.
> > > > Alternatively, one could hold a 'master' repo and replicate that
> > > > by local copy. But this shows similar timings (1:40 vs. 9min).
> > >
> > > We've discussed this kind of use-case internally as well. The
> > > recommendation is to have a designated local master and then do local
> > > clones of that. On a unix system if the local clones are on the same
> > > disk, hard links are used with a copy-on-write policy so the clones are
> > > space-efficient and time-efficient to create. The local clone times
> > > we've seen are about 30 seconds in that case.
> > I would have to run the watchman on all the machines I happen to
> > work on. A possible solution imposing work on every user.
> Again, working with local clones does scale. Do you use an SSD on the
> machines you are working on? If so, then cloning the consolidated
> repository locally shouldn't take more than approx 30 sec.
We have approx. 100 servers available. I work on a filer visible
on all these. Working on local discs is not an option. First, there
is not enough space for everybody on the local disc (especially not
for full repos). Second, I need to test my changes on different
Remember, I can not use jprt (Volker said that might be changed, soon,
which would be great!).
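For readers unfamiliar with why local clones on the same disk are cheap, here is a minimal sketch of the hard-link idea mentioned above. It is not Mercurial's implementation: the function `hardlink_clone`, the directory layout and the file name `data.i` are hypothetical, and the sketch only shows the sharing of storage. Breaking the link before a write (the copy-on-write part) is left to the tool, and whether hard links are available at all depends on the filesystem.

```python
import os
import tempfile


def hardlink_clone(src_dir: str, dst_dir: str) -> None:
    """Recreate src_dir's tree in dst_dir using hard links instead of copies."""
    for root, _dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        target = os.path.normpath(os.path.join(dst_dir, rel))
        os.makedirs(target, exist_ok=True)
        for name in files:
            # A hard link is a second directory entry for the same file data.
            os.link(os.path.join(root, name), os.path.join(target, name))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical "local master" with a single store file.
    master = os.path.join(tmp, "master")
    os.makedirs(os.path.join(master, "store"))
    with open(os.path.join(master, "store", "data.i"), "w") as f:
        f.write("revision data\n")

    clone = os.path.join(tmp, "clone")
    hardlink_clone(master, clone)

    a = os.stat(os.path.join(master, "store", "data.i"))
    b = os.stat(os.path.join(clone, "store", "data.i"))
    same_inode = (a.st_ino, a.st_dev) == (b.st_ino, b.st_dev)
    link_count = a.st_nlink
    print(same_inode, link_count)
```

Because both paths point at the same file data, the clone costs almost no extra space until one side is modified. Hard links cannot cross filesystems, so this only works when master and clone live on the same one.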
> As for watchman, that is only used to speed up `hg status`, so running
> e.g. `hg diff` or `hg status` in a subdirectory as I explained will
> yield similar benefits.
> > > > 3.) Having to clone the full repos will require considerably more
> > > > disk space.
> > > > I'm working on various issues in hotspot and keep them separated
> > > > by doing this in individual repositories that only contain hotspot.
> > > > These repos will require considerably more space.
> > > If disk space is a concern, you can use mq or bookmarks against a single
> > > repo.
> > I use mq a lot. But often for separate tasks separate repos are required.
> > Say, I'm working on
> > - testing someone else's change against the head revision to review it.
> > - developing the s390 port with a mq that contains 10 patches
> > - looking for a performance regression by syncing to older revisions,
> > building and running benchmarks in a script.
> > You can't combine such tasks with a mq in one repo.
> No, but you can with e.g. bookmarks (or branches) and the CONF feature
> of the build system. I, for example, have one CONF per bookmark. That
> means I can update my code to the feature/bug I'm currently working on,
> make some changes, and get an incremental compilation (instead of
> If you prefer separate repositories, then again, local clones will help
Well, to use local clones I first need local 'masters'. If you switch around
between 7, 8, 9/hs, 9/dev, 9/client etc, that's already more space for
masters than I currently use overall to keep 10 hotspot repos around.
> decrease the overhead. If you are open to discussing/sharing your
> workflow and trying out some features of Mercurial and the build system,
> then I'm confident we can find an effective way for you to work with a
> consolidated forest.
I guess if you actually merge the repos, I need to change my workflow :)
I'll try if the example merged repo is available.
> > > > 4.) There will be additional merges because changes that are now done
> > > > in two repos will then be done in a single repo. If I then sync back
> > > > a few hotspot changes, a lot of files in the other subdirectories
> > > > will get touched. This slows down sync and causes rebuilds.
> > > > Sure this might just be what is intended, but currently I don't
> > > > need to rebuild jdk etc. very often.
> > >
> > > While hotspot and the rest of the JDK can often be treated as
> > > approximately independent, they are not truly independent.
> > Yes, but they _are_ approximately independent. That suffices to
> > avoid lots of boilerplate work.
> > In other SCM systems you can sync back only a subdirectory.
> > Mercurial does not support that.
> If I understand, you seem to be working mostly in the hotspot repository
> (and sometimes in the top-level)? If that is the case, then you are not
> feeling the pain of doing dependent changes between top-level, hotspot
> and the jdk. Many other developers feel this pain clearly, particularly
> developers working with:
> - performance (often require dependent changes in both JDK and hotspot)
> - build (often require dependent changes in all repos)
> - runtime (the tools in the jdk, e.g. jstat, often require
> dependent changes in jdk and hotspot)
> - testing (test often needs to be updated in both jdk, hotspot and
> Furthermore, many of us are interested in all changes going in, because
> there might be performance or functional regressions introduced due to
> changes in another repository. Having a way to perform a bisect
> tremendously helps in those situations.
Well I understand all these issues. You could also argue that the
Bisect is slower. Before, you could easily do the bisect only on hotspot...
until you reach an incompatibility.
It boils down to
- if you do work in several repos at the same time it's better
- if you don't, it's not better.
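To make the bisect trade-off discussed above concrete, here is a minimal sketch of bisection as a binary search over a linear list of revisions. This is not how `hg bisect` is implemented (Mercurial also handles non-linear history); the revision names and the `is_bad` check are hypothetical stand-ins for "update, build and run the benchmark".

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad revision.

    Assumes revisions[0] is good, revisions[-1] is bad, and every
    revision after the first bad one is bad as well.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]


# Hypothetical history of 16 changesets; the regression appears in r11.
history = [f"r{i}" for i in range(16)]
tested = []


def is_bad(rev: str) -> bool:
    tested.append(rev)  # stands in for "build and run the benchmark"
    return int(rev[1:]) >= 11


culprit = first_bad(history, is_bad)
print(culprit, len(tested))
```

The number of revisions tested grows only logarithmically with the length of the history, so a longer combined history adds few extra steps. The cost of each step (sync and rebuild) is what differs between a hotspot-only setup and a consolidated one.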
> Again, if you mostly work in one repository, then you will not have been
> exposed to these problems. However, many developers in the OpenJDK
> project tend to work across many repositories.
> > > > 5.) It will get harder to monitor submitted changes that are relevant
> > > > for a specific area. E.g., I might only want to see changes in hotspot.
> > > > In the web frontend, you can not browse changes on subdirectory
> > > > Maybe this can be solved, as the commandline 'hg log' etc. already
> > > > support this.
> > >
> > > We don't have plans to change the Hg web UI so I think a command line
> > > solution would be appropriate here.
> > You should consider fixing this, maybe as a follow up. You can already
> > browse file history. This should also be possible for directories.
> That is a feature request that needs to be sent to the Mercurial
> project. The web ui of the OpenJDK repositories comes from the `hg
> serve` command, that is not something that Oracle has developed. I know
> that Mozilla has awarded a grant to the Mercurial project recently to
> improve the web UI, so the Mercurial developers are probably interested
> in this kind of feedback.
> > > > 6.) A single repo will simplify making combined changes. So there will be
> > > > more of these. But combined changes complicate handling of our
> > > > licensed code.
> > > > In our activities as licensee, we are consuming hotspot change-wise.
> > > > This is because we modified a lot in hotspot, and merging hotspot
> > > > changes step by step simplifies the merging.
> > > > On the other side, we consume the changes to jdk etc. as chunks.
> > > > This is because we changed much less in these directories so
> > > > that merging causes fewer problems. Also, there are many more
> > > > changes and we don't have the manpower to consume them
> > > > change-wise.
> > > > Having combined changes requires more synchronization between
> > > > the two merging tasks. It's already an increasing effort in
> > > > jdk9.
> > > > Also, to follow these two different merging approaches for hotspot
> > > > and the rest, we would have to first split the single repo into
> > > > two parts.
> > > >
> > > >
> > > > Comments to the JEP:
> > > >
> > > > I appreciate that the change history is kept as it makes research
> > > > in old changes easier. On the other side, dropping the history
> > > > might speed up handling of the new repo.
> > >
> > > We are aware that Facebook has developed Hg plugins to allow shallow
> > > clones, i.e. clones without all the history, but we haven't investigated
> > > using them yet.
> > >
> > > >
> > > > I also appreciate the changes in directory layout. If the
> > > > repos are merged, this should be done this way.
> > > >
> > > > We find it difficult to keep the jtreg runner in sync with our
> > > > current version of jdk9, especially as we have two of them (We
> > > > test OpenJDK and SAP JVM 9, and within SAP JVM 9 hotspot and
> > > > jdk often differ in a few builds.)
> > > > I would appreciate if the runner could be included in the
> > > > root/test directory.
> > >
> > > I'm not quite sure what you are referring to by the jtreg runner.
> > I mean the code in http://hg.openjdk.java.net/code-tools/jtreg
> > As Andrew stated, some subdirectories are pretty stable. It
> > might completely make sense to merge these into one repository, but I'm
> > really concerned about jdk and hotspot.
> > In general, I think those people that are highly specialized on complex
> > subcomponents of the VM will suffer from this. They often are fine
> > just working with hotspot / jdk etc. In general, these people develop
> > new components in the latest branch.
> You mean working on tip? The OpenJDK repositories do not make use of
> branches (besides having the sole default branch).
Sorry, I mean the latest, unreleased Java version. On all the older versions
work involving only one repo (hotspot etc) should be really rare.
> > Those people that have to maintain and test the VM really will profit
> > from the new setup. They always operate with the full
> > repo tree anyway.
> > Having said this, I think it would make more sense to put the legacy code
> > base into merged repos, and not the development branch?
> When you say branch, do you mean "forest" (the wording is important here
> for me to understand since branch is also a concept in Mercurial)? That
> is, do you think of jdk9/dev, jdk9/hs and jdk9/jdk9 as branches?
> I personally always work with the "full tree" since it is crucial to
> develop changes on top of a stable "tree configuration". Even though
> hotspot and the jdk usually are compatible with each other give or take
> a few days, there have been plenty of situations where having the
> repositories in a non-tested configuration can result in rather funky
Well, we are running jdk 4 with hotspot from 8u40, and jdk 8 with
Hotspot from jdk9b116. Fixing the issues that arise from this
is for us much less effort than supporting all the different hotspots
we would have. So it's not that bad (once we get it fixed from you :))
> Again, I'm pretty confident that we can find a way for you (and the rest
> of the SAP contributors) to work effectively with a consolidated OpenJDK
> repository. I just need to learn more about your particular use case to
> come up with a nice solution. And you guys of course have to be willing
> to change your workflow slightly :)
> : https://blog.mozilla.org/blog/2015/12/10/mozilla-open-source-support-
> > Best regards,
> > Goetz.