Looking ahead: proposed Hg forest consolidation for JDK 10
maurizio.cimadamore at oracle.com
Tue Oct 18 11:01:11 UTC 2016
Hi Erik - thanks for the comments. I indeed got the hardlink story
backwards - which means the size of a local clone will be relatively
stable over time - good news!
Regarding share/bookmarks, that is a good suggestion. I tried bookmarks
extensively, and I found them a bit too finicky to use with a single
repo (as with branches, you need to be very aware of which bookmark you
are on, or mistakes are very common). Additionally, using just a
single repo creates problems when you want to store 'project'-specific
metadata (temporary tests, IDE config and the like).
So, using 'share' seems like a major step forward because it allows you
to 'forget' about branches being there (each folder will be on a
different bookmark).
One thing that will remain convoluted (if I'm understanding how
bookmarks work correctly) is fetching new changes from the remote repo.
In that case, I think you need to:
* go into the main repo forest (the one with the 'main' bookmark)
* do a pull/update
* go into the share you were working with
* do an hg pull 'main'
Is that correct? And, more importantly, could other bookmarks/shares be
left as they are (i.e. not updated)? I'd like the various shares to be
as independent as possible.
On a separate note, langtools is one of those cases (there are probably
others, like Nashorn) where the repo is currently fairly isolated from
the rest of the JDK, meaning that you can just fetch a langtools repo,
build it and test it in isolation. A similar case could arise if a JDK
developer wanted to work, say, only on a reduced set of JDK modules.
While cloning all files is perfectly acceptable disk-size-wise, I find
the lack of granularity a tad annoying in the general case (and I've
built some tools to overcome that problem).
On 17/10/16 12:47, Erik Helin wrote:
> Hi Maurizio,
> thanks for your feedback! Please see my replies inline.
> On 2016-10-13, Maurizio Cimadamore wrote:
>> Hi Joe,
>> some comments on this. As my workflow typically involves cloning one
>> langtools repo per new fix, I'll start by discussing local clones.
>> Starting with some concrete numbers, I am currently working on 2
>> forests (jdk 9 and valhalla); between these two forests I currently have ~35
>> langtools clones (for various prototypes and bug fixes). Also, as I'm
>> working on two machines, I keep them in sync using Unison, a very common
>> sync tool in Linux land based on rsync.
>> I have been experimenting with local clones, to see how much space a local
>> clone could save. My findings are that a local clone takes
>> around 800M, which seems consistent with the fact that Mercurial hardlinks
>> the repo files but not the history, which is simply copied.
> You might have gotten this the wrong way around. Mercurial will use
> hard links for most of the metadata for local clones on a
> file system that supports hard links. The source code files themselves
> won't be hard links (otherwise, if you edited one file in one local
> clone then the file sharing the same inode in another local clone would
> get changed).
>> For people like me, working on langtools, that's quite a significant jump in
>> terms of space - a clean langtools repo is around 150M. So, in my specific
>> case, disk usage will jump from 150M * 35 =~ 5G to 800M * 35 =~ 28G (this
>> is a very conservative estimate - since it's assuming that all files are
>> hardlinked, which will not be the case as soon as I start making some
>> changes in the local clones). While this is not a deal breaker in terms of
>> disk space (my SSD has 200G in total), it puts a serious strain on my
>> ability to do regular syncing/backups.
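The arithmetic behind these estimates, as a quick shell check using only the figures quoted above (sizes in MB; the rounded totals treat 1 GB as 1000 MB).

```shell
# Figures from the paragraph above: 35 clones, ~150 MB per langtools
# clone today versus ~800 MB per local clone of the consolidated forest.
clones=35
per_clone_today_mb=150
per_clone_consolidated_mb=800

today_mb=$((clones * per_clone_today_mb))                # 5250 MB
consolidated_mb=$((clones * per_clone_consolidated_mb))  # 28000 MB
echo "today: ${today_mb} MB, consolidated: ${consolidated_mb} MB"
```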
> Thanks for sharing your workflow! For this use case, could you perhaps
> try out the `hg share` extension? You need to enable the extension in
> your .hgrc. A share is like a clone, but Mercurial will share the store
> folder between all shares. This is *not* done using hardlinks: if you
> look in the .hg folder of a share, you will not see the "store" folder
> (you will see a file named sharedpath instead).
> Using shares on their own can be a bit tricky, but if you combine them
> with bookmarks, then you get a very powerful solution. In your case, I
> would suggest the following:
> $ hg clone http://hg.openjdk.java.net/jdk9/consol-proto
> $ cd consol-proto
> $ hg bookmark '@' # traditional name for "master" bookmark
> $ cd ..
> $ hg share -B consol-proto bugfix-1
> $ cd bugfix-1
> $ hg bookmark 'bugfix-1'
> You will now end up with two directories, consol-proto and bugfix-1,
> both looking like a full forest, but they will share the same Mercurial
> store *and* list of bookmarks (but the active bookmark won't be shared).
> Since the shares use different bookmarks, the work you do in a share
> won't interfere with the work you do in another share (you will get
> multiple heads, but each head will have a bookmark associated with it).
> For backing up, you now only need to back up the consol-proto
> repository (it contains all the bookmarks and all commits). There is no
> need to back up the shares, they can always be created from the
> consol-proto repository.
> On my machine, using Linux 4.3.3 and ext4 as my filesystem, with hg
> version 3.8.1, a share uses 661 MB of disk. If you now want 35 shares,
> you would end up using 35 * 661 MB = 23,135 MB, or about 22.6 GB. But you only have to back up
> one repository!
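The same total as a quick shell check, using only the figures above; dividing by 1024 MB per GB is what reproduces the 22.6 figure.

```shell
# Figures from the paragraph above: 35 shares at 661 MB each.
shares=35
share_mb=661
total_mb=$((shares * share_mb))   # 23135 MB
# Convert with 1024 MB per GB; LC_ALL=C keeps "." as the decimal point.
total_gb=$(LC_ALL=C awk -v mb="$total_mb" 'BEGIN { printf "%.1f", mb / 1024 }')
echo "${total_mb} MB, about ${total_gb} GB"
```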
>> Add to this the fact that most backup/syncing tools explicitly call out the
>> hardlink case as being problematic. Unison doesn't support them, rsync
>> supports them to some degree, and even some professional backup tools I'm
>> using do not support them (or recommend doing without them anyway). So, local
>> cloning could be a fine solution when working on one machine, but as soon as
>> you start considering backups, you have trouble. For these reasons I will
>> have to consider changing my day-to-day workflow, and try to avoid
>> relying on clones as much as I did - which poses issues: for instance, if I
>> keep all my patches in the same repo (by using either MQ or bookmarks), how
>> do I differentiate between the different IDE projects to work on them?
> If you are using shares as suggested above, you would have one folder
> with all the source code for each bookmark.
>> Last but not least - if the local clone size I'm seeing now (800M) is
>> almost entirely history-driven, and that already accounts for 50% of the
>> total size - doesn't that mean that, e.g. in 2-3 years' time, the size of the
>> history will trump the size of the files, meaning that the advantages of
>> doing local clones will shrink over time?
> No, it is the other way around. For a local clone, you share all of the
> history (using hard links). So the size of your local clones scales with the
> size of the source files (the same is true for shares). This can easily
> be verified by running `du -ms .hg` in a share; I get 6 MB.
>> On a separate and more <meta> note, it seems to me that this effort is two
>> things at once:
>> * a repo consolidation: use a single repo instead of a forest
>> * a source restructuring
>> Each of the above moves has risks and costs for people in the OpenJDK land.
>> For instance, as discussed above, the repo consolidation might
>> significantly change the workflow people use on a daily basis.
>> At the same time, the source restructuring poses issues for things like
>> builds, IDE support, and the like.
>> I wonder if it wouldn't be sensible to do the repo consolidation now, where
>> the new repo is simply a consolidated version of the current forest, with no
>> need to update build scripts to take new paths into account. Then, maybe in the next
>> release (JDK 11), we could attack the source restructuring problem. This way
>> people will have more time to adjust to the big changes that are coming.
>> What do you think?
>> On 11/10/16 03:11, joe darcy wrote:
>>> Looking ahead to JDK 10, a group of JDK engineers have been exploring
>>> consolidating the large number of Hg repositories in an OpenJDK forest into
>>> a single one, with the goal of using the consolidated arrangement for JDK 10.
>>> This message is being sent to jdk9-dev since a jdk10-dev alias to discuss
>>> JDK 10 doesn't exist yet.
>>> A JEP describing the project has been submitted:
>>> JDK-8167368: Consolidate JDK 10 OpenJDK repositories to a single
>>> The text of the JEP describes the motivation and current state of the work
>>> in more detail, including proposed changes to the file layout. Publication
>>> of the prototype consolidated repository is planned, but not done yet. The
>>> email below has a list of additional anticipated questions and answers.
>>> We feel this consolidated arrangement offers some significant structural
>>> advantages for managing the JDK's source code and we are now asking for
>>> feedback on this potential change. In particular, if you feel there is a
>>> show-stopper problem with making this change, please let us know!
>>> I'd like to acknowledge the work of Stefan Sarne, Stuart Marks, and
>>> Ingemar Aberg participating in discussions leading up to the prototype and
>>> I'd like to especially recognize the contributions of Erik Helin for savvy
>>> Hg manipulations and Erik Joelsson for skillful build wrangling in this
>>> Please send initial comments by October 18, 2016.
>>> Q: What about the set of forests for JDK 10? Are we going to have master,
>>> dev, client, hotspot, etc. the same set as in 9?
>>> A: That is a separate question from the repository consolidation, but
>>> there will likely be simplifications here too. Discussions on that point
>>> will come later.
>>> Q: I usually just build the code in repo X today. Will I have to
>>> build the *whole JDK* now?
>>> A: Not necessarily. The same top-level build targets should work in the
>>> consolidated forest.
>>> Q: Does disk usage change?
>>> A: The total disk usage of the current forest compared to the consolidated
>>> forest is nearly the same.
>>> Q: In more detail, how were the changesets imported?
>>> A: The scripts used for the consolidation conversion are attached to the
>>> Q: What happens to the Hg hashes?
>>> A: The conversion scheme used in the prototype does *not* preserve Hg
>>> hashes of changesets compared to the current forests. However, the bug ids
>>> are preserved and can be searched for. In addition, one or more
>>> pre-consolidation forests should be archived in perpetuity so that URLs in
>>> bug comments continue to work, etc.
>>> A mapping of the old hashes to the corresponding new hashes might be
>>> generated and placed in the final new repo.
>>> Q: I'm allergic to tabs; what about jcheck?
>>> A: If history is preserved, the checking done by jcheck needs to be
>>> modified for the consolidated forest. One way to do this is to augment the
>>> white lists used in jcheck with the conflicting changesets. This approach
>>> may not be elegant, but it is effective and doesn't appear to appreciably
>>> impact jcheck running times.
>>> Q: Will the future 9 update forest also have this consolidation
>>> A: The script used to do the consolidation conversion is deterministic and
>>> could be run to create the 9 update forest in the future at the
>>> discretion of the 9 update team.
>>> Q: For backports or forwardports, will there be a script to translate
>>> patch files across the consolidation boundary?
>>> A: That work is planned, but not yet done; see JDK-8165623: Create patch
>>> translator to update paths pre/post consolidation.
>>> Q: It's the 21st century and I develop using an IDE. That is still going
>>> to work, right?
>>> A: The prototype to date does not include updating the various IDE support
>>> files, but bug JDK-8167142 has been filed to track that work.