Looking ahead: proposed Hg forest consolidation for JDK 10
maurizio.cimadamore at oracle.com
Thu Oct 13 13:21:55 UTC 2016
some comments on this. As my workflow typically involves cloning one
langtools repo for each new fix, I'll start with discussing local clones
first. Starting with some concrete numbers, I am currently working on 2
forests (jdk 9 and valhalla); between these two forests I currently have
~35 langtools clones (for various prototypes and bug fixes). Also, as
I'm working on two machines, I keep them in sync using Unison, a very
common sync tool in Linux land based on rsync.
I have been experimenting with local clones, to see how much space a
local clone could save. My findings are that a local
clone takes around 800M - which seems consistent with the fact that
Mercurial hardlinks the repo files but not the history, which is simply
For people like me, working on langtools, that's quite a significant
jump in terms of space - a clean langtools repo is around 150M. So, in
my specific case, disk usage will jump from 150M * 35 =~ 5G to 800M *
35 =~ 28G (this is a very conservative estimate - since it's assuming
that all files are hardlinked, which will not be the case as soon as I
start making some changes in the local clones). While this is not a deal
breaker in terms of disk space (my SSD has 200G in total), it puts a
serious strain on my ability to do regular syncing/backups.
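
As a quick check of the arithmetic above, here is a minimal Python sketch. The function name is purely illustrative, and decimal units (1G = 1000M) are assumed; the per-clone sizes and the clone count are the figures quoted above.

```python
def total_gb(per_clone_mb, clones):
    """Total disk usage in G for `clones` clones of `per_clone_mb` M each."""
    return per_clone_mb * clones / 1000  # assumes 1G = 1000M


clones = 35                          # langtools clones across both forests
standalone = total_gb(150, clones)   # one clean langtools repo per clone
consolidated = total_gb(800, clones) # one local clone of the single repo each

print(f"langtools clones:  ~{standalone:.2f}G")
print(f"consolidated repo: ~{consolidated:.2f}G")
print(f"growth factor:     ~{consolidated / standalone:.1f}x")
```

With these inputs the two totals come out at 5.25G and 28G, a bit more than a fivefold increase.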
Add to this the fact that most backup/syncing tools explicitly call out
the hardlink case as being problematic. Unison doesn't support them,
rsync supports them to some degree, and even some professional backup
tools I'm using do not support them (or recommend doing without them
anyway). So, local cloning could be a fine solution when working on one
machine, but as soon as you start considering backups, you run into
trouble. For these reasons I will have to consider changing my day-to-day
workflow, and try to avoid relying on clones as much as I did -
which poses issues: for instance, if I keep all my patches in the same
repo (by using either MQ or bookmarks) - how do I differentiate between
the different IDE projects to work on them?
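
To see how much of a given clone is actually shared through hardlinks (and so how much a hardlink-unaware sync or backup tool might end up duplicating), something along these lines could be used. This is only a sketch: the function name is made up for illustration, it assumes a POSIX-style filesystem where link counts and inode numbers are meaningful, and it counts apparent file sizes rather than allocated blocks.

```python
import os


def disk_usage(root):
    """Return (unique_bytes, hardlinked_bytes) for the tree under `root`.

    unique_bytes counts each inode once, even if several names point to it.
    hardlinked_bytes is the part of that total held by files whose link
    count is greater than one, i.e. files shared with some other path.
    """
    seen = set()
    unique = 0
    linked = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another name for an inode already counted
            seen.add(key)
            unique += st.st_size
            if st.st_nlink > 1:
                linked += st.st_size
    return unique, linked
```

Running this over a single clone shows how many of its bytes have a link count above one; running it over the directory that holds both the original and the clone counts each shared file only once.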
Last but not least - if the local clone size I'm seeing now (800M)
is almost entirely history-driven, and that already accounts for 50% of
the total size - doesn't that mean that, say, in 2-3 years' time, the size
of the history will trump the size of the files, meaning that the
advantages of doing local clones will be smaller and smaller over time?
On a separate and more <meta> note, it seems to me that this effort is
two things at once:
* a repo consolidation: use a single repo instead of a forest
* a source restructuring
Each of the above moves has risks and costs for people in OpenJDK
land. For instance, as discussed above, the repo consolidation might
significantly change the workflow people use on a daily basis. At the
same time, the source restructuring poses issues for things like
builds, IDE support, and the like.
I wonder if it wouldn't be sensible to do the repo consolidation now,
where the new repo is simply a consolidated version of the old forest; no
need to update build scripts to take into account new paths. Then, maybe
in the next release (JDK 11), we could attack the source restructuring
problem. This way people will have more time to adjust to the big
changes that are coming.
What do you think?
On 11/10/16 03:11, joe darcy wrote:
> Looking ahead to JDK 10, a group of JDK engineers have been exploring
> consolidating the large number of Hg repositories in an open JDK
> forest to a single one with the goal of using the consolidated
> arrangement for JDK 10.
> This message is being sent to jdk9-dev since a jdk10-dev alias to
> discuss JDK 10 doesn't exist yet.
> A JEP describing the project has been submitted:
> JDK-8167368: Consolidate JDK 10 OpenJDK repositories to a single
> The text of the JEP describes the motivation and current state of the
> work in more detail, including proposed changes to the file layout.
> Publication of the prototype consolidated repository is planned, but
> not done yet. The email below has a list of additional anticipated
> questions and answers.
> We feel this consolidated arrangement offers some significant
> structural advantages for managing the JDK's source code and we are
> now asking for feedback on this potential change. In particular, if
> you feel there is a show-stopper problem with making this change,
> please let us know!
> I'd like to acknowledge the work of Stefan Sarne, Stuart Marks, and
> Ingemar Aberg participating in discussions leading up to the prototype
> and I'd like to especially recognize the contributions of Erik Helin
> for savvy Hg manipulations and Erik Joelsson for skillful build
> wrangling in this project.
> Please send initial comments by October 18, 2016.
> Q: What about the set of forests for JDK 10? Are we going to have
> master, dev, client, hotspot, etc. the same set as in 9?
> A: That is a separate question from the repository consolidation, but
> there will likely be simplifications here too. Discussions on that
> point will come later.
> Q: I usually just build the code in repo X today. Will I have to
> build the *whole JDK* now?
> A: Not necessarily. The same top-level build targets should work in
> the consolidated forest.
> Q: Does disk usage change?
> A: The total disk usage of the current forest compared to the
> consolidated forest is nearly the same.
> Q: In more detail, how were the changesets imported?
> A: The scripts used for the consolidation conversion are attached to
> the JEP.
> Q: What happens to the Hg hashes?
> A: The conversion scheme used in the prototype does *not* preserve Hg
> hashes of changesets compared to the current forests. However, the bug
> ids are preserved and can be searched for. In addition, one or more
> pre-consolidation forests should be archived in perpetuity so that
> URLs in bug comments continue to work, etc.
> A mapping of the old hashes to the corresponding new hashes might be
> generated and placed in the final new repo.
> Q: I'm allergic to tabs; what about jcheck?
> A: If history is preserved, the checking done by jcheck needs to be
> modified for the consolidated forest. One way to do this is to augment
> the white lists used in jcheck with the conflicting changesets. This
> approach may not be elegant, but it is effective and doesn't appear to
> appreciably impact jcheck running times.
> Q: Will the future 9 update forest also have this consolidation?
> A: The script used to do the consolidation conversion is deterministic
> and could be run to create the 9 update forest in the future at the
> discretion of the 9 update team.
> Q: For backports or forwardports, will there be a script to translate
> patch files across the consolidation boundary?
> A: That work is planned, but not yet done; see JDK-8165623: Create
> patch translator to update paths pre/post consolidation.
> Q: It's the 21st century and I develop using an IDE. That is still
> going to work, right?
> A: The prototype to date does not include updating the various IDE support
> files, but bug JDK-8167142 has been filed to track that work.
More information about the jdk9-dev