Call for Discussion: New Project: Skara -- investigating source code management options for the JDK sources
joe.darcy at oracle.com
Fri Jul 27 20:10:20 UTC 2018
On 7/27/2018 2:26 AM, Mario Torre wrote:
> Hi Martijn,
> How many contributions from developers in those git mirrors came into
> OpenJDK (or even, how many contributions happened on those mirrors
> outside of OpenJDK development?).
> I think the point about performance is sound, but I would be very
> careful to introduce a new SCM, lots of developers are used to
> mercurial now, and even if git is probably just a small learning step
> away, I would argue that this is unnecessary to the people who are
> already contributing.
> It really is terrible now with a single repo, but is it a problem
> of mercurial really? Git also carries all the history in the clone,
> did somebody do some testing on this, and I mean, on the same servers
> and network?
In Mercurial, when a file is moved, its history is restarted, meaning a
full copy of the file is stored. Therefore, lots of file moves will tend
to make a Mercurial repo disproportionately large. In the JDK, many
files were moved in JDK 9 for modularity, and large numbers of files were
moved again in JDK 10 for the repo consolidation.
The Mercurial representation of JDK 8 GA takes about 412 MB, JDK 9 GA
~808 MB, and JDK 10 GA ~1553 MB. Given the number of changesets in JDK
10, extrapolating from the good linear fit between number of changesets
and size in the JDK 7 and 8 update releases, one would expect JDK 10 in
hg to take in the neighborhood of 450 MB - 500 MB. Therefore, the file
moves are certainly bulking up the repo size, contributing to the
increased download times.
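
As a rough check on the figures above, the gap between the observed and
expected JDK 10 size can be worked out directly. This is only a sketch
using the numbers quoted in this message; the linear fit itself is not
reproduced, and the variable names are made up for illustration.

```python
# Sizes quoted in this message, in MB.
hg_size_mb = {"JDK 8 GA": 412, "JDK 9 GA": 808, "JDK 10 GA": 1553}

# Range expected for JDK 10 from the linear fit on the JDK 7 and 8
# update releases (taken from the message, not recomputed here).
expected_low_mb, expected_high_mb = 450, 500

observed_mb = hg_size_mb["JDK 10 GA"]

# How many times larger the observed repo is than the expected range.
bloat_vs_high = observed_mb / expected_high_mb
bloat_vs_low = observed_mb / expected_low_mb

# Size not accounted for by the changeset count alone.
excess_min_mb = observed_mb - expected_high_mb
excess_max_mb = observed_mb - expected_low_mb

print(f"observed / expected: {bloat_vs_high:.1f}x to {bloat_vs_low:.1f}x")
print(f"size beyond the fit: {excess_min_mb} to {excess_max_mb} MB")
```

In other words, the JDK 10 hg repo is a bit more than three times the
size the changeset count alone would suggest.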
While a simple import of the JDK sources into git can lead to a larger
representation, repacking the git repo results in a much, much smaller
one. Basically, a repack asks git to use forward and backward
differencing with a large window to look for a more compact
representation; this removes the excess size introduced by the file
moves. In particular, by running
    git repack -a -d --depth=250 --window=250 -f
on some git imports of the JDK we've done internally, we ended up with a
git repo size of recent JDK sources of around 300 MB, roughly 5X
smaller. That 300 MB includes all the JDK changeset history, tags, etc.
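
For anyone who wants to try this on their own import, the effect of the
repack can be measured by comparing the size of the .git directory
before and after. The following is a minimal sketch, not the procedure
we used internally; it assumes a non-bare clone with a .git directory
and git on the PATH, and the helper names are hypothetical.

```python
import os
import subprocess


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def repack_command(repo):
    """Build the repack invocation quoted in this message for a repo path."""
    return ["git", "-C", repo, "repack", "-a", "-d",
            "--depth=250", "--window=250", "-f"]


def repack_and_measure(repo):
    """Repack the repo and return (.git bytes before, .git bytes after)."""
    git_dir = os.path.join(repo, ".git")  # assumption: non-bare clone
    before = dir_size_bytes(git_dir)
    subprocess.run(repack_command(repo), check=True)
    after = dir_size_bytes(git_dir)
    return before, after
```

Calling repack_and_measure("/path/to/jdk-import") on a real import would
return the two sizes in bytes; expect the repack itself to take a while
with a depth and window this large.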
In some experiments with hosting providers, cloning such a repacked git
repo can be completed within 1 to 3 minutes, which is considerably
faster than the clone times we see now from hg.openjdk.java.net.