Call for Discussion: New Project: Skara -- investigating source code management options for the JDK sources
erik.osterlund at oracle.com
Mon Jul 30 12:41:02 UTC 2018
I don't see the fascination with reinventing the source code hosting wheel again for our project. Perhaps there is a good point to it, but I can't currently seem to see it. It seems like some are saying that with custom Mercurial hacks we can achieve smaller repos competitive with Git for cloning. But to fully utilize that and actually get fast cloning times for everyone working on the project, we would need mirrors of the hack in multiple countries, with backup, synchronization and other hosting stuff (security patching and whatnot), and it still comes with space-time tradeoffs. Even then there are things like object pooling across repos, allowing people to have their own forks without unnecessary disk overheads, that perhaps we could solve by digging into Mercurial and writing our own extensions and custom hosting solution. I'm sure if we tried hard and put lots of resources into reinventing solutions for these source code hosting problems, it would be almost as good a!
But is it worth our focus and effort to reinvent source code hosting again because OpenJDK is so special, instead of just putting it on GitHub like everybody else and having small repos natively with Git (without hacks), good tooling, fast access with mirrors everywhere, backups, cross-repo object sharing, programmable bots (that can be used to e.g. check automatically if it builds on Oracle-external platforms like PPC/S390/AArch64/Zero, so we can notice problems before they are pushed), etc., for free? I would personally rather ride on the source code hosting experience and expertise of GitHub than chase after homegrown solutions to patch the problems.
On Jul 30 2018, at 1:53 pm, Aleksey Shipilev <shade at redhat.com> wrote:
> On 07/30/2018 01:13 PM, Weijun Wang wrote:
> > Joe said on Jul 28:
> > > In Mercurial, when a file is moved, its history is restarted, meaning a full copy of the file is stored. Therefore, lots of file moves will tend to make a Mercurial repo get disproportionally larger. In the JDK, many files were moved in JDK 9 for modularity and large numbers of files were moved again in JDK 10 for the repo consolidation.
> > > The Mercurial representation of JDK 8 GA takes about 412 MB, JDK 9 GA ~808 MB, and JDK 10 GA ~1553 MB.
> > So this is related to Mercurial's design, in which a rename is equivalent to a remove plus a create.
> > Maybe we can fix Mercurial to make this a real "move", and I doubt if there is a space-time tradeoff here.
> What I meant to say is that space-time tradeoff between on-the-wire format (bundles) and on-the-disk
> format (.hg folder) is there, and you can choose either, depending on the context. Publishing blobs
> in on-the-wire format has better compatibility, while tarballs in on-the-disk format are ultimately
> faster to "clone".
> Two mega-moves (Jigsaw in 9, and monorepo in 10) inflated the on-the-disk size quite badly, as Joe
> indicated above, but on-the-wire format size seems to remain okay. So, if we enabled CDN-backed
> bundles-assisted clone, it should probably cut down clone pains, at least for our Europe-side folks,
> at the expense of some client CPU churn associated with converting on-the-wire to on-the-disk during
> the clone.
> Some optimization for on-the-disk size is possible if you re-clone the repo with
> "--config=format.generaldelta=1 --config=format.aggressivemergedeltas=1", thus optimizing internal
> .hg metadata. That would take a lot of time. If you have some time to spare, then it makes sense to
> do so. My build scripts do that automatically before packaging the .hg snapshots.
> Also, it seems that doing the "clone --pull" twice with generaldelta enabled compacts metadata even
> more: jdk/jdk .hg size fell from 1.5 GB to 1.2 GB uncompressed, and from 750M to 590M
> xz9-compressed. I just fixed my build scripts and am currently testing them.
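
For reference, the re-clone recipe and the size figures quoted above can be sketched in Python. This is only an illustration under stated assumptions: the helper names (reclone_command, percent_saved, growth_factor) and the example paths "jdk-old" and "jdk-compact" are hypothetical, the command is built but never executed, and the two format options are written in the two-argument "--config NAME=VALUE" form rather than the "--config=NAME=VALUE" form used in the message.

```python
# Sketch only: helper names and example paths are hypothetical, not from the thread.

def reclone_command(source, dest):
    """Build (but do not run) an hg command line for a pull-based re-clone
    using the two format options named in the quoted message."""
    return [
        "hg", "clone", "--pull",
        "--config", "format.generaldelta=1",
        "--config", "format.aggressivemergedeltas=1",
        source, dest,
    ]


def percent_saved(before, after):
    """Relative size reduction, in percent of the original size."""
    return 100.0 * (before - after) / before


def growth_factor(older, newer):
    """How many times larger the newer repository is than the older one."""
    return newer / older


if __name__ == "__main__":
    # "jdk-old" and "jdk-compact" are placeholder local paths.
    print(" ".join(reclone_command("jdk-old", "jdk-compact")))

    # Figures reported for the double "clone --pull" of jdk/jdk.
    print(f".hg uncompressed (GB): {percent_saved(1.5, 1.2):.1f}% smaller")
    print(f".hg xz-compressed (MB): {percent_saved(750, 590):.1f}% smaller")

    # Figures reported for the Mercurial representation of each GA release (MB).
    print(f"JDK 8 -> JDK 9:  x{growth_factor(412, 808):.2f}")
    print(f"JDK 9 -> JDK 10: x{growth_factor(808, 1553):.2f}")
```

On these numbers the double re-clone saves roughly a fifth of the .hg size in both forms, while each of the two mega-moves roughly doubled the repository.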