A List implementation backed by multiple small arrays rather than the traditional single large array.
opinali at gmail.com
Mon Mar 29 08:08:36 PDT 2010
Initially, it would be good enough to replace only java.util.ArrayList with
minimal overhead. ArrayList does not support efficient add-at-front or other
enhancements of ArrayDeque; but ArrayList is still a much more important and
popular collection, it's the primary "straight replacement for primitive
arrrays" and I guess it should continue with that role.
One problem of both ArrayList and primitive arrays is that they're not
GC-friendly; huge arrays suck for GC. IBM's realtime Metronome collector
uses the "arraylet" structure for primitive arrays, so there is a hard
upper-limit on object size (well, at least as long as apps don't define
classes with thousands of fields, I guess). This avoids the whole issue of
"large objects" which permits a simpler heap layout, better incremental GC,
etc. There are two tradeoffs. First, some overhead for all array operations
- but this is the least important, remarkably as the arraylet trick is
implement at the VM level so we can rely on the JIT to perform extra
optimizations (e.g., unrolling and other loop optimizations; bounds-check
elimination and other array opts, may be arraylet-aware so most overhead is
cancelled or at least lifted out of loop bodies and hot paths.) Second, no
support at all for huge arrays is incompatible with native code that expects
a continuous layout, e.g. for the bytes inside Images - so all these uses
must be identified and fixed somehow, e.g using DirectBuffers, or changing
the native layer to understand arraylets (image libaries may be OK with
banding), or in the worst case just copy the data to/from a continuous,
native array (in most cases I think this copy already happens for other
reasons, so there's no extra copy, just a slightly more expensive copy).
Now we're talking about some big VM change of course, but HotSpot would not
be the first production VM to do this so maybe it's a viable project for the
future, remarkably as Sun plans to keep raising the bar in
incremental/realtime GC (G1 may already be a great step forward, but huge
arrays will always spoil the fun for many apps).
In summary I think the ChunkedArrayList would serve only as a stopgap
solution, with extremely limited benefits unless it's sufficiently good so
like Martin says, we can replace more List implementations. And I'll even
add, replace many other collections too - e.g. a giant HashMap will contain
a giant Entry array inside it, I want this array to be chunked too
(ConcurrentHashMap already is, but it's tuned up differently, for concurrent
usage - and that's just one example anyway). And by "replace" I further mean
"change the implementation of all existing collections that are
array-backed", not "offer new collections" as the latter will only be
heavily used ten years from today when JavaSE7 is considered the minimum
JavaSE release to be supported by apps/libraries/frameworks/containers/etc.
Even then, the benefits will be clearly inferior to what can be achireved by
2010/3/29 Martin Buchholz <martinrb at google.com>
> On Sun, Mar 28, 2010 at 04:55, Kevin L. Stern <kevin.l.stern at gmail.com>
> > I put together the following class, ChunkedArrayList, in response to
> > Martin's request (excerpted from an earlier conversation on this web
> > below.
> > Thoughts?
> This class is well on the way to what I was thinking of,
> but my bar for acceptance is a little higher.
> In particular, I don't want to add yet another class
> that is can replace some, but not all of existing
> list implementations.
> Most obviously, I don't want to lose the ability,
> introduced in ArrayDeque, of having O(1) insertion
> at the front and end of the collection.
> Perhaps you can do this by having one "arraylet"
> always be shared by both ends, which
> grow towards each other in circular fashion.
> I also think we should shrink the array when
> necessary, so that occupancy never drops
> below, say 50%.
> Perhaps we should also have amortized O(1)
> insertion in the middle by using a "gap array".
> Probably more important for byte/char collections
> like StringBuilder...
> I believe there are more complicated implementations
> that permit O(1) insertions at the ends, and only
> O(sqrt(N)) space overhead.
> E.g. Use your favorite search engine to do
> some research on:
> Resizable arrays in optimal time and space
> Succinct dynamic data structures
> Meta-comment: there is not enough transfer of
> academic research results into practice; I would think this
> is one of the responsibilities of the researchers.
> I presume you'd be willing to sign a
> contributor agreement to get your changes into
> the JDK someday.
> > Regards,
> > Kevin
> > On Tue, Mar 9, 2010 at 3:15 PM, Martin Buchholz <martinrb at google.com>
> > wrote:
> > It surely is not a good idea to use a single backing array
> > for huge arrays. As you point out, it's up to 32GB
> > for just one object. But the core JDK
> > doesn't offer a suitable alternative for users who need very
> > large collections.
> > It would have been more in the spirit of Java to have a
> > collection class instead of ArrayList that was not fastest at
> > any particular operation, but had excellent asymptotic behaviour,
> > based on backing arrays containing backing arrays.
> > But:
> > - no such excellent class has been written yet
> > (or please point me to such a class)
> > - even if it were, such a best-of-breed-general-purpose
> > List implementation would probably need to be introduced as a
> > separate class, because of the performance expectations of
> > existing implementations.
> > In the meantime, we have to maintain what we got,
> > and that includes living with arrays and classes that wrap them.
> > Changing the spec is unlikely to succeed..
> > Martin
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the core-libs-dev