A List implementation backed by multiple small arrays rather than the traditional single large array.

Ben Manes ben_manes at yahoo.com
Tue Mar 30 22:45:34 UTC 2010

You might be able to take some ideas from the VList data structure and zipper model. The research on persistent data structures tend to have some fairly interesting ideas and some approaches might work well here.

From: Kevin L. Stern <kevin.l.stern at gmail.com>
To: Martin Buchholz <martinrb at google.com>
Cc: core-libs-dev at openjdk.java.net
Sent: Tue, March 30, 2010 4:25:41 AM
Subject: Re: A List implementation backed by multiple small arrays rather than the traditional single large array.

Hi Martin,

Thanks much for your feedback.  The first approach 
that comes to mind to implement O(1) time front as well as rear 
insertion is to create a cyclic list structure with a 
front/rear pointer - to insert at the front requires decrementing the 
front pointer (modulo the size) and to insert at the rear requires 
incrementing the rear pointer (modulo the size).  We need to resize when the two pointers bump into each other.  Could you explain more about 
your suggestion of introducing an arraylet that is shared by the front 
and the rear?  It's not clear to me how that would help and/or be a better approach than the cyclic list.  Anyhow, the 
paper that you reference, "Resizable arrays in optimal time and space", 
gives a deque so if we take that approach then the deque is specified.

Shrinking the array is not a problem - this comes 'for free' (in the sense that it's required) in the optimal space data structure that you reference.

Regarding the gap array suggestion, it is not clear to me how we will still 
compute the correct arraylet/offset for an index in O(1) time if we have arraylets of arbitrary size.  Even worse, if we go with the optimal 
space data structure we will not have the option of creating arraylets 
of arbitrary size or with arbitrary gaps between elements.

You are absolutely right about the n^(1/2) space overhead; I was not aware of this research.  I'll go ahead and implement the structure 
defined in "Resizable arrays in optimal time and space" (once I find some time to do so).



On Mon, Mar 29, 2010 at 2:23 AM, Martin Buchholz <martinrb at google.com> wrote:

On Sun, Mar 28, 2010 at 04:55, Kevin L. Stern <kevin.l.stern at gmail.com> wrote:
>>> I put together the following class, ChunkedArrayList, in response to
>>> Martin's request (excerpted from an earlier conversation on this web board)
>>> below.
>>> https://docs.google.com/leaf?id=0B6brz3MPBDdhMGNiNGIwMTQtMTgxMi00ODlmLTk4ZGYtOWY2NDE0M2E5M2Zl&sort=name&layout=list&num=50
>>> Thoughts?
>This class is well on the way to what I was thinking of,
>>but my bar for acceptance is a little higher.
>>In particular, I don't want to add yet another class
>>that is can replace some, but not all of existing
>>list implementations.
>>Most obviously, I don't want to lose the ability,
>>introduced in ArrayDeque, of having O(1) insertion
>>at the front and end of the collection.
>>Perhaps you can do this by having one "arraylet"
>>always be shared by both ends, which
>>grow towards each other in circular fashion.
>>I also think we should shrink the array when
>>necessary, so that occupancy never drops
>>below, say 50%.
>>Perhaps we should also have amortized O(1)
>>insertion in the middle by using a "gap array".
>>Probably more important for byte/char collections
>>like StringBuilder...
>>I believe there are more complicated implementations
>>that permit O(1) insertions at the ends, and only
>>O(sqrt(N)) space overhead.
>>E.g. Use your favorite search engine to do
>>some research on:
>>Resizable arrays in optimal time and space
>>Succinct dynamic data structures
>>Meta-comment: there is not enough transfer of
>>academic research results into practice; I would think this
>>is one of the responsibilities of the researchers.
>>I presume you'd be willing to sign a
>>contributor agreement to get your changes into
>>the JDK someday.
>>> Regards,
>>> Kevin
>>> On Tue, Mar 9, 2010 at 3:15 PM, Martin Buchholz <martinrb at google.com>
>>> wrote:
>>>     It surely is not a good idea to use a single backing array
>>>     for huge arrays.  As you point out, it's up to 32GB
>>>     for just one object.  But the core JDK
>>>     doesn't offer a suitable alternative for users who need very
>>>     large collections.
>>>     It would have been more in the spirit of Java to have a
>>>     collection class instead of ArrayList that was not fastest at
>>>     any particular operation, but had excellent asymptotic behaviour,
>>>     based on backing arrays containing backing arrays.
>>>     But:
>>>     - no such excellent class has been written yet
>>>      (or please point me to such a class)
>>>     - even if it were, such a best-of-breed-general-purpose
>>>      List implementation would probably need to be introduced as a
>>>      separate class, because of the performance expectations of
>>>      existing implementations.
>>>     In the meantime, we have to maintain what we got,
>>>     and that includes living with arrays and classes that wrap them.
>>>     Changing the spec is unlikely to succeed..
>>>     Martin

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20100330/69719c34/attachment.html>

More information about the core-libs-dev mailing list