Parallel GC and array object layout: way off the base and laid out in reverse?
thomas.schatzl at oracle.com
Wed Sep 4 13:49:42 UTC 2013
On Wed, 2013-09-04 at 15:28 +0400, Aleksey Shipilev wrote:
> On 09/04/2013 02:56 PM, Thomas Schatzl wrote:
> > Also the results are wrong:
> >>> $ java -XX:+UseParallelGC ArrayLayoutTest
> >>> Before the GC:
> >>> array is at 4120951026 (0 units off base)
> >>> object is at 4120951033, 7 units off base, toString = 0
> > ^^^ the first array element is typically at word offset 3 or so
> > (iirc, in the simplest case: one word of mark header, one word of
> > klass pointer, one int for the array length, and possibly some padding).
> That is not &arr[i], i.e. not (arr + i*sizeof(oop)). It is the
> location of the object referred to by arr[i], i.e. (long)arr[i]. The
> issue is about the GC laying out the referenced objects in reverse order.
Okay, now I understand the purpose of the test. Thanks for the
clarification.
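For other readers: here is a minimal sketch (my own, not the actual
ArrayLayoutTest) of how such addresses can be read via sun.misc.Unsafe.
It assumes a 64-bit VM running with -XX:-UseCompressedOops, so that each
reference slot is a plain 8-byte word; with compressed oops the getLong
read below would return a compressed value instead.

    import java.lang.reflect.Field;
    import sun.misc.Unsafe;

    public class AddressSketch {
        public static void main(String[] args) throws Exception {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            Unsafe u = (Unsafe) f.get(null);

            Object[] arr = new Object[10];
            for (int i = 0; i < arr.length; i++) {
                arr[i] = new Object();
            }

            long base  = u.arrayBaseOffset(Object[].class);
            long scale = u.arrayIndexScale(Object[].class); // 8 without compressed oops
            for (int i = 0; i < arr.length; i++) {
                // Reads the raw reference word stored in slot i, i.e. the
                // address of the object arr[i] points to, not the address
                // of the slot itself.
                long addr = u.getLong(arr, base + i * scale);
                System.out.println("object " + i + " is at " + addr);
            }
        }
    }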
> I had updated the test one more time to make it clearer. It now uses
> neither Integer nor String, to dodge any sort of magic the VM can do
> with those classes.
> This is the sample output:
> $ ~/Install/jdk8b104/bin/java -XX:-UseCompressedOops ArrayLayoutTest
> Before the GC:
> array is at 140679387257752 (0 units off base)
> object is at 140679387257856, 104 units off base, toString = 0
> object is at 140679387257880, 128 units off base, toString = 1
> object is at 140679387257904, 152 units off base, toString = 2
> object is at 140679387257928, 176 units off base, toString = 3
> object is at 140679387257952, 200 units off base, toString = 4
> object is at 140679387257976, 224 units off base, toString = 5
> object is at 140679387258000, 248 units off base, toString = 6
> object is at 140679387258024, 272 units off base, toString = 7
> object is at 140679387258048, 296 units off base, toString = 8
> object is at 140679387258072, 320 units off base, toString = 9
> Without the compressed oops, 1 unit = 1 byte.
> Which means that a freshly allocated array of 10 references takes 80
> bytes for the elements, plus 24 bytes for the header (8+8 bytes of
> header words, 4 bytes for the array length, 4 bytes of padding),
> totalling 104 bytes. Right after the array, we start to lay out the
> referenced objects, which take 24 bytes each (16-byte header + 4-byte
> int field + 4 bytes of padding up to 8-byte alignment). Notice how
> densely they are packed.
> After the GC:
> array is at 140676601743672 (0 units off base)
> object is at 140676601743992, 320 units off base, toString = 0
> object is at 140676601743968, 296 units off base, toString = 1
> object is at 140676601743944, 272 units off base, toString = 2
> object is at 140676601743920, 248 units off base, toString = 3
> object is at 140676601743896, 224 units off base, toString = 4
> object is at 140676601743872, 200 units off base, toString = 5
> object is at 140676601743848, 176 units off base, toString = 6
> object is at 140676601743824, 152 units off base, toString = 7
> object is at 140676601743800, 128 units off base, toString = 8
> object is at 140676601743776, 104 units off base, toString = 9
> Now it's different. We know the array and all the referenced values got
> promoted, because their addresses changed. But now, we see the
> referenced objects are laid out in reverse! What gives?
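(As a quick check of the size arithmetic quoted above, here is a
back-of-the-envelope sketch; the header sizes are the usual 64-bit
HotSpot values with -XX:-UseCompressedOops, hard-coded rather than
queried from the VM:)

    public class SizeCheck {
        public static void main(String[] args) {
            int arrayHeader = 8 + 8 + 4 + 4; // mark word + klass word + length + padding = 24
            int arrayBody   = 10 * 8;        // 10 uncompressed references              = 80
            int objectSize  = 16 + 4 + 4;    // 2-word header + int field + padding     = 24
            for (int i = 0; i < 10; i++) {
                // prints 104, 128, 152, ..., 320: the offsets in the output above
                System.out.println("object " + i + " expected at offset "
                        + (arrayHeader + arrayBody + i * objectSize));
            }
        }
    }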
There is no preservation of the placement order of objects during
garbage collection.
The collectors do not try particularly hard to keep objects together,
except perhaps for a rough depth-first traversal within a single
collector thread (I may be completely wrong; I would need to look at the
particular implementation, maybe others can chime in).
In addition, thread-local allocation buffers, threading, work stealing
and (large) object array handling make it quite unlikely that the
allocation order is preserved. Since the collectors use different
implementations, the actual allocation order also differs across
collectors.
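To illustrate one way such a reversal can fall out of a depth-first,
stack-based copying scheme (purely illustrative, not the actual
ParallelGC code): if the collector pushes the array's element references
onto a work stack in index order and pops them LIFO when copying, the
copies land in reverse:

    import java.util.ArrayDeque;
    import java.util.Deque;

    public class LifoCopySketch {
        public static void main(String[] args) {
            Deque<Integer> workStack = new ArrayDeque<>();
            for (int i = 0; i < 10; i++) {
                workStack.push(i);       // scan the array slots in index order
            }
            int nextFree = 104;          // pretend copy destination, offsets as above
            while (!workStack.isEmpty()) {
                int i = workStack.pop(); // LIFO: pops 9, 8, ..., 0
                System.out.println("copy object " + i + " to offset " + nextFree);
                nextFree += 24;          // 24 bytes per copied object
            }
        }
    }

This reproduces exactly the reversed offsets in the output above (object
9 at 104, object 0 at 320), but whether ParallelGC actually works this
way would need checking against the source.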
> Yes, and user code should be oblivious to this. However, I am asking a
> different question: whether we should lay out the referenced elements
> in their indexed order, not in reverse.
Imo it is not clear whether there would be a big difference, as the
future access order is what matters here.
Preferential access may go in either direction, or be completely
independent of the array (if the program accesses lots of unrelated
objects for each array element anyway).
In this particular case, modern hw prefetchers also work well in the
reverse direction.
At the moment, access information is not gathered anywhere in the VM afaik.
Even if the information were available and somehow used, it is not clear
whether the effort spent on gathering and applying it would amortize
itself later.
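If someone wanted to probe this, a naive sketch (my own; a proper
harness such as JMH would be needed for trustworthy numbers) could
compare an ascending and a descending index-order walk over freshly
allocated objects:

    public class TraversalProbe {
        public static void main(String[] args) {
            final int N = 1 << 20;
            int[][] boxes = new int[N][];
            for (int i = 0; i < N; i++) {
                boxes[i] = new int[] { i }; // allocated in index order
            }
            long sum = 0;
            long t0 = System.nanoTime();
            for (int i = 0; i < N; i++) sum += boxes[i][0];      // ascending walk
            long t1 = System.nanoTime();
            for (int i = N - 1; i >= 0; i--) sum += boxes[i][0]; // descending walk
            long t2 = System.nanoTime();
            System.out.println("ascending: " + (t1 - t0) + " ns, descending: "
                    + (t2 - t1) + " ns, sum: " + sum);
        }
    }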
Maybe there are good studies on current hardware on realistic loads
about that somewhere?