From stephen.bohne at sun.com Mon May 5 09:40:31 2008 From: stephen.bohne at sun.com (stephen.bohne at sun.com) Date: Mon, 05 May 2008 16:40:31 +0000 Subject: hg: jdk7/hotspot-rt/hotspot: 16 new changesets Message-ID: <20080505164102.AFDE427D06@hg.openjdk.java.net> Changeset: 9e5a7340635e Author: sgoldman Date: 2008-04-17 07:16 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/9e5a7340635e 6688137: c++ interpreter fails on 64bit sparc Summary: Misc. 64bit and endian fixes for sparc Reviewed-by: never, kvn, rasbold Contributed-by: volker.simonis at gmail.com ! src/cpu/sparc/vm/bytecodeInterpreter_sparc.hpp ! src/cpu/sparc/vm/cppInterpreter_sparc.cpp ! src/share/vm/interpreter/bytecodeInterpreter.cpp Changeset: b130b98db9cf Author: kvn Date: 2008-04-23 11:20 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/b130b98db9cf 6689060: Escape Analysis does not work with Compressed Oops Summary: 64-bits VM crashes with -XX:+AggresiveOpts (Escape Analysis + Compressed Oops) Reviewed-by: never, sgoldman ! src/cpu/sparc/vm/assembler_sparc.cpp ! src/cpu/sparc/vm/assembler_sparc.hpp ! src/cpu/sparc/vm/sparc.ad ! src/cpu/x86/vm/assembler_x86_64.cpp ! src/cpu/x86/vm/assembler_x86_64.hpp ! src/cpu/x86/vm/stubGenerator_x86_64.cpp ! src/cpu/x86/vm/x86_64.ad ! src/share/vm/opto/connode.cpp ! src/share/vm/opto/connode.hpp ! src/share/vm/opto/escape.cpp ! src/share/vm/opto/macro.cpp ! src/share/vm/opto/memnode.cpp ! src/share/vm/runtime/sharedRuntime.cpp Changeset: d942c7e64bd9 Author: never Date: 2008-04-23 13:57 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/d942c7e64bd9 6601321: Assert(j == 1 || b->_nodes[j-1]->is_Phi(),"CreateEx must be first instruction in block") Reviewed-by: kvn, rasbold, sgoldman, jrose ! 
src/share/vm/opto/lcm.cpp Changeset: 72f4a668df19 Author: kvn Date: 2008-04-23 19:09 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/72f4a668df19 6625997: CastPP, CheckCastPP and Proj nodes are not dead loop safe Summary: EA and initialization optimizations could bypass these nodes. Reviewed-by: rasbold, never ! src/share/vm/opto/cfgnode.cpp ! src/share/vm/opto/connode.hpp ! src/share/vm/opto/multnode.hpp ! src/share/vm/opto/node.hpp Changeset: e0bd2e08e3d0 Author: never Date: 2008-04-24 11:13 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/e0bd2e08e3d0 6663848: assert(i < Max(),"oob") in C2 with -Xcomp Summary: NeverBranchNodes aren't handled properly Reviewed-by: kvn, sgoldman, rasbold, jrose ! src/share/vm/opto/cfgnode.cpp ! src/share/vm/opto/cfgnode.hpp ! src/share/vm/opto/compile.cpp + test/compiler/6663848/Tester.java Changeset: a76240c8b133 Author: rasbold Date: 2008-04-28 08:08 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/a76240c8b133 Merge ! src/share/vm/opto/memnode.cpp ! src/share/vm/opto/node.hpp ! src/share/vm/runtime/sharedRuntime.cpp Changeset: 53735b80b9f1 Author: sbohne Date: 2008-05-01 09:38 -0400 URL: http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/53735b80b9f1 Merge Changeset: c0939256690b Author: rasbold Date: 2008-04-24 14:02 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/c0939256690b 6646019: array subscript expressions become top() with -d64 Summary: stop compilation after negative array allocation Reviewed-by: never, jrose ! 
src/share/vm/opto/parse2.cpp + test/compiler/6646019/Test.java Changeset: 3e2d987e2e68 Author: rasbold Date: 2008-04-29 06:52 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/3e2d987e2e68 Merge Changeset: 6e825ad773c6 Author: jrose Date: 2008-04-29 19:40 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/6e825ad773c6 6695288: runThese tests expr30303 and drem00301m1 fail when compiled code executes without deopt Summary: rework Value method for ModD and ModF, to DTRT for infinities Reviewed-by: sgoldman, kvn, rasbold ! src/share/vm/opto/divnode.cpp Changeset: 60b728ec77c1 Author: jrose Date: 2008-04-29 19:45 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/60b728ec77c1 6652736: well known classes in system dictionary are inefficiently processed Summary: combine many scalar variables into a single enum-indexed array in SystemDictionary. Reviewed-by: kvn ! agent/src/share/classes/sun/jvm/hotspot/memory/SystemDictionary.java ! src/share/vm/classfile/javaClasses.cpp ! src/share/vm/classfile/javaClasses.hpp ! src/share/vm/classfile/systemDictionary.cpp ! src/share/vm/classfile/systemDictionary.hpp ! src/share/vm/runtime/globals.hpp ! src/share/vm/runtime/vmStructs.cpp ! src/share/vm/services/threadService.cpp Changeset: bcdc68eb7e1f Author: sbohne Date: 2008-05-02 08:22 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/bcdc68eb7e1f Merge Changeset: c0492d52d55b Author: apetrusenko Date: 2008-04-01 15:13 +0400 URL: http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/c0492d52d55b 6539517: CR 6186200 should be extended to perm gen allocation to prevent spurious OOM's from perm gen Reviewed-by: ysr, jmasa ! src/share/vm/gc_implementation/concurrentMarkSweep/cmsPermGen.cpp ! src/share/vm/gc_implementation/concurrentMarkSweep/cmsPermGen.hpp ! src/share/vm/gc_implementation/parallelScavenge/parallelScavengeHeap.cpp ! src/share/vm/gc_implementation/parallelScavenge/psParallelCompact.cpp ! 
src/share/vm/gc_implementation/parallelScavenge/vmPSOperations.cpp ! src/share/vm/gc_implementation/shared/vmGCOperations.cpp ! src/share/vm/gc_implementation/shared/vmGCOperations.hpp ! src/share/vm/includeDB_core ! src/share/vm/memory/gcLocker.cpp ! src/share/vm/memory/genCollectedHeap.hpp ! src/share/vm/memory/permGen.cpp ! src/share/vm/memory/permGen.hpp ! src/share/vm/runtime/globals.hpp ! src/share/vm/runtime/vm_operations.hpp Changeset: 3febac328d82 Author: apetrusenko Date: 2008-04-16 12:58 +0400 URL: http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/3febac328d82 Merge - src/cpu/sparc/vm/disassembler_sparc.cpp - src/cpu/x86/vm/disassembler_x86.cpp - src/share/vm/compiler/disassemblerEnv.hpp ! src/share/vm/gc_implementation/parallelScavenge/psParallelCompact.cpp ! src/share/vm/includeDB_core ! src/share/vm/runtime/globals.hpp Changeset: fcbfc50865ab Author: iveresov Date: 2008-04-29 13:51 +0400 URL: http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/fcbfc50865ab 6684395: Port NUMA-aware allocator to linux Summary: NUMA-aware allocator port to Linux Reviewed-by: jmasa, apetrusenko ! build/linux/makefiles/mapfile-vers-debug ! build/linux/makefiles/mapfile-vers-product ! src/os/linux/vm/os_linux.cpp ! src/os/linux/vm/os_linux.hpp ! src/os/linux/vm/os_linux.inline.hpp ! src/os/solaris/vm/os_solaris.cpp ! src/os/solaris/vm/os_solaris.inline.hpp ! src/os/windows/vm/os_windows.cpp ! src/os/windows/vm/os_windows.inline.hpp ! src/share/vm/gc_implementation/parallelScavenge/parallelScavengeHeap.hpp ! src/share/vm/gc_implementation/shared/mutableNUMASpace.cpp ! src/share/vm/gc_implementation/shared/mutableNUMASpace.hpp ! src/share/vm/includeDB_core ! src/share/vm/runtime/os.hpp Changeset: 8bd1e4487c18 Author: iveresov Date: 2008-05-04 03:29 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/8bd1e4487c18 Merge ! make/linux/makefiles/mapfile-vers-debug ! make/linux/makefiles/mapfile-vers-product ! src/os/windows/vm/os_windows.cpp ! 
src/share/vm/gc_implementation/parallelScavenge/psParallelCompact.cpp
! src/share/vm/includeDB_core
! src/share/vm/memory/genCollectedHeap.hpp
! src/share/vm/runtime/globals.hpp

From dgrove at google.com Mon May 5 10:42:59 2008
From: dgrove at google.com (Dan Grove)
Date: Mon, 5 May 2008 10:42:59 -0700
Subject: compressed oops and 64-bit header words
Message-ID:

Hi-

I talked with Nikolay Igotti about compressed oops in OpenJDK7. He
tells me that the mark word and class pointer remain 64 bits when
compressed oops are being used. It seems that this leaves a fair
amount of the bloat in place when moving from 32->64 bits.

I'm interested in deprecating 32-bit VMs at my employer at some
point. Doing this is going to require that 64-bit VMs have as little
bloat as possible. Has there been any consideration of making the mark
word and class pointer 32 bits in cases where the VM fits within 4GB?
It seems like this would be a major win. A second benefit here is that
the "add and shift" currently required on dereference of compressed
oops could be eliminated in cases where the VM fits inside 4GB.

Dan

From Vladimir.Kozlov at Sun.COM Mon May 5 11:45:35 2008
From: Vladimir.Kozlov at Sun.COM (Vladimir Kozlov)
Date: Mon, 05 May 2008 11:45:35 -0700
Subject: compressed oops and 64-bit header words
In-Reply-To:
References:
Message-ID: <481F55CF.2090102@sun.com>

Dan,

Only the mark word is 64 bits. The klass pointer is 32 bits, but
in the current implementation the gap after the klass pointer is not
used.

I am working on using the gap for a field or an array's length.

The mark word may contain a 64-bit thread pointer (for Biased Locking).

Thanks,
Vladimir

Dan Grove wrote:
> Hi-
>
> I talked some with the Nikolay Igotti about compressed oops in
> OpenJDK7. He tells me that the mark word and class pointer remain 64
> bits when compressed oops are being used. It seems that this leaves a
> fair amount of the bloat in place when moving from 32->64 bits.
>
> I'm interested in deprecating 32-bit VM's at my employer at some
> point. Doing this is going to require that 64-bit VM's have as little
> bloat as possible. Has there been any consideration of making the mark
> word and class pointer 32 bits in cases where the VM fits within 4GB?
> It seems like this would be a major win. A second benefit here is that
> the "add and shift" currently required on dereference of compressed
> oops could be eliminated in cases where the VM fit inside 4GB.
>
> Dan

From Coleen.Phillimore at Sun.COM Mon May 5 12:20:49 2008
From: Coleen.Phillimore at Sun.COM (Coleen Phillimore - Sun Microsystems)
Date: Mon, 05 May 2008 15:20:49 -0400
Subject: compressed oops and 64-bit header words
In-Reply-To: <481F55CF.2090102@sun.com>
References: <481F55CF.2090102@sun.com>
Message-ID: <481F5E11.9080803@sun.com>

Actually, we are using the gap for a field and array length in the
code now, but the code Vladimir showed me makes the allocation code a
lot cleaner for the instance field case.

In the array case in 64 bits, compressing the _klass pointer into 32
bits allows us to move the _length field into the other 32 bits, which
because of alignment saves 64 bits. Without the compressed klass
pointer, there was a 32-bit alignment gap after the _length field.

The mark word can also contain a forwarding pointer used during GC, so
it can't be 32 bits.

The compression that we use allows for 32G because we shift away the
three least significant bits (which are zero for 8-byte-aligned
objects) - the algorithm is (ptr-heap_base)>>3.

Coleen

Vladimir Kozlov wrote:
> Dan,
>
> Only the mark word is 64 bits. The klass pointer is 32-bits but
> in the current implementation the gap after klass is not used.
>
> I am working on to use the gap for a field or array's length.
>
> The mark word may contain a 64-bits thread pointer (for Biased Locking).
>
> Thanks,
> Vladimir
>
> Dan Grove wrote:
>> Hi-
>>
>> I talked some with the Nikolay Igotti about compressed oops in
>> OpenJDK7. He tells me that the mark word and class pointer remain 64
>> bits when compressed oops are being used. It seems that this leaves a
>> fair amount of the bloat in place when moving from 32->64 bits.
>>
>> I'm interested in deprecating 32-bit VM's at my employer at some
>> point. Doing this is going to require that 64-bit VM's have as little
>> bloat as possible. Has there been any consideration of making the mark
>> word and class pointer 32 bits in cases where the VM fits within 4GB?
>> It seems like this would be a major win. A second benefit here is that
>> the "add and shift" currently required on dereference of compressed
>> oops could be eliminated in cases where the VM fit inside 4GB.
>>
>> Dan

From Y.S.Ramakrishna at Sun.COM Mon May 5 12:56:14 2008
From: Y.S.Ramakrishna at Sun.COM (Y Srinivas Ramakrishna)
Date: Mon, 05 May 2008 12:56:14 -0700
Subject: compressed oops and 64-bit header words
In-Reply-To: <481F5E11.9080803@sun.com>
References: <481F55CF.2090102@sun.com> <481F5E11.9080803@sun.com>
Message-ID:

Hi Coleen --

Regarding:-

> The mark word can also contain a forwarding pointer used during GC, so
> can't be 32 bits.

What about the lock record / stack pointer when the object is locked?
Would that also be a factor precluding compression to 32 bits, even if
GC were to compress and decompress forwarding pointers at some cost
(in time and representable heap size, because of the need for the
forwarding bit even in the compressed case)?

-- ramki

From dgrove at google.com Mon May 5 13:01:43 2008
From: dgrove at google.com (Dan Grove)
Date: Mon, 5 May 2008 13:01:43 -0700
Subject: compressed oops and 64-bit header words
In-Reply-To: <481F5E11.9080803@sun.com>
References: <481F55CF.2090102@sun.com> <481F5E11.9080803@sun.com>
Message-ID:

Thanks Coleen and Vladimir-

What I'm wondering is whether there could be a third mode:

1. > 32GB - uses uncompressed pointers
2. (something less than 4GB) < Xmx < 32GB - uses compressed pointers
(along with 64-bit mark word), 64-bit ABI
3. whole app fits in 4GB - uses 32-bit pointers in heap, but 64-bit ABI.

The idea here is that I'd prefer to pay no penalty over 32-bit when my
app runs in 64-bit mode and the app fits in 4GB of memory (my reason
for this is that I want to support our JNI libraries only in 64-bit
mode, and deprecate the 32-bit JNI libraries).

Does this make any sense to you?

Dan

On Mon, May 5, 2008 at 12:20 PM, Coleen Phillimore - Sun Microsystems wrote:
>
> Actually, we are using the gap for a field and array length in the code
> now, but the code Vladimir showed me makes the allocation code a lot cleaner
> for the instance field case.
>
> In the array case in 64 bits, compressing the _klass pointer into 32 bits
> allows us to move the _length field into the other 32 bits, which because of
> alignment saves 64 bits. There was a 32 bit alignment gap after the _length
> field, if not compressed with the klass pointer.
>
> The mark word can also contain a forwarding pointer used during GC, so
> can't be 32 bits.
>
> The compression that we use allows for 32G because we shift into the least
> significant bits - the algorithm is (ptr-heap_base)>>3.
>
> Coleen
>
>
>
> Vladimir Kozlov wrote:
>
> > Dan,
> >
> > Only the mark word is 64 bits. The klass pointer is 32-bits but
> > in the current implementation the gap after klass is not used.
> >
> > I am working on to use the gap for a field or array's length.
> >
> > The mark word may contain a 64-bits thread pointer (for Biased Locking).
> >
> > Thanks,
> > Vladimir
> >
> > Dan Grove wrote:
> >
> > > Hi-
> > >
> > > I talked some with the Nikolay Igotti about compressed oops in
> > > OpenJDK7. He tells me that the mark word and class pointer remain 64
> > > bits when compressed oops are being used. It seems that this leaves a
> > > fair amount of the bloat in place when moving from 32->64 bits.
> > >
> > > I'm interested in deprecating 32-bit VM's at my employer at some
> > > point. Doing this is going to require that 64-bit VM's have as little
> > > bloat as possible. Has there been any consideration of making the mark
> > > word and class pointer 32 bits in cases where the VM fits within 4GB?
> > > It seems like this would be a major win. A second benefit here is that
> > > the "add and shift" currently required on dereference of compressed
> > > oops could be eliminated in cases where the VM fit inside 4GB.
> > >
> > > Dan
> > >
> > >

From Paul.Hohensee at Sun.COM Mon May 5 13:42:09 2008
From: Paul.Hohensee at Sun.COM (Paul Hohensee)
Date: Mon, 05 May 2008 16:42:09 -0400
Subject: compressed oops and 64-bit header words
In-Reply-To:
References: <481F55CF.2090102@sun.com> <481F5E11.9080803@sun.com>
Message-ID: <481F7121.2050102@sun.com>

An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20080505/3a6efb0c/attachment.html

From Y.S.Ramakrishna at Sun.COM Mon May 5 13:46:46 2008
From: Y.S.Ramakrishna at Sun.COM (Y Srinivas Ramakrishna)
Date: Mon, 05 May 2008 13:46:46 -0700
Subject: compressed oops and 64-bit header words
In-Reply-To:
References: <481F55CF.2090102@sun.com> <481F5E11.9080803@sun.com>
Message-ID:

Answering my own question, given my relative ignorance of the current
scheme ...

> > The mark word can also contain a forwarding pointer used during GC, so
> > can't be 32 bits.
>
> What about the lock record / stack pointer when the object is locked?

I suppose one could either (a) allocate the lock records in the heap
and compress them in the same manner, or (b) allocate them in a
special lock record area and compress the pointers as an offset, and
(c) for the stacks, somehow ensure sufficient alignment of all thread
stacks and thus be able to compress stack pointers as well.
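
As an illustration of the offset-style compression mentioned in this thread, the formula (ptr-heap_base)>>3 can be sketched in C++ roughly as below. This is a minimal sketch, not HotSpot's actual code: the names encode_narrow, decode_narrow, kShift and kMaxAddressableBytes are hypothetical, a 64-bit address space and 8-byte object alignment are assumed, and null handling is left out.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: pointers are 64 bits wide, as on the 64-bit VMs discussed here.
static_assert(sizeof(std::uintptr_t) == 8, "sketch assumes a 64-bit address space");

// Assumption: objects are 8-byte aligned, so the low 3 bits of any
// heap offset are zero and carry no information.
constexpr unsigned kShift = 3;

// Encode: (ptr - heap_base) >> 3, the formula quoted in the thread.
// Hypothetical helper; null pointers are not treated specially.
inline std::uint32_t encode_narrow(std::uintptr_t ptr, std::uintptr_t heap_base) {
    assert(ptr >= heap_base);
    const std::uintptr_t offset = ptr - heap_base;
    assert((offset & ((std::uintptr_t{1} << kShift) - 1)) == 0);  // 8-byte aligned
    assert((offset >> kShift) <= UINT32_MAX);                     // fits in 32 bits
    return static_cast<std::uint32_t>(offset >> kShift);
}

// Decode: heap_base + (narrow << 3), i.e. the "add and shift".
inline std::uintptr_t decode_narrow(std::uint32_t narrow, std::uintptr_t heap_base) {
    return heap_base + (static_cast<std::uintptr_t>(narrow) << kShift);
}

// 2^32 encodable values, each standing for an 8-byte slot: 32 GB in total.
constexpr std::uint64_t kMaxAddressableBytes = (std::uint64_t{1} << 32) << kShift;
static_assert(kMaxAddressableBytes == (std::uint64_t{32} << 30),
              "a 32-bit value shifted by 3 covers 32 GB");
```

Under these assumptions, an object 64 bytes past the heap base encodes to the narrow value 8, and decoding 8 yields the original address again. The same encoding would apply to a dedicated lock record area if it had its own base address.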
> Would that also be a factor precluding compression to 32 bits even if
> GC were to compress and decompress forwarding pointers at some cost
> (in time and representable heap size, because of the need for the
> forwarding bit even in the compressed case).

I guess that it would boil down to the measured tradeoff between the
memory traffic savings from the compression and the dynamic cost of
decompression.

-- ramki

From Coleen.Phillimore at Sun.COM Mon May 5 15:47:59 2008
From: Coleen.Phillimore at Sun.COM (Coleen Phillimore)
Date: Mon, 05 May 2008 18:47:59 -0400
Subject: compressed oops and 64-bit header words
In-Reply-To:
References: <481F55CF.2090102@sun.com> <481F5E11.9080803@sun.com>
Message-ID: <481F8E9F.8000208@sun.com>

Hi,

It made sense when I first read it, but in order to have 32-bit
pointers in #3, I can't imagine not having to encode and decode them
relative to some heap base in order to dereference these pointers, so
the only difference between #2 and #3 is the shift instruction needed
to get to 32G. We didn't believe that the shift causes much of a
performance penalty, so we didn't implement it this way. We would like
to measure this at some point though, and if it is faster we could add
this mode fairly easily.

thanks!
Coleen

Dan Grove wrote:
> Thanks Coleen and Vladimir-
>
> What I'm wondering is whether there could be a third mode:
>
> 1. > 32GB - uses uncompressed pointers
> 2. (something less than 4GB) < Xmx < 32GB - uses compressed pointers
> (along with 64-bit mark word), 64-bit ABI
> 3. whole app fits in 4GB - uses 32-bit pointers in heap, but 64-bit ABI.
>
> The idea here is that I'd prefer to pay no penalty over 32-bit when my
> app runs in 64-bit mode and the app fits in 4GB of memory (my reason
> for this is that I want to support our JNI libraries only in 64-bit
> mode, and deprecate the 32-bit JNI libraries).
>
> Does this make any sense to you?
> > Dan > > On Mon, May 5, 2008 at 12:20 PM, Coleen Phillimore - Sun Microsystems > wrote: > >> Actually, we are using the gap for a field and array length in the code >> now, but the code Vladimir showed me makes the allocation code a lot cleaner >> for the instance field case. >> >> In the array case in 64 bits, compressing the _klass pointer into 32 bits >> allows us to move the _length field into the other 32 bits, which because of >> alignment saves 64 bits. There was a 32 bit alignment gap after the _length >> field, if not compressed with the klass pointer. >> >> The mark word can also contain a forwarding pointer used during GC, so >> can't be 32 bits. >> >> The compression that we use allows for 32G because we shift into the least >> significant bits - the algorithm is (ptr-heap_base)>>3. >> >> Coleen >> >> >> >> Vladimir Kozlov wrote: >> >> >>> Dan, >>> >>> Only the mark word is 64 bits. The klass pointer is 32-bits but >>> in the current implementation the gap after klass is not used. >>> >>> I am working on to use the gap for a field or array's length. >>> >>> The mark word may contain a 64-bits tread pointer (for Biased Locking). >>> >>> Thanks, >>> Vladimir >>> >>> Dan Grove wrote: >>> >>> >>>> Hi- >>>> >>>> I talked some with the Nikolay Igotti about compressed oops in >>>> OpenJDK7. He tells me that the mark word and class pointer remain 64 >>>> bits when compressed oops are being used. It seems that this leaves a >>>> fair amount of the bloat in place when moving from 32->64 bits. >>>> >>>> I'm interesting in deprecating 32-bit VM's at my employer at some >>>> point. Doing this is going to require that 64-bit VM's have as little >>>> bloat as possible. Has there been any consideration of making the mark >>>> word and class pointer 32 bits in cases where the VM fits within 4GB? >>>> It seems like this would be a major win. 
>>>> A second benefit here is that
>>>> the "add and shift" currently required on dereference of compressed
>>>> oops could be eliminated in cases where the VM fit inside 4GB.
>>>>
>>>> Dan
>>>>
>>>>

From Vladimir.Kozlov at Sun.COM Mon May 5 16:15:46 2008
From: Vladimir.Kozlov at Sun.COM (Vladimir Kozlov)
Date: Mon, 05 May 2008 16:15:46 -0700
Subject: compressed oops and 64-bit header words
In-Reply-To: <481F8E9F.8000208@sun.com>
References: <481F55CF.2090102@sun.com> <481F5E11.9080803@sun.com> <481F8E9F.8000208@sun.com>
Message-ID: <481F9522.80408@sun.com>

Coleen is right. In the implementation we are working on, the shift
and decode/encode instructions "will" fold into the address
expression. So the "only" penalty you will pay is the additional
memory for 64-bit mark words. And on x86 you will win big even with
this penalty, since in 64-bit mode you have more registers for local
values (fewer stack/memory accesses).

Thanks,
Vladimir

Coleen Phillimore wrote:
>
> Hi,
> It made sense when I first read it but in order to have 32 bit pointers
> in #3, I can't imagine not having to encode and decode them by some heap
> base in order to dereference these pointers, so the only difference
> between #2 and #3 is the shift instruction to get to 32G. We didn't
> believe that the shift causes much of a performance penalty so we didn't
> implement it this way. We would like to measure this at some point
> though, and if it is faster could add this mode fairly easily.
>
> thanks!
> Coleen
>
> Dan Grove wrote:
>> Thanks Coleen and Vladimir-
>>
>> What I'm wondering is whether there could be a third mode:
>>
>> 1. > 32GB - uses uncompressed pointers
>> 2. (something less than 4GB) < Xmx < 32GB - uses compressed pointers
>> (along with 64-bit mark word), 64-bit ABI
>> 3. whole app fits in 4GB - uses 32-bit pointers in heap, but 64-bit ABI.
>> >> The idea here is that I'd prefer to pay no penalty over 32-bit when my >> app runs in 64-bit mode and the app fits in 4GB of memory (my reason >> for this is that I want to support our JNI libraries only in 64-bit >> mode, and deprecate the 32-bit JNI libraries). >> >> Does this make any sense to you? >> >> Dan >> >> On Mon, May 5, 2008 at 12:20 PM, Coleen Phillimore - Sun Microsystems >> wrote: >> >>> Actually, we are using the gap for a field and array length in the code >>> now, but the code Vladimir showed me makes the allocation code a lot >>> cleaner >>> for the instance field case. >>> >>> In the array case in 64 bits, compressing the _klass pointer into 32 >>> bits >>> allows us to move the _length field into the other 32 bits, which >>> because of >>> alignment saves 64 bits. There was a 32 bit alignment gap after the >>> _length >>> field, if not compressed with the klass pointer. >>> >>> The mark word can also contain a forwarding pointer used during GC, so >>> can't be 32 bits. >>> >>> The compression that we use allows for 32G because we shift into the >>> least >>> significant bits - the algorithm is (ptr-heap_base)>>3. >>> >>> Coleen >>> >>> >>> >>> Vladimir Kozlov wrote: >>> >>> >>>> Dan, >>>> >>>> Only the mark word is 64 bits. The klass pointer is 32-bits but >>>> in the current implementation the gap after klass is not used. >>>> >>>> I am working on to use the gap for a field or array's length. >>>> >>>> The mark word may contain a 64-bits tread pointer (for Biased Locking). >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> Dan Grove wrote: >>>> >>>> >>>>> Hi- >>>>> >>>>> I talked some with the Nikolay Igotti about compressed oops in >>>>> OpenJDK7. He tells me that the mark word and class pointer remain 64 >>>>> bits when compressed oops are being used. It seems that this leaves a >>>>> fair amount of the bloat in place when moving from 32->64 bits. >>>>> >>>>> I'm interesting in deprecating 32-bit VM's at my employer at some >>>>> point. 
>>>>> Doing this is going to require that 64-bit VM's have as little
>>>>> bloat as possible. Has there been any consideration of making the mark
>>>>> word and class pointer 32 bits in cases where the VM fits within 4GB?
>>>>> It seems like this would be a major win. A second benefit here is that
>>>>> the "add and shift" currently required on dereference of compressed
>>>>> oops could be eliminated in cases where the VM fit inside 4GB.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>

From dgrove at google.com Mon May 5 16:27:33 2008
From: dgrove at google.com (Dan Grove)
Date: Mon, 5 May 2008 16:27:33 -0700
Subject: compressed oops and 64-bit header words
In-Reply-To: <481F8E9F.8000208@sun.com>
References: <481F55CF.2090102@sun.com> <481F5E11.9080803@sun.com> <481F8E9F.8000208@sun.com>
Message-ID:

Hi Coleen-

I'm not worried about the shift instruction - I agree that it's
unlikely to matter. What I am worried about is having the standard
object header take up two 64-bit words (well, one 64-bit word, one
32-bit word, and 32 bits of padding).

My concern is the increase in memory footprint and its impact on
performance. I was pointed to
http://ieeexplore.ieee.org/iel5/9012/28612/01281667.pdf?arnumber=1281667
, which (conveniently) breaks out the performance impact of
compressing the header versus compressing references versus both.

So what I would really be interested in is a way to have both the
pointers/words in the header and the oops be 32 bits. I think this
would be a good win, when coupled with the extra registers available
when using the 64-bit ABI.

Dan

On Mon, May 5, 2008 at 3:47 PM, Coleen Phillimore wrote:
>
> Hi,
> It made sense when I first read it but in order to have 32 bit pointers in
> #3, I can't imagine not having to encode and decode them by some heap base
> in order to dereference these pointers, so the only difference between #2
> and #3 is the shift instruction to get to 32G.
We didn't believe that the > shift causes much of a performance penalty so we didn't implement it this > way. We would like to measure this at some point though, and if it is > faster could add this mode fairly easily. > > thanks! > Coleen > > > > Dan Grove wrote: > > > Thanks Colleen and Vladimir- > > > > What I'm wondering is whether there could be a third mode: > > > > 1. > 32GB - uses uncompressed pointers > > 2. (something less than 4GB) < Xmx < 32GB - uses compressed pointers > > (along with 64-bit mark word), 64-bit ABI > > 3. whole app fits in 4GB - uses 32-bit pointers in heap, but 64-bit ABI. > > > > The idea here is that I'd prefer to pay no penalty over 32-bit when my > > app runs in 64-bit mode and the app fits in 4GB of memory (my reason > > for this is that I want to support our JNI libraries only in 64-bit > > mode, and deprecate the 32-bit JNI libraries). > > > > Does this make any sense to you? > > > > Dan > > > > On Mon, May 5, 2008 at 12:20 PM, Coleen Phillimore - Sun Microsystems > > wrote: > > > > > > > Actually, we are using the gap for a field and array length in the code > > > now, but the code Vladimir showed me makes the allocation code a lot > cleaner > > > for the instance field case. > > > > > > In the array case in 64 bits, compressing the _klass pointer into 32 > bits > > > allows us to move the _length field into the other 32 bits, which > because of > > > alignment saves 64 bits. There was a 32 bit alignment gap after the > _length > > > field, if not compressed with the klass pointer. > > > > > > The mark word can also contain a forwarding pointer used during GC, so > > > can't be 32 bits. > > > > > > The compression that we use allows for 32G because we shift into the > least > > > significant bits - the algorithm is (ptr-heap_base)>>3. > > > > > > Coleen > > > > > > > > > > > > Vladimir Kozlov wrote: > > > > > > > > > > > > > Dan, > > > > > > > > Only the mark word is 64 bits. 
> > > > The klass pointer is 32-bits but
> > > > in the current implementation the gap after klass is not used.
> > > >
> > > > I am working on to use the gap for a field or array's length.
> > > >
> > > > The mark word may contain a 64-bits thread pointer (for Biased Locking).
> > > >
> > > > Thanks,
> > > > Vladimir
> > > >
> > > > Dan Grove wrote:
> > > >
> > > > > Hi-
> > > > >
> > > > > I talked some with the Nikolay Igotti about compressed oops in
> > > > > OpenJDK7. He tells me that the mark word and class pointer remain 64
> > > > > bits when compressed oops are being used. It seems that this leaves a
> > > > > fair amount of the bloat in place when moving from 32->64 bits.
> > > > >
> > > > > I'm interested in deprecating 32-bit VM's at my employer at some
> > > > > point. Doing this is going to require that 64-bit VM's have as little
> > > > > bloat as possible. Has there been any consideration of making the mark
> > > > > word and class pointer 32 bits in cases where the VM fits within 4GB?
> > > > > It seems like this would be a major win. A second benefit here is that
> > > > > the "add and shift" currently required on dereference of compressed
> > > > > oops could be eliminated in cases where the VM fit inside 4GB.
> > > > >
> > > > > Dan
> > > > >
> > > >
> > >

From Vladimir.Kozlov at Sun.COM Mon May 5 17:22:56 2008
From: Vladimir.Kozlov at Sun.COM (Vladimir Kozlov)
Date: Mon, 05 May 2008 17:22:56 -0700
Subject: compressed oops and 64-bit header words
In-Reply-To:
References: <481F55CF.2090102@sun.com> <481F5E11.9080803@sun.com> <481F8E9F.8000208@sun.com>
Message-ID: <481FA4E0.7000603@sun.com>

Dan,

Thank you for the paper.
I think the benefit they get from the compressed header comes mostly
from a compressed vtable pointer, which in our VM corresponds to the
klass pointer, which is also compressed. So in this sense we also have
a compressed header.
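
For reference, the header sizes discussed in this thread can be sketched with plain C++ structs. This is only an illustration under stated assumptions: the struct and field names are hypothetical and are not HotSpot's actual declarations, and the sizes assume a typical LP64 platform where a 64-bit integer is 8-byte aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layouts, named only for this sketch.

// 32-bit VM: 32-bit mark word + 32-bit klass pointer.
struct Header32 { std::uint32_t mark; std::uint32_t klass; };

// 64-bit VM without compression: both words are 64 bits.
struct Header64 { std::uint64_t mark; std::uint64_t klass; };

// 64-bit VM with a compressed klass pointer: 64-bit mark word,
// 32-bit klass, and a 32-bit gap that can hold a field.
struct Header64NarrowKlass {
    std::uint64_t mark;
    std::uint32_t narrow_klass;
    std::uint32_t gap_or_field;
};

// Array headers: the length follows the klass pointer.
struct ArrayHeader64 {
    std::uint64_t mark;
    std::uint64_t klass;
    std::uint32_t length;  // followed by 32 bits of alignment padding
};
struct ArrayHeader64NarrowKlass {
    std::uint64_t mark;
    std::uint32_t narrow_klass;
    std::uint32_t length;  // sits in the former gap
};

static_assert(sizeof(Header32) == 8, "two 32-bit words");
static_assert(sizeof(Header64) == 16, "two 64-bit words");
static_assert(sizeof(Header64NarrowKlass) == 16, "64-bit mark + 32-bit klass + 32-bit gap");
static_assert(sizeof(ArrayHeader64) == 24, "length is padded out to 8-byte alignment");
static_assert(sizeof(ArrayHeader64NarrowKlass) == 16, "length shares a word with the klass");

// Bytes saved per array by compressing the klass pointer (the "saves 64 bits" case).
constexpr std::size_t kArrayHeaderSavings =
    sizeof(ArrayHeader64) - sizeof(ArrayHeader64NarrowKlass);

// Bytes a non-array header still grows by relative to a 32-bit VM.
constexpr std::size_t kHeaderGrowthVs32Bit =
    sizeof(Header64NarrowKlass) - sizeof(Header32);
```

Under these assumptions, compressing the klass pointer saves 8 bytes per array, while a non-array header is still 8 bytes larger than on a 32-bit VM unless the gap is put to use for a field.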
I cannot say what performance benefit we have now with compressed
oops, since the generated code for klass pointer loads/stores is
currently not what we would like it to be (and we are working to
improve it).

I doubt that a compressed mark word will make a big difference, but I
may be wrong.

Thanks,
Vladimir

Dan Grove wrote:
> Hi Coleen-
>
> I'm not worried about the shift instruction - I agree that it's
> unlikely to matter. What I am worried about is have the standard
> object header have 2 64-bit words in (well, 1 64-bit word, 1 32-bit
> word, and 32 bits of pad).
>
> What I'm worried about is the increase in memory footprint and its
> impact on performance. I was pointed to
> http://ieeexplore.ieee.org/iel5/9012/28612/01281667.pdf?arnumber=1281667
> , which (conveniently) breaks out the performance impact of
> compressing the header versus compressing references versus both.
>
> So what I would really be interested would be a way to have both the
> pointers/words in the header and the oops be 32 bits. I think this
> would be a good win, when coupled with the extra registers when using
> the 64-bit ABI.
>
> Dan
>
> On Mon, May 5, 2008 at 3:47 PM, Coleen Phillimore
> wrote:
>> Hi,
>> It made sense when I first read it but in order to have 32 bit pointers in
>> #3, I can't imagine not having to encode and decode them by some heap base
>> in order to dereference these pointers, so the only difference between #2
>> and #3 is the shift instruction to get to 32G. We didn't believe that the
>> shift causes much of a performance penalty so we didn't implement it this
>> way. We would like to measure this at some point though, and if it is
>> faster could add this mode fairly easily.
>>
>> thanks!
>> Coleen
>>
>>
>>
>> Dan Grove wrote:
>>
>>> Thanks Coleen and Vladimir-
>>>
>>> What I'm wondering is whether there could be a third mode:
>>>
>>> 1. > 32GB - uses uncompressed pointers
>>> 2.
(something less than 4GB) < Xmx < 32GB - uses compressed pointers >>> (along with 64-bit mark word), 64-bit ABI >>> 3. whole app fits in 4GB - uses 32-bit pointers in heap, but 64-bit ABI. >>> >>> The idea here is that I'd prefer to pay no penalty over 32-bit when my >>> app runs in 64-bit mode and the app fits in 4GB of memory (my reason >>> for this is that I want to support our JNI libraries only in 64-bit >>> mode, and deprecate the 32-bit JNI libraries). >>> >>> Does this make any sense to you? >>> >>> Dan >>> >>> On Mon, May 5, 2008 at 12:20 PM, Coleen Phillimore - Sun Microsystems >>> wrote: >>> >>> >>>> Actually, we are using the gap for a field and array length in the code >>>> now, but the code Vladimir showed me makes the allocation code a lot >> cleaner >>>> for the instance field case. >>>> >>>> In the array case in 64 bits, compressing the _klass pointer into 32 >> bits >>>> allows us to move the _length field into the other 32 bits, which >> because of >>>> alignment saves 64 bits. There was a 32 bit alignment gap after the >> _length >>>> field, if not compressed with the klass pointer. >>>> >>>> The mark word can also contain a forwarding pointer used during GC, so >>>> can't be 32 bits. >>>> >>>> The compression that we use allows for 32G because we shift into the >> least >>>> significant bits - the algorithm is (ptr-heap_base)>>3. >>>> >>>> Coleen >>>> >>>> >>>> >>>> Vladimir Kozlov wrote: >>>> >>>> >>>> >>>>> Dan, >>>>> >>>>> Only the mark word is 64 bits. The klass pointer is 32-bits but >>>>> in the current implementation the gap after klass is not used. >>>>> >>>>> I am working on to use the gap for a field or array's length. >>>>> >>>>> The mark word may contain a 64-bits tread pointer (for Biased >> Locking). >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> Dan Grove wrote: >>>>> >>>>> >>>>> >>>>>> Hi- >>>>>> >>>>>> I talked some with the Nikolay Igotti about compressed oops in >>>>>> OpenJDK7. 
He tells me that the mark word and class pointer remain 64 >>>>>> bits when compressed oops are being used. It seems that this leaves >> a >>>>>> fair amount of the bloat in place when moving from 32->64 bits. >>>>>> >>>>>> I'm interesting in deprecating 32-bit VM's at my employer at some >>>>>> point. Doing this is going to require that 64-bit VM's have as >> little >>>>>> bloat as possible. Has there been any consideration of making the >> mark >>>>>> word and class pointer 32 bits in cases where the VM fits within >> 4GB? >>>>>> It seems like this would be a major win. A second benefit here is >> that >>>>>> the "add and shift" currently required on dereference of compressed >>>>>> oops could be eliminated in cases where the VM fit inside 4GB. >>>>>> >>>>>> Dan >>>>>> >>>>>> >>>>>> >> From dgrove at google.com Wed May 7 23:28:53 2008 From: dgrove at google.com (Dan Grove) Date: Wed, 7 May 2008 23:28:53 -0700 Subject: compressed oops and 64-bit header words In-Reply-To: <481FA4E0.7000603@sun.com> References: <481F55CF.2090102@sun.com> <481F5E11.9080803@sun.com> <481F8E9F.8000208@sun.com> <481FA4E0.7000603@sun.com> Message-ID: Thanks Vladimir. I'm still worried about the memory bloat from having (effectively) 2 64-bit words in the object header, rather than 2 32-bit words. If we consider an average (non-array) object size around 30-40 bytes, this is a significant overhead. It seems that if users were willing to declare that they were running inside a 4GB virtual address space (and in my case, users would be willing to do in order to avoid memory bloat), we should be able to do this. On linux, I believe that if the process were running with a "ulimit -v XXXX" shell, we could make guarantees that all address would fit in 32 bits, even for a 64-bit VM. Do you agree that this would make sense? Dan 2008/5/5 Vladimir Kozlov : > Dan, > > Thank you for the paper. > I think, the benefit they have with the compressed header comes > mostly from a compressed vtable pointer. 
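For readers following the thread, here is a minimal sketch of the arithmetic under discussion: the (ptr-heap_base)>>3 encoding Coleen describes in the quoted text, and the below-4GB case Dan asks about, where base and shift could both be zero. It is an illustration only, not HotSpot code. Addresses are modelled as plain integers, and the names `encode`, `decode`, `addressable_bytes` and the sample addresses are hypothetical.

```python
# Sketch of the compressed-oop arithmetic discussed in this thread.
# Addresses are plain integers; heap_base and the sample addresses are
# made-up values, not taken from HotSpot.

OBJECT_ALIGNMENT = 8   # objects are 8-byte aligned, per the thread
SHIFT = 3              # log2(OBJECT_ALIGNMENT)
NARROW_BITS = 32       # width of a compressed reference


def encode(ptr: int, heap_base: int, shift: int = SHIFT) -> int:
    """Compress an address: (ptr - heap_base) >> shift."""
    offset = ptr - heap_base
    if offset < 0 or offset % (1 << shift) != 0:
        raise ValueError("address is below the heap base or misaligned")
    narrow = offset >> shift
    if narrow >= 1 << NARROW_BITS:
        raise ValueError("address does not fit in a 32-bit compressed oop")
    return narrow


def decode(narrow: int, heap_base: int, shift: int = SHIFT) -> int:
    """Inverse of encode: the "add and shift" done on dereference."""
    return heap_base + (narrow << shift)


def addressable_bytes(shift: int) -> int:
    """How much heap a 32-bit compressed reference can span."""
    return (1 << NARROW_BITS) << shift


heap_base = 0x7F00_0000_0000             # hypothetical heap start
obj = heap_base + 5 * OBJECT_ALIGNMENT   # hypothetical object address

narrow = encode(obj, heap_base)
assert decode(narrow, heap_base) == obj
print(narrow, addressable_bytes(SHIFT) // 2**30, "GB")

# The mode Dan asks about: everything below 4GB, so base 0 and shift 0,
# and the compressed value is simply the address itself.
low_obj = 0x1234_5678
assert encode(low_obj, heap_base=0, shift=0) == low_obj
print(addressable_bytes(0) // 2**30, "GB")
```

With a shift of 3 the 32-bit value spans 32GB, which matches the "32G" figure in the thread; with a shift of 0 it spans 4GB and decoding becomes a no-op when the base is also 0.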
From Vladimir.Kozlov at Sun.COM Thu May 8 08:12:40 2008
From: Vladimir.Kozlov at Sun.COM (Vladimir Kozlov)
Date: Thu, 08 May 2008 08:12:40 -0700
Subject: compressed oops and 64-bit header words
In-Reply-To:
References: <481F55CF.2090102@sun.com> <481F5E11.9080803@sun.com> <481F8E9F.8000208@sun.com> <481FA4E0.7000603@sun.com>
Message-ID: <48231868.803@sun.com>

Dan,

It is not 2 64-bit words, it is 1 and a half :)
since klass is 32 bits and we use the other 32 bits for a field.
So the overhead is only 4 bytes. Also don't forget that
all objects are aligned to 8 bytes in the heap, even
in the 32-bit VM. So the average overhead will be less.

I want to be clear that it is not that we are totally against
your suggestion. It is the resources we need to implement it
that we don't have currently.
On the other hand, the VM is open source now, so you or your colleagues
can do it and help us all.

Thanks,
Vladimir
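The interaction between the 4 extra header bytes and 8-byte alignment that Vladimir mentions can be worked through with a small sketch. It assumes the header sizes stated in the thread (two 32-bit words in the 32-bit VM, a 64-bit mark word plus a 32-bit klass with compressed oops) and assumes the same number of field bytes in both VMs. The field sizes and the helper names `align_up` and `object_size` are hypothetical.

```python
# Sketch of the header-size arithmetic from this thread. Header sizes are
# the ones quoted in the discussion; the field sizes are made up.

ALIGNMENT = 8  # objects are 8-byte aligned, even in the 32-bit VM

HEADER_BYTES = {
    "32-bit VM": 4 + 4,                # 32-bit mark word + 32-bit klass
    "64-bit, compressed oops": 8 + 4,  # 64-bit mark word + 32-bit klass
}


def align_up(size: int, alignment: int = ALIGNMENT) -> int:
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


def object_size(header_bytes: int, field_bytes: int) -> int:
    """Heap footprint of one object: header plus fields, then padding."""
    return align_up(header_bytes + field_bytes)


for field_bytes in (4, 8, 12, 16, 20, 24):
    sizes = {mode: object_size(header, field_bytes)
             for mode, header in HEADER_BYTES.items()}
    print(field_bytes, sizes)
```

Under these assumptions the 4 extra header bytes are sometimes absorbed by padding the 32-bit layout already needed, and sometimes push the object up to the next 8-byte boundary, so the per-object difference is 0 or 8 bytes depending on the field size.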
From dgrove at google.com Thu May 8 09:22:26 2008
From: dgrove at google.com (Dan Grove)
Date: Thu, 8 May 2008 09:22:26 -0700
Subject: compressed oops and 64-bit header words
In-Reply-To: <48231868.803@sun.com>
References: <481F55CF.2090102@sun.com> <481F5E11.9080803@sun.com> <481F8E9F.8000208@sun.com> <481FA4E0.7000603@sun.com> <48231868.803@sun.com>
Message-ID:

Thanks Vladimir - I didn't realize that the extra 32 bits were being
used for a field. This is work that we're considering doing - mostly,
I wanted to hear feedback, and find out whether you were already doing
this.

So the real question from my standpoint is what we're missing when we
think about this, and whether it's viable at all.

Dan
From linuxhippy at gmail.com Fri May 9 03:54:59 2008
From: linuxhippy at gmail.com (Clemens Eisserer)
Date: Fri, 9 May 2008 12:54:59 +0200
Subject: compressed oops and 64-bit header words
In-Reply-To:
References: <481F55CF.2090102@sun.com> <481F5E11.9080803@sun.com> <481F8E9F.8000208@sun.com> <481FA4E0.7000603@sun.com> <48231868.803@sun.com>
Message-ID: <194f62550805090354u1d23depb5b40428a8d86749@mail.gmail.com>

> Thanks Vladimir - I didn't realize that the extra 32 bits were being
> used for a field. This is work that we're considering doing - mostly,
> I wanted to hear feedback, and find out whether you were already doing
> this.
>
> So the real question from my standpoint is what we're missing when we
> think about this, and whether it's viable at all.

Wouldn't it be great if Google would dedicate resources to Hotspot's
development?
Thank god Hotspot does not have a public interface, who knows whether it
would lead to another Android ;)

Regards,
Clemens
>>> > > > > > > > >>> > > > > > > > I'm interesting in deprecating 32-bit VM's at my employer >>> at some >>> > > > > > > > point. Doing this is going to require that 64-bit VM's have >>> as >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > little >>> > > > >>> > > > > >>> > > > > > >>> > > > > > > >>> > > > > > > > bloat as possible. Has there been any consideration of >>> making the >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > mark >>> > > > >>> > > > > >>> > > > > > >>> > > > > > > >>> > > > > > > > word and class pointer 32 bits in cases where the VM fits >>> within >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > 4GB? >>> > > > >>> > > > > >>> > > > > > >>> > > > > > > >>> > > > > > > > It seems like this would be a major win. A second benefit >>> here is >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > that >>> > > > >>> > > > > >>> > > > > > >>> > > > > > > >>> > > > > > > > the "add and shift" currently required on dereference of >>> compressed >>> > > > > > > > oops could be eliminated in cases where the VM fit inside >>> 4GB. >>> > > > > > > > >>> > > > > > > > Dan >>> > > > > > > > >>> > > > > > > > >>> > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > >>> > > > >>> > > >>> > >>> >> > From Coleen.Phillimore at Sun.COM Fri May 9 06:57:44 2008 From: Coleen.Phillimore at Sun.COM (Coleen Phillimore) Date: Fri, 09 May 2008 09:57:44 -0400 Subject: compressed oops and 64-bit header words In-Reply-To: References: <481F55CF.2090102@sun.com> <481F5E11.9080803@sun.com> <481F8E9F.8000208@sun.com> <481FA4E0.7000603@sun.com> <48231868.803@sun.com> Message-ID: <48245858.9000402@sun.com> Dan, I think what you're envisioning is a bit different from what we did with compressed oops. What we did was for Java heap sizes 32G or less, we'd keep the base of the Java heap in order to compress pointers in the Java heap into 32 bits. 
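A minimal sketch of the heap-base compression described here, using the formula quoted elsewhere in this thread, (ptr-heap_base)>>3. The names `narrow_oop`, `encode_heap_oop`, `decode_heap_oop` and `kOopShift` are illustrative assumptions, not the HotSpot sources, and null handling is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; this is not the HotSpot implementation.
// Objects are 8-byte aligned, so the low 3 bits of a heap offset are zero.
constexpr unsigned kOopShift = 3;

using narrow_oop = std::uint32_t;

// Compress a heap address into 32 bits: (ptr - heap_base) >> 3.
inline narrow_oop encode_heap_oop(std::uint64_t ptr, std::uint64_t heap_base) {
    assert(ptr >= heap_base);
    const std::uint64_t offset = ptr - heap_base;
    assert((offset & ((1u << kOopShift) - 1)) == 0);  // offset is 8-byte aligned
    assert((offset >> kOopShift) <= UINT32_MAX);      // offset is below 32G
    return static_cast<narrow_oop>(offset >> kOopShift);
}

// Expand a compressed value back to a full 64-bit address.
inline std::uint64_t decode_heap_oop(narrow_oop n, std::uint64_t heap_base) {
    return heap_base + (static_cast<std::uint64_t>(n) << kOopShift);
}
```

With a 32-bit compressed value and a shift of 3, the reachable range is 2^32 * 8 bytes = 32G from the heap base, which matches the 32G limit mentioned in the thread.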
The VM executable still occupies a 64-bit address space, so all other pointers in the process are 64 bits wide. We compress pointers only when storing them into the Java heap, and need to decompress them when loading them back out in order to use them. The mark word, which is the first word of the header, may contain pointers to the stack in the case of locks or to the C heap in the case of biased locking (I believe it's the Thread* pointer). The mark word also contains forwarding pointers to be used during GC. These could be encoded and decoded to fit into 32 bits since they're heap pointers, but I think that would make GC really, really slow. I'm working on some code that encodes the FreeChunk size into the 64 bits in the header mark word for concurrent mark sweep GC. So we're using that mark word for a lot of things. In 32 bits, we've pretty much tapped that word out for bits. If you want to use 32-bit pointers within a 64-bit ABI, you could hack the linker to only load the process in the 32-bit address space and then extend the pointers to 64 bits when needed when going to the 64-bit ABI. I've worked on such a system and hope nobody yells at me, but my opinion was that it was a god-awful mess. google: "xtaso". Thanks, Coleen Dan Grove wrote: > Thanks Vladimir - I didn't realize that the extra 32 bits were being > used for a field. This is work that we're considering doing - mostly, > I wanted to hear feedback, and find out whether you were already doing > this. > > So the real question from my standpoint is what we're missing when we > think about this, and whether it's viable at all. > > Dan > > On Thu, May 8, 2008 at 8:12 AM, Vladimir Kozlov wrote: > >> Dan, >> >> It is not 2 64-bits words, it is 1 and half :) >> since klass is 32-bits and we use other 32-bits for a field. >> So the overhead is only 4 bytes. Also don't forget that >> all objects are aligned to 8 bytes in the heap even >> in 32-bits VM. So the average overhead will be less.
>> >> I want to be clear that it is not that we totally against >> your suggestion. It is resources we need to implement it >> which we don't have currently. >> On other hand, VM is open source now so you or your colleges >> can do it and help us all. >> >> Thanks, >> Vladimir >> >> Dan Grove wrote: >> >>> Thanks Vladimir. I'm still worried about the memory bloat from having >>> (effectively) 2 64-bit words in the object header, rather than 2 32-bit >>> words. If we consider an average (non-array) object size around 30-40 bytes, >>> this is a significant overhead. It seems that if users were willing to >>> declare that they were running inside a 4GB virtual address space (and in my >>> case, users would be willing to do in order to avoid memory bloat), we >>> should be able to do this. >>> >>> On linux, I believe that if the process were running with a "ulimit -v >>> XXXX" shell, we could make guarantees that all address would fit in 32 bits, >>> even for a 64-bit VM. Do you agree that this would make sense? >>> >>> Dan >>> >>> 2008/5/5 Vladimir Kozlov >> >: >>> > Dan, >>> > >>> > Thank you for the paper. >>> > I think, the benefit they have with the compressed header comes >>> > mostly from a compressed vtable pointer. Which in our VM corresponds >>> > to a klass pointer which is also compressed. >>> > So in this sense we also have compressed header. >>> > >>> > I can not say what the performance benefit we have now with >>> > compressed oops since the generated code for a klass pointer >>> > load/stores currently is not what we would like to have >>> > (and we are working to improve it). >>> > >>> > I doubt that the compressed markword will give big difference. >>> > But I may be wrong. >>> > >>> > >>> > >>> > Thanks, >>> > Vladimir >>> > >>> > Dan Grove wrote: >>> > >>> > > Hi Colleen- >>> > > >>> > > I'm not worried about the shift instruction - I agree that it's >>> > > unlikely to matter. 
What I am worried about is have the standard >>> > > object header have 2 64-bit words in (well, 1 64-bit word, 1 32-bit >>> > > word, and 32 bits of pad). >>> > > >>> > > What I'm worried about is the increase in memory footprint and its >>> > > impact on performance. I was pointed to >>> > > >>> http://ieeexplore.ieee.org/iel5/9012/28612/01281667.pdf?arnumber=1281667 >>> > > , which (conveniently) breaks out the performance impact of >>> > > compressing the header versus compressing references versus both. >>> > > >>> > > So what I would really be interested would be a way to have both the >>> > > pointers/words in the header and the oops be 32 bits. I think this >>> > > would be a good win, when coupled with the extra registers when using >>> > > the 64-bit ABI. >>> > > >>> > > Dan >>> > > >>> > > On Mon, May 5, 2008 at 3:47 PM, Coleen Phillimore >>> > > > wrote: >>> > > >>> > > > Hi, >>> > > > It made sense when I first read it but in order to have 32 bit >>> pointers in >>> > > > #3, I can't imagine not having to encode and decode them by some >>> heap base >>> > > > in order to dereference these pointers, so the only difference >>> between #2 >>> > > > and #3 is the shift instruction to get to 32G. We didn't believe >>> that the >>> > > > shift causes much of a performance penalty so we didn't implement >>> it this >>> > > > way. We would like to measure this at some point though, and if it >>> is >>> > > > faster could add this mode fairly easily. >>> > > > >>> > > > thanks! >>> > > > Coleen >>> > > > >>> > > > >>> > > > >>> > > > Dan Grove wrote: >>> > > > >>> > > > >>> > > > > Thanks Colleen and Vladimir- >>> > > > > >>> > > > > What I'm wondering is whether there could be a third mode: >>> > > > > >>> > > > > 1. > 32GB - uses uncompressed pointers >>> > > > > 2. (something less than 4GB) < Xmx < 32GB - uses compressed >>> pointers >>> > > > > (along with 64-bit mark word), 64-bit ABI >>> > > > > 3. 
whole app fits in 4GB - uses 32-bit pointers in heap, but >>> 64-bit ABI. >>> > > > > >>> > > > > The idea here is that I'd prefer to pay no penalty over 32-bit >>> when my >>> > > > > app runs in 64-bit mode and the app fits in 4GB of memory (my >>> reason >>> > > > > for this is that I want to support our JNI libraries only in >>> 64-bit >>> > > > > mode, and deprecate the 32-bit JNI libraries). >>> > > > > >>> > > > > Does this make any sense to you? >>> > > > > >>> > > > > Dan >>> > > > > >>> > > > > On Mon, May 5, 2008 at 12:20 PM, Coleen Phillimore - Sun >>> Microsystems >>> > > > > > >>> wrote: >>> > > > > >>> > > > > >>> > > > > >>> > > > > > Actually, we are using the gap for a field and array length in >>> the code >>> > > > > > now, but the code Vladimir showed me makes the allocation code >>> a lot >>> > > > > > >>> > > > > >>> > > > cleaner >>> > > > >>> > > > > >>> > > > > > for the instance field case. >>> > > > > > >>> > > > > > In the array case in 64 bits, compressing the _klass pointer >>> into 32 >>> > > > > > >>> > > > > >>> > > > bits >>> > > > >>> > > > > >>> > > > > > allows us to move the _length field into the other 32 bits, >>> which >>> > > > > > >>> > > > > >>> > > > because of >>> > > > >>> > > > > >>> > > > > > alignment saves 64 bits. There was a 32 bit alignment gap after >>> the >>> > > > > > >>> > > > > >>> > > > _length >>> > > > >>> > > > > >>> > > > > > field, if not compressed with the klass pointer. >>> > > > > > >>> > > > > > The mark word can also contain a forwarding pointer used during >>> GC, so >>> > > > > > can't be 32 bits. >>> > > > > > >>> > > > > > The compression that we use allows for 32G because we shift >>> into the >>> > > > > > >>> > > > > >>> > > > least >>> > > > >>> > > > > >>> > > > > > significant bits - the algorithm is (ptr-heap_base)>>3. 
>>> > > > > > >>> > > > > > Coleen >>> > > > > > >>> > > > > > >>> > > > > > >>> > > > > > Vladimir Kozlov wrote: >>> > > > > > >>> > > > > > >>> > > > > > >>> > > > > > >>> > > > > > > Dan, >>> > > > > > > >>> > > > > > > Only the mark word is 64 bits. The klass pointer is 32-bits >>> but >>> > > > > > > in the current implementation the gap after klass is not >>> used. >>> > > > > > > >>> > > > > > > I am working on to use the gap for a field or array's length. >>> > > > > > > >>> > > > > > > The mark word may contain a 64-bits tread pointer (for Biased >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > Locking). >>> > > > >>> > > > > >>> > > > > > >>> > > > > > > Thanks, >>> > > > > > > Vladimir >>> > > > > > > >>> > > > > > > Dan Grove wrote: >>> > > > > > > >>> > > > > > > >>> > > > > > > >>> > > > > > > >>> > > > > > > > Hi- >>> > > > > > > > >>> > > > > > > > I talked some with the Nikolay Igotti about compressed oops >>> in >>> > > > > > > > OpenJDK7. He tells me that the mark word and class pointer >>> remain 64 >>> > > > > > > > bits when compressed oops are being used. It seems that >>> this leaves >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > a >>> > > > >>> > > > > >>> > > > > > >>> > > > > > > >>> > > > > > > > fair amount of the bloat in place when moving from 32->64 >>> bits. >>> > > > > > > > >>> > > > > > > > I'm interesting in deprecating 32-bit VM's at my employer >>> at some >>> > > > > > > > point. Doing this is going to require that 64-bit VM's have >>> as >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > little >>> > > > >>> > > > > >>> > > > > > >>> > > > > > > >>> > > > > > > > bloat as possible. 
Has there been any consideration of >>> making the >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > mark >>> > > > >>> > > > > >>> > > > > > >>> > > > > > > >>> > > > > > > > word and class pointer 32 bits in cases where the VM fits >>> within >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > 4GB? >>> > > > >>> > > > > >>> > > > > > >>> > > > > > > >>> > > > > > > > It seems like this would be a major win. A second benefit >>> here is >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > that >>> > > > >>> > > > > >>> > > > > > >>> > > > > > > >>> > > > > > > > the "add and shift" currently required on dereference of >>> compressed >>> > > > > > > > oops could be eliminated in cases where the VM fit inside >>> 4GB. >>> > > > > > > > >>> > > > > > > > Dan >>> > > > > > > > >>> > > > > > > > >>> > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > >>> > > > >>> > > >>> > >>> >>> From Vladimir.Kozlov at Sun.COM Wed May 14 20:53:07 2008 From: Vladimir.Kozlov at Sun.COM (Vladimir Kozlov) Date: Wed, 14 May 2008 20:53:07 -0700 Subject: Request for reviews (L): 6695810: null oop passed to encode_heap_oop_not_null Message-ID: <482BB3A3.4080102@sun.com> Note: diffs contains changes (in node.cpp and c2_globals.hpp) for 6701887 which I will push first. http://webrev.invokedynamic.info/kvn/6695810/index.html Fixed 6695810: null oop passed to encode_heap_oop_not_null These changes include fixes for problems found in Escape Analysis and Compressed Oops implementations. 
They also include additional optimizations for Compressed Oops:
- use the 32-bit gap after klass in an object for a narrow oop field,
- use the 32-bit gap after klass in an object for a boxing object's value (except Long and Double),
- use heapOopSize for instanceKlass::_nonstatic_field_size value instead of wordSize,
- add LoadNKlass and CMoveN nodes and use CmpN and ConN nodes and add corresponding platform-specific assembler instructions to generate narrow oop (32-bit) compares for oop type and oop NULL checks.
Reviewed by:
Fix verified (y/n): y, failed tests and generated code
Other testing: JPRT, CTW, nsk tests, refworkload
From Keith.McGuigan at Sun.COM Wed May 21 08:02:58 2008 From: Keith.McGuigan at Sun.COM (Keith McGuigan) Date: Wed, 21 May 2008 11:02:58 -0400 Subject: setting of globals values dependent on release Message-ID: <483439A2.3060308@sun.com> Hi, I thought we had some code in the VM which will set some global variable values to different values depending on which release we're running in (jdk7 vs. jdk6). Can someone point me to that code (or just give me the name of a variable which uses this)? I can't seem to find it on my own... Thanks. -- - Keith From Paul.Hohensee at Sun.COM Wed May 21 08:06:10 2008 From: Paul.Hohensee at Sun.COM (Paul Hohensee) Date: Wed, 21 May 2008 11:06:10 -0400 Subject: setting of globals values dependent on release In-Reply-To: <483439A2.3060308@sun.com> References: <483439A2.3060308@sun.com> Message-ID: <48343A62.2060702@sun.com> I don't think we ever implemented it. Xiaobin's the RE, I think (can't remember the bugid). So far, there's no difference between jdk6 and jdk7 as far as the vm's concerned. Have you found something? Paul Keith McGuigan wrote: > Hi, > > I thought we have some code in the VM which will set some global > variable values to different values depending on which release we're > running in (jdk7 vs. jdk6). Can someone point me to that code (or > just give me the name of a variable which uses this)?
I can't seem to > find it on my own... > > Thanks. > > -- > - Keith From Keith.McGuigan at Sun.COM Wed May 21 08:11:05 2008 From: Keith.McGuigan at Sun.COM (Keith McGuigan) Date: Wed, 21 May 2008 11:11:05 -0400 Subject: setting of globals values dependent on release In-Reply-To: <48343A62.2060702@sun.com> References: <483439A2.3060308@sun.com> <48343A62.2060702@sun.com> Message-ID: <48343B89.4020907@sun.com> Paul Hohensee wrote: > I don't think we ever implemented it. Xiaobin's the RE, I think (can't > remember the bugid). So far, there's no difference between jdk6 and > jdk7 as far as the vm's concerned. Have you found something? Oh ok... I remember seeing some discussion about it, and I thought it was in. I guess not. Yes, we need to allow/disallow classfile version 51 for jdk7/jdk6 respectively. -- - Keith From Paul.Hohensee at Sun.COM Wed May 21 08:22:44 2008 From: Paul.Hohensee at Sun.COM (Paul Hohensee) Date: Wed, 21 May 2008 11:22:44 -0400 Subject: setting of globals values dependent on release In-Reply-To: <48343B89.4020907@sun.com> References: <483439A2.3060308@sun.com> <48343A62.2060702@sun.com> <48343B89.4020907@sun.com> Message-ID: <48343E44.7090001@sun.com> If you want to take over from Xiaobin, feel free. He's committed to core library work for the next few months. Paul Keith McGuigan wrote: > Paul Hohensee wrote: >> I don't think we ever implemented it. Xiaobin's the RE, I think (can't >> remember the bugid). So far, there's no difference between jdk6 and >> jdk7 as far as the vm's concerned. Have you found something? > > Oh ok... I remember seeing some discussion about it, and I thought it > was in. I guess not. > > Yes, we need to allow/disallow classfile version 51 for jdk7/jdk6 > respectively. 
> > -- > - Keith From Keith.McGuigan at Sun.COM Wed May 21 08:50:38 2008 From: Keith.McGuigan at Sun.COM (Keith McGuigan) Date: Wed, 21 May 2008 11:50:38 -0400 Subject: setting of globals values dependent on release In-Reply-To: <48343A62.2060702@sun.com> References: <483439A2.3060308@sun.com> <48343A62.2060702@sun.com> Message-ID: <483444CE.3040001@sun.com> Ok, at least I found where we keep the JDK version, in the (surprise) JDK_Version class in java.hpp. That has all the info I'll need and I'll use that. So I'm all set now, thanks! -- - Keith From volker.simonis at gmail.com Wed May 21 10:01:08 2008 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 21 May 2008 19:01:08 +0200 Subject: setting of globals values dependent on release In-Reply-To: <483444CE.3040001@sun.com> References: <483439A2.3060308@sun.com> <48343A62.2060702@sun.com> <483444CE.3040001@sun.com> Message-ID: But as far as I can see, the highest supported class file version in the VM is defined as a macro in classFileParser.cpp: #define JAVA_MAX_SUPPORTED_VERSION 50 So the JDK_Version class will not help in this case. Or do you plan to replace the macro by a global variable? The question about the highest class file version a VM is allowed to support is interesting, because I only found the following information in the Java VM Specification (2nd ed.): "Only Sun can specify what range of (Class File Format) versions a Java virtual machine implementation conforming to a certain release level of the Java platform may support." However, I could not find a place where Sun has explicitly specified these versions (except in the implementation of HotSpot, of course, which is quite informal). Furthermore, this requirement doesn't seem to be enforced by the TCK, because I recently successfully ran the JCK tests for a 1.5 JDK with a 1.6 VM which supports class file versions up to 50, although Java 1.5 should only support class files up to version 49.
Regards, Volker On 5/21/08, Keith McGuigan wrote: > > Ok, at least I found where we keep the JDK version, in the (suprise) > JDK_Version class in java.hpp. That has all the info I'll need and I'll use > that. So I'm all set now, thanks! > > -- > - Keith > From stephen.bohne at sun.com Wed May 21 11:43:21 2008 From: stephen.bohne at sun.com (stephen.bohne at sun.com) Date: Wed, 21 May 2008 18:43:21 +0000 Subject: hg: jdk7/hotspot-rt/hotspot: 6 new changesets Message-ID: <20080521184332.EB3C8284B9@hg.openjdk.java.net> Changeset: b5489bb705c9 Author: ysr Date: 2008-05-06 15:37 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/b5489bb705c9 6662086: 6u4+, 7b11+: CMS never clears referents when -XX:+ParallelRefProcEnabled Summary: Construct the relevant CMSIsAliveClosure used by CMS during parallel reference processing with the correct span. It had incorrectly been constructed with an empty span, a regression introduced in 6417901. Reviewed-by: jcoomes ! src/share/vm/gc_implementation/concurrentMarkSweep/cmsOopClosures.hpp ! src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp ! src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.hpp Changeset: e3729351c946 Author: iveresov Date: 2008-05-09 16:34 +0400 URL: http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/e3729351c946 6697534: Premature GC and invalid lgrp selection with NUMA-aware allocator. Summary: Don't move tops of the chunks in ensure_parsibility(). Handle the situation with Solaris when a machine has a locality group with no memory. Reviewed-by: apetrusenko, jcoomes, ysr ! src/os/solaris/vm/os_solaris.cpp ! src/os/solaris/vm/os_solaris.hpp ! 
src/share/vm/gc_implementation/shared/mutableNUMASpace.cpp Changeset: f3de1255b035 Author: rasbold Date: 2008-05-07 08:06 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/f3de1255b035 6603011: RFE: Optimize long division Summary: Transform long division by constant into multiply Reviewed-by: never, kvn ! src/cpu/x86/vm/x86_64.ad ! src/share/vm/opto/classes.hpp ! src/share/vm/opto/divnode.cpp ! src/share/vm/opto/mulnode.cpp ! src/share/vm/opto/mulnode.hpp ! src/share/vm/opto/type.hpp ! src/share/vm/utilities/globalDefinitions.hpp Changeset: 7cce9e4e0f7c Author: rasbold Date: 2008-05-09 05:26 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/7cce9e4e0f7c Merge Changeset: 83c868b757c0 Author: jrose Date: 2008-05-14 00:41 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/83c868b757c0 6701024: SAJDI functionality is broken Summary: back out sa-related changes to 6652736, use concrete expressions for WKK names in the SA Reviewed-by: never, sundar ! agent/src/share/classes/sun/jvm/hotspot/memory/SystemDictionary.java ! src/share/vm/classfile/systemDictionary.hpp ! src/share/vm/runtime/vmStructs.cpp Changeset: 7a0a921a1a8c Author: rasbold Date: 2008-05-14 15:01 -0700 URL: http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/7a0a921a1a8c Merge From Keith.McGuigan at Sun.COM Thu May 22 06:16:32 2008 From: Keith.McGuigan at Sun.COM (Keith McGuigan) Date: Thu, 22 May 2008 09:16:32 -0400 Subject: request for review (XS) Message-ID: <48357230.8010807@sun.com> http://webrev.invokedynamic.info/kamg/6705523/ This fix has the classfile parser accept classfiles with version 51 only when the VM is run in the context of JDK7 (or greater). For JDK6, version 50 is the max accepted classfile version. The change in sharedRuntime_sparc.cpp is an unrelated change to eliminate a compile warning. Thanks for any review.
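The rule described in this review request can be sketched roughly as follows. This is only an illustration of the stated behaviour (51 accepted on JDK7 or greater, 50 as the maximum on JDK6); the helper names and constants are hypothetical and are not taken from the webrev, and minor versions and any lower bound are ignored.

```cpp
#include <cassert>

// Illustrative constants, taken from the numbers stated in the message.
constexpr int kMaxMajorJdk6 = 50;
constexpr int kMaxMajorJdk7 = 51;

// Hypothetical helper: highest class file major version accepted
// when the VM runs in the context of the given JDK major release.
inline int max_supported_major_version(int jdk_major) {
    assert(jdk_major > 0);
    return jdk_major >= 7 ? kMaxMajorJdk7 : kMaxMajorJdk6;
}

// Hypothetical helper: true if a class file with this major version
// would be accepted under the rule above.
inline bool is_supported_class_version(int class_major, int jdk_major) {
    return class_major <= max_supported_major_version(jdk_major);
}
```

The point of making the limit a function of the running JDK version, rather than a compile-time macro, is that the same VM binary can then enforce different limits in a JDK6 and a JDK7 context.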
-- - Keith From martinrb at google.com Thu May 22 07:29:03 2008 From: martinrb at google.com (Martin Buchholz) Date: Thu, 22 May 2008 07:29:03 -0700 Subject: request for review (XS) In-Reply-To: <48357230.8010807@sun.com> References: <48357230.8010807@sun.com> Message-ID: <1ccfd1c10805220729l54b6d940w5a849daf88b2cae1@mail.gmail.com> Please examine the test jdk/test/java/lang/System/Versions.java and make sure it passes on JDK6 and JDK7, and that all system properties are correctly maintained. This test was specifically written to catch mismatches between hotspot and jdk versions. (I've long argued that hotspot integrators should guarantee that they don't break tests in java.lang or java.util, but I don't believe that has happened yet) Martin On Thu, May 22, 2008 at 6:16 AM, Keith McGuigan wrote: > http://webrev.invokedynamic.info/kamg/6705523/ > > This fix has the classfiler parser accept classfiles with version 51 only > when the VM is run in the context of JDK7 (or greater). For JDK6, version > 50 is the max accepted classfile version. > > The change in sharedRuntime_sparc.cpp is an unrelated change to eliminate a > compile warning. > > Thanks for any review. > > -- > - Keith > From Coleen.Phillimore at Sun.COM Thu May 22 10:30:15 2008 From: Coleen.Phillimore at Sun.COM (Coleen Phillimore - Sun Microsystems) Date: Thu, 22 May 2008 13:30:15 -0400 Subject: Please review 6687581: Make CMS work with compressed oops Message-ID: <4835ADA7.7020406@sun.com> Please email your comments to me regarding this bug fix, described below: Fixed 6687581: Make CMS work with compressed oops Make FreeChunk read markword instead of LSB in _klass pointer to indicate that it's a FreeChunk for compressed oops. Moved the size field first so that it is used consistently for 64 bits w/out compressed oops, and 32 bits to reduce ifdefs. Also needed to change set_klass to not zero the gap since CMS uses this to copy objects. Decoupled set_klass and set_klass_gap in the case of compressed oops. 
Changed FreeChunk and associated code in SA. Performance results looked very good (not provided). Thanks to jmasa and ramki for their analysis, discussion of the change and prereview. Webrev: http://webrev.invokedynamic.info/coleenp/6687581/ Fix verified: y Testing: nsk stress, sajdi and my sanity testlists with CMS+compressedoops. runthese refworkload/GCOld Reviewed-by: From Vladimir.Kozlov at Sun.COM Thu May 22 11:29:46 2008 From: Vladimir.Kozlov at Sun.COM (Vladimir Kozlov) Date: Thu, 22 May 2008 11:29:46 -0700 Subject: Please review 6687581: Make CMS work with compressed oops In-Reply-To: <4835ADA7.7020406@sun.com> References: <4835ADA7.7020406@sun.com> Message-ID: <4835BB9A.3040007@sun.com> Coleen, I looked at all changes except GC changes. In assembler_sparc.cpp and assembler_x86_64.cpp the method store_klass() is called from the tlab_refill() method. You may need to call store_klass_gap() there to zero the gap. The other changes look good. Thanks, Vladimir Coleen Phillimore - Sun Microsystems wrote: > > Please email your comments to me regarding this bug fix, described below: > > Fixed 6687581: Make CMS work with compressed oops > > Make FreeChunk read markword instead of LSB in _klass pointer to > indicate that it's a FreeChunk for compressed oops. Moved the size > field first so that it is used consistently for 64 bits w/out compressed > oops, and 32 bits to reduce ifdefs. > > Also needed to change set_klass to not zero the gap since CMS uses this > to copy objects. Decoupled set_klass and set_klass_gap in the case of > compressed oops. > > Changed FreeChunk and associated code in SA. > > Performance results looked very good(not provided). Thanks to jmasa and > ramki for their analysis, discussion of the change and prereview. > > Webrev: http://webrev.invokedynamic.info/coleenp/6687581/ > Fix verified: y > Testing: nsk stress, sajdi and my sanity testlists with > CMS+compressedoops.
> runthese > refworkload/GCOld > Reviewed-by: From Vladimir.Kozlov at Sun.COM Thu May 22 11:34:48 2008 From: Vladimir.Kozlov at Sun.COM (Vladimir Kozlov) Date: Thu, 22 May 2008 11:34:48 -0700 Subject: Please review 6687581: Make CMS work with compressed oops In-Reply-To: <4835BB9A.3040007@sun.com> References: <4835ADA7.7020406@sun.com> <4835BB9A.3040007@sun.com> Message-ID: <4835BCC8.2060105@sun.com> Never mind. It is the array allocation and the length is stored into the gap. So it was the bug on x64 since the length is stored before the klass and store_klass() will overwrite it with 0. Thanks, Vladimir Vladimir Kozlov wrote: > Coleen, > > I looked on all changes except GC changes. > In assembler_sparc.cpp and assembler_x86_64.cpp > the method store_klass() is called from tlab_refill() method. > You may need to call store_klass_gap() there to zero the gap. > > Other changes looks good. > > Thanks, > Vladimir > > Coleen Phillimore - Sun Microsystems wrote: >> >> Please email your comments to me regarding this bug fix, described below: >> >> Fixed 6687581: Make CMS work with compressed oops >> >> Make FreeChunk read markword instead of LSB in _klass pointer to >> indicate that it's a FreeChunk for compressed oops. Moved the size >> field first so that it is used consistently for 64 bits w/out >> compressed oops, and 32 bits to reduce ifdefs. >> >> Also needed to change set_klass to not zero the gap since CMS uses >> this to copy objects. Decoupled set_klass and set_klass_gap in the >> case of compressed oops. >> >> Changed FreeChunk and associated code in SA. >> >> Performance results looked very good(not provided). Thanks to jmasa >> and ramki for their analysis, discussion of the change and prereview. >> >> Webrev: http://webrev.invokedynamic.info/coleenp/6687581/ >> Fix verified: y >> Testing: nsk stress, sajdi and my sanity testlists with >> CMS+compressedoops. 
>> runthese >> refworkload/GCOld >> Reviewed-by: From Coleen.Phillimore at Sun.COM Thu May 22 11:28:12 2008 From: Coleen.Phillimore at Sun.COM (Coleen Phillimore - Sun Microsystems) Date: Thu, 22 May 2008 14:28:12 -0400 Subject: Please review 6687581: Make CMS work with compressed oops In-Reply-To: <4835BB9A.3040007@sun.com> References: <4835ADA7.7020406@sun.com> <4835BB9A.3040007@sun.com> Message-ID: <4835BB3C.4010004@sun.com> Thanks Vladimir. The tlab_refill method in the assemblers are creating a typeArrayOop, so are storing the length into the klass gap (ie there's only a klass gap for instanceOops which would be useful to assert for). I think it was wrong before and we were writing over the length on one of the platforms. Coleen Vladimir Kozlov wrote: > Coleen, > > I looked on all changes except GC changes. > In assembler_sparc.cpp and assembler_x86_64.cpp > the method store_klass() is called from tlab_refill() method. > You may need to call store_klass_gap() there to zero the gap. > > Other changes looks good. > > Thanks, > Vladimir > > Coleen Phillimore - Sun Microsystems wrote: >> >> Please email your comments to me regarding this bug fix, described >> below: >> >> Fixed 6687581: Make CMS work with compressed oops >> >> Make FreeChunk read markword instead of LSB in _klass pointer to >> indicate that it's a FreeChunk for compressed oops. Moved the size >> field first so that it is used consistently for 64 bits w/out >> compressed oops, and 32 bits to reduce ifdefs. >> >> Also needed to change set_klass to not zero the gap since CMS uses >> this to copy objects. Decoupled set_klass and set_klass_gap in the >> case of compressed oops. >> >> Changed FreeChunk and associated code in SA. >> >> Performance results looked very good(not provided). Thanks to jmasa >> and ramki for their analysis, discussion of the change and prereview. 
>>
>> Webrev: http://webrev.invokedynamic.info/coleenp/6687581/
>> Fix verified: y
>> Testing: nsk stress, sajdi and my sanity testlists with
>> CMS+compressedoops.
>> runthese
>> refworkload/GCOld
>> Reviewed-by:

From Vladimir.Kozlov at Sun.COM Thu May 22 11:45:40 2008
From: Vladimir.Kozlov at Sun.COM (Vladimir Kozlov)
Date: Thu, 22 May 2008 11:45:40 -0700
Subject: Please review 6687581: Make CMS work with compressed oops
In-Reply-To: <4835BCC8.2060105@sun.com>
References: <4835ADA7.7020406@sun.com> <4835BB9A.3040007@sun.com> <4835BCC8.2060105@sun.com>
Message-ID: <4835BF54.90502@sun.com>

Vladimir Kozlov wrote:
> Never mind. It is the array allocation and the length is stored
> into the gap.
>
> So it was the bug on x64 since the length is stored before the klass
> and store_klass() will overwrite it with 0.

It may be the cause of 6696264: assert("narrow oop can never be zero").
If GC sees a 0 length, it can assume that the next object starts right
after this array header, and as a result it can get a 0 klass.

Thanks,
Vladimir

>
> Thanks,
> Vladimir
>
> Vladimir Kozlov wrote:
>> Coleen,
>>
>> I looked at all changes except GC changes.
>> In assembler_sparc.cpp and assembler_x86_64.cpp
>> the method store_klass() is called from the tlab_refill() method.
>> You may need to call store_klass_gap() there to zero the gap.
>>
>> Other changes look good.
>>
>> Thanks,
>> Vladimir
>>
>> Coleen Phillimore - Sun Microsystems wrote:
>>>
>>> Please email your comments to me regarding this bug fix, described
>>> below:
>>>
>>> Fixed 6687581: Make CMS work with compressed oops
>>>
>>> Make FreeChunk read markword instead of LSB in _klass pointer to
>>> indicate that it's a FreeChunk for compressed oops. Moved the size
>>> field first so that it is used consistently for 64 bits w/out
>>> compressed oops, and 32 bits to reduce ifdefs.
>>>
>>> Also needed to change set_klass to not zero the gap since CMS uses
>>> this to copy objects.
>>> Decoupled set_klass and set_klass_gap in the
>>> case of compressed oops.
>>>
>>> Changed FreeChunk and associated code in SA.
>>>
>>> Performance results looked very good(not provided). Thanks to jmasa
>>> and ramki for their analysis, discussion of the change and prereview.
>>>
>>> Webrev: http://webrev.invokedynamic.info/coleenp/6687581/
>>> Fix verified: y
>>> Testing: nsk stress, sajdi and my sanity testlists with
>>> CMS+compressedoops.
>>> runthese
>>> refworkload/GCOld
>>> Reviewed-by:

From keith.mcguigan at sun.com Fri May 23 06:12:30 2008
From: keith.mcguigan at sun.com (keith.mcguigan at sun.com)
Date: Fri, 23 May 2008 13:12:30 +0000
Subject: hg: jdk7/hotspot-rt/hotspot: 6705523: Fix for 6695506 will violate spec when used in JDK6
Message-ID: <20080523131235.6AF14285E5@hg.openjdk.java.net>

Changeset: 6b648fefb395
Author: kamg
Date: 2008-05-22 13:03 -0400
URL: http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/6b648fefb395

6705523: Fix for 6695506 will violate spec when used in JDK6
Summary: Make max classfile version number dependent on JDK version
Reviewed-by: acorn, never

! src/cpu/sparc/vm/sharedRuntime_sparc.cpp
! src/share/vm/classfile/classFileParser.cpp
! src/share/vm/runtime/java.hpp

From volker.simonis at gmail.com Tue May 27 07:22:55 2008
From: volker.simonis at gmail.com (Volker Simonis)
Date: Tue, 27 May 2008 16:22:55 +0200
Subject: request for review (XS)
In-Reply-To: <1ccfd1c10805220729l54b6d940w5a849daf88b2cae1@mail.gmail.com>
References: <48357230.8010807@sun.com> <1ccfd1c10805220729l54b6d940w5a849daf88b2cae1@mail.gmail.com>
Message-ID:

That's indeed a nice test and the only question that arises is why it
isn't part of the JCK-Testsuite?

Regards,
Volker

On 5/22/08, Martin Buchholz wrote:
> Please examine the test
> jdk/test/java/lang/System/Versions.java
> and make sure it passes on JDK6 and JDK7,
> and that all system properties are correctly maintained.
> This test was specifically written to catch mismatches
> between hotspot and jdk versions.
>
> (I've long argued that hotspot integrators should guarantee
> that they don't break tests in java.lang or java.util, but I don't
> believe that has happened yet)
>
>
> Martin
>
>
>
> On Thu, May 22, 2008 at 6:16 AM, Keith McGuigan wrote:
> > http://webrev.invokedynamic.info/kamg/6705523/
> >
> > This fix has the classfile parser accept classfiles with version 51 only
> > when the VM is run in the context of JDK7 (or greater). For JDK6, version
> > 50 is the max accepted classfile version.
> >
> > The change in sharedRuntime_sparc.cpp is an unrelated change to eliminate a
> > compile warning.
> >
> > Thanks for any review.
> >
> > --
> > - Keith
> >
>

From martinrb at google.com Wed May 28 13:35:32 2008
From: martinrb at google.com (Martin Buchholz)
Date: Wed, 28 May 2008 13:35:32 -0700
Subject: request for review (XS)
In-Reply-To:
References: <48357230.8010807@sun.com> <1ccfd1c10805220729l54b6d940w5a849daf88b2cae1@mail.gmail.com>
Message-ID: <1ccfd1c10805281335o4db50ebbk36b943956671dfcc@mail.gmail.com>

On Tue, May 27, 2008 at 7:22 AM, Volker Simonis wrote:
> That's indeed a nice test and the only question that arises is why it
> isn't part of the JCK-Testsuite?

Mostly for social/organizational reasons, many tests that logically
belong in the JCK are only to be found in the regression/unit test suite.

Partly it's by intent that JCK tests should not be written by engineers
writing the implementation; they are too liable to write white-box tests
due to excessive intimacy with the code. But excessive intimacy is not
always a bad thing...

Martin
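
For readers following the 6705523 thread above, the version gate Keith
describes (accept classfile version 51 only under JDK7 or greater, and cap
at 50 under JDK6) can be sketched roughly as below. This is an illustration
in Python, not the code in classFileParser.cpp: the function names are
hypothetical, and the sketch assumes only the standard class file header
layout (4-byte magic 0xCAFEBABE, then 2-byte minor and 2-byte major version,
big-endian). It ignores the minimum supported version and any minor-version
rules.

```python
import struct

CLASSFILE_MAGIC = 0xCAFEBABE
JAVA_6_MAJOR = 50  # highest major version accepted in a JDK6 context
JAVA_7_MAJOR = 51  # highest major version accepted in a JDK7+ context


def max_supported_major(jdk_major):
    """Hypothetical helper: the cap depends on the JDK the VM runs in."""
    return JAVA_7_MAJOR if jdk_major >= 7 else JAVA_6_MAJOR


def read_major_version(classfile_bytes):
    """Read the major version from the first 8 bytes of a class file."""
    if len(classfile_bytes) < 8:
        raise ValueError("truncated class file header")
    # Header fields are big-endian: u4 magic, u2 minor, u2 major.
    magic, _minor, major = struct.unpack(">IHH", classfile_bytes[:8])
    if magic != CLASSFILE_MAGIC:
        raise ValueError("bad magic number")
    return major


def is_accepted(classfile_bytes, jdk_major):
    """True if the class file's major version is within the cap."""
    return read_major_version(classfile_bytes) <= max_supported_major(jdk_major)
```

With a header built as struct.pack(">IHH", 0xCAFEBABE, 0, 51), the call
is_accepted(header, 7) passes and is_accepted(header, 6) fails, which is
the JDK6/JDK7 split the fix is meant to enforce.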