From Jerry.Waldorf at Sun.COM  Thu Sep 27 23:25:42 2007
From: Jerry.Waldorf at Sun.COM (Jerry Waldorf)
Date: Thu, 27 Sep 2007 23:25:42 -0700
Subject: GC and HeapSize questions
Message-ID: <46FC9E66.9090804@sun.com>

I have been working with the BPEL engine team in our CAPS product group. The issue they are trying to address is the memory used by business process instance variables when a large number of process instances run concurrently. For example, a single process may have 100,000 instances, each holding 100,000 bytes of data. That is 10,000,000,000 bytes (about 10 GB) of stuff.

In a regular Unix (or Windows) process written in C, if you held all of this data in memory and let the operating system page out the "old" stuff, a really large process should not be a problem: just keep it all in memory. The operating system, using LRU, can probably do at least as good a job as the programmer of figuring out what is old and what is new. In fact, it can probably do better, because it does the paging at a much lower, more efficient level in the kernel.

With Java we have the benefit of the garbage collector, but the GC incurs some overhead when the heap is very large and close to fully allocated. The question is how large that overhead is: would it be worth the extra effort of coding some caching into a Java application, or would it be better to just allocate a really large heap and let Java and the operating system manage the paging? My guess is that it would be hard for the developer to beat the OS and the Java GC, so it would be better to use a large heap and let the Java GC take care of it, especially now that we have all of this cool generational stuff in the GC.

The program below is a very primitive test that tries to measure the overhead that large heaps add to the GC. On a Windows laptop with a 1.5 GB heap, it appeared to add around 30% overhead to the GC.
Does this sound right? Are there things that can be done to tune the GC to make it behave better in these cases? And is there any work being done to handle very large memory-based Java applications?

/*
 * Main.java
 *
 * Created on Sep 27, 2007, 9:37:09 PM
 *
 * To change this template, choose Tools | Templates
 * and open the template in the editor.
 */

package javaapplication5;

import java.util.ArrayList;

/**
 *
 * @author jwaldorf
 */
public class Main {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        for (int foo = 0; foo < 10; foo++) {
            int i = 0;
            ArrayList<String> l = new ArrayList<String>();
            long count;

            // Phase 1: allocate short-lived Strings while the heap is almost empty.
            // Note: s1 and f below are never used, so a JIT may optimize
            // parts of these loops away.
            long lstart = System.currentTimeMillis();
            for (count = 0; count < 100000000; count++) {
                String s1 = new String("12345678901234567890123456789012345678901234567890");
            }
            long lend = System.currentTimeMillis();

            System.out.println("Low Mem total time = " + (lend - lstart));

            // Phase 2: CPU-only baseline, no allocation.
            lstart = System.currentTimeMillis();
            for (count = 0; count < 100000000; count++) {
                double f = Math.cos(Math.sin(Math.PI) * 234.23432);
            }
            lend = System.currentTimeMillis();
            System.out.println("Low mem total time non-mem = " + (lend - lstart));

            // Phase 3: fill the heap with live Strings until OutOfMemoryError,
            // then free only a few entries so the heap stays nearly full.
            try {
                // for (int z = 0; z < 3392000; z++) {
                while (true) {
                    i++;
                    String s = new String("foobar");
                    l.add(s);
                }
            } catch (Throwable t) {
                l.remove(1000);
                l.remove(1001);
                for (int c = 0; c < 100; c++) {
                    l.remove(c);
                }
                System.out.println("Iterations = " + i);
                t.printStackTrace();
            }

            // Phase 4: repeat both measurements with the heap nearly full.
            lstart = System.currentTimeMillis();
            for (count = 0; count < 100000000; count++) {
                String s1 = new String("12345678901234567890123456789012345678901234567890");
            }
            lend = System.currentTimeMillis();
            System.out.println("Full mem total time mem = " + (lend - lstart));

            lstart = System.currentTimeMillis();
            for (count = 0; count < 100000000; count++) {
                double f = Math.cos(Math.sin(Math.PI) * 234.23432);
            }
            lend = System.currentTimeMillis();
            System.out.println("Full mem total time non-mem = " + (lend - lstart));
        }
    }
}

-- 
Jerry Waldorf
Chief Architect
Software Infrastructure
Sun Microsystems
jerry.waldorf at sun.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: jerry.waldorf.vcf
Type: text/x-vcard
Size: 216 bytes
Desc: not available
Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/attachments/20070927/b3b490c8/attachment.vcf

From linuxhippy at gmail.com  Fri Sep 28 05:16:43 2007
From: linuxhippy at gmail.com (Clemens Eisserer)
Date: Fri, 28 Sep 2007 14:16:43 +0200
Subject: GC and HeapSize questions
In-Reply-To: <46FC9E66.9090804@sun.com>
References: <46FC9E66.9090804@sun.com>
Message-ID: <194f62550709280516v6c0db1beh84ffbc8865729873@mail.gmail.com>

Hi Jerry,

> With java we have the benefit of the garbage collector. And there is
> some overhead that the GC has when you have a very large heap that is
> close to fully allocated. The question is how much is this overhead and
> would it be worth the extra effort of coding some caching into your java
> application. Or would it be better to just allocate a really large heap
> and let java and the operating system manage the paging for you. My
> guess is that it would be hard for the developer to beat the OS and Java
> GC so it would be better to use a large amount of heap and let java gc
> take care of it for you, especially now that we have all of this cool
> generational stuff in the GC.

Well, the "overhead" a GC causes is really hard to classify, because you would also see a lot of it in a C program using malloc/free, and in some areas a GC can even improve performance (e.g. better cache locality). Furthermore, the amount of overhead always depends on the running application and its memory requirements.

I don't understand what you mean by "caching" :-/

However, there are problems with large heaps that are swapped out. A full GC accesses a lot of memory, much of which is usually paged out, which means long pauses while waiting for the data to come back from disk. Running the concurrent mark-sweep (CMS) collector may help with this.
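Wall-clock loops like the ones above only hint at how much time the collector itself consumed. As a rough sketch (assuming Java 8 or later; the class and method names are hypothetical), the standard `java.lang.management` API reports per-collector counts and accumulated collection times, which also shows whether the JVM is really running the collectors you expect, such as the CMS collector mentioned above. `getCollectionTime()` is an approximate elapsed time, and for concurrent collectors it overlaps with application work, so the ratio below is only an indication; `-verbose:gc` gives similar information per collection.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: compares reported GC time with wall-clock time for a piece of work. */
final class GcOverhead {

    /** Sum of accumulated collection time (ms) over all collectors; -1 ("undefined") is skipped. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Runs the work and returns reported GC time divided by elapsed wall-clock time. */
    static double gcFraction(Runnable work) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        work.run();
        long wallMillis = (System.nanoTime() - start) / 1_000_000L;
        long gcMillis = totalGcMillis() - gcBefore;
        return wallMillis == 0 ? 0.0 : (double) gcMillis / wallMillis;
    }

    public static void main(String[] args) {
        double fraction = gcFraction(() -> {
            // Illustrative workload only: many short-lived allocations.
            long checksum = 0;
            for (int i = 0; i < 5_000_000; i++) {
                byte[] b = new byte[64];
                b[0] = (byte) i;
                checksum += b[0];
            }
            System.out.println("checksum = " + checksum);
        });
        // Collector names depend on the JVM and its flags; with -XX:+UseConcMarkSweepGC,
        // for example, the old-generation bean is named "ConcurrentMarkSweep".
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.printf("GC time / wall time = %.3f%n", fraction);
    }
}
```

Running the same workload once with a nearly empty heap and once with a large live set retained gives a more direct answer to "how much is this overhead" than total elapsed time alone.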
> The below is a very primitive test program that tries to measure the
> overhead that large heaps add to the GC. On a windows laptop with a 1.5
> gig heap it appeared to add around 30% overhead to the GC. Does this
> sound right? Are there things that can be done to tune the GC to make
> it behave better in these cases? And is there any work being done to
> handle very large memory based java applications?

Sorry, but your benchmark is seriously flawed. Of course, if all your program does is allocate objects and add them to lists, you will see a lot of GC overhead, because allocation is the only thing it does. If you stress the GC, and only the GC, a lot of time will be spent in it ;)

So, after all, what problem is the BPEL team actually experiencing? Long pauses during full GCs, slow allocation, or paging? What does running with GC logging turned on show?

Good luck, and best regards,
Clemens

From Y.S.Ramakrishna at Sun.COM  Fri Sep 28 08:57:53 2007
From: Y.S.Ramakrishna at Sun.COM (Y.S.Ramakrishna at Sun.COM)
Date: Fri, 28 Sep 2007 08:57:53 -0700
Subject: GC and HeapSize questions
In-Reply-To: <46FC9E66.9090804@sun.com>
References: <46FC9E66.9090804@sun.com>
Message-ID: <46FD2481.3040008@Sun.COM>

Hi Jerry --

With all extant GCs in HotSpot, if your heap is being paged out, bad things happen whenever your JVM does a GC of the whole heap, and your performance will tank. There is work out there on letting GC deal efficiently with heaps that do not fit in main memory, but there is no current plan to include such technology in HotSpot (as far as I know).

Having said that, as long as your heap is never paged out, we do have a parallel-concurrent collector that will collect large heaps fairly efficiently with low pause times. I have heard of users with large in-memory databases that run with heaps of several tens of GB. It might require a bit of tuning of the garbage collector to get optimal performance.
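The caveat that the heap must never be paged out suggests a simple startup sanity check. The sketch below is an illustration under stated assumptions, not a HotSpot feature: it compares the JVM's maximum heap (`Runtime.maxMemory()`, roughly the `-Xmx` value) with physical RAM as reported by the HotSpot-specific `com.sun.management.OperatingSystemMXBean`. The class name and the 512 MB headroom are arbitrary choices. Other processes and the JVM's own non-heap memory also compete for RAM, so passing the check does not guarantee the heap stays resident.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical startup check: does the maximum heap fit in physical RAM with some headroom? */
final class HeapFitCheck {

    /** Pure decision: false if physical RAM is unknown (<= 0) or the heap is unbounded. */
    static boolean heapFitsInRam(long maxHeapBytes, long physicalBytes, long headroomBytes) {
        if (physicalBytes <= 0 || maxHeapBytes == Long.MAX_VALUE) {
            return false; // cannot confirm that the heap fits
        }
        return maxHeapBytes + headroomBytes <= physicalBytes;
    }

    /** Physical RAM via the HotSpot-specific com.sun.management extension, or -1 if unavailable. */
    @SuppressWarnings("deprecation") // getTotalPhysicalMemorySize() is deprecated in recent JDKs
    static long physicalMemoryBytes() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getTotalPhysicalMemorySize();
        }
        return -1;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory(); // roughly the -Xmx setting
        long physical = physicalMemoryBytes();
        long headroom = 512L * 1024 * 1024;              // assumed room for OS, other processes, non-heap JVM memory
        System.out.println("max heap: " + (maxHeap >> 20) + " MB, physical RAM: " + (physical >> 20) + " MB");
        if (heapFitsInRam(maxHeap, physical, headroom)) {
            System.out.println("Heap fits in RAM with the chosen headroom.");
        } else {
            System.out.println("Heap may not fit in RAM: a full GC could touch paged-out memory.");
        }
    }
}
```

Keeping the decision in a pure method separates the policy (how much headroom is enough) from the platform-specific query, so the threshold can be tuned per deployment.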
You might want to read the documentation available at:

http://java.sun.com/javase/technologies/hotspot/gc/index.jsp

best.
-- ramki.

From denka.b at gmail.com  Fri Sep 28 09:09:27 2007
From: denka.b at gmail.com (Denis Baranov)
Date: Fri, 28 Sep 2007 09:09:27 -0700
Subject: GC and HeapSize questions
In-Reply-To: <46FC9E66.9090804@sun.com>
References: <46FC9E66.9090804@sun.com>
Message-ID:

One option is off-heap caching, via a commercial or open-source product, which lets you keep the heap small and therefore keep full GC times short. I haven't used it myself, and haven't read about anyone using it for this purpose, but it makes sense. The price to pay is the extra CPU burned (de)serializing data, but with quad-cores, even modest machines have that extra CPU power to burn. If non-stop performance is that important...

Denis.
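Denis's off-heap suggestion can be sketched with nothing but the JDK, assuming that keeping serialized copies in direct memory is acceptable. The class below is hypothetical and does not model any particular commercial or open-source cache: `ByteBuffer.allocateDirect` reserves memory outside the Java heap, each value is serialized with standard Java serialization on `put` and deserialized on `get`, and only a small index of offsets stays on the heap for the collector to trace. It makes the CPU trade-off Denis mentions visible (every access pays for (de)serialization) but omits what a real product needs: eviction, compaction, thread safety, and spanning more than one buffer, which is limited to 2 GB.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical append-only off-heap store. Values live serialized in a direct
 * ByteBuffer; only the (offset, length) index lives on the Java heap.
 * Not thread-safe; overwriting a key leaks its old bytes (no compaction).
 */
final class OffHeapStore<K> {
    private final ByteBuffer slab;                        // memory outside the Java heap
    private final Map<K, int[]> index = new HashMap<>();  // key -> {offset, length}

    OffHeapStore(int capacityBytes) {
        this.slab = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** Serializes the value (CPU cost paid here) and appends it to the slab. */
    void put(K key, Serializable value) {
        byte[] data = serialize(value);
        if (data.length > slab.remaining()) {
            throw new IllegalStateException("off-heap slab is full");
        }
        int offset = slab.position();
        slab.put(data);
        index.put(key, new int[] {offset, data.length});
    }

    /** Copies the bytes back onto the heap and deserializes them; null if the key is absent. */
    Object get(K key) {
        int[] loc = index.get(key);
        if (loc == null) {
            return null;
        }
        byte[] data = new byte[loc[1]];
        ByteBuffer view = slab.duplicate(); // shares the memory, has its own position
        view.position(loc[0]);
        view.get(data);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    private static byte[] serialize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        OffHeapStore<String> store = new OffHeapStore<>(1 << 20); // 1 MB slab, arbitrary size
        store.put("instance-42", "serialized process variables");
        System.out.println(store.get("instance-42")); // prints: serialized process variables
        System.out.println(store.get("instance-7"));  // prints: null
    }
}
```

Because the heap holds only the index, a full GC has far less to trace than if every process instance's variables were live Java objects; the cost moves to serialization on each access instead.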