Profiling the hotspot (further questions)
Dang Nhan Nguyen
nhann at chalmers.se
Wed Nov 16 15:07:39 UTC 2011
Thanks Charlie and Jon,
The guide helped me a lot, and I can now do the profiling with Collector/Analyzer.
While doing so, I ran into a couple of questions, which I describe below.
I used the Collector/Analyzer tools from Oracle Solaris Studio Performance Analyzer to track the performance of CMS.
Then I used the Analyzer GUI to analyze the recorded result. In the GUI, I used the Functions tab and filtered out all libraries except the pthread and JVM libraries. Then I looked at the execution time of the methods of the CMS-related classes (CMSCollector, CMSConcMarkingTask, ...).
The result is ALMOST enough: it shows the time spent in the CMS phases, namely the concurrent marking phase (the time during which CMSConcMarkingTask.work() was executing), the remark phase (CMSParRemarkTask.work()) and the sweep phase (CMSCollector.sweep()).
But then I encountered a couple of issues:
1. The initial mark phase was not recorded. In all my measurements, I found no record for CMSCollector::checkpointRootsInitialWork(), which is the method that performs the initial marking phase. Why does the "collector" record information for certain methods (e.g. CMSCollector::collect_in_background(), sweep(), ...) but not for others?
How can I force the collector (of Collector/Analyzer) to record this method?
2. For the concurrent marking phase, is the inclusive CPU time (in Performance Analyzer) of CMSConcMarkingTask.work() a sufficient measure of the whole phase? I ask because of the following observations (please correct me if I'm wrong):
- Concurrent marking uses GangWorker threads to perform the sub-tasks generated by CMSConcMarkingTask.
- In turn, each GangWorker calls CMSConcMarkingTask.work(i) to perform its sub-task.
- CMSConcMarkingTask.work(i) then calls CMSConcMarkingTask::do_scan_and_mark(), in which a Par_MarkFromRootsClosure object is created and iterates over the MarkBitMap of CMS.
My concern is this: if I take the inclusive CPU time of CMSConcMarkingTask.work() as the time spent in the CMS concurrent marking phase, does it include all the work done in that phase?
Or should I include any other work done in this phase? If so, what is it?
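To make the inclusive-time question concrete, here is a minimal sketch of how I understand inclusive versus exclusive time. It is only an illustration: the call tree and the millisecond figures are made up, the Frame class and helper functions are hypothetical names of my own, and none of it is Analyzer output or a faithful model of HotSpot.

```python
# Illustrative only: the call tree and timings below are made-up numbers,
# not Analyzer output and not a faithful model of HotSpot.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Frame:
    name: str
    exclusive_ms: float  # time spent in the function body itself
    callees: List["Frame"] = field(default_factory=list)


def inclusive_ms(frame: Frame) -> float:
    """Inclusive time = the frame's own time plus everything it calls."""
    return frame.exclusive_ms + sum(inclusive_ms(c) for c in frame.callees)


def find(frame: Frame, name: str) -> Optional[Frame]:
    """Depth-first search for the first frame with the given name."""
    if frame.name == name:
        return frame
    for callee in frame.callees:
        hit = find(callee, name)
        if hit is not None:
            return hit
    return None


# Hypothetical worker-thread stack, following the call chain described above.
worker = Frame("GangWorker", 2.0, [
    Frame("CMSConcMarkingTask::work", 1.0, [
        Frame("CMSConcMarkingTask::do_scan_and_mark", 3.0, [
            Frame("Par_MarkFromRootsClosure", 40.0),
        ]),
    ]),
])

work = find(worker, "CMSConcMarkingTask::work")
print(inclusive_ms(work))    # own time of work() plus all of its callees
print(inclusive_ms(worker))  # also counts time spent outside work()
```

In this toy tree the two printed values differ by exactly the time spent outside work(), which is the kind of gap I am asking about: anything a thread does for concurrent marking that is not below work() in the call stack would not show up in its inclusive time.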
> -----Original Message-----
> From: charlie hunt [mailto:charlie.hunt at oracle.com]
> Sent: Saturday, November 12, 2011 5:32 PM
> To: Dang Nhan Nguyen
> Cc: Jon Masamitsu; build-dev at openjdk.java.net; hotspot-gc-
> dev at openjdk.java.net
> Subject: Re: Profiling the hotspot,
> Hi Nhan,
> This is a bit long winded ... hopefully you'll find it useful though.
> If you are wanting to only view HotSpot method names in the profile and
> not also view HotSpot source code embedded in the profile and/or not
> profile with hardware/CPU counters, then you do not need to build
> anything special to see HotSpot method names with Oracle Solaris Studio
> Performance Analyzer, (aka collector analyzer).
> Btw, don't let "Solaris" embedded in the product name deter you.
> Although the official product name is "Oracle Solaris Studio", it is
> also available on Linux. Because the product was originally built for
> Solaris (both x86 and SPARC), it's more mature on Solaris than Linux,
> (the Linux port was done a couple years ago -- fwiw, I almost always
> profile on Solaris since it is more mature and I can get hardware
> counter info "out of the box"). The Linux version requires a (fairly)
> recent kernel, i.e. Red Hat EL 5, SuSE ES 11 or Oracle Enterprise Linux
> 5 or more recent versions. Other variants of Linux may work, i.e.
> Ubuntu (I've run it on recent Ubuntu releases).
> If you want to see hardware/CPU counters info in the profile, then see
> the section below "Hardware/CPU Counter Profiling"
> If you want to see HotSpot source code embedded in the profile, then
> see the section below "Embedded HotSpot Source Code".
> To limit the amount data collected in the profile, you can use
> -y <SIGNAL> option to toggle on and toggle off profile gathering.
> To help find what you are looking for in tracing GC thread(s) activity,
> once you look at the profile in Analyzer, I'd suggest to start by
> looking at the Timeline tab where you see a row for every thread in the
> JVM and Java application. Although each row doesn't list the thread
> name, (an enhancement I've asked for), you can click in a given thread
> row and look at the Call Stack panel to get a sense of which thread it
> is. The Timeline will also show when the thread was busy. You could
> use the Timeline and Call Stack panel to figure out what phase CMS is
> in. From there you can filter the profile data based on the time line
> info and for those CMS threads. That'll allow you to look specifically
> at the CMS threads and when they are running.
> There's an intro to how to use Performance Analyzer in the Java
> Performance book, but (unfortunately) it does not explicitly show your
> use case. It does, however, offer an intro of the basic concepts.
> There's also some really good documentation available at the Oracle
> Solaris Studio product web site:
> There are also some tutorials in the product distribution that can help
> get you familiar with the tool.
> Download page is:
> Hardware/CPU Counter Profiling
> If you want to see hardware/CPU counters info in the profile, then, on
> Linux, you have to patch the kernel. Fwiw, patching the kernel is not
> needed for Solaris (x86 or SPARC). If you want to see what hardware
> counters are available, a simple 'collect' command with no args will
> printout hardware counters that can be profiled. There's also a
> in the Java Performance book that describes how to profile with
> hardware/CPU counters.
> Embedded HotSpot Source Code
> If you want to see the HotSpot source code embedded in the profile,
> you need to do a custom build of HotSpot. There's a couple ways you
> can accomplish what you need and what I list here is not necessarily the best
> way, or the only way. Hopefully the instructions below are not out of
> date. These worked for me about 6 months ago. I'm sure there's others
> who can offer an easier approach or offer corrections, (I'm not a
> HotSpot build expert).
> This is what I've done on Linux:
> 1. After getting a copy of the HotSpot source code from the source code
> 2. Edit the <ws>/make/linux/makefiles/amd64.make file.
> 3. Append -g0 to the line, OPT_CFLAGS/compactingPermGenGen.o = -O1
> i.e. OPT_CFLAGS/compactingPermGenGen.o = -O1 -g0
> 4. Edit the <ws>/make/linux/makefiles/gcc.make file.
> 5. Append -g0 to the line, OPT_CFLAGS += -O3 i.e. OPT_CFLAGS += -O3 -g0
> 6. Append -gdwarf-2 to the line, OPT_CFLAGS += -fno-strict-aliasing
> i.e. OPT_CFLAGS += -fno-strict-aliasing -gdwarf-2
> 7. Append -g0 to the line, OPT_CFLAGS/NOOPT=-O0 i.e.
> OPT_CFLAGS/NOOPT=-O0 -g0
> 8. Append -g0 to the line, OPT_CFLAGS/mulnode.o += -O0 i.e.
> OPT_CFLAGS/mulnode.o += -O0 -g0
> 9. Change this line, DEBUG_CFLAGS += -gstabs, to DEBUG_CFLAGS += -
> 10. Edit the <ws>/make/linux/makefiles/product.make file.
> 11. Comment out these two lines:
> STRIP_LIBJVM = $(STRIP) -g $@ || exit 1;
> STRIP_AOUT = $(STRIP) -x $@ || exit 1;
> i.e.
> #STRIP_LIBJVM = $(STRIP) -g $@ || exit 1;
> #STRIP_AOUT = $(STRIP) -x $@ || exit 1;
> 12. Now build HotSpot using the "product" target and build for either
> 64-bit or 32-bit, i.e. to build 64-bit libjvm.so do, $ gmake LP64=1
> charlie ...
> On 11/12/11 01:44 AM, Jon Masamitsu wrote:
> > Take a look at Charlie's blog about the collector analyzer tools
> > http://blogs.oracle.com/charliebrown/entry/free_sun_studio_12_and
> > And the quick start guide
> > http://dsc.sun.com/solaris/articles/analyzer_qs.html
> > The profiling build targets in hotspot were deleted because they
> > were not being used and I believe were not building correctly.
> > On 11/11/2011 7:23 PM, David Holmes wrote:
> >> As this mainly concerns GC profiling I've cc'ed the GC list.
> >> David
> >> On 12/11/2011 2:22 AM, Dang Nhan Nguyen wrote:
> >>> Hi,
> >>> I need to measure the performance (mostly running time) of GC
> >>> while running on a multi-core platform on Linux. I want to break down
> >>> the cost of each phase in CMS and of some related methods, and I am
> >>> thinking of using a performance profiling tool to do that.
> >>> I have looked at the build files and see that there used to be some
> >>> "profiled" target for profiling hotspot. However, it was removed
> >>> (don't know why).
> >>> Now, I am looking for a way to do the profiling. In my experience
> >>> with profiling the hotspot on Windows (using Visual Studio), the
> >>> generated profiling data is huge (2-6GB) and in many cases, my
> >>> computer could not handle the analysis of the profiling data. In
> >>> addition, since hotspot's code and its build are quite complex, I want
> >>> to consult some opinions first before actually doing it.
> >>> I am grateful if you can suggest a way to build the hotspot on
> >>> for profiling purpose.
> >>> Thank you,
> >>> Nhan Nguyen