<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">JC,<div class=""><br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Apr 11, 2018, at 8:17 PM, JC Beyler <<a href="mailto:jcbeyler@google.com" class="">jcbeyler@google.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">Hi Karen,<br class=""><br class="">I put up a new webrev that is, in my mind, feature-complete in terms of implementation. There could be a few tidbits of optimization here and there, but I believe everything is now there in terms of features; the remaining question is the placement of the collectors (I've now put them lower than what you talk about in this email thread).</div></div></blockquote>I believe that the primary goal of your JEP is to catch samples of allocation due to bytecodes. Given that, it makes sense to</div><div>put the collectors in the code generators, so I am ok with your leaving them where they are.</div><div><br class=""></div><div><div>And it would save us all a lot of cycles if, rather than sending frequent webrev updates with subsets of the requested changes, you</div><div>could wait until you’ve added all the changes people requested - that would increase the signal-to-noise ratio.</div><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><br class=""></div><div class="">The incremental webrev is here:<div class=""><a href="http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.12_13/" class="">http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.12_13/</a><br class=""></div><div class="">and the full webrev is here:</div><div class=""><a href="http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.13/" class="">http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.13/</a><br class=""></div><div class=""><br class=""></div><div 
class="">The incremental webrev contains the renaming of the TLAB fields, as per the conversation going on in the GC thread, and the Graal changes to keep this webrev from breaking anything. It also contains a test for when VM and sampled events are enabled at the same time. </div><div class=""></div><div class=""><br class=""></div><div class="">I'll answer your questions inline below:</div><div class=""><br class=""><div class=""><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><div dir="auto" style="word-wrap:break-word" class=""><br class=""><div class=""><br class=""></div><div class="">I have a couple of design questions before getting to the detailed code review:</div><div class=""><br class=""></div><div class="">1. jvmtiExport.cpp</div><div class="">You have simplified the JvmtiObjectAllocEventCollector to assume there is only a single object.</div><div class="">Do you have a test that allocates a multi-dimensional array?</div><div class="">I would expect that to have multiple subarrays - and need the logic that you removed.</div></div></div></blockquote><div class=""><br class=""></div><div class="">I "think" you misread this. I did not change the implementation of <span style="color:rgb(34,34,34);font-family:sans-serif;font-size:13px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline" class="">JvmtiObjectAllocEventCollector to assume only one object. 
Actually, the implementation still does what it did initially, but JvmtiObjectAllocEventCollector is now a base class with two inherited behaviors:</span></div><div class=""><span style="color:rgb(34,34,34);font-family:sans-serif;font-size:13px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline" class="">   - JvmtiVMObjectAllocEventCollector follows the same logic as before</span></div><div class="">   - JvmtiSampledObjectAllocEventCollector only handles a single allocation per collection, since it is placed that low in the call stack. I'm thinking of perhaps separating the two classes now that the sampled case is so low and does not require the extra logic of handling a growable array.</div><div class=""><br class=""></div><div class="">I don't have a test for multi-dimensional arrays. I have not looked at what exactly VMObjectAlloc events track, but I did run an example to check:</div><div class=""><br class=""></div><div class="">- On a clean JDK, if I allocate:</div><div class="">     int[][][] myAwesomeTable = new int[10][10][10];</div><div class=""><br class=""></div><div class="">I get one single VMObjectAlloc callback.</div></div></div></div></div></div></div></blockquote><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><div class=""><div class=""><div class="gmail_quote"><div class=""><br class=""></div><div class="">- With my change, I get the same behavior.</div><div class=""><br class=""></div><div class="">So instead, I did an object containing an array and cloned it. With the clean JDK I get two VM callbacks, both with and without my change. 
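As an aside on the multi-dimensional case above, the object count behind new int[10][10][10] can be checked with plain Java (a standalone sketch, independent of the webrev), which shows why a single VMObjectAlloc callback for the outer object understates the allocations involved:

```java
// Standalone sketch (not part of the webrev): count the array objects that
// back a single "new int[10][10][10]" allocation. A multianewarray creates
// the outer array plus every sub-array, so one bytecode means many objects.
public class MultiDimCount {
    public static void main(String[] args) {
        int[][][] table = new int[10][10][10];
        int count = 1;                  // the int[10][10][10] object itself
        for (int[][] plane : table) {
            count++;                    // each int[10][10] plane
            count += plane.length;      // each int[10] row
        }
        System.out.println(count);      // prints 111 (1 + 10 + 100)
    }
}
```

One outer array, ten int[10][10] planes, and a hundred int[10] rows: 111 array objects from one bytecode.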
<br class=""></div><div class=""><br class=""></div><div class="">I'll add a test in the next webrev to ensure correctness.</div><div class=""><br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><div dir="auto" style="word-wrap:break-word" class=""><div class=""><br class=""></div><div class="">*** Please add a test in which you allocate a multi-dimensional array - and try it with your earlier version which</div><div class="">will show the full set of allocated objects</div></div></div></blockquote><div class=""><br class=""></div><div class="">As I said, this works because it is so low in the system. I don't know the internals of the allocation of a three-dimensional array, but:</div><div class="">  - If I try to collect all allocations required for the <span style="color:rgb(34,34,34);font-family:sans-serif;font-size:13px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline" class="">new int[10][10][10], I get more than a hundred allocation callbacks if the sample rate is 0, which is what I'd expect: a callback at each allocation</span></div><div class=""> </div><div class="">So although I only collect one object at a time, I get a callback for each allocation, which is what we want with regard to the sampling rate.</div></div></div></div></div></div></div></blockquote>I'll let you work this one out with Serguei - if he is ok with your change to not have a growable array and only report one object,</div><div>given that you are sampling, it sounds like that is not a functional loss for you.</div><div><br class=""></div><div>And yes, please do add the 
test.</div><div><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><div class=""><div class=""><div class="gmail_quote"><div class=""><br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><div dir="auto" style="word-wrap:break-word" class=""><div class=""><br class=""></div><div class="">2. Tests - didn’t read them all - just ran into a hardcoded “10% error ensures a sanity test without becoming flaky”</div><div class="">do check with Serguei on this - that looks like a potential future test failure</div></div></div></blockquote><div class=""><br class=""></div><div class="">Because the sampling interval is a geometric random variable whose mean is the sampling rate, any meaningful test is going to have to be statistical. Therefore, I accept a bit of error so that we can test the real behavior rather than hack the code into less "real" tests. This is what we do internally; let me know if you want me to do it otherwise.</div><div class=""><br class=""></div><div class="">Flaky is probably the wrong term, or perhaps I need to explain this better. I'll change the comment in the tests to explain that potential flakiness comes from the geometric distribution of the sampling intervals. 
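To make the statistics concrete, here is a standalone Java sketch (the 512 KiB rate and all names are illustrative assumptions, not the webrev's code) showing how much the empirical mean of geometric sampling intervals wanders, which is why the tests need a tolerance rather than exact counts:

```java
import java.util.Random;

// Standalone sketch: draw sampling intervals from a geometric distribution
// whose mean is the configured sampling rate, then compare the empirical
// mean against the target. Names and the 512 KiB rate are illustrative.
public class GeometricIntervalSketch {
    static long nextInterval(double meanBytes, Random rng) {
        // Inverse-CDF draw of a geometric distribution with p = 1/meanBytes.
        double u = rng.nextDouble();
        return (long) (Math.log(1.0 - u) / Math.log(1.0 - 1.0 / meanBytes)) + 1;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double mean = 512 * 1024;   // pretend sampling rate: 512 KiB
        int n = 10_000;
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += nextInterval(mean, rng);
        }
        double avg = (double) total / n;
        // Even with 10,000 draws the empirical mean lands a few percent off
        // the target, hence the statistical tolerance in the tests.
        if (Math.abs(avg - mean) / mean >= 0.10) {
            throw new AssertionError("outside the 10% tolerance");
        }
        System.out.printf("target=%.0f observed=%.0f%n", mean, avg);
    }
}
```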
Because we don't want overly long-running tests, it makes sense to me to have this error percentage.</div><div class=""><br class=""></div><div class="">Let me now answer the comments from the other email here as well so we have all answers and conversations in a single thread:</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><div dir="auto" style="word-wrap:break-word" class=""><div class=""><div class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><div style="color:rgb(34,34,34);font-family:sans-serif;font-size:13px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration-style:initial;text-decoration-color:initial" class="">        - Note there is one current caveat: if the agent requests the VMObjectAlloc event, the sampler defers to that event due to a limitation in its implementation (i.e. I am not convinced I can safely send out that event with the VM collector enabled; I'll happily whiteboard that).</div></div></div></div></blockquote>Please work that one out with Serguei and the serviceability folks.</div><div class=""><br class=""></div></div></div></div></blockquote><div class=""><br class=""></div><div class="">Agreed. I'll follow up with Serguei on whether this is a real problem; I have to double-check and make sure I am right that there is an issue. I see it as just a fact of life and not a problem for now. If you do want both events, having a sample drop due to this limitation does not invalidate the system in my mind. 
I could be wrong about it, though, and would happily go over what I saw.</div><div class="">   </div><div class=""><br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><div dir="auto" style="word-wrap:break-word" class=""><div class=""><blockquote type="cite" class=""><div dir="ltr" class=""><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class=""><div class="gmail_quote"><div class=""><br class=""></div><div class="">So the Heap Sampling Monitoring System used to have more methods. It made sense to have them in a separate category. I have now moved it to the memory category to be consistent and grouped there. I also removed that link, by the way. </div></div></div></blockquote></div></div></blockquote>Thanks.<br class=""></div></div></div></blockquote><div class=""><br class=""></div><div class="">Actually it did seem weird to put it there since there was only Allocate/Deallocate, so for now the method is back in its own category. 
If someone has a better spot, let me know.</div><div class=""><br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><div dir="auto" style="word-wrap:break-word" class=""><div class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class=""><div class=""><div class="gmail_quote"><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""><div class=""><br class=""></div><div class="">I was trying to figure out a way to put the collectors farther down the call stack so as to both catch more</div><div class="">cases and to reduce the maintenance burden - i.e. if you were to add a new code generator, e.g. Graal -</div><div class="">if it were to go through an existing interface, that might be a place to already have a collector.</div><div class=""><br class=""></div><div class="">I do not know the Graal sources - I did look at jvmci/jvmciRuntime.cpp - and it appears that there</div><div class="">are calls to InstanceKlass::new_instance, oopFactory::new_typeArray/new_objArray and ArrayKlass::multi_allocate,</div><div class="">so one possibility would be to put hooks in those calls, which would catch many 
(I did not do a thorough search)</div><div class="">of the slowpath calls for the bytecodes, and then check the fast paths in detail.</div></div></div></blockquote><div class=""><br class=""></div><div class="">I'll come to a major issue with the collector and its placement in the next paragraph.</div></div></div></div></blockquote></div></div></div></div></blockquote>Still not clear on why you did not move the collectors into InstanceKlass::new_instance and oopFactory::new_typeArray/new_objArray</div><div class="">and ArrayKlass::multi_allocate.<br class=""></div></div></div></blockquote></div></div></div></div></div></div></blockquote>As I said above, I am ok with your leaving the collectors in the code generators since that is your focus.<br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><div class=""><div class=""><div class="gmail_quote"><div class=""><br class=""></div><div class="">I think what was happening is that the collectors would wrap the objects in handles, but references to the originally allocated object would still be on the stack and would not get updated if required. Due to that issue, I believe I was getting weird bugs. 
</div><div class=""><br class=""></div><div class="">Because of this, it seems that any VM collector enabled has to guarantee that either:</div><div class="">   - its path to destruction (and thus posting of events) has no means of triggering a GC that would move things around (as long as you are in VM code, you should be fine, I believe)</div></div></div></div></div></div></div></blockquote><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><div class=""><div class=""><div class="gmail_quote"><div class="">   - if GC is occurring, the objects in its internal array are not sitting on the stack without a handle around them, so that they can be moved if needed by a GC operation.</div></div></div></div></div></div></div></blockquote><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><div class=""><div class=""><div class="gmail_quote"><div class=""><br class=""></div><div class="">I'm not convinced this holds in the multithreaded cases with both sampling and VM collection.</div></div></div></div></div></div></div></blockquote>I will let you and Serguei work out whether you have sufficient test coverage for the multithreaded cases.</div><div><br class=""></div><div>thanks,</div><div>Karen<br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><div class=""><div class=""><div class="gmail_quote"><div class=""><br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><div dir="auto" style="word-wrap:break-word" class=""><div class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class=""><div class=""><div class="gmail_quote"><div class=""> 
</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""><div class=""><br class=""></div><div class="">I had wondered if it made sense to move the hooks even farther down, into CollectedHeap::obj_allocate and array_allocate.</div><div class="">I do not think so. The first reason is that for multidimensional arrays, in ArrayKlass::multi_allocate the outer-dimension array would</div><div class="">have an event before the inner sub-arrays are stored, and I don’t think we want that exposed, so that won’t work for arrays.</div></div></div></blockquote><div class=""><br class=""></div><div class="">So the major difficulty is that the steps of collection do this:</div><div class=""><br class=""></div><div class="">- An object gets allocated and is decided to be sampled</div><div class="">- The original pointer placement (where it resides originally in memory) is passed to the collector</div><div class="">- Now one important thing of note:</div><div class="">    (a) In the VM code, until the point where the oop is going to be returned, GC is not yet aware of it</div></div></div></div></blockquote></div></div></div></div></blockquote><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class=""><div class=""><div class="gmail_quote"><div class="">    (b) so the collector can't yet send it out to the user via JVMTI; otherwise, the agent could put a weak reference on it, for example</div><div class=""><br class=""></div><div class="">I'm a bit fuzzy on this, and maybe it's just that there would be more heavy lifting to make this possible, but my initial tests seem to show problems when attempting this in the obj_allocate 
area.</div></div></div></div></blockquote></div></div></div></div></blockquote>Not sure what you are seeing here - </div><div class=""><br class=""></div><div class=""><div class="">Let me state it the way I understand it.</div><div class="">1) You can collect the object into internal metadata at allocation point - which you already had</div><div class="">Note: see comment in JvmtiExport.cpp: </div><div class="">// In the case of the sampled object collector, we don’t want to perform the</div><div class="">// oops_do because the object in the collector is still stored in registers </div><div class="">// on the VM stack</div><div class="">   - so GC will find these objects as roots, once we allow GC to run, which should be after the header is initialized</div><div class=""><br class=""></div><div class="">Totally agree with you that you can not post the event in the source code in which you allocate the memory - keep reading</div><div class=""><br class=""></div><div class="">2) event posting:</div><div class="">   - you want to ensure that the object has been fully initialized</div><div class="">   - the object needs to have the header set up - not just the memory allocated - so that applies to all objects</div><div class="">   (and that is done in a caller of the allocation code - so it can’t be done at the location which does the memory allocation)</div><div class="">   - the example I pointed out was the multianewarray case - all the subarrays need to be allocated</div><div class=""><br class=""></div><div class="">- please add a test case for multi-array, so as you experiment with where to post the event, you ensure that you can</div><div class="">access the subarrays (e.g. 
a 3D array of length 5 has 5 2D arrays as subarrays)</div></div></div></div></blockquote><div class=""><br class=""></div><div class="">Technically it's more than just being initialized: you cannot perform a callback about an object if any object of that thread is being held by a collector while also sitting in a register or on the stack without protection.</div><div class=""><br class=""></div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><div dir="auto" style="word-wrap:break-word" class=""><div class=""><div class=""></div><div class="">  - prior to setting up the object header information, GC would not know about the object</div><div class="">    - was this by chance the issue you ran into?</div></div></div></div></blockquote><div class=""><br class=""></div><div class="">No, I believe the issue I was running into was the one above, where an object on the stack was pointing to an oop that got moved during a GC triggered while that thread was doing an event callback.</div><div class=""> </div><div class="">Thanks for your help,</div><div class="">Jc</div><div class=""><br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><div dir="auto" style="word-wrap:break-word" class=""><div class=""><div class=""><br class=""></div><div class="">3) event posting</div><div class="">   - when you post the event to JVMTI</div><div class="">   - JvmtiObjectAllocEventMark sets _jobj = (jobject)to_jobject(obj), which calls JNIHandles::make_local(_thread, obj)</div><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class=""><div class=""><div 
class="gmail_quote"><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""><div class=""><br class=""></div><div class="">The second reason is that I strongly suspect the scope you want is bytecodes only. I think once you have added hooks</div><div class="">to all the fast paths and slow paths that this will be pushing the performance overhead constraints you proposed and</div><div class="">you won’t want to see e.g. internal allocations. </div></div></div></blockquote><div class=""><br class=""></div><div class="">Yes agreed, allocations from bytecodes are mostly our concern generally :)</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""><div class=""> <br class=""></div></div></div></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""><div class=""></div><div class="">But I think you need to experiment with the set of allocations (or possible alternative sets of allocations) you want recorded.</div><div class=""><br class=""></div><div class="">The hooks I see today include:</div><div class="">Interpreter: (looking at x86 as a sample)</div><div class="">  - slowpath in InterpreterRuntime</div><div class="">  - fastpath tlab allocation - your new threshold check handles that</div></div></div></blockquote><div class=""><br class=""></div><div class="">Agreed<br class=""></div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""><div class="">  - allow_shared_alloc (GC specific): for _new isn’t 
handled</div></div></div></blockquote><div class=""><br class=""></div><div class="">Where is that exactly? I can check why we are not catching it?</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""><div class=""><br class=""></div><div class="">C1</div><div class="">  I don’t see changes in c1_Runtime.cpp</div><div class="">  note: you also want to look for the fast path</div></div></div></blockquote><div class=""><br class=""></div><div class="">I added the calls to c1_Runtime in the latest webrev, but was still going through testing before pushing it out. I had waited on this one a bit. Fast path would be handled by the threshold check no?</div><div class=""><br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""><div class=""><br class=""></div><div class="">C2: changes in opto/runtime.cpp for slow path</div><div class="">   did you also catch the fast path?</div></div></div></blockquote><div class=""><br class=""></div><div class="">Fast path gets handled by the same threshold check, no? Perhaps I've missed something (very likely)?</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""><div class=""><br class=""></div><div class="">3. Performance - </div><div class="">After you get all the collectors added - you need to rerun the performance numbers. 
</div></div></div></blockquote><div class=""><br class=""></div><div class="">Agreed :)</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""><div class=""><br class=""></div><div class="">thanks,</div><div class="">Karen</div></div><div class=""><div class=""><br class=""><blockquote type="cite" class=""><div class="">On Apr 5, 2018, at 2:15 PM, JC Beyler <<a href="mailto:jcbeyler@google.com" target="_blank" class="">jcbeyler@google.com</a>> wrote:</div><br class="gmail-m_2654971596882259531gmail-m_2792660692152568680m_2153986306415466254m_1207530620928143740Apple-interchange-newline"><div class=""><div dir="ltr" class="">Thanks Boris and Derek for testing it.<div class=""><br class=""></div><div class="">Yes I was trying to get a new version out that had the tests ported as well but got sidetracked while trying to add tests and two new features.</div><div class=""><br class=""></div><div class="">Here is the incremental webrev:</div><div class=""><br class=""></div><div class="">Here is the full webrev:</div><div class=""><a href="http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.11/" target="_blank" class="">http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.11/</a><br class=""></div><div class=""><br class=""></div><div class="">Basically, the new tests assert this:</div><div class="">  - Only one agent can currently ask for the sampling, I'm currently seeing if I can push to a next webrev the multi-agent support to start doing a code freeze on this one</div><div class="">  - The event is not thread-enabled, meaning like the VMObjectAllocationEvent, it's an all or nothing event; same as the multi-agent, I'm going to see if a future webrev to add the support is a better idea to freeze this webrev a bit</div><div class=""><br class=""></div><div class="">There was another item that I added here and I'm 
unsure this webrev is stable in debug mode: I added an assertion system to ascertain that all paths leading to a TLAB slow path (and hence a sampling point) have a sampling collector ready to post the event if a user wants it. This might break a few things in debug mode as I'm working through the kinks of that as well. However, in release mode, this new webrev passes all the tests in hotspot/jtreg/serviceability/jvmti/HeapMonitor.</div><div class=""><br class=""></div><div class="">Let me know what you think,</div><div class="">Jc</div></div><br class=""><div class="gmail_quote"><div dir="ltr" class="">On Thu, Apr 5, 2018 at 4:56 AM Boris Ulasevich <<a href="mailto:boris.ulasevich@bell-sw.com" target="_blank" class="">boris.ulasevich@bell-sw.com</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi JC,<br class="">
<br class="">
   I have just checked on arm32: your patch compiles and runs ok.<br class="">
<br class="">
   As far as I can see, the jtreg agentlib name "-agentlib:HeapMonitor" does not<br class="">
correspond to the actual library name: libHeapMonitorTest.c -><br class="">
libHeapMonitorTest.so<br class="">
<br class="">
Boris<br class="">
<br class="">
On 04.04.2018 01:54, White, Derek wrote:<br class="">
> Thanks JC,<br class="">
><br class="">
> New patch applies cleanly. Compiles and runs (simple test programs) on<br class="">
> aarch64.<br class="">
><br class="">
>   * Derek<br class="">
><br class="">
> *From:* JC Beyler [mailto:<a href="mailto:jcbeyler@google.com" target="_blank" class="">jcbeyler@google.com</a>]<br class="">
> *Sent:* Monday, April 02, 2018 1:17 PM<br class="">
> *To:* White, Derek <<a href="mailto:Derek.White@cavium.com" target="_blank" class="">Derek.White@cavium.com</a>><br class="">
> *Cc:* Erik Österlund <<a href="mailto:erik.osterlund@oracle.com" target="_blank" class="">erik.osterlund@oracle.com</a>>;<br class="">
> <a href="mailto:serviceability-dev@openjdk.java.net" target="_blank" class="">serviceability-dev@openjdk.java.net</a>; hotspot-compiler-dev<br class="">
> <<a href="mailto:hotspot-compiler-dev@openjdk.java.net" target="_blank" class="">hotspot-compiler-dev@openjdk.java.net</a>><br class="">
> *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling<br class="">
><br class="">
> Hi Derek,<br class="">
><br class="">
> I know there were a few things that went in that provoked a merge<br class="">
> conflict. I worked on it and got it up to date. Sadly, my lack of<br class="">
> knowledge made it a full rebase instead of keeping all the history.<br class="">
> However, with a newly cloned jdk/hs you should now be able to use:<br class="">
><br class="">
> <a href="http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/" rel="noreferrer" target="_blank" class="">http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/</a><br class="">
><br class="">
> The change you are referring to was done with the others so perhaps you<br class="">
> were unlucky and I forgot it in one webrev and fixed it in another? I<br class="">
> don't know, but it's been there; I checked, and it is here:<br class="">
><br class="">
> <a href="http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html" rel="noreferrer" target="_blank" class="">http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html</a><br class="">
><br class="">
> I double checked that tlab_end_offset no longer appears in any<br class="">
> architecture (as far as I can tell :)).<br class="">
><br class="">
> Thanks for testing and let me know if you run into any other issues!<br class="">
><br class="">
> Jc<br class="">
><br class="">
> On Fri, Mar 30, 2018 at 4:24 PM White, Derek <<a href="mailto:Derek.White@cavium.com" target="_blank" class="">Derek.White@cavium.com</a><br class="">
> <mailto:<a href="mailto:Derek.White@cavium.com" target="_blank" class="">Derek.White@cavium.com</a>>> wrote:<br class="">
><br class="">
>     Hi Jc,<br class="">
><br class="">
>     I’ve been having trouble getting your patch to apply correctly. I<br class="">
>     may have based it on the wrong version.<br class="">
><br class="">
>     In any case, I think there’s a missing update to<br class="">
>     macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(),<br class="">
>     where “JavaThread::tlab_end_offset()” should become<br class="">
>     “JavaThread::tlab_current_end_offset()”.<br class="">
><br class="">
>     This should correspond to the other port’s changes in<br class="">
>     templateTable_<cpu>.cpp files.<br class="">
><br class="">
>     Thanks!<br class="">
>     - Derek<br class="">
><br class="">
>     *From:* hotspot-compiler-dev<br class="">
>     [mailto:<a href="mailto:hotspot-compiler-dev-bounces@openjdk.java.net" target="_blank" class="">hotspot-compiler-dev-bounces@openjdk.java.net</a><br class="">
>     <mailto:<a href="mailto:hotspot-compiler-dev-bounces@openjdk.java.net" target="_blank" class="">hotspot-compiler-dev-bounces@openjdk.java.net</a>>] *On Behalf<br class="">
>     Of *JC Beyler<br class="">
>     *Sent:* Wednesday, March 28, 2018 11:43 AM<br class="">
>     *To:* Erik Österlund <<a href="mailto:erik.osterlund@oracle.com" target="_blank" class="">erik.osterlund@oracle.com</a><br class="">
>     <mailto:<a href="mailto:erik.osterlund@oracle.com" target="_blank" class="">erik.osterlund@oracle.com</a>>><br class="">
>     *Cc:* <a href="mailto:serviceability-dev@openjdk.java.net" target="_blank" class="">serviceability-dev@openjdk.java.net</a><br class="">
>     <mailto:<a href="mailto:serviceability-dev@openjdk.java.net" target="_blank" class="">serviceability-dev@openjdk.java.net</a>>; hotspot-compiler-dev<br class="">
>     <<a href="mailto:hotspot-compiler-dev@openjdk.java.net" target="_blank" class="">hotspot-compiler-dev@openjdk.java.net</a><br class="">
>     <mailto:<a href="mailto:hotspot-compiler-dev@openjdk.java.net" target="_blank" class="">hotspot-compiler-dev@openjdk.java.net</a>>><br class="">
>     *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling<br class="">
><br class="">
>     Hi all,<br class="">
><br class="">
>     I've mostly been working on deflaking the tests and on the wording<br class="">
>     in the JVMTI spec.<br class="">
><br class="">
>     Here are the two incremental webrevs:<br class="">
><br class="">
>     <a href="http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/" rel="noreferrer" target="_blank" class="">http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/</a><br class="">
><br class="">
>     <a href="http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/" rel="noreferrer" target="_blank" class="">http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/</a><br class="">
><br class="">
>     Here is the total webrev:<br class="">
><br class="">
>     <a href="http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/" rel="noreferrer" target="_blank" class="">http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/</a><br class="">
><br class="">
>     Here are the notes of this change:<br class="">
><br class="">
>        - Currently the tests pass 100 times in a row; I am working on<br class="">
>     checking whether they pass 1000 times in a row.<br class="">
><br class="">
>        - The default sampling rate is set to 512k; this is what we use<br class="">
>     internally, and having a default means that, to sample at the<br class="">
>     default rate, the user only has to enable/disable the event via<br class="">
>     JVMTI (instead of enable + set sample rate).<br class="">
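>     As a sketch of how a byte-interval sampler like this can pick its<br class="">
>     next sample point (the webrev's exact draw may differ; the names<br class="">
>     here are illustrative, not HotSpot's), an exponential draw around<br class="">
>     the 512k mean would look like:<br class="">

```cpp
#include <cmath>
#include <cstddef>
#include <random>

// Sketch only: number of bytes to allocate before taking the next
// sample, exponentially distributed around the configured mean
// (512 KiB by default in this thread).  Illustrative, not HotSpot code.
size_t next_sample_interval(std::mt19937& rng, double mean_bytes) {
  std::uniform_real_distribution<double> uniform(0.0, 1.0);
  double u = uniform(rng);
  if (u <= 0.0) u = 1e-9;  // guard against log(0)
  return static_cast<size_t>(-std::log(u) * mean_bytes);
}
```

>     Averaging many such draws converges to the configured mean, which<br class="">
>     makes the sampling rate a statistical average rather than a fixed<br class="">
>     stride that allocation patterns could align with.<br class="">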
><br class="">
>        - I removed the code that was handling the fast path tlab<br class="">
>     refill, since FastTLABRefill is now deprecated<br class="">
><br class="">
>            - Though I saw that Graal is still using it so I have to see<br class="">
>     what needs to be done there exactly<br class="">
><br class="">
>     Finally, using the DaCapo benchmark suite, I measured a 1% overhead<br class="">
>     when the event system is turned on and the callback to the native<br class="">
>     agent is empty, and a 3% overhead at a 512k sampling rate with the<br class="">
>     code I put in the native side of my tests.<br class="">
><br class="">
>     Thanks and comments are appreciated,<br class="">
><br class="">
>     Jc<br class="">
><br class="">
>     On Mon, Mar 19, 2018 at 2:06 PM JC Beyler <<a href="mailto:jcbeyler@google.com" target="_blank" class="">jcbeyler@google.com</a><br class="">
>     <mailto:<a href="mailto:jcbeyler@google.com" target="_blank" class="">jcbeyler@google.com</a>>> wrote:<br class="">
><br class="">
>         Hi all,<br class="">
><br class="">
>         The incremental webrev update is here:<br class="">
><br class="">
>         <a href="http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/" rel="noreferrer" target="_blank" class="">http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/</a><br class="">
><br class="">
>         The full webrev is here:<br class="">
><br class="">
>         <a href="http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/" rel="noreferrer" target="_blank" class="">http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/</a><br class="">
><br class="">
>         Major change here is:<br class="">
><br class="">
>            - I've removed the heapMonitoring.cpp code in favor of just<br class="">
>         having the sampling events as per Serguei's request; I still<br class="">
>         have to do some overhead measurements but the tests prove the<br class="">
>         concept can work<br class="">
><br class="">
>                 - Most of the tlab code is unchanged; the only major<br class="">
>         change is that things now get sent off to event collectors when<br class="">
>         used and enabled.<br class="">
><br class="">
>            - Added the interpreter collectors to handle interpreter<br class="">
>         execution<br class="">
><br class="">
>            - Updated the name from SetTlabHeapSampling to<br class="">
>         SetHeapSampling to be more generic<br class="">
><br class="">
>            - Added a mutex for the thread sampling so that we can<br class="">
>         initialize an internal static array safely<br class="">
><br class="">
>            - Ported the tests from the old system to this new one<br class="">
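>         The mutex bullet above can be sketched roughly like this<br class="">
>         (illustrative names only; std::mutex stands in for HotSpot's own<br class="">
>         Mutex class):<br class="">

```cpp
#include <mutex>

// Sketch: guard one-time initialization of an internal static table so
// that racing threads cannot both initialize it.  Illustrative only.
namespace sampler_sketch {
constexpr int kTableSize = 64;
std::mutex init_mutex;
int sample_table[kTableSize];
bool initialized = false;

int* table() {
  std::lock_guard<std::mutex> guard(init_mutex);
  if (!initialized) {
    for (int i = 0; i < kTableSize; i++) {
      sample_table[i] = i;  // placeholder contents
    }
    initialized = true;
  }
  return sample_table;
}
}  // namespace sampler_sketch
```

>         Every caller takes the lock here; checking an initialized flag<br class="">
>         outside the lock would avoid that, at the cost of more care.<br class="">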
><br class="">
>         I've also updated the JEP and CSR to reflect these changes:<br class="">
><br class="">
>         <a href="https://bugs.openjdk.java.net/browse/JDK-8194905" rel="noreferrer" target="_blank" class="">https://bugs.openjdk.java.net/browse/JDK-8194905</a><br class="">
><br class="">
>         <a href="https://bugs.openjdk.java.net/browse/JDK-8171119" rel="noreferrer" target="_blank" class="">https://bugs.openjdk.java.net/browse/JDK-8171119</a><br class="">
><br class="">
>         In order to make some forward progress, I've removed the heap<br class="">
>         sampling code entirely and now rely on the event sampling<br class="">
>         system. The tests reflect this by using a<br class="">
>         simplified implementation of what an agent could do:<br class="">
><br class="">
>         <a href="http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c" rel="noreferrer" target="_blank" class="">http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c</a><br class="">
><br class="">
>         (Search for anything mentioning event_storage).<br class="">
><br class="">
>         I have not taken the time to port the whole code we had<br class="">
>         originally in heapMonitoring to this. I hesitate only because<br class="">
>         that code was in C++; I'd have to port it to C, and since this<br class="">
>         is for tests, perhaps what I have now is good enough?<br class="">
><br class="">
>         As far as testing goes, I've ported all the relevant tests and<br class="">
>         then added a few:<br class="">
><br class="">
>             - Turning the system on/off<br class="">
><br class="">
>             - Testing using various GCs<br class="">
><br class="">
>             - Testing using the interpreter<br class="">
><br class="">
>             - Testing the sampling rate<br class="">
><br class="">
>             - Testing with objects and arrays<br class="">
><br class="">
>             - Testing with various threads<br class="">
><br class="">
>         Finally, as overhead goes, comparing the build with the system<br class="">
>         off against a clean build shows 0% overhead, which is what we'd<br class="">
>         want. This was using the DaCapo benchmarks. I am now preparing<br class="">
>         to run a version with the events on using DaCapo and will report<br class="">
>         back here.<br class="">
><br class="">
>         Any comments are welcome :)<br class="">
><br class="">
>         Jc<br class="">
><br class="">
>         On Thu, Mar 8, 2018 at 4:00 PM JC Beyler <<a href="mailto:jcbeyler@google.com" target="_blank" class="">jcbeyler@google.com</a><br class="">
>         <mailto:<a href="mailto:jcbeyler@google.com" target="_blank" class="">jcbeyler@google.com</a>>> wrote:<br class="">
><br class="">
>             Hi all,<br class="">
><br class="">
>             I apologize for the delay, but I wanted to add an event<br class="">
>             system, which took a bit longer than expected, and I also<br class="">
>             reworked the code to take into account the deprecation of<br class="">
>             FastTLABRefill.<br class="">
><br class="">
>             This update has four parts:<br class="">
><br class="">
>             A) I moved the implementation from Thread to<br class="">
>             ThreadHeapSampler inside of Thread. Would you prefer it as a<br class="">
>             pointer inside Thread, or does this work for you? The second<br class="">
>             question is whether you would rather have an association<br class="">
>             outside of Thread altogether that tracks when threads are<br class="">
>             live, in which case we would have something like:<br class="">
><br class="">
>             ThreadHeapSampler::get_sampling_size(this_thread);<br class="">
><br class="">
>             I worry about the overhead of this, but perhaps it is not<br class="">
>             too bad?<br class="">
><br class="">
>             B) I also have been working on the Allocation event system<br class="">
>             that sends out a notification at each sampled event. This<br class="">
>             will be practical when you want to do something at the<br class="">
>             allocation point. I'm also looking at whether the whole<br class="">
>             heapMonitoring code could reside in the agent rather than in<br class="">
>             the JDK. I'm not convinced, but I'm talking to Serguei about<br class="">
>             it to see/assess :)<br class="">
><br class="">
>                 - Also added two tests for the new event subsystem<br class="">
><br class="">
>             C) Removed the slow_path fields inside the TLAB code since<br class="">
>             now FastTLABRefill is deprecated<br class="">
><br class="">
>             D) Updated the JVMTI documentation and specification for the<br class="">
>             methods.<br class="">
><br class="">
>             So the incremental webrev is here:<br class="">
><br class="">
>             <a href="http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/" rel="noreferrer" target="_blank" class="">http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/</a><br class="">
><br class="">
>             and the full webrev is here:<br class="">
><br class="">
>             <a href="http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10" rel="noreferrer" target="_blank" class="">http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10</a><br class="">
><br class="">
>             I believe I have updated the various JIRA issues that track<br class="">
>             this :)<br class="">
><br class="">
>             Thanks for your input,<br class="">
><br class="">
>             Jc<br class="">
><br class="">
>             On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler<br class="">
>             <<a href="mailto:jcbeyler@google.com" target="_blank" class="">jcbeyler@google.com</a> <mailto:<a href="mailto:jcbeyler@google.com" target="_blank" class="">jcbeyler@google.com</a>>> wrote:<br class="">
><br class="">
>                 Hi Erik,<br class="">
><br class="">
>                 I inlined my answers; the last one seems to answer<br class="">
>                 Robbin's concerns about the same thing (adding things to<br class="">
>                 Thread).<br class="">
><br class="">
>                 On Wed, Feb 14, 2018 at 2:51 AM, Erik Österlund<br class="">
>                 <<a href="mailto:erik.osterlund@oracle.com" target="_blank" class="">erik.osterlund@oracle.com</a><br class="">
>                 <mailto:<a href="mailto:erik.osterlund@oracle.com" target="_blank" class="">erik.osterlund@oracle.com</a>>> wrote:<br class="">
><br class="">
>                     Hi JC,<br class="">
><br class="">
>                     Comments are inlined below.<br class="">
><br class="">
>                     On 2018-02-13 06:18, JC Beyler wrote:<br class="">
><br class="">
>                         Hi Erik,<br class="">
><br class="">
>                         Thanks for your answers, I've now inlined my own<br class="">
>                         answers/comments.<br class="">
><br class="">
>                         I've done a new webrev here:<br class="">
><br class="">
>                         <a href="http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/" rel="noreferrer" target="_blank" class="">http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/</a><br class="">
>                         <<a href="http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.08/" rel="noreferrer" target="_blank" class="">http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.08/</a>><br class="">
><br class="">
>                         The incremental is here:<br class="">
><br class="">
>                         <a href="http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/" rel="noreferrer" target="_blank" class="">http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/</a><br class="">
>                         <<a href="http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.07_08/" rel="noreferrer" target="_blank" class="">http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.07_08/</a>><br class="">
><br class="">
>                         Note to all:<br class="">
><br class="">
>                            - I've been integrating changes from<br class="">
>                         Erik/Serguei/David's comments, so this webrev<br class="">
>                         incremental is a bit of an answer to all comments<br class="">
>                         in one. I apologize for that :)<br class="">
><br class="">
>                         On Mon, Feb 12, 2018 at 6:05 AM, Erik Österlund<br class="">
>                         <<a href="mailto:erik.osterlund@oracle.com" target="_blank" class="">erik.osterlund@oracle.com</a><br class="">
>                         <mailto:<a href="mailto:erik.osterlund@oracle.com" target="_blank" class="">erik.osterlund@oracle.com</a>>> wrote:<br class="">
><br class="">
>                             Hi JC,<br class="">
><br class="">
>                             Sorry for the delayed reply.<br class="">
><br class="">
>                             Inlined answers:<br class="">
><br class="">
><br class="">
><br class="">
>                             On 2018-02-06 00:04, JC Beyler wrote:<br class="">
><br class="">
>                                 Hi Erik,<br class="">
><br class="">
>                                 (Renaming this to be folded into the<br class="">
>                                 newly renamed thread :))<br class="">
><br class="">
>                                 First off, thanks a lot for reviewing<br class="">
>                                 the webrev! I appreciate it!<br class="">
><br class="">
>                                 I updated the webrev to:<br class="">
>                                 <a href="http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/" rel="noreferrer" target="_blank" class="">http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/</a><br class="">
>                                 <<a href="http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.05a/" rel="noreferrer" target="_blank" class="">http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.05a/</a>><br class="">
><br class="">
>                                 And the incremental one is here:<br class="">
>                                 <a href="http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/" rel="noreferrer" target="_blank" class="">http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/</a><br class="">
>                                 <<a href="http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.04_05a/" rel="noreferrer" target="_blank" class="">http://cr.openjdk.java.net/%7Ejcbeyler/8171119/webrev.04_05a/</a>><br class="">
><br class="">
>                                 It contains:<br class="">
>                                 - The change for since from 9 to 11 for<br class="">
>                                 the jvmti.xml<br class="">
>                                 - The use of the OrderAccess for initialized<br class="">
>                                 - Clearing the oop<br class="">
><br class="">
>                                 I also have inlined my answers to your<br class="">
>                                 comments. The biggest question<br class="">
>                                 will come from the multiple *_end<br class="">
>                                 variables. A bit of the logic there<br class="">
>                                 is due to handling the slow path refill<br class="">
>                                 vs fast path refill and<br class="">
>                                 checking that the rug was not pulled<br class="">
>                                 underneath the slowpath. I<br class="">
>                                 believe that a previous comment was that<br class="">
>                                 TlabFastRefill was going to<br class="">
>                                 be deprecated.<br class="">
><br class="">
>                                 If this is true, we could revert this<br class="">
>                                 code a bit and just do: if<br class="">
>                                 TlabFastRefill is enabled, disable this.<br class="">
>                                 And then deprecate that when<br class="">
>                                 TlabFastRefill is deprecated.<br class="">
><br class="">
>                                 This might simplify this webrev and I<br class="">
>                                 can work on a follow-up that<br class="">
>                                 either: removes TlabFastRefill if Robbin<br class="">
>                                 does not have the time to do<br class="">
>                                 it or add the support to the assembly<br class="">
>                                 side to handle this correctly.<br class="">
>                                 What do you think?<br class="">
><br class="">
>                             I support removing TlabFastRefill, but I<br class="">
>                             think it is good to not depend on that<br class="">
>                             happening first.<br class="">
><br class="">
><br class="">
>                         I'm slowly pushing on the FastTLABRefill<br class="">
>                         (<a href="https://bugs.openjdk.java.net/browse/JDK-8194084" rel="noreferrer" target="_blank" class="">https://bugs.openjdk.java.net/browse/JDK-8194084</a>),<br class="">
>                         I agree on keeping both separate for now though<br class="">
>                         so that we can think of both differently<br class="">
><br class="">
>                                 Now, below, inlined are my answers:<br class="">
><br class="">
>                                 On Fri, Feb 2, 2018 at 8:44 AM, Erik<br class="">
>                                 Österlund<br class="">
>                                 <<a href="mailto:erik.osterlund@oracle.com" target="_blank" class="">erik.osterlund@oracle.com</a><br class="">
>                                 <mailto:<a href="mailto:erik.osterlund@oracle.com" target="_blank" class="">erik.osterlund@oracle.com</a>>> wrote:<br class="">
><br class="">
>                                     Hi JC,<br class="">
><br class="">
>                                     Hope I am reviewing the right<br class="">
>                                     version of your work. Here goes...<br class="">
><br class="">
>                                     src/hotspot/share/gc/shared/collectedHeap.inline.hpp:<br class="">
><br class="">
>                                        159<br class="">
>                                       AllocTracer::send_allocation_outside_tlab(klass, result, size *<br class="">
>                                     HeapWordSize, THREAD);<br class="">
>                                        160<br class="">
>                                        161<br class="">
>                                       THREAD->tlab().handle_sample(THREAD, result, size);<br class="">
>                                        162     return result;<br class="">
>                                        163   }<br class="">
><br class="">
>                                     Should not call tlab()->X without<br class="">
>                                     checking if (UseTLAB) IMO.<br class="">
><br class="">
>                                 Done!<br class="">
><br class="">
><br class="">
>                             More about this later.<br class="">
><br class="">
>                                     src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp:<br class="">
><br class="">
>                                     So first of all, there seem to be<br class="">
>                                     quite a few ends. There is an "end",<br class="">
>                                     a "hard<br class="">
>                                     end", a "slow path end", and an<br class="">
>                                     "actual end". Moreover, it seems<br class="">
>                                     like the<br class="">
>                                     "hard end" is actually further away<br class="">
>                                     than the "actual end". So the "hard end"<br class="">
>                                     seems like more of a "really<br class="">
>                                     definitely actual end" or something.<br class="">
>                                     I don't<br class="">
>                                     know about you, but I think it looks<br class="">
>                                     kind of messy. In particular, I don't<br class="">
>                                     feel like the name "actual end"<br class="">
>                                     reflects what it represents,<br class="">
>                                     especially when<br class="">
>                                     there is another end that is behind<br class="">
>                                     the "actual end".<br class="">
><br class="">
>                                        413 HeapWord*<br class="">
>                                     ThreadLocalAllocBuffer::hard_end() {<br class="">
>                                        414   // Did a fast TLAB refill<br class="">
>                                     occur?<br class="">
>                                        415   if (_slow_path_end != _end) {<br class="">
>                                        416     // Fix up the actual end<br class="">
>                                     to be now the end of this TLAB.<br class="">
>                                        417     _slow_path_end = _end;<br class="">
>                                        418     _actual_end = _end;<br class="">
>                                        419   }<br class="">
>                                        420<br class="">
>                                        421   return _actual_end +<br class="">
>                                     alignment_reserve();<br class="">
>                                        422 }<br class="">
><br class="">
>                                     I really do not like making getters<br class="">
>                                     unexpectedly have these kind of side<br class="">
>                                     effects. It is not expected that<br class="">
>                                     when you ask for the "hard end", you<br class="">
>                                     implicitly update the "slow path<br class="">
>                                     end" and "actual end" to new values.<br class="">
><br class="">
>                                 As I said, a lot of this is due to the<br class="">
>                                 FastTlabRefill. If I make this<br class="">
>                                 not supporting FastTlabRefill, this goes<br class="">
>                                 away. The reason the system needs to<br class="">
>                                 update itself in the getter is that only<br class="">
>                                 at that point do you know whether things<br class="">
>                                 have shifted underneath the tlab slow<br class="">
>                                 path. I am not sure of<br class="">
>                                 really better names (naming is hard!),<br class="">
>                                 perhaps we could do these<br class="">
>                                 names:<br class="">
><br class="">
>                                 - current_tlab_end       // Either the<br class="">
>                                 allocated tlab end or a sampling point<br class="">
>                                 - last_allocation_address  // The end of<br class="">
>                                 the tlab allocation<br class="">
>                                 - last_slowpath_allocated_end  // In<br class="">
>                                 case a fast refill occurred the<br class="">
>                                 end might have changed, this is to<br class="">
>                                 remember slow vs fast past refills<br class="">
><br class="">
>                                 the hard_end method can be renamed to<br class="">
>                                 something like:<br class="">
>                                 tlab_end_pointer()        // The end of<br class="">
>                                 the lab including a bit of<br class="">
>                                 alignment reserved bytes<br class="">
><br class="">
>                             Those names sound better to me. Could you<br class="">
>                             provide a mapping from the old names to the<br class="">
>                             new names, so I understand which one is<br class="">
>                             which, please?<br class="">
><br class="">
>                             This is my current guess of what you are<br class="">
>                             proposing:<br class="">
><br class="">
>                             end -> current_tlab_end<br class="">
>                             actual_end -> last_allocation_address<br class="">
>                             slow_path_end -> last_slowpath_allocated_end<br class="">
>                             hard_end -> tlab_end_pointer<br class="">
><br class="">
>                         Yes that is correct, that was what I was proposing.<br class="">
><br class="">
>                             I would prefer this naming:<br class="">
><br class="">
>                             end -> slow_path_end // the end for taking a<br class="">
>                             slow path; either due to sampling or refilling<br class="">
>                             actual_end -> allocation_end // the end for<br class="">
>                             allocations<br class="">
>                             slow_path_end -> last_slow_path_end // last<br class="">
>                             address for slow_path_end (as opposed to<br class="">
>                             allocation_end)<br class="">
>                             hard_end -> reserved_end // the end of the<br class="">
>                             reserved space of the TLAB<br class="">
><br class="">
>                             About setting things in the getter... that<br class="">
>                             still seems like a very unpleasant thing to<br class="">
>                             me. It would be better to inspect the call<br class="">
>                             hierarchy and explicitly update the ends<br class="">
>                             where they need updating, and assert in the<br class="">
>                             getter that they are in sync, rather than<br class="">
>                             implicitly setting various ends as a<br class="">
>                             surprising side effect in a getter. It looks<br class="">
>                             like the call hierarchy is very small. With<br class="">
>                             my new naming convention, reserved_end()<br class="">
>                             would presumably return _allocation_end +<br class="">
>                             alignment_reserve(), and have an assert<br class="">
>                             checking that _allocation_end ==<br class="">
>                             _last_slow_path_allocation_end, complaining<br class="">
>                             that this invariant must hold, and that a<br class="">
>                             caller to this function, such as<br class="">
>                             make_parsable(), must first explicitly<br class="">
>                             synchronize the ends as required, to honor<br class="">
>                             that invariant.<br class="">
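>                             A minimal sketch of that proposal, with<br class="">
>                             illustrative types and alignment_reserve()<br class="">
>                             collapsed to a constant:<br class="">

```cpp
#include <cassert>
#include <cstdint>

// Sketch: the getter asserts the ends are in sync instead of silently
// fixing them up.  Callers (e.g. make_parsable()) must synchronize the
// ends first.  Names and types are illustrative, not HotSpot's.
struct TlabEndsSketch {
  uintptr_t _allocation_end = 0;       // real end for allocations
  uintptr_t _last_slow_path_end = 0;   // end seen by the last slow path
  static constexpr uintptr_t kAlignmentReserve = 16;

  void synchronize_ends() { _last_slow_path_end = _allocation_end; }

  uintptr_t reserved_end() const {
    // Invariant: a caller must have synchronized the ends already.
    assert(_allocation_end == _last_slow_path_end);
    return _allocation_end + kAlignmentReserve;
  }
};
```
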
><br class="">
><br class="">
>                         I've renamed the variables to how you preferred<br class="">
>                         it except for the _end one. I did:<br class="">
><br class="">
>                         current_end<br class="">
><br class="">
>                         last_allocation_address<br class="">
><br class="">
>                         tlab_end_ptr<br class="">
><br class="">
>                         The reason is that the architecture-dependent<br class="">
>                         code uses the thread.hpp API, and it already has<br class="">
>                         tlab included in the name, so it becomes<br class="">
>                         tlab_current_end (which is better than<br class="">
>                         tlab_current_tlab_end in my opinion).<br class="">
><br class="">
>                         I also moved the update into a separate method<br class="">
>                         with a TODO that says to remove it when<br class="">
>                         FastTLABRefill is deprecated<br class="">
><br class="">
>                     This looks a lot better now. Thanks.<br class="">
><br class="">
>                     Note that the following comment now needs updating<br class="">
>                     accordingly in threadLocalAllocBuffer.hpp:<br class="">
><br class="">
>                        41 //            Heap sampling is performed via the end/actual_end fields.<br class="">
>                        42 //            actual_end contains the real end of the tlab allocation,<br class="">
>                        43 //            whereas end can be set to an arbitrary spot in the tlab to<br class="">
>                        44 //            trip the return and sample the allocation.<br class="">
>                        45 //            slow_path_end is used to track if a fast tlab refill occurred<br class="">
>                        46 //            between slowpath calls.<br class="">
><br class="">
>                     There might be other comments too, I have not looked<br class="">
>                     in detail.<br class="">
><br class="">
>                 This was the only spot that still had an actual_end;<br class="">
>                 I fixed it now. I'll do a sweep to double-check other<br class="">
>                 comments.<br class="">
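The end/actual_end scheme described in the quoted comment can be sketched as follows. The names here (top, end, actual_end, sampled) are illustrative placeholders, not the actual HotSpot fields; the sketch only shows the mechanism of moving end below actual_end so the fast path trips into the slow path, where the sample would be taken:

```cpp
#include <cassert>
#include <cstddef>

// Minimal sketch of sampling via the end/actual_end trick.
// All names are assumptions for illustration.
struct TlabSketch {
  char* top;         // next free byte
  char* end;         // fast-path limit; may be set below actual_end
  char* actual_end;  // real end of the TLAB
  bool sampled = false;

  // Fast path: bump-pointer allocate if the request fits below 'end'.
  char* allocate(size_t size) {
    if (top + size <= end) {
      char* result = top;
      top += size;
      return result;
    }
    return slow_path(size);
  }

  // Slow path: if 'end' was artificially lowered, take the sample,
  // restore the real end, and retry the allocation.
  char* slow_path(size_t size) {
    if (end < actual_end) {
      sampled = true;   // this is where the allocation would be sampled
      end = actual_end; // stop tripping until the next sample point is set
      return allocate(size);
    }
    return nullptr;     // TLAB genuinely full; the VM would refill here
  }
};
```

A real implementation would also need slow_path_end-style bookkeeping to detect fast TLAB refills between slow-path calls, which this sketch omits.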
><br class="">
><br class="">
><br class="">
>                                 Not sure it's better but before updating<br class="">
>                                 the webrev, I wanted to try<br class="">
>                                 to get input/consensus :)<br class="">
><br class="">
>                                 (Note hard_end was always further off<br class="">
>                                 than end).<br class="">
><br class="">
>                                     src/hotspot/share/prims/jvmti.xml:<br class="">
><br class="">
>                                     10357       <capabilityfield id="can_sample_heap" since="9"><br class="">
>                                     10358         <description><br class="">
>                                     10359           Can sample the heap.<br class="">
>                                     10360           If this capability is enabled then the heap sampling methods can be called.<br class="">
>                                     10361         </description><br class="">
>                                     10362       </capabilityfield><br class="">
><br class="">
>                                     Looks like this capability should<br class="">
>                                     not be "since 9" if it gets integrated<br class="">
>                                     now.<br class="">
><br class="">
>                                 Updated now to 11, crossing my fingers :)<br class="">
><br class="">
>                                     src/hotspot/share/runtime/heapMonitoring.cpp:<br class="">
><br class="">
>                                        448       if (is_alive->do_object_b(value)) {<br class="">
>                                        449         // Update the oop to point to the new object if it is still alive.<br class="">
>                                        450         f->do_oop(&(trace.obj));<br class="">
>                                        451<br class="">
>                                        452         // Copy the old trace, if it is still live.<br class="">
>                                        453         _allocated_traces->at_put(curr_pos++, trace);<br class="">
>                                        454<br class="">
>                                        455         // Store the live trace in a cache, to be served up on /heapz.<br class="">
>                                        456         _traces_on_last_full_gc->append(trace);<br class="">
>                                        457<br class="">
>                                        458         count++;<br class="">
>                                        459       } else {<br class="">
>                                        460         // If the old trace<br class="">
>                                     is no longer live, add it to</blockquote></div></div></blockquote></div></div></div></blockquote></div></div></div></blockquote></div></div></div></div></blockquote></div></div></div></blockquote></div></div></div></div></div>
</div></blockquote></div><br class=""></div></body></html>