Request for review (M): 7016112 CMS: crash during promotion testing
joel.franck at oracle.com
Tue Jul 5 04:56:26 PDT 2011
On 06/23/2011 03:46 AM, Coleen Phillimore wrote:
> http://cr.openjdk.java.net/~brutisso/7016112/webrev.01/src/share/vm/utilities/quicksort.cpp.html <http://cr.openjdk.java.net/%7Ebrutisso/7016112/webrev.01/src/share/vm/utilities/quicksort.cpp.html>
> 46 bool aIsOdd = a % 2 == 1;
> 47 bool bIsOdd = b % 2 == 1;
Sorry for being late, but this does not work for negative numbers. Also
all the tests are on arrays of positive numbers. Perhaps this does not
matter that much because it is testing code ...
JVM Sustaining Engineering
> Can you add () around the bit operations? Since the order precedence of
> these operators is always surprising (at least to me).
> I like the way you set the pattern for the internal testing framework.
> Thanks for getting this started.
> When we move these methodOops out of permgen, we might want to go back
> to stdlib::quicksort. It would be less code for us to maintain.
> Otherwise, this looks pretty straightforward and I like this fix.
> On 6/21/2011 5:32 PM, Bengt Rutisson wrote:
>> Hi again,
>> For completeness. Here is the graph for sorting maximum length arrays
>> on Linux x64 (run on my laptop). These runs show that my
>> implementation takes twice as long as the stdlib version. I am not
>> happy about that, but I don't know how much effort it is worth to
>> optimize for this case.
>> On 2011-06-21 14:50, Bengt Rutisson wrote:
>>> Hi Runtime and GC,
>>> Sending this review request to both groups. I fixed a GC bug, but the
>>> changes are in runtime code.
>>> The bug that I fixed is this one:
>>> 7016112 CMS: crash during promotion testing
>>> And here is the webrev:
>>> The investigation to find the root of the crashes reported in 7016112
>>> was quite lengthy. It is not that easy to read the CR and figure out
>>> what is going on, so here is some background:
>>> When we load classes we store references to the methods of a class in
>>> an object array. When we have loaded all methods we sort the object
>>> array to allow binary search within the array. To do this sort we use
>>> stdlib::qsort(), which is the standard library quicksort implementation.
>>> If we are using CMS we might be doing concurrent marking while we are
>>> sorting the object array. The object array can be found by the
>>> concurrent marking code and it may start iterating over the array
>>> while we are sorting it. The problem is that on Windows the
>>> stdlib::qsort() is not implemented to do atomic updates of elements
>>> in the array that it sorts. Instead it does a byte-by-byte swap when
>>> it needs to swap two values. That is an easy way to implement
>>> different element widths, but it means that at some point in time one
>>> element may contain a few bytes from the element above or below. If
>>> this happens at the same time as the marking code is trying to read
>>> that element, we will be marking some random address and not the
>>> method that was supposed to be marked.
>>> On Solaris and Linux the stdlib::qsort() implementations try to swap
>>> as wide data types as possible so this issue should not occur there.
>>> On the other hand we have no guarantees that this will always be how
>>> stdlib::qsort() is implemented.
>>> After some discussions about different ways of solving this we came
>>> to the conclusion that the simplest way is to implement our own
>>> quicksort that operates on the correct pointer width (oop or narrowOop).
>>> So, this is what I have done to fix this bug.
>>> Also, it is likely that this problem will go away when the perm gen
>>> removal project is finished. Right now it looks like we will not be
>>> tracing and marking methods at all after that change.
>>> * Questions *
>>> - Should we keep the bubble sort that is done before calling
>>> quicksort in methodOopDesc::sort_methods() ?
>>> - Should we keep the idempotent option or should we try to always use
>>> idempotent sorting (see performance test below)?
>>> - What is the best way to handle unit tests? I added a flag called
>>> ExecuteInternalVMTests to run unit tests. This is in line with the
>>> existing ErrorHandlerTest flag. My thought is that we can use this
>>> same flag for other unit tests than just the quicksort tests. Would
>>> be good if we could get these tests executed by JPRT as well. I
>>> simply run these with "java -XX:+ExecuteInternalVMTests -version".
>>> * Testing *
>>> Did the obvious testing: Ran JPRT with the changes in the webrev and
>>> ran the failing nsk test from the bug
>>> (nsk/sysdict/vm/stress/jck12a/sysdictj12a008) repeatedly for sevaral
>>> days without failing.
>>> I created some unit tests for the quicksort implementation and they
>>> all pass.
>>> I also made a build that sorts both with my own quicksort and with
>>> stdlib::qsort and then compares that the arrays have the same sort
>>> order. Ran this through JPRT and it seems to work on all platforms.
>>> That run also included the unit tests. If anybody wants to see how
>>> this testing was done, there is a separate webrev for that build. The
>>> interesting code is in methodOop.cpp:
>>> * Performance *
>>> I am a bit unsure how to get any relevant performance data for this
>>> change. What I have done so far is to create a class that has 65535
>>> methods (which is the maximum - the length of the method array is u2)
>>> and I have measured how long it takes to sort this method array. The
>>> methods have random names but every hundredth method has 4 overloaded
>>> version. This makes sure that there are some duplicates in the array.
>>> For now I have run this on my Windows x64 work station with 4 cpus
>>> and on a Solaris Sparc machine with 2 cpus (uname says: SunOS
>>> sthsparc24 5.10 Generic_139555-08 sun4us sparc FJSV,GPUZC-M Solaris).
>>> I am attaching graphs for the results. The Y-axis has time in nano
>>> seconds. Judging by this my quicksort is a bit faster on Windows and
>>> a bit slower on Solaris. But the interesting thing is that the
>>> idempotent version is faster than the default behavior on Windows and
>>> on par with the default on Solaris. I assume that this is due to the
>>> fact that some stores can be avoided if we don't do swap of
>>> duplicates. This means that (at least on Windows) swap is more
>>> expensive than compare. If this is true we should probably remove the
>>> special treatment of idempotent.
>>> I could run this on more machines, but I am not sure how relevant
>>> this type of data is. Most method arrays will be much shorter and
>>> compared to reading the class from a file the sort will be in the noise.
>>> Long email...I hope I covered most of the issues here. Let me know if
>>> you have any questions.
More information about the hotspot-runtime-dev