<AWT Dev> [8] Review request for 8015730: PIT: On Linux, OGL=true and fbobject=false leads to deadlock or infinite loop

Artem Ananiev artem.ananiev at oracle.com
Wed Jul 3 09:35:26 PDT 2013

On 7/3/2013 7:46 PM, Anthony Petrov wrote:
> On 07/03/2013 06:47 PM, Artem Ananiev wrote:
>> On 7/1/2013 8:50 PM, Anthony Petrov wrote:
>>> Thanks for the additional information, Anton. Since this fix simply
>>> reverts the behavior in GLXSurfaceData.c back to the pre-8005607 era, it
>>> could probably be considered a good interim solution for the problem.
>>> I'd like to hear Artem's opinion on this, though. Should we file a P4
>>> bug to investigate the issue further so that in the future we could
>>> avoid calling XSync() w/o the AWTLock?
>> I raised exactly the same concerns about calling XSync and waiting on
>> the AWT lock, when we discussed this issue offline with Anton. Before
>> the fix for 8005607, XSync() was called without AWT lock, which is not
>> what the current fix is: now XSync() is not called at all. So behavior
>> will be different than it was before 8005607.
> I believe there's a misunderstanding. With the current fix (8015730)
> XSync is being called:
> src/solaris/native/sun/java2d/opengl/GLXSurfaceData.c
>>  398     XSync(awt_display, False);

OK, got it.

> All other usages of the EXEC_WITH_XERROR_HANDLER() pass TRUE to the
> RESTORE_XERROR_HANDLER(), and hence call XSync(), too (in
> XErrorHandlerUtil.java) with or without the current fix...
>> Anton, did you investigate, why XSync() was in the native macros before
>> 8005607? Note that all the XErrorHandler code in XAWT worked fine
>> without XSync() before that fix, so why does the native Java2D require
>> this?
> This isn't correct. XSYnc() is really required when using an
> XErrorHandler to ensure that the request that may generate an error is
> sent to the X server. While this doesn't ensure if the request gets
> processed, it's nevertheless better (and works in most practical cases)
> than going w/o an XSync() call.

I looked to the webrev for 8005607 once again and found XSync() was 
there before that fix. And you're right, it's required to flush Xlib quene.

OK, so I currently don't see a way to improve the current fix. It may be 
possible to rewrite OGL render queue synchronization, but it's out of 
scope of this fix.



> --
> best regards,
> Anthony
>> Thanks,
>> Artem
>>> --
>>> best regards,
>>> Anthony
>>> On 07/01/2013 07:24 PM, Anton Litvinov wrote:
>>>> Hello Anthony,
>>>> Thank you for the review of this fix. I would like to remark that this
>>>> deadlock is a regression of the fix for the bug 8005607, and in the
>>>> code
>>>> of the file "jdk/src/solaris/native/sun/java2d/opengl/GLXSurfaceData.c"
>>>> before 8005607 fix, where the previous XError handling mechanism not
>>>> involving "sun.awt.X11.XErrorHandlerUtil" class was used, native
>>>> "XSync()" function was called without acquiring of AWT lock. So a fix
>>>> for the current bug with a deadlock just reverted a part of the fix
>>>> 8005607 which enforced taking AWT lock from the function
>>>> "Java_sun_java2d_opengl_GLXSurfaceData_initPbuffer" in the file
>>>> "jdk/src/solaris/native/sun/java2d/opengl/GLXSurfaceData.c". Answers to
>>>> your questions are provided below.
>>>> 1. "AWT EventQueue" holds AWT lock and waits till "Java2D Queue
>>>> Flusher"
>>>> thread finishes its job, because in the method
>>>> "sun.java2d.opengl.OGLSurfaceData.initSurface(final int width, final
>>>> int
>>>> height)" execution of "initSurfaceNow(int width, int height)" is
>>>> dispatched to "Java2D Queue Flusher". Before this dispatching in the
>>>> method "initSurface" AWT lock is taken by the lines
>>>>      308        OGLRenderQueue rq = OGLRenderQueue.getInstance();
>>>>      309        rq.lock();
>>>> and then with held AWT lock "AWT EventQueue" thread starts waiting on
>>>> the second lock "sun.java2d.opengl.OGLRenderQueue.flusher" in the
>>>> method
>>>> "sun.java2d.opengl.OGLRenderQueue.QueueFlusher.flushNow()"
>>>>      181                    wait();
>>>> 2. Yes, I investigated the option of waiting on AWT lock instead of
>>>> "sun.java2d.opengl.OGLRenderQueue.flusher" lock in the class
>>>> "sun.java2d.opengl.OGLRenderQueue", but this is impossible, because
>>>> access to the always running thread "Java2D Queue Flusher" should be
>>>> synchronized on some lock other than AWT lock, otherwise there will
>>>> be a
>>>> performance degradation, because it will be trying to get AWT lock each
>>>> 100 milliseconds. As I understood a possible solution for this problem
>>>> can be not locking on AWT lock before dispatching execution of any code
>>>> to "Java2D Queue Flusher" or complete refactoring of locking mechanism
>>>> in the class "sun.java2d.opengl.OGLRenderQueue". Since the current bug
>>>> blocks SQE from running any tests involving OpenGL and does not
>>>> allow to
>>>> run any Java GUI application with enabled OpenGL rendering on Linux OS,
>>>> I suppose the variant of refactoring is not acceptable. That is why as
>>>> the most secure solution I decided just to call XSync() from
>>>> "jdk/src/solaris/native/sun/java2d/opengl/GLXSurfaceData.c" as it was
>>>> before the fix for 8005607.
>>>> Thank you,
>>>> Anton
>>>> On 7/1/2013 3:11 PM, Anthony Petrov wrote:
>>>>> Hi Anton,
>>>>> I'm not sure if this a good fix since it enabled the GL thread to call
>>>>> Xlib APIs w/o acquiring the AWTLock. This may not present a problem
>>>>> currently since we know exactly when this method is called and that
>>>>> another thread is holding the lock and isn't calling other X11
>>>>> functions at the moment. But I doubt this knowledge will be widely
>>>>> known and remembered in the future, and if another thread starts
>>>>> calling X11 routines, we'll get into trouble...
>>>>> Why would another thread (the AWT EventQueue if I got the problem
>>>>> right) hold the AWTLock and wait till the GL thread finishes its job?
>>>>> I'd assume it should release the lock for the period of waiting. This
>>>>> would allow the GL thread to acquire the lock and perform the XSync()
>>>>> call w/o any potential issues. Have you investigated this option?
>>>>> --
>>>>> best regards,
>>>>> Anthony
>>>>> On 06/28/2013 09:16 PM, Anton Litvinov wrote:
>>>>>> Hello,
>>>>>> Could you please review the following fix for a bug, which consists
>>>>>> in a
>>>>>> deadlock provoked by concurrency between AWT-EventQueue and Java2D
>>>>>> Queue
>>>>>> Flusher for taking AWT lock, when OpenGL rendering is enabled.
>>>>>> Bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8015730
>>>>>> Webrev: http://cr.openjdk.java.net/~alitvinov/8015730/webrev.00
>>>>>> The fix allows the code from the native function
>>>>>> "Java_sun_java2d_opengl_GLXSurfaceData_initPbuffer" of the file
>>>>>> "jdk/src/solaris/native/sun/java2d/opengl/GLXSurfaceData.c" execute
>>>>>> all
>>>>>> XError handling procedures using "sun.awt.X11.XErrorHandlerUtil"
>>>>>> class
>>>>>> without acquiring AWT lock. It is the only available solution for
>>>>>> this
>>>>>> problem, because the current design of
>>>>>> "sun.java2d.opengl.OGLRenderQueue" class does not allow to take AWT
>>>>>> lock
>>>>>> in Java2D Queue Flusher thread without reaching a deadlock, since all
>>>>>> calls to the method
>>>>>> "sun.java2d.opengl.OGLRenderQueue.flushAndInvokeNow(Runnable r)" are
>>>>>> guarded by AWT lock.
>>>>>> Thank you,
>>>>>> Anton

More information about the awt-dev mailing list