From linuxhippy at gmail.com Sat Mar 1 03:05:16 2008 From: linuxhippy at gmail.com (Clemens Eisserer) Date: Sat, 1 Mar 2008 12:05:16 +0100 Subject: [OpenJDK 2D-Dev] DRAFT PROPOSAL: Implement XRender pipeline for Java2D In-Reply-To: <47C89CA3.9060603@Sun.COM> References: <194f62550802290947o600adab5x66e2fd0c1026eac@mail.gmail.com> <47C89CA3.9060603@Sun.COM> Message-ID: <194f62550803010305q196ff8d8g3525f57ca6130312@mail.gmail.com> Hi Dmitri, > this looks like a good project. > We currently have no solution for the remote X > case, XRender pipeline may be a good answer. XRender would also have many advantages when used on a local X-Server... Scaling, transformations, antialiasing, composition ... :) > The requirements are simple: be at least as fast, stable > and produce very similar rendering as the X11 pipeline =) > It also should work on at least the same platforms > as openjdk - meaning Solaris and Linux. Of course, these are also my goals. In general it should be faster than the X11 pipeline, and also faster than the OpenGL pipeline for simple primitives. > Some notes - if this project is to be integrated into > the openjdk at some point as the default pipeline on unix > (with the current X11 pipeline as backup) it will have to meet > very stringent requirements in terms of performance, > stability and rendering quality. I don't think that I can reach integration-quality within the few months the challenge is running, at least I would not have the heart to do so in such a short timeframe (it should have been extensive tested and benchmarked before integration). But of course my goal is to produce something working very well, which can be used out-of-the box by almost everybody with a modern X-Server. However as you know I am very interested to get this integrated in the long-term, so I won't stand still after the challenge. > I can be your contact point for the project, I'll > point you to other people if needed. 
But unfortunately > we're all super-busy now, so don't expect immediate > responses.. That would be really great, thanks a lot :) Thanks a lot for being that patient, lg Clemens 2008/3/1, Dmitri Trembovetski : > > Hi Clemens, > > We currently have no solution for the remote X > case, XRender pipeline may be a good answer. > > Some notes - if this project is to be integrated into > the openjdk at some point as the default pipeline on unix > (with the current X11 pipeline as backup) it will have to meet > very stringent requirements in terms of performance, > stability and rendering quality. > > The requirements are simple: be at least as fast, stable > and produce very similar rendering as the X11 pipeline =) > > It also should work on at least the same platforms > as openjdk - meaning Solaris and Linux. > > I can be your contact point for the project, I'll > point you to other people if needed. But unfortunately > we're all super-busy now, so don't expect immediate > responses.. > > Thanks, > > Dmitri > > > Clemens Eisserer wrote: > > Hi, > > > > Also sorry for cross-posting, I thought it wouldn't be a bad idea to > > let you guys on the 2d-dev list know about my proposal. Sorry for the > > traffic. > > > > I would be very happy about suggestions and ideas on how to enhance the > > proposal as well as maybe some patronage at > > challenge-discuss at openjdk.java.net. > > I have to admit the best that I could imagine is something like a > > supporter directly from Sun, like what "Charles Oliver Nutter" did > > for the Better-native-Interface-Proposal ;) > > > > Thanks for reading, lg Clemens > > > > --------------------------------------------------------------------------------------- > > 1.) Introduction > > The more advanced features of Java2D are used in a lot of new key > > technologies like JavaFX, Nimbus, SVG, Arith and even native Swing > > themes like the GtkLookAndFeel and without any doubt their use will > > grow further.
> > Since its introduction, Java2D has supported very advanced rendering > > operations, although only very basic operations were accelerated by > > hardware, because the underlying APIs were very poor. > > However software-rendering leads to low performance and bad user > > experience in this case. > > > > Modern 3D graphic APIs, like OpenGL or Direct3D, allow most of > > Java2D's functionality to be mapped without the need for expensive read-backs > > and/or software rendering. > > To make that functionality available to the Java platform two new > > "pipelines" were created: > > The OpenGL pipeline for non-Windows platforms, and the Direct3D > > pipeline for the Microsoft Windows operating systems only. > > However OpenGL on Unix in general has some disadvantages compared to > > X11 for typical desktop applications: because of poor drivers it is > > often not possible to use that new functionality and Java falls back > > to the default X11 rendering pipeline. > > > > An Xrender backend for Java2D would provide almost the same features > > as the existing OpenGL pipeline, however would run on a much larger > > set of hardware and would integrate better into existing X11 based > > desktops. > > With the Xrender backend developers and users could rely on almost the > > same set of functionality accelerated by default on all platforms > > supported currently by OpenJDK, e.g. JavaFX animations could run at > > almost the same speed everywhere. > > > > > > 2.) Status Quo > > The existing X11 pipeline used on Unix systems is only capable of > > accelerating basic operations and falls back to software rendering > > even for often used operations like antialiasing (even for text!) or > > translucent images, often causing expensive VRAM readbacks. > > Because of X11's very limited rendering capabilities Xrender was > > introduced back in 2001, and after a long delay the first drivers are > > available which accelerate large areas of Xrender.
> > > > XRender makes it possible to accelerate most of Java2D's functionality, and it is > > considered to be the de-facto standard for advanced 2D graphics, > > already used by a large set of libraries and toolkits like Cairo, > > GTK+, Qt4 and KDE4. > > Because of the broad and extensive use of Xrender there is currently > > major work ongoing to modify drivers for better Xrender performance. > > > > There is already an OpenGL backend for Java2D which is about as > > powerful as its Direct3D counterpart on Windows. > > Unlike on Windows, where Microsoft starting with Windows Vista > > more or less deprecated all interfaces except Direct3D, and therefore > > forces GPU/driver manufacturers to create drivers well suited also for > > 2D rendering, on Unix good > > OpenGL drivers are still rare and so far only proprietary drivers are > > able to run the OpenGL pipeline. > > However those proprietary drivers are often optimized for heavy-weight 3D > > applications like games and are not really suited for rendering > > light-weight simple 2D applications. > > > > A small overview of the advantages/disadvantages Xrender has > > compared to OpenGL: > > > > + Lower per-primitive overhead: > > Because Xrender is a lot simpler than OpenGL, there is less > > work for the accelerating driver to do for each primitive rendering > > operation. This is especially important for complex shapes which are > > transformed to many rectangles which then are passed to the underlying > > graphic API. > > In short: Because Xrender focuses on only a small set of > > "simple" operations, driver implementers can focus on making those few > > operations as fast as possible. > > > > + Support for higher-level primitives: > > Xrender has support for client-side font rendering, which means glyphs > > which are cached in VRAM can be shared across different applications. > > For OpenGL each application has to upload its glyphs into a private > > texture, wasting large amounts of VRAM.
> > Because it was designed with 2D graphics in mind many features can be > > directly mapped to Xrender without the need for complex shaders. > > > > + Reliable AntiAliasing: > > Because Xrender specifies exactly how antialiasing has to be done, > > Java2D can rely on Xrender's antialiasing capabilities. The Direct3D > > and OpenGL pipelines render an alpha mask in software, upload it to > > VRAM and let the GPU do only the final composition step. > > Because of this, the amount of data transferred between the Xserver > > and the Java application would be reduced a lot, making antialiasing > > usable even over slow remote connections. > > > > + Better driver support: > > On Linux only proprietary OpenGL implementations provide enough > > functionality to let the OpenGL pipeline run. However even those > > proprietary drivers were/are buggy in the code-paths exercised by the > > OpenGL pipeline - sometimes new drivers even broke the OpenGL > > pipeline. > > For Xrender there are already many optimized drivers, both open-source > > (Intel, ATI, ...) and proprietary (Nvidia). Because of the high > > adoption rate of Xrender lately (KDE4, Cairo, GTK+, ...) it can be > > expected that soon more and more drivers will support Xrender > > acceleration. > > > > + Better desktop integration: > > Xrender applications are "native" X11 applications, and therefore work > > well with composition managers. They also get visuals optimized for > > 2D, and do not waste VRAM on additional buffers that are allocated > > but never used. > > > > + Better network transparency: > > Although OpenGL can also be made network-transparent via GLX, the > > OpenGL API was not designed to perform especially well in a network > > environment. > > X11/XRender relies on the Client-Server approach and therefore is > > optimized for exactly this case. > > > > - Not programmable: > > A disadvantage of Xrender is that it is not as flexible as OpenGL.
> > OpenGL allows shaders to be used to implement features not provided by > > fixed functionality; the only solution for an Xrender-based backend is > > to fall back to software rendering. > > > > > > 3.) Goals > > The goal of this proposal is to make use of Xrender for accelerating > > even advanced Java2D functionality. > > This would allow multimedia applications/content like JavaFX to take > > advantage of modern hardware, and would bring JDK6Update10 graphic > > features also to X11/Unix based platforms. > > > > Some of the most important enhancements compared to the X11 pipeline include: > > * Accelerated text rendering for normal, antialiased and subpixel > > antialiased text. > > * Support for translucent images > > * Composition (colors with alpha values, ...) > > * Antialiasing > > > > A long-term goal would be to make the existing X11 pipeline obsolete > > once the XRender backend has proven to be stable, removing the burden > > of maintaining two backends with similar purposes. > > > > > > 4.) Approach > > XRender has a lot in common with OpenGL/Direct3D and is a very > > low-level API which supports only a limited number of drawing > > primitives but it is very flexible in the operations it > > can apply to those primitives. Therefore the approach would be to > > create a brand new backend based on the shared code developed for the > > OpenGL/Direct3D pipelines in JDK6-u10. > > The advantage of this approach would be a "clean start", as well as a > > lot of shared code between OpenGL/Direct3D/XRender backends. > > The downside is that some highly optimized primitives which are > > supported by X11's core drawing (Arcs, Ellipses, ..) may perform > > slightly worse. Furthermore new code needs a lot of testing. > > > > > > 5.) Milestones > > 5.1) Create a skeleton which is able to use the shared OpenGL/Direct3D code > > 5.2) Implement only enough functionality to make > > software-rendering-loops work.
> > 5.3) Extend the created skeleton, accelerate basic operations -> Make > > typical Swing/Ocean applications work well. > > 5.4) Accelerate more advanced functionality like gradients, > > composition and antialiasing -> Accelerate most of Java2Demo, make the > > Nimbus LnF work well. > > 5.5) Extensive profiling and compatibility testing, using self-written > > tests, Sun's jtreg tests and real-world tests, reporting performance and > > implementation problems back to driver programmers. Make sure large > > graphical applications work well. > > > > > > 6.) Developer > > Clemens Eisserer is a student at the Technical University of Vienna > > (Information Technology) and: > > * Has submitted patches and regression tests to jdk-collaboration > > (plugin, webstart) and OpenJDK (Java2D, Corelibs) > > * Implemented JLayeredPane for SwingWT: http://swingwt.sourceforge.net/ > > * Ported Kaffe's XAWT to GCJ: http://sf.net/projects/xawt > > * Is working on an open-source remote-swing project: > > http://juibrowser.sourceforge.net/ > > * Compiled and packaged a free JVM for the Nokia770 internet tablet. > > * Is working on enhancing GTK's theme API and later on adapting > > GtkLookAndFeel to take advantage of those enhancements: > > http://bugzilla.gnome.org/show_bug.cgi?id=515600 > > * Finished some commercial projects written in Java with focus on > > client-server architectures > > > > > > 7.) Links, References, ... 
> > http://keithp.com/~keithp/render/ > > http://www.nabble.com/Best-way-organizing-XRender-additions-for-the-X11-pipeline--td14152122.html > > http://www.phoronix.com/scan.php?page=article&item=934&num=1 > > http://bugs.sun.com/view_bug.do?bug_id=5086814 (RFE: Java 2D: consider > > Xrender for anti-aliased text on remote X11) > > http://bugs.sun.com/view_bug.do?bug_id=6307603 ([X11] Use RENDER > > extension for complex operations done in software) > From phil.race at sun.com Fri Mar 7 12:18:23 2008 From: phil.race at sun.com (phil.race at sun.com) Date: Fri, 07 Mar 2008 20:18:23 +0000 Subject: [OpenJDK 2D-Dev] hg: jdk7/2d/jdk: 6640532: Graphics.getFontMetrics() throws NullPointerException Message-ID: <20080307201835.DBD71264F5@hg.openjdk.java.net> Changeset: 52f4ad84d5f0 Author: prr Date: 2008-03-07 12:13 -0800 URL: http://hg.openjdk.java.net/jdk7/2d/jdk/rev/52f4ad84d5f0 6640532: Graphics.getFontMetrics() throws NullPointerException Summary: NIO usage needs to be robust against Thread.interrupt() Reviewed-by: tdv ! src/share/classes/sun/font/FontManager.java + test/java/awt/font/Threads/FontThread.java From roman.kennke at aicas.com Mon Mar 10 01:28:21 2008 From: roman.kennke at aicas.com (Roman Kennke) Date: Mon, 10 Mar 2008 09:28:21 +0100 Subject: [OpenJDK 2D-Dev] [PATCH] SurfaceManagerFactory In-Reply-To: <47C48232.50009@Sun.COM> References: <1204044836.7087.42.camel@mercury> <47C453C7.9060402@Sun.COM> <1204052914.7087.53.camel@mercury> <47C48232.50009@Sun.COM> Message-ID: <1205137701.13416.11.camel@mercury> Hi, > > This makes sense, because BufImgVolatileSurfaceManager is required for > > all targets. > > Yep, you're right, that should cover it. > > I'd wait for another reviewer and if there are no > objections we could integrate it. Any progress on this one? Cheers, Roman -- Dipl.-Inform. 
(FH) Roman Kennke, Software Engineer, http://kennke.org aicas Allerton Interworks Computer Automated Systems GmbH Haid-und-Neu-Straße 18 * D-76131 Karlsruhe * Germany http://www.aicas.com * Tel: +49-721-663 968-0 USt-Id: DE216375633, Handelsregister HRB 109481, AG Karlsruhe Geschäftsführer: Dr. James J. Hunt From Dmitri.Trembovetski at Sun.COM Mon Mar 10 20:32:42 2008 From: Dmitri.Trembovetski at Sun.COM (Dmitri Trembovetski) Date: Mon, 10 Mar 2008 20:32:42 -0700 Subject: [OpenJDK 2D-Dev] [PATCH] SurfaceManagerFactory In-Reply-To: <1205137701.13416.11.camel@mercury> References: <1204044836.7087.42.camel@mercury> <47C453C7.9060402@Sun.COM> <1204052914.7087.53.camel@mercury> <47C48232.50009@Sun.COM> <1205137701.13416.11.camel@mercury> Message-ID: <47D5FD5A.6050805@Sun.COM> Roman, I'm sorry, I'm completely swamped with a high priority bug. I'll try to file a bug and a code review tomorrow. Dmitri Roman Kennke wrote: > Hi, > >>> This makes sense, because BufImgVolatileSurfaceManager is required for >>> all targets. >> Yep, you're right, that should cover it. >> >> I'd wait for another reviewer and if there are no >> objections we could integrate it. > > Any progress on this one? > > Cheers, Roman > From Dmitri.Trembovetski at Sun.COM Fri Mar 14 13:16:18 2008 From: Dmitri.Trembovetski at Sun.COM (Dmitri Trembovetski) Date: Fri, 14 Mar 2008 13:16:18 -0700 Subject: [OpenJDK 2D-Dev] [PATCH] SurfaceManagerFactory In-Reply-To: <47D5FD5A.6050805@Sun.COM> References: <1204044836.7087.42.camel@mercury> <47C453C7.9060402@Sun.COM> <1204052914.7087.53.camel@mercury> <47C48232.50009@Sun.COM> <1205137701.13416.11.camel@mercury> <47D5FD5A.6050805@Sun.COM> Message-ID: <47DADD12.3010507@Sun.COM> OK, I've submitted the bug (6675596) and applied the fix, and in the process of testing it. I can tell that someone didn't build on windows since the build fails there =) The fix should have deleted src/windows/classes/sun/java2d/SurfaceManagerFactory.java just like it did the solaris/ one. 
In general, if the fix touches shared code it is very advisable to build on all platforms - at least on 32-bit solaris/linux/windows. Thanks, Dmitri Dmitri Trembovetski wrote: > > Roman, I'm sorry, I'm completely swamped with a high > priority bug. I'll try to file a bug and a code review > tomorrow. > > Dmitri > > > Roman Kennke wrote: >> Hi, >> >>>> This makes sense, because BufImgVolatileSurfaceManager is required for >>>> all targets. >>> Yep, you're right, that should cover it. >>> >>> I'd wait for another reviewer and if there are no >>> objections we could integrate it. >> >> Any progress on this one? >> >> Cheers, Roman >> From roman.kennke at aicas.com Fri Mar 14 13:49:07 2008 From: roman.kennke at aicas.com (Roman Kennke) Date: Fri, 14 Mar 2008 21:49:07 +0100 Subject: [OpenJDK 2D-Dev] [PATCH] SurfaceManagerFactory In-Reply-To: <47DADD12.3010507@Sun.COM> References: <1204044836.7087.42.camel@mercury> <47C453C7.9060402@Sun.COM> <1204052914.7087.53.camel@mercury> <47C48232.50009@Sun.COM> <1205137701.13416.11.camel@mercury> <47D5FD5A.6050805@Sun.COM> <47DADD12.3010507@Sun.COM> Message-ID: <1205527747.9971.26.camel@mercury> Hi Dmitri, > OK, I've submitted the bug (6675596) and applied the > fix, and in the process of testing it. Great, thanks! > I can tell that someone didn't build on windows > since the build fails there =) The fix should have deleted > src/windows/classes/sun/java2d/SurfaceManagerFactory.java > just like it did the solaris/ one. Whoops, did I forget that? I was sure I deleted it locally, so somehow it didn't make it into the patch. Sorry for that. > In general, if the fix touches shared code it is > very advisable to build on all platforms - at > least on 32-bit solaris/linux/windows. The problem is, I don't have a Windows box (easily) available. I will have to setup one at work, including all the build machinery for OpenJDK. Thanks Dmitri, Roman -- Dipl.-Inform. 
(FH) Roman Kennke, Software Engineer, http://kennke.org aicas Allerton Interworks Computer Automated Systems GmbH Haid-und-Neu-Straße 18 * D-76131 Karlsruhe * Germany http://www.aicas.com * Tel: +49-721-663 968-0 USt-Id: DE216375633, Handelsregister HRB 109481, AG Karlsruhe Geschäftsführer: Dr. James J. Hunt From Dmitri.Trembovetski at Sun.COM Fri Mar 14 13:55:42 2008 From: Dmitri.Trembovetski at Sun.COM (Dmitri Trembovetski) Date: Fri, 14 Mar 2008 13:55:42 -0700 Subject: [OpenJDK 2D-Dev] [PATCH] SurfaceManagerFactory In-Reply-To: <1205527747.9971.26.camel@mercury> References: <1204044836.7087.42.camel@mercury> <47C453C7.9060402@Sun.COM> <1204052914.7087.53.camel@mercury> <47C48232.50009@Sun.COM> <1205137701.13416.11.camel@mercury> <47D5FD5A.6050805@Sun.COM> <47DADD12.3010507@Sun.COM> <1205527747.9971.26.camel@mercury> Message-ID: <47DAE64E.2060409@Sun.COM> Hi Roman, Roman Kennke wrote: >> I can tell that someone didn't build on windows >> since the build fails there =) The fix should have deleted >> src/windows/classes/sun/java2d/SurfaceManagerFactory.java >> just like it did the solaris/ one. > > Whoops, did I forget that? I was sure I deleted it locally, so somehow > it didn't make it into the patch. Sorry for that. No problem, I should have spotted it during review anyway. I'm just not used to reviewing by looking at the raw diff output, we use that webrev tool for generating html for code reviews, much easier on the eyes. >> In general, if the fix touches shared code it is >> very advisable to build on all platforms - at >> least on 32-bit solaris/linux/windows. > > The problem is, I don't have a Windows box (easily) available. I will > have to setup one at work, including all the build machinery for > OpenJDK. I myself use vmware to compile on windows if needed- then you'd only need a windows license (and the compilers - although someone did compile the jdk with the free VS2008, I believe). 
Of course, if you'll be touching any hw-acceleration related stuff, you'll need a real box to test on, so you may as well get the box. Thanks, Dmitri From linuxhippy at gmail.com Sat Mar 22 16:42:17 2008 From: linuxhippy at gmail.com (Clemens Eisserer) Date: Sun, 23 Mar 2008 00:42:17 +0100 Subject: [OpenJDK 2D-Dev] Thread-Private RenderBuffers for RenderQueue? Message-ID: <194f62550803221642j16b3bfeaw394c9119f492d605@mail.gmail.com> Hello, 1.) While studying RenderQueue from the STR based 2d pipelines I noticed synchronization happening for every command put in the queue. I don't know whether tuning this is worth anything (guess not), but are there cases which make thread-private RenderBuffers impossible (maybe with a limit of buffers allocated)? Many threads rendering to the same surface don't have any guarantees anyway when an operation is performed. Or is ReentrantLock so fast that it really does not make any sense worrying about? 2.) I have JDK7b19, I am not completely sure whether the sync()-method is 100% thread-safe, or whether this is ok in that case. The field "theInstance" is written to in a synchronized-block (in getInstance()) but in sync() it is accessed without synchronization. It may appear null (as far as I understand) in one thread while another has already set it. Don't know whether this could be a problem? Thanks, Clemens From linuxhippy at gmail.com Sat Mar 22 16:53:58 2008 From: linuxhippy at gmail.com (Clemens Eisserer) Date: Sun, 23 Mar 2008 00:53:58 +0100 Subject: [OpenJDK 2D-Dev] Thread-Private RenderBuffers for RenderQueue? In-Reply-To: <194f62550803221642j16b3bfeaw394c9119f492d605@mail.gmail.com> References: <194f62550803221642j16b3bfeaw394c9119f492d605@mail.gmail.com> Message-ID: <194f62550803221653i1f18d106s4e6443d04bd1be28@mail.gmail.com> Sorry for the confusion in 2. , I meant OGLRenderQueue.sync(). lg Clemens 2008/3/23, Clemens Eisserer : > Hello, > > 1.) 
While studying RenderQueue from the STR based 2d pipelines I > noticed synchronization happening for every command put in the queue. > > I don't know whether tuning this is worth anything (guess not), but are > there cases which make thread-private RenderBuffers impossible (maybe > with a limit of buffers allocated)? Many threads rendering to the same > surface don't have any guarantees anyway when an operation is > performed. Or is ReentrantLock so fast that it really does not make > any sense worrying about? > > 2.) I have JDK7b19, I am not completely sure whether the sync()-method > is 100% thread-safe, or whether this is ok in that case. > The field "theInstance" is written to in a synchronized-block (in > getInstance()) but in sync() it is accessed without synchronization. It > may appear null (as far as I understand) in one thread while another > has already set it. Don't know whether this could be a problem? > > Thanks, Clemens > From Dmitri.Trembovetski at Sun.COM Mon Mar 24 09:44:51 2008 From: Dmitri.Trembovetski at Sun.COM (Dmitri Trembovetski) Date: Mon, 24 Mar 2008 09:44:51 -0700 Subject: [OpenJDK 2D-Dev] Thread-Private RenderBuffers for RenderQueue? In-Reply-To: <194f62550803221642j16b3bfeaw394c9119f492d605@mail.gmail.com> References: <194f62550803221642j16b3bfeaw394c9119f492d605@mail.gmail.com> Message-ID: <47E7DA83.8050700@Sun.COM> Hi Clemens, Clemens Eisserer wrote: > Hello, > > 1.) While studying RenderQueue from the STR based 2d pipelines I > noticed synchronization happening for every command put in the queue. > > I don't know whether tuning this is worth anything (guess not), but are > there cases which make thread-private RenderBuffers impossible (maybe > with a limit of buffers allocated)? Many threads rendering to the same > surface don't have any guarantees anyway when an operation is > performed. Or is ReentrantLock so fast that it really does not make > any sense worrying about?
Since most applications do render from one thread (either the Event Queue like Swing apps, or some kind of dedicated rendering thread like games), the lock is indeed very fast, given biased locking and such. I would suggest not trying to optimize things - especially tricky ones which involve locking - until you have identified with some kind of tool that there's a problem. > 2.) I have JDK7b19, I am not completly sure wether the sync()-method > is 100% thread-safe, or wether this is ok in that case. > The field "theInstance" is written to in a synchronized-block (in > getInstance()) but in sync() its accassed without synchronization. It > may appear null (as far as I understand) in one thread while another > has already set it. Don't know wether this could be a problem? If it appears null during a sync() call, no harm is done (the sync is just ignored - which is fine given that the render queue hasn't been created yet, so there's nothing to sync), so this is not a problem. Thanks, Dmitri From linuxhippy at gmail.com Mon Mar 24 10:57:26 2008 From: linuxhippy at gmail.com (Clemens Eisserer) Date: Mon, 24 Mar 2008 18:57:26 +0100 Subject: [OpenJDK 2D-Dev] Thread-Private RenderBuffers for RenderQueue? In-Reply-To: <47E7DA83.8050700@Sun.COM> References: <194f62550803221642j16b3bfeaw394c9119f492d605@mail.gmail.com> <47E7DA83.8050700@Sun.COM> Message-ID: <194f62550803241057n167a8259w750a0075fbda2e04@mail.gmail.com> Hello, > Since most applications do render from one thread (either the > Event Queue like Swing apps, or some kind of dedicated rendering > thread like games), the lock is indeed very fast, given > biased locking and such. > > I would suggest not trying to optimize things - especially tricky > ones which involve locking - until you have > identified with some kind of tool that there's a problem. 
I did some benchmarking to find out the best design for my new pipeline, and these are the results I got: 10mio solid 1x1 rect, VolatileImage, server-compiler, Core2Duo-2ghz, Intel-945GM, Linux: 200ms no locking, no native call 650ms locking only 850ms native call, no locking 1350ms as currently implemented in X11Renderer I did rendering only from a single thread (however not the EDT), in this simple pipeline-overhead test the locking itself is almost as expensive as the "real" work (=native call), and far more expensive than an "empty" JNI call. However this was on a dual-core machine, on my single-core amd64 machine locking has much less influence. As far as I know biased locking is only implemented for monitors. Xorg ran on my 2nd core, and kept it with locking only 40% busy, without locking about 80%. However I have to admit there are probably much more important things to do than playing with things like that ;) > If it appears null during a sync() call, no harm is done (the > sync is just ignored - which is fine given that the render queue > hasn't been created yet, so there's nothing to sync), so this is > not a problem. But what does happen if it has already been created, but the thread calling sync() just does not see the updated "theInstance" value? Could there be any problem when sync()-calls are left out? lg Clemens From Dmitri.Trembovetski at Sun.COM Mon Mar 24 14:44:16 2008 From: Dmitri.Trembovetski at Sun.COM (Dmitri Trembovetski) Date: Mon, 24 Mar 2008 14:44:16 -0700 Subject: [OpenJDK 2D-Dev] Thread-Private RenderBuffers for RenderQueue? In-Reply-To: <194f62550803241057n167a8259w750a0075fbda2e04@mail.gmail.com> References: <194f62550803221642j16b3bfeaw394c9119f492d605@mail.gmail.com> <47E7DA83.8050700@Sun.COM> <194f62550803241057n167a8259w750a0075fbda2e04@mail.gmail.com> Message-ID: <47E820B0.60800@Sun.COM> Hi Clemens. 
Clemens Eisserer wrote: > Hello, > >> Since most applications do render from one thread (either the >> Event Queue like Swing apps, or some kind of dedicated rendering >> thread like games), the lock is indeed very fast, given >> biased locking and such. >> >> I would suggest not trying to optimize things - especially tricky >> ones which involve locking - until you have >> identified with some kind of tool that there's a problem. > > I did some benchmarking to find out the best design for my new > pipeline, and these are the results I got: > > 10mio solid 1x1 rect, VolatileImage, server-compiler, Core2Duo-2ghz, > Intel-945GM, Linux: > > 200ms no locking, no native call > 650ms locking only > 850ms native call, no locking > 1350ms as currently implemented in X11Renderer Did you mean OGLRenderer? The X11Renderer doesn't use single thread rendering model and thus doesn't need render queue. Note that on X11 the render queue lock is doubled as the lock against all X11 access - for both awt and 2d. We must lock around it because we all use the same display, and X11 is not multi-threaded (at least in the way we use it). This means that the lock is likely to be promoted to a heavyweight lock, which is why it is expensive. So the problem with having separate render buffers per thread is that at some point you will have to synchronize on SunToolkit.awtLock() anyway. > I did rendering only from a single thread (however not the EDT), in > this simple pipeline-overhead test the locking itself is almost as > expensive as the "real" work (=native call), and far more expensive > than an "empty" JNI call. > However this was on a dual-core machine, on my single-core amd64 > machine locking has much less influence. As far as I know biased > locking is only implemented for monitors. > Xorg ran on my 2nd core, and kept it with locking only 40% busy, > without locking about 80%. 
> > However I have to admit there are probably much more important things > to do than playing with things like that ;) You probably can explore ways to improve the current design, which only allows a single rendering queue. For example, we had discussed the possibility of extending the STR design to allow a rendering thread per destination. But again, on unix it will bump against the need to sync around X11 access. You can also play with having a render buffer per thread as you suggest, but your rendering thread will have to sync for reading from each render buffer - presumably on the same lock as the thread used to put stuff into that buffer. All doable, but risky and hard to assess the benefits before you have a working implementation. Just commenting out locks gives wrong impression, since the resulting code becomes incorrect and thus the benchmark results can't be trusted. Anyway, I would suggest that you look at optimizing this later. >> If it appears null during a sync() call, no harm is done (the >> sync is just ignored - which is fine given that the render queue >> hasn't been created yet, so there's nothing to sync), so this is >> not a problem. > But what does happen if it has already been created, but the thread > calling sync() just does not see the updated "theInstance" value? > Could there be any problem when sync()-calls are left out? If the thread calling sync() sees theInstance as null, this means that it could not have anything to sync. If there's no queue, it could not have put anything into that queue prior to calling sync(). The sync() can be safely ignored. Thanks, Dmitri From Dmitri.Trembovetski at Sun.COM Mon Mar 24 15:23:04 2008 From: Dmitri.Trembovetski at Sun.COM (Dmitri Trembovetski) Date: Mon, 24 Mar 2008 15:23:04 -0700 Subject: [OpenJDK 2D-Dev] Thread-Private RenderBuffers for RenderQueue? 
In-Reply-To: <47E820B0.60800@Sun.COM>
References: <194f62550803221642j16b3bfeaw394c9119f492d605@mail.gmail.com> <47E7DA83.8050700@Sun.COM> <194f62550803241057n167a8259w750a0075fbda2e04@mail.gmail.com> <47E820B0.60800@Sun.COM>
Message-ID: <47E829C8.40509@Sun.COM>

Chris pointed out to me that currently SunToolkit.awtLock() uses ReentrantLock,
which is supposed to have better characteristics than built-in
Java synchronization under contention.

So it would be interesting to see exactly what you were
measuring, and how.

Also, if you're doing any kind of Java2D performance testing
I would encourage you to use J2DBench as the benchmark
(it can be found in jdk/src/share/demo/J2DBench). You can plug in
new tests if the existing ones don't match what you want to test.

Thanks,
Dmitri

From Christopher.Campbell at Sun.COM Mon Mar 24 15:38:05 2008
From: Christopher.Campbell at Sun.COM (Chris Campbell)
Date: Mon, 24 Mar 2008 15:38:05 -0700
Subject: [OpenJDK 2D-Dev] Thread-Private RenderBuffers for RenderQueue?
In-Reply-To: <47E820B0.60800@Sun.COM>
References: <194f62550803221642j16b3bfeaw394c9119f492d605@mail.gmail.com> <47E7DA83.8050700@Sun.COM> <194f62550803241057n167a8259w750a0075fbda2e04@mail.gmail.com> <47E820B0.60800@Sun.COM>
Message-ID:

On Mar 24, 2008, at 2:44 PM, Dmitri Trembovetski wrote:
>
> Hi Clemens.
>
> Clemens Eisserer wrote:
>> Hello,
>>> Since most applications do render from one thread (either the
>>> Event Queue like Swing apps, or some kind of dedicated rendering
>>> thread like games), the lock is indeed very fast, given
>>> biased locking and such.
>>>
>>> I would suggest not trying to optimize things - especially tricky
>>> ones which involve locking - until you have
>>> identified with some kind of tool that there's a problem.
>> I did some benchmarking to find out the best design for my new
>> pipeline, and these are the results I got:
>> 10mio solid 1x1 rect, VolatileImage, server-compiler, Core2Duo-2ghz,
>> Intel-945GM, Linux:
>> 200ms no locking, no native call
>> 650ms locking only
>> 850ms native call, no locking
>> 1350ms as currently implemented in X11Renderer
>

BTW, Clemens, when reporting microbenchmark scores, it would be a big
help if you could use J2DBench to generate such numbers. It takes
care of running enough iterations to produce a statistically useful
number, and J2DAnalyzer helps visualize the numbers in a consistent
format (to compare relative numbers such as these).

> Did you mean OGLRenderer? The X11Renderer doesn't use the single-threaded
> rendering (STR) model and thus doesn't need a render queue.
>
> Note that on X11 the render queue lock doubles as the lock against
> all X11 access - for both AWT and 2D. We must lock around it because
> we all use the same display, and X11 is not multi-threaded (at
> least in the way we use it).
> This means that the lock is likely to be promoted to a heavyweight lock,
> which is why it is expensive.
>

That may have been the case in JDK 5, where we used the
"synchronized" keyword to manage synchronization of access to X11 in
X11Renderer and other AWT classes. But in JDK 6 you'll recall that
we reimplemented this synchronization to use ReentrantLock instead,
most importantly because it offers better performance under
contention (as is often the case with the "AWT lock"). (Yes,
"built-in" synchronization has largely caught up since then, due to
biased locking and other optimizations, but ReentrantLock is still a
nice lightweight solution.)
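As a rough illustration of the two locking styles being compared here, the following is a minimal sketch only. It is not the SunToolkit or X11Renderer code; LockSketch, MONITOR, LOCK and both method names are made up for the example. It takes the lock from a single thread, as in the test described above, so it shows the uncontended cost only, and a one-shot timing like this is far less trustworthy than a J2DBench run.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; none of these names come from the JDK sources.
class LockSketch {
    private static final Object MONITOR = new Object();           // built-in monitor
    private static final ReentrantLock LOCK = new ReentrantLock(); // explicit lock
    private static long counter;

    // Guard the shared counter with the "synchronized" keyword.
    static long withMonitor(int iterations) {
        counter = 0;
        for (int i = 0; i < iterations; i++) {
            synchronized (MONITOR) {
                counter++;
            }
        }
        return counter;
    }

    // Guard the same counter with a ReentrantLock; unlock in finally
    // so the lock is released even if the guarded code throws.
    static long withReentrantLock(int iterations) {
        counter = 0;
        for (int i = 0; i < iterations; i++) {
            LOCK.lock();
            try {
                counter++;
            } finally {
                LOCK.unlock();
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        int n = 10_000_000; // same order of magnitude as the 10mio rects above
        long t0 = System.nanoTime();
        withMonitor(n);
        long t1 = System.nanoTime();
        withReentrantLock(n);
        long t2 = System.nanoTime();
        System.out.println("synchronized:  " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("ReentrantLock: " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both loops do the same work, so any difference in the printed times comes from the locking primitive itself. The JIT may well optimize an uncontended loop like this very differently from real rendering code, which is one more reason to treat the numbers as indicative at best.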
For more on ReentrantLock, this article from Brian Goetz is still the
best summary that I've ever come across:
http://www.ibm.com/developerworks/java/library/j-jtp10264/

Oh, and hooray, I just came across the bug report that I wrote up
when moving to ReentrantLock in JDK 6, which has lots of details on
the matter:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6317330

Thanks,
Chris

From linuxhippy at gmail.com Tue Mar 25 11:06:52 2008
From: linuxhippy at gmail.com (Clemens Eisserer)
Date: Tue, 25 Mar 2008 19:06:52 +0100
Subject: [OpenJDK 2D-Dev] Thread-Private RenderBuffers for RenderQueue?
In-Reply-To:
References: <194f62550803221642j16b3bfeaw394c9119f492d605@mail.gmail.com> <47E7DA83.8050700@Sun.COM> <194f62550803241057n167a8259w750a0075fbda2e04@mail.gmail.com> <47E820B0.60800@Sun.COM>
Message-ID: <194f62550803251106t240622acj949b8c2875834076@mail.gmail.com>

Hello Chris, Hello Dmitri,

1.) Thanks for mentioning J2DBench, I'll have a look at it.

2.)
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6317330
Thanks for mentioning it, I already had a look at it.

3.) http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6253009
Mentions a deadlock problem that can occur with a separate lock for
the RenderQueue.
For my X11 pipeline it would be enough to ensure that only one thread
accesses xlib at a time; it does not always have to be the queue-flush thread. So
if I allowed sync()/flushNow() on any thread, the problem would
not exist, right?

4.)
> If the thread calling sync() sees theInstance as null, this means
> that it could not have anything to sync
As far as I understand the JMM, it could be that thread1 already
called getInstance() (which creates and sets theInstance), but
thread2 calls sync() - but sees null. I don't know whether a lost sync()
could be a problem at all.

5.)
> Anyway, I would suggest that you look at optimizing
> this later
Yes, that would probably be the best.

I was just a bit worried about which design I should choose.
The JNI overhead itself (35 cycles, Core2Duo) is so small that I am
not sure whether the whole Buffered Rendering is a win at all.
I benchmarked the switch statement which is used to decode the
command stream on my Core2Duo. Just calling the switch in a loop
already takes 20 cycles (which is quite reasonable keeping in mind the
generated table-jump puzzles the pipeline). Add the overhead of
stream-encoding and inter-thread communication, and I guess it's also
somewhere between 30-50 cycles per j2d-primitive.

However if I could remove most of the locking, which at least on my
machine seems to add a lot of overhead, this would justify the
additional code.
With thread-private buffers, and all threads allowed to flush the
queue themselves instead of relying on the queue-flush-thread to do it,
it should be possible.

Sorry for the traffic and thanks for your patience,

lg Clemens

From Dmitri.Trembovetski at Sun.COM Tue Mar 25 11:35:08 2008
From: Dmitri.Trembovetski at Sun.COM (Dmitri Trembovetski)
Date: Tue, 25 Mar 2008 11:35:08 -0700
Subject: [OpenJDK 2D-Dev] Thread-Private RenderBuffers for RenderQueue?
In-Reply-To: <194f62550803251106t240622acj949b8c2875834076@mail.gmail.com>
References: <194f62550803221642j16b3bfeaw394c9119f492d605@mail.gmail.com> <47E7DA83.8050700@Sun.COM> <194f62550803241057n167a8259w750a0075fbda2e04@mail.gmail.com> <47E820B0.60800@Sun.COM> <194f62550803251106t240622acj949b8c2875834076@mail.gmail.com>
Message-ID: <47E945DC.2030404@Sun.COM>

Hi Clemens,

Clemens Eisserer wrote:
> 3.) http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6253009
> Mentions a deadlock problem that can occur with a separate lock for
> the RenderQueue.
> For my X11 pipeline it would be enough to ensure that only one thread
> accesses xlib at a time; it does not always have to be the queue-flush thread. So
> if I allowed sync()/flushNow() on any thread, the problem would
> not exist, right?
>
> 4.)
>> If the thread calling sync() sees theInstance as null, this means
>> that it could not have anything to sync
> As far as I understand the JMM, it could be that thread1 already
> called getInstance() (which creates and sets theInstance), but
> thread2 calls sync() - but sees null. I don't know whether a lost sync()
> could be a problem at all.

It might be a problem in the unlikely scenario where an application
uses a thread only to call sync() and never calls
getInstance() from that thread.

>
> 5.)
>> Anyway, I would suggest that you look at optimizing
>> this later
> Yes, that would probably be the best.
>
> I was just a bit worried about which design I should choose.
> The JNI overhead itself (35 cycles, Core2Duo) is so small that I am
> not sure whether the whole Buffered Rendering is a win at all.
> I benchmarked the switch statement which is used to decode the
> command stream on my Core2Duo. Just calling the switch in a loop
> already takes 20 cycles (which is quite reasonable keeping in mind the
> generated table-jump puzzles the pipeline). Add the overhead of
> stream-encoding and inter-thread communication, and I guess it's also
> somewhere between 30-50 cycles per j2d-primitive.
>
> However if I could remove most of the locking, which at least on my
> machine seems to add a lot of overhead, this would justify the
> additional code.
> With thread-private buffers, and all threads allowed to flush the
> queue themselves instead of relying on the queue-flush-thread to do it,
> it should be possible.

You're welcome to implement the pipeline however you wish.

One of the main reasons behind STR was to improve the stability
of the OpenGL pipeline, since OpenGL doesn't like to be accessed
from multiple threads - the JNI overhead reduction was a
welcome benefit. The same happens to be the case for the new
Direct3D pipeline.

If your pipeline doesn't have this restriction and you
are apparently satisfied with JNI performance, don't use
STR for your pipeline. After all, the current X11 pipeline doesn't
use it, and you can still use a few tricks from it to reduce
the JNI overhead (like doing all validation on the Java level).

Thanks,
Dmitri

From aph at redhat.com Fri Mar 28 05:29:30 2008
From: aph at redhat.com (Andrew Haley)
Date: Fri, 28 Mar 2008 12:29:30 +0000
Subject: [OpenJDK 2D-Dev] Missing colour profiles
Message-ID: <47ECE4AA.9020205@redhat.com>

Just a heads-up: I'm working on

Bug 6523403, Need to provide lcms library with PYCC and LINEAR_RGB OS ICC profiles.

I'm also looking at the causes of

Bug 6523402, Some quality problems with GRAY, PYCC and CIEXYZ color spaces with lcms library

I have created a LINEAR_RGB profile using the same primaries as the
lcms sRGB profile, and I have verified that it's compatible with the
Kodak-provided one in the proprietary JDK. The PYCC profile will take
a little longer, and I've started working on that.

Andrew.