Scene graph performance
johan.vos at gluonhq.com
Tue Jul 26 11:30:51 UTC 2016
I agree with most of what has been said here, but I want to add a few things.
I've seen a number of "performance issues" with JavaFX in customer
projects. Most of them are simply due to the application doing things on
the FX App Thread that shouldn't be done there.
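To make that concrete, here is a minimal sketch of the pattern using only
JDK classes. The single-thread executor named UI is a stand-in for the FX
App Thread, and OffThreadSketch, loadData() and refresh() are hypothetical
names made up for this illustration. In a real JavaFX application you would
more likely use javafx.concurrent.Task or Platform.runLater to get back onto
the FX App Thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class OffThreadSketch {
    // Stand-in for the FX App Thread: a single thread that must stay responsive.
    static final ExecutorService UI =
            Executors.newSingleThreadExecutor(r -> daemon(r, "ui"));
    // Background pool for anything slow.
    static final ExecutorService WORKERS =
            Executors.newFixedThreadPool(2, r -> daemon(r, "worker"));

    static Thread daemon(Runnable r, String name) {
        Thread t = new Thread(r, name);
        t.setDaemon(true); // do not keep the JVM alive for this sketch
        return t;
    }

    // Hypothetical slow operation (file I/O, network call, heavy computation).
    static String loadData() {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "42 rows";
    }

    // The slow part runs on a worker; only the cheap "show the result" step
    // runs on the UI stand-in.
    static CompletableFuture<String> refresh() {
        return CompletableFuture
                .supplyAsync(OffThreadSketch::loadData, WORKERS)
                .thenApplyAsync(
                        data -> Thread.currentThread().getName() + " shows " + data,
                        UI);
    }
}
```

Calling OffThreadSketch.refresh().join() yields "ui shows 42 rows": the
100 ms wait happened on a worker, and the UI stand-in only handled the
finished result.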
Apart from these obvious things, it is indeed mainly the CSS and layout
processing that consumes most of the time. I spent lots of time working on
mobile performance, and in that context the CPU limitation is even more
pronounced, as the GPU on modern mobile devices is pretty good. I don't
have enough different devices, but to give an idea about numbers: when
running a typical app, the GPU load on my desktop is about the same as the
GPU load on my Nexus 5, but the CPU load on the Nexus is 5-10 times higher
than on desktop. I measure this by comparing numbers in the pulseLogger.
Hence, on mobile the limitation that only a single thread can do the
CSS/layout phase AND the rasterisation is really a bottleneck.
However, there are many tricks that can make it better. Unlike e.g.
Android and iOS, the JavaFX APIs still allow for lots of flexibility that
can be very helpful, but that can also easily kill performance. Caching is
one of those things. If you cache the right Node, you'll gain a lot. But if
you cache a Node that often becomes dirty, you hurt performance.
There is no general solution for this, I believe. The way to address this
is to create components that work for specific use cases (e.g. cache the
content of a pane while animating, and don't cache it once it might change
again). This is what we do in the Gluon Mobile components.
Complex CSS and layout changes are CPU hungry as well. About the latter: be
careful about bounds that vary (e.g. start with a negative offset) without
any visual consequence, as this will increase the dirty region.
Profiling is the most important thing to do when your application is slow.
Find the bottleneck, fix it, go to the next bottleneck. Each application
has its own bottlenecks, but with enough time, I really believe all
applications can at least be improved.
One of the great things about JavaFX is the decoupling between the API and
the rendering pipeline. If someone writes a renderer that works completely
differently from the existing ones, that is fine. All applications should be
able to run with this.
One of the things I'm thinking about is to see if/how Vulkan would be a
good alternative to OpenGL. But that would not speed up the single-threaded
bottleneck, so I applaud all initiatives in that area (e.g. do some
parallel CSS processing).
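As a rough idea of what parallelising a tree walk (such as CSS or bounds
computation) could look like, here is a sketch that uses fork/join to
compute the union of bounds over a toy tree. ToyNode and BoundsTask are
hypothetical classes invented for this illustration and are not part of
JavaFX. The real scene graph has dirty flags, transforms and threading
rules that this ignores, so treat it as an illustration of the technique
only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical stand-in for a scene graph node; not a JavaFX class.
final class ToyNode {
    final double minX, minY, maxX, maxY; // this node's own bounds
    final List<ToyNode> children = new ArrayList<>();

    ToyNode(double minX, double minY, double maxX, double maxY) {
        this.minX = minX;
        this.minY = minY;
        this.maxX = maxX;
        this.maxY = maxY;
    }

    // Adds a child and returns this node, so calls can be chained.
    ToyNode add(ToyNode child) {
        children.add(child);
        return this;
    }
}

// Computes the union of bounds of a subtree, one subtask per child.
final class BoundsTask extends RecursiveTask<double[]> {
    private final ToyNode node;

    BoundsTask(ToyNode node) {
        this.node = node;
    }

    @Override
    protected double[] compute() {
        double[] b = { node.minX, node.minY, node.maxX, node.maxY };
        List<BoundsTask> tasks = new ArrayList<>();
        for (ToyNode child : node.children) {
            tasks.add(new BoundsTask(child));
        }
        invokeAll(tasks); // children may run in parallel on other workers
        for (BoundsTask t : tasks) {
            double[] c = t.join();
            b[0] = Math.min(b[0], c[0]);
            b[1] = Math.min(b[1], c[1]);
            b[2] = Math.max(b[2], c[2]);
            b[3] = Math.max(b[3], c[3]);
        }
        return b; // {minX, minY, maxX, maxY} of the whole subtree
    }

    static double[] boundsOf(ToyNode root) {
        return ForkJoinPool.commonPool().invoke(new BoundsTask(root));
    }
}
```

This only pays off when the per-node work is heavy enough to outweigh the
task overhead, which is one reason such changes would need careful
measurement.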
On Fri, Jul 22, 2016 at 2:09 AM Scott Palmer <swpalmer at gmail.com> wrote:
> > On Jul 21, 2016, at 6:18 PM, Richard Bair <richard.bair at oracle.com> wrote:
> > Hi Steve,
> > It could be a benchmark problem, although I wouldn’t be surprised at all
> if the benchmark was exercising the platform in some way that was CPU
> limited. Assuming it is CPU limited (and not going multi-core), I think the
> problem really comes down to what Markus said:
> >> The limiting factor is the single-thread architecture of rather all
> parts of JavaFX. The only real difference you see between machines
> correlates with neither the number of CPU cores nor GPU cores, but only with
> CPU frequency, roughly speaking. Short term fixes will only provide little
> improvement, by optimizing the critical execution path (i. e. produce hot
> spot histogram using a profiler), for example improving clipping,
> caching, etc. Huge performance optimizations need an architectural change
> within JavaFX's "scenegraph-to-bitmapframe" (a.k.a. rendering) pipeline to
> use parallel execution in lots of places. Typical design patterns would be
> parallel iterations, work-stealing executors, fibers (a.k.a cooperative
> multi-threading, a.k.a CompletableFuture), and last but not least
> partitioned rendering (a.k.a tiled rendering).
> >> I am pretty sure you can add a lot more ideas to the list and produce
> great performance, scaling linearly with number of CPU cores / GPU cores,
> but this comes at a cost: Risk to introduce hard to track bugs, and needed
> >> If somebody has at least a lot of free spare time, I am pretty sure
> Kevin could easily provide a huge set of work items in this area. :-)
> > JavaFX was setup to be multi-threaded — in fact there are always at
> least 2 threads — the application / scene graph thread and the render
> thread. Going way, way back the goal was for multi-core
> computation/rasterizing on the NG side (Prism), but it didn’t get done for
> a variety of reasons. I couldn’t even say what kind of performance win/loss
> it would bring. I’m sure for some workloads it would be way better, but for
> many others it probably wouldn’t make any difference. A lot of other more
> pressing features had to be implemented first which would allow people to
> build apps at all on top of FX (like controls and effects and animations
> and so forth), and Prism has served us really well.
> > There are a few places we could play with fork/join to see if we can get
> performance boosts, all of which would be tricky and have to be done very
> carefully because they are part of highly tuned code paths:
> > Computing and applying CSS styles
> > Computing bounds
> > Computing layout
> ^ That. Computing layout and CSS is the most time consuming aspect of my
> I get pauses of more than a second or two in some cases for only a few
> thousand nodes.
> Recently JDK-8153329 was fixed which helped a little (e.g. 3-4 seconds to
> add a panel full of controls to my UI down to 2-3 seconds)… but it led me
> to believe that these highly tuned code paths could use more tuning.
> Adding or removing nodes can be very expensive. I suspect changes in bounds
> can be very expensive. And it seems that Path drawing is expensive.
> That’s what would help my application the most.