[OpenJDK 2D-Dev] Pisces AA renderer performance improvement.
denilila at gmail.com
Wed Aug 24 03:36:38 UTC 2011
Resending: the previous e-mail had a large webrev.zip attachment. I am now
hosting the webrev on Dropbox:
On Tue, Aug 23, 2011 at 4:55 PM, Denis Lila <denilila at gmail.com> wrote:
> Hi Jim.
>> - I'd like to see the Renderer cache implemented in the RenderingEngine
>> code since it is outside the scope of the work that Renderer does.
>> Also, your table could fill up with partially used renderers if there
>> are ever any bugs that cause it to throw an exception before it calls
>> giveBack(). I'd like to see some code that tries to make sure they are
>> always given back even if errors happen (or, at least, they are removed
>> from the data structures if we feel that they aren't appropriate to
>> reuse after the error).
> I put it in PiscesRenderingEngine.java. I also implemented a cache eviction
> scheme so that it does not consume all the memory. I tried to handle exceptions
> like the case you mentioned.
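For illustration, the "always give back, even on error" pattern discussed above could be sketched roughly like this. All names here (RendererPool, getRenderer, giveBack, the broken flag) are hypothetical and not the actual PiscesRenderingEngine code; the point is only the try/finally plus an eviction check for renderers left in a bad state:

```java
import java.util.ArrayDeque;

// Hypothetical sketch, not the actual Pisces code: a pool that always
// receives its renderer back via finally, and evicts ones that failed.
final class RendererPool {
    static final class Renderer {
        boolean broken;              // set if an exception interrupted rendering
        void render(Runnable work) {
            try {
                work.run();
            } catch (RuntimeException e) {
                broken = true;       // don't reuse a renderer left mid-operation
                throw e;
            }
        }
    }

    private final ArrayDeque<Renderer> pool = new ArrayDeque<>();

    synchronized Renderer getRenderer() {
        Renderer r = pool.poll();
        return (r != null) ? r : new Renderer();
    }

    synchronized void giveBack(Renderer r) {
        if (!r.broken) {             // drop renderers that threw mid-render
            pool.push(r);
        }
    }

    // Caller pattern: finally guarantees giveBack runs even if render throws.
    void withRenderer(Runnable work) {
        Renderer r = getRenderer();
        try {
            r.render(work);
        } finally {
            giveBack(r);
        }
    }
}
```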
> Question: do you think it would be a good idea to create only
> renderers with heights
> that are powers of two? For example, right now if we get rendering
> requests for surfaces
> with heights 600, 700, 800 (in that order) we would create 3 renderers
> and put them
> in the cache. It might be better if we "rounded up" 600 -> 1024. Then we would
> only create one renderer for all 3 operations. This would save memory
> in this case.
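The rounding proposed above is a one-liner with `Integer.highestOneBit`; the class and method names below are illustrative, not part of the patch:

```java
// Sketch of the proposed cache key: round a requested surface height up
// to the next power of two, so heights 600, 700 and 800 all share one
// 1024-tall renderer instead of creating three.
final class RendererHeights {
    static int roundUpToPowerOfTwo(int h) {
        int p = Integer.highestOneBit(h);  // largest power of two <= h
        return (p == h) ? h : p << 1;      // already a power of two? keep it
    }
}
```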
>> - Renderer, line 158 - "while (--toadd >= 0)" allows the compiler to use
>> the condition codes from the decrement to satisfy the test.
>> - Why not just have 1 tracker that manages TILE_SIZE<<SUB_POS values
>> instead of TILE_SIZE trackers each of which manages SUB_POS values?
>> Unfortunately that means that EvenOdd would have to keep an array
>> (because 32*8 is more bits than any single primitive type we have), but
>> since we already need to have an array of the EvenOdd trackers, we
>> aren't any worse off and in fact we are ahead on storage because
>> currently you have 32*32 bits per set of trackers (plus overhead for 32
>> tracker objects).
>> - or and line don't really need a full 32-bits. They could share
>> storage, or they could be stored in byte arrays, or they could even
>> share a byte array.
>> - if you store last_subpixy instead of numys then you wouldn't have to
>> keep storing the new number back in the arrays - it would become a
>> read-only value.
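For readers following along, Jim's single-tracker idea (one even-odd parity tracker over all TILE_SIZE * SUB_POS subpixel rows, backed by a long[] since 32*8 = 256 bits exceeds any primitive) could look roughly like this. The constants and names are illustrative, not Renderer's actual fields:

```java
// Hypothetical sketch of one even-odd tracker spanning a whole tile's
// subpixel rows, with parity bits packed into a long[].
final class EvenOddTracker {
    static final int TILE_SIZE = 32, SUB_POS = 8;
    private final long[] bits = new long[(TILE_SIZE * SUB_POS + 63) / 64];

    void flip(int row) {                   // toggle parity at a crossing
        bits[row >> 6] ^= 1L << (row & 63);
    }

    boolean inside(int row) {              // odd number of crossings so far?
        return (bits[row >> 6] & (1L << (row & 63))) != 0;
    }
}
```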
>> - I think it would be worth it to amortize the removal of dead pointers.
>> All it would take would be to store a NULL in the spot and set a
>> boolean, then they could all be collapsed out in one pass at the end.
>> To get even more creative, if you sort them the other way around and
>> then scan from count=>0 then you could just copy the value "ptrs[i] =
>> ptrs[--count]" and let the insertion sort at the start of the next pixel
>> row deal with it.
> Done (except sorting the other way around. I'm not sure what would
> be gained by this).
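The amortized removal Jim describes (null out dead pointers during the scan, then collapse them in one backward pass with "ptrs[i] = ptrs[--count]") can be sketched as below; the array layout is illustrative, not Renderer's actual edge storage:

```java
// Hypothetical sketch: compact out nulled (dead) entries in one pass,
// scanning backwards so each hole is filled from the live tail.
final class DeadPointerCompaction {
    static int compact(Object[] ptrs, int count) {
        for (int i = count - 1; i >= 0; i--) {
            if (ptrs[i] == null) {
                ptrs[i] = ptrs[--count];   // move last live entry into the hole
                ptrs[count] = null;        // clear the vacated tail slot
            }
        }
        return count;                      // new number of live entries
    }
}
```

As Jim notes, the entries end up out of order, and the insertion sort at the start of the next pixel row absorbs that cost.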
>> - Another way to manage the merge sort would be to have a temp array of
>> SUB_PIX_Y values, quickly store all of the crossings into it either
>> forwards or backwards in a pair of tight loops (one for slope >0 and one
>> for slope <0) and then do a regular merge sort in a second tight loop.
> I already tried the "temp array" approach. It's slightly worse than the current one.
>> If you use the values at [ci,ci+toadd) as the temp array then one test
>> could skip the merge sort step entirely because the values would already
>> be sorted (and this is likely to happen). My thoughts were that 2
>> tighter loops might be faster than one more complicated loop.
> If we do this, we could indeed skip the merge step in some cases, but in cases
> where sorting is actually needed, we wouldn't be able to use merge sort
> anymore, unless we still introduced a merge buffer.
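To make the trade-off concrete: the "skip the merge when already sorted" test is just a comparison at the boundary between the old prefix and the new crossings, and the fallback merge is the part that needs the extra buffer Denis mentions. A rough sketch, assuming both runs are individually sorted (names are illustrative, not Renderer's actual fields):

```java
// Hypothetical sketch: a[0..prefixLen) holds existing sorted crossings,
// a[prefixLen..prefixLen+added) holds the newly computed sorted crossings.
// If the boundary pair is ordered, the whole run is sorted and the merge
// (and its temp buffer) can be skipped entirely.
final class CrossingMerge {
    static void mergeSorted(int[] a, int prefixLen, int added) {
        int total = prefixLen + added;
        if (prefixLen == 0 || a[prefixLen - 1] <= a[prefixLen]) {
            return;                        // common case: already sorted, merge is free
        }
        int[] tmp = new int[total];        // the merge buffer the text refers to
        int i = 0, j = prefixLen, k = 0;
        while (i < prefixLen && j < total) {
            tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        }
        while (i < prefixLen) tmp[k++] = a[i++];
        while (j < total)     tmp[k++] = a[j++];
        System.arraycopy(tmp, 0, a, 0, total);
    }
}
```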
> I can't find anywhere to host the webrev, so I attached it. I hope that's ok.