[OpenJDK 2D-Dev] Various fixes to pisces stroke widening code
james.graham at oracle.com
Wed Jul 21 03:46:16 UTC 2010
This is awesome! Thanks for doing this.
I agree with all of your comments below. Here are some thoughts on the
91 - is there a reason to copy the dash array? Theoretically if this
object were used in an arbitrary fashion it is good sense to protect
against data changing out from under it, but in practice we only ever
feed it the array that came from a BasicStroke object which was already
protected from arbitrary changes and is copied every time we fetch it so
this is one more unfortunately extra fetch that isn't needed in
practice. (Ideally we'd instrument an extractor for BasicStroke so we
could get the raw array without the defensive copy in getDashArray()...)
169 - if origLen reaches the end of the dash exactly (the "==" case)
then shouldn't you flip the dashOn variable? That would happen if you
drop through to the following code (i.e. change the test to just "<"),
but it would require a bit more calculation. If you leave the phase
equal to the dash size (which is what happens now in the "==" case) then
the next time through the if will fail and you will end up with a second
goTo to the same coordinates that the previous dash ended on before
flipping the dashOn variable and circling back - which is a waste of a
bunch of calculations. Or was there some bookkeeping issue that arose
when the "==" case dropped through?
248 - is the normalize parameter used yet? Not a complaint - I just
didn't see it getting used and wanted to make sure I didn't miss something.
257-261 - both Stroker and Dasher grab the t4 values into local
variables. Hotspot might cover up this sin if the T4 stays in the young
generation, but for a long path the T4 may move into an older generation
and take more to collect. Maybe it would be worth just passing in the 4
values to the Stroker and Dasher constructors rather than encapsulating
them in a T4 that is immediately stripped and ignored?
367 - you left a stats printing call in here.
I note some confusion below over comparing subpixel values to pixel
values. Perhaps a convention where all subpixel values would be in
variables with "sub" in their name or something like that would make it
more maintainable (and correct in the short term).
47,48 - any reason to move these fields around? (Not a complaint - just
wondering if you had a reason.)
129 - note this could overflow, but I imagine that the bounds came from
a device clip which means there would have to be some drawable created
that was larger than MAX_INT/SUB_POS_XY. If that is true then it
probably isn't worth worrying about, but a comment might help point that
157 - Personally I'd call close before I started working on the new
data, but that's just my coding layout style. Also, does close() always
do the "right thing" if no path has been created yet? For one thing it
looks like it always results in a call to "lineTo" no matter what which
I'm guessing "happens to" fall into the short-circuit for horizontal
lines in lineTo, but I haven't verified it. In fact I think I see a
problem in that firstOrientation is not set to 0 and so if you have 2
moveto's in a row then the second one will call close() and find that
the orientations don't match and increment flips. That isn't a critical
problem since flips is only used to allocate memory, but it would
allocate more than needed in that case. (not to mention emitting
unnecessary linetos) Suggestion - set firstOrientation to 0 in moveto
and test for firstOrientation==0 in close() so that close() doesn't add
any geometry to something that has no deflections yet.
171 - Aren't x0,y0 stored as subpix values? You would then be comparing
a subpix value to a non-subpix value. Perhaps if the subpix calls are
moved to the top of the function I think this should work OK?
194,197 - Shouldn't these be constants, or based on the SUB_POS_XY?
---- Warning: potential can of worms ----
BTW, I never traced this through to see why it was needed, but I've seen
a lot of renderers get afraid of vertices on pixel boundaries for
similar reasons, but most of them are wrong because they miss the fact
that the last coordinate on a line should be ignored by the scanline
counts because of the rule of "any pixel on a horizontal edge should be
outside the path if the area right below it is outside". Basically, we
fill from x0 up to but not including x1 and also from y0 up to but not
including y1. Thus, only y0 would trigger a scanline increment and y1
would never do so. If the following segment immediately heads up then
both of those segments would "end" on the same scanline and be ignored.
If the following segment continues down then the first does not
trigger a scanline crossing, but the second does. If we can make sure
the bookkeeping is done that way in the rest of the Renderer then we can
get rid of this fudge factor entirely. I discuss some of these changes
below, but in the end, 5 or 6 methods would need their math updated to
make this work right and I'm not sure if I've described it well enough
for you to make all the needed changes. Perhaps this could be a
follow-on project, but it would greatly simplify a lot of the code in here.
---- End can of worms ----
216 - if you save the sx0,sy0 before subpix'ing them then you don't have
to "unsubpix" them here. (Though you still need the subpix sy0 for line
209 - or you could just call subpix on it for that line and at least
you'd save 1 call to sub/unsub).
236,237 - here is what I was talking about above (can of worms). minY
and maxY should probably both be ceil'd and then maxY would be the first
scanline "not" processed and then the problem above about double
crossings at the scanlines wouldn't be an issue.
243 - "minY >= maxY" if the above is done (can of worms).
262 - "y < maxY" if the above is done (can of worms).
256,264 - casting to int is problematic. It truncates towards 0 which
means negatives are ceil'd and positives are floor'd. It would be best
to use floor here instead. On the other hand, if negative numbers are
always "off the left side of the drawable" then this is moot.
317,318 - blech (can of worms) - looks like more math designed for the
"minY up to and including maxY" approach - I far prefer the "minY up to,
but not including maxY" approach as mentioned above. ;-)
_endRendering(), setCrossingsExtents, computeCrossingsForEdge (already
mentioned above), computeBounds, and renderStrip() would all need to be
updated for that. How comfortable do you feel with that conversion?
597 - you deleted the call to arraycopy? D'oh! (Actually,
Arrays.newSize() is faster than "allocate and copy" so it couldn't hurt
to switch to it here. Also, expanding by 10% seems a little
conservative for large edge lists. I'd rather see "double, but never
more than 10*INIT_EDGES" or something like that. This may be more
"limited memory on mobile" thinking here.
611 - (can of worms) "ceil >= ceil" if the above scanline crossing
change is done. OK, enough for now on harping on the can of worms.
721 - Arrays.sort()
- I didn't get to this file yet. I'm going to send these comments along
so you can take a look at them and then review Stroker tomorrow...
Denis Lila wrote:
> Hello again.
> I know I promised this last week, and I'm sorry it's late, but some nasty
> bugs popped up. The webrev is at:
> I took this opportunity to make some changes that aren't related to floating
> point conversion, since this effort wasn't really directed at solving any one
> particular bug, but towards general refactoring and improving of Pisces (although
> it does solve a certain bug).
> I removed some write-only variables in Renderer and Stroker.
> I removed Dasher and Stroker's ability for the same object to be used with
> more than one output, transform, width, etc. Jim Graham said we should consider
> removing the no argument constructor for Dasher in a previous e-mail (which
> is equivalent to doing what I've done since you can't have this functionality
> without a no argument constructor and methods like setParameters and setOutput).
> I thought this was a good idea because this functionality was never being used,
> it clutters things up (among other things, it necessitates making the clients
> call setOutput and setParameters before the object can be used. This sort of
> thing is not a good idea since methods should be as self-contained as possible),
> and even if it is used, all it does is save us the creation of some objects. Since
> a huge number of these objects would never be needed and since they are not
> very expensive to create (Stroker is the biggest, and it has about 38 members),
> this is a premature optimization.
> (2) allowed me to declare a whole bunch of members final (things like output,
> lineWidth, transformation matrix entries and anything that can't change).
> I changed the crossing sorting algorithm in Renderer from bubble sort to
> insertion sort. Not a huge difference, but mine is shorter, it should perform
> slightly better, and I think it the algorithm is easier to understand.
> I inserted comments on things which used to confuse me. Please check these -
> when reading code, there is nothing worse than a wrong explanation in a comment.
> I removed the if(false &&... block in Renderer. I tried to make it work some
> time ago, but it was complicated and more trouble than it was worth - it would
> only be used when filling a rectangle, and not even a rectangle that was part of
> some path. The rectangle had to be the only thing being rendered. This is fast enough
> without being treated as a special case.
> I think that's it...
> As for testing: I tested this fairly thoroughly. I used dashed strokes, various
> affine transformations, lines longer than 2^16, and complicated GeneralPaths.
> Everything looks good.
> I also did some performance measurements. Everything is faster than it used to be,
> although AA rendering is still much slower than closed source java. One of the
> problems here is that when rendering large objects, the strips in Renderer:_endRendering
> will be small, and _endRenderer's complexity, not taking into account methods it calls,
> is O(numStrips*numEdges), because for every strip it has to go through what's left of the edge
> list. A better way to iterate through edges that are in a strip is needed.
> NOTE: I did not make changes (2) and (3) to Renderer.java because I don't have time
> today, and I wanted to put out something fairly polished as soon as possible. Also,
> I think Renderer needs quite a bit more refactoring than just that. In particular the
> way it stores and iterates through edges, scalines, and crossings needs to be improved.
> Right now it is too error-prone, and many variables have a far larger scope than
> they should.
> I would very much appreciate it if anyone could look over my changes, or comment
> on the last two paragraphs above.
> Thank you,
> ----- "Denis Lila" <dlila at redhat.com> wrote:
>> And, I just finished it. At least it compiled successfully. I'm sure
>> are a few runtime bugs. I'll try to have a webrev out by today.
>> ----- "Jim Graham" <james.graham at oracle.com> wrote:
>>> Hi Denis,
>>> float operations are pretty fast and they are usually done in a
>>> part of the processor so the compiler can schedule a lot of
>>> instructions to run in parallel with waiting for the results of the
>>> instruction. In the end, they often end up being "free" if there
>>> enough bookkeeping instructions (branches, fetching data, etc.) in
>>> amongst the data.
>>> Given how many alternate instructions you are pointing out for the
>>> point code it would be very likely that this would be a "win".
>>> The main reason that Pisces is implemented heavily in fixed point is
>>> that it was originally written for the mobile platform where there
>>> no FP instructions. We don't have to worry about that on the
>>> (any more).
>>> I strongly support you converting things to fp if you want to do
>>> On 7/12/2010 8:05 AM, Denis Lila wrote:
>>>> Is it ok if I do this? I already started working on it on Friday.
>>>> I think I should be done by tomorrow.
>>>> But yes, I agree that we should convert to floating point. As for
>>>> performance, there's also the fact that right now we're trading
>>>> one double multiplication for 2 casts to long, 1 long
>>>> 1 bit shift of a long, and 1 cast back to an int. I don't know
>>>> about how these are done in hardware, but it doesn't seem like
>>>> be faster than the double multiplication.
>>>> As for large coordinates, there's a bug report about it (one not
>>>> reported by me :) ) here:
>>>> I submitted a matching bug report on bugs.sun.com (ID 6967436),
>>>> can't find it when I search for it.
>>>> ----- "Jim Graham"<james.graham at oracle.com> wrote:
>>>>> Pisces could really stand to upgrade to floats/doubles
>>>>> several reasons:
>>>>> - FPU ops are likely as fast if not faster on modern hardware due
>>>>> parallel execution of FP instructions alongside regular
>>>>> - Don't have to worry about getting coordinates larger than 32K
>>>>> think this is well tested, but I imagine that Pisces will not
>>>>> it very gracefully).
>>>>> - Lots of code is greatly simplified not having to deal with the
>>>>> semantics of how to do fixed point multiplies, divides, and
>>>>> I meant to do this during the original integration, but I wanted
>>>>> the task done as quickly as possible so that we could have an
>>>>> source alternative to the closed Ductus code so I left that task
>>> for a
>>>>> later update. But, now that a lot of work is being done on the
>>>>> fix it up, it might be better to do the float conversion now so
>>>>> maintenance is easier and before we end up creating a lot of new
>>>>> point code.
>>>>> My plate is full right now, but hopefully I can interest someone
>>>>> doing a cleanup cycle? (Donning my Tom Sawyer hat... ;-)
More information about the 2d-dev