[OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

Denis Lila dlila at redhat.com
Wed Jul 21 16:12:25 UTC 2010

Hello Jim.

Thank you very much for taking the time to read through this.

> 91 - is there a reason to copy the dash array?  Theoretically if this 
> object were used in an arbitrary fashion it is good sense to protect 
> against data changing out from under it, but in practice we only ever 
> feed it the array that came from a BasicStroke object which was already 
> protected from arbitrary changes and is copied every time we fetch it so 
> this is one more unfortunately extra fetch that isn't needed in 
> practice.  (Ideally we'd instrument an extractor for BasicStroke so we
> could get the raw array without the defensive copy in
> getDashArray()...)

    You're right, I suppose there isn't. It's just the way it was
being done before, and it didn't occur to me to change it. I will,
in the next version of this patch.

> 169 - if origLen reaches the end of the dash exactly (the "==" case) 
> then shouldn't you flip the dashOn variable?  That would happen if you
> drop through to the following code (i.e. change the test to just "<"),
> but it would require a bit more calculation.  If you leave the phase 
> equal to the dash size (which is what happens now in the "==" case) then 
> the next time through the if will fail and you will end up with a second 
> goTo to the same coordinates that the previous dash ended on before 
> flipping the dashOn variable and circling back - which is a waste of a
> bunch of calculations.  Or was there some bookkeeping issue that arose 
> when the "==" case dropped through?

    You're right, I should. I can't just replace <= with == though,
because the results will be the same: in the equal case origLen will
become 0, and on the next iteration, the (origLen < dash[idx]-phase) will
be true, and we will do a goTo(x1,y1), which is what we just did in the
previous iteration (unless dash[idx] is 0, in which case the results 
will be even worse). The best solution to this is to just do a nested
check for the == case.

> 248 - is the normalize parameter used yet?  Not a complaint - I just 
> didn't see it getting used and wanted to make sure I didn't miss
> something.

    No, no yet. That's next week's work :)

> 257-261 - both Stroker and Dasher grab the t4 values into local 
> variables.  Hotspot might cover up this sin if the T4 stays in the young 
> generation, but for a long path the T4 may move into an older generation 
> and take more to collect.  Maybe it would be worth just passing in the 4 
> values to the Stroker and Dasher constructors rather than encapsulating 
> them in a T4 that is immediately stripped and ignored?

    Definitely. I never liked using the Transform4 class, and I almost
removed it in this patch (but didn't because I didn't want to change that much).
This is all the reason I need to remove it. Nothing ever uses Transform4 anyway,
except to just unpack the matrix entries, so all we get from it is slightly 
reduced parameter lists.

> 367 - you left a stats printing call in here.
Oops... sorry about that.

> I note some confusion below over comparing subpixel values to pixel 
> values.  Perhaps a convention where all subpixel values would be in 
> variables with "sub" in their name or something like that would make it 
> more maintainable (and correct in the short term).

    That's a good idea. I'll do that.

> 47,48 - any reason to move these fields around?  (Not a complaint - just 
> wondering if you had a reason.)

    No, no reason. It happened accidentally - I made many changes to this
file before settling on this, and at one point these variables weren't even there
(replaced by Math.floor and Math.ceil calls that I thought did equivalent things
to the various bit operations these are used for. After running into some bugs,
I reintroduced them to keep the changes as small as possible and eliminate these
as bug sources).

> 129 - note this could overflow, but I imagine that the bounds came from 
> a device clip which means there would have to be some drawable created 
> that was larger than MAX_INT/SUB_POS_XY.  If that is true then it 
> probably isn't worth worrying about, but a comment might help point that out.

    That's true. In fact, there are a few other places where ints could overflow
if the clip passed in to Renderer is large enough. I'll try to comment on them all.

> 157 - Personally I'd call close before I started working on the new 
> data, but that's just my coding layout style.  Also, does close() always 
> do the "right thing" if no path has been created yet?  For one thing it 
> looks like it always results in a call to "lineTo" no matter what which 
> I'm guessing "happens to" fall into the short-circuit for horizontal 
> lines in lineTo, but I haven't verified it.  In fact I think I see a 
> problem in that firstOrientation is not set to 0 and so if you have 2 
> moveto's in a row then the second one will call close() and find that 
> the orientations don't match and increment flips.  That isn't a critical 
> problem since flips is only used to allocate memory, but it would 
> allocate more than needed in that case. (not to mention emitting 
> unnecessary linetos)  Suggestion - set firstOrientation to 0 in moveto 
> and test for firstOrientation==0 in close() so that close() doesn't add 
> any geometry to something that has no deflections yet.

    Good idea - I'll do that.

> 171 - Aren't x0,y0 stored as subpix values?  You would then be comparing 
> a subpix value to a non-subpix value.  Perhaps if the subpix calls are 
> moved to the top of the function I think this should work OK?

    That's true, they are. This is very puzzling. If a horizontal line is
added, when the crossings for it are being computed, dxBydy should be NaN, and
wouldn't an error be thrown when we try to cast to an int in the call to addCrossing?

> 194,197 - Shouldn't these be constants, or based on the SUB_POS_XY?

    I suppose I should make a biasing constant. I don't think they should be based
on SUB_POS_XY though, because the biasing is done to subpixel coordinates so
there is no danger that if our supersampling is fine enough the biasing will 
make the coordinates jump over some scan line.

> 216 - if you save the sx0,sy0 before subpix'ing them then you don't have 
> to "unsubpix" them here.  (Though you still need the subpix sy0 for line 
> 209 - or you could just call subpix on it for that line and at least
> you'd save 1 call to sub/unsub).

    Ok. I'll just save subpixed and unsubpixed versions of sx0, sy0. That should
eliminate all sx0,sy0 related calls to tosubpix and topix except in moveTo.

> 256,264 - casting to int is problematic.  It truncates towards 0 which 
> means negatives are ceil'd and positives are floor'd.  It would be best 
> to use floor here instead.  On the other hand, if negative numbers are 
> always "off the left side of the drawable" then this is moot.

    That's why I left it at int casting. Do you still think I should change it
to floor?

> 597 - you deleted the call to arraycopy?  D'oh! (Actually, 
> Arrays.newSize() is faster than "allocate and copy" so it couldn't hurt 
> to switch to it here.  Also, expanding by 10% seems a little 
> conservative for large edge lists.  I'd rather see "double, but never 
> more than 10*INIT_EDGES" or something like that.  This may be more 
> "limited memory on mobile" thinking here.

    Ah, sorry about that. Before making the patch I was going through the files
to remove any println calls used for debugging and things of that nature. I
must have pushed dd one too many times.
Speaking of which, is there a good way to edit and build openJDK from eclipse? 
Then this sort of embarrassing error could be avoided (including the printStats() call).
    As for Arrays.newSize()... I can't find it here: 
Is this a new function added in 7?

> 721 - Arrays.sort()

    I thought about using this, but I did some measurements, and it turns out that 
Arrays.sort() is a bit slower if the portion of the array being sorted has fewer
than about 70 elements.
On the other hand, when there are fewer than 70 elements the difference is negligible,
and we only really need performance as "flips" becomes larger, in which case the difference
between insertion sort and quicksort starts growing pretty quickly.
Ok, I'll use Arrays.sort()

> How comfortable do you feel with that conversion? 

I'll try to do it and include it in a new patch along with, hopefully, a better way
to iterate through strips, and especially crossings. Right now all the iteration
state is saved in global variables. This is... not good. I spent far too much time last
week on bugs caused by this sort of thing. Ideally, any members that can't be put at 
the top of a class (like our edge and crossing data) should be put in classes of their own.

Do you have any ideas about how to iterate through edges in a strip without going through
every edge? I was thinking of maybe using some sort of tree to split the drawing surface,
but I haven't given it much thought.

Thanks again,

----- "Jim Graham" <james.graham at oracle.com> wrote:

> Hi Denis,
> This is awesome!  Thanks for doing this.
> I agree with all of your comments below.  Here are some thoughts on
> the 
> new code:

> ---- Warning: potential can of worms ----
> BTW, I never traced this through to see why it was needed, but I've seen 
> a lot of renderers get afraid of vertices on pixel boundaries for 
> similar reasons, but most of them are wrong because they miss the fact 
> that the last coordinate on a line should be ignored by the scanline 
> counts because of the rule of "any pixel on a horizontal edge should be 
> outside the path if the area right below it is outside".  Basically, we 
> fill from x0 up to but not including x1 and also from y0 up to but not 
> including y1.  Thus, only y0 would trigger a scanline increment and y1 
> would never do so.  If the following segment immediately heads up then 
> both of those segments would "end" on the same scanline and be ignored. 
>   If the following segment continues down then the first does not 
> trigger a scanline crossing, but the second does.  If we can make sure 
> the bookkeeping is done that way in the rest of the Renderer then we can 
> get rid of this fudge factor entirely.  I discuss some of these changes 
> below, but in the end, 5 or 6 methods would need their math updated to 
> make this work right and I'm not sure if I've described it well enough 
> for you to make all the needed changes.  Perhaps this could be a 
> follow-on project, but it would greatly simplify a lot of the code in
> here.
> ---- End can of worms ----

> 236,237 - here is what I was talking about above (can of worms).  minY 
> and maxY should probably both be ceil'd and then maxY would be the first 
> scanline "not" processed and then the problem above about double 
> crossings at the scanlines wouldn't be an issue.
> 243 - "minY >= maxY" if the above is done (can of worms).
> 262 - "y < maxY" if the above is done (can of worms).

> 317,318 - blech (can of worms) - looks like more math designed for the 
> "minY up to and including maxY" approach - I far prefer the "minY up to, 
> but not including maxY" approach as mentioned above.  ;-) 
> _endRendering(), setCrossingsExtents, computeCrossingsForEdge (already 
> mentioned above), computeBounds, and renderStrip() would all need to be 
> updated for that.  How comfortable do you feel with that conversion? 
> 611 - (can of worms) "ceil >= ceil" if the above scanline crossing 
> change is done.  OK, enough for now on harping on the can of worms.

> Stroker.java
> - I didn't get to this file yet.  I'm going to send these comments
> along 
> so you can take a look at them and then review Stroker tomorrow...
> 			...jim
> Denis Lila wrote:
> > Hello again.
> > 
> > I know I promised this last week, and I'm sorry it's late, but some
> nasty 
> > bugs popped up. The webrev is at: 
> > http://icedtea.classpath.org/~dlila/webrevs/floatingPoint/webrev/
> > 
> > I took this opportunity to make some changes that aren't related to
> floating
> > point conversion, since this effort wasn't really directed at
> solving any one
> > particular bug, but towards general refactoring and improving of
> Pisces (although
> > it does solve a certain bug).
> > 1. 
> >     I removed some write-only variables in Renderer and Stroker.
> > 
> > 2. 
> >     I removed Dasher and Stroker's ability for the same object to be
> used with
> > more than one output, transform, width, etc. Jim Graham said we
> should consider
> > removing the no argument constructor for Dasher in a previous e-mail
> (which
> > is equivalent to doing what I've done since you can't have this
> functionality
> > without a no argument constructor and methods like setParameters and
> setOutput).
> > I thought this was a good idea because this functionality was never
> being used,
> > it clutters things up (among other things, it necessitates making
> the clients 
> > call setOutput and setParameters before the object can be used. This
> sort of
> > thing is not a good idea since methods should be as self-contained
> as possible),
> > and even if it is used, all it does is save us the creation of some
> objects. Since
> > a huge number of these objects would never be needed and since they
> are not 
> > very expensive to create (Stroker is the biggest, and it has about
> 38 members), 
> > this is a premature optimization.
> > 
> > 3.
> >     (2) allowed me to declare a whole bunch of members final (things
> like output,
> > lineWidth, transformation matrix entries and anything that can't
> change).
> > 
> > 4.
> >     I changed the crossing sorting algorithm in Renderer from bubble
> sort to 
> > insertion sort. Not a huge difference, but mine is shorter, it
> should perform
> > slightly better, and I think it the algorithm is easier to
> understand.
> > 
> > 5.
> >     I inserted comments on things which used to confuse me. Please
> check these -
> > when reading code, there is nothing worse than a wrong explanation
> in a comment.
> > 
> > 6.
> >     I removed the if(false &&... block in Renderer. I tried to make
> it work some
> > time ago, but it was complicated and more trouble than it was worth
> - it would
> > only be used when filling a rectangle, and not even a rectangle that
> was part of
> > some path. The rectangle had to be the only thing being rendered.
> This is fast enough
> > without being treated as a special case.
> > 
> > I think that's it...
> > As for testing: I tested this fairly thoroughly. I used dashed
> strokes, various 
> > affine transformations, lines longer than 2^16, and complicated
> GeneralPaths.
> > Everything looks good.
> > 
> > I also did some performance measurements. Everything is faster than
> it used to be,
> > although AA rendering is still much slower than closed source java.
> One of the
> > problems here is that when rendering large objects, the strips in
> Renderer:_endRendering
> > will be small, and _endRenderer's complexity, not taking into
> account methods it calls,
> > is O(numStrips*numEdges), because for every strip it has to go
> through what's left of the edge 
> > list. A better way to iterate through edges that are in a strip is
> needed.
> > 
> > NOTE: I did not make changes (2) and (3) to Renderer.java because I
> don't have time
> > today, and I wanted to put out something fairly polished as soon as
> possible. Also,
> > I think Renderer needs quite a bit more refactoring than just that.
> In particular the 
> > way it stores and iterates through edges, scalines, and crossings
> needs to be improved.
> > Right now it is too error-prone, and many variables have a far
> larger scope than
> > they should.
> > 
> > I would very much appreciate it if anyone could look over my
> changes, or comment
> > on the last two paragraphs above.
> > 
> > Thank you, 
> > Denis. 
> > 
> > ----- "Denis Lila" <dlila at redhat.com> wrote:
> > 
> >> Hello.
> >>
> >> And, I just finished it. At least it compiled successfully. I'm
> sure
> >> there
> >> are a few runtime bugs. I'll try to have a webrev out by today.
> >>
> >> Regards,
> >> Denis.
> >>
> >> ----- "Jim Graham" <james.graham at oracle.com> wrote:
> >>
> >>> Hi Denis,
> >>>
> >>> float operations are pretty fast and they are usually done in a
> >>> separate
> >>> part of the processor so the compiler can schedule a lot of
> >>> bookkeeping
> >>> instructions to run in parallel with waiting for the results of
> the
> >> FP
> >>> instruction.  In the end, they often end up being "free" if there
> >> are
> >>> enough bookkeeping instructions (branches, fetching data, etc.)
> in
> >>> amongst the data.
> >>>
> >>> Given how many alternate instructions you are pointing out for
> the
> >>> fixed
> >>> point code it would be very likely that this would be a "win".
> >>>
> >>> The main reason that Pisces is implemented heavily in fixed point
> is
> >>> that it was originally written for the mobile platform where
> there
> >> are
> >>> no FP instructions.  We don't have to worry about that on the
> >> desktop
> >>> (any more).
> >>>
> >>> I strongly support you converting things to fp if you want to do
> >>> it...
> >>>
> >>> 			...jim
> >>>
> >>> On 7/12/2010 8:05 AM, Denis Lila wrote:
> >>>> Hello.
> >>>>
> >>>> Is it ok if I do this? I already started working on it on
> Friday.
> >>>> I think I should be done by tomorrow.
> >>>>
> >>>> But yes, I agree that we should convert to floating point. As
> for
> >>>> performance, there's also the fact that right now we're trading
> >>>> one double multiplication for 2 casts to long, 1 long
> >>> multiplication,
> >>>> 1 bit shift of a long, and 1 cast back to an int. I don't know
> >> much
> >>>> about how these are done in hardware, but it doesn't seem like
> >>> they'd
> >>>> be faster than the double multiplication.
> >>>>
> >>>> As for large coordinates, there's a bug report about it (one not
> >>>> reported by me :) ) here:
> >>> https://bugzilla.redhat.com/show_bug.cgi?id=597227
> >>>> I submitted a matching bug report on bugs.sun.com (ID 6967436),
> >> but
> >>> I
> >>>> can't find it when I search for it.
> >>>>
> >>>> Denis.
> >>>>
> >>>> ----- "Jim Graham"<james.graham at oracle.com>  wrote:
> >>>>
> >>>>> Sigh...
> >>>>>
> >>>>> Pisces could really stand to upgrade to floats/doubles
> >> everywhere,
> >>> for
> >>>>> several reasons:
> >>>>>
> >>>>> - FPU ops are likely as fast if not faster on modern hardware
> due
> >>> to
> >>>>> parallel execution of FP instructions alongside regular
> >>> instructions.
> >>>>> - Don't have to worry about getting coordinates larger than 32K
> >> (I
> >>>>> don't
> >>>>> think this is well tested, but I imagine that Pisces will not
> >> deal
> >>>>> with
> >>>>> it very gracefully).
> >>>>>
> >>>>> - Lots of code is greatly simplified not having to deal with
> the
> >>>>> semantics of how to do fixed point multiplies, divides, and
> >>>>> conversions.
> >>>>>
> >>>>> I meant to do this during the original integration, but I
> wanted
> >>> to
> >>>>> get
> >>>>> the task done as quickly as possible so that we could have an
> >> open
> >>>>> source alternative to the closed Ductus code so I left that
> task
> >>> for a
> >>>>> later update.  But, now that a lot of work is being done on the
> >>> code
> >>>>> to
> >>>>> fix it up, it might be better to do the float conversion now so
> >>> that
> >>>>> the
> >>>>> maintenance is easier and before we end up creating a lot of
> new
> >>> fixed
> >>>>> point code.
> >>>>>
> >>>>> My plate is full right now, but hopefully I can interest
> someone
> >>> else
> >>>>> in
> >>>>> doing a cleanup cycle?  (Donning my Tom Sawyer hat... ;-)
> >>>>>
> >>>>> 			...jim

More information about the 2d-dev mailing list