[Rev 03] RFR: 8088198: Exception thrown from snapshot if dimensions are larger than max texture size

Nir Lisker nlisker at openjdk.java.net
Sat Jan 25 19:55:44 UTC 2020

On Fri, 24 Jan 2020 17:16:13 GMT, Frederic Thevenet <github.com+7450507+fthevenet at openjdk.org> wrote:

>> I don't, to be honest. 
>> The results for some dimensions (not always the same) can vary pretty widely from one run to another, despite all my efforts to repeat results and remove outliers.
>> Out of curiosity, I also tried to eliminate the GC as a possible culprit by running it with Epsilon, but it seems to make no significant difference.
>> I ran that test on a laptop with integrated Intel graphics and no dedicated VRAM (Intel UHD Graphics 620), though, so this might be why.
>> Maybe someone could try and run the bench on hardware with a discrete GPU?
> As to why the tiling version is significantly slower, though, I do have a pretty good idea; as Kevin hinted, the pixel copy into a temporary buffer before copying into the final image is where most of the extra time is spent.
> The reason why it is so much slower is a bit of a pity, though; profiling a run of the benchmark shows that a lot of time is spent in `IntTo4ByteSameConverter::doConvert`. As it turns out, the reason for this is that, under Windows and the D3D pipeline anyway, the `WritableImage` used to collate the tiles and the tiles returned from the RTTexture have different pixel formats (IntARGB for the tile and ByteBGRA for the `WritableImage`).
> So if we could use a `WritableImage` with an IntARGB pixel format as the recipient for the snapshot (at least as long as no image was provided by the caller), I suspect that the copy would be much faster.
> Unfortunately it seems the only way to choose the pixel format for a `WritableImage` is to initialize it with a `PixelBuffer`, but then one can no longer use a `PixelWriter` to update it, and it doesn't seem to me that there is a way to safely access the `PixelBuffer` from an image's reference alone.
> I'm pretty new to this code base though (which is quite large; I haven't read it all quite yet... ;-), so hopefully there's a way to do that that has simply eluded me so far.
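
To make the format mismatch concrete: an IntARGB pixel packs alpha, red, green and blue into one int, and the ByteBGRA layout is the same four bytes written lowest byte first. The sketch below is plain JDK code with made-up class and method names (nothing here is JavaFX API); it shows the per-pixel shifts next to a single bulk copy through a little-endian `IntBuffer` view and checks that both give the same bytes. Whether a bulk copy would actually help inside the JavaFX converters is an assumption this sketch does not measure.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration class, not part of JavaFX.
final class PixelFormatSketch {

    // Per-pixel conversion using the same shifts as the quoted converter loop.
    static byte[] convertPerPixel(int[] argb) {
        byte[] bgra = new byte[argb.length * 4];
        int dst = 0;
        for (int pixel : argb) {
            bgra[dst++] = (byte) (pixel      ); // blue
            bgra[dst++] = (byte) (pixel >>  8); // green
            bgra[dst++] = (byte) (pixel >> 16); // red
            bgra[dst++] = (byte) (pixel >> 24); // alpha
        }
        return bgra;
    }

    // Bulk alternative: view the byte array as little-endian ints and copy once.
    static byte[] convertBulk(int[] argb) {
        byte[] bgra = new byte[argb.length * 4];
        ByteBuffer.wrap(bgra).order(ByteOrder.LITTLE_ENDIAN).asIntBuffer().put(argb);
        return bgra;
    }

    public static void main(String[] args) {
        int[] pixels = { 0xFF112233, 0x80FFFFFF, 0x00000000 };
        // Both paths should produce identical byte sequences.
        System.out.println(Arrays.equals(convertPerPixel(pixels), convertBulk(pixels)));
    }
}
```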

> profiling a run of the benchmark shows that a lot of time is spent in `IntTo4ByteSameConverter::doConvert`

This is a bit naive, but what if you parallelize the code there? I didn't test that this produces the correct result, but you can try to replace the loops with this:
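IntStream.range(0, h).parallel().forEach(y -> {
    // Each row derives its own offsets from y: a lambda cannot mutate the
    // captured srcoff/dstoff, and a shared counter would race across threads.
    // The captured variables must be effectively final (copy them to final locals if needed).
    int src = srcoff + y * (w + srcscanints);
    int dst = dstoff + y * (w * 4 + dstscanbytes);
    for (int x = 0; x < w; x++) {
        int pixel = srcarr[src++];
        dstarr[dst++] = (byte) (pixel      );
        dstarr[dst++] = (byte) (pixel >>  8);
        dstarr[dst++] = (byte) (pixel >> 16);
        dstarr[dst++] = (byte) (pixel >> 24);
    }
});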
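
For anyone who wants to try the row-parallel idea outside the JavaFX sources, here is a self-contained sketch. The class name, method names and the `srcPad`/`dstPad` parameters are made up for illustration; the pads stand for the number of elements skipped after each row, which is how the snippet above uses `srcscanints` and `dstscanbytes`. Each row computes its own offsets from `y`, so no state is shared between threads. It only checks that the parallel version matches a sequential loop, not that it is any faster.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical illustration class, not part of JavaFX.
final class ParallelRowConvertSketch {

    // Sequential reference: running offsets, with padding skipped after each row.
    static void convertSequential(int[] src, int srcOff, int srcPad,
                                  byte[] dst, int dstOff, int dstPad, int w, int h) {
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int pixel = src[srcOff++];
                dst[dstOff++] = (byte) (pixel      );
                dst[dstOff++] = (byte) (pixel >>  8);
                dst[dstOff++] = (byte) (pixel >> 16);
                dst[dstOff++] = (byte) (pixel >> 24);
            }
            srcOff += srcPad;
            dstOff += dstPad;
        }
    }

    // Row-parallel variant: every row derives its start offsets from y.
    static void convertParallel(int[] src, int srcOff, int srcPad,
                                byte[] dst, int dstOff, int dstPad, int w, int h) {
        IntStream.range(0, h).parallel().forEach(y -> {
            int s = srcOff + y * (w + srcPad);
            int d = dstOff + y * (w * 4 + dstPad);
            for (int x = 0; x < w; x++) {
                int pixel = src[s++];
                dst[d++] = (byte) (pixel      );
                dst[d++] = (byte) (pixel >>  8);
                dst[d++] = (byte) (pixel >> 16);
                dst[d++] = (byte) (pixel >> 24);
            }
        });
    }

    public static void main(String[] args) {
        int w = 4, h = 3, srcOff = 1, srcPad = 2, dstOff = 5, dstPad = 3;
        int[] src = new int[srcOff + h * (w + srcPad)];
        for (int i = 0; i < src.length; i++) {
            src[i] = i * 0x01020304 + i; // arbitrary test pattern
        }
        byte[] expected = new byte[dstOff + h * (w * 4 + dstPad)];
        byte[] actual = new byte[expected.length];
        convertSequential(src, srcOff, srcPad, expected, dstOff, dstPad, w, h);
        convertParallel(src, srcOff, srcPad, actual, dstOff, dstPad, w, h);
        System.out.println(Arrays.equals(expected, actual));
    }
}
```

For small images the overhead of the parallel stream may well outweigh any gain, so this would need to be benchmarked before drawing conclusions.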


PR: https://git.openjdk.java.net/jfx/pull/68
