RFR 8080640: Reduce copying when reading JAR/ZIP entries

Staffan Friberg staffan.friberg at oracle.com
Fri May 22 20:15:52 UTC 2015

On 05/22/2015 11:51 AM, Xueming Shen wrote:
> On 05/22/2015 11:41 AM, Staffan Friberg wrote:
>> On 05/21/2015 11:00 AM, Staffan Friberg wrote:
>>> On 05/21/2015 09:48 AM, Staffan Friberg wrote:
>>>> On 05/20/2015 10:57 AM, Xueming Shen wrote:
>>>>> On 05/18/2015 06:44 PM, Staffan Friberg wrote:
>>>>>> Hi,
>>>>>> Wanted to get reviews and feedback on this performance 
>>>>>> improvement for reading from JAR/ZIP files during classloading by 
>>>>>> reducing unnecessary copying and reading the entry in one go 
>>>>>> instead of in small portions. This shows a significant 
>>>>>> improvement when reading a single entry and for a large 
>>>>>> application with 10k classes and 500+ JAR files it improved the 
>>>>>> startup time by 4%.
>>>>>> For more details on the background and performance results please 
>>>>>> see the RFE entry.
>>>>>> RFE - https://bugs.openjdk.java.net/browse/JDK-8080640
>>>>>> WEBREV - http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.0
>>>>>> Cheers,
>>>>>> Staffan
>>>>> Hi Staffan,
>>>>> If I did not miss something here, from your use scenario it 
>>>>> appears to me the only thing you really
>>>>> need here to help boost your performance is
>>>>>     byte[] ZipFile.getAllBytes(ZipEntry ze);
>>>>> You are allocating a byte[] at use side and wrapping it with a 
>>>>> ByteBuffer if the size is small enough,
>>>>> otherwise, you letting the ZipFile to allocate a big enough one 
>>>>> for you. It does not look like you
>>>>> can re-use that byte[] (has to be wrapped by the 
>>>>> ByteArrayInputStream and return), why do you
>>>>> need two different methods here? The logic would be much easier to 
>>>>> simply let the ZipFile to allocate
>>>>> the needed buffer with appropriate size, fill the bytes and 
>>>>> return, with a "OOME" if the entry size
>>>>> is bigger than 2g.
>>>>> The only thing we use from the input ze is its name, get the 
>>>>> size/csize from the jzentry, I don't think
>>>>> jzentry.csize/size can be "unknown", they are from the "cen" table.
>>>>> If the real/final use of the bytes is to wrap it with a 
>>>>> ByteArrayInputStream,why bother using ByteBuffer
>>>>> here? Shouldn't a direct byte[] with exactly the size of the entry 
>>>>> server better.
>>>>> -Sherman
>>>> Hi Sherman,
>>>> Thanks for the comments. I agree, was starting out with bytebuffer 
>>>> because I was hoping to be able to cache things where the buffer 
>>>> was being used, but since the buffer is past along further I 
>>>> couldn't figure out a clean way to do it.
>>>> Will rewrite it to simply just return a buffer, and only wrap it in 
>>>> the Resource class getByteBuffer.
>>>> What would be your thought on updating the ZipFile.getInputStream 
>>>> to return ByteArrayInputStream for small entries? Currently I do 
>>>> that work outside in two places and moving it would potentially 
>>>> speed up others reading small entries as well.
>>>> Thanks,
>>>> Staffan
>>> Just realized that my use of ByteArrayInputStream would miss Jar 
>>> verification if enabled so the way to go hear would be to add it if 
>>> possible to the ZipFile.getInputStream.
>>> //Staffan
>> Hi,
>> Here is an updated webrev which uses a byte[] directly and also uses 
>> ByteArrayInputStream in ZipFile for small entries below 128k.
> I'm not sure about the benefit of doing the ByteArrayInputStream in 
> ZipFile.getInputStream. It has
> the consequence of changing the "expected" behavior of 
> getInputStream() (instead of return an
> input stream waiting for reading, it now reads all bytes in advance), 
> something we might not want
> to do in a performance tuning. Though it might be reasonable to guess 
> everyone get an input stream
> is to read all bytes from it later.
> -Sherman
>> http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.1
>> //Staffan
Agree that it will change the behavior slightly, but as you said it is 
probably expected that some one will read the stream eventually.
We could reduce the size further if that makes a difference, if the size 
is below 65k we would not use more memory than the buffer allocated for 
the InflaterStream today.
The total allocation would be slightly larger for deflated entries as we 
would allocate a byte[] for the compressed bytes, but it would be 
GC:able and not kept alive. So from a memory perspective the difference 
is very limited.


More information about the core-libs-dev mailing list