ZipFileSystem performance regression
claes.redestad at oracle.com
Tue Apr 16 12:21:53 UTC 2019
Both before and after this regression, it seems the default behavior is
not to use a temporary file. (ZFS.sync() does write to a temp file and
then move it into place, but that's different from what happens with
the useTempFile option enabled.) Instead, entries (and the backing zip
file system) are kept in memory.
The cause of the regression is instead that no deflation happens until
sync(), even when entries are written in memory. Previously, deflation
happened eagerly as each entry was written, and sync() then only copied
the already-compressed result into the zip file.
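
To illustrate what "eager" deflation of an in-memory entry means, here
is a minimal sketch. It is not the zipfs implementation or the patch
mentioned in this mail; the class and method names (EagerDeflateSketch,
deflateEagerly, inflate) are hypothetical. The idea is that the bytes
are compressed on the writing thread, so a later sync() would only need
to copy the compressed buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

// Hypothetical illustration only; not code from jdk.zipfs.
final class EagerDeflateSketch {

    // Compress the entry's bytes immediately, on the calling thread.
    // nowrap=true produces raw deflate data, the form stored in zip entries.
    static byte[] deflateEagerly(byte[] uncompressed) throws IOException {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (OutputStream out = new DeflaterOutputStream(buffer, deflater)) {
            out.write(uncompressed);
        } finally {
            // A caller-supplied Deflater is not released by the stream.
            deflater.end();
        }
        return buffer.toByteArray();
    }

    // Round-trip helper used to check the sketch.
    static byte[] inflate(byte[] compressed) throws IOException {
        Inflater inflater = new Inflater(true);
        // One trailing padding byte; older JDK docs ask for this with nowrap.
        byte[] padded = Arrays.copyOf(compressed, compressed.length + 1);
        try (InputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(padded), inflater)) {
            return in.readAllBytes();
        } finally {
            inflater.end();
        }
    }
}
```

When several threads each call something like deflateEagerly for their
own entry, the compression work is spread across those threads instead
of being done by the single thread that runs sync().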
I've written a proof-of-concept patch that restores the behavior of
eagerly compressing entries when the method is METHOD_DEFLATED and the
target is to store bytes in-memory (the default scenario):
This restores performance of parallel zip to that of 11.0.2 for the
default case. It still has a similar regression for the case where
useTempFile is enabled, but that should be easily addressed if this
looks like a way forward?
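
For anyone who wants to observe the difference, the parallel zip
scenario can be approximated with a sketch like the following. This is
not the original reporter's program; the class and method names are
hypothetical, and the timing is a rough measurement, not a proper
benchmark. It copies files into a zip file system in parallel with
Files.copy and separately times the copy phase and the close().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

// Hypothetical reproduction sketch; names are not from the original report.
final class ParallelZipSketch {

    // Copies the sources into a new zip at zipPath (which must not exist yet).
    // Returns { nanoseconds spent copying, nanoseconds spent in close() }.
    static long[] zipInParallel(Path zipPath, List<Path> sources) throws IOException {
        URI uri = URI.create("jar:" + zipPath.toUri());
        long start = System.nanoTime();
        long copied;
        // "create" asks zipfs to create the zip file if it is missing.
        try (FileSystem zipfs = FileSystems.newFileSystem(uri, Map.of("create", "true"))) {
            sources.parallelStream().forEach(src -> {
                try {
                    // Each entry is written from whichever thread runs this task.
                    Files.copy(src, zipfs.getPath("/" + src.getFileName()));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            copied = System.nanoTime();
        } // close() runs here and writes out the zip file
        long closed = System.nanoTime();
        return new long[] { copied - start, closed - copied };
    }
}
```

If deflation is deferred until sync(), one would expect most of the
elapsed time to show up in the second value (the close), on a single
thread; with eager deflation it should shift into the parallel copy
phase.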
(I've not yet created a bug as I got too caught up in trying to figure
out what was going on here...)
On 2019-04-16 09:29, Alan Bateman wrote:
> On 15/04/2019 14:32, Lennart Börjeson wrote:
>> Previously, the deflation was done in the call to Files.copy,
>> thus executed in parallel, and the final ZipFileSystem.close() didn't
>> do anything much.
> Can you submit a bug? When creating/updating a zip file with zipfs,
> closing the file system creates the zip file. Someone needs to check
> but it may have been that the temporary files (on the file system
> hosting the zip file) were deflated when writing (which is surprising
> but may have been the case).