ZipFileSystem performance regression

Peter Levart peter.levart at
Wed Apr 17 07:23:08 UTC 2019

Just a thought...

Would it be feasible to create a brand new "Generic Caching Filesystem" 
implementation that would delegate to another filesystem for persistent 
storage (be it ZipFileSystem or any other) and implement interesting 
caching strategies (lazy flushing, concurrent flushing, etc.)?

So instead of parameterizing a concrete filesystem (e.g. ZipFileSystem), 
the filesystems could be layered to achieve the desired performance.
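The layering idea could be prototyped without touching the FileSystem SPI at first. Below is a minimal sketch of the caching layer; the class and method names (CachingStore, flush) are illustrative, not part of any JDK API, and a real implementation would sit behind java.nio.file.spi.FileSystemProvider. Writes are buffered in memory and pushed to the backing filesystem (e.g. a zipfs instance) only on flush, which is the point where lazy or concurrent flushing strategies would plug in.

```java
import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of a caching layer over a backing filesystem.
// A real version would implement the FileSystemProvider SPI.
public class CachingStore {
    private final FileSystem backing;                        // e.g. a zipfs instance
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    public CachingStore(FileSystem backing) {
        this.backing = backing;
    }

    // Writes go to memory only; nothing touches the backing filesystem yet.
    public void write(String path, byte[] data) {
        cache.put(path, data.clone());
    }

    // Lazy flush: push all cached entries to the delegate. A smarter
    // strategy (concurrent flushing, coalescing) would replace this loop.
    public void flush() throws IOException {
        for (Map.Entry<String, byte[]> e : cache.entrySet()) {
            Path target = backing.getPath(e.getKey());
            if (target.getParent() != null) {
                Files.createDirectories(target.getParent());
            }
            Files.write(target, e.getValue());
        }
        cache.clear();
    }
}
```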

What do you think? Is the existing filesystem API suitable for such 
layering?

Regards, Peter

On 4/16/19 11:50 PM, Claes Redestad wrote:
> It sounds reasonable to have that as an option, but I'd like to see it
> requested by some user first. And at least one (micro-)benchmark where
> keeping entries uncompressed in memory actually shows significant
> positive impact.
> I can see it might have the opposite effect depending on how often that
> memory is inflated/deflated and whether or not the inflated entries
> cause high enough memory pressure to make GC activity spike. javac might
> be one application where it could be negative as it's tuned to run with
> a rather small max heap and already spends significant time doing GCs.
> /Claes
> On 2019-04-16 23:20, Lance Andersen wrote:
>> Would it be worth adding a ZIP file system property, similar to 
>> createNew, which enables/disables the change that Claes has made, 
>> with the default being the pre-JDK 12 functionality?
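For reference, zipfs is already configured through the env map passed to FileSystems.newFileSystem, which is where such a switch would live. A minimal sketch using the existing jdk.zipfs properties "create" and "useTempFile"; the proposed enable/disable toggle would simply be another key in this map (no such key exists today).

```java
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

public class ZipFsEnvDemo {
    // Create a new zip archive via zipfs and return its path.
    public static Path createDemoZip() throws Exception {
        Path zip = Files.createTempFile("demo", ".zip");
        Files.delete(zip);  // "create" expects to make a brand-new archive

        // Real jdk.zipfs properties: "create" (make a new archive) and
        // "useTempFile" (buffer new content in a temp file instead of memory).
        // A toggle like the one proposed here would be another key in this map.
        Map<String, Object> env = Map.of("create", "true", "useTempFile", "false");

        URI uri = URI.create("jar:" + zip.toUri());
        try (FileSystem zipfs = FileSystems.newFileSystem(uri, env)) {
            Files.write(zipfs.getPath("hello.txt"), "hello".getBytes());
        }  // close() assembles and writes the archive to disk
        return zip;
    }
}
```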
>>> On Apr 16, 2019, at 4:50 PM, Xueming Shen <xueming.shen at> 
>>> wrote:
>>> Well, I have to admit I didn't expect your use scenario when I made 
>>> the change. I thought that, for a filesystem, runtime access 
>>> performance has more weight than shutdown performance... basically 
>>> you are not using zipfs as a filesystem, but as another jar tool 
>>> that happens to have better concurrent in/out performance. Yes, back 
>>> then I was working on using zipfs as a memory filesystem. One 
>>> possible usage is for javac to use it as its (temp?) filesystem to 
>>> write out compiled class files ... so I thought we could get better 
>>> performance if we keep those classes uncompressed until the zip/jar 
>>> filesystem is closed and written out to a "jar" file.
>>> That said, a regression is a regression; we probably want to get 
>>> the performance back for your use scenario. I just wanted to give 
>>> you some background on what happened back then.
>>> -Sherman
>>> On 4/16/19 12:54 PM, Lennart Börjeson wrote:
>>>> I’m using the tool I wrote to compress directories with thousands 
>>>> of log files. The standard zip utility (as well as my utility when 
>>>> run with JDK 12) takes up to an hour of user time to create the 
>>>> archive; on our server-class 40+ core machines this is reduced to 
>>>> 1–2 minutes.
>>>> So while I understand the motivation for the change, I don’t get 
>>>> why you would want to use ZipFs for what in essence is a RAM disk, 
>>>> *unless* you want it compressed in memory?
>>>> Oh well. Do we need a new option for this?
>>>> /Lennart Börjeson
>>>> Electrogramma ab iPhono meo missum est
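For context, the parallel-compression pattern Lennart describes can be sketched as below (class and method names are illustrative). Each Files.copy into the zip filesystem runs on a fork-join worker, so when each entry is deflated eagerly at copy time, the CPU-bound compression spreads across cores and close() only has to assemble the archive; when deflation is deferred to close(), all that work is serialized, which is the regression under discussion.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.stream.Stream;

public class ParallelZipper {
    // Copy every regular file in srcDir into a new zip archive, in parallel.
    public static void zipDirectory(Path srcDir, Path zipFile) throws IOException {
        URI uri = URI.create("jar:" + zipFile.toUri());
        try (FileSystem zipfs = FileSystems.newFileSystem(uri, Map.of("create", "true"));
             Stream<Path> files = Files.list(srcDir)) {
            files.filter(Files::isRegularFile)
                 .parallel()                 // copies run on fork-join workers
                 .forEach(src -> {
                     try {
                         Files.copy(src, zipfs.getPath(src.getFileName().toString()));
                     } catch (IOException e) {
                         throw new UncheckedIOException(e);
                     }
                 });
        }  // close() writes the finished archive
    }
}
```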
>>>>> 16 apr. 2019 kl. 21:44 skrev Xueming Shen <xueming.shen at>:
>>>>> One of the motivations back then was to speed up access to those 
>>>>> entries: you don't have to deflate/inflate new/updated entries 
>>>>> during the lifetime of the zip filesystem; updated entries only 
>>>>> get compressed when they go to storage. So the regression is more 
>>>>> of a trade-off between the performance of different usages. (It 
>>>>> also simplifies the logic of handling different types of entries ...)
>>>>> One idea I experimented with long ago for the jar tool was to 
>>>>> concurrently write out entries when they need compression ... it 
>>>>> does gain some performance improvement on multi-core machines, but 
>>>>> not a lot, as it ends up coming back to the main thread to write 
>>>>> out to the underlying filesystem.
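The experiment Sherman describes can be approximated with a small pipeline (names here are illustrative, not the actual jar tool code): entries are deflated concurrently off-thread, but the results still funnel back through one thread for the sequential write to the underlying file, which is why the gain was limited.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.zip.Deflater;

public class ConcurrentDeflate {
    // Deflate each input on the common pool; the caller drains results in order.
    public static List<byte[]> deflateAll(List<byte[]> inputs) throws Exception {
        List<CompletableFuture<byte[]>> futures = new ArrayList<>();
        for (byte[] in : inputs) {
            futures.add(CompletableFuture.supplyAsync(() -> deflate(in)));
        }
        List<byte[]> out = new ArrayList<>();
        // Main-thread bottleneck: compressed entries come back here before
        // the (still sequential) write to the underlying filesystem.
        for (CompletableFuture<byte[]> f : futures) {
            out.add(f.get());
        }
        return out;
    }

    private static byte[] deflate(byte[] data) {
        Deflater def = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // raw deflate, as in zip entries
        try {
            def.setInput(data);
            def.finish();
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!def.finished()) {
                bos.write(buf, 0, def.deflate(buf));
            }
            return bos.toByteArray();
        } finally {
            def.end();
        }
    }
}
```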
>>>>> -Sherman
>>>>>> On 4/16/19 5:21 AM, Claes Redestad wrote:
>>>>>> Both before and after this regression, it seems the default 
>>>>>> behavior is
>>>>>> not to use a temporary file (until ZFS.sync(), which writes to a 
>>>>>> temp
>>>>>> file and then moves it in place, but that's different from what 
>>>>>> happens
>>>>>> with the useTempFile option enabled). Instead entries (and the 
>>>>>> backing
>>>>>> zip file system) are kept in-memory.
>>>>>> The cause of the issue here is instead that no deflation happens 
>>>>>> until
>>>>>> sync(), even when writing to entries in-memory. Previously, the
>>>>>> deflation happened eagerly, then the result of that was copied into
>>>>>> the zip file during sync().
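In outline, eager deflation means an entry's bytes are compressed at write time into the in-memory buffer, so sync() only copies already-deflated bytes into the archive. A minimal sketch of that idea using DeflaterOutputStream (names illustrative; this is not the actual ZipFileSystem internals):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

public class EagerEntry {
    // Compress at write time; the stored buffer holds deflated bytes, so a
    // later sync() just copies them into the archive without re-deflating.
    static byte[] writeDeflated(byte[] entryBytes) throws IOException {
        ByteArrayOutputStream store = new ByteArrayOutputStream();
        Deflater def = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // raw deflate, as zip entries use
        try (DeflaterOutputStream out = new DeflaterOutputStream(store, def)) {
            out.write(entryBytes);   // deflation happens here, on the writer's thread
        } finally {
            def.end();
        }
        return store.toByteArray();
    }
}
```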
>>>>>> I've written a proof-of-concept patch that restores the behavior of
>>>>>> eagerly compressing entries when the method is METHOD_DEFLATED 
>>>>>> and the
>>>>>> target is to store byte[]s in-memory (the default scenario):
>>>>>> This restores performance of parallel zip to that of 11.0.2 for the
>>>>>> default case. It still has a similar regression for the case where
>>>>>> useTempFile is enabled, but that should be easily addressed if this
>>>>>> looks like a way forward?
>>>>>> (I've not yet created a bug as I got too caught up in trying to 
>>>>>> figure
>>>>>> out what was going on here...)
>>>>>> Thanks!
>>>>>> /Claes
>>>>>>> On 2019-04-16 09:29, Alan Bateman wrote:
>>>>>>>> On 15/04/2019 14:32, Lennart Börjeson wrote:
>>>>>>>> :
>>>>>>>> Previously, the deflation was done in the call to Files.copy, 
>>>>>>>> and thus executed in parallel, and the final 
>>>>>>>> ZipFileSystem.close() didn't do much.
>>>>>>> Can you submit a bug? When creating/updating a zip file with 
>>>>>>> zipfs, closing the file system creates the zip file. Someone 
>>>>>>> needs to check, but it may have been that the temporary files 
>>>>>>> (on the file system hosting the zip file) were deflated while 
>>>>>>> writing (which is surprising but may have been the case).
>>>>>>> -Alan
>> Lance Andersen | Principal Member of Technical Staff | +1.781.442.2037
>> Oracle Java Engineering
>> 1 Network Drive
>> Burlington, MA 01803
>> Lance.Andersen at

More information about the core-libs-dev mailing list