Suspected regression: fix for 6735255 causes delay in GC of ZipFile InputStreams, increase in heap demand

Xueming Shen xueming.shen at
Tue Mar 29 19:05:12 UTC 2011

Hi Neil,

It appears to be a "regression" in scenario you described (user 
application never close the
input stream after use and the ZipFile instance being retained during 
the lifetime of the
process). The proposed approach seems to solve this particular problem.

Here are my comments regarding the patch.

(1) ZipFileInflaterInputStream.close() now synchronizes on Zipfile.this, 
is it really necessary?
      given the rest of the implementation doesn't guarantee the 
returned InputStream object
      is thread-safe (the implementation does make sure the access to 
the underlying
      ZipFile is thread-safe, though). The new finalize() now also 
invokes close(), which means
      the ZipFile object now gets locked twice (assume the good manner 
application does
      invoke the close() after use) for each InputStream after use.

(2) Is there any particular reason that we don't want to "reuse" the 
Inflater(), if the close
      comes from the finalize() in your testing evn? One thing we 
noticed before is that the
      moment  the InputStream gets finalized, the Inflater embedded 
might have been
      finalized already, this is the reason why we have a "inf.ended()" 
check in releaseInflater().

      The disadvantage/limitation of existing Inflater caching mechanism 
is that it does not
      have a "cap" to limit the maximum number of inflater it should 
cache. Are you worried
      too many inflaters get cached, if they only get collected during 
GC/finalize()? This actually
      indicates that the cache mechanism does not work at all in your 
scenario, even with the
      proposed fix. An alternative solution (better?) is to make ZFIIS 
to check the bytes read
      during its  read() method, and pro-actively close the stream when 
it reaches the end of
      the stream,  same as what ZipFileInputStream does, which I believe 
should solve the
      problem in most use scenario (except, the application does not 
bother reading the
      InputSteeam to its end, which is unlikely).

(3) The attached test case does "f.close()" in its finally block, I 
guess it should be f.delete()?


On 03/23/2011 02:46 PM, Neil Richards wrote:
> Hi,
> I've noticed that the fix introduced to address bug 6735255 [1] [2] had
> an unfortunate side-effect.
> That fix stores references to the InputStream objects returned from
> ZipFile.getInputStream(ZipEntry) in a HashSet (within ZipFile), so that
> the close() method may be called upon those objects when ZipFile.close()
> is called.
> It does this to conform to the existing API description, and to avoid a
> native memory leak which would occur if the InputStream is GC'd without
> its close() being called.
> However, by holding these InputStreams in a set within their ZipFile
> object, their lifecycle is now prolonged until the ZipFile object either
> has its close() method called, or is finalized (prior to GC).
> I've found scenarios (in user code) were ZipFile objects are held onto
> for a long time (eg. the entire lifetime of the process), but where
> InputStream objects from that ZipFile (for individual entries in the zip
> file) are obtained, used, then abandoned on a large number of occasions.
> (An example here might be a user-defined, long-lasting class loader,
> which might retain a ZipFile instance for each jar file in its search
> path, but which will create transitory input streams to read out
> particular files from those (large) jar files).
> I see that the InputStream objects returned from
> ZipFile.getInputStream(ZipEntry) tend to hold references to a couple of
> sizeable byte arrays, one 512 bytes in size, the other 1002 bytes.
> So by prolonging the life span of these objects, the fix for 6735255 can
> inadvertently cause a significant increase in the demand/usage on the
> heap (in practice, running into many Mb's).
> (This is not so if the user code is good-mannered enough to explicitly
> always call close() on the InputStreams in question.
> However, experience shows that user code cannot be relied upon to behave
> so benignly).
> I believe this introduced problem can be addressed by:
>       1. Holding references to the InputStreams in the 'streams' Set
>          within ZipFile via WeakReferences, so they may be GC'd as soon
>          as they are no longer externally (strongly) referenced.
>       2. Adding finalization logic to the InputStream implementation
>          classes, which ensures their close() method is called prior to
>          GC.
>       3. Prevent Inflater objects from being returned to the pool (in the
>          ZipFile object) if close() has been called from finalize().
> To that end, please find below an 'hg export' of a proposed change which
> implements these aspects, together with a testcase to demonstrate the
> problem/fix.
> Any comments / suggestions on this gratefully received,
> Thanks,
> Neil
> [1]
> [2]

More information about the core-libs-dev mailing list