Ephemerons

Gil Tene gil at azul.com
Mon Jan 25 03:13:17 UTC 2016


> On Jan 24, 2016, at 12:14 PM, Peter Levart <peter.levart at gmail.com> wrote:
> 
> Hi Gil,
> 
> The algorithm complexity has been on my mind too and I've been thinking of how to maintain a key (or it's address) -> ephemeron mapping most efficiently.

This can vary. It's possible to do true O(1) reverse reference lookup in a very memory-wasteful way, obviously, but it's unlikely to be the path any implementation chooses to take. Simple tree based schemes will easily cap reverse reference lookup at O(log N) (and thereby overall Ephemeron processing at O(N  log N)). But my guess is that most implementations will opt for a hash based scheme, which will provide "usually O(1)" but cap be capped at O(log N) even with worst-case hashing behavior (e,g. a map scheme with buckets expand to trees instead of linked lists, much like the under-the-hood behavior of Java 8 HashMaps). Since both trees and hash schemes will only need storage space linear to the number of Ephemerons, I think reverse mapping solution will be straight forward to support.

BTW, we use a hash based scheme to maintain our forwarding pointer storage in C4 today, and it works very well. Hence my educated guess that hash based schemes will work well for this too.

> As I have written in a reply to Mandy, if we say that individual phase2 of processing of the normal References has complexity O(N) where N is the number of discovered references, then the equivalent phase2 processing of Ephemerons has comparable complexity O(N*d) where N is the number of discovered Ephemerons and d is the maximum number of "hops" in object graph where a "hop" is defined as a point in chain where the value of some ephemeron refers directly or indirectly to the key of some other ephemeron and the value of that other ephemeron continues the chain. Let's hope that in practical data structures, d is not large. The maximum value of d is N, as in one of the tests. The algorithm in the prototype tries to reduce the number of outer iterations to much less than 'd' in many situations by alternating the direction of scanning in each pass. The value(ephemeronX) -> key(ephemeronY) links could be arranged in a zig-zag pattern relative to the order of discovered references in the list though which would be the worst case for the algorithm, but I think the chances of that happening in a real-world data structure are low.

I've learned the hard way to not assume things about "real world" shapes and topologies for newly proposed constructs. Once they become available, people will use them in new and ingenious ways. Many of which will be very valid and very unanticipated. Since there are [by definition] no common uses of Ephemerons in Java yet, and Ephemerons are not commonly exposed (yet) in other languages and environments that commonly support concurrent thread activity, it's hard to anticipate how they may be put to use beyond the example cases of caches/maps/dictionaries. My guess is that there will be some interesting ways to apply them to concurrent algorithms that would be "hard" to build or solve otherwise.

So I would assume that there might be very legitimate use cases that would be natural to generate a very long linked list weaving through ephemeron keys and values. And there is nothing the multi-pass thing can do (by alternating or zig zagging) to reduce the number of passes needed to propogate across those linked ephemerons one-per-pass… A reverse lookup scheme will eliminate the need for iterative passes and eliminate this issue...

> But we have to think of possible denial-of-service attacks too, so I agree that worst-case scenario has to be improved.
> 
> Presently, Reference objects are discovered and hooked on single-linked _discoveredXXX lists (using Reference.discovered field to point to the next in list). Each reference type has it's own set of lists. Multiple lists in a set enable parallel (concurrent?)     discovery without contention (a list per thread).
> 
> So I was thinking that one possibility would be for ephemerons to have a bigger table of discovered lists - not just one per discovery thread, but quite bigger and that the index of a list to which discovered ephemeron is added would be computed from the ephemeron's key. A classical hash-table with buckets. Big enough table would minimize contention when discovery is parallelized and enable the following algorithm (which could also be parallelized):
> 
> Let each bucket (each head of the list) have 2 boolean flags: include & pending.
> At the start, set include flags of all buckets to true and then enter the loop:
> do {
>     reset pending flags of all buckets to false;
>     for each bucket with (include == true), do {
>         scan the ephemerons and for those with live keys, mark value and transitive closure as live.
>         while marking the value and transitive closure as live, for each object that was newly marked alive,
>         compute the index into the ephemeron buckets as though such object was ephemeron's key
>         and set the pending flag of that bucket.
>     }
>     set the include flags of all buckets from the pending flags of the buckets and count # of pending buckets
> } while (# of pending buckets > 0);

Interesting. I think that this can be described as a "pessimistically estimated reverse reference scheme", where you don't actually track reverse references, but pessimistically (with some hash) act as if you hit them when you newly mark an object live, causing a force-re-evaluate ephemerons that have been "hit" by the hash.

But since I think that an actual reverse reference mapping scheme is fairly simple to implement, and is both "simpler" in many ways and would result in less GC work (no false positive checks forced on ephemerons when marking happens to hit them with a hash), I'd stick with that...

> 
> Hm, ....
> 
> 
> Regards, Peter
> 
> On 01/24/2016 06:43 PM, Gil Tene wrote:
>> A note on implementation possibilities:
>> 
>> If I read the implementation correctly, a "weakness" of the current implementation approach for making sure value-referents (and their transitively reachable) objects are kept alive if key referents are alive is that it requires multiple passes through the discovered Ephemeron list, with the passes terminating when the list stabilizes. While I think that this is sound (in the sense that it will work), it carries a potentially high cost when large sets of Ephemerons exist in the heap. E.g. if the Ephemerons are linked in a k-v list (where the value referent of one ephemeron is the key of another, in a chain), as in your code example, there is an N^2 scanning thing going on. And e.g. if a large set of ephemeron keys become weakly reachable in a single cycle (e.g. a large cache was discarded) while other ephemeron participate in some linked list relationship, the entire list of [stably] weak-keyed ephemerons has to be traversed in each pass (in case one of them has become live in a previous pass). I'd worry that these computational complexity issues could become prohibitive enough in GC cost that you there would be significant resistance to their adoption.
>> 
>> Note that in comparison (to my understanding), current ref processing work involved in GC handling soft/weak/final/phantom refs remains linear to the number of refs, and does not have an O(N^2) component.
>> 
>> I believe that there is a relatively simple way to bring Ephemeron processing to O(N) by establishing reverse mapping during the scan (the below description assumes STW during the scan):
>> 
>> 1. Start ref processing with no reverse mapping table established.
>> 
>> 2. During ref processing, establish an EphemeronKeyReverseMapping table (logically a Map<Address, List<Ephemerons>>) which would map individual heap addresses to ephemerons who's key referent points to those addresses.
>> 
>> 3. Note that since each heap address can show up in multiple key referents, the map needs to return a (potentially empty) list of Ephemerons who's keys refer to the address.
>> 
>> 4. Specifically, starting with an empty list, and for each discovered Ephemeron, add a reverse-mapping entry to the EphemeronKeyReverseMapping, mapping from the key referent address to the Ephemeron.
>> 
>> 5. During Ephemeron processing (under the case where the referent is found to be alive and the ephemeron then needs to keep the value referent alive) mark down the value referent path using a special ephemeron_keep_alive OopClosure (or a mode flag that affects the normal keep_alive behavior) which, when reaching a not-yet-marked-live object [in addition to marking it live and traversing it as keep_alive would normally do] would look up the object's address in the EphemeronKeyReverseMapping to get a list of Ephemerons to traverse, and traverse each of the mapped Ephemerons' value referent with the same ephemeron_keep_alive closure.
>> Note: doing reverse-mapping lookups on each not-yet-marked object in a keep_alive closure will add cost, which is why this ephemeron_keep_alive pass should probably be done after regular keep_alive passes have had their chance to mark objects live. This way only paths that become newly-live via ephemeron processing are subject to the extra reverse-mapping-lookip cost.
>> 
>> While I haven't been poking at this too long to see if it has holes, I think it can produce a reliable result, and is O(N) on the count of Ephemerons.
>> 
>> — Gil.
>> 
>> 
>>> On Jan 24, 2016, at 2:52 AM, Peter Levart < <mailto:peter.levart at gmail.com>peter.levart at gmail.com <mailto:peter.levart at gmail.com>> wrote:
>>> 
>>> Hi Gil,
>>> 
>>> I totally agree with your assessment. We should not introduce another way of reviving the almost collectable objects and I fully support tightening the specification so that soft and weak references to the same referent and to other referents from which this referent is reachable are required to be cleared together atomically.
>>> 
>>> I modified the prototype to (hopefully) adhere to this new Ephemeron specification that Gil and I agreed upon. Anyone interested in experimenting can find it here:
>>> 
>>> http://cr.openjdk.java.net/~plevart/misc/Ephemeron/webrev.jdk.02/ <http://cr.openjdk.java.net/%7Eplevart/misc/Ephemeron/webrev.jdk.02/>
>>> http://cr.openjdk.java.net/~plevart/misc/Ephemeron/webrev.hotspot.02/ <http://cr.openjdk.java.net/%7Eplevart/misc/Ephemeron/webrev.hotspot.02/>
>>> 
>>> It is rebased to current tip of jdk9-dev repositories (after the bulk of merges for jdk-9+102), but still contains the change to remove the Cleaner reference type as it has not yet managed to get in...
>>> 
>>> I have also added a test that is a start for verifying the functionality.
>>> 
>>> Regards, Peter
>>> 
>>> On 01/23/2016 07:25 PM, Gil Tene wrote:
>>>> 
>>>>> On Jan 23, 2016, at 5:14 AM, Peter Levart <peter.levart at gmail.com <mailto:peter.levart at gmail.com>> wrote:
>>>>> 
>>>>> Hi Gil, it's good to have this discussion. See comments inline...
>>>>> 
>>>>> On 01/23/2016 05:13 AM, Gil Tene wrote:
>>>>> ....
>>>>>>> On Jan 22, 2016, at 2:49 PM, Peter Levart < <mailto:peter.levart at gmail.com>peter.levart at gmail.com <mailto:peter.levart at gmail.com>> wrote:
>>>>>>> 
>>>>>>> Ephemeron always touches definitions of at least two consecutive strengths of reachabilities. The prototype says:
>>>>>>> 
>>>>>>>  * <li> An object is <em>weakly reachable</em> if it is neither
>>>>>>>  * strongly nor softly reachable but can be reached by traversing a
>>>>>>>  * weak reference or by traversing an ephemeron through it's value while
>>>>>>>  * the ephemeron's key is at least weakly reachable.
>>>>>>> 
>>>>>>>  * <li> An object is <em>ephemerally reachable</em> if it is neither
>>>>>>>  * strongly, softly nor weakly reachable but can be reached by traversing an
>>>>>>>  * ephemeron through it's key or by traversing an ephemeron through it's value
>>>>>>>  * while it's key is at most ephemerally reachable. When the ephemerons that
>>>>>>>  * refer to ephemerally reachable key object are cleared, the key object becomes
>>>>>>>  * eligible for finalization.
>>>>>> 
>>>>>> Looking into this a bit more, I don't think the above is quite right. Specifically, If an ephemeron's key is either strongly of softly reachable, you want the value to remain appropriately strongly/softly reachable. Without this quality, Ephemeron value referents can (and will) be prematurely collected and finalized while the keys are not. This (IMO) needed quality not provided by the behavior you specify…
>>>>> 
>>>>> This is not quite true. While ephemeron's value is weakly or even ephemerally-reachable, it is not finalizable, because ephemeraly-reachable is stronger than finaly-reachable. After ephemeron's key becomes ephemeraly-reachable, the ephemeron is cleared by GC which sets it's key *and* value to null atomically. The life of key and value at that moment becomes untangled. Either of them can have a finalizer or not and both of them will eventually be collected if not revived by their finalize() methods. But it can never happen that ephemeron's value is finalized or collected while it's key is still reachable through the ephemeron (while the ephemeron is not cleared yet).
>>>>> 
>>>>> But I agree that it would be desirable for ephemeron's value to follow the reachability of it's key. In above specification, if the key is strongly reachable, the value is weakly reachable, so any WeakReferences or SoftReferences pointing at the Ephemeron's value can already be cleared while the key is still strongly reachable. This is arguably no different than current specification of Soft vs. Weak references. A SoftReference can already be cleared while its referent is still reachable through a WeakReference,
>>>> 
>>>> We seem to agree about the cleaner behavior specification (in both of our texts below), so the these next paragraphs are really about arguing for why this is an important design choice if/when adding Ephemerons to Java:
>>>> 
>>>> It is true the [current] spec allows for soft references to an object to be cleared while weak references to the same object are not: the "determines" in "Suppose that the garbage collector determines at a certain point in time hat an object is RRRR reachable..." part [for RRRR = {soft, weak}] does not have to happen at the same "certain point in time".
>>>> 
>>>> However, to my knowledge all current implementations present as if this determination is happening at the same "point in time" for all weakly and softly reachable objects combined. Specifically [in implementations]: if soft reachability is determined for an object at some point in time, then weak reachability for that object is determined at the same point in time. And the weak reachability determination for an object depends on whether the collector chose to clear existing soft references to that object at that same point in time, with the appearance of the choice to clear (or not to clear) soft references to a given object atomically affecting the determination of it's weak reachability. Since the collector is *required* to act on a weak determination when it is made, while it *may* act on a soft determination when it is made, making the combined determination at the same "point in time" eliminates an obviously confusing situation that is not prohibited by the spec: if the determination for weak and soft reachability was not done at the same point in time, then an object that was softly reachable and had it's soft references cleared and queued could later become strongly reachable, and even softly reachable again. When reference processing is done as a STW thing, this "combined determination" effect is a trivial side-effect of STW. When it is done concurrently (or incrementally?), implementations still work to maintain the appearance of combined atomic determination of soft and weak reachability. I know ours does. In our case, we do it because we had no desire to be the ones to argue "I know that all implementations did this atomically because they were STW, but the spec allows us to add this bug to your program…".
>>>> 
>>>> So in actual implementations (to my knowledge), finalization is currently the only mechanism that can create this "strange situation" where an object was no longer strongly reachable, had actions triggered as a result from loss of strong reachability (i.e. actually observed by the program as "known to not be strongly reachable"), and later became strongly reachable again. E.g. a finalizer can propagate a strong reference to a previously non-strongly reachable object ('this' in the finalizer, or anything that 'this' transitively refers and was not otherwise reachable when the finalizer was called).. This is one of those "undesired" things that the introduction of Reference types was meant to deal with (Reference types were introduced in 1.2, after finalization was unfortunately already included and spec'ed. And phantom refs were meant to allow for a cleaner form that could replace finalization). And while the specifications of SoftReference and WeakReference do not prohibit it, implementations are not required to allow it, and in practice non of them do (I think), as doing so would most likely expose some "interesting" spec-allowed-but-extremely-surprising things/bugs that none of us want to have to defend...
>>>> 
>>>> In this context, it would be a "highly undesirable" design choice to introduce Ephemerons in a way that would them to return a strong reference to an object that has previously been determined to no longer be strongly reachable. Structuring the spec to prohibit this is a better design choice.
>>>> 
>>>> To highlight the design choice here, let me describe a specific problem scenario for which the previous (above) spec would cause "re-strengthening" behavior that would break assumptions that are allowed under the current spec: in the above/previously specified behavior an object V that is known to have no finalizers, but has e.g. 3 WeakReference objects that refer to it, can become weakly reachable while both a key referent object K in some ephemeron E with a value referent of V remain strongly reachable. At such a point (V is weakly reachable, K and E are strongly reachable), the collector may determine weak reachability for V, [atomically] clear all weak references to V, and enqueue those weak reference objects on their respective queues. While V is still ephemerally reachable under your previous definition, there are no references to it anywhere other than in ephemeron value referent fields, and weak references that did refer to it have been cleared and queued. Since the ephemeron is still there, and the key is still there, and the ephemeron has not been cleared, an Ephemeron.getValue() call would create a strong reference to an object that was previously determined to not be weakly reachable. Re-creating a strong reference to V after the point where weak references to V were cleared and the weak refs to it were enqueued would be "surprising" to current weak reference based code (the only thing that could cause this under the current spec would be a finalizer), so allowing that (jn the spec) is likely to break all kinds of logic that depends on currently spec'ed weak reference behaviors.
>>>> 
>>>> The spec'ed behavior we seem to be agreeing on (below) would prohibit this loophole and would [I think] maintain any reachability-based expectations that current weak-ref based logic can make under the current spec. Maintaining this continuity is an important design choice for adding Ephemerons into the current set of Reference behaviors.
>>>> 
>>>> And since I suspect that all implementations will continue to choose to do the "determination" of soft and weak reachability at the same "point in time", this will fit well with how people would build this stuff anyway.
>>>> 
>>>> Separate note: It would be separately interesting to consider narrowing the SoftRef spec to require JVM implementations to atomically clear all soft *and* weak references to an object at the same time. I.e. if the garbage collector chooses to clear a soft reference to an object that would become weakly reachable as a result, then all weak references to that object must be [atomically] cleared at the same time. Since I suspect that all current JVM implementations actually adhere to this stronger requirement already, this would not "hurt" anything or require extra work to comply with. [Anyone from Metronome or some other non-STW reference processing implementations want to chime in?].
>>>> 
>>>>> but for Ephemeron's value this might be confusing. The easier to understand conceptual model for Ephemerons might be a pair of (WeakReference<K>, WeakReference<V>) where the key has a virtual strong reference to the value. And this is what we get if we say that reachability of the value follows reachability of the key.
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> For a correctly specified behavior, I think all strengths (from strong down) need to be affected by key/value Ephemeron relationships, but without adding an "ephemerally reachable" strength. E.g. I think you fundamentally need something like this:
>>>>>> 
>>>>>> - "An object is <em>strongly reachable</em> if it can be reached by (a) some thread without traversing any reference objects, or by (b) traversing the value of an Ephemeron whose key is strongly reachable. A newly-created object is strongly reachable by the thread that created it"
>>>>>> 
>>>>>> - "An object is <em>softly reachable</em> if it is not strongly reachable but can be reached by (a) traversing a soft reference or by (b) traversing the value of an Ephemeron whose key is softly reachable.
>>>>>> 
>>>>>> - "An object is <em>weakly reachable</em> if it is neither strongly nor softly reachable but can be reached by (a) traversing a weak reference or by (b) traversing the value of an ephemeron whose key is weakly reachable.
>>>>> 
>>>>> ...and that's where we stop, because when we make Ephemeron just a special kind of WeakReference, the next thing that happens is:
>>>>> 
>>>>>  * <p> Suppose that the garbage collector determines at a certain point in time
>>>>>  * that an object is <a href="package-summary.html#reachability">weakly
>>>>>  * reachable</a>.  At that time it will atomically clear all weak references to
>>>>>  * that object and all weak references to any other weakly-reachable objects
>>>>>  * from which that object is reachable through a chain of strong and soft
>>>>>  * references.  At the same time it will declare all of the formerly
>>>>>  * weakly-reachable objects to be finalizable.  At the same time or at some
>>>>>  * later time it will enqueue those newly-cleared weak references that are
>>>>>  * registered with reference queues.
>>>>> 
>>>>> ...where "clearing of the WeakReference" means reseting the key *and* value to null in case it is an Ephemeron; and
>>>>> "all weak references to some object" means Ephemerons that have that object as a key (but not those that only have it as a value!) in case of ephemerons
>>>>> 
>>>>> ...
>>>>>> I still think that Ephemeron<K, V> should extend WeakReference<K>, since that places already established rules and expectation on (a) when it will be enqueued, (b) when the collector will clear it (when the the collector encounters the <K> key being weakly reachable), and (c) that clearing of all Ephemeron *and* WeakReference instances who share an identical key value is done atomically, along with (d) all weak references to to any other weakly-reachable objects from which that object is reachable through a chain of strong and soft references. These last (c, d) parts are critically captured since an Ephemeron *is a* WeakReference, and the statement in WeakReference that says that "… it will atomically clear all weak references to that object and all weak references to any other weakly-reachable objects from which that object is reachable through a chain of strong and soft references." has a clear application.
>>>>>> 
>>>>>> Here are some suggested edits to the JavaDoc to go with this suggested spec'ed behavior:
>>>>>> /**
>>>>>>   * Ephemeron<K, V> objects are a special kind of WeakReference<K> objects, which
>>>>>>   * hold two referents (a key referent and a value referent) and do not prevent their
>>>>>>   * referents from being made finalizable, finalized, and then reclaimed.
>>>>>>   * In addition to the key referent, which adheres to the referent behavior of a
>>>>>>   * WeakReference<K>, an ephemeron also holds a value referent whose reachabiliy
>>>>>>   * strength is affected by the reachability strength of the key referent:
>>>>>>   * The value referent of an Ephemeron instance is considered:
>>>>>>   * (a) strongly reachable if the key referent of the same Ephemeron
>>>>>>   * object is strongly reachable, or if the value referent is otherwise strongly reachable.
>>>>>>   * (b) softly reachable if it is not strongly reachable, and (i) the key referent of
>>>>>>   * the same Ephemeron object is softly reachable, or (ii) if the value referent is otherwise
>>>>>>   * softly reachable.
>>>>>>   * (c) weakly reachable if it is not strongly or softly reachable, and (i) the key referent of
>>>>>>   * the same Ephemeron object is weakly reachable, or (ii) if the value referent is otherwise
>>>>>>   * weakly reachable.
>>>>>>   * <p> When the collector clears an Ephemeron object instance (according to the rules
>>>>>>   * expressed for clearing WeakReference object instances), the Ephemeron instance's
>>>>>>   * key referent value referent are simultaneously and atomically cleared.
>>>>>>   * <p> By convenience, the Ephemeron's referent is also called the key, and can be
>>>>>>   * obtained either by invoking {@link #get} or {@link #getKey} while the value
>>>>>>   * can be obtained by invoking {@link #getValue} method.
>>>>>>   *...
>>>>> 
>>>>> 
>>>>> Thanks, this is very nice. I do like this behavior more.
>>>>> 
>>>>> Let me see what it takes to implement this strategy...
>>>>> 
>>>>> Regards, Peter
>>>>> 
>>>> 
>>> 
>> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/attachments/20160125/dca41644/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 842 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/attachments/20160125/dca41644/signature.asc>


More information about the hotspot-gc-dev mailing list