deduplicating lambda methods
vicente.romero at oracle.com
Tue Mar 20 12:42:56 UTC 2018
On 03/19/2018 05:37 PM, Maurizio Cimadamore wrote:
> On 18/03/18 02:04, Liam Miller-Cushon wrote:
>> Hi Vicente,
>> On Sat, Mar 17, 2018 at 11:44 AM Vicente Romero
>> <vicente.romero at oracle.com <mailto:vicente.romero at oracle.com>> wrote:
>> overall looks good, but I think that a set of regression tests
>> should be included along with the patch.
>> Definitely, I have some functional tests that I'll work on turning
>> into a jtreg test.
>> Do you have advice about how much test coverage is appropriate for
>> tree comparison?
>> Getting to 100% branch coverage would require a lot of test cases. I
>> could probably
>> generate some of them programatically. FWIW a certain amount of
>> TreeDiffer was
>> generated and not hand-written, so any bugs are probably structural
>> problems rather
>> than typos. (That doesn't lessen the need for tests, of
>> course--there's still the risk of
>> adding typos when changes are made in the future.)
> How about a test harness that takes a source file, compiles, takes its
> AST and then uses it as a 'golden' AST to compare it against other AST
> obtained from:
> 1) compiling same source file with some random harmless alteration
> (e.g. alpha renaming of identifiers)
> 2) compiling same source file with some structural alterations (e.g.
> drop all bodies from nested blocks)
> 3) or, compile another user-defined alteration
> You could get both positive tests (compare AST against itself, or
> against alpha-renamed trees), and negative tests (compare AST against
> structural alterations).
> If we got good at generating such random alteration I bet we could
> achieve quite good coverage by simply feeding already existing javac
> test sources to the fuzzing machinery.
>> at first glance it seemed to me that the hash function would
>> return the same value for:
>> (int i) -> i / 5; and (int i) -> 5 / i; I haven't test it but
>> this was my impression.
>> The hash function currently traverses all nodes and hashes their
>> tags, so for those two
>> examples it would see something like `[DIV, IDENT, LITERAL]` and
>> `[DIV, LITERAL, IDENT]`,
>> which would not result in the same value.
>> This does mean that `x + 42` and `y + 43` both hashed to the same
>> value. I updated the
>> hash function to include literal values and symbols (with the same
>> handling of method
>> parameters as the diff). With that change I am no longer seeing hash
>> collisions that result in
>> unnecessary diffing. There are still cases where two lambda bodies
>> hash to the same
>> value but have different target types, but that case doesn't require
>> a tree diff (the check
>> that the target types are the same happens first).
>> The updates to hashing are here:
More information about the amber-dev