deduplicating lambda methods

Maurizio Cimadamore maurizio.cimadamore at
Mon Mar 19 21:37:49 UTC 2018

On 18/03/18 02:04, Liam Miller-Cushon wrote:
> Hi Vicente,
> On Sat, Mar 17, 2018 at 11:44 AM Vicente Romero 
> <vicente.romero at <mailto:vicente.romero at>> wrote:
>     overall looks good, but I think that a set of regression tests
>     should be included along with the patch.
> Definitely, I have some functional tests that I'll work on turning 
> into a jtreg test.
> Do you have advice about how much test coverage is appropriate for 
> tree comparison?
> Getting to 100% branch coverage would require a lot of test cases. I 
> could probably
> generate some of them programatically. FWIW a certain amount of 
> TreeDiffer was
> generated and not hand-written, so any bugs are probably structural 
> problems rather
> than typos. (That doesn't lessen the need for tests, of 
> course--there's still the risk of
> adding typos when changes are made in the future.)
How about a test harness that takes a source file, compiles, takes its 
AST and then uses it as a 'golden' AST to compare it against other AST 
obtained from:

1) compiling same source file with some random harmless alteration (e.g. 
alpha renaming of identifiers)
2) compiling same source file with some structural alterations (e.g. 
drop all bodies from nested blocks)
3) or, compile another user-defined alteration

You could get both positive tests (compare AST against itself, or 
against alpha-renamed trees), and negative tests (compare AST against 
structural alterations).

If we got good at generating such random alteration I bet we could 
achieve quite good coverage by simply feeding already existing javac 
test sources to the fuzzing machinery.

>     at first glance it seemed to me that the hash function would
>     return the same value for:
>     (int i) -> i / 5; and (int i) -> 5 / i; I haven't test it but this
>     was my impression.
> The hash function currently traverses all nodes and hashes their tags, 
> so for those two
> examples it would see something like `[DIV, IDENT, LITERAL]` and 
> which would not result in the same value.
> This does mean that `x + 42` and `y + 43` both hashed to the same 
> value. I updated the
> hash function to include literal values and symbols (with the same 
> handling of method
> parameters as the diff). With that change I am no longer seeing hash 
> collisions that result in
> unnecessary diffing. There are still cases where two lambda bodies 
> hash to the same
> value but have different target types, but that case doesn't require a 
> tree diff (the check
> that the target types are the same happens first).
> The updates to hashing are here:
> <>

More information about the amber-dev mailing list