Incremental java compile AKA javac print compile dependencies
jonathan.gibbons at oracle.com
Wed May 26 18:56:35 PDT 2010
On 05/25/2010 06:42 PM, Joshua Maurice wrote:
> On Tue, May 25, 2010 at 6:38 PM, Joshua Maurice
> <joshuamaurice at gmail.com> wrote:
> On Tue, May 25, 2010 at 6:01 PM, Jonathan Gibbons
> <jonathan.gibbons at oracle.com> wrote:
> On 05/25/2010 05:11 PM, Joshua Maurice wrote:
> What is relevant is that to get decent levels of
> incremental, aka skipping unnecessary rebuilds, the build
> system needs to know for each java X, the full list of
> class files and java files which will be directly used by
> javac when compiling X. Whenever any of those direct
> compile dependencies have an "interface" / "signature"
> change, X needs to be recompiled.
> Stop right there. There's likely a wrong assumption here,
> hidden in the word "directly".
> If you start from scratch, with no classes precompiled, when
> you compile X, javac will pull in from the sourcepath the
> transitive closure of X and all its dependencies. Thus if X
> refers to Y, and if the implementation of Y refers to Z, then
> javac will compile X and Y and Z, even though there is no direct
> reference in any way from X to Z. This is why your proposed
> technique of tracking -verbose output will not work.
> What? For starters, I'm planning on specifically not using the
> -sourcepath option. Suppose a user touches X only, and nothing
> else depends on X, like your example, and I want to only recompile
> X.java. However, if I give the -sourcepath option, then as you
> note, javac will recompile X, Y, and Z, but Y and Z are useless
> Here are some examples to further explain what I'm planning:
> Suppose X, Y, and Z are part of the same javac task. Touch Z.java.
> Do a build. The build system notes by rule 1 that Z.java is "out
> of date" (source file last modification time is newer than last
> compile time). It notes by rule 3b that Y.java is "out of date"
> (direct dependency java file in same javac task is "out of date").
> It then notes by rule 3b that X.java is "out of date" (direct
> dependency java file in same javac task is "out of date").
> Suppose X, Y, and Z are each part of different javac tasks, such
> as in different jars. Touch Z.java. Do a build. The build system
> notes by rule 1 that Z.java is out of date (source file last
> modification timestamp is newer than last compile time). It calls
> javac on Z.java. Z.class has the same "interface", so its last
> "interface change" time remains unchanged. The build system then
> finds no rule which makes Y or X out of date, so it does no
> further recompile.
> Suppose X, Y, and Z are each part of different javac tasks, such
> as in different jars. Modify Z.java so its "signature" /
> "interface" has changed. Do a build. The build system notes by
> rule 1 that Z.java is out of date (source file last modification
> timestamp is newer than last compile time). It calls javac on
> Z.java. Z.class when compared to the old one has a different
> "interface", so its last "interface change" is set to now. The
> build system then finds Y.java to be "out of date" by rule 3a (a
> direct dependency class file has a newer "last interface change"
> time than the last compilation time). Depending on if this affects
> the "interface" of Y, X then might also be found to be "out of
> date", or it might be found to be "up to date".
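The three scenarios above can be sketched in Java roughly as follows. This is only an illustration of rules 1, 3a, and 3b as they are described in this message, not the poster's actual build system. The class `StalenessSketch`, the `Node` type, and all field names are hypothetical, and timestamps are plain numbers. Rule 2 is not described in this message, so it is left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StalenessSketch {

    /** Hypothetical bookkeeping for one .java file; all names are illustrative. */
    static final class Node {
        final String name;
        final String task;          // the javac task (for example, a jar) that compiles this file
        long sourceModified;        // last modification time of the .java file
        long lastCompiled;          // time of the last compile of this file
        long lastInterfaceChange;   // last time the compiled "interface" changed
        final List<Node> directDeps = new ArrayList<>();

        Node(String name, String task, long sourceModified,
             long lastCompiled, long lastInterfaceChange) {
            this.name = name;
            this.task = task;
            this.sourceModified = sourceModified;
            this.lastCompiled = lastCompiled;
            this.lastInterfaceChange = lastInterfaceChange;
        }
    }

    static boolean outOfDate(Node n) {
        return outOfDate(n, new HashSet<>());
    }

    private static boolean outOfDate(Node n, Set<Node> seen) {
        // Rule 1: the source file is newer than its last compile.
        if (n.sourceModified > n.lastCompiled) {
            return true;
        }
        // A node already examined on this walk adds nothing new; this also breaks cycles.
        if (!seen.add(n)) {
            return false;
        }
        for (Node dep : n.directDeps) {
            if (dep.task.equals(n.task)) {
                // Rule 3b: a direct dependency in the same javac task is out of date.
                if (outOfDate(dep, seen)) {
                    return true;
                }
            } else if (dep.lastInterfaceChange > n.lastCompiled) {
                // Rule 3a: a direct dependency's class file changed its
                // "interface" after this file was last compiled.
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // First scenario: X -> Y -> Z in one javac task, Z.java touched at time 200.
        Node z = new Node("Z", "task1", 200, 100, 90);
        Node y = new Node("Y", "task1", 50, 100, 90);
        Node x = new Node("X", "task1", 50, 100, 90);
        x.directDeps.add(y);
        y.directDeps.add(z);
        System.out.println("X out of date: " + outOfDate(x));
    }
}
```

In this sketch the cascade inside one task never stops early (rule 3b recurses), while across tasks it stops as soon as a recompiled dependency keeps its old `lastInterfaceChange` value, which is the behaviour the scenarios describe.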
> Note that when they're part of the same javac task, I do cascade
> without termination downstream, to the extent of the javac task. I
> have made an educated *guess* that this is a reasonably efficient
> way to get good parallelization, close to minimal rebuilds, and
> avoid a great deal of overhead of calling many separate javac
> There is a difference between the set of files needed to
> compile X, and the set of files on which X has a direct
> dependency (meaning that if they change, X needs to be
> recompiled.) To determine the set of files (or even better,
> the classes) on which X depends, you must either look at the
> classfile (which has the constant problem) or at the AST
> sometime after Attr.
> What? There is? No there isn't. There is no difference between:
> - the set A - the set of files needed to compile some java file X
> - the set B - the set of files which X has a direct dependency -
> meaning that if they change, the java file X needs to be recompiled.
> At least, perhaps a more intelligent / sophisticated build system
> could make such a distinction, but that is not my aim at the
> moment. I am being conservative at the moment, and if some class
> definition Y is required to compile X.java, then I find it quite
> reasonable that X.java's compilation might be different, or fail
> altogether, with a different class definition Y or an un-findable
> class definition Y.
> What do you propose is the difference between sets A and B? An
> example would be enlightening. (Unless we're talking about Ghost
> Dependencies, names which might refer to a different type or
> member depending on what's on the classpath and in the java files
> in the compile, such as A.B "hiding" A.B, where one of them is a
> package A, class B, and the other is a class A, and an inner class
> B. I don't think you're talking about Ghost Dependencies though.)
> PS: Hopefully we're not quibbling over the definition of "minimal
> rebuild". Yes, by a certain strict definition of minimal rebuild,
> where "equivalent to a full clean build" is defined as "the output
> class files display the same observable behavior over all 'allowed
> by documentation' inputs", then a minimal rebuild is equivalent to
> the Halting problem. However, if we define "equivalent to a full
> clean build" in terms of same binary contents of class files, then
> I'm inclined to think that it's not equivalent to the Halting
> problem, though I'm not sure. Either way, I'm going for a
> conservative approximation, one which is 100% correct, but may do
> unnecessary rebuilds, though preferably as few unnecessary
> rebuilds as "reasonable".
> Oh, never mind. I'm sorry. I think I see your point now. You're talking
> about transitive dependencies vs direct dependencies. Yes, a change to
> a "transitive compile dependency" (quote unquote), may require a
> rebuild of me. I believe my above examples highlight how I plan to
> catch that. With the direct dependencies of javac -verbose, I could
> then construct the dependency graph and start recompiling out of date
> nodes. However, I do not want to cascade endlessly downstream, and to
> do that I need to know all possible *direct* dependencies, to know
> that when I have a set of unchanged leaves of the cascade that there
> are no possible effects on nodes outside the rebuild portion of the
> graph from the changed portion.
You say "With the direct dependencies of javac -verbose". Unless
*all* other files have already been compiled, leaving only the one
you're interested in, -verbose is not going to give you direct
dependencies. In the worst case (no files have been compiled),
-verbose is going to give you transitive dependencies.
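As a concrete illustration of the worst case, consider the three hypothetical classes below, imagined as three separate source files (shown in one listing only for brevity). X refers only to Y, and Z appears only inside a method body of Y. If nothing has been compiled yet and only X.java is handed to javac with the sources reachable on the source path, javac has to pull in Y.java to compile X and then Z.java to compile Y, so the files reported by -verbose would be expected to include Z even though X never mentions it.

```java
// X.java: the only file given to javac; it refers to Y and nothing else.
class X {
    Y y = new Y();
}

// Y.java: Z is used only inside a method body, so Z is not part of
// Y's "interface", yet Y cannot be compiled without it.
class Y {
    int size() {
        return new Z().value;
    }
}

// Z.java: never mentioned by X.
class Z {
    int value = 42;
}
```

Here Y is a direct dependency of X, while Z is only a transitive one: a change to the body of Z that leaves Y unchanged gives no reason to recompile X.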
The only reliable way to get the direct dependencies is to look at the
class files or to hook into javac and look at the AST at the right point
in the compilation.
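A minimal sketch of the class-file route, assuming only the standard class file layout: it walks the constant pool and collects the names held in CONSTANT_Class entries. The class name `ClassRefs` and the method `referencedClasses` are made up for this example. It is deliberately incomplete: types that occur only in field or method descriptors are stored as plain strings and are not reported here, and compile-time constants inlined from other classes leave no trace in the class file at all (the constant problem mentioned earlier in the thread).

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ClassRefs {

    /** Returns the internal names (such as "java/lang/Object") of all CONSTANT_Class entries. */
    static Set<String> referencedClasses(InputStream classFile) throws IOException {
        DataInputStream in = new DataInputStream(classFile);
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        in.readUnsignedShort();                 // minor version
        in.readUnsignedShort();                 // major version
        int count = in.readUnsignedShort();     // constant_pool_count
        String[] utf8 = new String[count];
        List<Integer> classNameIndexes = new ArrayList<>();
        for (int i = 1; i < count; i++) {       // entry 0 is unused
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1:                         // Utf8: u2 length + modified UTF-8 bytes
                    utf8[i] = in.readUTF();
                    break;
                case 7:                         // Class: index of the Utf8 name
                    classNameIndexes.add(in.readUnsignedShort());
                    break;
                case 3: case 4:                 // Integer, Float
                case 9: case 10: case 11:       // Fieldref, Methodref, InterfaceMethodref
                case 12: case 17: case 18:      // NameAndType, Dynamic, InvokeDynamic
                    in.readInt();
                    break;
                case 5: case 6:                 // Long, Double occupy two pool slots
                    in.readLong();
                    i++;
                    break;
                case 8: case 16: case 19: case 20:  // String, MethodType, Module, Package
                    in.readUnsignedShort();
                    break;
                case 15:                        // MethodHandle: u1 kind + u2 index
                    in.readUnsignedByte();
                    in.readUnsignedShort();
                    break;
                default:
                    throw new IOException("unknown constant pool tag " + tag);
            }
        }
        Set<String> names = new TreeSet<>();
        for (int index : classNameIndexes) {
            names.add(utf8[index]);
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // With no argument, inspect java.lang.String as a stand-in example.
        try (InputStream in = args.length > 0
                ? Files.newInputStream(Paths.get(args[0]))
                : String.class.getResourceAsStream("String.class")) {
            referencedClasses(in).forEach(System.out::println);
        }
    }
}
```

The result includes the class itself and its superclass, and array types show up in descriptor form, so a build tool would still need to filter and normalise the names before treating them as dependencies.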