Hello, and other things
Jason.Fordham at Sun.COM
Sun Mar 16 08:54:19 PDT 2008
On 3/14/2008 4:47 PM, John Rose wrote:
> The hard part, though, is the essentially untyped nature of C memory.
> I've seen C implementations that run over typed heaps, but they
> are artful compromises, rather than simple ports to a new backend.
> Centerline C and Zeta-C come to mind. (Both are old projects that
> may pre-date the Google cache. I don't have references handy.)
It seems to me that the ability of (machineRadix *)pointers to overrun -
above and below - the arrays they were based on is a feature of C. The
memory model I'm proposing makes it possible to leverage the existing
code generation models, and the libraries.
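
To make the proposal concrete, here is a minimal sketch of such a memory model, assuming the whole C address space is one Java int[] and a C pointer is just an index into it. The class name FlatMemory, the bump allocator and the choice of int as the machineRadix type are all illustrative assumptions, not part of any existing code.

```java
// Hypothetical sketch: C memory modeled as a single untyped int[] array.
final class FlatMemory {
    private final int[] cells;  // the entire C address space
    private int brk = 1;        // next free cell; address 0 is reserved as NULL

    FlatMemory(int size) {
        cells = new int[size];
    }

    // Bump allocator standing in for malloc; the returned "pointer" is an index.
    int malloc(int n) {
        if (n < 0 || n > cells.length - brk) {
            throw new OutOfMemoryError("flat memory exhausted");
        }
        int p = brk;
        brk += n;
        return p;
    }

    int load(int ptr) {
        return cells[ptr];
    }

    void store(int ptr, int value) {
        cells[ptr] = value;
    }

    public static void main(String[] args) {
        FlatMemory m = new FlatMemory(64);
        int a = m.malloc(4);
        int b = m.malloc(4);            // lands directly after a
        m.store(a + 4, 99);             // overruns a by one cell, C-style
        System.out.println(m.load(b));  // the overrun landed in b[0]
    }
}
```

An overrun between C objects behaves as it would in C, while an index outside the array itself still raises ArrayIndexOutOfBoundsException, so the damage stays inside the emulated memory.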
> The latter was a C compiler for the Symbolics Lisp Machine which
> used ordered pairs (cons cells) for all C pointers, to represent the
> combination of a base address and an arbitrary offset.
> A similar product was Bounds-Check C, which widened
> pointers into little 3-tuples (min, max, cur). The idea is
> that a tuple-based pointer will never be allowed to "reach
> beyond" the heap object it was created for; such operations
> are always indeterminate, since there is no guaranteed
> distance (or ordering) of heap objects, from one instruction
> to the next, in a system like the Symbolics with a powerful GC.
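
The (min, max, cur) idea in the quoted text can be sketched roughly as follows. This is an illustration only: the class FatPointer and its methods are hypothetical, and it assumes the check happens at dereference time, which may not match how Bounds-Check C or Zeta-C actually behaved.

```java
// Hypothetical sketch of a widened pointer: it carries its object and bounds with it.
final class FatPointer {
    private final int[] object;  // the heap object this pointer was created for
    private final int min;       // lowest valid index (inclusive)
    private final int max;       // highest valid index (exclusive)
    private final int cur;       // current position

    private FatPointer(int[] object, int min, int max, int cur) {
        this.object = object;
        this.min = min;
        this.max = max;
        this.cur = cur;
    }

    // A pointer to the first element of a freshly allocated object.
    static FatPointer to(int[] object) {
        return new FatPointer(object, 0, object.length, 0);
    }

    // Pointer arithmetic moves cur only; the bounds are copied unchanged.
    FatPointer plus(int delta) {
        return new FatPointer(object, min, max, cur + delta);
    }

    int load() {
        check();
        return object[cur];
    }

    void store(int value) {
        check();
        object[cur] = value;
    }

    // Dereferencing outside [min, max) is refused rather than reaching a neighbour.
    private void check() {
        if (cur < min || cur >= max) {
            throw new IndexOutOfBoundsException("pointer at " + cur + " outside [" + min + ", " + max + ")");
        }
    }

    public static void main(String[] args) {
        int[] arr = new int[4];
        FatPointer p = FatPointer.to(arr);
        p.plus(3).store(7);           // last element, in bounds
        System.out.println(arr[3]);
        try {
            p.plus(4).load();         // one past the end: may be formed, not dereferenced
        } catch (IndexOutOfBoundsException e) {
            System.out.println("trapped: " + e.getMessage());
        }
    }
}
```

Because each pointer refers to its own Java array, the GC is free to move objects around; nothing in the sketch depends on the distance between two heap objects.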
While I understand that many C programmers have a secret wish that the
GC in GCC could stand for Garbage Collection, it doesn't: I think that
it's OK to avoid the Java GC; philosophically, I regard the ability to
leave malloced objects on the heap without references to them as a C
"feature", just like buffer over/underruns.
> That would work very nicely on the JVM also. You could use
> the sun.misc.Unsafe API (with great care!) to handle punning
> among memory-resident primitive types. You must avoid
> using Unsafe to pun between primitives and references, because
> there is absolutely no way to control when the GC might want
> to move things around underneath your code.
I hadn't come across this before, and it doesn't seem to have any
documentation! Given your limited description of the features, it sounds
as though it would be very easy to leave a gap where the compiler could
be used to break Java protection, which I would not want to do.
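
For comparison, punning among primitive types alone does not strictly need Unsafe. The sketch below deliberately swaps in java.nio.ByteBuffer, which is documented and bounds-checked and can only hold bytes, so it offers no way to pun between primitives and references. The class and method names are illustrative assumptions, and this is not a claim about how Unsafe itself would be used.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: reinterpret the bytes of one primitive type as another.
final class PrimitivePunning {
    // Writes a float into memory and reads the same four bytes back as an int,
    // much as a C union or a cast through (int *) would.
    static int floatBitsViaMemory(float f) {
        ByteBuffer mem = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        mem.putFloat(0, f);
        return mem.getInt(0);
    }

    public static void main(String[] args) {
        // 1.0f is 0x3f800000 in IEEE 754 single precision.
        System.out.printf("0x%08x%n", floatBitsViaMemory(1.0f));
    }
}
```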
>> The key obstacles I see are that the instruction set makes
>> a C-like stack expensive: there are no neat push and pop operations
>> in this memory model, it feels like microcoding. Though I understand the
>> motivation, which is to protect the bytecodes from malicious or lazy use
>> of buffer overflows, and other mechanisms for executing data.
> The stack is really just a shorthand for operand renaming.
> Feel free to generate code to a register-to-register machine,
> and map your virtual registers to JVM locals.
Again, I'm inclined to retain the classic stack-based calling convention in
the memory model, because it makes it trivial to construct and
manipulate pointers to C objects allocated in the local frame - they're
the same as pointers to objects on the heap, because they're in the same
untyped array - machineRadix memory.
>> I like the method handle mechanism, for a variety of reasons, and I
>> would like to see some easing up on where the stack is located so
>> operations which index into the stack are more flexible, and fast. Is
>> this possible?
> If you need a memory-resident stack, you can just build an array
> to hold it, can't you? I'm not sure where the pain point is here, yet.
Stack operations - manipulating and indexing the BP and SP - will be
frequent multi-bytecode operations. I don't know how well the JIT
compiler will work out what's going on.
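
As a rough picture of what those operations look like, here is a sketch of a memory-resident stack with SP and BP kept as plain int fields. The class ArrayStack, the downward growth direction and the frame layout are assumptions made for illustration; in the model described above, mem would be the same machineRadix array that holds the heap.

```java
// Hypothetical sketch: a C-style stack living inside an int[] array.
final class ArrayStack {
    final int[] mem;
    int sp;  // stack pointer: index of the lowest cell in use (stack grows downward)
    int bp;  // base pointer of the current frame

    ArrayStack(int size) {
        mem = new int[size];
        sp = size;
        bp = size;
    }

    void push(int v) {
        mem[--sp] = v;
    }

    int pop() {
        return mem[sp++];
    }

    // Prologue: save the caller's BP, then reserve `locals` cells below it.
    void enter(int locals) {
        push(bp);
        bp = sp;
        sp -= locals;
    }

    // Epilogue: discard the locals and restore the caller's BP.
    void leave() {
        sp = bp;
        bp = pop();
    }

    // Address of local slot i in the current frame; an ordinary index into mem.
    int localAddr(int i) {
        return bp - 1 - i;
    }

    public static void main(String[] args) {
        ArrayStack s = new ArrayStack(16);
        s.enter(2);              // frame with two locals
        int p = s.localAddr(0);  // like &x for a local x
        s.mem[p] = 5;
        s.mem[p] += 1;           // a callee could do this through the pointer
        System.out.println(s.mem[p]);
        s.leave();
    }
}
```

Each of push, pop, enter and leave expands to several bytecodes (field loads and stores, an add or subtract, an array access), which is the cost in question; how much of it the JIT removes is left open here.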