Stream/IntermediateOp properties/flags

Paul Sandoz paul.sandoz at
Mon Oct 1 02:31:35 PDT 2012


Here is a patch that makes properties/flags for operations be invariant to the stream source:

this is based on top of the previous patch for refactoring terminal operations [1].

This feature can:

- improve the invariance of pipeline helper to the stream source;

- support detached pipelines, where the properties of the pipeline can be pre-calculated independent of the source properties; and

- make cleaner the IntermediateOp interface. Implementations only need to declare the logical or of properties, rather than perform more bit twizzles, which while not complicated are easily prone to errors.

- enable analysis of operations in the pipeline (e.g. remove redundant operations, UniqOp can operate differently if it is told, statically, that the upstream stream is will be sorted)


To support properties independent of the stream source three pieces of information are required:

 1) the property is known e.g. the output stream is known to be sorted
 2) the property is not known e.g. the output stream is not known to be sorted
   (there is subtle distinction between "not known to be" and "known not to be". An operation
    may monkey around with the elements such that they may no longer be distinct, e.g. MapOp,
    but certain inputs, and/or the map function, may result in certain outputs still being distinct). 
 3) the property is taken from upstream i.e. identity function

So two bits are required to represent this information. Given the number of properties we currently have (4) and a max of 16 properties fitting into an "int" i think there is ample space to cope with 2 bits instead of 1 bit.

A bit pattern of 0b01 represents a known property.
A bit pattern of 0b10 represents a not known property.
A bit pattern of 0b11 represents the "take whatever is upstream".

Then one can use a standard "& with the masks" then  "| with the values" pattern.

Given the bit patterns of properties it is possible to create a mask from the logical or of the properties themselves, thus enabling intermediate ops to simply declare the logical or of properties.

For now i have retained two methods on ops for comparison, e.g. in UniqueOp:

    public int getOpFlags() {
        return StreamProperties.FLAG_IS_DISTINCT | StreamProperties.FLAG_NOT_SIZED;

    public int getStreamFlags(int upstreamFlags) {
        // If the upstream is sorted, we need only cache last element
        // If the upstream is unique, this is a no-op
        return (upstreamFlags & StreamProperties.FLAG_MASK_SIZED & StreamProperties.FLAG_MASK_DISTINCT)
               | StreamProperties.FLAG_IS_DISTINCT | StreamProperties.FLAG_NOT_SIZED;

In AbstractPipeline the code is as follows to calculate the properties given the operations and pipeline:

            int opsFlags = StreamProperties.FLAG_MASK;
            for (int i = from; i < to; i++) {
                isIntermediateShortCircuit |= ops[i].isShortCircuit();

                // @@@ Declarative implementation, if IntermediateOp#getStreamFlags is removed
//                int opFlags = ops[i].getOpFlags();
//                opsFlags = (opsFlags & StreamProperties.getStreamFlagsMask(opFlags)) | opFlags;
                opsFlags = ops[i].getStreamFlags(opsFlags);

            this.flags = sourceFlags & opsFlags;

I am inclined to go with the simplicity of declarative approach for a slightly increased cost in the number bit twizzles, which i suspect is likely to add very little to the fixed cost of constructing a pipeline. A default method can be added to IntermediateOp if we really want to allow the choice:

  getStreamFlags(int upstreamFlags) default {
    int flags = getOpFlags();
    return (upstreamFlags & StreamProperties.getStreamFlagsMask(flags)) | flags;



More information about the lambda-dev mailing list