Numerical Stream code

Peter Levart peter.levart at
Thu Feb 14 07:56:45 PST 2013

On 02/14/2013 03:45 PM, Brian Goetz wrote:
>> The parallel version is almost certainly suffering false cache line
>> sharing when adjacent tasks are writing to the shared arrays u0, etc.
>> Nothing to do with streams, just a standard parallelism gotcha.
> Cure: don't write to shared arrays from parallel tasks.

I would like to discuss this a little bit (hence the cc: 
concurrency-interest - the conversation can continue on this list only).

Is it really important to avoid writing to shared arrays from multiple 
threads (without synchronization, of course, not even volatile 
writes/reads) when indexes are not shared (each thread writes/reads its 
own disjoint subset)?
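
To make the question concrete, here is a minimal sketch of the two access 
patterns in question. The class and method names are made up for 
illustration, and the array size and round count are arbitrary. Two threads 
update disjoint index sets of one long[] with plain writes, once interleaved 
(even/odd indexes, so both threads touch every cache line) and once blocked 
(each thread owns a contiguous half). It is not a proper benchmark: there is 
no warm-up, and the JIT may treat the two loops differently, so any timings 
are only a rough hint.

```java
public class DisjointWrites {
    // Each thread adds 1 to its own elements of the shared array, `rounds` times.
    // Plain array writes only: no volatile, no locks, no explicit fences.
    static void work(long[] shared, int start, int stride, int count, int rounds) {
        for (int r = 0; r < rounds; r++) {
            for (int k = 0; k < count; k++) {
                shared[start + k * stride]++;
            }
        }
    }

    // Runs two threads over disjoint index sets and returns elapsed nanoseconds.
    // interleaved = true:  thread 0 gets even indexes, thread 1 gets odd ones.
    // interleaved = false: each thread gets one contiguous half.
    // Assumes shared.length is even.
    static long run(long[] shared, boolean interleaved, int rounds) {
        int half = shared.length / 2;
        int stride = interleaved ? 2 : 1;
        int secondStart = interleaved ? 1 : half;
        Thread t0 = new Thread(() -> work(shared, 0, stride, half, rounds));
        Thread t1 = new Thread(() -> work(shared, secondStart, stride, half, rounds));
        long begin = System.nanoTime();
        t0.start();
        t1.start();
        try {
            // join() makes the workers' writes visible to the caller
            t0.join();
            t1.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - begin;
    }

    public static void main(String[] args) {
        int rounds = 200_000;
        System.out.println("interleaved: " + run(new long[1024], true, rounds) / 1_000_000 + " ms");
        System.out.println("blocked:     " + run(new long[1024], false, rounds) / 1_000_000 + " ms");
    }
}
```

In both variants every element ends up equal to `rounds`, since no index is 
ever written by more than one thread. Any difference would be in speed only.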

Do element sizes matter (byte vs. short vs. int vs. long)?

I had a (false?) feeling that cache lines are not invalidated when 
writes are performed without fences.

Also, I don't know how narrow (byte, char) writes are combined into memory 
words on the hardware when they come from different cores, or whether 
this is connected to any performance issues.
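
On the Java side, at least, JLS 17.6 (word tearing) requires that updates 
to adjacent array elements must not interfere with each other, so the 
correctness of such byte writes is not in question. What the hardware pays 
for that is the open part. A small sketch that checks only the correctness 
half (class and method names are hypothetical): two threads do 
read-modify-write updates on neighbouring bytes of one array, with no 
synchronization apart from the final join().

```java
public class AdjacentBytes {
    // Thread 0 adds 1 to shared[0] and thread 1 adds 3 to shared[1],
    // `rounds` times each, using plain (non-volatile) byte accesses.
    static byte[] updateNeighbours(int rounds) {
        byte[] shared = new byte[2];
        Thread t0 = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                shared[0]++;
            }
        });
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                shared[1] += 3;
            }
        });
        t0.start();
        t1.start();
        try {
            // join() makes the workers' writes visible to the caller
            t0.join();
            t1.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared;
    }

    public static void main(String[] args) {
        int rounds = 1_000_000;
        byte[] result = updateNeighbours(rounds);
        // Byte arithmetic wraps, so compare against the wrapped expectation.
        System.out.println("byte 0 ok: " + (result[0] == (byte) rounds));
        System.out.println("byte 1 ok: " + (result[1] == (byte) (3 * rounds)));
    }
}
```

If a core implemented a byte store as a read-modify-write of the containing 
word, updates to the neighbouring byte could get lost. The check above would 
then fail occasionally.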



More information about the lambda-dev mailing list