Numerical Stream code
peter.levart at gmail.com
Thu Feb 14 07:56:45 PST 2013
On 02/14/2013 03:45 PM, Brian Goetz wrote:
>> The parallel version is almost certainly suffering false cache line
>> sharing when adjacent tasks are writing to the shared arrays u0, etc.
>> Nothing to do with streams, just a standard parallelism gotcha.
> Cure: don't write to shared arrays from parallel tasks.
I would like to discuss this a little bit (hence the cc:
concurrency-interest - the conversation can continue on this list only).
Is it really important to avoid writing to shared arrays from multiple
threads (of course without synchronization, not even volatile
writes/reads) when indexes are not shared (each thread writes/reads its
own disjoint subset)?
Do element sizes matter (byte vs. short vs. int vs. long)?
I had a (false?) feeling that cache lines are not invalidated when
writes are performed without fences.
Also, I don't know how short (byte, char) writes are combined into memory
words in hardware when they come from different cores, or whether
this causes any performance issues.
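
To make the question concrete, here is a minimal sketch of the scenario
described above: several threads doing plain (non-volatile, unsynchronized)
writes to one shared array, each to its own index. The class name
FalseSharingSketch, the method runWorkers and the stride values are made up
for illustration, and the 64-byte cache line size is an assumption about the
hardware. With stride 1 the two slots are 8 bytes apart and most likely land
on the same cache line; with stride 16 they are 128 bytes apart and cannot.

```java
public class FalseSharingSketch {

    // Each of `threads` workers increments its own slot of one shared array
    // `iterations` times, using plain writes only. Slots are `stride` elements
    // apart, so no index is ever touched by more than one thread.
    static long[] runWorkers(int threads, int stride, int iterations) {
        final long[] shared = new long[threads * stride];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int slot = t * stride;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    shared[slot]++; // plain read and write, no fences
                }
            });
        }
        for (Thread w : workers) {
            w.start();
        }
        try {
            for (Thread w : workers) {
                w.join(); // join makes the workers' writes visible to the caller
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return shared;
    }

    public static void main(String[] args) {
        int iterations = 100_000_000;
        // Assumption: 64-byte cache lines, i.e. 8 longs per line.
        for (int stride : new int[] {1, 16}) {
            long start = System.nanoTime();
            runWorkers(2, stride, iterations);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("stride " + stride + ": " + elapsedMs + " ms");
        }
    }
}
```

The results are correct in both layouts because the indexes are disjoint;
only the timing may differ. The numbers are indicative at best: the JIT is
free to keep the counter in a register and store it once, which would hide
the effect entirely, so a harness such as JMH is the better tool for a real
measurement.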