Re: Thread and Container locality

2015-10-02 Thread Vlad Rozov
Pramod and I are looking into replacing/enhancing CircularBuffer with SPSC. One challenge is supporting FrozenIterator, as SPSC and MPMC do not support iterators. We may also want to deprecate, or at minimum stop using, UnsafeBlockingQueue, as SPSC.size() overestimates the actual number of elements in

Re: Thread and Container locality

2015-10-02 Thread Chetan Narsude
Excellent points about low-level performance optimizations in SPSC. A few things to learn there. Thanks for sharing. I wish I had come across this material earlier. CircularBuffer is ad hoc knowledge. It overlooks CPU cache misses (I would have never guessed that; truly a silent killer) and

Re: Thread and Container locality

2015-09-30 Thread Pramod Immaneni
With MPMC I get up to 20M for container local. Let's sync up on doing this across nodes. Thanks. On Wed, Sep 30, 2015 at 8:31 AM, Vlad Rozov wrote: > Blog and presentation on algorithms behind JCTools: > > http://psy-lob-saw.blogspot.com/p/lock-free-queues.html > https://vimeo.com/100197431 > > Thank

Re: Thread and Container locality

2015-09-30 Thread Vlad Rozov
Blog and presentation on algorithms behind JCTools: http://psy-lob-saw.blogspot.com/p/lock-free-queues.html https://vimeo.com/100197431 Thank you, Vlad On 9/29/15 21:14, Vlad Rozov wrote: I guess yes, it should show improvement every time there is consumer/producer contention on a resource fr

Re: Thread and Container locality

2015-09-29 Thread Vlad Rozov
I guess yes, it should show improvement every time there is consumer/producer contention on a resource from two different threads, so we should see improvements in the buffer server as well. The current prototype does not support containers on different nodes. Thank you, Vlad On 9/29/15 20:4

Re: Thread and Container locality

2015-09-29 Thread Pramod Immaneni
Would it show any improvement in the case where the containers are on different nodes? On Tue, Sep 29, 2015 at 7:17 PM, Vlad Rozov wrote: > By changing QUEUE_CAPACITY to 120 I can get around 62 mil tuples for > the case when wordGenerator emits the same tuple and 34 mil when it > generates n

Re: Thread and Container locality

2015-09-29 Thread Vlad Rozov
By changing QUEUE_CAPACITY to 120 I can get around 62 mil tuples for the case when wordGenerator emits the same tuple and 34 mil when it generates new tuples each time. Thank you, Vlad On 9/29/15 17:08, Vlad Rozov wrote: 3 mil for container local and 55 mil for thread local. Thank you,

Re: Thread and Container locality

2015-09-29 Thread Vlad Rozov
3 mil for container local and 55 mil for thread local. Thank you, Vlad On 9/29/15 16:57, Chetan Narsude wrote: Vlad, what was the number without this fix? -- Chetan On Tue, Sep 29, 2015 at 4:48 PM, Vlad Rozov wrote: I did a quick prototype that uses http://jctools.github.io/JCTools SPSC

Re: Thread and Container locality

2015-09-29 Thread Chetan Narsude
Vlad, what was the number without this fix? -- Chetan On Tue, Sep 29, 2015 at 4:48 PM, Vlad Rozov wrote: > I did a quick prototype that uses http://jctools.github.io/JCTools SPSC > bounded queue instead of CircularBuffer. For container local I now see 13 > mil tuples per second. > > Thank you,

Re: Thread and Container locality

2015-09-29 Thread Vlad Rozov
I did a quick prototype that uses http://jctools.github.io/JCTools SPSC bounded queue instead of CircularBuffer. For container local I now see 13 mil tuples per second. Thank you, Vlad On 9/28/15 12:58, Chetan Narsude wrote: Let me shed some light on THREAD
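For readers unfamiliar with the idea, a single-producer/single-consumer bounded queue can be sketched with the standard library alone. This is a simplified illustration, not the JCTools implementation and not the prototype from this message; the class name SpscRingSketch and its methods are hypothetical, and the sketch omits refinements such as cache-line padding.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a bounded SPSC ring buffer (not JCTools code).
// Exactly one thread may call offer() and exactly one thread may call poll().
class SpscRingSketch<E> {
    private final Object[] buffer;
    private final int mask;
    // Next slot to read; written only by the consumer.
    private final AtomicLong head = new AtomicLong();
    // Next slot to write; written only by the producer.
    private final AtomicLong tail = new AtomicLong();

    SpscRingSketch(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        buffer = new Object[capacity];
        mask = capacity - 1;
    }

    // Producer side: returns false when the queue is full.
    boolean offer(E e) {
        long t = tail.get();
        if (t - head.get() == buffer.length) {
            return false;
        }
        buffer[(int) (t & mask)] = e;
        // Ordered store: the element becomes visible before the new tail does.
        tail.lazySet(t + 1);
        return true;
    }

    // Consumer side: returns null when the queue is empty.
    @SuppressWarnings("unchecked")
    E poll() {
        long h = head.get();
        if (h == tail.get()) {
            return null;
        }
        int index = (int) (h & mask);
        E e = (E) buffer[index];
        buffer[index] = null; // release the reference before freeing the slot
        head.lazySet(h + 1);
        return e;
    }
}
```

Because each index has a single writer, neither side needs a lock or a compare-and-swap; each side only publishes its own index with an ordered store.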

Re: Thread and Container locality

2015-09-28 Thread Chetan Narsude
Let me shed some light on THREAD_LOCAL and CONTAINER_LOCAL. THREAD_LOCAL at the core is nothing but a function call. When an operator does emit(tuple), it gets translated into the downstream port's "process(tuple)" call, which is invoked immediately in the same thread. So obviously the performance is g
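The difference between a direct call and a cross-thread handoff can be illustrated roughly as below. This is only a sketch under stated assumptions: the names LocalitySketch, Sink, Recorder, emitDirect and emitViaQueue are hypothetical, and ArrayBlockingQueue is a stand-in for the buffer between threads, not the CircularBuffer discussed elsewhere in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration; none of these names come from the actual engine.
class LocalitySketch {
    interface Sink {
        void process(String tuple);
    }

    // Downstream port that records the tuples and the thread that delivered them.
    static class Recorder implements Sink {
        final List<String> tuples = new ArrayList<>();
        volatile String threadName;

        @Override
        public void process(String tuple) {
            threadName = Thread.currentThread().getName();
            tuples.add(tuple);
        }
    }

    // THREAD_LOCAL-like path: emit is just a call to process() in the caller's thread.
    static Recorder emitDirect(List<String> input) {
        Recorder downstream = new Recorder();
        for (String tuple : input) {
            downstream.process(tuple);
        }
        return downstream;
    }

    // Cross-thread path: emit enqueues, a second thread dequeues and calls process().
    static Recorder emitViaQueue(List<String> input) {
        Recorder downstream = new Recorder();
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < input.size(); i++) {
                    downstream.process(queue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "downstream");
        consumer.start();
        try {
            for (String tuple : input) {
                queue.put(tuple);
            }
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return downstream;
    }
}
```

In the direct case the recorded thread name is the emitter's own thread; in the queued case it is the separate "downstream" thread, which is where the handoff cost comes from.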

Re: Thread and Container locality

2015-09-28 Thread Vlad Rozov
Hi Tim, I use the benchmark application that is part of the Apache Malhar project. Please let me know if you need help with compiling or running the application. Thank you, Vlad On 9/28/15 11:09, Timothy Farkas wrote: Also sharing a diff https://github.com/DataTorrent/Netlet/compare/master...iloo

Re: Thread and Container locality

2015-09-28 Thread Munagala Ramanath
I wrote a quick benchmark program appended below; here are the results of running it on my laptop: ram@ram-laptop:threads: time java Volatile 1 nThreads = 1 MAX_VALUE reached, exiting real 0m13.834s user 0m13.829s sys 0m0.024s ram@ram-laptop:threads: time java Volatile 2 nThreads = 2 M

Re: Thread and Container locality

2015-09-28 Thread Vlad Rozov
Both threads increment a static volatile long in a loop while it is less than Integer.MAX_VALUE. Thank you, Vlad On 9/28/15 10:56, Pramod Immaneni wrote: Vlad, what was your mode of interaction/ordering between the two threads for the 3rd test? On Mon, Sep 28, 2015 at 10:51 AM, Vlad Rozov wrot

Re: Thread and Container locality

2015-09-28 Thread Timothy Farkas
Also sharing a diff https://github.com/DataTorrent/Netlet/compare/master...ilooner:condVarBuffer Thanks, Tim On Mon, Sep 28, 2015 at 10:07 AM, Timothy Farkas wrote: > Hi Vlad, > > Could you share your benchmarking applications? I'd like to test a change > I made to the Circular Buffer > > > ht

Re: Thread and Container locality

2015-09-28 Thread Timothy Farkas
Hi Vlad, Could you share your benchmarking applications? I'd like to test a change I made to the Circular Buffer https://github.com/ilooner/Netlet/blob/condVarBuffer/src/main/java/com/datatorrent/netlet/util/CircularBuffer.java Thanks, Tim On Mon, Sep 28, 2015 at 9:56 AM, Pramod Immaneni wrote

Re: Thread and Container locality

2015-09-28 Thread Pramod Immaneni
Vlad, what was your mode of interaction/ordering between the two threads for the 3rd test? On Mon, Sep 28, 2015 at 10:51 AM, Vlad Rozov wrote: > I created a simple test to check how quickly java can count to > Integer.MAX_VALUE. The result that I see is consistent with > CONTAINER_LOCAL behavio

Re: Thread and Container locality

2015-09-28 Thread Vlad Rozov
I created a simple test to check how quickly Java can count to Integer.MAX_VALUE. The result that I see is consistent with CONTAINER_LOCAL behavior: counting long in a single thread: 0.9 sec; counting volatile long in a single thread: 17.7 sec; counting volatile long shared between two threads:
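A counting test of this kind could look roughly like the sketch below. It is not the code used in the thread, which is not shown here; the class name CountingSketch, its method names, and the default limit of 10 million are assumptions made for illustration. Pass Integer.MAX_VALUE (2147483647) as the argument to count as far as the message describes.

```java
// Hypothetical sketch of the counting experiment; not the original test code.
class CountingSketch {
    static long plain;
    static volatile long shared;

    // Count with a plain (non-volatile) static long in the calling thread.
    static long countPlain(long limit) {
        plain = 0;
        while (plain < limit) {
            plain++;
        }
        return plain;
    }

    // Count with a static volatile long from nThreads threads.
    // The increment is deliberately not atomic: updates may be lost and the
    // final value may slightly overshoot the limit when nThreads > 1.
    static long countVolatile(long limit, int nThreads) {
        shared = 0;
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(() -> {
                while (shared < limit) {
                    shared++;
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared;
    }

    public static void main(String[] args) {
        long limit = args.length > 0 ? Long.parseLong(args[0]) : 10_000_000L;

        long start = System.nanoTime();
        countPlain(limit);
        System.out.printf("plain long, 1 thread: %d ms%n", (System.nanoTime() - start) / 1_000_000);

        start = System.nanoTime();
        countVolatile(limit, 1);
        System.out.printf("volatile long, 1 thread: %d ms%n", (System.nanoTime() - start) / 1_000_000);

        start = System.nanoTime();
        countVolatile(limit, 2);
        System.out.printf("volatile long, 2 threads: %d ms%n", (System.nanoTime() - start) / 1_000_000);
    }
}
```

Absolute timings depend on the hardware and JVM, so the figures quoted in the message should not be expected to reproduce exactly.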

Re: Thread and Container locality

2015-09-28 Thread Vlad Rozov
Ram, The stream between operators in case of CONTAINER_LOCAL is InlineStream. InlineStream extends DefaultReservoir, which extends CircularBuffer. CircularBuffer does not use synchronized methods or locks; it uses volatile. I guess that using volatile causes CPU cache invalidation and along wit

Re: Thread and Container locality

2015-09-27 Thread Munagala Ramanath
Vlad, That's a fascinating and counter-intuitive result. I wonder if some internal synchronization is happening (maybe the stream between them is a shared data structure that is lock protected) to slow down the 2 threads in the CONTAINER_LOCAL case. If they are both going as fast as possible it is