Hi Rajiv,

Thanks for this proposal. It would be great if you could upload an
implementation patch for the CAS idea and show some memory usage / perf
differences.

Guozhang

On Sun, Dec 14, 2014 at 9:27 PM, Rajiv Kurian <ra...@signalfuse.com> wrote:

> Resuscitating this thread. I've done some more experiments and profiling.
> My messages are very tiny (currently 25 bytes each), and creating
> multiple objects per message leads to a lot of churn. The memory churn
> through creation of convenience objects is more than the memory being used
> by my objects right now. I could probably batch my messages further, to
> make this effect less pronounced. I did some rather unscientific
> experiments with a flyweight approach on top of the ByteBuffer for a simple
> messaging API (peer to peer NIO based so not a real comparison) and the
> numbers were very satisfactory and there is no garbage created in steady
> state at all. Though I don't expect such good numbers from actually going
> through the broker + all the other extra stuff that a real producer would
> do, I think there is great potential here.
>
> The general mechanism for me is this:
> i) A buffer (I used Unsafe but I imagine ByteBuffer having similar
> performance) is created per partition.
> ii) A CAS loop (in Java 7 and earlier) or, even better,
> unsafe.getAndAddInt() in Java 8 can be used to claim a chunk of bytes on
> the per topic buffer. This code can be invoked from multiple threads
> without locking (lock-free with the CAS loop, wait-free in Java 8, since
> getAndAddInt() is wait-free). Once a region in the buffer
> is claimed, it can be operated on using the flyweight method that we talked
> about. If the buffer doesn't have enough space then we can drop the message
> or move onto a new buffer. Further this creates absolutely zero objects in
> steady state (only a few objects created in the beginning). Even if the
> flyweight method is not desired, the API can just take byte arrays or
> objects that need to be serialized and copy them onto the per topic buffers
> in a similar way. This API has been validated in Aeron too, so I am pretty
> confident that it will work well. For the zero copy technique here is a
> link to Aeron API with zero copy -
> https://github.com/real-logic/Aeron/issues/18. The regular one copies byte
> arrays but without any object creation.
> iii) The producer send thread can then just go in FIFO order through the
> buffer sending messages that have been committed using NIO to rotate
> between brokers. We might need a background thread to zero out used buffers
> too.
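
A minimal sketch of step ii), assuming an AtomicLong tail counter in place
of Unsafe so that it runs on a stock JDK. PartitionBuffer and its methods
are hypothetical names, not Kafka or Aeron API. claim() uses a single
atomic add as proposed for Java 8, claimWithCas() is the Java 7 style loop,
and putInt()/getInt() stand in for a flyweight that reads and writes at
absolute offsets without allocating.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-partition buffer, for illustration only.
final class PartitionBuffer {
    private final ByteBuffer buffer;
    private final int capacity;
    // Next free byte. A long, so repeated failed claims cannot overflow in practice.
    private final AtomicLong tail = new AtomicLong(0);

    PartitionBuffer(int capacity) {
        this.capacity = capacity;
        this.buffer = ByteBuffer.allocateDirect(capacity);
    }

    // Java 8 style: one atomic add per claim.
    // Returns the start offset of the claimed region, or -1 if it does not fit.
    int claim(int length) {
        long offset = tail.getAndAdd(length);
        return (offset + length <= capacity) ? (int) offset : -1;
    }

    // Java 7 style: retry a compare-and-set until it succeeds or the buffer is full.
    int claimWithCas(int length) {
        while (true) {
            long current = tail.get();
            if (current + length > capacity) {
                return -1;
            }
            if (tail.compareAndSet(current, current + length)) {
                return (int) current;
            }
        }
    }

    // Flyweight-style accessors: absolute offsets, no per-message objects.
    void putInt(int offset, int value) {
        buffer.putInt(offset, value);
    }

    int getInt(int offset) {
        return buffer.getInt(offset);
    }
}
```

One difference worth noting: a failed claim() still advances the tail, so
the caller should move on to a new buffer, whereas a failed claimWithCas()
leaves the tail untouched.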
>
> I've left out some details, but again none of this is very revolutionary -
> it's mostly the same techniques used in Aeron. I really think that we can
> keep the API garbage free and wait-free (even in the multi producer case)
> without compromising how pretty it looks - the total zero copy API will be
> low level, but it should only be used by advanced users. Moreover the usual
> producer.send(msg, topic, partition) can use the efficient ByteBuffer
> offset API internally without it itself creating any garbage. With the
> technique I talked about there is no need for an intermediate queue of any
> kind since the underlying ByteBuffer per partition acts as the queue.
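
To illustrate how the buffer itself can act as the queue, here is a rough
sketch of a claim, write, commit cycle plus a FIFO drain for the sender
thread. It assumes an AtomicIntegerArray of int words instead of Unsafe or
ByteBuffer, purely so the commit is visible across threads using standard
library calls only. ClaimCommitLog, offer() and drain() are made-up names,
and the header layout (record size in words, 0 meaning not yet committed)
is an assumption, not something taken from Aeron or the Kafka producer.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.IntConsumer;

// Hypothetical buffer-as-queue: many producer threads, one sender thread.
final class ClaimCommitLog {
    private final AtomicIntegerArray words;
    private final AtomicLong tail = new AtomicLong(0); // next free word
    private int head = 0;                               // touched by the sender thread only

    ClaimCommitLog(int capacityWords) {
        this.words = new AtomicIntegerArray(capacityWords);
    }

    // Producer side: claim header + payload, write the payload, then commit.
    // Returns false when the record does not fit (caller may drop or rotate).
    boolean offer(int[] payload) {
        int needed = 1 + payload.length;
        long start = tail.getAndAdd(needed);
        if (start + needed > words.length()) {
            return false;
        }
        int base = (int) start;
        for (int i = 0; i < payload.length; i++) {
            words.set(base + 1 + i, payload[i]);
        }
        // Writing the non-zero header last publishes the record to the sender.
        words.set(base, needed);
        return true;
    }

    // Sender side: walk records in FIFO order, stopping at the first slot
    // whose header is still 0 (unclaimed or not yet committed).
    // Returns the number of records handed to the sink.
    int drain(IntConsumer sink) {
        int sent = 0;
        while (head < words.length()) {
            int size = words.get(head);
            if (size == 0) {
                break;
            }
            for (int i = 1; i < size; i++) {
                sink.accept(words.get(head + i));
            }
            head += size;
            sent++;
        }
        return sent;
    }
}
```

Because the sender stops at the first uncommitted header, a slow producer
delays the records behind it but never lets them be sent out of order.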
>
> I can do more experiments with some real producer code instead of my toy
> code to further validate the idea, but I am pretty sure that both
> throughput and jitter characteristics will improve thanks to lower
> contention (wait-free in Java 8 with a single getAndAddInt() operation for
> sync) and better cache locality (C-like buffers and a small constant number
> of objects per partition). If you guys are interested, I'd love to talk
> more. Again just to reiterate, I don't think the API will suffer at all -
> most of this can be done under the covers. Additionally it will open up
> things so that a low level zero copy API is possible.
>
> Thanks,
> Rajiv
>



-- 
-- Guozhang
