Hi guys! I'm one of the mentioned devs (like many others) using external
(and unsafe) APIs to concurrently access a ByteBuffer's contents, and a
developer of a messaging broker's journal that would benefit from this
JEP :) Re: a concurrent access API, how does this one look?
https://github.com/real-logic/agrona/blob/master/agrona/src/main/java/org/agrona/concurrent/AtomicBuffer.java
Note: I don't know whether it's considered appropriate to show up in
these discussions without introducing myself, and I hope this isn't
off-topic, but both this JEP and the comments around it are so
interesting that I couldn't resist. I apologize if I'm breaking some
rule here.

Thanks for the hard work,
Francesco

On Fri, Sep 28, 2018 at 09:21 Peter Levart <peter.lev...@gmail.com> wrote:

> Hi Stuart,
>
> I mostly agree with your assessment about the suitability of the
> ByteBuffer API for nice multithreaded use. What would such an API look
> like? I think pretty much like ByteBuffer, but without the things that
> mutate mark/position/limit/ByteOrder. A stripped-down ByteBuffer API,
> therefore. That would be, in my opinion, the most low-level API
> possible. If you add things to such an API that coordinate
> multithreaded access to the underlying memory, you are already creating
> a concurrent data structure for a particular set of use cases, which
> might not cover all possible use cases or might be sub-optimal at some
> of them. So I think this is better layered on top of such an API, not
> built into it. Low-level multithreaded access to memory is, in my
> opinion, always going to be "unsafe" from the standpoint of
> coordination. It's not only the mark/position/limit/ByteOrder that is
> not multithread-friendly about the ByteBuffer API, but the underlying
> memory too. It would be nice if mark/position/limit/ByteOrder weren't
> in the way, though.
>
> Regards, Peter
>
> On 09/28/2018 07:51 AM, Stuart Marks wrote:
>
> Hi Andrew,
>
> Let me first say that this issue of "ByteBuffer might not be the right
> answer" is something of a digression from the JEP discussion. I think
> the JEP should proceed forward using MBB with the API that you and Alan
> had discussed previously. At most, the discussion of the "right thing"
> issue might affect a side note in the JEP text about possible
> limitations and future directions of this effort.
> However, it's not a blocker to the JEP making progress, as far as I'm
> concerned.
>
> With that in mind, I'll discuss the issue of multithreaded access to
> ByteBuffers and how this bears on whether buffers are or aren't the
> "right answer." There are actually several issues that figure into the
> "right answer" analysis. In this message, though, I'll just focus on
> the issue of multithreaded access.
>
> To recap (possibly for the benefit of other readers), the Buffer class
> doc has the following statement:
>
>     Buffers are not safe for use by multiple concurrent threads. If a
>     buffer is to be used by more than one thread then access to the
>     buffer should be controlled by appropriate synchronization.
>
> Buffers are primarily designed for sequential operations such as I/O
> or codeset conversion. Typical buffer operations set the mark,
> position, and limit before initiating the operation. If the operation
> completes partially (not uncommon with I/O or codeset conversion), the
> position is updated so that the operation can be resumed easily from
> where it left off.
>
> The fact that buffers not only contain the data being operated upon
> but also mutable state information such as mark/position/limit makes
> it difficult to have multiple threads operate on different parts of
> the same buffer. Each thread would have to lock around setting the
> position and limit and performing the operation, preventing any
> parallelism. The typical way to deal with this is to create multiple
> buffer slices, one per thread. Each slice has its own
> mark/position/limit values but shares the same backing data.
>
> We can avoid the need for this by adding absolute bulk operations,
> right?
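The slice-per-thread workaround described above might look like the following sketch (the class name, thread count, chunk size, and fill values are invented for illustration). Each slice carries its own mark/position/limit, so the workers never touch shared buffer state:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class SlicePerThread {
    public static void main(String[] args) throws InterruptedException {
        final int threads = 4;
        final int chunk = 256;
        ByteBuffer shared = ByteBuffer.allocate(threads * chunk);

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            // Each slice gets independent mark/position/limit values but
            // shares the same backing data as 'shared'.
            ByteBuffer dup = shared.duplicate();
            dup.position(i * chunk);
            dup.limit((i + 1) * chunk);
            ByteBuffer slice = dup.slice();

            final byte fill = (byte) (i + 1);
            Thread t = new Thread(() -> {
                // Relative puts are safe here: the mutable state is per-slice.
                while (slice.hasRemaining()) {
                    slice.put(fill);
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            t.join();  // join() gives the central thread a happens-before edge
        }

        // First byte of the first slice's region, last byte of the last one.
        System.out.println(shared.get(0) + " " + shared.get(threads * chunk - 1));
    }
}
```

The cost, as noted, is the ceremony of carving up duplicates and slices before the work can even start.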
>
> Let's suppose we were to add something like this (considering
> ByteBuffer only, setting the buffer views aside):
>
>     get(int srcOff, byte[] dst, int dstOff, int length)
>     put(int dstOff, byte[] src, int srcOff, int length)
>
> Each thread can perform its operations on a different part of the
> buffer, in parallel, without interference from the others. Presumably
> these operations don't read or write the mark and position. Oh, wait.
> The existing absolute put and get overloads *do* respect the buffer's
> limit, so the absolute bulk operations ought to as well. This means
> they do depend on shared state. (I guess we could make the absolute
> bulk ops not respect the limit, but that seems inconsistent.)
>
> OK, let's adopt an approach similar to what was described by Peter
> Levart a couple of messages upthread, where a) there is an
> initialization step in which various things, including the limit, are
> set properly; b) the buffer is published to the worker threads
> properly, e.g., using a lock or other suitable memory operation; and
> c) all worker threads agree to use only absolute operations and to
> avoid relative operations.
>
> Now suppose the threads have completed their work and you want to,
> say, write the buffer's contents to a channel. You have to carefully
> make sure the threads are all finished, properly publish their results
> back to some central thread, have that central thread receive the
> results, and set the position and limit, after which the central
> thread can initiate the I/O operation.
>
> This can certainly be made to work.
>
> But note what we just did.
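Steps (a) through (c) of that scheme might be sketched as follows, using only the absolute single-byte put(int, byte) that ByteBuffer already has (the bulk absolute overloads remain hypothetical); here an executor's submit/termination synchronization stands in for the "suitable memory operation" of step (b), and all sizes and names are invented:

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AbsoluteOpsScheme {
    public static void main(String[] args) throws InterruptedException {
        final int threads = 4;
        final int chunk = 16;

        // (a) Initialization: the buffer (and its limit) is set up once,
        // before being handed to any worker.
        ByteBuffer buf = ByteBuffer.allocate(threads * chunk);

        // (b) Safe publication: submitting tasks to an executor
        // establishes a happens-before edge, so workers see the
        // initialized state.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            final int base = i * chunk;
            final byte fill = (byte) (i + 1);
            pool.submit(() -> {
                // (c) Workers use absolute operations only; mark and
                // position are never read or written here.
                for (int off = 0; off < chunk; off++) {
                    buf.put(base + off, fill);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        // Back on the central thread: set position/limit, after which the
        // buffer could be handed to a channel for the I/O operation.
        buf.position(0);
        buf.limit(threads * chunk);
        System.out.println(buf.get(0) + " " + buf.get(threads * chunk - 1));
    }
}
```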
> We now have an API where:
>
> - there are different "phases", where in one phase all the methods
>   work, but in another phase only certain methods work (otherwise it
>   breaks silently);
>
> - you have to carefully control all the code to ensure that the wrong
>   methods aren't called when the buffer is in the wrong phase
>   (otherwise it breaks silently); and
>
> - you can't hand off the buffer to a library (3rd party or JDK)
>   without carefully orchestrating a transition into the right phase
>   (otherwise it breaks silently).
>
> Frankly, this is pretty crappy. It's certainly possible to work around
> it. People do, and it is painful, and they complain about it up and
> down all day long (and rightfully so).
>
> Note that this discussion is based primarily on looking at the
> ByteBuffer API. I have not done an extensive investigation of the
> impact of the various buffer views (IntBuffer, LongBuffer, etc.), nor
> have I looked thoroughly at the implementations. I have no doubt that
> we will run into additional issues when we do those investigations.
>
> If we were designing an API to support multithreaded access to memory
> regions, it would almost certainly look nothing like the buffer API.
> This is what Alan means by "buffers might not be the right answer." As
> things stand, it appears quite difficult to me to fix the multithreaded
> access problem without turning buffers into something they aren't, or
> fragmenting the API in some complex and uncomfortable way.
>
> Finally, note that this is not an argument against adding bulk
> absolute operations! I think we should probably go ahead and do that
> anyway. But let's not fool ourselves into thinking that bulk absolute
> operations solve the multithreaded buffer access problem.
>
> s'marks
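To make the "otherwise it breaks silently" failure mode above concrete: if any code uses a relative operation while the buffer is being used in its "absolute" phase, a later position/limit-derived operation quietly computes the wrong bounds. A minimal sketch (class name and values invented):

```java
import java.nio.ByteBuffer;

public class SilentBreakage {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);

        // "Absolute phase": workers write via absolute puts only...
        buf.put(0, (byte) 1);
        buf.put(7, (byte) 2);

        // ...but some code, unaware of the protocol, uses a relative put,
        // silently advancing the position from 0 to 1 (and clobbering
        // index 0).
        buf.put((byte) 9);

        // The central thread now flips for a channel write. flip() sets
        // limit = position, so only 1 of the 8 bytes would be written.
        buf.flip();
        System.out.println(buf.limit());  // prints 1, not 8
    }
}
```

Nothing throws along the way; the data loss only shows up downstream, which is exactly the point being made.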