On Tue, Jan 20, 2009 at 10:24 PM, Alex Rousskov <rouss...@measurement-factory.com> wrote: > Hello, > > Kinkie has finished another round of his String NG project. The code > is available at https://code.launchpad.net/~kinkie/squid/stringng > During code review and subsequent IRC discussion archived at > http://wiki.squid-cache.org/MeetUps/IrcMeetup-2009-01-17 it became > apparent that the current design makes all participating developers > unhappy (for different reasons). > > We have to revisit the discussion we had in the beginning of this > project[1] and put this issue to rest, at last. > [1] http://thread.gmane.org/gmane.comp.web.squid.devel/8188 > > > There was not enough developers on the IRC to come to a consensus > regarding the best direction, but it was clear that the current design > is the worst one considered as it tries to mix at least two incompatible > designs together. > > This email summarizes a few design options we can chose from (none of > them matches the current code for the above mentioned reasons). > > Please voice your opinion: which design would be best for Squid 3.2 and > the foreseeable future. > > > * Universal Buffer: > > Blob = low-level raw chunk of RAM invisible to general code > allocates, holds, frees raw RAM buffer > can grow the buffer and write to the buffer > the memory allocation strategy can change w/o affecting others > does not have a notion of "content", just allocated RAM > > Buffer = all-purpose user-level buffer > allows users to safely share a Blob instance via COW > search, compare, consume, append, truncate, import, export, etc. > has (offset, length) to maintain an area of Blob used by this Buffer > > This design is very similar to std::string. The code gets a "universal > buffer" that can do "everything". This is probably the simplest design > possible. > > The primary drawback here is that it would be difficult and messy to > optimize different buffering needs in a single Buffer class. > > For example, I/O buffers usually need to track appended/consumed size > and want to optimize (or eliminate) coping when it is time to do the > next I/O while some strings are pointing to the old buffer content. > Adding that tracking logic and optimizations to generic Buffer would be > "wrong" because it will pollute Buffers used "like strings". > > Similarly, general strings may want to keep encoding information or > perform heavy search optimizations. Adding those to generic Buffer would > be "wrong" because it will pollute "I/O buffers" code. > > Another example is adding simple but efficient vector I/O support. With > a single Buffer, it would be difficult to support vectors because it > will clash with string-like usage needs. > > > > * Divide and Conquer (D&C): > > Blob = low-level raw chunk of RAM invisible to general code > same as Blob in the Universal Buffer approach > > Buffer = shareable Blob > allows users to safely share a Blob instance via COW > works with Blob as a whole: not areas (see note below) > used as exchange interface between specialized buffers > > IoBuffer = buffer optimized for I/O needs > perhaps should be called IoStream > uses Buffer > has (appended, consumed) to track I/O progress and area > exports available data as a Buffer instance > may eventually support vector I/O by using multiple Buffers > > String = buffer optimized for content manipulation > uses Buffer > has (offset, length) to maintain a Buffer "content area" > search, compare, replace, append, truncate, import, export, etc. > may eventually store content encoding information > > The killer idea here is that the interpretation of a piece of allocated > and shareable RAM (i.e, Buffer) is left to classes that specialize in > certain memory manipulations (e.g., I/O or string search). Optimizing or > changing one class does not have to affect the other. > > More specialized classes can be added as needed. Buffer is used to share > info between classes. Conversions are explicit and easier to track. We > could also add an Area class that makes it possible to store "content" > offset and length when importing or exporting a Buffer. > > (note) A possible variation of the same design would be to move area > manipulation to Buffer. This will free String from "area code" but force > IoBuffers and others to use the same area model instead of > appended/consumed counters or whatever they need. This will probably > make migration to vectored I/O more complex, but we can deal with it. If > D&C approach is chosen, we will decide where to put area manipulation: > Buffer, String, or perhaps a separate Area class. > > > * Other > > There are probably other options. > > > I still think we should implement one good design, commit it, and work > on converting the code to use it rather than starting with massaging the > old code to be easier to convert to something in the future. If you > would like to discuss the choice between those two strategies, please > start your own thread :-)! > > > So far, my _personal_ interpretation of votes based on the recent IRC > discussions and that earlier squid-dev thread[1] is: > > Universal String: Kinkie, Amos > Divide and Conquer: Adrian, Henrik, Alex > > Do you prefer a Universal Buffer or a Divide and Conquer design? > > > Thank you, > > Alex. > P.S. I am focusing on the overall design and ignoring all the secondary > bugs present in the current stringng lp branch. I have sent a partial > list to Kinkie, but it may not make sense to work on those bugs until > the above issue is resolved, at last.
What I fear from the D&C approach is that we'll end up with lots of duplicate code between the 'buffer' classes, to gain a tiny little bit of efficiency and semantic clarity. If that approach has to be taken, then I'd rather take the variant of the note - in fact that's quite in line with what the current (agreeably ugly) code does. In my opinion the 'universal buffer' model can be adapted quite easily to address different uses by extending its allocation strategy - it's a self-contained function of code exactly for this purpose, and it could be extended again by using Strategy patterns to do whatever the caller wishes. It would be trivial for instance for users to request that the underlying memory be allocated by the pageful, or to request preallocation of a certain amount of memory if they know they'll be using, etc. Having a wide interface is a drawback of the Universal approach, Regarding vector i/o, it's almost a no-brainer at a first glance: given UniversalBuffer, implement UniversalBufferList and make MemBuf use the latter to implement producer-consumer semantics. Then use this for writev(). produce and consume become then extremely lightweight calls. Let me remind you that currently MemBuf happily memmoves contents at each consume, and other producer-consumer classes I could find (BodyPipe and StoreEntry) are entirely different beasts, which would benefit from having their interfaces changed to use UniversalBuffers, but probably not their innards. Regarding Adrian's proposal, he and I discussed the issue extensively. I don't agree with him that the current String will give us the best long-term benefits. My expectation is (but we can only know after we have at least some extensive use of it) that the cheap substringing features of the current UniversalBuffer implementation will give us substantial benefits in the long term. I agree with him that fixing the most broken parts of the String interface is a sensible strategy for merging whatever String implementation we end up choosing. I fear that if we focus too much on the long-term, we may end up losing sight of the medium-term, and thus we risk reaching neither because short-term noone does anything. EVERYONE keeps on asserting that squid (2 and 3) has low-level issues to be fixed, yet at the same time only Adrian does something in squid-2, and I feel I'm the only one trying to do something in squid-3 - PLEASE correct me and prove me wrong. There's another issue which worries me: the current implementation has been in the works for 5 months; there have been two extensive reviews, two half-rewrites and endless discussions. Now the issue crops up that the basic design - whose blueprint has also been available for 5 months in the wiki - is not good, and that we may end up having to basically start from scratch. How can we as a project prevent for the future such a waste of resources (in this case, mine) is in my very humble opinion an issue key to the very survival of squid as a project. As a side note, if we end up starting from scratch, it will have to be someone else doing it; I would be obviously biased towards the 'wrong' design, and I would prefer to concentrate on other things to be done which are characterized by a lesser risk of merge rejection. -- /kinkie