Buffer/String split, take2

Alex Rousskov Tue, 20 Jan 2009 13:24:42 -0800

Hello,

    Kinkie has finished another round of his String NG project. The code
is available at https://code.launchpad.net/~kinkie/squid/stringng
During code review and subsequent IRC discussion archived at
http://wiki.squid-cache.org/MeetUps/IrcMeetup-2009-01-17 it became
apparent that the current design makes all participating developers
unhappy (for different reasons).


We have to revisit the discussion we had in the beginning of this
project[1] and put this issue to rest, at last.
[1] http://thread.gmane.org/gmane.comp.web.squid.devel/8188


There was not enough developers on the IRC to come to a consensus
regarding the best direction, but it was clear that the current design
is the worst one considered as it tries to mix at least two incompatible
designs together.

This email summarizes a few design options we can chose from (none of
them matches the current code for the above mentioned reasons). 

Please voice your opinion: which design would be best for Squid 3.2 and
the foreseeable future.


* Universal Buffer:

Blob = low-level raw chunk of RAM invisible to general code
    allocates, holds, frees raw RAM buffer
    can grow the buffer and write to the buffer
    the memory allocation strategy can change w/o affecting others
    does not have a notion of "content", just allocated RAM

Buffer = all-purpose user-level buffer
    allows users to safely share a Blob instance via COW
    search, compare, consume, append, truncate, import, export, etc.
    has (offset, length) to maintain an area of Blob used by this Buffer

This design is very similar to std::string. The code gets a "universal
buffer" that can do "everything". This is probably the simplest design
possible.

The primary drawback here is that it would be difficult and messy to
optimize different buffering needs in a single Buffer class. 

For example, I/O buffers usually need to track appended/consumed size
and want to optimize (or eliminate) coping when it is time to do the
next I/O while some strings are pointing to the old buffer content.
Adding that tracking logic and optimizations to generic Buffer would be
"wrong" because it will pollute Buffers used "like strings".

Similarly, general strings may want to keep encoding information or
perform heavy search optimizations. Adding those to generic Buffer would
be "wrong" because it will pollute "I/O buffers" code.

Another example is adding simple but efficient vector I/O support. With
a single Buffer, it would be difficult to support vectors because it
will clash with string-like usage needs.



* Divide and Conquer (D&C):

Blob = low-level raw chunk of RAM invisible to general code
    same as Blob in the Universal Buffer approach

Buffer = shareable Blob
    allows users to safely share a Blob instance via COW
    works with Blob as a whole: not areas (see note below)
    used as exchange interface between specialized buffers

IoBuffer = buffer optimized for I/O needs
    perhaps should be called IoStream
    uses Buffer
    has (appended, consumed) to track I/O progress and area
    exports available data as a Buffer instance
    may eventually support vector I/O by using multiple Buffers

String = buffer optimized for content manipulation
    uses Buffer
    has (offset, length) to maintain a Buffer "content area"
    search, compare, replace, append, truncate, import, export, etc.
    may eventually store content encoding information

The killer idea here is that the interpretation of a piece of allocated
and shareable RAM (i.e, Buffer) is left to classes that specialize in
certain memory manipulations (e.g., I/O or string search). Optimizing or
changing one class does not have to affect the other. 

More specialized classes can be added as needed. Buffer is used to share
info between classes. Conversions are explicit and easier to track. We
could also add an Area class that makes it possible to store "content"
offset and length when importing or exporting a Buffer.

(note) A possible variation of the same design would be to move area
manipulation to Buffer. This will free String from "area code" but force
IoBuffers and others to use the same area model instead of
appended/consumed counters or whatever they need. This will probably
make migration to vectored I/O more complex, but we can deal with it. If
D&C approach is chosen, we will decide where to put area manipulation:
Buffer, String, or perhaps a separate Area class.


* Other

There are probably other options.


I still think we should implement one good design, commit it, and work
on converting the code to use it rather than starting with massaging the
old code to be easier to convert to something in the future. If you
would like to discuss the choice between those two strategies, please
start your own thread :-)!


So far, my _personal_ interpretation of votes based on the recent IRC
discussions and that earlier squid-dev thread[1] is:

  Universal String: Kinkie, Amos
  Divide and Conquer: Adrian, Henrik, Alex

Do you prefer a Universal Buffer or a Divide and Conquer design?


Thank you,

Alex.
P.S. I am focusing on the overall design and ignoring all the secondary
bugs present in the current stringng lp branch. I have sent a partial
list to Kinkie, but it may not make sense to work on those bugs until
the above issue is resolved, at last.

Buffer/String split, take2

Reply via email to