Hello,

Kinkie has finished another round of his String NG project. The code is available at https://code.launchpad.net/~kinkie/squid/stringng

During code review and subsequent IRC discussion archived at http://wiki.squid-cache.org/MeetUps/IrcMeetup-2009-01-17 it became apparent that the current design makes all participating developers unhappy (for different reasons).

We have to revisit the discussion we had in the beginning of this project[1] and put this issue to rest, at last.

[1] http://thread.gmane.org/gmane.comp.web.squid.devel/8188

There were not enough developers on IRC to come to a consensus regarding the best direction, but it was clear that the current design is the worst one considered because it tries to mix at least two incompatible designs together.

This email summarizes a few design options we can choose from (none of them matches the current code, for the above-mentioned reasons). Please voice your opinion: which design would be best for Squid 3.2 and the foreseeable future?


* Universal Buffer:

    Blob = low-level raw chunk of RAM invisible to general code
        allocates, holds, frees raw RAM buffer
        can grow the buffer and write to the buffer
        the memory allocation strategy can change w/o affecting others
        does not have a notion of "content", just allocated RAM

    Buffer = all-purpose user-level buffer
        allows users to safely share a Blob instance via COW
        search, compare, consume, append, truncate, import, export, etc.
        has (offset, length) to maintain an area of Blob used by this Buffer

This design is very similar to std::string. The code gets a "universal buffer" that can do "everything". This is probably the simplest design possible.

The primary drawback here is that it would be difficult and messy to optimize for different buffering needs in a single Buffer class. For example, I/O buffers usually need to track appended/consumed size and want to optimize (or eliminate) copying when it is time to do the next I/O while some strings are still pointing to the old buffer content. Adding that tracking logic and those optimizations to a generic Buffer would be "wrong" because it would pollute Buffers used "like strings".

Similarly, general strings may want to keep encoding information or perform heavy search optimizations. Adding those to a generic Buffer would be "wrong" because it would pollute "I/O buffers" code.

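To make the Universal Buffer layering a bit more concrete, here is a rough C++ sketch. It is not taken from the stringng branch: every class and method name below (Blob::mem, Buffer::consume, Buffer::sharesBlobWith, and so on) is a hypothetical placeholder, and the COW policy shown (copy whenever the Blob is shared or too small) is just the simplest one that works.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <utility>

// Hypothetical names throughout; this is not the stringng branch code.

// Blob: raw chunk of RAM with no notion of "content".
class Blob {
public:
    explicit Blob(std::size_t capacity)
        : mem_(new char[capacity]), capacity_(capacity) {}
    char *mem() { return mem_.get(); }
    const char *mem() const { return mem_.get(); }
    std::size_t capacity() const { return capacity_; }
private:
    std::unique_ptr<char[]> mem_;
    std::size_t capacity_;
};

// Buffer: all-purpose user-level buffer. Copies share the Blob; each copy
// keeps its own (offset, length) area and copies the data before writing.
class Buffer {
public:
    Buffer() = default;
    explicit Buffer(const std::string &s) { append(s.data(), s.size()); }

    std::size_t length() const { return length_; }

    std::string toStdString() const {
        return blob_ ? std::string(blob_->mem() + offset_, length_)
                     : std::string();
    }

    // Drop n bytes from the front: only the area moves, nothing is copied.
    void consume(std::size_t n) {
        assert(n <= length_);
        offset_ += n;
        length_ -= n;
    }

    // Keep only the first n bytes of the area.
    void truncate(std::size_t n) {
        assert(n <= length_);
        length_ = n;
    }

    void append(const char *data, std::size_t n) {
        if (n == 0)
            return;
        const bool shared = blob_ && blob_.use_count() > 1;
        const bool fits = blob_ && offset_ + length_ + n <= blob_->capacity();
        if (!blob_ || shared || !fits) {
            // Copy-on-write (or grow): move our area into a private Blob.
            auto fresh = std::make_shared<Blob>(2 * (length_ + n) + 16);
            if (length_ > 0)
                std::memcpy(fresh->mem(), blob_->mem() + offset_, length_);
            std::memcpy(fresh->mem() + length_, data, n);
            blob_ = std::move(fresh);
            offset_ = 0;
        } else {
            std::memcpy(blob_->mem() + offset_ + length_, data, n);
        }
        length_ += n;
    }

    bool sharesBlobWith(const Buffer &other) const {
        return blob_ && blob_ == other.blob_;
    }

private:
    std::shared_ptr<Blob> blob_;
    std::size_t offset_ = 0;
    std::size_t length_ = 0;
};
```

Note how every user-visible operation lands in the one Buffer class: any I/O-specific or string-specific optimization would have to be added there too.
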
Another example is adding simple but efficient vector I/O support. With a single Buffer, it would be difficult to support vectors because that would clash with string-like usage needs.


* Divide and Conquer (D&C):

    Blob = low-level raw chunk of RAM invisible to general code
        same as Blob in the Universal Buffer approach

    Buffer = shareable Blob
        allows users to safely share a Blob instance via COW
        works with Blob as a whole: not areas (see note below)
        used as exchange interface between specialized buffers

    IoBuffer = buffer optimized for I/O needs
        perhaps should be called IoStream
        uses Buffer
        has (appended, consumed) to track I/O progress and area
        exports available data as a Buffer instance
        may eventually support vector I/O by using multiple Buffers

    String = buffer optimized for content manipulation
        uses Buffer
        has (offset, length) to maintain a Buffer "content area"
        search, compare, replace, append, truncate, import, export, etc.
        may eventually store content encoding information

The killer idea here is that the interpretation of a piece of allocated and shareable RAM (i.e., Buffer) is left to classes that specialize in certain memory manipulations (e.g., I/O or string search). Optimizing or changing one class does not have to affect the others. More specialized classes can be added as needed. Buffer is used to share info between classes. Conversions are explicit and easier to track.

We could also add an Area class that makes it possible to store "content" offset and length when importing or exporting a Buffer.

(note) A possible variation of the same design would be to move area manipulation to Buffer. This would free String from "area code" but force IoBuffers and others to use the same area model instead of appended/consumed counters or whatever they need. This would probably make migration to vectored I/O more complex, but we can deal with it. If the D&C approach is chosen, we will decide where to put area manipulation: Buffer, String, or perhaps a separate Area class.

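For comparison, a rough C++ sketch of how the D&C classes could fit together. Again, this is not the stringng code and all names and signatures are hypothetical. To keep it short, std::vector stands in for Blob, Buffer is a plain shared pointer with COW omitted, IoBuffer::write() stands in for a real socket read, and the optional Area class is used to carry (offset, length) across the IoBuffer-to-String conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// All names here are illustrative; none of this is the stringng branch code.

// Blob: raw RAM with no notion of content (std::vector stands in for it).
using Blob = std::vector<char>;

// Buffer: a shareable Blob, treated as a whole. COW is omitted in this sketch.
using Buffer = std::shared_ptr<Blob>;

// Area: optional helper carrying a content (offset, length) across conversions.
struct Area {
    std::size_t offset;
    std::size_t length;
};

// String: content manipulation over its own area of a Buffer.
class String {
public:
    String(Buffer buf, Area area) : buf_(std::move(buf)), area_(area) {
        assert(buf_ && area_.offset + area_.length <= buf_->size());
    }
    std::size_t length() const { return area_.length; }

    // Zero-copy substring: same Buffer, narrower area.
    String substr(std::size_t pos, std::size_t n) const {
        assert(pos + n <= area_.length);
        return String(buf_, Area{area_.offset + pos, n});
    }
    std::string toStdString() const {
        return std::string(buf_->data() + area_.offset, area_.length);
    }
    const Buffer &buffer() const { return buf_; }

private:
    Buffer buf_;
    Area area_;
};

// IoBuffer: tracks I/O progress with (appended, consumed) counters.
class IoBuffer {
public:
    explicit IoBuffer(std::size_t capacity)
        : buf_(std::make_shared<Blob>(capacity)) {}

    // Room left for the next read.
    std::size_t space() const { return buf_->size() - appended_; }

    // Stand-in for a socket read: stores n bytes after the appended mark,
    // so content already handed out to Strings is never overwritten.
    void write(const char *data, std::size_t n) {
        assert(n <= space());
        if (n > 0)
            std::memcpy(buf_->data() + appended_, data, n);
        appended_ += n;
    }
    void consume(std::size_t n) {
        assert(consumed_ + n <= appended_);
        consumed_ += n;
    }

    // Explicit export of the unconsumed data: the Buffer plus its Area.
    Buffer buffer() const { return buf_; }
    Area available() const { return Area{consumed_, appended_ - consumed_}; }

private:
    Buffer buf_;
    std::size_t consumed_ = 0;
    std::size_t appended_ = 0;
};
```

A caller would read into the IoBuffer, export String(io.buffer(), io.available()), parse with substr(), and then call io.consume(). The String keeps pointing at the same Buffer while the IoBuffer continues with its own counters, which is the kind of separation D&C is after.
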
* Other

    There are probably other options.


I still think we should implement one good design, commit it, and work on converting the code to use it, rather than starting with massaging the old code to be easier to convert to something in the future. If you would like to discuss the choice between those two strategies, please start your own thread :-)!

So far, my _personal_ interpretation of votes based on the recent IRC discussions and that earlier squid-dev thread[1] is:

    Universal Buffer: Kinkie, Amos
    Divide and Conquer: Adrian, Henrik, Alex

Do you prefer a Universal Buffer or a Divide and Conquer design?

Thank you,

Alex.

P.S. I am focusing on the overall design and ignoring all the secondary bugs present in the current stringng lp branch. I have sent a partial list to Kinkie, but it may not make sense to work on those bugs until the above issue is resolved, at last.