On tis, 2008-08-26 at 09:35 +0200, Kinkie wrote:
> In my opinion there is not that much of a difference between Strings
> and Buffers, and the latter could use the services of the former to
> delegate the issues of memory management, while concentrating on
> different aspects - joining, chaining, vector I/O come to mind.

Agreed. And as I gave this considerable thought some years ago (including a simple implementation), I'll give my view of "strings" and buffers.

The classes involved should be:

MemoryBlob, a chunk of memory, reference counted, with a high water mark on current use. Exists in various sizes depending on the use (creator defined).

MemoryRegion, references a region of a MemoryBlob by keeping a reference to the MemoryBlob and the location of the region within that blob.

String, a subclass of MemoryRegion adding string semantics where needed.

MemoryRegions (and Strings) can be created from a MemoryBlob in append-like fashion only, where each new MemoryRegion immediately follows the previous one. In addition there is low level access to the current tail of the raw buffer and the amount of free space, only to be used by the owner who populates the MemoryBlob (e.g. an I/O read function) before it's known how much data actually gets placed there.

Strings can be created from a MemoryRegion by specifying a subrange of that region, or by certain String operations that split a string into components (e.g. a parser of some kind splitting the data into tokens).

Maybe MemoryRegion and String should be one and the same, but the implementation is probably easier to follow if String is a subclass with the string methods, and in the using code it also makes some sense to differentiate the two, making it more visible what kind of data is being processed. But the internal data of the two is exactly the same (reference to a MemoryBlob, pointer to the data within the blob, length).

The big and complex architectural question regarding strings is whether \0 termination of String should be kept. Memory management and casting get a bit easier if \0 is not used, but string operations and debugging get a little bit easier with the \0 (but also less secure if there is a risk of \0 in the string data).
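
To make the model above a bit more concrete, here is a minimal C++ sketch of the three classes. It is only an illustration under assumptions, not the earlier implementation mentioned above: std::shared_ptr stands in for the reference count, and the member names (tail, freeSpace, claim, substr, toStdString) are hypothetical. It takes the "no \0" route, so a sub-String shares the blob without copying and a copy is made only when a terminated string is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <utility>

// A chunk of memory with a high water mark on current use.
// Reference counting is delegated to std::shared_ptr (an assumption).
struct MemoryBlob {
    explicit MemoryBlob(std::size_t cap)
        : data(new char[cap]), capacity(cap), used(0) {}

    // Low level access, meant only for the owner that fills the blob
    // (e.g. an I/O read function) before the amount of data is known.
    char *tail() { return data.get() + used; }
    std::size_t freeSpace() const { return capacity - used; }

    std::unique_ptr<char[]> data;
    std::size_t capacity;
    std::size_t used;  // high water mark
};

// Reference to a blob, pointer into it, and length.
// Small enough to be passed around by value.
class MemoryRegion {
public:
    MemoryRegion(std::shared_ptr<MemoryBlob> b, const char *p, std::size_t n)
        : blob_(std::move(b)), ptr_(p), len_(n) {}

    // Append-only creation: the new region covers the next n bytes at the
    // tail of the blob, immediately following the previous region.
    static MemoryRegion claim(const std::shared_ptr<MemoryBlob> &blob,
                              std::size_t n) {
        assert(n <= blob->freeSpace());
        const char *start = blob->tail();
        blob->used += n;
        return MemoryRegion(blob, start, n);
    }

    const char *data() const { return ptr_; }
    std::size_t size() const { return len_; }

protected:
    std::shared_ptr<MemoryBlob> blob_;
    const char *ptr_;
    std::size_t len_;
};

// Same internal data as MemoryRegion, plus string methods.
class String : public MemoryRegion {
public:
    explicit String(const MemoryRegion &r) : MemoryRegion(r) {}

    // Subrange of this string; shares the blob, copies nothing.
    String substr(std::size_t pos, std::size_t n) const {
        if (pos > len_) pos = len_;
        if (n > len_ - pos) n = len_ - pos;
        return String(MemoryRegion(blob_, ptr_ + pos, n));
    }

    // No \0 is stored in the blob, so a terminated copy is made on demand.
    std::string toStdString() const { return std::string(ptr_, len_); }
};
```

A reader would memcpy (or read) into blob->tail(), call MemoryRegion::claim() for the number of bytes actually received, and hand the resulting String to a parser that cuts tokens with substr() without touching the original data.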
I think we are at the point where we can fully drop the \0 without too much headache, but it's also true that in all cases where we tokenise a string there are separators we can nuke and replace with \0's. However, with the \0, casting between MemoryRegion and String is tricky (it needs to copy if there is no \0) and tokenising gets destructive, as it destroys the original string by replacing separators with \0.

The append operation on String/MemoryRegion objects is easy in this model, but if the region is not at the end of the MemoryBlob, or if the result gets too large, then it will need to trigger a copy to a new MemoryBlob of sufficient size.

A special case of the above is when the appended data already follows the first region linearly. It's then a simple merge operation of the two regions.

Other modifications of String/MemoryRegion content generally require a COW operation.

As you already noted, MemoryRegion is sufficiently small to be passed around by value, just as if it were a plain pointer.

Another question to ask is whether there is a need for a vectorized String built of many non-linear segments. But my gut feeling is the same as yours, that this should be a separate class. The main use is in writev-style operations on composed data. Users needing string-like operations (other than append) on such vectorized data are probably best served by linearising the data first.

Regards
Henrik
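
For illustration, the three append cases described above (merging two regions that are already adjacent, growing in place at the tail of the blob, and copying to a new blob) could be sketched roughly as follows. This is a simplified sketch under assumptions, not an actual implementation: MemoryRegion is a plain struct here, std::shared_ptr stands in for the reference count, and the names claim, append and AppendKind as well as the "twice the needed size" growth policy are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Chunk of memory with a high water mark (used).
struct MemoryBlob {
    explicit MemoryBlob(std::size_t cap)
        : data(new char[cap]), capacity(cap), used(0) {}
    char *tail() { return data.get() + used; }
    std::size_t freeSpace() const { return capacity - used; }

    std::unique_ptr<char[]> data;
    std::size_t capacity;
    std::size_t used;
};

// Reference to a blob, pointer into it, and length.
struct MemoryRegion {
    std::shared_ptr<MemoryBlob> blob;
    char *ptr;
    std::size_t len;
};

// Helper for the example: copy s to the tail of the blob and return the
// region covering it (regions are only ever created at the tail).
MemoryRegion claim(const std::shared_ptr<MemoryBlob> &blob, const char *s) {
    const std::size_t n = std::strlen(s);
    assert(n <= blob->freeSpace());
    char *start = blob->tail();
    std::memcpy(start, s, n);
    blob->used += n;
    return MemoryRegion{blob, start, n};
}

enum class AppendKind { Merged, InPlace, Copied };

// Append src to dst and report which path was taken.
AppendKind append(MemoryRegion &dst, const MemoryRegion &src) {
    // Special case: src already follows dst linearly in the same blob,
    // so the two regions are simply merged.
    if (dst.blob == src.blob && dst.ptr + dst.len == src.ptr) {
        dst.len += src.len;
        return AppendKind::Merged;
    }
    // Easy case: dst ends at the high water mark and the blob has room,
    // so the new data is written to the unused tail.
    if (dst.ptr + dst.len == dst.blob->tail() &&
        src.len <= dst.blob->freeSpace()) {
        std::memcpy(dst.blob->tail(), src.ptr, src.len);
        dst.blob->used += src.len;
        dst.len += src.len;
        return AppendKind::InPlace;
    }
    // Otherwise: copy both parts to a new blob of sufficient size.
    const std::size_t total = dst.len + src.len;
    auto fresh = std::make_shared<MemoryBlob>(2 * total);  // assumed policy
    std::memcpy(fresh->tail(), dst.ptr, dst.len);
    std::memcpy(fresh->tail() + dst.len, src.ptr, src.len);
    fresh->used = total;
    dst = MemoryRegion{fresh, fresh->data.get(), total};
    return AppendKind::Copied;
}
```

Only the last path copies existing data; the other regions referencing the old blob are left untouched in all three cases, which is why plain append does not need the COW step that other modifications do.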
