Thanks, excellent comments. While it is clear that some string models have more complicated structures (with their own pros and cons), my focus was on simple internal structures. The focus was also on immutable strings; the tradeoffs for mutable ones can be quite different, and the paper needs to make that clearer. I'll add some material about those two areas (with pointers to sources where possible).

Mark

On Sat, Sep 8, 2018 at 9:20 PM John Cowan <[email protected]> wrote:

> This paper makes the default assumption that the internal storage of a
> string is a featureless array. If this assumption is abandoned, it is
> possible to get O(1) indexes with fairly low space overhead. The Scheme
> language has recently adopted immutable strings called "texts" as a
> supplement to its pre-existing mutable strings, and the sample
> implementation for this feature uses a vector of either native strings or
> bytevectors (char[] vectors in C/Java terms). I would urge anyone
> interested in the question of storing and accessing mutable strings to read
> the following parts of SRFI 135 at
> <https://srfi.schemers.org/srfi-135/srfi-135.html>: Abstract, Rationale,
> Specification / Basic concepts, and Implementation. In addition, the
> design notes at <https://github.com/larcenists/larceny/wiki/ImmutableTexts>,
> though not up to date (in particular, UTF-16 internals are now allowed as
> an alternative to UTF-8), are of interest: unfortunately, the link to the
> span API has rotted.
>
> On Sat, Sep 8, 2018 at 12:53 PM Mark Davis ☕️ via Unicore
> <[email protected]> wrote:
>
>> I recently did some extensive revisions of a paper on Unicode string
>> models (APIs). Comments are welcome.
>>
>> https://docs.google.com/document/d/1wuzzMOvKOJw93SWZAqoim1VUl9mloUxE0W6Ki_G23tw/edit#
>>
>> Mark

