string splitting funcs

spir Sat, 22 Jan 2011 11:53:38 -0800

While we're at it, tweaking std.string:

When writing string libs or types (like Text recently), I implement 3 string splitting methods. This may (or may not) be useful for D's string module.


The core point is: what to do with empty parts? They may be generated when:
* the separator is present at either end of the source string
* successive separators occur in the source string
Thus,
    split("--abc----def----", "--")
basically returns
    ["","abc","","def","",""]

This may or may not be what we expect. But why? I ended up considering that there are 2 distinct use cases where we need to split a string:

1. it is like a record (fields)
2. it is like a list (elements)

In the first case, we want to keep empty fields so that each field has a constant index, and sometimes empty fields are meaningful. For instance, in name--phone--email, when phone is absent, we still want email as the third field.

In the case of a list instead, empty elements are most commonly irrelevant, and are actually often due to the flexibility of the grammar (not always formal). For instance, lists of words / numbers / tokens; or more simply lines: we will rarely keep blank ones for further processing.


This leads to 2 different string splitting funcs, e.g.
    string[] listElements (string sep)
    string[] recordFields (string sep)
(names discussable ;-)

recordFields is symmetric to join: joining its result with the same separator gives back the source string. listElements may simply filter the results of recordFields, or instead drop empty elements on the fly.
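
To make the two behaviours concrete, here is a minimal sketch in D. It is only an illustration, not a proposal for the actual std.string code: the functions are written as free functions taking an extra `source` parameter (the signatures above are methods on a string type), and the loop is hand-rolled with `indexOf` so that it does not depend on how Phobos' own `split` treats empty parts.

```d
import std.algorithm : filter;
import std.array : array;
import std.string : indexOf;

/// Record-like split: every part is kept, including empty ones,
/// so each field keeps a constant index.
/// (`source` is an assumed extra parameter; see the note above.)
string[] recordFields(string source, string sep)
{
    assert(sep.length > 0, "separator must not be empty");
    string[] fields;
    size_t start = 0; // index where the current field begins
    while (true)
    {
        // indexOf returns -1 when `sep` does not occur in the rest of the text.
        auto pos = source[start .. $].indexOf(sep);
        if (pos < 0)
            break;
        fields ~= source[start .. start + pos];
        start += pos + sep.length; // continue right after the separator
    }
    fields ~= source[start .. $]; // last field, possibly empty
    return fields;
}

/// List-like split: same as above, but empty parts are filtered out.
string[] listElements(string source, string sep)
{
    return recordFields(source, sep).filter!(part => part.length > 0).array;
}
```

With these, recordFields("Ann----ann@example.org", "--") keeps the empty phone field, so the email stays at index 2, while listElements on the same input returns only the two non-empty parts. Filtering after the fact is the simplest variant; dropping empty parts inside the loop would avoid building the intermediate array.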

Finally, there is a third, different, use case, which may well be the most common one, and requires yet another func:

    string[] split (string whitespace=" \t\n")

which indeed splits on any whitespace. Usually, the expected behaviour is that any combination or repetition of ws chars is considered a single separator; but ws at start/end still generates an empty part.
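
The whitespace behaviour described above could be sketched as follows. Again this is only an illustration under assumptions: the name `splitOnWhitespace` and the extra `source` parameter are mine, and the whitespace set is treated as single-byte (ASCII) characters.

```d
import std.string : indexOf;

/// Splits `source` on runs of whitespace characters.
/// Any combination or repetition of chars from `whitespace` counts as one
/// separator; whitespace at the start or end yields an empty part.
/// Assumption: `whitespace` holds ASCII chars only, compared byte by byte.
string[] splitOnWhitespace(string source, string whitespace = " \t\n")
{
    string[] parts;
    size_t start = 0; // index where the current part begins
    size_t i = 0;
    while (i < source.length)
    {
        if (whitespace.indexOf(source[i]) >= 0)
        {
            parts ~= source[start .. i];
            // Skip the whole run so that it acts as a single separator.
            while (i < source.length && whitespace.indexOf(source[i]) >= 0)
                ++i;
            start = i;
        }
        else
        {
            ++i;
        }
    }
    parts ~= source[start .. $]; // last part, empty if the text ends with ws
    return parts;
}
```

For example, splitOnWhitespace(" a \t b\n") yields an empty part, then "a" and "b", then another empty part; a caller who does not want the empty parts at the ends can strip the string first.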


Makes sense?

Denis
_________________
vita es estrany
spir.wikidot.com
