While we're tweaking std.string:

When writing string libs or types (like Text recently), I implement 3 string-splitting methods. This may or may not be useful for D's string module.

The core point is: what to do with empty parts? They may be generated when:
* the separator is present at either end of the source string
* successive separators occur in the source string
Thus,
    split("--abc----def----", "--")
basically returns
    ["","abc","","def","",""]
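To see the empty parts concretely, here is a minimal check. It assumes a current Phobos, where std.array.split accepts a string separator, and it uses a sample with exactly four dashes between abc and def so that every "--" lines up:

```d
import std.array : split;
import std.stdio : writeln;

void main()
{
    // Separator at both ends, and twice in a row in the middle.
    auto parts = "--abc----def----".split("--");

    // One empty part at the start, one between the doubled
    // separators, and two at the end.
    writeln(parts.length);
    writeln(parts);
}
```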

This may or may not be what we expect. But why? I ended up concluding that there are 2 distinct use cases where we need to split a string:
1. it is like a record (fields)
2. it is like a list (elements)

In the first case, we want to keep empty fields so that each field has a constant index; besides, an empty field is sometimes meaningful in itself. For instance, in name--phone--email, when phone is absent, we still want email as the third field. In the case of a list, on the other hand, empty elements are most commonly irrelevant, and are often only there because the grammar (not always a formal one) is flexible. For instance, lists of words / numbers / tokens; or more simply lines: we will rarely keep blank ones for further processing.

This leads to 2 different string splitting funcs, e.g.
    string[] listElements (string sep)
    string[] recordFields (string sep)
(names open to discussion ;-)
recordFields is symmetric to join. listElements may simply filter the results of recordFields, or instead drop empty elements on the fly.
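A rough sketch of the two funcs, just to make the difference tangible. They are written as free functions taking the source string as first argument, which is my assumption, not existing Phobos API; the sample record is made up. listElements takes the simple route of filtering the record-style result:

```d
import std.algorithm.iteration : filter;
import std.array : array, join, split;

// Hypothetical names from this proposal, not part of Phobos.

/// Record-style split: keeps every empty field, so indices stay stable.
string[] recordFields(string source, string sep)
{
    return source.split(sep);
}

/// List-style split: same parts, minus the empty ones.
string[] listElements(string source, string sep)
{
    return source.split(sep).filter!(part => part.length > 0).array;
}

void main()
{
    // name--phone--email with the phone missing (made-up data).
    auto record = "Ann----ann@example.org";
    auto fields = recordFields(record, "--");
    assert(fields == ["Ann", "", "ann@example.org"]); // email is still third
    assert(fields.join("--") == record);              // symmetric to join

    assert(listElements("--abc----def----", "--") == ["abc", "def"]);
}
```

Dropping empties on the fly would avoid the intermediate array, at the price of a hand-written loop.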

Finally, there is a third, different, use case, which may well be the most common one, and requires yet another func:
    string[] split (string whitespace=" \t\n")
which splits on any whitespace. Usually, the expected behaviour is that any combination or repetition of whitespace chars is considered a single separator; but whitespace at the start or end still generates an empty part.
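A possible hand-written sketch of that behaviour, as I read the description above. The name splitWhitespace is hypothetical (chosen to avoid clashing with std.array.split), and the loop works byte by byte, so it assumes the whitespace set is plain ASCII:

```d
import std.string : indexOf;

/// Hypothetical sketch: any run of chars from `whitespace` counts as a
/// single separator; a run at the very start or end still yields an
/// empty part.
string[] splitWhitespace(string source, string whitespace = " \t\n")
{
    string[] parts;
    size_t start = 0; // where the part being collected begins
    size_t i = 0;
    while (i < source.length)
    {
        if (whitespace.indexOf(source[i]) >= 0)
        {
            parts ~= source[start .. i];
            // Skip the rest of this whitespace run.
            while (i < source.length && whitespace.indexOf(source[i]) >= 0)
                ++i;
            start = i;
        }
        else
        {
            ++i;
        }
    }
    parts ~= source[start .. $]; // empty if source ended in whitespace
    return parts;
}

void main()
{
    assert(splitWhitespace("foo \t bar") == ["foo", "bar"]);
    assert(splitWhitespace("  foo \t bar\n") == ["", "foo", "bar", ""]);
}
```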

Makes sense?

Denis
_________________
vita es estrany
spir.wikidot.com
