It's easy to say this, but please don't get sarcastic!
I'm obviously (as I've learned) speaking about UTF-8 characters as
they are NOT implemented right now in D; so I'm criticizing that
D, as a language which emphasizes UTF-8 characters, isn't
taking the last step, as e.g. Python does.
Why does string.length return the number of bytes and not the
number of UTF-8 characters, whereas wstring.length and
dstring.length return the number of UTF-16 and UTF-32
characters?
Wouldn't it be more consistent to have string.length return the
number of UTF-8 characters as well (instead of the number of bytes)?
This is simply wrong. All string types return the number of code
units. It's only in UTF-32 that a code point (~ character) happens
to fit into one code unit.
I do not agree:
writeln("säд".length);         = 5: 1 + 2 [C3 A4] + 2 [D0 B4] (UTF-8 code units)
writeln(std.utf.count("säд")); = 3
Ok, if my understanding is wrong, how do YOU measure the length of
a string?
Do you always use count(), or is there an alternative?
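For what it's worth, a minimal sketch of the options as I understand them (std.utf.count and std.range.walkLength are both in Phobos; the exact counts assume the string "säд" from above):

```d
import std.stdio;
import std.utf : count;
import std.range : walkLength;

void main()
{
    string s = "säд";
    writeln(s.length);     // 5: code units (bytes) of the UTF-8 encoding
    writeln(s.count);      // 3: code points, decoded by std.utf
    writeln(s.walkLength); // 3: narrow strings iterate as ranges of dchar
}
```

walkLength works here because narrow strings, when used as ranges, decode to dchar elements rather than exposing their raw code units.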
Ok, I understand that length is, obviously, used in analogy
to any array's length value.
Still, this seems to be inconsistent. D puts emphasis on
implementing chars as UTF-8, which means that a char in D can
be anywhere from 1 to 4 bytes long for an arbitrary Unicode
code point. Shouldn't .length take that into account?
This will _not_ return a trailing surrogate of a Cyrillic
letter. It will return the second code unit of the ä
character (U+00E4).
True. It's UTF-8, not UTF-16.
However, it could also yield the first code unit of the umlaut
diacritic, depending on how the string is represented.
This is not