Re: Inconsitency

2013-10-14 Thread nickles
It's easy to state this, but - please - don't get sarcastic! I'm obviously (as I've learned) speaking about UTF-8 chars as they are NOT implemented right now in D; so I'm criticizing that D, as a language which emphasizes UTF-8 characters, isn't taking the last step, like e.g. Python

Inconsitency

2013-10-13 Thread nickles
Why does string.length return the number of bytes and not the number of UTF-8 characters, whereas wstring.length and dstring.length return the number of UTF-16 and UTF-32 characters? Wouldn't it be more consistent to have string.length return the number of UTF-8 characters as well (instead of

Re: Inconsitency

2013-10-13 Thread nickles
> This is simply wrong. All strings return number of codeunits. And it's only UTF-32 where codepoint (~ character) happens to fit into one codeunit.
I do not agree:
    writeln("säд".length);        => 5  chars: 5 (1 + 2 [C3A4] + 2 [D094], UTF-8)
    writeln(std.utf.count("säд")) => 3  chars: 5
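For reference, the two positions in this exchange can be checked directly. The following is a minimal sketch, assuming a UTF-8 encoded source file and a reasonably recent Phobos; it only uses `.length` and `std.utf.count`, both of which appear in the post above. The variable names are illustrative.

```d
import std.stdio : writeln;
import std.utf : count;

void main()
{
    string  s = "säд";   // UTF-8:  1 + 2 + 2 code units
    wstring w = "säд"w;  // UTF-16: one code unit per character here
    dstring d = "säд"d;  // UTF-32: one code unit per code point

    // .length is the array length, i.e. the number of code units.
    writeln(s.length);   // 5
    writeln(w.length);   // 3
    writeln(d.length);   // 3

    // std.utf.count decodes and counts code points instead.
    writeln(count(s));   // 3
    writeln(count(w));   // 3
}
```

The UTF-16 and UTF-32 lengths match the code point count only because all three characters sit in the Basic Multilingual Plane; a character outside it would take two UTF-16 code units.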

Re: Inconsitency

2013-10-13 Thread nickles
Ok, if my understanding is wrong, how do YOU measure the length of a string? Do you always use count(), or is there an alternative?

Re: Inconsitency

2013-10-13 Thread nickles
Ok, I understand that length is - obviously - used in analogy to any array's length value. Still, this seems to be inconsistent. D elaborates on implementing chars as UTF-8, which means that a char in D can be of any length between 1 and 4 bytes for an arbitrary Unicode code point. Shouldn't
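As a point of reference for the claim about sizes: in terms of D's types, a `char` is always a single 8-bit UTF-8 code unit, and it is a code point that occupies between 1 and 4 of them. A small sketch, assuming `std.utf.codeLength` from Phobos; the sample characters are chosen only for illustration.

```d
import std.stdio : writeln;
import std.utf : codeLength;

void main()
{
    // A char itself never varies in size.
    writeln(char.sizeof);                    // 1

    // Number of UTF-8 code units (chars) needed to encode one code point.
    writeln(codeLength!char('s'));           // 1
    writeln(codeLength!char('ä'));           // 2
    writeln(codeLength!char('€'));           // 3
    writeln(codeLength!char('\U0001F600'));  // 4
}
```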

Re: Inconsitency

2013-10-13 Thread nickles
This will _not_ return a trailing surrogate of a Cyrillic letter. It will return the second code unit of the ä character (U+00E4). True. It's UTF-8, not UTF-16. However, it could also yield the first code unit of the umlaut diacritic, depending on how the string is represented. This is not
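The point about representation can be illustrated with explicit escapes, so the result does not depend on how an editor normalizes the source text. This is only a sketch: the names `composed` and `decomposed` are made up for the example, and it assumes `std.uni.byGrapheme` is available in the Phobos version at hand.

```d
import std.range : walkLength;
import std.stdio : writefln, writeln;
import std.uni : byGrapheme;
import std.utf : count;

void main()
{
    string composed   = "s\u00E4\u0434";   // ä as the single code point U+00E4
    string decomposed = "sa\u0308\u0434";  // a followed by combining diaeresis U+0308

    // Indexing a string yields one code unit, not a character.
    writefln("%02X", cast(ubyte) composed[2]);    // A4: second code unit of U+00E4
    writefln("%02X", cast(ubyte) decomposed[2]);  // CC: first code unit of U+0308

    // Code points differ between the two forms ...
    writeln(count(composed));                     // 3
    writeln(count(decomposed));                   // 4

    // ... while the number of user-perceived characters (graphemes) does not.
    writeln(composed.byGrapheme.walkLength);      // 3
    writeln(decomposed.byGrapheme.walkLength);    // 3
}
```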