On Wednesday, 27 November 2013 at 20:13:32 UTC, Dmitry Olshansky
wrote:
I could have sworn we had byGrapheme somewhere, well apparently
not :(
Simple attempt:
https://github.com/D-Programming-Language/phobos/pull/1736
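For readers following along, here is a minimal sketch of what a lazy grapheme range buys you. It assumes a Phobos release in which std.uni exposes byGrapheme (the pull request above proposes it); the helper names countCodeUnits, countCodePoints and countGraphemes are hypothetical and exist only for this illustration.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

// Hypothetical helpers, named only for this illustration.
size_t countCodeUnits(string s)  { return s.length; }               // UTF-8 code units
size_t countCodePoints(string s) { return s.walkLength; }           // decoded code points
size_t countGraphemes(string s)  { return s.byGrapheme.walkLength; } // grapheme clusters

void main()
{
    // 'e' followed by U+0308 COMBINING DIAERESIS, i.e. a decomposed "ë"
    string s = "noe\u0308l";
    assert(countCodeUnits(s) == 6);   // U+0308 takes two UTF-8 code units
    assert(countCodePoints(s) == 5);  // n, o, e, U+0308, l
    assert(countGraphemes(s) == 4);   // n, o, ë, l
}
```

The three counts differ because each one looks at the string at a different level of abstraction; only the grapheme view matches what a reader would call "characters".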
On 11/27/2013 12:06 PM, Dmitry Olshansky wrote:
27-Nov-2013 18:45, David Nadlinger wrote:
As far as I'm aware, this behavior is the result of a deliberate
decision, as normalizing strings on the fly isn't really cheap.
It's anything but cheap.
At the minimum imagine crawling the string and iss
On 11/28/2013 10:19 AM, H. S. Teoh wrote:
Always decoding strings
*is* slow, esp. when you already know that it only contains ASCII
characters.
It doesn't have to be merely ASCII. You can do string substring searches without
any need for decoding, for example. You don't even need decoding to d
On 11/28/2013 11:32 AM, Dmitry Olshansky wrote:
I had a (somewhat cloudy) vision of settling the encoded ranges problem once and for
all. That includes defining the notion of an encoded range that is two in one: some
stronger (as in capabilities) range of code elements and the default decoded
view imposed on
28-Nov-2013 17:24, monarch_dodra wrote:
On Thursday, 28 November 2013 at 09:02:12 UTC, Walter Bright
wrote:
Sadly,
I think it's great. It means by default, your strings will always
be handled correctly. I think there's quite a few algorithms that
were written without ever taking strings into a
On Thursday, 28 November 2013 at 18:55:44 UTC, Dicebot wrote:
http://dlang.org/phobos/std_encoding.html#.AsciiString ?
Yeah, that or just ubyte[].
The problem with both of these though, is printing :/ (which
prints ugly as sin)
Something like:
struct AsciiChar
{
private char c;
alia
http://dlang.org/phobos/std_encoding.html#.AsciiString ?
On Thu, Nov 28, 2013 at 09:52:08AM -0800, Walter Bright wrote:
> On 11/28/2013 5:24 AM, monarch_dodra wrote:
> >Which operations are you thinking of in std.array that decode
> >when they shouldn't?
>
> front() in std.array looks like:
>
> @property dchar front(T)(T[] a) @safe pure if (isNarrowStr
On 11/28/2013 5:24 AM, monarch_dodra wrote:
Which operations are you thinking of in std.array that decode
when they shouldn't?
front() in std.array looks like:
@property dchar front(T)(T[] a) @safe pure if (isNarrowString!(T[]))
{
assert(a.length, "Attempting to fetch the front of an empty
On Thursday, 28 November 2013 at 09:02:12 UTC, Walter Bright
wrote:
Sadly,
I think it's great. It means by default, your strings will always
be handled correctly. I think there's quite a few algorithms that
were written without ever taking strings into account, but still
happen to work with the
Walter Bright:
This means that all algorithms on strings will be crippled
as far as performance goes.
If you want to sort an array of chars you need to use a dchar[],
or code like this:
char[] word = "just a test".dup;
auto sword = cast(char[])word.representation.sort().release;
See:
http:
On Thursday, 28 November 2013 at 09:02:12 UTC, Walter Bright
wrote:
Sadly, std.array is determined to decode (i.e. convert to
dchar[]) all your strings when they are used as ranges. This
means that all algorithms on strings will be crippled as far as
performance goes.
http://dlang.org/glossar
On 11/27/2013 9:22 AM, Jakob Ovrum wrote:
In D, we can write code that is both Unicode-correct and highly performant,
while still being simple and pleasant to read. To write such code, one must have
a modicum of understanding of how Unicode works (in order to choose the right
tools from the toolb
On Wednesday, 27 November 2013 at 17:22:43 UTC, Jakob Ovrum wrote:
i18nString sounds like a range of graphemes to me.
Maybe. If I had called it...say, "normalisedString"? Would you
still think that? That was an off-the-cuff name because my
morning brain imagined that this sort of thing wou
27-Nov-2013 20:18, Wyatt wrote:
On Wednesday, 27 November 2013 at 15:43:11 UTC, Jakob Ovrum wrote:
It
honestly surprised me how many things in std.uni don't seem to work on
ranges.
Which ones? Or do you mean more like isAlpha(rangeOfCodepoints)?
--
Dmitry Olshansky
27-Nov-2013 20:22, Wyatt wrote:
On Wednesday, 27 November 2013 at 16:18:34 UTC, Wyatt wrote:
trouble following all that (e.g. Isn't "noe\u0308l" a grapheme
Whoops, overzealous pasting. That is, "e\u0308", which composes to
"ë". A grapheme cluster seems to represent one printed character: ".
27-Nov-2013 22:54, Jacob Carlborg wrote:
On 2013-11-27 18:56, Dicebot wrote:
+1
Working with graphemes is a rather expensive thing to do performance-wise.
I like how D makes this fact obvious and provides continuous transition
through abstraction levels here. It is important to make the costs
ob
27-Nov-2013 22:12, H. S. Teoh wrote:
On Wed, Nov 27, 2013 at 10:07:43AM -0800, Andrei Alexandrescu wrote:
On 11/27/13 7:43 AM, Jakob Ovrum wrote:
On that note, I tried to use std.uni to write a simple example of how
to correctly handle this in D, but it became apparent that std.uni
should expos
On 27.11.2013 19:07, Andrei Alexandrescu wrote:
On 11/27/13 7:43 AM, Jakob Ovrum wrote:
On that note, I tried to use std.uni to write a simple example of how to
correctly handle this in D, but it became apparent that std.uni should
expose something like `byGrapheme` which lazily transforms a ran
27-Nov-2013 18:45, David Nadlinger wrote:
On Wednesday, 27 November 2013 at 12:46:38 UTC, bearophile wrote:
Through Reddit I have seen this small comparison of Unicode handling
between different programming languages:
http://mortoray.com/2013/11/27/the-string-type-is-broken/
D+Phobos seem to f
On 11/27/2013 08:53 AM, Jakob Ovrum wrote:
On Wednesday, 27 November 2013 at 16:18:34 UTC, Wyatt wrote:
I agree with the assertion that people SHOULD know how unicode works
if they want to work with it, but the way our docs are now is
off-putting enough that most probably won't learn anything.
On 11/27/2013 06:45 AM, David Nadlinger wrote:
On Wednesday, 27 November 2013 at 12:46:38 UTC, bearophile wrote:
Through Reddit I have seen this small comparison of Unicode handling
between different programming languages:
http://mortoray.com/2013/11/27/the-string-type-is-broken/
D+Phobos see
On 2013-11-27 18:56, Dicebot wrote:
+1
Working with graphemes is a rather expensive thing to do performance-wise.
I like how D makes this fact obvious and provides continuous transition
through abstraction levels here. It is important to make the costs obvious.
I think it's missing a final high
On 11/27/2013 8:18 AM, Wyatt wrote:
It honestly surprised me how
many things in std.uni don't seem to work on ranges.
Many things in Phobos either predate ranges, or are written by people who aren't
used to ranges and don't think in terms of ranges. It's an ongoing issue, and
one we need to i
On Wed, Nov 27, 2013 at 10:07:43AM -0800, Andrei Alexandrescu wrote:
> On 11/27/13 7:43 AM, Jakob Ovrum wrote:
> >On that note, I tried to use std.uni to write a simple example of how
> >to correctly handle this in D, but it became apparent that std.uni
> >should expose something like `byGrapheme`
On 11/27/13 7:43 AM, Jakob Ovrum wrote:
On that note, I tried to use std.uni to write a simple example of how to
correctly handle this in D, but it became apparent that std.uni should
expose something like `byGrapheme` which lazily transforms a range of
code points to a range of graphemes (probab
On Wednesday, 27 November 2013 at 17:37:48 UTC, Jakob Ovrum wrote:
On Wednesday, 27 November 2013 at 17:30:22 UTC, Jacob Carlborg
wrote:
On 2013-11-27 18:22, Jakob Ovrum wrote:
What would it do that std.uni doesn't already?
A class/struct that handles all these normalizations and other
stuf
On Wed, Nov 27, 2013 at 06:22:41PM +0100, Jakob Ovrum wrote:
> On Wednesday, 27 November 2013 at 16:15:53 UTC, Wyatt wrote:
> >I don't remember if it was brought up before, but this makes me
> >wonder if something like an i18nString should exist for cases
> >where it IS important. Making i18n stuf
On Wednesday, 27 November 2013 at 17:30:22 UTC, Jacob Carlborg
wrote:
On 2013-11-27 18:22, Jakob Ovrum wrote:
What would it do that std.uni doesn't already?
A class/struct that handles all these normalizations and other
stuff automatically.
Sounds terrible :)
On 2013-11-27 18:22, Jakob Ovrum wrote:
What would it do that std.uni doesn't already?
A class/struct that handles all these normalizations and other stuff
automatically.
--
/Jacob Carlborg
On Wednesday, 27 November 2013 at 16:15:53 UTC, Wyatt wrote:
I don't remember if it was brought up before, but this makes me
wonder if something like an i18nString should exist for cases
where it IS important. Making i18n stuff as simple as it looks
like it "should" be has merit, IMO. (Maybe
On 2013-11-27 17:15, Wyatt wrote:
I don't remember if it was brought up before, but this makes me wonder
if something like an i18nString should exist for cases where it IS
important. Making i18n stuff as simple as it looks like it "should" be
has merit, IMO. (Maybe there's even room for a std.
On Wednesday, 27 November 2013 at 16:18:34 UTC, Wyatt wrote:
I agree with the assertion that people SHOULD know how unicode
works if they want to work with it, but the way our docs are
now is off-putting enough that most probably won't learn
anything. If they know, they know; if they don't, th
On Wednesday, 27 November 2013 at 16:22:58 UTC, Wyatt wrote:
Whoops, overzealous pasting. That is, "e\u0308", which
composes to "ë". A grapheme cluster seems to represent one
printed character: "...a horizontally segmentable unit of text,
consisting of some grapheme base (which may consist of
On Wednesday, 27 November 2013 at 12:46:38 UTC, bearophile wrote:
Through Reddit I have seen this small comparison of Unicode
handling between different programming languages:
http://mortoray.com/2013/11/27/the-string-type-is-broken/
D+Phobos seem to fail most things (it produces BAFFLE):
http
On Wednesday, 27 November 2013 at 16:18:34 UTC, Wyatt wrote:
trouble following all that (e.g. Isn't "noe\u0308l" a grapheme
Whoops, overzealous pasting. That is, "e\u0308", which composes
to "ë". A grapheme cluster seems to represent one printed
character: "...a horizontally segmentable uni
On Wednesday, 27 November 2013 at 16:15:53 UTC, Wyatt wrote:
Seems like a pretty big "gotcha" from a usability standpoint;
it's not exactly intuitive. I understand WHY this decision was
made, but it feels like a source of code smell and weird string
comparison errors.
It probably is, but is
On Wednesday, 27 November 2013 at 15:43:11 UTC, Jakob Ovrum wrote:
The author also doesn't seem to understand the Unicode
definitions of character and grapheme, which is a shame,
because the difference is more or less the whole point of the
post.
I agree with the assertion that people SHOUL
On Wednesday, 27 November 2013 at 14:45:32 UTC, David Nadlinger
wrote:
If you need to perform these kinds of operations on Unicode
strings in D, you can call normalize (std.uni) on the string
first to make sure it is in one of the Normalization Forms. For
example, just appending .normalize to y
On Wednesday, 27 November 2013 at 12:46:38 UTC, bearophile wrote:
Through Reddit I have seen this small comparison of Unicode
handling between different programming languages:
http://mortoray.com/2013/11/27/the-string-type-is-broken/
Most of the points are good, but the author seems to confus
On 2013-11-27 16:07, Adam D. Ruppe wrote:
Yeah, I saw it too. The fix is simple:
https://github.com/D-Programming-Language/phobos/pull/1728
tbh this makes me think version(unittest) might just be considered
harmful. I'm sure that code passed the tests, but only because a vital
import was in a
David Nadlinger:
If you need to perform these kinds of operations on Unicode
strings in D, you can call normalize (std.uni) on the string
first to make sure it is in one of the Normalization Forms. For
example, just appending .normalize to your strings (which
defaults to NFC) would make the cod
On Wednesday, 27 November 2013 at 15:03:37 UTC, Jacob Carlborg
wrote:
std/uni.d(6301): Error: undefined identifier tuple
Yeah, I saw it too. The fix is simple:
https://github.com/D-Programming-Language/phobos/pull/1728
tbh this makes me think version(unittest) might just be
considered harmfu
On 2013-11-27 15:45, David Nadlinger wrote:
If you need to perform these kinds of operations on Unicode strings in D,
you can call normalize (std.uni) on the string first to make sure it is
in one of the Normalization Forms. For example, just appending
.normalize to your strings (which defaults to
On Wednesday, 27 November 2013 at 12:46:38 UTC, bearophile wrote:
Through Reddit I have seen this small comparison of Unicode
handling between different programming languages:
http://mortoray.com/2013/11/27/the-string-type-is-broken/
D+Phobos seem to fail most things (it produces BAFFLE):
http
On Wednesday, 27 November 2013 at 12:46:38 UTC, bearophile wrote:
D+Phobos seem to fail most things (it produces BAFFLE):
I still think we're doing pretty good.
At least, we *handle* unicode at all (looking at you C++). And we
handle *true* unicode, not BMP style UCS (looking at you
Java/C#)
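To illustrate the point about going beyond the BMP, here is a small sketch, assuming a reasonably current Phobos. The helper name codePoints is hypothetical. It shows that a code point outside the BMP occupies several code units in UTF-8 and UTF-16, yet is still seen as a single element when the string is walked as a range.

```d
import std.range : walkLength;

// Hypothetical helper: number of decoded code points in any D string type.
size_t codePoints(S)(S s) { return s.walkLength; }

void main()
{
    // U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane.
    string  s8  = "\U0001D11E";   // UTF-8
    wstring s16 = "\U0001D11E"w;  // UTF-16
    dstring s32 = "\U0001D11E"d;  // UTF-32

    assert(s8.length == 4);   // four UTF-8 code units
    assert(s16.length == 2);  // one surrogate pair
    assert(s32.length == 1);  // one code unit per code point

    // Range iteration decodes, so each encoding yields one code point.
    assert(codePoints(s8) == 1);
    assert(codePoints(s16) == 1);
    assert(codePoints(s32) == 1);
}
```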
On 2013-11-27 13:46, bearophile wrote:
Through Reddit I have seen this small comparison of Unicode handling
between different programming languages:
http://mortoray.com/2013/11/27/the-string-type-is-broken/
D+Phobos seem to fail most things (it produces BAFFLE):
http://dpaste.dzfl.pl/a5268c435