Re: Major performance problem with std.array.front()

2014-03-18 Thread Marco Leise
On Mon, 10 Mar 2014 17:44:22 -0400, Nick Sabalausky seewebsitetocontac...@semitwist.com wrote: On 3/7/2014 8:40 AM, Michel Fortin wrote: On 2014-03-07 03:59:55 +, bearophile bearophileh...@lycos.com said: Walter Bright: I understand this all too well. (Note that we currently have

Re: Major performance problem with std.array.front()

2014-03-13 Thread Jonathan M Davis
On Thursday, March 06, 2014 18:37:13 Walter Bright wrote: Is there any hope of fixing this? I agree with Andrei. I don't think that there's really anything to fix. The problem is that there are roughly three levels at which string operations can be done: 1. By code unit 2. By code point 3. By

Re: Major performance problem with std.array.front()

2014-03-11 Thread w0rp
On Sunday, 9 March 2014 at 21:38:06 UTC, Nick Sabalausky wrote: On 3/9/2014 7:47 AM, w0rp wrote: My knowledge of Unicode pretty much just comes from having to deal with foreign language customers and discovering the problems with the code unit abstraction most languages seem to use. (Java

Re: Major performance problem with std.array.front()

2014-03-11 Thread Chris
On Friday, 7 March 2014 at 03:52:42 UTC, Walter Bright wrote: Ok, I have a plan. Each step will be separated by at least one version: 1. implement decode() as an algorithm for string types, so one can write: string s; s.decode.algorithm... suggest that people start doing that

Re: Major performance problem with std.array.front()

2014-03-11 Thread Sean Kelly
On Tuesday, 11 March 2014 at 02:07:19 UTC, Steven Schveighoffer wrote: On Mon, 10 Mar 2014 19:59:07 -0400, Walter Bright newshou...@digitalmars.com wrote: On 3/10/2014 6:47 AM, Dicebot wrote: (array literals that allocate, I will never forgive that). It was done that way simply to get it

Re: Major performance problem with std.array.front()

2014-03-10 Thread Nick Sabalausky
On 3/10/2014 12:23 AM, Walter Bright wrote: On 3/9/2014 9:19 PM, Nick Sabalausky wrote: On 3/9/2014 6:31 PM, Walter Bright wrote: On 3/9/2014 6:08 AM, Marc Schütz schue...@gmx.net wrote: Also, `byCodeUnit` and `byCodePoint` would probably be better names than `raw` and `decode`, to match the

Re: Major performance problem with std.array.front()

2014-03-10 Thread Walter Bright
On 3/10/2014 12:09 AM, Nick Sabalausky wrote: On 3/10/2014 12:23 AM, Walter Bright wrote: On 3/9/2014 9:19 PM, Nick Sabalausky wrote: On 3/9/2014 6:31 PM, Walter Bright wrote: On 3/9/2014 6:08 AM, Marc Schütz schue...@gmx.net wrote: Also, `byCodeUnit` and `byCodePoint` would probably be

Re: Major performance problem with std.array.front()

2014-03-10 Thread ponce
On Sunday, 9 March 2014 at 21:14:30 UTC, Nick Sabalausky wrote: With all due respect, D's string type is exclusively for UTF-8 strings. If it is not valid UTF-8, it should never have been a D string in the first place. In the other cases, ubyte[] is there. This is an arbitrary self-imposed

Re: Major performance problem with std.array.front()

2014-03-10 Thread Andrea Fontana
I'm not sure I understood the point of this (long) thread. Is the main problem that decode() is called even when it is not needed? Well, in that case it's not a problem only for strings. I also ran into this problem when writing other ranges, for example when reading binary data from a DB stream. Front

Re: Major performance problem with std.array.front()

2014-03-10 Thread Nick Sabalausky
On 3/10/2014 6:21 AM, ponce wrote: On Sunday, 9 March 2014 at 21:14:30 UTC, Nick Sabalausky wrote: Yea, I've had problems before - completely unnecessary problems that were *not* helpful or indicative of latent bugs - which were a direct result of Phobos being overly pedantic and eager about

Re: Major performance problem with std.array.front()

2014-03-10 Thread ponce
On Monday, 10 March 2014 at 11:04:43 UTC, Nick Sabalausky wrote: I may have missed it, but I don't see where it says anything about validation or immediate sanitation of invalid sequences. It's mostly UTF-16 sucks and so does Windows (not that I'm necessarily disagreeing with it). (ot: Kinda

Re: Major performance problem with std.array.front()

2014-03-10 Thread Nick Sabalausky
On 3/9/2014 11:27 AM, Vladimir Panteleev wrote: On Sunday, 9 March 2014 at 08:32:09 UTC, monarch_dodra wrote: On topic, I think D's implicit default decode to dchar is *infinity* times better than C++'s char-based strings. While imperfect in terms of grapheme, it was still a design decision

Re: Major performance problem with std.array.front()

2014-03-10 Thread Dicebot
On Sunday, 9 March 2014 at 17:27:20 UTC, Andrei Alexandrescu wrote: On 3/9/14, 6:47 AM, Marc Schütz schue...@gmx.net wrote: On Friday, 7 March 2014 at 15:03:24 UTC, Dicebot wrote: 2) It is regression back to C++ days of no-one-cares-about-Unicode pain. Thinking about strings as character

Re: Major performance problem with std.array.front()

2014-03-10 Thread Dicebot
On Friday, 7 March 2014 at 19:43:57 UTC, Walter Bright wrote: On 3/7/2014 7:03 AM, Dicebot wrote: 1) It is a huge breakage and you have been refusing to do one even for more important problems. What is this sudden change of mind about? 1. Performance Performance Performance Not important

Re: Major performance problem with std.array.front()

2014-03-10 Thread Abdulhaq
On Monday, 10 March 2014 at 10:52:02 UTC, Andrea Fontana wrote: I'm not sure I understood the point of this (long) thread. Is the main problem that decode() is called even when it is not needed? I'd like to offer up one D 'user' perspective; it's just a single data point but perhaps useful. I write

Re: Major performance problem with std.array.front()

2014-03-10 Thread Andrea Fontana
In Italian we need Unicode too. We have several accented letters, and programming languages often don't handle UTF-8 and other encodings so well... In D I have never had any problem with this, and I work a lot on text processing. So my question: is there any problem I'm missing in D with Unicode

Re: Major performance problem with std.array.front()

2014-03-10 Thread dennis luehring
On 07.03.2014 03:37, Walter Bright wrote: In "Lots of low hanging fruit in Phobos" the issue came up about the automatic encoding and decoding of char ranges. After reading many of the attached posts, the question is: what could be D's future design for introducing breaking changes? It's not a

Re: Major performance problem with std.array.front()

2014-03-10 Thread Dicebot
On Monday, 10 March 2014 at 14:05:39 UTC, dennis luehring wrote: On 07.03.2014 03:37, Walter Bright wrote: In "Lots of low hanging fruit in Phobos" the issue came up about the automatic encoding and decoding of char ranges. After reading many of the attached posts, the question is: what

Re: Major performance problem with std.array.front()

2014-03-10 Thread Abdulhaq
On Monday, 10 March 2014 at 14:05:39 UTC, dennis luehring wrote: On 07.03.2014 03:37, Walter Bright wrote: In "Lots of low hanging fruit in Phobos" the issue came up about the automatic encoding and decoding of char ranges. After reading many of the attached posts, the question is: what

Re: Major performance problem with std.array.front()

2014-03-10 Thread Vladimir Panteleev
On Monday, 10 March 2014 at 14:11:13 UTC, Dicebot wrote: Historically 2 approaches have been practiced: 1) argue a lot and then do nothing 2) suddenly change something and tell users it was necessary. These are one and the same, just from the two opposing points of view. I also think that

Re: Major performance problem with std.array.front()

2014-03-10 Thread Abdulhaq
Historically 2 approaches have been practiced: 1) argue a lot and then do nothing. This happens (I think) because Andrei and Walter really value your and other experts' opinions, but nevertheless have to preserve the general way things work to preserve the long-term future of D. They

Re: Major performance problem with std.array.front()

2014-03-10 Thread Dicebot
On Monday, 10 March 2014 at 14:27:02 UTC, Vladimir Panteleev wrote: On Monday, 10 March 2014 at 14:11:13 UTC, Dicebot wrote: Historically 2 approaches have been practiced: 1) argue a lot and then do nothing 2) suddenly change something and tell users it was necessary. These are one and the

Re: Major performance problem with std.array.front()

2014-03-10 Thread Johannes Pfau
On Mon, 10 Mar 2014 14:05:03 +, Andrea Fontana nos...@example.com wrote: In Italian we need Unicode too. We have several accented letters, and programming languages often don't handle UTF-8 and other encodings so well... In D I have never had any problem with this, and I work a lot on text

Re: Major performance problem with std.array.front()

2014-03-10 Thread Marc Schütz
On Monday, 10 March 2014 at 13:18:50 UTC, Dicebot wrote: On Sunday, 9 March 2014 at 17:27:20 UTC, Andrei Alexandrescu wrote: On 3/9/14, 6:47 AM, Marc Schütz schue...@gmx.net wrote: On Friday, 7 March 2014 at 15:03:24 UTC, Dicebot wrote: 2) It is regression back to C++ days of

Re: Major performance problem with std.array.front()

2014-03-10 Thread Marc Schütz
On Monday, 10 March 2014 at 13:48:44 UTC, Abdulhaq wrote: My app deals with Unicode Arabic text that is 'out there', and the Unicode™ support for Arabic is not that well thought out, so the data is often (always) inconsistent in terms of sequencing diacritics etc. Even the code page can vary.

Re: Major performance problem with std.array.front()

2014-03-10 Thread Abdulhaq
On Monday, 10 March 2014 at 18:54:26 UTC, Marc Schütz wrote: On Monday, 10 March 2014 at 13:48:44 UTC, Abdulhaq wrote: My app deals with Unicode Arabic text that is 'out there', and the Unicode™ support for Arabic is not that well thought out, so the data is often (always) inconsistent in

Re: Major performance problem with std.array.front()

2014-03-10 Thread Abdulhaq
On Monday, 10 March 2014 at 18:54:26 UTC, Marc Schütz wrote: On Monday, 10 March 2014 at 13:48:44 UTC, Abdulhaq wrote: My app deals with Unicode Arabic text that is 'out there', and the Unicode™ support for Arabic is not that well thought out, so the data is often (always) inconsistent in

Re: Major performance problem with std.array.front()

2014-03-10 Thread Nick Sabalausky
On 3/7/2014 8:40 AM, Michel Fortin wrote: On 2014-03-07 03:59:55 +, bearophile bearophileh...@lycos.com said: Walter Bright: I understand this all too well. (Note that we currently have a different silent problem: unnoticed large performance problems.) On the other hand your change

Re: Major performance problem with std.array.front()

2014-03-10 Thread Yota
On Monday, 10 March 2014 at 14:42:18 UTC, Dicebot wrote: Yes. I have given up about this idea at some point as there seemed to be consensus that no breaking changes will be even considered for D2 and those that come from fixing bugs are not worth the fuss. So at what point are we going to

Re: Major performance problem with std.array.front()

2014-03-10 Thread Walter Bright
On 3/10/2014 6:47 AM, Dicebot wrote: (array literals that allocate, I will never forgive that). It was done that way simply to get it up and running quickly. Having them not allocate is an optimization, it doesn't change the nature.

Re: Major performance problem with std.array.front()

2014-03-10 Thread Nick Sabalausky
On 3/10/2014 7:35 PM, Yota wrote: On Monday, 10 March 2014 at 14:42:18 UTC, Dicebot wrote: Yes. I have given up about this idea at some point as there seemed to be consensus that no breaking changes will be even considered for D2 and those that come from fixing bugs are not worth the fuss. So

Re: Major performance problem with std.array.front()

2014-03-10 Thread Steven Schveighoffer
On Mon, 10 Mar 2014 19:59:07 -0400, Walter Bright newshou...@digitalmars.com wrote: On 3/10/2014 6:47 AM, Dicebot wrote: (array literals that allocate, I will never forgive that). It was done that way simply to get it up and running quickly. Having them not allocate is an optimization,

Re: Major performance problem with std.array.front()

2014-03-10 Thread Andrei Alexandrescu
On 3/10/14, 7:07 PM, Steven Schveighoffer wrote: On Mon, 10 Mar 2014 19:59:07 -0400, Walter Bright newshou...@digitalmars.com wrote: On 3/10/2014 6:47 AM, Dicebot wrote: (array literals that allocate, I will never forgive that). It was done that way simply to get it up and running quickly.

Re: Major performance problem with std.array.front()

2014-03-10 Thread Steven Schveighoffer
On Mon, 10 Mar 2014 22:56:22 -0400, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: On 3/10/14, 7:07 PM, Steven Schveighoffer wrote: On Mon, 10 Mar 2014 19:59:07 -0400, Walter Bright newshou...@digitalmars.com wrote: On 3/10/2014 6:47 AM, Dicebot wrote: (array literals that

Re: Major performance problem with std.array.front()

2014-03-10 Thread Andrei Alexandrescu
On 3/10/14, 8:05 PM, Steven Schveighoffer wrote: I think you are missing what I'm saying, I don't want the allocation eliminated, but if we eliminate some allocations with [] and not others, it will be confusing. The path I'd always hoped we would go in was to make all array literals immutable,

Re: Major performance problem with std.array.front()

2014-03-09 Thread Nick Sabalausky
On 3/7/2014 6:33 PM, H. S. Teoh wrote: On Fri, Mar 07, 2014 at 11:13:50PM +, Sarath Kodali wrote: On Friday, 7 March 2014 at 22:35:47 UTC, Sarath Kodali wrote: +1 In Indian languages, a character consists of one or more UNICODE code points. For example, in Sanskrit ddhrya

Re: Major performance problem with std.array.front()

2014-03-09 Thread Peter Alexander
On Sunday, 9 March 2014 at 08:32:09 UTC, monarch_dodra wrote: On topic, I think D's implicit default decode to dchar is *infinity* times better than C++'s char-based strings. While imperfect in terms of grapheme, it was still a design decision made of win. I'd be tempted to not ask how do we

Re: Major performance problem with std.array.front()

2014-03-09 Thread w0rp
On Sunday, 9 March 2014 at 09:24:02 UTC, Nick Sabalausky wrote: I'm leaning the same way too. But I also think Andrei is right that, at this point in time, it'd be a terrible move to change things so that by code unit is default. For better or worse, that ship has sailed. Perhaps we *can*

Re: Major performance problem with std.array.front()

2014-03-09 Thread ponce
- In lots of places, I've discovered that Phobos did UTF decoding (thus murdering performance) when it didn't need to. Such cases included format (now fixed), appender (now fixed), startsWith (now fixed - recently), skipOver (still unfixed). These have caused latent bugs in my programs that

Re: Major performance problem with std.array.front()

2014-03-09 Thread Joseph Rushton Wakeling
On 09/03/14 04:26, Andrei Alexandrescu wrote: 2. Add byChar that returns a random-access range iterating a string by character. Add byWchar that does on-the-fly transcoding to UTF16. Add byDchar that accepts any range of char and does decoding. And such stuff. Then whenever one wants to go

Re: Major performance problem with std.array.front()

2014-03-09 Thread monarch_dodra
On Sunday, 9 March 2014 at 11:34:31 UTC, Peter Alexander wrote: On Sunday, 9 March 2014 at 08:32:09 UTC, monarch_dodra wrote: On topic, I think D's implicit default decode to dchar is *infinity* times better than C++'s char-based strings. While imperfect in terms of grapheme, it was still a

Re: Major performance problem with std.array.front()

2014-03-09 Thread Marc Schütz
On Friday, 7 March 2014 at 04:11:15 UTC, Nick Sabalausky wrote: What about this?: Anywhere we currently have a front() that decodes, such as your example: @property dchar front(T)(T[] a) @safe pure if (isNarrowString!(T[])) { assert(a.length, "Attempting to fetch the front of an

Re: Major performance problem with std.array.front()

2014-03-09 Thread Jakob Ovrum
On Sunday, 9 March 2014 at 13:08:05 UTC, Marc Schütz wrote: Also, `byCodeUnit` and `byCodePoint` would probably be better names than `raw` and `decode`, to match the already existing `byGrapheme` in std.uni. There already is a std.uni.byCodePoint. It is a higher order range that accepts

Re: Major performance problem with std.array.front()

2014-03-09 Thread monarch_dodra
On Saturday, 8 March 2014 at 20:05:36 UTC, Andrei Alexandrescu wrote: The current approach is a cut above treating strings as arrays of bytes for some languages, and still utterly broken for others. If I'm operating on a right to left language like Hebrew, what would I expect the result to be

Re: Major performance problem with std.array.front()

2014-03-09 Thread Marc Schütz
On Friday, 7 March 2014 at 15:03:24 UTC, Dicebot wrote: 2) It is regression back to C++ days of no-one-cares-about-Unicode pain. Thinking about strings as character arrays is so natural and convenient that if language/Phobos won't punish you for that, it will be extremely widespread. Not

Re: Major performance problem with std.array.front()

2014-03-09 Thread Marc Schütz
On Friday, 7 March 2014 at 16:43:30 UTC, Dicebot wrote: On Friday, 7 March 2014 at 16:18:06 UTC, Vladimir Panteleev Can we look at some example situations that this will break? Any code that relies on countUntil to count dchar's? Or, to generalize, almost any code that uses std.algorithm

Re: Major performance problem with std.array.front()

2014-03-09 Thread Michel Fortin
On 2014-03-09 13:00:45 +, monarch_dodra monarchdo...@gmail.com said: AFAIK, the most common algorithm, case-insensitive search, *must* decode. Not necessarily. While the Unicode collation algorithms (which should be used to compare text) are defined in terms of code points, you could build

Re: Major performance problem with std.array.front()

2014-03-09 Thread Marc Schütz
On Friday, 7 March 2014 at 23:13:50 UTC, H. S. Teoh wrote: On Fri, Mar 07, 2014 at 10:35:46PM +, Sarath Kodali wrote: On Friday, 7 March 2014 at 20:43:45 UTC, Vladimir Panteleev wrote: On Friday, 7 March 2014 at 19:57:38 UTC, Andrei Alexandrescu wrote: [...] Clearly one might argue that

Re: Major performance problem with std.array.front()

2014-03-09 Thread Michel Fortin
On 2014-03-09 14:12:28 +, Marc Schütz schue...@gmx.net said: That won't work, because your needle might be in a different normalization form than your haystack, thus a byte-by-byte comparison will not be able to find it. The core of the problem is that sometimes this byte-by-byte

Re: Major performance problem with std.array.front()

2014-03-09 Thread Peter Alexander
On Sunday, 9 March 2014 at 13:00:46 UTC, monarch_dodra wrote: IMO, the normalization argument is overrated. I've yet to encounter a real-world case of normalization: only hand written counter-examples. Not saying it doesn't exist, just that: 1. It occurs only in special cases that the program

Re: Major performance problem with std.array.front()

2014-03-09 Thread Vladimir Panteleev
On Sunday, 9 March 2014 at 05:10:26 UTC, Andrei Alexandrescu wrote: On 3/8/14, 8:24 PM, Vladimir Panteleev wrote: On Sunday, 9 March 2014 at 04:18:15 UTC, Andrei Alexandrescu wrote: What exactly is the consensus? From your wiki page I see One of the proposals in the thread is to switch the

Re: Major performance problem with std.array.front()

2014-03-09 Thread Vladimir Panteleev
On Sunday, 9 March 2014 at 13:47:26 UTC, Marc Schütz wrote: On Friday, 7 March 2014 at 15:03:24 UTC, Dicebot wrote: 2) It is regression back to C++ days of no-one-cares-about-Unicode pain. Thinking about strings as character arrays is so natural and convenient that if language/Phobos won't

Re: Major performance problem with std.array.front()

2014-03-09 Thread Vladimir Panteleev
On Sunday, 9 March 2014 at 13:51:12 UTC, Marc Schütz wrote: On Friday, 7 March 2014 at 16:43:30 UTC, Dicebot wrote: On Friday, 7 March 2014 at 16:18:06 UTC, Vladimir Panteleev Can we look at some example situations that this will break? Any code that relies on countUntil to count dchar's?

Re: Major performance problem with std.array.front()

2014-03-09 Thread Vladimir Panteleev
On Sunday, 9 March 2014 at 12:24:11 UTC, ponce wrote: - In lots of places, I've discovered that Phobos did UTF decoding (thus murdering performance) when it didn't need to. Such cases included format (now fixed), appender (now fixed), startsWith (now fixed - recently), skipOver (still

Re: Major performance problem with std.array.front()

2014-03-09 Thread Sean Kelly
On Sunday, 9 March 2014 at 08:32:09 UTC, monarch_dodra wrote: On Saturday, 8 March 2014 at 20:05:36 UTC, Andrei Alexandrescu wrote: The current approach is a cut above treating strings as arrays of bytes for some languages, and still utterly broken for others. If I'm operating on a right to

Re: Major performance problem with std.array.front()

2014-03-09 Thread Vladimir Panteleev
On Sunday, 9 March 2014 at 08:32:09 UTC, monarch_dodra wrote: On topic, I think D's implicit default decode to dchar is *infinity* times better than C++'s char-based strings. While imperfect in terms of grapheme, it was still a design decision made of win. Care to argument? I'd be tempted

Re: Major performance problem with std.array.front()

2014-03-09 Thread Vladimir Panteleev
On Sunday, 9 March 2014 at 13:00:46 UTC, monarch_dodra wrote: As for the belief that iterating by code point has utility. I have to strongly disagree. Unicode is composed of codepoints, and that is what we handle. The fact that it can be encoded and stored as UTF is an implementation detail.

Re: Major performance problem with std.array.front()

2014-03-09 Thread bearophile
Vladimir Panteleev: Seriously, Bearophile suggested ABCD.sort(), and it took about 6 pages (!) for someone to point out this would be wrong. Sorting a string has quite limited use in the general case, It seems I am sorting arrays of mutable ASCII chars often enough :-) Time ago I have

Re: Major performance problem with std.array.front()

2014-03-09 Thread Vladimir Panteleev
On Sunday, 9 March 2014 at 16:02:55 UTC, bearophile wrote: Vladimir Panteleev: Seriously, Bearophile suggested ABCD.sort(), and it took about 6 pages (!) for someone to point out this would be wrong. Sorting a string has quite limited use in the general case, It seems I am sorting arrays

Re: Major performance problem with std.array.front()

2014-03-09 Thread bearophile
Vladimir Panteleev: What do you use this for? For lots of different reasons (counting, testing, histograms, to unique-ify, to allow binary searches, etc), you can find alternative solutions for every one of those use cases. I can think of sort being useful e.g. to see which characters

Re: Major performance problem with std.array.front()

2014-03-09 Thread Andrei Alexandrescu
On 3/9/14, 5:28 AM, Joseph Rushton Wakeling wrote: On 09/03/14 04:26, Andrei Alexandrescu wrote: 2. Add byChar that returns a random-access range iterating a string by character. Add byWchar that does on-the-fly transcoding to UTF16. Add byDchar that accepts any range of char and does decoding.

Re: Major performance problem with std.array.front()

2014-03-09 Thread Andrei Alexandrescu
On 3/9/14, 4:34 AM, Peter Alexander wrote: I think this is the main confusion: the belief that iterating by code point has utility. If you care about normalization then neither by code unit, by code point, nor by grapheme are correct (except in certain language subsets). I suspect that code

Re: Major performance problem with std.array.front()

2014-03-09 Thread Vladimir Panteleev
On Sunday, 9 March 2014 at 17:18:47 UTC, Andrei Alexandrescu wrote: On 3/9/14, 5:28 AM, Joseph Rushton Wakeling wrote: So IIUC iterating over s.byChar would not encounter the decoding-related speed hits that Walter is concerned about? That is correct. Unless I'm missing something, all

Re: Major performance problem with std.array.front()

2014-03-09 Thread Andrei Alexandrescu
On 3/9/14, 6:47 AM, Marc Schütz schue...@gmx.net wrote: On Friday, 7 March 2014 at 15:03:24 UTC, Dicebot wrote: 2) It is regression back to C++ days of no-one-cares-about-Unicode pain. Thinking about strings as character arrays is so natural and convenient that if language/Phobos won't punish

Re: Major performance problem with std.array.front()

2014-03-09 Thread Andrei Alexandrescu
On 3/9/14, 6:34 AM, Jakob Ovrum wrote: On Sunday, 9 March 2014 at 13:08:05 UTC, Marc Schütz wrote: Also, `byCodeUnit` and `byCodePoint` would probably be better names than `raw` and `decode`, to match the already existing `byGrapheme` in std.uni. There already is a std.uni.byCodePoint. It is a

Re: Major performance problem with std.array.front()

2014-03-09 Thread Marc Schütz
On Sunday, 9 March 2014 at 15:23:57 UTC, Vladimir Panteleev wrote: On Sunday, 9 March 2014 at 13:51:12 UTC, Marc Schütz wrote: On Friday, 7 March 2014 at 16:43:30 UTC, Dicebot wrote: On Friday, 7 March 2014 at 16:18:06 UTC, Vladimir Panteleev Can we look at some example situations that this

Re: Major performance problem with std.array.front()

2014-03-09 Thread Peter Alexander
On Sunday, 9 March 2014 at 17:15:59 UTC, Andrei Alexandrescu wrote: On 3/9/14, 4:34 AM, Peter Alexander wrote: I think this is the main confusion: the belief that iterating by code point has utility. If you care about normalization then neither by code unit, by code point, nor by grapheme

Re: Major performance problem with std.array.front()

2014-03-09 Thread Andrei Alexandrescu
On 3/9/14, 9:02 AM, bearophile wrote: Some time ago I even asked for a helper function: https://d.puremagic.com/issues/show_bug.cgi?id=10162 I commented on that and preapproved it. Andrei

Re: Major performance problem with std.array.front()

2014-03-09 Thread Andrei Alexandrescu
On 3/9/14, 10:21 AM, Vladimir Panteleev wrote: On Sunday, 9 March 2014 at 17:18:47 UTC, Andrei Alexandrescu wrote: On 3/9/14, 5:28 AM, Joseph Rushton Wakeling wrote: So IIUC iterating over s.byChar would not encounter the decoding-related speed hits that Walter is concerned about? That is

Re: Major performance problem with std.array.front()

2014-03-09 Thread Andrei Alexandrescu
On 3/9/14, 10:34 AM, Peter Alexander wrote: If we assume strings are normalized then substring search, equality testing, sorting all work the same with either code units or code points. But others such as edit distance or equal(some_string, some_wstring) will not. If you don't care about

Re: Major performance problem with std.array.front()

2014-03-09 Thread Vladimir Panteleev
On Sunday, 9 March 2014 at 17:48:47 UTC, Andrei Alexandrescu wrote: wc What should wc produce on a Sanskrit text? The problem is that such questions quickly become philosophical. (Generally: I've always been very very very doubtful about arguments that start with I can't think of... because

Re: Major performance problem with std.array.front()

2014-03-09 Thread Dmitry Olshansky
09-Mar-2014 21:45, Andrei Alexandrescu wrote: On 3/9/14, 10:21 AM, Vladimir Panteleev wrote: On Sunday, 9 March 2014 at 17:18:47 UTC, Andrei Alexandrescu wrote: On 3/9/14, 5:28 AM, Joseph Rushton Wakeling wrote: So IIUC iterating over s.byChar would not encounter the decoding-related speed

Re: Major performance problem with std.array.front()

2014-03-09 Thread Dmitry Olshansky
09-Mar-2014 21:16, Andrei Alexandrescu wrote: On 3/9/14, 4:34 AM, Peter Alexander wrote: I think this is the main confusion: the belief that iterating by code point has utility. If you care about normalization then neither by code unit, by code point, nor by grapheme are correct (except in

Re: Major performance problem with std.array.front()

2014-03-09 Thread Peter Alexander
On Sunday, 9 March 2014 at 17:48:47 UTC, Andrei Alexandrescu wrote: On 3/9/14, 10:34 AM, Peter Alexander wrote: If we assume strings are normalized then substring search, equality testing, sorting all work the same with either code units or code points. But others such as edit distance or

Re: Major performance problem with std.array.front()

2014-03-09 Thread Andrei Alexandrescu
On 3/9/14, 8:18 AM, Vladimir Panteleev wrote: On Sunday, 9 March 2014 at 05:10:26 UTC, Andrei Alexandrescu wrote: On 3/8/14, 8:24 PM, Vladimir Panteleev wrote: On Sunday, 9 March 2014 at 04:18:15 UTC, Andrei Alexandrescu wrote: What exactly is the consensus? From your wiki page I see One of

Re: Major performance problem with std.array.front()

2014-03-09 Thread Andrei Alexandrescu
On 3/9/14, 11:14 AM, Dmitry Olshansky wrote: 09-Mar-2014 21:45, Andrei Alexandrescu wrote: On 3/9/14, 10:21 AM, Vladimir Panteleev wrote: On Sunday, 9 March 2014 at 17:18:47 UTC, Andrei Alexandrescu wrote: On 3/9/14, 5:28 AM, Joseph Rushton Wakeling wrote: So IIUC iterating over s.byChar

Re: Major performance problem with std.array.front()

2014-03-09 Thread Andrei Alexandrescu
On 3/9/14, 11:19 AM, Peter Alexander wrote: On Sunday, 9 March 2014 at 17:48:47 UTC, Andrei Alexandrescu wrote: On 3/9/14, 10:34 AM, Peter Alexander wrote: If we assume strings are normalized then substring search, equality testing, sorting all work the same with either code units or code

Re: Major performance problem with std.array.front()

2014-03-09 Thread Dmitry Olshansky
09-Mar-2014 07:53, Vladimir Panteleev wrote: On Sunday, 9 March 2014 at 03:26:40 UTC, Andrei Alexandrescu wrote: I don't understand this argument. Iterating by code unit is not meaningless if you don't want to extract meaning from each unit iteration. For example, if you're parsing JSON or XML,

Re: Major performance problem with std.array.front()

2014-03-09 Thread Dmitry Olshansky
09-Mar-2014 21:54, Vladimir Panteleev wrote: On Sunday, 9 March 2014 at 17:48:47 UTC, Andrei Alexandrescu wrote: wc What should wc produce on a Sanskrit text? The problem is that such questions quickly become philosophical. Technically it could use a word-breaking algorithm for words. Or

Re: Major performance problem with std.array.front()

2014-03-09 Thread Andrei Alexandrescu
On 3/9/14, 11:34 AM, Dmitry Olshansky wrote: 09-Mar-2014 07:53, Vladimir Panteleev wrote: On Sunday, 9 March 2014 at 03:26:40 UTC, Andrei Alexandrescu wrote: I don't understand this argument. Iterating by code unit is not meaningless if you don't want to extract meaning from each unit

Re: Major performance problem with std.array.front()

2014-03-09 Thread monarch_dodra
On Sunday, 9 March 2014 at 14:57:32 UTC, Peter Alexander wrote: You have mentioned case-insensitive searching, but I think I've adequately demonstrated that this doesn't work in general by code point: you need to normalize and take locales into account. I don't understand your argument.

Re: Major performance problem with std.array.front()

2014-03-09 Thread Dmitry Olshansky
09-Mar-2014 22:41, Andrei Alexandrescu wrote: On 3/9/14, 11:34 AM, Dmitry Olshansky wrote: This. Anyhow searching dchar makes sense for _some_ languages, the problem is that it shouldn't decode the whole string but rather encode the needle properly and search that. That's just an

Re: Major performance problem with std.array.front()

2014-03-09 Thread Andrei Alexandrescu
On 3/9/14, 12:25 PM, Dmitry Olshansky wrote: Okay putting potential breakage aside. Let me sketch up an additive way of improving current situation. Now you're talking. 1. Say we recognize any indexable entity of char/wchar/dchar, that however has .front returning a dchar as a narrow string.

Re: Major performance problem with std.array.front()

2014-03-09 Thread w0rp
On Sunday, 9 March 2014 at 19:40:32 UTC, Andrei Alexandrescu wrote: 6. Take into account ASCII and maybe other alphabets? Should be as trivial as .assumeASCII and then on you march with all of std.algo/etc. Walter is against that. His main argument is that UTF already covers ASCII with only

Re: Major performance problem with std.array.front()

2014-03-09 Thread Dmitry Olshansky
09-Mar-2014 23:40, Andrei Alexandrescu wrote: On 3/9/14, 12:25 PM, Dmitry Olshansky wrote: Okay putting potential breakage aside. Let me sketch up an additive way of improving current situation. Now you're talking. 1. Say we recognize any indexable entity of char/wchar/dchar, that however

Re: Major performance problem with std.array.front()

2014-03-09 Thread Nick Sabalausky
On 3/9/2014 1:26 PM, Andrei Alexandrescu wrote: On 3/9/14, 6:34 AM, Jakob Ovrum wrote: `byCodeUnit` is essentially std.string.representation. Actually not because for reasons that are unclear to me people really want the individual type to be char, not ubyte. Probably because char *is*

Re: Major performance problem with std.array.front()

2014-03-09 Thread Nick Sabalausky
On 3/9/2014 11:21 AM, Vladimir Panteleev wrote: On Sunday, 9 March 2014 at 12:24:11 UTC, ponce wrote: - In lots of places, I've discovered that Phobos did UTF decoding (thus murdering performance) when it didn't need to. Such cases included format (now fixed), appender (now fixed), startsWith

Re: Major performance problem with std.array.front()

2014-03-09 Thread Nick Sabalausky
On 3/8/2014 9:15 PM, Michel Fortin wrote: Text is an interesting topic for never-ending discussions. It's also a good example for when non-programmers are surprised to hear that I *don't* see the world as binary black and white *because* of my programming experience ;) Problems like

Re: Major performance problem with std.array.front()

2014-03-09 Thread Nick Sabalausky
On 3/9/2014 7:47 AM, w0rp wrote: My knowledge of Unicode pretty much just comes from having to deal with foreign language customers and discovering the problems with the code unit abstraction most languages seem to use. (Java and Python suffer from similar issues, but they don't really have

Re: Major performance problem with std.array.front()

2014-03-09 Thread Walter Bright
On 3/9/2014 6:08 AM, Marc Schütz schue...@gmx.net wrote: Also, `byCodeUnit` and `byCodePoint` would probably be better names than `raw` and `decode`, to match the already existing `byGrapheme` in std.uni. I'd vastly prefer 'byChar', 'byWchar', 'byDchar' for each of string, wstring, dstring,

Re: Major performance problem with std.array.front()

2014-03-09 Thread Walter Bright
On 3/9/2014 6:34 AM, Jakob Ovrum wrote: `byCodeUnit` is essentially std.string.representation. Not at all. std.string.representation takes a string and casts it to the corresponding ubyte, ushort, uint string. It doesn't work at all with InputRange!char

Re: Major performance problem with std.array.front()

2014-03-09 Thread Nick Sabalausky
On 3/9/2014 6:31 PM, Walter Bright wrote: On 3/9/2014 6:08 AM, Marc Schütz schue...@gmx.net wrote: Also, `byCodeUnit` and `byCodePoint` would probably be better names than `raw` and `decode`, to match the already existing `byGrapheme` in std.uni. I'd vastly prefer 'byChar', 'byWchar',

Re: Major performance problem with std.array.front()

2014-03-09 Thread Nick Sabalausky
On 3/10/2014 12:19 AM, Nick Sabalausky wrote: (str|wchar|dchar).byChar // Always range of char (str|wchar|dchar).byWchar // Always range of wchar (str|wchar|dchar).byDchar // Always range of dchar Erm, naturally I meant (str|wstr|dstr)
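For readers trying to picture the proposed adaptors: functions named `byChar`, `byWchar` and `byDchar` are available in `std.utf` in present-day Phobos, and the sketch below assumes those. It only shows that each adaptor yields a fixed element type regardless of the source string's encoding; whether this matches exactly what was being proposed in the thread is an assumption.

```d
import std.range : walkLength;
import std.utf : byChar, byWchar, byDchar;

void main()
{
    string s = "cass\u00E9";   // 'é' (U+00E9) takes two UTF-8 code units

    assert(s.byChar.walkLength == 6);    // UTF-8 code units
    assert(s.byWchar.walkLength == 5);   // UTF-16 code units
    assert(s.byDchar.walkLength == 5);   // code points

    // The same adaptors accept wstring and dstring sources.
    wstring w = "cass\u00E9"w;
    assert(w.byChar.walkLength == 6);
}
```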

Re: Major performance problem with std.array.front()

2014-03-09 Thread Walter Bright
On 3/9/2014 9:19 PM, Nick Sabalausky wrote: On 3/9/2014 6:31 PM, Walter Bright wrote: On 3/9/2014 6:08 AM, Marc Schütz schue...@gmx.net wrote: Also, `byCodeUnit` and `byCodePoint` would probably be better names than `raw` and `decode`, to match the already existing `byGrapheme` in std.uni.

Re: Major performance problem with std.array.front()

2014-03-08 Thread Dmitry Olshansky
08-Mar-2014 05:23, Andrei Alexandrescu wrote: On 3/7/14, 1:58 PM, Vladimir Panteleev wrote: On Friday, 7 March 2014 at 21:56:45 UTC, Eyrk wrote: On Friday, 7 March 2014 at 20:43:45 UTC, Vladimir Panteleev wrote: No, it doesn't. import std.algorithm; void main() { auto s = "cassé";

Re: Major performance problem with std.array.front()

2014-03-08 Thread Dmitry Olshansky
08-Mar-2014 12:09, Dmitry Olshansky wrote: 08-Mar-2014 05:23, Andrei Alexandrescu wrote: On 3/7/14, 1:58 PM, Vladimir Panteleev wrote: On Friday, 7 March 2014 at 21:56:45 UTC, Eyrk wrote: On Friday, 7 March 2014 at 20:43:45 UTC, Vladimir Panteleev wrote: No, it doesn't. import

Re: Major performance problem with std.array.front()

2014-03-08 Thread Dmitry Olshansky
08-Mar-2014 05:18, Andrei Alexandrescu wrote: On 3/7/14, 12:48 PM, Dmitry Olshansky wrote: 07-Mar-2014 23:57, Andrei Alexandrescu wrote: On 3/6/14, 6:37 PM, Walter Bright wrote: In "Lots of low hanging fruit in Phobos" the issue came up about the automatic encoding and decoding of char ranges.

Re: Major performance problem with std.array.front()

2014-03-08 Thread Eyrk
On Saturday, 8 March 2014 at 02:04:12 UTC, bearophile wrote: Vladimir Panteleev: It's not about types, it's about algorithms. Given sufficiently refined types, it can be about types :-) Bye, bearophile I think Bear is onto something, we already solved an analogous problem in an elegant

Re: Major performance problem with std.array.front()

2014-03-08 Thread Andrei Alexandrescu
On 3/8/14, 12:14 AM, Dmitry Olshansky wrote: 08-Mar-2014 12:09, Dmitry Olshansky wrote: 08-Mar-2014 05:23, Andrei Alexandrescu wrote: On 3/7/14, 1:58 PM, Vladimir Panteleev wrote: On Friday, 7 March 2014 at 21:56:45 UTC, Eyrk wrote: On Friday, 7 March 2014 at 20:43:45 UTC, Vladimir Panteleev

Re: Major performance problem with std.array.front()

2014-03-08 Thread Andrei Alexandrescu
On 3/8/14, 12:09 AM, Dmitry Olshansky wrote: 08-Mar-2014 05:23, Andrei Alexandrescu wrote: On 3/7/14, 1:58 PM, Vladimir Panteleev wrote: On Friday, 7 March 2014 at 21:56:45 UTC, Eyrk wrote: On Friday, 7 March 2014 at 20:43:45 UTC, Vladimir Panteleev wrote: No, it doesn't. import
