The Case Against Autodecode

2016-05-12 Thread Walter Bright via Digitalmars-d
On 5/12/2016 9:29 AM, Andrei Alexandrescu wrote: > I am as unclear about the problems of autodecoding as I am about the necessity > to remove curl. Whenever I ask I hear some arguments that work well emotionally > but are scant on reason and engineering. Maybe it's time to rehash them? I just >

Re: The Case Against Autodecode

2016-05-12 Thread Vladimir Panteleev via Digitalmars-d
On Thursday, 12 May 2016 at 20:15:45 UTC, Walter Bright wrote: On 5/12/2016 9:29 AM, Andrei Alexandrescu wrote: > I am as unclear about the problems of autodecoding as I am about the necessity > to remove curl. Whenever I ask I hear some arguments that work well emotionally > but are scant on rea

Re: The Case Against Autodecode

2016-05-12 Thread H. S. Teoh via Digitalmars-d
On Thu, May 12, 2016 at 08:24:23PM +0000, Vladimir Panteleev via Digitalmars-d wrote: > On Thursday, 12 May 2016 at 20:15:45 UTC, Walter Bright wrote: [...] > >1. Ranges of characters do not autodecode, but arrays of characters > >do. This is a glaring inconsistency. > > > >2. Every time one want

Re: The Case Against Autodecode

2016-05-12 Thread H. S. Teoh via Digitalmars-d
On Thu, May 12, 2016 at 08:24:23PM +0000, Vladimir Panteleev via Digitalmars-d wrote: [...] > 12. The result of autodecoding, a range of Unicode code points, is > rarely actually useful, and code that relies on autodecoding is rarely > actually, universally correct. Graphemes are occasionally usef

Re: The Case Against Autodecode

2016-05-12 Thread Daniel Kozak via Digitalmars-d
On Thursday, 12 May 2016 at 20:15:45 UTC, Walter Bright wrote: On 5/12/2016 9:29 AM, Andrei Alexandrescu wrote: > I am as unclear about the problems of autodecoding as I am about the necessity > to remove curl. Whenever I ask I hear some arguments that work well emotionally > but are scant on rea

Re: The Case Against Autodecode

2016-05-12 Thread Walter Bright via Digitalmars-d
On 5/12/2016 4:23 PM, Daniel Kozak wrote: But what I am really pissed off about is that the current string type is an alias to immutable(char)[] (so it is not usable at all). This is a real problem for me, because it makes working on arrays of chars almost impossible. Even char[] is unusable. So I am forced to u

Re: The Case Against Autodecode

2016-05-12 Thread Marco Leise via Digitalmars-d
On Thu, 12 May 2016 13:15:45 -0700, Walter Bright wrote: > 7. Autodecode cannot be used with unicode path/filenames, because it is legal > (at least on Linux) to have invalid UTF-8 as filenames. More precisely they are byte strings with '/' reserved to separate path elements. While on an out-o

Re: The Case Against Autodecode

2016-05-12 Thread Jack Stouffer via Digitalmars-d
On Thursday, 12 May 2016 at 20:15:45 UTC, Walter Bright wrote: Here are some that are not matters of opinion. If you're serious about removing auto-decoding, which I think you and others have shown has merits, you have to the THE SIMPLEST migration path ever, or you will kill D. I'm talking a

Re: The Case Against Autodecode

2016-05-12 Thread Walter Bright via Digitalmars-d
On 5/12/2016 4:52 PM, Marco Leise wrote: I'd like 'string' to mean valid UTF-8 in D as far as the encoding goes. A filename should not be a 'string'. I would have agreed with you in the past, but more and more it just doesn't seem practical. UTF-8 is dirty in the real world, and D code will ha

Re: The Case Against Autodecode

2016-05-12 Thread Jack Stouffer via Digitalmars-d
On Friday, 13 May 2016 at 00:47:04 UTC, Jack Stouffer wrote: I'm not exaggerating here. Python, a language which was much more popular than D at the time, came out with two versions in 2008: Python 2.7 which had numerous unicode problems, and Python 3.0 which fixed those problems. Almost eight

Re: The Case Against Autodecode

2016-05-12 Thread Walter Bright via Digitalmars-d
On 5/12/2016 5:47 PM, Jack Stouffer wrote: D is much less popular now than was Python at the time, and Python 2 problems were more straight forward than the auto-decoding problem. You'll need a very clear migration path, years long deprecations, and automatic tools in order to make the transitio

Re: The Case Against Autodecode

2016-05-12 Thread Jack Stouffer via Digitalmars-d
On Thursday, 12 May 2016 at 20:15:45 UTC, Walter Bright wrote: 2. Every time one wants an algorithm to work with both strings and ranges, you wind up special casing the strings to defeat the autodecoding, or to decode the ranges. Having to constantly special case it makes for more special cases

Re: The Case Against Autodecode

2016-05-12 Thread Bill Hicks via Digitalmars-d
On Thursday, 12 May 2016 at 20:15:45 UTC, Walter Bright wrote: Here are some that are not matters of opinion. 1. Ranges of characters do not autodecode, but arrays of characters do. This is a glaring inconsistency. 2. Every time one wants an algorithm to work with both strings and ranges, y

Re: The Case Against Autodecode

2016-05-13 Thread Ethan Watson via Digitalmars-d
On Friday, 13 May 2016 at 06:50:49 UTC, Bill Hicks wrote: *rant* Actually, chap, it's the attitude that's the turn-off in your post there. Listing problems in order to improve them, and listing problems to convince people something is a waste of time are incompatible mindsets around here.

Re: The Case Against Autodecode

2016-05-13 Thread poliklosio via Digitalmars-d
On Friday, 13 May 2016 at 06:50:49 UTC, Bill Hicks wrote: On Thursday, 12 May 2016 at 20:15:45 UTC, Walter Bright wrote: (...) Wow, that's eleven things wrong with just one tiny element of D, with the potential to cause problems, whether fixed or not. And I get called a troll and other names

Re: The Case Against Autodecode

2016-05-13 Thread Ola Fosheim Grøstad via Digitalmars-d
On Friday, 13 May 2016 at 00:47:04 UTC, Jack Stouffer wrote: D is much less popular now than was Python at the time, and Python 2 problems were more straight forward than the auto-decoding problem. You'll need a very clear migration path, years long deprecations, and automatic tools in order t

Re: The Case Against Autodecode

2016-05-13 Thread Chris via Digitalmars-d
On Friday, 13 May 2016 at 01:00:54 UTC, Walter Bright wrote: On 5/12/2016 5:47 PM, Jack Stouffer wrote: D is much less popular now than was Python at the time, and Python 2 problems were more straight forward than the auto-decoding problem. You'll need a very clear migration path, years long d

Re: The Case Against Autodecode

2016-05-13 Thread Chris via Digitalmars-d
On Friday, 13 May 2016 at 06:50:49 UTC, Bill Hicks wrote: Wow, that's eleven things wrong with just one tiny element of D, with the potential to cause problems, whether fixed or not. And I get called a troll and other names when I list half a dozen things wrong with D, my posts get removed/c

Re: The Case Against Autodecode

2016-05-13 Thread Kagamin via Digitalmars-d
On Friday, 13 May 2016 at 06:50:49 UTC, Bill Hicks wrote: not to waste time with D because it's a broken and failed language. D is a better broken thing among all the broken things in this broken world, so it's to be expected to be preferred to spend time on.

Re: The Case Against Autodecode

2016-05-13 Thread Jonathan M Davis via Digitalmars-d
On Thursday, May 12, 2016 13:15:45 Walter Bright via Digitalmars-d wrote: > On 5/12/2016 9:29 AM, Andrei Alexandrescu wrote: > > I am as unclear about the problems of autodecoding as I am about the > > necessity to remove curl. Whenever I ask I hear some arguments that work > > well emotionally

Re: The Case Against Autodecode

2016-05-13 Thread Marc Schütz via Digitalmars-d
On Thursday, 12 May 2016 at 20:15:45 UTC, Walter Bright wrote: 7. Autodecode cannot be used with unicode path/filenames, because it is legal (at least on Linux) to have invalid UTF-8 as filenames. It turns out in the wild that pure Unicode is not universal - there's lots of dirty Unicode that s

Re: The Case Against Autodecode

2016-05-13 Thread Chris via Digitalmars-d
On Friday, 13 May 2016 at 10:38:09 UTC, Jonathan M Davis wrote: Based on what I've seen in previous conversations on auto-decoding over the past few years (be it in the newsgroup, on github, or at dconf), most of the core devs think that auto-decoding was a major blunder that we continue to p

Re: The Case Against Autodecode

2016-05-13 Thread Marc Schütz via Digitalmars-d
On Thursday, 12 May 2016 at 23:16:23 UTC, H. S. Teoh wrote: Therefore, autodecoding actually only produces intuitively correct results when your string has a 1-to-1 correspondence between grapheme and code point. In general, this is only true for a small subset of languages, mainly a few common
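A minimal D sketch of the code point vs. grapheme mismatch described above (illustrative, not code from the thread), assuming a Phobos release in which narrow strings still autodecode. It counts one decomposed spelling of "noël" three ways: as UTF-8 code units, as code points (what autodecoding iterates over), and as graphemes via std.uni.byGrapheme.

```d
import std.range.primitives : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

void main()
{
    // "noël" spelled with a decomposed ë: 'e' followed by U+0308 COMBINING DIAERESIS.
    string s = "noe\u0308l";

    // UTF-8 code units: n, o, e, two bytes for U+0308, l.
    assert(s.length == 6);
    assert(s.byCodeUnit.walkLength == 6);

    // Code points, which is what autodecoding yields.
    assert(s.walkLength == 5);

    // Graphemes, i.e. user-perceived characters.
    assert(s.byGrapheme.walkLength == 4);
}
```

Only the grapheme count matches what a reader would call the number of characters; the code-point count depends on how the ë happens to be encoded.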

Re: The Case Against Autodecode

2016-05-13 Thread Marc Schütz via Digitalmars-d
On Friday, 13 May 2016 at 10:38:09 UTC, Jonathan M Davis wrote: Ideally, algorithms would be Unicode aware as appropriate, but the default would be to operate on code units with wrappers to handle decoding by code point or grapheme. Then it's easy to write fast code while still allowing for ful

Re: The Case Against Autodecode

2016-05-13 Thread Nick Treleaven via Digitalmars-d
On Friday, 13 May 2016 at 00:47:04 UTC, Jack Stouffer wrote: If you're serious about removing auto-decoding, which I think you and others have shown has merits, you have to the THE SIMPLEST migration path ever, or you will kill D. I'm talking a simple press of a button. char[] is always going

Re: The Case Against Autodecode

2016-05-13 Thread Kagamin via Digitalmars-d
On Friday, 13 May 2016 at 10:38:09 UTC, Jonathan M Davis wrote: IIRC, Andrei talked in TDPL about how Java's choice to go with UTF-16 was worse than the choice to go with UTF-8, because it was correct in many more cases UTF-16 was a migration from UCS-2, and UCS-2 was superior at the time.

Re: The Case Against Autodecode

2016-05-13 Thread Walter Bright via Digitalmars-d
On 5/13/2016 2:12 AM, Chris wrote: If autodecode is killed, could we have a test version asap? I'd be willing to test my programs with autodecode turned off and see what happens. Others should do likewise and we could come up with a transition strategy based on what happened. You can avoid aut

Re: The Case Against Autodecode

2016-05-13 Thread Walter Bright via Digitalmars-d
On 5/12/2016 11:50 PM, Bill Hicks wrote: And I get called a troll and other names when I list half a dozen things wrong with D, my posts get removed/censored, etc, all because I try to inform people not to waste time with D because it's a broken and failed language. Posts that engage in persona

Re: The Case Against Autodecode

2016-05-13 Thread Walter Bright via Digitalmars-d
On 5/13/2016 3:43 AM, Marc Schütz wrote: On Thursday, 12 May 2016 at 20:15:45 UTC, Walter Bright wrote: 7. Autodecode cannot be used with unicode path/filenames, because it is legal (at least on Linux) to have invalid UTF-8 as filenames. It turns out in the wild that pure Unicode is not universa

Re: The Case Against Autodecode

2016-05-13 Thread Chris via Digitalmars-d
On Friday, 13 May 2016 at 13:17:44 UTC, Walter Bright wrote: On 5/13/2016 2:12 AM, Chris wrote: If autodecode is killed, could we have a test version asap? I'd be willing to test my programs with autodecode turned off and see what happens. Others should do likewise and we could come up with a t

Re: The Case Against Autodecode

2016-05-13 Thread Vladimir Panteleev via Digitalmars-d
On Friday, 13 May 2016 at 13:41:30 UTC, Chris wrote: PS Why do I get a "StopForumSpam error" every time I post today? Has anyone else experienced the same problem: "StopForumSpam error: Socket error: Lookup error: getaddrinfo error: Name or service not known. Please solve a CAPTCHA to co

Re: The Case Against Autodecode

2016-05-13 Thread Chris via Digitalmars-d
On Friday, 13 May 2016 at 14:06:28 UTC, Vladimir Panteleev wrote: On Friday, 13 May 2016 at 13:41:30 UTC, Chris wrote: PS Why do I get a "StopForumSpam error" every time I post today? Has anyone else experienced the same problem: "StopForumSpam error: Socket error: Lookup error: getaddrin

Re: The Case Against Autodecode

2016-05-13 Thread Steven Schveighoffer via Digitalmars-d
On 5/12/16 4:15 PM, Walter Bright wrote: 10. Autodecoded arrays cannot be RandomAccessRanges, losing a key benefit of being arrays in the first place. I'll repeat what I said in the other thread. The problem isn't auto-decoding. The problem is hijacking the char[] and wchar[] (and variants)
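Items 1 and 10 can be checked at compile time. The following D sketch (illustrative, not from the thread; it assumes a Phobos release that still autodecodes) shows that indexing a string yields a char while range iteration yields a dchar, and that std.utf.byCodeUnit restores the random-access and length properties that autodecoding takes away from arrays.

```d
import std.range.primitives : ElementType, hasLength, isRandomAccessRange;
import std.traits : Unqual;
import std.utf : byCodeUnit;

// Indexing a string yields a UTF-8 code unit...
static assert(is(typeof("abc"[0]) == immutable(char)));
// ...while iterating it as a range yields decoded code points.
static assert(is(ElementType!string == dchar));

// Because of autodecoding, Phobos treats narrow strings as neither
// random-access nor having a length, even though ordinary arrays are both.
static assert(!isRandomAccessRange!string);
static assert(!hasLength!string);
static assert(isRandomAccessRange!(int[]));

// byCodeUnit opts out of autodecoding and restores array-like range behavior.
alias CodeUnits = typeof("abc".byCodeUnit);
static assert(is(Unqual!(ElementType!CodeUnits) == char));
static assert(isRandomAccessRange!CodeUnits);
static assert(hasLength!CodeUnits);

void main()
{
    string s = "h\u00E9llo";
    assert(s.length == 6);              // code units, via the array property
    assert(s.byCodeUnit[1] == 0xC3);    // first byte of 'é' (U+00E9 encodes as C3 A9)
}
```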

Re: The Case Against Autodecode

2016-05-13 Thread H. S. Teoh via Digitalmars-d
On Fri, May 13, 2016 at 12:16:30PM +0000, Nick Treleaven via Digitalmars-d wrote: > On Friday, 13 May 2016 at 00:47:04 UTC, Jack Stouffer wrote: > >If you're serious about removing auto-decoding, which I think you and > >others have shown has merits, you have to the THE SIMPLEST migration > >path

Re: The Case Against Autodecode

2016-05-13 Thread Marco Leise via Digitalmars-d
On Fri, 13 May 2016 10:49:24 +0000, Marc Schütz wrote: > In fact, even most European languages are affected if NFD > normalization is used, which is the default on MacOS X. > > And this is actually the main problem with it: It was introduced > to make unicode string handling correct. Well, it

Re: The Case Against Autodecode

2016-05-13 Thread Iakh via Digitalmars-d
On Friday, 13 May 2016 at 01:00:54 UTC, Walter Bright wrote: On 5/12/2016 5:47 PM, Jack Stouffer wrote: D is much less popular now than was Python at the time, and Python 2 problems were more straight forward than the auto-decoding problem. You'll need a very clear migration path, years long d

Re: The Case Against Autodecode

2016-05-13 Thread Jonathan M Davis via Digitalmars-d
On Friday, May 13, 2016 11:00:19 Marc Schütz via Digitalmars-d wrote: > On Friday, 13 May 2016 at 10:38:09 UTC, Jonathan M Davis wrote: > > Ideally, algorithms would be Unicode aware as appropriate, but > > the default would be to operate on code units with wrappers to > > handle decoding by code p

Re: The Case Against Autodecode

2016-05-13 Thread Alex Parrill via Digitalmars-d
On Friday, 13 May 2016 at 16:05:21 UTC, Steven Schveighoffer wrote: On 5/12/16 4:15 PM, Walter Bright wrote: 10. Autodecoded arrays cannot be RandomAccessRanges, losing a key benefit of being arrays in the first place. I'll repeat what I said in the other thread. The problem isn't auto-deco

Re: The Case Against Autodecode

2016-05-13 Thread Steven Schveighoffer via Digitalmars-d
On 5/13/16 5:25 PM, Alex Parrill wrote: On Friday, 13 May 2016 at 16:05:21 UTC, Steven Schveighoffer wrote: On 5/12/16 4:15 PM, Walter Bright wrote: 10. Autodecoded arrays cannot be RandomAccessRanges, losing a key benefit of being arrays in the first place. I'll repeat what I said in the ot

Re: The Case Against Autodecode

2016-05-13 Thread Jonathan M Davis via Digitalmars-d
On Friday, May 13, 2016 12:52:13 Kagamin via Digitalmars-d wrote: > On Friday, 13 May 2016 at 10:38:09 UTC, Jonathan M Davis wrote: > > IIRC, Andrei talked in TDPL about how Java's choice to go with > > UTF-16 was worse than the choice to go with UTF-8, because it > > was correct in many more cases

Re: The Case Against Autodecode

2016-05-13 Thread H. S. Teoh via Digitalmars-d
On Fri, May 13, 2016 at 09:26:40PM +0200, Marco Leise via Digitalmars-d wrote: > On Fri, 13 May 2016 10:49:24 +0000, > Marc Schütz wrote: > > > In fact, even most European languages are affected if NFD > > normalization is used, which is the default on MacOS X. > > > > And this is actually the

Re: The Case Against Autodecode

2016-05-14 Thread Bill Hicks via Digitalmars-d
On Friday, 13 May 2016 at 07:26:53 UTC, poliklosio wrote: Also, you are missing the point by claiming that a technical problem is sure to kill D. Note that very successful languages like C++, python and so on also have undergone heated discussions about various features, and often live design

Re: The Case Against Autodecode

2016-05-14 Thread Bill Hicks via Digitalmars-d
On Friday, 13 May 2016 at 09:28:45 UTC, Chris wrote: PS I wonder does Bill Hicks know you're using his name? But I guess he's lost interest in this planet and happily lives on Mars now. Maybe I'm using the name to avoid being harassed. Or maybe, there are thousands of people in the world n

Re: The Case Against Autodecode

2016-05-15 Thread Ola Fosheim Grøstad via Digitalmars-d
On Sunday, 15 May 2016 at 01:45:25 UTC, Bill Hicks wrote: From a technical point, D is not successful, for the most part. C/C++ at least can use the excuse that they were created during a time when we didn't have the experience and the knowledge that we do now. Not really. The dominating pre

Re: The Case Against Autodecode

2016-05-15 Thread Jon D via Digitalmars-d
On Thursday, 12 May 2016 at 20:15:45 UTC, Walter Bright wrote: On 5/12/2016 9:29 AM, Andrei Alexandrescu wrote: > I am as unclear about the problems of autodecoding as I am about the necessity > to remove curl. Whenever I ask I hear some arguments that work well emotionally > but are scant on rea

Re: The Case Against Autodecode

2016-05-15 Thread Jack Stouffer via Digitalmars-d
On Sunday, 15 May 2016 at 23:10:38 UTC, Jon D wrote: Given the importance of performance in the auto-decoding topic, it seems reasonable to quantify it. I took a stab at this. It would of course be prudent to have others conduct similar analysis rather than rely on my numbers alone. Here is a

Re: The Case Against Autodecode

2016-05-15 Thread H. S. Teoh via Digitalmars-d
On Mon, May 16, 2016 at 12:31:04AM +0000, Jack Stouffer via Digitalmars-d wrote: > On Sunday, 15 May 2016 at 23:10:38 UTC, Jon D wrote: > >Given the importance of performance in the auto-decoding topic, it > >seems reasonable to quantify it. I took a stab at this. It would of > >course be prudent t

Re: The Case Against Autodecode

2016-05-16 Thread jmh530 via Digitalmars-d
On Sunday, 15 May 2016 at 23:10:38 UTC, Jon D wrote: Runs for each combination were done five times and the median times used. The median times and the char[] to ubyte[] ratio are below: | Compiler | Text type | char[] time (ms) | ubyte[] time (ms) | ratio |

Re: The Case Against Autodecode

2016-05-17 Thread Kagamin via Digitalmars-d
On Friday, 13 May 2016 at 21:46:28 UTC, Jonathan M Davis wrote: The history of why UTF-16 was chosen isn't really relevant to my point (Win32 has the same problem as Java and for similar reasons). My point was that if you use UTF-8, then it's obvious _really_ fast when you screwed up Unicode-

Re: The Case Against Autodecode

2016-05-17 Thread sarn via Digitalmars-d
On Tuesday, 17 May 2016 at 09:53:17 UTC, Kagamin wrote: With UTF-8 problems happened on a massive scale in LAMP setups: mysql used latin1 as a default encoding and almost everything worked fine. ^ latin-1 with Swedish collation rules. And even if you set the encoding to "utf8", almost everythi

Re: The Case Against Autodecode

2016-05-26 Thread Andrei Alexandrescu via Digitalmars-d
This might be a good time to discuss this a tad further. I'd appreciate if the debate stayed on point going forward. Thanks! My thesis: the D1 design decision to represent strings as char[] was disastrous and probably one of the largest weaknesses of D1. The decision in D2 to use immutable(cha

Re: The Case Against Autodecode

2016-05-26 Thread Jack Stouffer via Digitalmars-d
On Thursday, 26 May 2016 at 16:00:54 UTC, Andrei Alexandrescu wrote: instead, it should use standard library algorithms for searching, matching etc. When needed, iterating every code unit is trivially done through indexing. For an example where the std.algorithm/range functions don't cut it,

Re: The Case Against Autodecode

2016-05-26 Thread H. S. Teoh via Digitalmars-d
On Thu, May 26, 2016 at 12:00:54PM -0400, Andrei Alexandrescu via Digitalmars-d wrote: [...] > On 05/12/2016 04:15 PM, Walter Bright wrote: [...] > > 4. Autodecoding is slow and has no place in high speed string processing. > > I would agree only with the amendment "...if used naively", which is

Re: The Case Against Autodecode

2016-05-26 Thread Andrei Alexandrescu via Digitalmars-d
On 05/26/2016 07:23 PM, H. S. Teoh via Digitalmars-d wrote: Therefore, instead of: myString.splitter!"abc".joiner!"def".count; we have to write: myString.representation .splitter!("abc".representation) .joiner!("def".representation)
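A hedged D sketch of the two spellings under discussion, assuming a Phobos release that still autodecodes. The helper names viaAutodecode and viaRepresentation and the sample inputs are hypothetical. Both compute the length after replacing "abc" with "def": once over decoded code points, and once over std.string.representation, which exposes the UTF-8 code units as immutable(ubyte)[].

```d
import std.algorithm.iteration : joiner, splitter;
import std.range.primitives : walkLength;
import std.string : representation;

/// Hypothetical helper: length after replacing "abc" with "def", counted in code points.
size_t viaAutodecode(string s)
{
    return s.splitter("abc").joiner("def").walkLength;
}

/// Hypothetical helper: the same pipeline over raw UTF-8 code units, with no decoding.
size_t viaRepresentation(string s)
{
    return s.representation
        .splitter("abc".representation)
        .joiner("def".representation)
        .walkLength;
}

void main()
{
    // On ASCII input both pipelines agree: "xdefydefz".
    assert(viaAutodecode("xabcyabcz") == 9);
    assert(viaRepresentation("xabcyabcz") == 9);

    // On "éabcz" they diverge: 'é' is one code point but two UTF-8 code units.
    assert(viaAutodecode("\u00E9abcz") == 5);
    assert(viaRepresentation("\u00E9abcz") == 6);
}
```

The two counts differ only because they measure different units; neither is a grapheme count.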

Re: The Case Against Autodecode

2016-05-26 Thread Vladimir Panteleev via Digitalmars-d
On Thursday, 26 May 2016 at 16:00:54 UTC, Andrei Alexandrescu wrote: 4. Autodecoding is slow and has no place in high speed string processing. I would agree only with the amendment "...if used naively", which is important. Knowledge of how autodecoding works is a prerequisite for writing fast

Re: The Case Against Autodecode

2016-05-27 Thread Kagamin via Digitalmars-d
On Thursday, 26 May 2016 at 16:00:54 UTC, Andrei Alexandrescu wrote: 11. Indexing an array produces different results than autodecoding, another glaring special case. This is a direct consequence of the fact that string is immutable(char)[] and not a specific type. That error predates autode

Re: The Case Against Autodecode

2016-05-27 Thread Marc Schütz via Digitalmars-d
On Thursday, 26 May 2016 at 16:00:54 UTC, Andrei Alexandrescu wrote: This might be a good time to discuss this a tad further. I'd appreciate if the debate stayed on point going forward. Thanks! My thesis: the D1 design decision to represent strings as char[] was disastrous and probably one of

Re: The Case Against Autodecode

2016-05-27 Thread Chris via Digitalmars-d
On Thursday, 26 May 2016 at 16:00:54 UTC, Andrei Alexandrescu wrote: [snip] I would agree only with the amendment "...if used naively", which is important. Knowledge of how autodecoding works is a prerequisite for writing fast string code in D. Also, little code should deal with one code uni

Re: The Case Against Autodecode

2016-05-27 Thread Andrei Alexandrescu via Digitalmars-d
On 5/27/16 7:19 AM, Chris wrote: On Thursday, 26 May 2016 at 16:00:54 UTC, Andrei Alexandrescu wrote: [snip] I would agree only with the amendment "...if used naively", which is important. Knowledge of how autodecoding works is a prerequisite for writing fast string code in D. Also, little code

Re: The Case Against Autodecode

2016-05-27 Thread Andrei Alexandrescu via Digitalmars-d
On 5/27/16 6:56 AM, Marc Schütz wrote: It is not, which has been shown by various posts in this thread. Couldn't quite find strong arguments. Could you please be more explicit on which you found most convincing? -- Andrei

Re: The Case Against Autodecode

2016-05-27 Thread Andrei Alexandrescu via Digitalmars-d
On 5/27/16 6:26 AM, Kagamin wrote: As I understand it, the design rationale behind strings being plain arrays of code units is that it's impractical for the string to be smarter than an array of code units - it just won't cut it, while a plain array provides a simple and easy to understand implementation of strin

Re: The Case Against Autodecode

2016-05-27 Thread ag0aep6g via Digitalmars-d
On 05/27/2016 03:32 PM, Andrei Alexandrescu wrote: However the following do require autodecoding: s.walkLength s.count!(c => !"!()-;:,.?".canFind(c)) // non-punctuation s.count!(c => c >= 32) // non-control characters Currently the standard library operates at code point level even though insid

Re: The Case Against Autodecode

2016-05-27 Thread Chris via Digitalmars-d
On Friday, 27 May 2016 at 13:47:32 UTC, ag0aep6g wrote: Misunderstanding. All examples work properly today because of autodecoding. -- Andrei They only work "properly" if you define "properly" as "in terms of code points". But working in terms of code points is usually wrong. If you want to c
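To make the objection concrete, here is a small D sketch (not from the thread) that applies Andrei's non-punctuation count to two canonically equivalent spellings of "café!". The helper name nonPunctCodePoints is hypothetical, and the code assumes a Phobos release in which s.count!(...) autodecodes to code points.

```d
import std.algorithm.searching : canFind, count;
import std.range.primitives : walkLength;
import std.uni : byGrapheme;

/// Hypothetical helper: Andrei's non-punctuation count, which sees code points.
size_t nonPunctCodePoints(string s)
{
    return s.count!(c => !"!()-;:,.?".canFind(c));
}

void main()
{
    string precomposed = "caf\u00E9!";   // é as the single code point U+00E9
    string decomposed  = "cafe\u0301!";  // e followed by U+0301 COMBINING ACUTE ACCENT

    // Both render as "café!", yet the code-point counts differ.
    assert(nonPunctCodePoints(precomposed) == 4);
    assert(nonPunctCodePoints(decomposed) == 5);

    // Counting graphemes gives the same, user-perceived answer for both spellings.
    assert(precomposed.byGrapheme.walkLength == 5);
    assert(decomposed.byGrapheme.walkLength == 5);
}
```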

Re: The Case Against Autodecode

2016-05-27 Thread H. S. Teoh via Digitalmars-d
On Fri, May 27, 2016 at 03:47:32PM +0200, ag0aep6g via Digitalmars-d wrote: > On 05/27/2016 03:32 PM, Andrei Alexandrescu wrote: > > > > However the following do require autodecoding: > > > > > > > > s.walkLength > > > > s.count!(c => !"!()-;:,.?".canFind(c)) // non-punctuation > > > > s.count!(c

Re: The Case Against Autodecode

2016-05-27 Thread Walter Bright via Digitalmars-d
On 5/26/2016 9:00 AM, Andrei Alexandrescu wrote: My thesis: the D1 design decision to represent strings as char[] was disastrous and probably one of the largest weaknesses of D1. The decision in D2 to use immutable(char)[] for strings is a vast improvement but still has a number of issues. The

Re: The Case Against Autodecode

2016-05-27 Thread Andrei Alexandrescu via Digitalmars-d
On 5/27/16 10:15 AM, Chris wrote: It has happened to me that characters like "é" return length == 2 Would normalization make length 1? -- Andrei

Re: The Case Against Autodecode

2016-05-27 Thread Adam D. Ruppe via Digitalmars-d
On Friday, 27 May 2016 at 18:11:22 UTC, Andrei Alexandrescu wrote: Would normalization make length 1? -- Andrei In some, but not all cases.

Re: The Case Against Autodecode

2016-05-27 Thread Andrei Alexandrescu via Digitalmars-d
On 5/27/16 1:11 PM, Walter Bright wrote: They mean code units. Always valid or potentially invalid as well? -- Andrei

Re: The Case Against Autodecode

2016-05-27 Thread Andrei Alexandrescu via Digitalmars-d
On 5/27/16 12:40 PM, H. S. Teoh via Digitalmars-d wrote: Exactly. And we just keep getting stuck on this point. It seems that the message just isn't getting through. The unfounded assumption continues to be made that iterating by code point is somehow "correct" by definition and nobody can challe

Re: The Case Against Autodecode

2016-05-27 Thread ag0aep6g via Digitalmars-d
On 05/27/2016 08:42 PM, Andrei Alexandrescu wrote: Which languages are covered by code points, and which languages require graphemes consisting of multiple code points? How does normalization play into this? -- Andrei I don't think there is value in distinguishing by language. The point of Uni

Re: The Case Against Autodecode

2016-05-27 Thread Andrei Alexandrescu via Digitalmars-d
On 5/27/16 3:10 PM, ag0aep6g wrote: I don't think there is value in distinguishing by language. The point of Unicode is that you shouldn't need to do that. It seems code points are kind of useless because they don't really mean anything, would that be accurate? -- Andrei

Re: The Case Against Autodecode

2016-05-27 Thread Andrei Alexandrescu via Digitalmars-d
On 5/27/16 1:11 PM, Walter Bright wrote: The std.string algorithms I wrote all work much better (i.e. faster) without autodecoding, while maintaining proper Unicode support. Violent agreement is occurring here. We have plenty of those and need more. -- Andrei

Re: The Case Against Autodecode

2016-05-27 Thread Dmitry Olshansky via Digitalmars-d
On 27-May-2016 21:11, Andrei Alexandrescu wrote: On 5/27/16 10:15 AM, Chris wrote: It has happened to me that characters like "é" return length == 2 Would normalization make length 1? -- Andrei No, this is not the point of normalization. -- Dmitry Olshansky

Re: The Case Against Autodecode

2016-05-27 Thread ag0aep6g via Digitalmars-d
On 05/27/2016 09:30 PM, Andrei Alexandrescu wrote: It seems code points are kind of useless because they don't really mean anything, would that be accurate? -- Andrei I think so, yeah. Due to combining characters, code points are similar to code units: a Unicode thing that you need to know ab

Re: The Case Against Autodecode

2016-05-27 Thread H. S. Teoh via Digitalmars-d
On Fri, May 27, 2016 at 02:42:27PM -0400, Andrei Alexandrescu via Digitalmars-d wrote: > On 5/27/16 12:40 PM, H. S. Teoh via Digitalmars-d wrote: > > Exactly. And we just keep getting stuck on this point. It seems that > > the message just isn't getting through. The unfounded assumption > > contin

Re: The Case Against Autodecode

2016-05-27 Thread H. S. Teoh via Digitalmars-d
On Fri, May 27, 2016 at 03:30:53PM -0400, Andrei Alexandrescu via Digitalmars-d wrote: > On 5/27/16 3:10 PM, ag0aep6g wrote: > > I don't think there is value in distinguishing by language. The > > point of Unicode is that you shouldn't need to do that. > > It seems code points are kind of useless

Re: The Case Against Autodecode

2016-05-27 Thread Adam D. Ruppe via Digitalmars-d
On Friday, 27 May 2016 at 19:30:53 UTC, Andrei Alexandrescu wrote: It seems code points are kind of useless because they don't really mean anything, would that be accurate? -- Andrei It might help to think of code points as being a kind of byte code for a text-representing VM. It's not meani

Re: The Case Against Autodecode

2016-05-27 Thread Steven Schveighoffer via Digitalmars-d
On 5/27/16 3:30 PM, Andrei Alexandrescu wrote: On 5/27/16 3:10 PM, ag0aep6g wrote: I don't think there is value in distinguishing by language. The point of Unicode is that you shouldn't need to do that. It seems code points are kind of useless because they don't really mean anything, would tha

Re: The Case Against Autodecode

2016-05-27 Thread H. S. Teoh via Digitalmars-d
On Fri, May 27, 2016 at 07:53:30PM +0000, Adam D. Ruppe via Digitalmars-d wrote: > On Friday, 27 May 2016 at 19:30:53 UTC, Andrei Alexandrescu wrote: > > It seems code points are kind of useless because they don't really > > mean anything, would that be accurate? -- Andrei > > It might help to thi

Re: The Case Against Autodecode

2016-05-27 Thread Andrei Alexandrescu via Digitalmars-d
On 05/27/2016 03:43 PM, H. S. Teoh via Digitalmars-d wrote: That's what we've been trying to say all along! If that's the case things are pretty dire, autodecoding or not. -- Andrei

Re: The Case Against Autodecode

2016-05-27 Thread Andrei Alexandrescu via Digitalmars-d
On 05/27/2016 03:39 PM, Dmitry Olshansky wrote: On 27-May-2016 21:11, Andrei Alexandrescu wrote: On 5/27/16 10:15 AM, Chris wrote: It has happened to me that characters like "é" return length == 2 Would normalization make length 1? -- Andrei No, this is not the point of normalization. Wha

Re: The Case Against Autodecode

2016-05-27 Thread Minas Mina via Digitalmars-d
On Friday, 27 May 2016 at 20:42:13 UTC, Andrei Alexandrescu wrote: On 05/27/2016 03:39 PM, Dmitry Olshansky wrote: On 27-May-2016 21:11, Andrei Alexandrescu wrote: On 5/27/16 10:15 AM, Chris wrote: It has happened to me that characters like "é" return length == 2 Would normalization make len

Re: The Case Against Autodecode

2016-05-27 Thread tsbockman via Digitalmars-d
On Friday, 27 May 2016 at 20:42:13 UTC, Andrei Alexandrescu wrote: On 05/27/2016 03:39 PM, Dmitry Olshansky wrote: No, this is not the point of normalization. What is? -- Andrei 1) A grapheme may include several combining characters (such as diacritics) whose order is not supposed to be sem

Re: The Case Against Autodecode

2016-05-27 Thread Minas Mina via Digitalmars-d
On Friday, 27 May 2016 at 20:42:13 UTC, Andrei Alexandrescu wrote: On 05/27/2016 03:39 PM, Dmitry Olshansky wrote: On 27-May-2016 21:11, Andrei Alexandrescu wrote: On 5/27/16 10:15 AM, Chris wrote: It has happened to me that characters like "é" return length == 2 Would normalization make len

Re: The Case Against Autodecode

2016-05-27 Thread David Nadlinger via Digitalmars-d
On Friday, 27 May 2016 at 22:12:57 UTC, Minas Mina wrote: Those should be the same though, i.e compare the same. In order to do that, there is normalization. What is does is to _expand_ the single codepoint Ä into A + ¨ Unless I'm mistaken, this depends on the form used. For example, in NFKC
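A D sketch of what normalization does and does not buy, using std.uni.normalize with the NFC, NFD and NFKC forms; the sample strings are illustrative assumptions, not taken from the posts. NFC can fold Ä into one code point, but the result is still two UTF-8 code units; some graphemes have no precomposed form at all; and only the compatibility forms fold the "fi" ligature.

```d
import std.range.primitives : walkLength;
import std.uni; // normalize, NFC, NFD, NFKC, byGrapheme

void main()
{
    string precomposed = "\u00C4";   // Ä as a single code point
    string decomposed  = "A\u0308";  // A followed by U+0308 COMBINING DIAERESIS

    // Canonically equivalent, but not equal as code units.
    assert(precomposed != decomposed);
    assert(normalize!NFD(precomposed) == decomposed);
    assert(normalize!NFC(decomposed) == precomposed);

    // NFC reduces Ä to one code point, yet .length (code units) is still 2.
    assert(normalize!NFC(decomposed).walkLength == 1);
    assert(normalize!NFC(decomposed).length == 2);

    // q + U+0307 COMBINING DOT ABOVE has no precomposed form: NFC leaves two code points.
    string qDot = "q\u0307";
    assert(normalize!NFC(qDot).walkLength == 2);
    assert(qDot.byGrapheme.walkLength == 1);

    // Compatibility forms go further: the ligature U+FB01 becomes "fi" under NFKC only.
    assert(normalize!NFC("\uFB01") == "\uFB01");
    assert(normalize!NFKC("\uFB01") == "fi");
}
```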

Re: The Case Against Autodecode

2016-05-27 Thread Walter Bright via Digitalmars-d
On 5/27/2016 11:27 AM, Andrei Alexandrescu wrote: On 5/27/16 1:11 PM, Walter Bright wrote: They mean code units. Always valid or potentially invalid as well? -- Andrei Some years ago I would have said always valid. Experience, however, says that Unicode is often dirty and code should be tol
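Walter's point about dirty input (and item 7 on invalid UTF-8 filenames) can be sketched in D as follows, assuming a Phobos release in which front on a narrow string decodes strictly; the byte string is a made-up example. Autodecoding throws UTFException at the invalid byte, code-unit iteration never decodes, and std.utf.byDchar by default substitutes U+FFFD instead of throwing.

```d
import std.algorithm.searching : canFind;
import std.exception : assertThrown;
import std.range.primitives : front, walkLength;
import std.utf : UTFException, byCodeUnit, byDchar;

void main()
{
    // 0xFF never occurs in well-formed UTF-8; think of a stray Latin-1
    // byte that ended up in a Linux filename.
    string dirty = "\xFF.txt";

    // Autodecoding's front() decodes strictly and throws on the bad byte.
    assertThrown!UTFException(dirty.front);

    // Working on code units never decodes, so nothing throws.
    assert(dirty.byCodeUnit.walkLength == 5);

    // byDchar decodes lazily and substitutes U+FFFD instead of throwing.
    assert(dirty.byDchar.canFind('\uFFFD'));
}
```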

Re: The Case Against Autodecode

2016-05-27 Thread H. S. Teoh via Digitalmars-d
On Fri, May 27, 2016 at 04:41:09PM -0400, Andrei Alexandrescu via Digitalmars-d wrote: > On 05/27/2016 03:43 PM, H. S. Teoh via Digitalmars-d wrote: > > That's what we've been trying to say all along! > > If that's the case things are pretty dire, autodecoding or not. -- > Andrei Like it or not,

Re: The Case Against Autodecode

2016-05-28 Thread Dmitry Olshansky via Digitalmars-d
On 28-May-2016 01:04, tsbockman wrote: On Friday, 27 May 2016 at 20:42:13 UTC, Andrei Alexandrescu wrote: On 05/27/2016 03:39 PM, Dmitry Olshansky wrote: No, this is not the point of normalization. What is? -- Andrei 1) A grapheme may include several combining characters (such as diacritics

Re: The Case Against Autodecode

2016-05-28 Thread Marc Schütz via Digitalmars-d
On Friday, 27 May 2016 at 13:34:33 UTC, Andrei Alexandrescu wrote: On 5/27/16 6:56 AM, Marc Schütz wrote: It is not, which has been shown by various posts in this thread. Couldn't quite find strong arguments. Could you please be more explicit on which you found most convincing? -- Andrei Th

Re: The Case Against Autodecode

2016-05-28 Thread Andrei Alexandrescu via Digitalmars-d
On 5/28/16 6:59 AM, Marc Schütz wrote: The fundamental problem is choosing one of those possibilities over the others without knowing what the user actually wants, which is what both BEFORE and AFTER do. OK, that's a fair argument, thanks. So it seems there should be no "default" way to iterat
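The "no default way to iterate" argument can be made concrete with a Python sketch. The sample word is chosen for illustration. The same text gives a different length under each view. An operation such as reversal that is correct at the code point level, the level autodecoding picks, still produces a visibly wrong result, because the combining diaeresis ends up attached to a different letter.

```python
s = "noe\u0308l"  # "noël" with the diaeresis as a separate combining mark

code_units = list(s.encode("utf-8"))  # view 1: UTF-8 code units (bytes)
code_points = list(s)                 # view 2: Unicode code points

print(len(code_units), len(code_points))  # 6 5

# Reversing by code point detaches the combining mark from its "e":
reversed_cp = "".join(reversed(code_points))
print(reversed_cp == "l\u0308eon")  # True: the diaeresis now follows "l"
```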

Re: The Case Against Autodecode

2016-05-28 Thread Chris via Digitalmars-d
On Friday, 27 May 2016 at 18:11:22 UTC, Andrei Alexandrescu wrote: On 5/27/16 10:15 AM, Chris wrote: It has happened to me that characters like "é" return length == 2 Would normalization make length 1? -- Andrei No, I've tried it. I think dchar[] returns one or you check by grapheme.
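Chris's observation matches what normalization does. NFC can turn a decomposed "é" into one code point, but that code point still takes two UTF-8 code units. In D, `char[]` length counts UTF-8 code units, so it still reports 2. The Python sketch below is only an illustration and counts both views.

```python
import unicodedata

precomposed = "\u00e9"   # é as one code point
decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT

# Before normalization the decomposed form is 2 code points / 3 UTF-8 bytes.
print(len(decomposed), len(decomposed.encode("utf-8")))  # 2 3

results = {}
for label, s in [("precomposed", precomposed), ("decomposed", decomposed)]:
    nfc = unicodedata.normalize("NFC", s)
    # (number of code points, number of UTF-8 code units) after NFC
    results[label] = (len(nfc), len(nfc.encode("utf-8")))

print(results)  # {'precomposed': (1, 2), 'decomposed': (1, 2)}
```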

Re: The Case Against Autodecode

2016-05-28 Thread Walter Bright via Digitalmars-d
On 5/28/2016 5:04 AM, Andrei Alexandrescu wrote: So it harkens back to the original mistake: strings should NOT be arrays with the respective primitives. An array of code units provides consistency, predictability, flexibility, and performance. It's a solid base upon which the programmer can b

Re: The Case Against Autodecode

2016-05-28 Thread Andrew Godfrey via Digitalmars-d
On Saturday, 28 May 2016 at 19:04:14 UTC, Walter Bright wrote: On 5/28/2016 5:04 AM, Andrei Alexandrescu wrote: So it harkens back to the original mistake: strings should NOT be arrays with the respective primitives. An array of code units provides consistency, predictability, flexibility, a

Re: The Case Against Autodecode

2016-05-28 Thread Jack Stouffer via Digitalmars-d
On Saturday, 28 May 2016 at 12:04:20 UTC, Andrei Alexandrescu wrote: OK, that's a fair argument, thanks. So it seems there should be no "default" way to iterate a string Yes! So it harkens back to the original mistake: strings should NOT be arrays with the respective primitives. If you're p

Re: The Case Against Autodecode

2016-05-29 Thread Dicebot via Digitalmars-d
On 05/28/2016 03:04 PM, Andrei Alexandrescu wrote: > On 5/28/16 6:59 AM, Marc Schütz wrote: >> The fundamental problem is choosing one of those possibilities over the >> others without knowing what the user actually wants, which is what both >> BEFORE and AFTER do. > > OK, that's a fair argument,

Re: The Case Against Autodecode

2016-05-29 Thread Chris via Digitalmars-d
On Saturday, 28 May 2016 at 22:29:12 UTC, Andrew Godfrey wrote: [snip] From all the detail in this thread, I wonder now if "a grapheme" is even an unambiguous concept across different environments. Unicode graphemes are not always the same as graphemes in natural (written) languages. If <é

Re: The Case Against Autodecode

2016-05-29 Thread Tobias Müller via Digitalmars-d
On Sunday, 29 May 2016 at 11:25:11 UTC, Chris wrote: Unicode graphemes are not always the same as graphemes in natural (written) languages. If <é> is composed in Unicode, it is still one grapheme in a written language, not two distinct characters. However, in natural languages two characters ca

Re: The Case Against Autodecode

2016-05-29 Thread default0 via Digitalmars-d
On Sunday, 29 May 2016 at 11:47:30 UTC, Tobias Müller wrote: On Sunday, 29 May 2016 at 11:25:11 UTC, Chris wrote: Unicode graphemes are not always the same as graphemes in natural (written) languages. If <é> is composed in Unicode, it is still one grapheme in a written language, not two distinc

Re: The Case Against Autodecode

2016-05-29 Thread Chris via Digitalmars-d
On Sunday, 29 May 2016 at 11:47:30 UTC, Tobias Müller wrote: On Sunday, 29 May 2016 at 11:25:11 UTC, Chris wrote: Unicode graphemes are not always the same as graphemes in natural (written) languages. If <é> is composed in Unicode, it is still one grapheme in a written language, not two distinc

Re: The Case Against Autodecode

2016-05-29 Thread Tobias M via Digitalmars-d
On Sunday, 29 May 2016 at 12:08:52 UTC, default0 wrote: I am pretty sure that a single grapheme in unicode does not correspond to your notion of "character". I am pretty sure that what you think of as a "character" is officially called "Grapheme Cluster" not "Grapheme". Grapheme is a linguist
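The grapheme-cluster notion can be approximated in a few lines, with one caveat. The sketch below is deliberately simplified and is not an implementation of the Unicode extended grapheme cluster rules (UAX #29). It attaches each combining mark (general category M*) to the preceding code point. It ignores Hangul syllables, emoji ZWJ sequences, regional indicators, CR LF and other cases. The helper name `naive_clusters` is hypothetical.

```python
import unicodedata

def naive_clusters(s):
    """Hypothetical helper: group each code point with the combining marks
    (general category M*) that follow it.

    Simplified sketch only: real extended grapheme clusters (UAX #29) also
    cover Hangul syllables, emoji ZWJ sequences, regional indicators, CR LF,
    prepend characters, and more.
    """
    clusters = []
    for ch in s:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch   # attach the mark to the previous cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters

s = "noe\u0308l"  # "noël" with a combining diaeresis
clusters = naive_clusters(s)
print(len(s), len(clusters))  # 5 4  (code points vs. approximate clusters)

# Reversing by cluster keeps the diaeresis on its "e".
print("".join(reversed(clusters)) == "le\u0308on")  # True
```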
