On 6/5/2016 1:05 AM, deadalnix wrote:
TIL: books are read by computers.
I should introduce you to a fabulous technology called OCR. :-)
On 6/5/2016 1:07 AM, deadalnix wrote:
On Saturday, 4 June 2016 at 03:03:16 UTC, Walter Bright wrote:
Oh rubbish. Let go of the idea that choosing bad fonts should drive Unicode
codepoint decisions.
Interestingly enough, I've mentioned earlier here that only people from the US
would believe th
On Saturday, 4 June 2016 at 08:12:47 UTC, Walter Bright wrote:
On 6/3/2016 11:17 PM, H. S. Teoh via Digitalmars-d wrote:
On Fri, Jun 03, 2016 at 08:03:16PM -0700, Walter Bright via
Digitalmars-d wrote:
It works for books.
Because books don't allow their readers to change the font.
Unicode is
On Friday, June 03, 2016 15:38:38 Walter Bright via Digitalmars-d wrote:
> On 6/3/2016 2:10 PM, Jonathan M Davis via Digitalmars-d wrote:
> > Actually, I would argue that the moment that Unicode is concerned with
> > what
> > the character actually looks like rather than what character it logically
On Saturday, 4 June 2016 at 03:03:16 UTC, Walter Bright wrote:
Oh rubbish. Let go of the idea that choosing bad fonts should
drive Unicode codepoint decisions.
Interestingly enough, I've mentioned earlier here that only
people from the US would believe that documents with mixed
languages ar
On Friday, 3 June 2016 at 18:43:07 UTC, Walter Bright wrote:
On 6/3/2016 9:28 AM, H. S. Teoh via Digitalmars-d wrote:
Eventually you have no choice but to encode by logical meaning
rather
than by appearance, since there are many lookalikes between
different
languages that actually mean somethin
On Friday, 3 June 2016 at 12:04:39 UTC, Chris wrote:
I do exactly this. Validate and normalize.
And once you've done this, auto decoding is useless because the
same character has the same representation anyway.
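Chris's "validate and normalize" workflow can be sketched concretely; here is a minimal Python example (an illustration, not code from the thread) showing that after NFC normalization the two spellings of 'ö' really do have the same representation:

```python
import unicodedata

precomposed = "\u00f6"  # 'ö' as one code point, U+00F6
decomposed = "o\u0308"  # 'o' followed by U+0308 COMBINING DIAERESIS

# Distinct code point sequences...
assert precomposed != decomposed

# ...but canonically equivalent: NFC maps both to the same sequence,
# after which plain comparison and search just work.
assert unicodedata.normalize("NFC", precomposed) == \
       unicodedata.normalize("NFC", decomposed)
```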
On 03/06/2016 20:12, Dmitry Olshansky wrote:
On 02-Jun-2016 23:27, Walter Bright wrote:
I wonder what rationale there is for Unicode to have two different
sequences of codepoints be treated as the same. It's madness.
Yeah, Unicode was not meant to be easy it seems. Or this is whatever
happen
On 6/3/2016 11:17 PM, H. S. Teoh via Digitalmars-d wrote:
On Fri, Jun 03, 2016 at 08:03:16PM -0700, Walter Bright via Digitalmars-d wrote:
It works for books.
Because books don't allow their readers to change the font.
Unicode is not the font.
This madness already exists *without* Unicode.
One also has to take into consideration that Unicode is the way
it is because it was not invented in a vacuum. It had to
account for what already existed and find compromises allowing
its adoption. Even if they had invented the perfect encoding, NO
ONE WOULD HAVE USED IT, as it would h
On Friday, 3 June 2016 at 20:53:32 UTC, H. S. Teoh wrote:
Even the Greek sigma has two forms depending on whether it's at
the end of a word or not -- so should it be two code points or
one? If you say two, then you'd have a problem with how to
search for sigma in Greek text, and you'd have to
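The sigma case is easy to poke at in Python (a sketch under the thread's premise, not code from the post): Unicode did pick two code points, so sigma-insensitive search has to go through case folding rather than code point equality:

```python
final_sigma = "\u03c2"   # ς GREEK SMALL LETTER FINAL SIGMA
medial_sigma = "\u03c3"  # σ GREEK SMALL LETTER SIGMA

# Two distinct code points for what is logically one letter...
assert final_sigma != medial_sigma

# ...so searching for "sigma" relies on case folding, which maps the
# final form onto the medial one.
assert final_sigma.casefold() == medial_sigma
assert "ΟΔΥΣΣΕΥΣ".casefold().count(medial_sigma) == 3
```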
On Fri, Jun 03, 2016 at 08:03:16PM -0700, Walter Bright via Digitalmars-d wrote:
> On 6/3/2016 6:08 PM, H. S. Teoh via Digitalmars-d wrote:
> > It's not a hard concept, except that these different letters have
> > lookalike forms with completely unrelated letters. Again:
> >
> > - Lowercase Latin
On 6/3/2016 6:08 PM, H. S. Teoh via Digitalmars-d wrote:
It's not a hard concept, except that these different letters have
lookalike forms with completely unrelated letters. Again:
- Lowercase Latin m looks visually the same as lowercase Cyrillic Т in
cursive form. In some font renderings the
On Saturday, 4 June 2016 at 02:46:31 UTC, Walter Bright wrote:
On 6/3/2016 5:42 PM, ketmar wrote:
sometimes used Cyrillic font to represent English.
Nobody here suggested using the wrong font, it's completely
irrelevant.
you suggested that unicode designers should make similar-looking
glyp
On 6/3/2016 5:42 PM, ketmar wrote:
sometimes used Cyrillic font to represent English.
Nobody here suggested using the wrong font, it's completely irrelevant.
On Fri, Jun 03, 2016 at 03:35:18PM -0700, Walter Bright via Digitalmars-d wrote:
> On 6/3/2016 1:53 PM, H. S. Teoh via Digitalmars-d wrote:
[...]
> > 'Cos by that argument, serif and sans serif letters should have
> > different encodings, because in languages like Hebrew, a tiny little
> > serif co
On Friday, 3 June 2016 at 18:43:07 UTC, Walter Bright wrote:
It's almost as if printed documents and books have never
existed!
some old xUSSR books which has some English text sometimes used
Cyrillic font to represent English. it was awful, and barely
readable. this was done to ease the work of
On Friday, 3 June 2016 at 22:38:38 UTC, Walter Bright wrote:
If a font choice changes the meaning then it is not a font.
Nah, then it is an Awesome Font that is totally Web Scale!
i wish i was making that up http://fontawesome.io/ i hate that
thing
But, it is kinda legal: gotta love the Uni
On 6/3/2016 2:10 PM, Jonathan M Davis via Digitalmars-d wrote:
Actually, I would argue that the moment that Unicode is concerned with what
the character actually looks like rather than what character it logically is
that it's gone outside of its charter. The way that characters actually look
is f
On 6/3/2016 1:53 PM, H. S. Teoh via Digitalmars-d wrote:
But if we were to encode appearance instead of logical meaning, that
would mean the *same* lowercase Cyrillic ь would have multiple,
different encodings depending on which font was in use.
I don't see that consequence at all.
That does
On Friday, June 03, 2016 03:08:43 Walter Bright via Digitalmars-d wrote:
> On 6/3/2016 1:05 AM, H. S. Teoh via Digitalmars-d wrote:
> > At the time
> > Unicode also had to grapple with tricky issues like what to do with
> > lookalike characters that served different purposes or had different
> > me
On Fri, Jun 03, 2016 at 11:43:07AM -0700, Walter Bright via Digitalmars-d wrote:
> On 6/3/2016 9:28 AM, H. S. Teoh via Digitalmars-d wrote:
> > Eventually you have no choice but to encode by logical meaning
> > rather than by appearance, since there are many lookalikes between
> > different languag
On 6/3/2016 11:54 AM, Timon Gehr wrote:
On 03.06.2016 20:41, Walter Bright wrote:
How did people ever get by with printed books and documents?
They can disambiguate the letters based on context well enough.
Characters do not have semantic meaning. Their meaning is always inferred from
the co
On 02-Jun-2016 23:27, Walter Bright wrote:
On 6/2/2016 12:34 PM, deadalnix wrote:
On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu wrote:
Pretty much everything. Consider s and s1 string variables with possibly
different encodings (UTF8/UTF16).
* s.all!(c => c == 'ö') works only w
On Friday, 3 June 2016 at 18:41:36 UTC, Walter Bright wrote:
How did people ever get by with printed books and documents?
Printed books pick one font and one layout, and are then read by
people. They don't have to be represented in some format where
end users can change the font and size etc.
On 03.06.2016 20:41, Walter Bright wrote:
On 6/3/2016 3:14 AM, Vladimir Panteleev wrote:
That's not right either. Cyrillic letters can look slightly different
from their
latin lookalikes in some circumstances.
I'm sure there are extremely good reasons for not using the latin
lookalikes in
the C
On 6/3/2016 9:28 AM, H. S. Teoh via Digitalmars-d wrote:
Eventually you have no choice but to encode by logical meaning rather
than by appearance, since there are many lookalikes between different
languages that actually mean something completely different, and often
behaves completely differentl
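Teoh's point that lookalikes "behave completely differently" is directly observable; a small Python illustration (an assumed example, not from the thread):

```python
import unicodedata

latin_a = "a"          # U+0061
cyrillic_a = "\u0430"  # U+0430, rendered identically in most fonts

# Same appearance, different identity...
assert latin_a != cyrillic_a
assert unicodedata.name(cyrillic_a) == "CYRILLIC SMALL LETTER A"

# ...and different behavior: each letter uppercases within its own script.
assert latin_a.upper() == "\u0041"     # 'A' LATIN CAPITAL LETTER A
assert cyrillic_a.upper() == "\u0410"  # 'А' CYRILLIC CAPITAL LETTER A
```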
On 6/3/2016 3:14 AM, Vladimir Panteleev wrote:
That's not right either. Cyrillic letters can look slightly different from their
latin lookalikes in some circumstances.
I'm sure there are extremely good reasons for not using the latin lookalikes in
the Cyrillic alphabets, because most (all?) 8-bi
On 6/3/2016 3:10 AM, Vladimir Panteleev wrote:
I don't think it would work (or at least, the analogy doesn't hold). It would
mean that you can't add new precomposited characters, because that means that
previously valid sequences are now invalid.
So don't add new precomposited characters when a
On Fri, Jun 03, 2016 at 10:14:15AM +0000, Vladimir Panteleev via Digitalmars-d
wrote:
> On Friday, 3 June 2016 at 10:08:43 UTC, Walter Bright wrote:
> > On 6/3/2016 1:05 AM, H. S. Teoh via Digitalmars-d wrote:
> > > At the time Unicode also had to grapple with tricky issues like
> > > what to do w
On 06/02/2016 05:37 PM, Andrei Alexandrescu wrote:
On 6/2/16 5:35 PM, deadalnix wrote:
On Thursday, 2 June 2016 at 21:24:15 UTC, Andrei Alexandrescu wrote:
On 6/2/16 5:20 PM, deadalnix wrote:
The good thing when you define works by whatever it does right now
No, it works as it was designed.
On Friday, 3 June 2016 at 11:46:50 UTC, Jonathan M Davis wrote:
On Friday, June 03, 2016 10:10:18 Vladimir Panteleev via
Digitalmars-d wrote:
On Friday, 3 June 2016 at 10:05:11 UTC, Walter Bright wrote:
> On 6/3/2016 1:05 AM, H. S. Teoh via Digitalmars-d wrote:
>> However, this
>> meant that som
On Friday, June 03, 2016 10:10:18 Vladimir Panteleev via Digitalmars-d wrote:
> On Friday, 3 June 2016 at 10:05:11 UTC, Walter Bright wrote:
> > On 6/3/2016 1:05 AM, H. S. Teoh via Digitalmars-d wrote:
> >> However, this
> >> meant that some precomposed characters were "redundant": they
> >> repres
On Friday, 3 June 2016 at 10:08:43 UTC, Walter Bright wrote:
On 6/3/2016 1:05 AM, H. S. Teoh via Digitalmars-d wrote:
At the time
Unicode also had to grapple with tricky issues like what to do
with
lookalike characters that served different purposes or had
different
meanings, e.g., the mu sign
On Friday, 3 June 2016 at 10:05:11 UTC, Walter Bright wrote:
On 6/3/2016 1:05 AM, H. S. Teoh via Digitalmars-d wrote:
However, this
meant that some precomposed characters were "redundant": they
represented character + diacritic combinations that could
equally well
be expressed separately. Norma
On 6/3/2016 1:05 AM, H. S. Teoh via Digitalmars-d wrote:
At the time
Unicode also had to grapple with tricky issues like what to do with
lookalike characters that served different purposes or had different
meanings, e.g., the mu sign in the math block vs. the real letter mu in
the Greek block, or
On 6/3/2016 1:05 AM, H. S. Teoh via Digitalmars-d wrote:
However, this
meant that some precomposed characters were "redundant": they
represented character + diacritic combinations that could equally well
be expressed separately. Normalization was the inevitable consequence.
It is not inevitable
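The "redundant" precomposed characters Teoh describes are recorded in the Unicode character database itself; a Python sketch (an illustration, not from the post):

```python
import unicodedata

e_acute = "\u00e9"  # 'é' LATIN SMALL LETTER E WITH ACUTE

# The database lists its canonical decomposition: 'e' + combining acute,
# i.e. the precomposed form duplicates an expressible sequence.
assert unicodedata.decomposition(e_acute) == "0065 0301"

# Normalization converts between the two equivalent spellings.
assert unicodedata.normalize("NFD", e_acute) == "e\u0301"
assert unicodedata.normalize("NFC", "e\u0301") == e_acute
```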
On Thursday, June 02, 2016 15:05:44 Andrei Alexandrescu via Digitalmars-d
wrote:
> The intent of autodecoding was to make std.algorithm work meaningfully
> with strings. As it's easy to see I just went through
> std.algorithm.searching alphabetically and found issues literally with
> every primiti
On Thu, Jun 02, 2016 at 05:19:48PM -0700, Walter Bright via Digitalmars-d wrote:
> On 6/2/2016 3:27 PM, John Colvin wrote:
> > > I wonder what rationale there is for Unicode to have two different
> > > sequences of codepoints be treated as the same. It's madness.
> >
> > There are languages that m
On Thu, 2 Jun 2016 18:54:21 -0400,
Andrei Alexandrescu wrote:
> On 06/02/2016 06:10 PM, Marco Leise wrote:
> > On Thu, 2 Jun 2016 15:05:44 -0400,
> > Andrei Alexandrescu wrote:
> >
> >> On 06/02/2016 01:54 PM, Marc Schütz wrote:
> >>> Which practical tasks are made possible (and work _corr
On Thu, Jun 02, 2016 at 04:29:48PM -0400, Andrei Alexandrescu via Digitalmars-d
wrote:
> On 06/02/2016 04:22 PM, cym13 wrote:
> >
> > A:“We should decode to code points”
> > B:“No, decoding to code points is a stupid idea.”
> > A:“No it's not!”
> > B:“Can you show a concrete example where it does
On Thu, Jun 02, 2016 at 04:28:45PM -0400, Andrei Alexandrescu via Digitalmars-d
wrote:
> On 06/02/2016 04:17 PM, Timon Gehr wrote:
> > I.e. you are saying that 'works' means 'operates on code points'.
>
> Affirmative. -- Andrei
Again, a ridiculous position. I can use exactly the same line of
ar
On Thu, Jun 02, 2016 at 04:38:28PM -0400, Andrei Alexandrescu via Digitalmars-d
wrote:
> On 06/02/2016 04:36 PM, tsbockman wrote:
> > Your examples will pass or fail depending on how (and whether) the
> > 'ö' grapheme is normalized.
>
> And that's fine. Want graphemes, .byGrapheme wags its tail i
On Thursday, 2 June 2016 at 21:00:17 UTC, tsbockman wrote:
However, this document is very old - from Unicode 3.0 and the
year 2000:
While there are no surrogate characters in Unicode 3.0
(outside of private use characters), future versions of
Unicode will contain them...
Perhaps level 1 has
On 6/2/2016 3:27 PM, John Colvin wrote:
I wonder what rationale there is for Unicode to have two different sequences
of codepoints be treated as the same. It's madness.
There are languages that make heavy use of diacritics, often several on a single
"character". Hebrew is a good example. Should
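Walter's Hebrew example can be made concrete; a Python sketch (an assumed illustration, not from the post) of several combining marks stacking on a single base letter:

```python
import unicodedata

# One user-perceived "character": SHIN + shin dot + qamats vowel point.
shin = "\u05e9\u05c1\u05b8"

assert len(shin) == 3  # three code points...
# ...two of which are combining marks (nonzero combining class)
# attached to the one base letter.
marks = [c for c in shin if unicodedata.combining(c)]
assert len(marks) == 2
```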
On 6/2/2016 2:25 PM, deadalnix wrote:
On Thursday, 2 June 2016 at 20:27:27 UTC, Walter Bright wrote:
I wonder what rationale there is for Unicode to have two different sequences
of codepoints be treated as the same. It's madness.
To be able to convert back and forth from/to unicode in a lossles
On 6/2/2016 4:29 PM, Jonathan M Davis via Digitalmars-d wrote:
How do you suggest that we handle the normalization issue?
Started a new thread for that one.
On Thursday, June 02, 2016 15:48:03 Walter Bright via Digitalmars-d wrote:
> On 6/2/2016 3:23 PM, Andrei Alexandrescu wrote:
> > On 06/02/2016 05:58 PM, Walter Bright wrote:
> >> > * s.balancedParens('〈', '〉') works only with autodecoding.
> >> > * s.canFind('ö') works only with autodecoding. It
On Thursday, June 02, 2016 22:27:16 John Colvin via Digitalmars-d wrote:
> On Thursday, 2 June 2016 at 20:27:27 UTC, Walter Bright wrote:
> > I wonder what rationale there is for Unicode to have two
> > different sequences of codepoints be treated as the same. It's
> > madness.
>
> There are langua
On Thursday, June 02, 2016 18:23:19 Andrei Alexandrescu via Digitalmars-d
wrote:
> On 06/02/2016 05:58 PM, Walter Bright wrote:
> > On 6/2/2016 1:27 PM, Andrei Alexandrescu wrote:
> >> The lambda returns bool. -- Andrei
> >
> > Yes, I was wrong about that. But the point still stands with:
> > > *
On Thursday, 2 June 2016 at 21:56:10 UTC, Walter Bright wrote:
Yes, you have a good point. But we do allow things like:
byte b;
if (b == 1) ...
Why allowing char/wchar/dchar comparisons is wrong:

void main()
{
    string s = "Привет";
    foreach (c; s)        // c is a char here: a single UTF-8 code unit
        assert(c != 'Ñ'); // fails: 'Ñ' is U+00D1, and the byte 0xD1 also
                          // occurs as a UTF-8 lead byte inside "Привет"
}
On 06/02/2016 06:10 PM, Marco Leise wrote:
On Thu, 2 Jun 2016 15:05:44 -0400,
Andrei Alexandrescu wrote:
On 06/02/2016 01:54 PM, Marc Schütz wrote:
Which practical tasks are made possible (and work _correctly_) if you
decode to code points, that don't already work with code units?
Pretty m
On 6/2/2016 3:23 PM, Andrei Alexandrescu wrote:
On 06/02/2016 05:58 PM, Walter Bright wrote:
> * s.balancedParens('〈', '〉') works only with autodecoding.
> * s.canFind('ö') works only with autodecoding. It returns always
false without.
Can be made to work without autodecoding.
By special ca
On 03.06.2016 00:23, Andrei Alexandrescu wrote:
On 06/02/2016 05:58 PM, Walter Bright wrote:
On 6/2/2016 1:27 PM, Andrei Alexandrescu wrote:
The lambda returns bool. -- Andrei
Yes, I was wrong about that. But the point still stands with:
> * s.balancedParens('〈', '〉') works only with autode
On 03.06.2016 00:26, Walter Bright wrote:
On 6/2/2016 3:11 PM, Timon Gehr wrote:
Well, this is a somewhat different case, because 1 is just not
representable
as a byte. Every value that fits in a byte fits in an int though.
It's different for code units. They are incompatible both ways.
N
On Thursday, 2 June 2016 at 22:20:49 UTC, Walter Bright wrote:
On 6/2/2016 2:05 PM, tsbockman wrote:
Presumably if someone marks their own
PR as "do not merge", it means they're planning to either
close it themselves
after it has served its purpose, or they plan to fix/finish it
and then remov
On 6/2/2016 3:10 PM, Marco Leise wrote:
we haven't looked into borrowing/scoped enough
That's my fault.
As for scoped, the idea is to make scope work analogously to DIP25's 'return
ref'. I don't believe we need borrowing, we've worked out another solution that
will work for ref counting.
P
On Thursday, 2 June 2016 at 20:27:27 UTC, Walter Bright wrote:
On 6/2/2016 12:34 PM, deadalnix wrote:
On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu
wrote:
Pretty much everything. Consider s and s1 string variables
with possibly
different encodings (UTF8/UTF16).
* s.all!(c => c
On 6/2/2016 3:11 PM, Timon Gehr wrote:
Well, this is a somewhat different case, because 1 is just not representable
as a byte. Every value that fits in a byte fits in an int though.
It's different for code units. They are incompatible both ways.
Not exactly. (c == 'ö') is always false for
On 06/02/2016 05:58 PM, Walter Bright wrote:
On 6/2/2016 1:27 PM, Andrei Alexandrescu wrote:
The lambda returns bool. -- Andrei
Yes, I was wrong about that. But the point still stands with:
> * s.balancedParens('〈', '〉') works only with autodecoding.
> * s.canFind('ö') works only with autod
On 6/2/2016 2:05 PM, tsbockman wrote:
On Thursday, 2 June 2016 at 20:56:26 UTC, Walter Bright wrote:
What is supposed to be done with "do not merge" PRs other than close them?
Occasionally people need to try something on the auto tester (not sure if that's
relevant to that particular PR, thoug
On 02.06.2016 23:29, Andrei Alexandrescu wrote:
On 6/2/16 5:23 PM, Timon Gehr wrote:
On 02.06.2016 22:51, Andrei Alexandrescu wrote:
On 06/02/2016 04:50 PM, Timon Gehr wrote:
On 02.06.2016 22:28, Andrei Alexandrescu wrote:
On 06/02/2016 04:12 PM, Timon Gehr wrote:
It is not meaningful to com
On Thursday, 2 June 2016 at 22:03:01 UTC, default0 wrote:
*sigh* reading comprehension.
...
Please do not take what I say out of context, thank you.
Earlier you said:
The level 2 support description noted that it should be opt-in
because its slow.
My main point is simply that you mischaract
On Thu, 2 Jun 2016 15:05:44 -0400,
Andrei Alexandrescu wrote:
> On 06/02/2016 01:54 PM, Marc Schütz wrote:
> > Which practical tasks are made possible (and work _correctly_) if you
> > decode to code points, that don't already work with code units?
>
> Pretty much everything.
>
> s.all!(c =>
On 02.06.2016 23:56, Walter Bright wrote:
On 6/2/2016 1:12 PM, Timon Gehr wrote:
...
It is not
meaningful to compare utf-8 and utf-16 code units directly.
Yes, you have a good point. But we do allow things like:
byte b;
if (b == 1) ...
Well, this is a somewhat different case, b
On Thursday, 2 June 2016 at 21:51:51 UTC, tsbockman wrote:
On Thursday, 2 June 2016 at 21:38:02 UTC, default0 wrote:
On Thursday, 2 June 2016 at 21:30:51 UTC, tsbockman wrote:
1) It does not say that level 2 should be opt-in; it says
that level 2 should be toggle-able. Nowhere does it say which
On 02.06.2016 23:46, Andrei Alexandrescu wrote:
On 6/2/16 5:43 PM, Timon Gehr wrote:
.̂ ̪.̂
(Copy-paste it somewhere else, I think it might not be rendered
correctly on the forum.)
The point is that if I do:
".̂ ̪.̂".normalize!NFC.byGrapheme.findAmong([Grapheme("."),Grapheme(",")])
no match
On 6/2/2016 1:27 PM, Andrei Alexandrescu wrote:
The lambda returns bool. -- Andrei
Yes, I was wrong about that. But the point still stands with:
> * s.balancedParens('〈', '〉') works only with autodecoding.
> * s.canFind('ö') works only with autodecoding. It returns always false
without.
Can
On 6/2/2016 1:12 PM, Timon Gehr wrote:
On 02.06.2016 22:07, Walter Bright wrote:
On 6/2/2016 12:05 PM, Andrei Alexandrescu wrote:
* s.all!(c => c == 'ö') works only with autodecoding. It returns
always false
without.
The o is inferred as a wchar. The lamda then is inferred to return a
wchar.
On Thursday, 2 June 2016 at 21:38:02 UTC, default0 wrote:
On Thursday, 2 June 2016 at 21:30:51 UTC, tsbockman wrote:
1) It does not say that level 2 should be opt-in; it says that
level 2 should be toggle-able. Nowhere does it say which of
level 1 and 2 should be the default.
2) It says that
On 6/2/16 5:43 PM, Timon Gehr wrote:
.̂ ̪.̂
(Copy-paste it somewhere else, I think it might not be rendered
correctly on the forum.)
The point is that if I do:
".̂ ̪.̂".normalize!NFC.byGrapheme.findAmong([Grapheme("."),Grapheme(",")])
no match is returned.
If I use your method with dchars,
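Timon's example also exposes the converse failure at the code point level; a Python sketch of the same false positive (an illustration, not the thread's D code):

```python
import unicodedata

s = ".\u0302"  # '.' + COMBINING CIRCUMFLEX ACCENT: one grapheme

# No precomposed code point exists for this, so NFC cannot merge it...
assert len(unicodedata.normalize("NFC", s)) == 2

# ...and a code-point-level search "finds" a bare '.' that the reader
# never sees rendered on its own: a false positive.
assert "." in s
```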
On 6/2/16 5:38 PM, cym13 wrote:
Allow me to try another angle:
- There are different levels of unicode support and you don't want to
support them all transparently. That's understandable.
Cool.
- The level you choose to support is the code point level. There are
many good arguments about why
On 02.06.2016 23:23, Andrei Alexandrescu wrote:
On 6/2/16 5:19 PM, Timon Gehr wrote:
On 02.06.2016 23:16, Timon Gehr wrote:
On 02.06.2016 23:06, Andrei Alexandrescu wrote:
As the examples show, the examples would be entirely meaningless at
code
unit level.
So far, I needed to count the numbe
On 6/2/16 5:38 PM, deadalnix wrote:
On Thursday, 2 June 2016 at 21:37:11 UTC, Andrei Alexandrescu wrote:
On 6/2/16 5:35 PM, deadalnix wrote:
On Thursday, 2 June 2016 at 21:24:15 UTC, Andrei Alexandrescu wrote:
On 6/2/16 5:20 PM, deadalnix wrote:
The good thing when you define works by whateve
On 6/2/16 5:37 PM, Andrei Alexandrescu wrote:
On 6/2/16 5:35 PM, deadalnix wrote:
On Thursday, 2 June 2016 at 21:24:15 UTC, Andrei Alexandrescu wrote:
On 6/2/16 5:20 PM, deadalnix wrote:
The good thing when you define works by whatever it does right now
No, it works as it was designed. -- An
On Thursday, 2 June 2016 at 20:29:48 UTC, Andrei Alexandrescu
wrote:
On 06/02/2016 04:22 PM, cym13 wrote:
A:“We should decode to code points”
B:“No, decoding to code points is a stupid idea.”
A:“No it's not!”
B:“Can you show a concrete example where it does something
useful?”
A:“Sure, look at
On Thursday, 2 June 2016 at 21:37:11 UTC, Andrei Alexandrescu
wrote:
On 6/2/16 5:35 PM, deadalnix wrote:
On Thursday, 2 June 2016 at 21:24:15 UTC, Andrei Alexandrescu
wrote:
On 6/2/16 5:20 PM, deadalnix wrote:
The good thing when you define works by whatever it does
right now
No, it works as
On Thursday, 2 June 2016 at 21:30:51 UTC, tsbockman wrote:
On Thursday, 2 June 2016 at 21:07:19 UTC, default0 wrote:
The level 2 support description noted that it should be opt-in
because its slow.
1) It does not say that level 2 should be opt-in; it says that
level 2 should be toggle-able. N
On 6/2/16 5:35 PM, deadalnix wrote:
On Thursday, 2 June 2016 at 21:24:15 UTC, Andrei Alexandrescu wrote:
On 6/2/16 5:20 PM, deadalnix wrote:
The good thing when you define works by whatever it does right now
No, it works as it was designed. -- Andrei
Nobody says it doesn't. Everybody says t
On 6/2/16 5:35 PM, ag0aep6g wrote:
On 06/02/2016 11:27 PM, Andrei Alexandrescu wrote:
On 6/2/16 5:24 PM, ag0aep6g wrote:
On 06/02/2016 11:06 PM, Andrei Alexandrescu wrote:
Nope, that's a radically different matter. As the examples show, the
examples would be entirely meaningless at code unit l
On Thursday, 2 June 2016 at 21:24:15 UTC, Andrei Alexandrescu
wrote:
On 6/2/16 5:20 PM, deadalnix wrote:
The good thing when you define works by whatever it does right
now
No, it works as it was designed. -- Andrei
Nobody says it doesn't. Everybody says the design is crap.
On 06/02/2016 11:27 PM, Andrei Alexandrescu wrote:
On 6/2/16 5:24 PM, ag0aep6g wrote:
On 06/02/2016 11:06 PM, Andrei Alexandrescu wrote:
Nope, that's a radically different matter. As the examples show, the
examples would be entirely meaningless at code unit level.
They're simply not possible.
On Thursday, 2 June 2016 at 21:07:19 UTC, default0 wrote:
The level 2 support description noted that it should be opt-in
because its slow.
1) It does not say that level 2 should be opt-in; it says that
level 2 should be toggle-able. Nowhere does it say which of level
1 and 2 should be the def
On 6/2/16 5:27 PM, Andrei Alexandrescu wrote:
On 6/2/16 5:24 PM, ag0aep6g wrote:
Just like there is no single code point for 'a⃗' so you can't
search for it in a range of code points.
Of course you can.
Correx, indeed you can't. -- Andrei
On 02.06.2016 22:51, Andrei Alexandrescu wrote:
On 06/02/2016 04:50 PM, Timon Gehr wrote:
On 02.06.2016 22:28, Andrei Alexandrescu wrote:
On 06/02/2016 04:12 PM, Timon Gehr wrote:
It is not meaningful to compare utf-8 and utf-16 code units directly.
But it is meaningful to compare Unicode co
On 6/2/16 5:20 PM, deadalnix wrote:
The good thing when you define works by whatever it does right now
No, it works as it was designed. -- Andrei
On 6/2/16 5:23 PM, Timon Gehr wrote:
On 02.06.2016 22:51, Andrei Alexandrescu wrote:
On 06/02/2016 04:50 PM, Timon Gehr wrote:
On 02.06.2016 22:28, Andrei Alexandrescu wrote:
On 06/02/2016 04:12 PM, Timon Gehr wrote:
It is not meaningful to compare utf-8 and utf-16 code units directly.
But
On 02.06.2016 23:20, deadalnix wrote:
The sample code won't count every instance of the grapheme 'ö', as some of
its encodings won't be matched, which definitely counts as "doesn't work".
It also has false positives (you can combine 'ö' with some combining
character in order to get some strange ch
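deadalnix's false-negative case is easy to reproduce; a minimal Python sketch (assumed, not from the post):

```python
import unicodedata

text = "blo\u0308b"  # contains 'ö' spelled as 'o' + combining diaeresis

# Counting by code point sequence misses the decomposed spelling...
assert text.count("\u00f6") == 0

# ...until the text is normalized first.
assert unicodedata.normalize("NFC", text).count("\u00f6") == 1
```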
On 6/2/16 5:24 PM, ag0aep6g wrote:
On 06/02/2016 11:06 PM, Andrei Alexandrescu wrote:
Nope, that's a radically different matter. As the examples show, the
examples would be entirely meaningless at code unit level.
They're simply not possible. Won't compile.
They do compile.
There is no sin
On 06/02/2016 11:24 PM, ag0aep6g wrote:
They're simply not possible. Won't compile. There is no single UTF-8
code unit for 'ö', so you can't (easily) search for it in a range for
code units. Just like there is no single code point for 'a⃗' so you can't
search for it in a range of code points.
Yo
On Thursday, 2 June 2016 at 20:27:27 UTC, Walter Bright wrote:
On 6/2/2016 12:34 PM, deadalnix wrote:
On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu
wrote:
Pretty much everything. Consider s and s1 string variables
with possibly
different encodings (UTF8/UTF16).
* s.all!(c => c
On 06/02/2016 11:06 PM, Andrei Alexandrescu wrote:
Nope, that's a radically different matter. As the examples show, the
examples would be entirely meaningless at code unit level.
They're simply not possible. Won't compile. There is no single UTF-8
code unit for 'ö', so you can't (easily) searc
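ag0aep6g's code-unit point, sketched in Python (an illustration, not from the post): 'ö' has no single UTF-8 code unit, so no single-unit search key can exist for it:

```python
encoded = "\u00f6".encode("utf-8")  # 'ö', U+00F6

assert encoded == b"\xc3\xb6"  # two code units, not one
# The code point value itself never appears as a byte in the encoding,
# so a naive single-byte scan cannot find 'ö'.
assert 0xF6 not in encoded
```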
On 6/2/16 5:19 PM, Timon Gehr wrote:
On 02.06.2016 23:16, Timon Gehr wrote:
On 02.06.2016 23:06, Andrei Alexandrescu wrote:
As the examples show, the examples would be entirely meaningless at code
unit level.
So far, I needed to count the number of characters 'ö' inside some
string exactly ze
On Thursday, 2 June 2016 at 20:13:52 UTC, Andrei Alexandrescu
wrote:
On 06/02/2016 03:34 PM, deadalnix wrote:
On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu
wrote:
Pretty much everything. Consider s and s1 string variables
with
possibly different encodings (UTF8/UTF16).
* s.all
On 02.06.2016 23:16, Timon Gehr wrote:
On 02.06.2016 23:06, Andrei Alexandrescu wrote:
As the examples show, the examples would be entirely meaningless at code
unit level.
So far, I needed to count the number of characters 'ö' inside some
string exactly zero times,
(Obviously this isn't even
On 02.06.2016 23:06, Andrei Alexandrescu wrote:
As the examples show, the examples would be entirely meaningless at code
unit level.
So far, I needed to count the number of characters 'ö' inside some
string exactly zero times, but I wanted to chain or join strings
relatively often.
On 6/2/16 5:05 PM, tsbockman wrote:
On Thursday, 2 June 2016 at 20:56:26 UTC, Walter Bright wrote:
What is supposed to be done with "do not merge" PRs other than close
them?
Occasionally people need to try something on the auto tester (not sure
if that's relevant to that particular PR, though)
On Thursday, 2 June 2016 at 20:52:29 UTC, ag0aep6g wrote:
On 06/02/2016 10:36 PM, Andrei Alexandrescu wrote:
By whom? The "support level 1" folks yonder at the Unicode
standard? :o)
-- Andrei
Do they say that level 1 should be the default, and do they
give a rationale for that? Would you kin
On Thursday, 2 June 2016 at 20:56:26 UTC, Walter Bright wrote:
What is supposed to be done with "do not merge" PRs other than
close them?
Occasionally people need to try something on the auto tester (not
sure if that's relevant to that particular PR, though).
Presumably if someone marks their
On 6/2/16 5:01 PM, ag0aep6g wrote:
On 06/02/2016 10:50 PM, Andrei Alexandrescu wrote:
It does not fall apart for code points.
Yes it does. You've been given plenty examples where it falls apart.
There weren't any.
Your answer to that was that it operates on code points, not graphemes.
Th