On Sun, Mar 20, 2016, at 10:55, Ben Bacarisse wrote:
> It's 21. The reason being (or at least part of the reason being) that
> 21 bits can be UTF-8 encoded in 4 bytes: 11110xxx 10xxxxxx 10xxxxxx
> 10xxxxxx (3 + 3*6).
The reason is the UTF-16 limit. Prior to that, UTF-8 had no such limit
(it
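
For readers following the arithmetic, here is a rough Python sketch of both claims: the 4-byte UTF-8 form carries 3 + 3*6 = 21 payload bits, and the UTF-16 surrogate mechanism tops out at U+10FFFF. The helper name utf8_encode_4byte is made up for illustration and is not taken from any message in the thread.

```python
def utf8_encode_4byte(cp):
    """Hand-encode a code point in the 4-byte UTF-8 form.

    Layout: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (3 + 3*6 = 21 payload bits).
    Hypothetical helper, written only to illustrate the bit layout.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("the 4-byte form covers U+10000..U+10FFFF only")
    return bytes([
        0xF0 | (cp >> 18),           # lead byte: top 3 payload bits
        0x80 | ((cp >> 12) & 0x3F),  # continuation byte: next 6 bits
        0x80 | ((cp >> 6) & 0x3F),   # continuation byte: next 6 bits
        0x80 | (cp & 0x3F),          # continuation byte: low 6 bits
    ])

cp = 0x1F600
# The hand-rolled encoder should agree with Python's built-in codec.
matches_builtin = utf8_encode_4byte(cp) == chr(cp).encode("utf-8")

# UTF-16 limit: two surrogates carry 10 bits each, offset by 0x10000.
utf16_max = 0x10000 + (1 << 20) - 1
bits_needed = utf16_max.bit_length()
```

So the largest value a surrogate pair can reach needs exactly 21 bits, which is consistent with both explanations given in the thread.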
Ben Bacarisse :
> It's 21. The reason being (or at least part of the reason being) that
> 21 bits can be UTF-8 encoded in 4 bytes: 11110xxx 10xxxxxx 10xxxxxx
> 10xxxxxx (3 + 3*6).
I bet the reason is UTF-16. Microsoft and Sun/Oracle would have insisted
on a maximum of 4
Rustom Mody writes:
> On Sunday, March 20, 2016 at 10:32:07 AM UTC+5:30, Steven D'Aprano wrote:
>> Unicode (the character set part of it) is a set of abstract 23-bit numbers,
>
> 23? Or 21?
It's 21. The reason being (or at least part of the reason being) that
21 bits
On Sun, Mar 20, 2016 at 11:14 PM, Steven D'Aprano wrote:
>>> On the other hand, I believe that the output of the UTF transformations
>>> is explicitly described in terms of 8-bit bytes and 16- or 32-bit words.
>>> For instance, the UTF-8 encoding of "A" has to be a single
On Sun, 20 Mar 2016 10:22 pm, Chris Angelico wrote:
> On Sun, Mar 20, 2016 at 10:06 PM, Steven D'Aprano
> wrote:
>> The Unicode standard does not, as far as I am aware, care how you
>> represent code points in memory, only that there are 0x110000 of them,
>> numbered from
On Sun, Mar 20, 2016 at 10:06 PM, Steven D'Aprano wrote:
> The Unicode standard does not, as far as I am aware, care how you represent
> code points in memory, only that there are 0x110000 of them, numbered from
> U+0000 to U+10FFFF. That's what I mean by abstract. The
Chris Angelico :
> Like every language *including* English. You can pretend that ASCII is
> enough, but you do lose some information.
Hold it, I'll quickly update my résumé before we resume the
conversation. What does this exposé expose? At least it gives a coup de
grâce to
On Sun, 20 Mar 2016 05:20 pm, Rustom Mody wrote:
> On Sunday, March 20, 2016 at 10:32:07 AM UTC+5:30, Steven D'Aprano wrote:
>> On Sun, 20 Mar 2016 03:12 am, Marko Rauhamaa wrote:
>>
>> > Steven D'Aprano :
>> >
>> >> On Sun, 20 Mar 2016 02:02 am, Marko Rauhamaa wrote:
>> >>> Yes, but UTF-16
On 17/03/2016 21:26, BartC wrote:
On 17/03/2016 21:11, Marko Rauhamaa wrote:
Chris Angelico :
Like every language *including* English. You can pretend that ASCII is
enough, but you do lose some information.
Hold it, I'll quickly update my résumé before we resume the
Chris Angelico writes:
> You can pretend that only 1 and 0 are enough. Good luck making THAT work.
YOU had ONES??? Back in the day, my folks had to do everything with
just zeros.
--
https://mail.python.org/mailman/listinfo/python-list
On Fri, 18 Mar 2016 10:46 pm, Steven D'Aprano wrote:
> I think it is typical of JMF that his idea of a language where Unicode
> "just works" is one where it *does work at all* (at least not as strings).
Er, does NOT work at all.
> Python 1.5 strings supported Unicode just as well as Go's string
Steven D'Aprano :
> On Sun, 20 Mar 2016 03:12 am, Marko Rauhamaa wrote:
>> Steven D'Aprano :
>>> On Sun, 20 Mar 2016 02:02 am, Marko Rauhamaa wrote:
Yes, but UTF-16 produces 16-bit values that are outside Unicode.
>>>
>>> Show me.
>>>
>>> Before you
On Sunday, March 20, 2016 at 10:32:07 AM UTC+5:30, Steven D'Aprano wrote:
> On Sun, 20 Mar 2016 03:12 am, Marko Rauhamaa wrote:
>
> > Steven D'Aprano :
> >
> >> On Sun, 20 Mar 2016 02:02 am, Marko Rauhamaa wrote:
> >>> Yes, but UTF-16 produces 16-bit values that are outside Unicode.
> >>
> >>
On Sun, 20 Mar 2016 03:12 am, Marko Rauhamaa wrote:
> Steven D'Aprano :
>
>> On Sun, 20 Mar 2016 02:02 am, Marko Rauhamaa wrote:
>>> Yes, but UTF-16 produces 16-bit values that are outside Unicode.
>>
>> Show me.
>>
>> Before you answer, if your answer is "surrogate pairs",
On Thursday, March 17, 2016 at 7:34:46 AM UTC-7, wxjm...@gmail.com wrote:
> Very simple. Use Python and its (buggy) character encoding
> model.
>
> How to save memory?
> It's also very simple. Use a programming language, which
> handles Unicode correctly.
*looks at the other messages in this
On Fri, Mar 18, 2016, at 11:17, Ian Kelly wrote:
> > Just to play devil's advocate, here, why is it so bad for indexing to be
> > O(n)? Some simple caching is all that's needed to prevent it from making
> > iteration O(n^2), if that's what you're worried about.
>
> What kind of caching do you
On Fri, Mar 18, 2016 at 8:08 AM, Grant Edwards wrote:
> On 2016-03-17, Chris Angelico wrote:
>> On Fri, Mar 18, 2016 at 7:31 AM, wrote:
>>> Rick Johnson wrote:
In the event that i change my mind
On Sat, 19 Mar 2016 02:31 am, Random832 wrote:
> On Fri, Mar 18, 2016, at 11:17, Ian Kelly wrote:
>> > Just to play devil's advocate, here, why is it so bad for indexing to
>> > be O(n)? Some simple caching is all that's needed to prevent it from
>> > making iteration O(n^2), if that's what
On 3/18/2016 7:58 AM, Steven D'Aprano wrote:
On Fri, 18 Mar 2016 10:46 pm, Steven D'Aprano wrote:
I think it is typical of JMF that his idea of a language where Unicode
"just works" is one where it *does work at all* (at least not as strings).
Er, does NOT work at all.
Python 1.5 strings
On 17/03/2016 21:13, Chris Angelico wrote:
You can pretend that only 1 and 0 are enough. Good luck making THAT work.
ChrisA
The sales and marketing "thing", for lack of a better expression, that
was used in the UK by Racal Telecommunications during the 1990s. Well
I'm telling a fib, IIRC
On 18/03/2016 21:02, Marko Rauhamaa wrote:
Chris Angelico :
On Sat, Mar 19, 2016 at 2:26 AM, Marko Rauhamaa wrote:
It may be that Python's Unicode abstraction is an untenable illusion
because the underlying reality is 8-bit and there's no way to hide it
> > >>> starting with the best first. Thanks.
> > >>
> > >> How about a list of languages that Unicode handles better than
> > >> ASCII? Like almost every language *except* English.
> > >
> > > Like every language *including* English. You ca
Chris Angelico :
> On Sat, Mar 19, 2016 at 2:26 AM, Marko Rauhamaa wrote:
>> It may be that Python's Unicode abstraction is an untenable illusion
>> because the underlying reality is 8-bit and there's no way to hide it
>> completely.
>
> The underlying reality
Grant Edwards wrote:
> On 2016-03-17, Chris Angelico wrote:
> > On Fri, Mar 18, 2016 at 7:31 AM, wrote:
> >> Rick Johnson wrote:
> >>>
> >>> In the event that i change my mind about Unicode, and/or for
>
On Fri, Mar 18, 2016, at 15:46, Tim Golden wrote:
> Speaking for a moment as the list owner. Posts by this OP are usually
> blatant provocation and I usually filter them out before they hit the
> list. (They'll still appear if you're reading via Usenet). In this case
> I approved a post
On 2016-03-17, Chris Angelico wrote:
> On Fri, Mar 18, 2016 at 7:31 AM, wrote:
>> Rick Johnson wrote:
>>>
>>> In the event that i change my mind about Unicode, and/or for
>>> the sake of others, who may want to know, please provide
Chris Angelico :
> The problem is not Python's Unicode strings, then. The problem is the
> notion that path names are text. If they're text, they should be
> exclusively text (although, for low-level efficiency, they're more
> likely to be defined as "valid UTF-8 sequences"
On Thursday, March 17, 2016 at 9:34:46 AM UTC-5, wxjm...@gmail.com wrote:
> Very simple. Use Python and its (buggy) character encoding
> model. How to save memory? It's also very simple. Use a
> programming language, which handles Unicode correctly.
I personally don't have much use for Unicode,
On 17/03/2016 21:11, Marko Rauhamaa wrote:
Chris Angelico :
Like every language *including* English. You can pretend that ASCII is
enough, but you do lose some information.
Hold it, I'll quickly update my résumé before we resume the
conversation. What does this exposé
Chris Angelico :
> On Sat, Mar 19, 2016 at 6:49 PM, Marko Rauhamaa wrote:
>> Speaking of the low level, the classic UNIX file system doesn't make
>> use of pathnames. Rather, the files are nameless. They are identified
>> by the device (= file system) number
Michael Torrie :
> On 03/18/2016 02:26 AM, Jussi Piitulainen wrote:
>> I think Julia's way of dealing with its strings-as-UTF-8 [2] is more
>> promising. Indexing is by bytes (1-based in Julia) but the value at a
>> valid index is the whole UTF-8 character at that point, and an
Like almost every language *except* English.
> > > >
> > > > Like every language *including* English. You can pretend that
> > > > ASCII is enough, but you do lose some information.
> > > >
> > > > ChrisA
> > >
> > > as we
On Fri, Mar 18, 2016 at 8:26 AM, BartC wrote:
> On 17/03/2016 21:11, Marko Rauhamaa wrote:
>>
>> Chris Angelico :
>>
>>> Like every language *including* English. You can pretend that ASCII is
>>> enough, but you do lose some information.
>>
>>
>> Hold it, I'll
On Sun, Mar 20, 2016 at 3:12 AM, Marko Rauhamaa wrote:
> Steven D'Aprano :
>
>> On Sun, 20 Mar 2016 02:02 am, Marko Rauhamaa wrote:
>>> Yes, but UTF-16 produces 16-bit values that are outside Unicode.
>>
>> Show me.
>>
>> Before you answer, if your answer is
Steven D'Aprano :
> On Sun, 20 Mar 2016 02:02 am, Marko Rauhamaa wrote:
>> Yes, but UTF-16 produces 16-bit values that are outside Unicode.
>
> Show me.
>
> Before you answer, if your answer is "surrogate pairs", that is
> incorrect. Surrogate pairs is how UTF-16 encodes
On Sun, Mar 20, 2016 at 2:05 AM, Michael Torrie wrote:
> Of course not. Shells already associate specific meaning with certain
> characters that can be used in file names. For example the various
> quoting characters, such as ' or ". These can be used in file names but
> when
On Sun, 20 Mar 2016 02:02 am, Marko Rauhamaa wrote:
> Steven D'Aprano :
>
>> On Sat, 19 Mar 2016 08:31 pm, Marko Rauhamaa wrote:
>>
>>
>>>Using the surrogate mechanism, UTF-16 can support all 1,114,112
>>>potential Unicode characters.
>>>
>>> But Unicode doesn't
On 19/03/2016 15:14, BartC wrote:
Which is about 3000 decimal digits, slightly more than 1KB in packed
binary. In BCD it would be 1.5KB. At one-byte per digit (eg. ASCII) it's
3KB. At 4 bytes per (eg. UCS4), it's 12KB.
The comment refers to this which inexplicably got snipped (not my fault
On Sun, Mar 20, 2016 at 1:56 AM, Marko Rauhamaa wrote:
> Steven D'Aprano :
>
>> On Sat, 19 Mar 2016 11:42 pm, Marko Rauhamaa wrote:
>>> When glorifying Python's advanced Unicode capabilities, are we
>>> careful to emphasize the necessity of
On 19/03/2016 14:18, Steven D'Aprano wrote:
On Sat, 19 Mar 2016 11:24 pm, BartC wrote about combining characters:
And occupy somewhere between 50 and 200 bytes? Or is that 400?
OK...
You say that as if 400 bytes was a lot.
No, just unpredictable.
Besides, this is hardly any different
On 03/19/2016 02:38 AM, Steven D'Aprano wrote:
> On Sat, 19 Mar 2016 01:30 pm, Random832 wrote:
>
>> On Fri, Mar 18, 2016, at 20:55, Chris Angelico wrote:
>>> On Sat, Mar 19, 2016 at 9:03 AM, Marko Rauhamaa wrote:
Also, special-casing '\0' and '/' is
lame. Why can't I
Steven D'Aprano :
> On Sat, 19 Mar 2016 08:31 pm, Marko Rauhamaa wrote:
>
>
>>Using the surrogate mechanism, UTF-16 can support all 1,114,112
>>potential Unicode characters.
>>
>> But Unicode doesn't contain 1,114,112 characters—the surrogates are
>> excluded from
On Sat, 19 Mar 2016 08:31 pm, Marko Rauhamaa wrote:
>Using the surrogate mechanism, UTF-16 can support all 1,114,112
>potential Unicode characters.
>
> But Unicode doesn't contain 1,114,112 characters—the surrogates are
> excluded from Unicode, and definitely cannot be encoded using
>
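
A small Python 3 sketch of the surrogate point, for anyone who wants to try it: a surrogate pair is just how UTF-16 spells one code point above U+FFFF, while a lone surrogate is not something the UTF codecs will encode. The helper can_encode is a hypothetical name used only here.

```python
s = "\U0001F600"                 # one code point outside the BMP
units = s.encode("utf-16-be")    # four bytes: two 16-bit code units

hi = int.from_bytes(units[0:2], "big")   # high (lead) surrogate
lo = int.from_bytes(units[2:4], "big")   # low (trail) surrogate

# Undo the surrogate arithmetic to recover the original code point.
decoded = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)

def can_encode(text, encoding):
    """Return True if text encodes under the strict error handler."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

lone = "\ud800"                  # an unpaired surrogate code point

# 1,114,112 code points minus the 2,048 surrogates.
scalar_values = 0x110000 - (0xDFFF - 0xD800 + 1)
```

With CPython 3, can_encode(lone, "utf-8") and can_encode(lone, "utf-16") both come back False, which matches the claim that surrogates are excluded from what the encodings can represent.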
Steven D'Aprano :
> On Sat, 19 Mar 2016 11:42 pm, Marko Rauhamaa wrote:
>> When glorifying Python's advanced Unicode capabilities, are we
>> careful to emphasize the necessity of unicodedata.normalize()
>> everywhere? Should Python normalize strings unconditionally and
>>
On 2016-03-19 12:24, BartC wrote:
> So a string that looks like:
>
> "ññ"
>
> can have 2**50 different representations? And occupy somewhere
> between 50 and 200 bytes? Or is that 400?
And moreover, they're all distinct if you don't normalize
On Sat, 19 Mar 2016 11:42 pm, Marko Rauhamaa wrote:
> The problem is not theoretical. If I implement a web form and someone
> enters "Aña" as their name, how do I make sure queries find the name
> regardless of the unicode code point sequence? I have to normalize using
> unicodedata.normalize().
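
As a minimal sketch of the web-form case, assuming NFC is the form chosen for both storage and lookup (the thread does not say which form to use), the two spellings of the name compare unequal until they are normalized. The function name canonical is made up for the example.

```python
import unicodedata

composed = "A\u00f1a"      # ñ as the single code point U+00F1
decomposed = "An\u0303a"   # n followed by U+0303 COMBINING TILDE

def canonical(name):
    """Normalize a name to NFC before storing or comparing it."""
    return unicodedata.normalize("NFC", name)

raw_equal = composed == decomposed                           # False
normalized_equal = canonical(composed) == canonical(decomposed)  # True
```

The same call has to be applied on both sides, when the name is stored and when the query string comes in, otherwise the lookup can still miss.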
On Sat, 19 Mar 2016 11:24 pm, BartC wrote about combining characters:
> So a string that looks like:
>
> "ññ"
>
> can have 2**50 different representations?
Yes.
> And occupy somewhere between 50 and 200 bytes? Or is that 400?
The minimum
Steven D'Aprano writes:
> And I don't understand this meme that indexing strings is not
> important. Have people never (say) taken a slice of a string, or a
> look-ahead, or something similar?
>
> i = mystring.find(":")
> next_char = mystring[i+1]
The point is that O(1) indexing and slicing
On Sat, Mar 19, 2016 at 11:42 PM, Marko Rauhamaa wrote:
>> The problem is not so much the existence of combining characters, but that
>> *some* but not all accented characters are available in two forms: a
>> composed single code point, and a decomposed pair of code points.
>
>
On 2016-03-18, c...@isbd.net wrote:
> However I doubt it's still being used, a year or two after I wrote it
> we migrated to a Tektronix development system that ran Unix (wow!).
The PDP-11 one that ran TNIX (a thinly disguised port of v7)? Back in
the early 80's we used a couple
BartC :
> So a string that looks like:
>
> "ññ"
>
> can have 2**50 different representations? And occupy somewhere between
> 50 and 200 bytes? Or is that 400?
>
> OK...
You are on the right track!
Marko
Steven D'Aprano :
> As usual, Unicode problems are generally due to backwards
> compatibility. Blame the old legacy encodings, which invented the
> "dead keys" a.k.a. "combining character" technique. Of course, they
> had a reasonable excuse at the time, but Unicode's
On 19/03/2016 11:07, Marko Rauhamaa wrote:
Chris Angelico :
On Sat, Mar 19, 2016 at 8:31 PM, Marko Rauhamaa wrote:
Unicode made several (understandable but grave) mistakes along the way:
* normalization
Elaborate please? What's such a big mistake
On Fri, Mar 18, 2016, at 12:44, Steven D'Aprano wrote:
> And I don't understand this meme that indexing strings is not important.
> Have people never (say) taken a slice of a string, or a look-ahead, or
> something similar?
>
> i = mystring.find(":")
find is already O(N).
> next_char =
On Fri, 18 Mar 2016 06:00 pm, Ian Kelly wrote:
> On Thu, Mar 17, 2016 at 1:21 PM, Rick Johnson
> wrote:
>> In the event that i change my mind about Unicode, and/or for
>> the sake of others, who may want to know, please provide a
>> list of languages that *YOU*
Rick Johnson wrote:
>
> In the event that i change my mind about Unicode, and/or for
> the sake of others, who may want to know, please provide a
> list of languages that *YOU* think handle Unicode better than
> Python, starting with the best first. Thanks.
>
How
On Sat, 19 Mar 2016 09:18 pm, Chris Angelico wrote:
> On Sat, Mar 19, 2016 at 8:31 PM, Marko Rauhamaa wrote:
>> Unicode made several (understandable but grave) mistakes along the way:
>>
>>* normalization
>>
>
> Elaborate please? What's such a big mistake here?
As usual,
On Fri, Mar 18, 2016 at 7:31 AM, wrote:
> Rick Johnson wrote:
>>
>> In the event that i change my mind about Unicode, and/or for
>> the sake of others, who may want to know, please provide a
>> list of languages that *YOU* think handle Unicode better
every language *except* English.
>
> Like every language *including* English. You can pretend that ASCII is
> enough, but you do lose some information.
>
> ChrisA
as we all seem to have bitten the troll's thread
"how to waste computer memory"
give it to an delusion
On Fri, Mar 18, 2016, at 10:59, Michael Torrie wrote:
> This seems to me to be a leaky abstraction. Julia's approach is
> interesting, but it strikes me as somewhat broken as it pretends to do
> O(1) indexing, but in reality it's still O(n) because you still have to
> iterate through the bytes
Chris Angelico :
> On Sat, Mar 19, 2016 at 8:31 PM, Marko Rauhamaa wrote:
>> Unicode made several (understandable but grave) mistakes along the way:
>>
>>* normalization
>
> Elaborate please? What's such a big mistake here?
Unicode shouldn't have allowed
On Sat, Mar 19, 2016 at 8:02 AM, Marko Rauhamaa wrote:
> Chris Angelico :
>> On Sat, Mar 19, 2016 at 2:26 AM, Marko Rauhamaa wrote:
>>> It may be that Python's Unicode abstraction is an untenable illusion
>>> because the underlying reality is
On Sat, Mar 19, 2016 at 8:31 PM, Marko Rauhamaa wrote:
> Unicode made several (understandable but grave) mistakes along the way:
>
>* normalization
>
Elaborate please? What's such a big mistake here?
ChrisA
Chris Angelico :
> On Sat, Mar 19, 2016 at 7:22 PM, Marko Rauhamaa wrote:
>> Not all files have pathnames. Those that do have numerous pathnames. You
>> can't tell by looking at a file what pathnames, if any, it might have.
>> You need an exhaustive, recursive
On 19/03/2016 04:05, Ian Kelly wrote:
On Fri, Mar 18, 2016 at 3:19 PM, Mark Lawrence wrote:
I have no idea at what the above can mean, other than that you are agreeing
with the RUE.
Mark, are you aware that this is a rather classic ad hominem of guilt
by
Steven D'Aprano :
> One thing that NTFS gets right is that all path names are guaranteed
> to be well-formed, valid Unicode. I believe that they are stored in
> UTF-16, and unlike the ext file systems used on Linux, they are not
> arbitrary bytes.
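
To make the "arbitrary bytes" side concrete, here is a small sketch of how Python 3 can carry a non-UTF-8 file name through a str and back using the surrogateescape error handler. As far as I know this is the handler os.fsdecode() and os.fsencode() rely on for POSIX systems; the file name below is invented for the example.

```python
raw = b"caf\xe9.txt"   # Latin-1 bytes; not valid UTF-8

def decodes_strictly(data):
    """Return True if data is well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# The undecodable byte 0xE9 is smuggled through as the lone surrogate U+DCE9.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = name.encode("utf-8", "surrogateescape")
```

The resulting str round-trips, but it contains a lone surrogate, so it is not well-formed Unicode text and will fail if encoded strictly.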
On Fri, Mar 18, 2016 at 10:46 PM, Steven D'Aprano wrote:
> On Fri, 18 Mar 2016 06:00 pm, Ian Kelly wrote:
>
>> On Thu, Mar 17, 2016 at 1:21 PM, Rick Johnson
>> wrote:
>>> In the event that i change my mind about Unicode, and/or for
>>> the sake
On Sat, Mar 19, 2016 at 7:38 PM, Steven D'Aprano wrote:
> ls -l /home/user/documents/stuff/foo
>
>
> ls -l "home","user","documents","stuff","foo"
>
>
> I think users of command line tools and shells will hate you.
You misunderstand him. He doesn't want path names like that.
On Thursday, March 17, 2016 at 7:52:26 PM UTC-5, Gene Heskett wrote:
> So the obvious question then is, will any of your python code still be
> running and doing its labor saving and dead on the video frame timing
> job several times daily, 17 years hence?
Well, let me put it this way folks: As
On Sat, 19 Mar 2016 01:30 pm, Random832 wrote:
> On Fri, Mar 18, 2016, at 20:55, Chris Angelico wrote:
>> On Sat, Mar 19, 2016 at 9:03 AM, Marko Rauhamaa wrote:
>> > Also, special-casing '\0' and '/' is
>> > lame. Why can't I have "Results 1/2016" as a filename?
>>
>> Would
On Sat, Mar 19, 2016 at 7:22 PM, Marko Rauhamaa wrote:
> Not all files have pathnames. Those that do have numerous pathnames. You
> can't tell by looking at a file what pathnames, if any, it might have.
> You need an exhaustive, recursive search of the file system for the
>
On Fri, Mar 18, 2016 at 8:11 AM, Marko Rauhamaa wrote:
> Chris Angelico :
>
>> Like every language *including* English. You can pretend that ASCII is
>> enough, but you do lose some information.
>
> Hold it, I'll quickly update my résumé before we resume the
>
On Sat, Mar 19, 2016 at 8:28 AM, Marko Rauhamaa wrote:
> Chris Angelico :
>
>> The problem is not Python's Unicode strings, then. The problem is the
>> notion that path names are text. If they're text, they should be
>> exclusively text (although, for low-level
On Sat, 19 Mar 2016 08:08 am, Chris Angelico wrote:
> On Sat, Mar 19, 2016 at 8:02 AM, Marko Rauhamaa wrote:
>> Chris Angelico :
>>> On Sat, Mar 19, 2016 at 2:26 AM, Marko Rauhamaa
>>> wrote:
It may be that Python's Unicode abstraction
On Sat, Mar 19, 2016 at 6:49 PM, Marko Rauhamaa wrote:
> Speaking of the low level, the classic UNIX file system doesn't make use
> of pathnames. Rather, the files are nameless. They are identified by the
> device (= file system) number plus the inode number.
Not entirely fair.
Random832 :
> On Fri, Mar 18, 2016, at 20:55, Chris Angelico wrote:
>> On Sat, Mar 19, 2016 at 9:03 AM, Marko Rauhamaa wrote:
>> > Also, special-casing '\0' and '/' is
>> > lame. Why can't I have "Results 1/2016" as a filename?
>>
>> Would you be
On Sat, Mar 19, 2016 at 2:26 AM, Marko Rauhamaa wrote:
> Michael Torrie :
>
>> On 03/18/2016 02:26 AM, Jussi Piitulainen wrote:
>>> I think Julia's way of dealing with its strings-as-UTF-8 [2] is more
>>> promising. Indexing is by bytes (1-based in Julia) but
On Thu, Mar 17, 2016 at 1:21 PM, Rick Johnson
wrote:
> In the event that i change my mind about Unicode, and/or for
> the sake of others, who may want to know, please provide a
> list of languages that *YOU* think handle Unicode better than
> Python, starting with
On 3/18/2016 12:44 PM, Steven D'Aprano wrote:
Hmmm, well, nobody uses UCS-2 any more, since that only covers the first
65536 code points.
Unfortunately, tcl, or at least tk, still uses ucs-2. Hence tkinter and
applications thereof, like IDLE, can only display BMP code points. A
real
On Fri, Mar 18, 2016 at 6:37 AM, Chris Angelico wrote:
> On Fri, Mar 18, 2016 at 10:46 PM, Steven D'Aprano wrote:
>> Technically, UTF-8 doesn't *necessarily* imply indexing is O(n). For
>> instance, your UTF-8 string might consist of an array of bytes
>>> How about a list of languages that Unicode handles better than ASCII?
>>> Like almost every language *except* English.
>>
>> Like every language *including* English. You can pretend that ASCII is
>> enough, but you do lose some information.
>>
On Sat, 19 Mar 2016 02:26 am, Marko Rauhamaa wrote:
> Michael Torrie :
>
>> On 03/18/2016 02:26 AM, Jussi Piitulainen wrote:
>>> I think Julia's way of dealing with its strings-as-UTF-8 [2] is more
>>> promising. Indexing is by bytes (1-based in Julia) but the value at a
>>>
On 18/03/2016 18:18, sohcahto...@gmail.com wrote:
On Thursday, March 17, 2016 at 7:34:46 AM UTC-7, wxjm...@gmail.com
wrote:
Very simple. Use Python and its (buggy) character encoding model.
How to save memory? It's also very simple. Use a programming
language, which handles Unicode correctly.
> >> How about a list of languages that Unicode handles better than
> >> ASCII? Like almost every language *except* English.
> >
> > Like every language *including* English. You can pretend that ASCII
> > is enough, but you do lose some information.
> >
> > Chris
On 3/18/2016 11:26 AM, Marko Rauhamaa wrote:
There's no problem providing pure Unicode strings. Things get iffy when
Python's OS abstraction pretends sys.stdin is text or filenames are
strings.
On Windows, filenames are arrays of wide chars, not bytes, and are
better modeled as 3.x strings
On Fri, Mar 18, 2016, at 03:00, Ian Kelly wrote:
> jmf has been asked this before, and as I recall he seems to feel that
> UTF-8 should be used for all purposes, ignoring the limitations of
> that encoding such as that indexing becomes a O(n) operation.
Just to play devil's advocate, here, why is
On Sat, Mar 19, 2016 at 3:05 PM, Ian Kelly wrote:
> On Fri, Mar 18, 2016 at 3:19 PM, Mark Lawrence
> wrote:
>>
>> I have no idea at what the above can mean, other than that you are agreeing
>> with the RUE.
>
> Mark, are you aware that this is a
On Fri, Mar 18, 2016 at 3:19 PM, Mark Lawrence wrote:
>
> I have no idea at what the above can mean, other than that you are agreeing
> with the RUE.
Mark, are you aware that this is a rather classic ad hominem of guilt
by association? "I didn't pay any attention to your
On Sat, Mar 19, 2016 at 9:03 AM, Marko Rauhamaa wrote:
> Also, special-casing '\0' and '/' is
> lame. Why can't I have "Results 1/2016" as a filename?
Would you be allowed to have a directory named "Results 1" as well?
ChrisA
Ian Kelly writes:
> On Thu, Mar 17, 2016 at 1:21 PM, Rick Johnson
> wrote:
>> In the event that i change my mind about Unicode, and/or for
>> the sake of others, who may want to know, please provide a
>> list of languages that *YOU* think handle Unicode better than
On Fri, Mar 18, 2016 at 10:44 AM, Steven D'Aprano wrote:
> On Sat, 19 Mar 2016 02:31 am, Random832 wrote:
>
>> On Fri, Mar 18, 2016, at 11:17, Ian Kelly wrote:
>>> If the string is simple UCS-2, that's easy.
>
> Hmmm, well, nobody uses UCS-2 any more, since that only covers
On Fri, Mar 18, 2016 at 8:56 AM, Random832 wrote:
> On Fri, Mar 18, 2016, at 03:00, Ian Kelly wrote:
>> jmf has been asked this before, and as I recall he seems to feel that
>> UTF-8 should be used for all purposes, ignoring the limitations of
>> that encoding such as that
On 03/18/2016 02:26 AM, Jussi Piitulainen wrote:
> I think Julia's way of dealing with its strings-as-UTF-8 [2] is more
> promising. Indexing is by bytes (1-based in Julia) but the value at a
> valid index is the whole UTF-8 character at that point, and an invalid
> index raises an exception.
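
A rough Python imitation of the byte-indexing scheme described above, for comparison. It is only a sketch of the idea, not Julia's actual implementation: the class name ByteIndexedStr is hypothetical, and indexes here are 0-based where Julia's are 1-based.

```python
class ByteIndexedStr:
    """Toy string stored as UTF-8 and indexed by byte offset."""

    def __init__(self, text):
        self._data = text.encode("utf-8")

    def __len__(self):
        # Length in bytes, not in characters.
        return len(self._data)

    def __getitem__(self, i):
        data = self._data
        if not 0 <= i < len(data):
            raise IndexError("byte index %d out of range" % i)
        first = data[i]
        if first & 0xC0 == 0x80:
            # Continuation byte: the index points into the middle of a character.
            raise IndexError("byte index %d is not a character boundary" % i)
        if first < 0x80:
            width = 1
        elif first >> 5 == 0b110:
            width = 2
        elif first >> 4 == 0b1110:
            width = 3
        else:
            width = 4
        # O(1): decode only the bytes of the one character at this offset.
        return data[i:i + width].decode("utf-8")

s = ByteIndexedStr("a\u00f1b")   # bytes: 61 c3 b1 62
```

Here s[0] is "a", s[1] is "ñ", s[3] is "b", and s[2] raises IndexError because it lands on the second byte of "ñ". Fetching a character at a known valid offset is constant time; finding the offset of the n-th character still means walking the bytes.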
Chris Angelico :
> On Sat, Mar 19, 2016 at 8:28 AM, Marko Rauhamaa wrote:
>> The file system does not have a problem. Python has a problem because it
>> tries to present pathnames as Unicode strings, which isn't always
>> possible.
>
> But what does a file
On Fri, Mar 18, 2016, at 20:55, Chris Angelico wrote:
> On Sat, Mar 19, 2016 at 9:03 AM, Marko Rauhamaa wrote:
> > Also, special-casing '\0' and '/' is
> > lame. Why can't I have "Results 1/2016" as a filename?
>
> Would you be allowed to have a directory named "Results 1" as