Re: How to waste computer memory?

2016-03-20 Thread Random832
On Sun, Mar 20, 2016, at 10:55, Ben Bacarisse wrote: > It's 21. The reason being (or at least part of the reason being) that > 21 bits can be UTF-8 encoded in 4 bytes: 11110xxx 10xxxxxx 10xxxxxx > 10xxxxxx (3 + 3*6). The reason is the UTF-16 limit. Prior to that, UTF-8 had no such limit (it

Re: How to waste computer memory?

2016-03-20 Thread Marko Rauhamaa
Ben Bacarisse : > It's 21. The reason being (or at least part of the reason being) that > 21 bits can be UTF-8 encoded in 4 bytes: 11110xxx 10xxxxxx 10xxxxxx > 10xxxxxx (3 + 3*6). I bet the reason is UTF-16. Microsoft and Sun/Oracle would have insisted on a maximum of 4

Re: How to waste computer memory?

2016-03-20 Thread Ben Bacarisse
Rustom Mody writes: > On Sunday, March 20, 2016 at 10:32:07 AM UTC+5:30, Steven D'Aprano wrote: >> Unicode (the character set part of it) is a set of abstract 23-bit numbers, > > 23? Or 21? It's 21. The reason being (or at least part of the reason being) that 21 bits
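
The "21 bits" figure above can be checked directly. The following is a minimal sketch, not part of the original thread; the variable names are illustrative only. It confirms that the largest code point, U+10FFFF, needs 21 bits and that its UTF-8 form is 4 bytes carrying 3 + 3*6 payload bits.

```python
# Largest Unicode code point (U+10FFFF) and its UTF-8 encoding.
MAX_CODE_POINT = 0x10FFFF

encoded = chr(MAX_CODE_POINT).encode("utf-8")
bits_needed = MAX_CODE_POINT.bit_length()

# A 4-byte UTF-8 sequence carries 3 payload bits in the lead byte
# (11110xxx) and 6 in each of the 3 continuation bytes (10xxxxxx).
payload_bits = 3 + 3 * 6

print(bits_needed)                           # 21
print(len(encoded))                          # 4
print([format(b, "08b") for b in encoded])   # bit pattern of each byte
```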

Re: How to waste computer memory?

2016-03-20 Thread Chris Angelico
On Sun, Mar 20, 2016 at 11:14 PM, Steven D'Aprano wrote: >>> On the other hand, I believe that the output of the UTF transformations >>> is explicitly described in terms of 8-bit bytes and 16- or 32-bit words. >>> For instance, the UTF-8 encoding of "A" has to be a single

Re: How to waste computer memory?

2016-03-20 Thread Steven D'Aprano
On Sun, 20 Mar 2016 10:22 pm, Chris Angelico wrote: > On Sun, Mar 20, 2016 at 10:06 PM, Steven D'Aprano > wrote: >> The Unicode standard does not, as far as I am aware, care how you >> represent code points in memory, only that there are 0x110000 of them, >> numbered from

Re: How to waste computer memory?

2016-03-20 Thread Chris Angelico
On Sun, Mar 20, 2016 at 10:06 PM, Steven D'Aprano wrote: > The Unicode standard does not, as far as I am aware, care how you represent > code points in memory, only that there are 0x110000 of them, numbered from > U+0000 to U+10FFFF. That's what I mean by abstract. The

Re: How to waste computer memory?

2016-03-20 Thread Marko Rauhamaa
Chris Angelico : > Like every language *including* English. You can pretend that ASCII is > enough, but you do lose some information. Hold it, I'll quickly update my résumé before we resume the conversation. What does this exposé expose? At least it gives a coup de grâce to

Re: How to waste computer memory?

2016-03-20 Thread Steven D'Aprano
On Sun, 20 Mar 2016 05:20 pm, Rustom Mody wrote: > On Sunday, March 20, 2016 at 10:32:07 AM UTC+5:30, Steven D'Aprano wrote: >> On Sun, 20 Mar 2016 03:12 am, Marko Rauhamaa wrote: >> >> > Steven D'Aprano : >> > >> >> On Sun, 20 Mar 2016 02:02 am, Marko Rauhamaa wrote: >> >>> Yes, but UTF-16

Re: How to waste computer memory?

2016-03-20 Thread Mark Lawrence
On 17/03/2016 21:26, BartC wrote: On 17/03/2016 21:11, Marko Rauhamaa wrote: Chris Angelico : Like every language *including* English. You can pretend that ASCII is enough, but you do lose some information. Hold it, I'll quickly update my résumé before we resume the

Re: How to waste computer memory?

2016-03-20 Thread Paul Rubin
Chris Angelico writes: > You can pretend that only 1 and 0 are enough. Good luck making THAT work. YOU had ONES??? Back in the day, my folks had to do everything with just zeros. -- https://mail.python.org/mailman/listinfo/python-list

Re: How to waste computer memory?

2016-03-20 Thread Steven D'Aprano
On Fri, 18 Mar 2016 10:46 pm, Steven D'Aprano wrote: > I think it is typical of JMF that his idea of a language where Unicode > "just works" is one where it *does work at all* (at least not as strings). Er, does NOT work at all. > Python 1.5 strings supported Unicode just as well as Go's string

Re: How to waste computer memory?

2016-03-20 Thread Marko Rauhamaa
Steven D'Aprano : > On Sun, 20 Mar 2016 03:12 am, Marko Rauhamaa wrote: >> Steven D'Aprano : >>> On Sun, 20 Mar 2016 02:02 am, Marko Rauhamaa wrote: Yes, but UTF-16 produces 16-bit values that are outside Unicode. >>> >>> Show me. >>> >>> Before you

Re: How to waste computer memory?

2016-03-20 Thread Rustom Mody
On Sunday, March 20, 2016 at 10:32:07 AM UTC+5:30, Steven D'Aprano wrote: > On Sun, 20 Mar 2016 03:12 am, Marko Rauhamaa wrote: > > > Steven D'Aprano : > > > >> On Sun, 20 Mar 2016 02:02 am, Marko Rauhamaa wrote: > >>> Yes, but UTF-16 produces 16-bit values that are outside Unicode. > >> > >>

Re: How to waste computer memory?

2016-03-19 Thread Steven D'Aprano
On Sun, 20 Mar 2016 03:12 am, Marko Rauhamaa wrote: > Steven D'Aprano : > >> On Sun, 20 Mar 2016 02:02 am, Marko Rauhamaa wrote: >>> Yes, but UTF-16 produces 16-bit values that are outside Unicode. >> >> Show me. >> >> Before you answer, if your answer is "surrogate pairs",

Re: How to waste computer memory?

2016-03-19 Thread sohcahtoa82
On Thursday, March 17, 2016 at 7:34:46 AM UTC-7, wxjm...@gmail.com wrote: > Very simple. Use Python and its (buggy) character encoding > model. > > How to save memory? > It's also very simple. Use a programming language, which > handles Unicode correctly. *looks at the other messages in this

Re: How to waste computer memory?

2016-03-19 Thread Random832
On Fri, Mar 18, 2016, at 11:17, Ian Kelly wrote: > > Just to play devil's advocate, here, why is it so bad for indexing to be > > O(n)? Some simple caching is all that's needed to prevent it from making > > iteration O(n^2), if that's what you're worried about. > > What kind of caching do you

Re: How to waste computer memory?

2016-03-19 Thread Chris Angelico
On Fri, Mar 18, 2016 at 8:08 AM, Grant Edwards wrote: > On 2016-03-17, Chris Angelico wrote: >> On Fri, Mar 18, 2016 at 7:31 AM, wrote: >>> Rick Johnson wrote: In the event that i change my mind

Re: How to waste computer memory?

2016-03-19 Thread Steven D'Aprano
On Sat, 19 Mar 2016 02:31 am, Random832 wrote: > On Fri, Mar 18, 2016, at 11:17, Ian Kelly wrote: >> > Just to play devil's advocate, here, why is it so bad for indexing to >> > be O(n)? Some simple caching is all that's needed to prevent it from >> > making iteration O(n^2), if that's what

Re: How to waste computer memory?

2016-03-19 Thread Terry Reedy
On 3/18/2016 7:58 AM, Steven D'Aprano wrote: On Fri, 18 Mar 2016 10:46 pm, Steven D'Aprano wrote: I think it is typical of JMF that his idea of a language where Unicode "just works" is one where it *does work at all* (at least not as strings). Er, does NOT work at all. Python 1.5 strings

Re: How to waste computer memory?

2016-03-19 Thread Mark Lawrence
On 17/03/2016 21:13, Chris Angelico wrote: You can pretend that only 1 and 0 are enough. Good luck making THAT work. ChrisA The sales and marketing "thing", for lack of a better expression, that was used in the UK by Racal Telecommunications during the 1990s. Well I'm telling a fib, IIRC

Re: How to waste computer memory?

2016-03-19 Thread Mark Lawrence
On 18/03/2016 21:02, Marko Rauhamaa wrote: Chris Angelico : On Sat, Mar 19, 2016 at 2:26 AM, Marko Rauhamaa wrote: It may be that Python's Unicode abstraction is an untenable illusion because the underlying reality is 8-bit and there's no way to hide it

Re: How to waste computer memory?

2016-03-19 Thread cl
>>> starting with the best first. Thanks. > > >> > > >> How about a list of languages that Unicode handles better than > > >> ASCII? Like almost every language *except* English. > > > > > > Like every language *including* English. You ca

Re: How to waste computer memory?

2016-03-19 Thread Marko Rauhamaa
Chris Angelico : > On Sat, Mar 19, 2016 at 2:26 AM, Marko Rauhamaa wrote: >> It may be that Python's Unicode abstraction is an untenable illusion >> because the underlying reality is 8-bit and there's no way to hide it >> completely. > > The underlying reality

Re: How to waste computer memory?

2016-03-19 Thread cl
Grant Edwards wrote: > On 2016-03-17, Chris Angelico wrote: > > On Fri, Mar 18, 2016 at 7:31 AM, wrote: > >> Rick Johnson wrote: > >>> > >>> In the event that i change my mind about Unicode, and/or for >

Usenet Message-ID (was Re: How to waste computer memory?)

2016-03-19 Thread Random832
On Fri, Mar 18, 2016, at 15:46, Tim Golden wrote: > Speaking for a moment as the list owner. Posts by this OP are usually > blatant provocation and I usually filter them out before they hit the > list. (They'll still appear if you're reading via Usenet). In this case > I approved a post

Re: How to waste computer memory?

2016-03-19 Thread Grant Edwards
On 2016-03-17, Chris Angelico wrote: > On Fri, Mar 18, 2016 at 7:31 AM, wrote: >> Rick Johnson wrote: >>> >>> In the event that i change my mind about Unicode, and/or for >>> the sake of others, who may want to know, please provide

Re: How to waste computer memory?

2016-03-19 Thread Marko Rauhamaa
Chris Angelico : > The problem is not Python's Unicode strings, then. The problem is the > notion that path names are text. If they're text, they should be > exclusively text (although, for low-level efficiency, they're more > likely to be defined as "valid UTF-8 sequences"

Re: How to waste computer memory?

2016-03-19 Thread Rick Johnson
On Thursday, March 17, 2016 at 9:34:46 AM UTC-5, wxjm...@gmail.com wrote: > Very simple. Use Python and its (buggy) character encoding > model. How to save memory? It's also very simple. Use a > programming language, which handles Unicode correctly. I personally don't have much use for Unicode,

Re: How to waste computer memory?

2016-03-19 Thread BartC
On 17/03/2016 21:11, Marko Rauhamaa wrote: Chris Angelico : Like every language *including* English. You can pretend that ASCII is enough, but you do lose some information. Hold it, I'll quickly update my résumé before we resume the conversation. What does this exposé

Re: How to waste computer memory?

2016-03-19 Thread Marko Rauhamaa
Chris Angelico : > On Sat, Mar 19, 2016 at 6:49 PM, Marko Rauhamaa wrote: >> Speaking of the low level, the classic UNIX file system doesn't make >> use of pathnames. Rather, the files are nameless. They are identified >> by the device (= file system) number

Re: How to waste computer memory?

2016-03-19 Thread Marko Rauhamaa
Michael Torrie : > On 03/18/2016 02:26 AM, Jussi Piitulainen wrote: >> I think Julia's way of dealing with its strings-as-UTF-8 [2] is more >> promising. Indexing is by bytes (1-based in Julia) but the value at a >> valid index is the whole UTF-8 character at that point, and an

Re: How to waste computer memory?

2016-03-19 Thread Gene Heskett
Like almost every language *except* English. > > > > > > > > Like every language *including* English. You can pretend that > > > > ASCII is enough, but you do lose some information. > > > > > > > > ChrisA > > > > > > as we

Re: How to waste computer memory?

2016-03-19 Thread Chris Angelico
On Fri, Mar 18, 2016 at 8:26 AM, BartC wrote: > On 17/03/2016 21:11, Marko Rauhamaa wrote: >> >> Chris Angelico : >> >>> Like every language *including* English. You can pretend that ASCII is >>> enough, but you do lose some information. >> >> >> Hold it, I'll

Re: How to waste computer memory?

2016-03-19 Thread Chris Angelico
On Sun, Mar 20, 2016 at 3:12 AM, Marko Rauhamaa wrote: > Steven D'Aprano : > >> On Sun, 20 Mar 2016 02:02 am, Marko Rauhamaa wrote: >>> Yes, but UTF-16 produces 16-bit values that are outside Unicode. >> >> Show me. >> >> Before you answer, if your answer is

Re: How to waste computer memory?

2016-03-19 Thread Marko Rauhamaa
Steven D'Aprano : > On Sun, 20 Mar 2016 02:02 am, Marko Rauhamaa wrote: >> Yes, but UTF-16 produces 16-bit values that are outside Unicode. > > Show me. > > Before you answer, if your answer is "surrogate pairs", that is > incorrect. Surrogate pairs is how UTF-16 encodes

Re: How to waste computer memory?

2016-03-19 Thread Chris Angelico
On Sun, Mar 20, 2016 at 2:05 AM, Michael Torrie wrote: > Of course not. Shells already associate specific meaning with certain > characters that can be used in file names. For example the various > quoting characters, such as ' or ". These can be used in file names but > when

Re: How to waste computer memory?

2016-03-19 Thread Steven D'Aprano
On Sun, 20 Mar 2016 02:02 am, Marko Rauhamaa wrote: > Steven D'Aprano : > >> On Sat, 19 Mar 2016 08:31 pm, Marko Rauhamaa wrote: >> >> >>>Using the surrogate mechanism, UTF-16 can support all 1,114,112 >>>potential Unicode characters. >>> >>> But Unicode doesn't

Re: How to waste computer memory?

2016-03-19 Thread BartC
On 19/03/2016 15:14, BartC wrote: Which is about 3000 decimal digits, slightly more than 1KB in packed binary. In BCD it would be 1.5KB. At one-byte per digit (eg. ASCII) it's 3KB. At 4 bytes per (eg. UCS4), it's 12KB. The comment refers to this which inexplicably got snipped (not my fault
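
The size estimates in the post above follow from simple arithmetic. This is a rough sketch that takes the 3000-digit figure from the post as given; the variable names are made up for illustration.

```python
import math

DIGITS = 3000  # number of decimal digits, as quoted in the post

# Each decimal digit carries log2(10) (about 3.32) bits of information.
packed_binary_bytes = math.ceil(DIGITS * math.log2(10) / 8)
bcd_bytes = DIGITS // 2    # BCD packs two digits into each byte
ascii_bytes = DIGITS       # one byte per digit
ucs4_bytes = DIGITS * 4    # four bytes per digit

print(packed_binary_bytes, bcd_bytes, ascii_bytes, ucs4_bytes)
# 1246 1500 3000 12000
```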

Re: How to waste computer memory?

2016-03-19 Thread Chris Angelico
On Sun, Mar 20, 2016 at 1:56 AM, Marko Rauhamaa wrote: > Steven D'Aprano : > >> On Sat, 19 Mar 2016 11:42 pm, Marko Rauhamaa wrote: >>> When glorifying Python's advanced Unicode capabilities, are we >>> careful to emphasize the necessity of

Re: How to waste computer memory?

2016-03-19 Thread BartC
On 19/03/2016 14:18, Steven D'Aprano wrote: On Sat, 19 Mar 2016 11:24 pm, BartC wrote about combining characters: And occupy somewhere between 50 and 200 bytes? Or is that 400? OK... You say that as if 400 bytes was a lot. No, just unpredictable. Besides, this is hardly any different

Re: How to waste computer memory?

2016-03-19 Thread Michael Torrie
On 03/19/2016 02:38 AM, Steven D'Aprano wrote: > On Sat, 19 Mar 2016 01:30 pm, Random832 wrote: > >> On Fri, Mar 18, 2016, at 20:55, Chris Angelico wrote: >>> On Sat, Mar 19, 2016 at 9:03 AM, Marko Rauhamaa wrote: Also, special-casing '\0' and '/' is lame. Why can't I

Re: How to waste computer memory?

2016-03-19 Thread Marko Rauhamaa
Steven D'Aprano : > On Sat, 19 Mar 2016 08:31 pm, Marko Rauhamaa wrote: > > >>Using the surrogate mechanism, UTF-16 can support all 1,114,112 >>potential Unicode characters. >> >> But Unicode doesn't contain 1,114,112 characters—the surrogates are >> excluded from

Re: How to waste computer memory?

2016-03-19 Thread Steven D'Aprano
On Sat, 19 Mar 2016 08:31 pm, Marko Rauhamaa wrote: >Using the surrogate mechanism, UTF-16 can support all 1,114,112 >potential Unicode characters. > > But Unicode doesn't contain 1,114,112 characters—the surrogates are > excluded from Unicode, and definitely cannot be encoded using >
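
The numbers being argued over here can be reproduced with a short sketch. It is an illustration only, using Python's built-in codecs: it counts the code points, subtracts the surrogate range, shows the surrogate pair UTF-16 produces for a code point beyond the BMP (U+1F600 is an arbitrary choice), and checks that a lone surrogate is rejected by the strict UTF-8 codec.

```python
# Code point space: U+0000 through U+10FFFF.
TOTAL_CODE_POINTS = 0x10FFFF + 1          # 1,114,112
SURROGATES = 0xDFFF - 0xD800 + 1          # 2,048 reserved for UTF-16
encodable = TOTAL_CODE_POINTS - SURROGATES

# UTF-16 encodes a code point beyond the BMP as two 16-bit units.
pair = "\U0001F600".encode("utf-16-be")
high = int.from_bytes(pair[:2], "big")
low = int.from_bytes(pair[2:], "big")

# A lone surrogate is not a valid character, so strict UTF-8 refuses it.
try:
    "\ud800".encode("utf-8")
    lone_surrogate_ok = True
except UnicodeEncodeError:
    lone_surrogate_ok = False

print(TOTAL_CODE_POINTS, encodable)       # 1114112 1112064
print(hex(high), hex(low))                # 0xd83d 0xde00
print(lone_surrogate_ok)                  # False
```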

Re: How to waste computer memory?

2016-03-19 Thread Marko Rauhamaa
Steven D'Aprano : > On Sat, 19 Mar 2016 11:42 pm, Marko Rauhamaa wrote: >> When glorifying Python's advanced Unicode capabilities, are we >> careful to emphasize the necessity of unicodedata.normalize() >> everywhere? Should Python normalize strings unconditionally and >>

Re: How to waste computer memory?

2016-03-19 Thread Tim Chase
On 2016-03-19 12:24, BartC wrote: > So a string that looks like: > > "ññ" > > can have 2**50 different representations? And occupy somewhere > between 50 and 200 bytes? Or is that 400? And moreover, they're all distinct if you don't normalize

Re: How to waste computer memory?

2016-03-19 Thread Steven D'Aprano
On Sat, 19 Mar 2016 11:42 pm, Marko Rauhamaa wrote: > The problem is not theoretical. If I implement a web form and someone > enters "Aña" as their name, how do I make sure queries find the name > regardless of the unicode code point sequence? I have to normalize using > unicodedata.normalize().
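
The web-form scenario above can be sketched as follows. The helper name `same_name` is hypothetical, and NFC is only one reasonable choice of normalization form; the point is that the two spellings compare unequal until both are normalized.

```python
import unicodedata

composed = "A\u00f1a"      # "Aña" with ñ as a single code point
decomposed = "An\u0303a"   # "Aña" as n followed by a combining tilde


def same_name(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)            # False: different code point sequences
print(same_name(composed, decomposed))   # True: equal once normalized
```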

Re: How to waste computer memory?

2016-03-19 Thread Steven D'Aprano
On Sat, 19 Mar 2016 11:24 pm, BartC wrote about combining characters: > So a string that looks like: > > "ññ" > > can have 2**50 different representations? Yes. > And occupy somewhere between 50 and 200 bytes? Or is that 400? The minimum

Re: How to waste computer memory?

2016-03-19 Thread Jussi Piitulainen
Steven D'Aprano writes: > And I don't understand this meme that indexing strings is not > important. Have people never (say) taken a slice of a string, or a > look-ahead, or something similar? > > i = mystring.find(":") > next_char = mystring[i+1] The point is that O(1) indexing and slicing
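
For reference, the look-ahead idiom quoted above can be written out as a small function. This is a sketch with a hypothetical helper name; it also guards the cases the two-line version glosses over, namely `find` returning -1 and the separator being the last character.

```python
from typing import Optional


def char_after(mystring: str, sep: str = ":") -> Optional[str]:
    """Return the character following the first `sep`, or None if there is none."""
    i = mystring.find(sep)
    if i == -1 or i + 1 >= len(mystring):
        # Without this check, i == -1 would silently index mystring[0].
        return None
    return mystring[i + 1]


print(char_after("key:value"))   # v
print(char_after("no colon"))    # None
```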

Re: How to waste computer memory?

2016-03-19 Thread Chris Angelico
On Sat, Mar 19, 2016 at 11:42 PM, Marko Rauhamaa wrote: >> The problem is not so much the existence of combining characters, but that >> *some* but not all accented characters are available in two forms: a >> composed single code point, and a decomposed pair of code points. > >

Re: How to waste computer memory?

2016-03-19 Thread Grant Edwards
On 2016-03-18, c...@isbd.net wrote: > However I doubt it's still being used, a year or two after I wrote it > we migrated to a Tektronix development system that ran Unix (wow!). The PDP-11 one that ran TNIX (a thinly disguised port of v7)? Back in the early 80's we used a couple

Re: How to waste computer memory?

2016-03-19 Thread Marko Rauhamaa
BartC : > So a string that looks like: > > "ññ" > > can have 2**50 different representations? And occupy somewhere between > 50 and 200 bytes? Or is that 400? > > OK... You are on the right track! Marko --

Re: How to waste computer memory?

2016-03-19 Thread Marko Rauhamaa
Steven D'Aprano : > As usual, Unicode problems are generally due to backwards > compatibility. Blame the old legacy encodings, which invented the > "dead keys" a.k.a. "combining character" technique. Of course, they > had a reasonable excuse at the time, but Unicode's

Re: How to waste computer memory?

2016-03-19 Thread BartC
On 19/03/2016 11:07, Marko Rauhamaa wrote: Chris Angelico : On Sat, Mar 19, 2016 at 8:31 PM, Marko Rauhamaa wrote: Unicode made several (understandable but grave) mistakes along the way: * normalization Elaborate please? What's such a big mistake

Re: How to waste computer memory?

2016-03-19 Thread Random832
On Fri, Mar 18, 2016, at 12:44, Steven D'Aprano wrote: > And I don't understand this meme that indexing strings is not important. > Have people never (say) taken a slice of a string, or a look-ahead, or > something similar? > > i = mystring.find(":") find is already O(N). > next_char =

Re: How to waste computer memory?

2016-03-19 Thread Steven D'Aprano
On Fri, 18 Mar 2016 06:00 pm, Ian Kelly wrote: > On Thu, Mar 17, 2016 at 1:21 PM, Rick Johnson > wrote: >> In the event that i change my mind about Unicode, and/or for >> the sake of others, who may want to know, please provide a >> list of languages that *YOU*

Re: How to waste computer memory?

2016-03-19 Thread cl
Rick Johnson wrote: > > In the event that i change my mind about Unicode, and/or for > the sake of others, who may want to know, please provide a > list of languages that *YOU* think handle Unicode better than > Python, starting with the best first. Thanks. > How

Re: How to waste computer memory?

2016-03-19 Thread Steven D'Aprano
On Sat, 19 Mar 2016 09:18 pm, Chris Angelico wrote: > On Sat, Mar 19, 2016 at 8:31 PM, Marko Rauhamaa wrote: >> Unicode made several (understandable but grave) mistakes along the way: >> >>* normalization >> > > Elaborate please? What's such a big mistake here? As usual,

Re: How to waste computer memory?

2016-03-19 Thread Chris Angelico
On Fri, Mar 18, 2016 at 7:31 AM, wrote: > Rick Johnson wrote: >> >> In the event that i change my mind about Unicode, and/or for >> the sake of others, who may want to know, please provide a >> list of languages that *YOU* think handle Unicode better

Re: How to waste computer memory?

2016-03-19 Thread alister
every language *except* English. > > Like every language *including* English. You can pretend that ASCII is > enough, but you do lose some information. > > ChrisA as we all seem to have bitten the troll's thread "how to waste computer memory" give it to an delusion

Re: How to waste computer memory?

2016-03-19 Thread Random832
On Fri, Mar 18, 2016, at 10:59, Michael Torrie wrote: > This seems to me to be a leaky abstraction. Julia's approach is > interesting, but it strikes me as somewhat broken as it pretends to do > O(1) indexing, but in reality it's still O(n) because you still have to > iterate through the bytes

Re: How to waste computer memory?

2016-03-19 Thread Marko Rauhamaa
Chris Angelico : > On Sat, Mar 19, 2016 at 8:31 PM, Marko Rauhamaa wrote: >> Unicode made several (understandable but grave) mistakes along the way: >> >>* normalization > > Elaborate please? What's such a big mistake here? Unicode shouldn't have allowed

Re: How to waste computer memory?

2016-03-19 Thread Chris Angelico
On Sat, Mar 19, 2016 at 8:02 AM, Marko Rauhamaa wrote: > Chris Angelico : >> On Sat, Mar 19, 2016 at 2:26 AM, Marko Rauhamaa wrote: >>> It may be that Python's Unicode abstraction is an untenable illusion >>> because the underlying reality is

Re: How to waste computer memory?

2016-03-19 Thread Chris Angelico
On Sat, Mar 19, 2016 at 8:31 PM, Marko Rauhamaa wrote: > Unicode made several (understandable but grave) mistakes along the way: > >* normalization > Elaborate please? What's such a big mistake here? ChrisA

Re: How to waste computer memory?

2016-03-19 Thread Marko Rauhamaa
Chris Angelico : > On Sat, Mar 19, 2016 at 7:22 PM, Marko Rauhamaa wrote: >> Not all files have pathnames. Those that do have numerous pathnames. You >> can't tell by looking at a file what pathnames, if any, it might have. >> You need an exhaustive, recursive

Re: How to waste computer memory?

2016-03-19 Thread Mark Lawrence
On 19/03/2016 04:05, Ian Kelly wrote: On Fri, Mar 18, 2016 at 3:19 PM, Mark Lawrence wrote: I have no idea at what the above can mean, other than that you are agreeing with the RUE. Mark, are you aware that this is a rather classic ad hominem of guilt by

Re: How to waste computer memory?

2016-03-19 Thread Marko Rauhamaa
Steven D'Aprano : > One thing that NTFS gets right is that all path names are guaranteed > to be well-formed, valid Unicode. I believe that they are stored in > UTF-16, and unlike the ext file systems used on Linux, they are not > arbitrary bytes.

Re: How to waste computer memory?

2016-03-19 Thread Chris Angelico
On Fri, Mar 18, 2016 at 10:46 PM, Steven D'Aprano wrote: > On Fri, 18 Mar 2016 06:00 pm, Ian Kelly wrote: > >> On Thu, Mar 17, 2016 at 1:21 PM, Rick Johnson >> wrote: >>> In the event that i change my mind about Unicode, and/or for >>> the sake

Re: How to waste computer memory?

2016-03-19 Thread Chris Angelico
On Sat, Mar 19, 2016 at 7:38 PM, Steven D'Aprano wrote: > ls -l /home/user/documents/stuff/foo > > > ls -l "home","user","documents","stuff","foo" > > > I think users of command line tools and shells will hate you. You misunderstand him. He doesn't want path names like that.

Re: How to waste computer memory?

2016-03-19 Thread Rick Johnson
On Thursday, March 17, 2016 at 7:52:26 PM UTC-5, Gene Heskett wrote: > So the obvious question then is, will any of your python code still be > running and doing its labor saving and dead on the video frame timing > job several times daily, 17 years hence? Well, let me put it this way folks: As

Re: How to waste computer memory?

2016-03-19 Thread Steven D'Aprano
On Sat, 19 Mar 2016 01:30 pm, Random832 wrote: > On Fri, Mar 18, 2016, at 20:55, Chris Angelico wrote: >> On Sat, Mar 19, 2016 at 9:03 AM, Marko Rauhamaa wrote: >> > Also, special-casing '\0' and '/' is >> > lame. Why can't I have "Results 1/2016" as a filename? >> >> Would

Re: How to waste computer memory?

2016-03-19 Thread Chris Angelico
On Sat, Mar 19, 2016 at 7:22 PM, Marko Rauhamaa wrote: > Not all files have pathnames. Those that do have numerous pathnames. You > can't tell by looking at a file what pathnames, if any, it might have. > You need an exhaustive, recursive search of the file system for the >

Re: How to waste computer memory?

2016-03-19 Thread Chris Angelico
On Fri, Mar 18, 2016 at 8:11 AM, Marko Rauhamaa wrote: > Chris Angelico : > >> Like every language *including* English. You can pretend that ASCII is >> enough, but you do lose some information. > > Hold it, I'll quickly update my résumé before we resume the >

Re: How to waste computer memory?

2016-03-19 Thread Chris Angelico
On Sat, Mar 19, 2016 at 8:28 AM, Marko Rauhamaa wrote: > Chris Angelico : > >> The problem is not Python's Unicode strings, then. The problem is the >> notion that path names are text. If they're text, they should be >> exclusively text (although, for low-level

Re: How to waste computer memory?

2016-03-19 Thread Steven D'Aprano
On Sat, 19 Mar 2016 08:08 am, Chris Angelico wrote: > On Sat, Mar 19, 2016 at 8:02 AM, Marko Rauhamaa wrote: >> Chris Angelico : >>> On Sat, Mar 19, 2016 at 2:26 AM, Marko Rauhamaa >>> wrote: It may be that Python's Unicode abstraction

Re: How to waste computer memory?

2016-03-19 Thread Chris Angelico
On Sat, Mar 19, 2016 at 6:49 PM, Marko Rauhamaa wrote: > Speaking of the low level, the classic UNIX file system doesn't make use > of pathnames. Rather, the files are nameless. They are identified by the > device (= file system) number plus the inode number. Not entirely fair.

Re: How to waste computer memory?

2016-03-19 Thread Marko Rauhamaa
Random832 : > On Fri, Mar 18, 2016, at 20:55, Chris Angelico wrote: >> On Sat, Mar 19, 2016 at 9:03 AM, Marko Rauhamaa wrote: >> > Also, special-casing '\0' and '/' is >> > lame. Why can't I have "Results 1/2016" as a filename? >> >> Would you be

Re: How to waste computer memory?

2016-03-19 Thread Chris Angelico
On Sat, Mar 19, 2016 at 2:26 AM, Marko Rauhamaa wrote: > Michael Torrie : > >> On 03/18/2016 02:26 AM, Jussi Piitulainen wrote: >>> I think Julia's way of dealing with its strings-as-UTF-8 [2] is more >>> promising. Indexing is by bytes (1-based in Julia) but

Re: How to waste computer memory?

2016-03-19 Thread Ian Kelly
On Thu, Mar 17, 2016 at 1:21 PM, Rick Johnson wrote: > In the event that i change my mind about Unicode, and/or for > the sake of others, who may want to know, please provide a > list of languages that *YOU* think handle Unicode better than > Python, starting with

Re: How to waste computer memory?

2016-03-19 Thread Terry Reedy
On 3/18/2016 12:44 PM, Steven D'Aprano wrote: Hmmm, well, nobody uses UCS-2 any more, since that only covers the first 65536 code points. Unfortunately, tcl, or at least tk, still uses ucs-2. Hence tkinter and applications thereof, like IDLE, can only display BMP code points. A real
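
A program that has to work within the limitation described above can at least detect the affected characters. A minimal sketch follows; the function name is hypothetical, and it simply reports characters whose code point lies above U+FFFF, that is, outside the BMP.

```python
def outside_bmp(text: str) -> list:
    """Return the characters in `text` whose code points lie beyond U+FFFF."""
    return [c for c in text if ord(c) > 0xFFFF]


print(outside_bmp("r\u00e9sum\u00e9"))         # []
print(len(outside_bmp("smile \U0001F600")))    # 1
```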

Re: How to waste computer memory?

2016-03-19 Thread Ian Kelly
On Fri, Mar 18, 2016 at 6:37 AM, Chris Angelico wrote: > On Fri, Mar 18, 2016 at 10:46 PM, Steven D'Aprano wrote: >> Technically, UTF-8 doesn't *necessarily* imply indexing is O(n). For >> instance, your UTF-8 string might consist of an array of bytes
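
The quoted sentence is cut off, so the following is only one possible reading of it and not necessarily what the poster had in mind: keep the UTF-8 bytes and, next to them, a table of the byte offsets at which each code point starts. The class name `IndexedUtf8` is hypothetical. Indexing by code point then costs one table lookup, at the price of extra memory for the table.

```python
class IndexedUtf8:
    """UTF-8 bytes plus a table of code point start offsets (a sketch)."""

    def __init__(self, text: str):
        self.data = text.encode("utf-8")
        # A byte starts a code point unless it is a continuation byte (10xxxxxx).
        self.offsets = [i for i, b in enumerate(self.data) if b & 0xC0 != 0x80]

    def __len__(self) -> int:
        return len(self.offsets)

    def __getitem__(self, n: int) -> str:
        if n < 0:
            n += len(self.offsets)
        if not 0 <= n < len(self.offsets):
            raise IndexError("code point index out of range")
        start = self.offsets[n]
        end = self.offsets[n + 1] if n + 1 < len(self.offsets) else len(self.data)
        return self.data[start:end].decode("utf-8")


s = IndexedUtf8("a\u00f1\u20acb")   # 1-, 2-, 3- and 1-byte characters
print(len(s.data), len(s))          # 7 4
print(s[2] == "\u20ac")             # True
```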

Re: How to waste computer memory?

2016-03-19 Thread alister
>>> How about a list of languages that Unicode handles better than ASCII? >>> Like almost every language *except* English. >> >> Like every language *including* English. You can pretend that ASCII is >> enough, but you do lose some information. >>

Re: How to waste computer memory?

2016-03-19 Thread Steven D'Aprano
On Sat, 19 Mar 2016 02:26 am, Marko Rauhamaa wrote: > Michael Torrie : > >> On 03/18/2016 02:26 AM, Jussi Piitulainen wrote: >>> I think Julia's way of dealing with its strings-as-UTF-8 [2] is more >>> promising. Indexing is by bytes (1-based in Julia) but the value at a >>>

Re: How to waste computer memory?

2016-03-19 Thread Tim Golden
On 18/03/2016 18:18, sohcahto...@gmail.com wrote: On Thursday, March 17, 2016 at 7:34:46 AM UTC-7, wxjm...@gmail.com wrote: Very simple. Use Python and its (buggy) character encoding model. How to save memory? It's also very simple. Use a programming language, which handles Unicode correctly.

Re: How to waste computer memory?

2016-03-19 Thread Gene Heskett
bout a list of languages that Unicode handles better than > >> ASCII? Like almost every language *except* English. > > > > Like every language *including* English. You can pretend that ASCII > > is enough, but you do lose some information. > > > > Chris

Re: How to waste computer memory?

2016-03-18 Thread Terry Reedy
On 3/18/2016 11:26 AM, Marko Rauhamaa wrote: There's no problem providing pure Unicode strings. Things get iffy when Python's OS abstraction pretends sys.stdin is text or filenames are strings. On Windows, filenames are arrays of wide chars, not bytes, and are better modeled as 3.x strings

Re: How to waste computer memory?

2016-03-18 Thread Random832
On Fri, Mar 18, 2016, at 03:00, Ian Kelly wrote: > jmf has been asked this before, and as I recall he seems to feel that > UTF-8 should be used for all purposes, ignoring the limitations of > that encoding such as that indexing becomes a O(n) operation. Just to play devil's advocate, here, why is

Re: How to waste computer memory?

2016-03-18 Thread Chris Angelico
On Sat, Mar 19, 2016 at 3:05 PM, Ian Kelly wrote: > On Fri, Mar 18, 2016 at 3:19 PM, Mark Lawrence > wrote: >> >> I have no idea at what the above can mean, other than that you are agreeing >> with the RUE. > > Mark, are you aware that this is a

Re: How to waste computer memory?

2016-03-18 Thread Ian Kelly
On Fri, Mar 18, 2016 at 3:19 PM, Mark Lawrence wrote: > > I have no idea at what the above can mean, other than that you are agreeing > with the RUE. Mark, are you aware that this is a rather classic ad hominem of guilt by association? "I didn't pay any attention to your

Re: How to waste computer memory?

2016-03-18 Thread Chris Angelico
On Sat, Mar 19, 2016 at 9:03 AM, Marko Rauhamaa wrote: > Also, special-casing '\0' and '/' is > lame. Why can't I have "Results 1/2016" as a filename? Would you be allowed to have a directory named "Results 1" as well? ChrisA --

Re: How to waste computer memory?

2016-03-18 Thread Jussi Piitulainen
Ian Kelly writes: > On Thu, Mar 17, 2016 at 1:21 PM, Rick Johnson > wrote: >> In the event that i change my mind about Unicode, and/or for >> the sake of others, who may want to know, please provide a >> list of languages that *YOU* think handle Unicode better than

Re: How to waste computer memory?

2016-03-18 Thread Ian Kelly
On Fri, Mar 18, 2016 at 10:44 AM, Steven D'Aprano wrote: > On Sat, 19 Mar 2016 02:31 am, Random832 wrote: > >> On Fri, Mar 18, 2016, at 11:17, Ian Kelly wrote: >>> If the string is simple UCS-2, that's easy. > > Hmmm, well, nobody uses UCS-2 any more, since that only covers

Re: How to waste computer memory?

2016-03-18 Thread Ian Kelly
On Fri, Mar 18, 2016 at 8:56 AM, Random832 wrote: > On Fri, Mar 18, 2016, at 03:00, Ian Kelly wrote: >> jmf has been asked this before, and as I recall he seems to feel that >> UTF-8 should be used for all purposes, ignoring the limitations of >> that encoding such as that

Re: How to waste computer memory?

2016-03-18 Thread Michael Torrie
On 03/18/2016 02:26 AM, Jussi Piitulainen wrote: > I think Julia's way of dealing with its strings-as-UTF-8 [2] is more > promising. Indexing is by bytes (1-based in Julia) but the value at a > valid index is the whole UTF-8 character at that point, and an invalid > index raises an exception.
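
The indexing scheme described above can be imitated in Python over a `bytes` object. This is a sketch of the behaviour as the post describes it, not Julia's actual implementation; the function name `char_at` is hypothetical, and the index here is 0-based where the post says Julia's is 1-based.

```python
def char_at(data: bytes, index: int) -> str:
    """Return the whole UTF-8 character that starts at byte offset `index`."""
    if not 0 <= index < len(data):
        raise IndexError("byte index out of range")
    first = data[index]
    if first & 0xC0 == 0x80:
        # Continuation byte: the index points into the middle of a character.
        raise ValueError("invalid index: not the start of a character")
    if first < 0x80:
        length = 1
    elif first >> 5 == 0b110:
        length = 2
    elif first >> 4 == 0b1110:
        length = 3
    elif first >> 3 == 0b11110:
        length = 4
    else:
        raise ValueError("invalid UTF-8 lead byte")
    return data[index:index + length].decode("utf-8")


data = "a\u00f1b".encode("utf-8")      # b'a\xc3\xb1b'
print(char_at(data, 1) == "\u00f1")    # True: index 1 starts the two-byte character
print(char_at(data, 3))                # b
```

Calling `char_at(data, 2)` raises `ValueError`, because offset 2 is the second byte of the two-byte character.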

Re: How to waste computer memory?

2016-03-18 Thread Marko Rauhamaa
Chris Angelico : > On Sat, Mar 19, 2016 at 8:28 AM, Marko Rauhamaa wrote: >> The file system does not have a problem. Python has a problem because it >> tries to present pathnames as Unicode strings, which isn't always >> possible. > > But what does a file
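
The mismatch described above is easy to demonstrate with a byte string that is not valid UTF-8. The sketch below uses an invented file name. It shows strict decoding failing and then the `surrogateescape` error handler, the mechanism `os.fsdecode` relies on for POSIX file names, mapping the offending byte to a lone surrogate so the original bytes can be recovered.

```python
raw_name = b"caf\xe9.txt"   # Latin-1 bytes; not valid UTF-8 (hypothetical name)

try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN.
as_text = raw_name.decode("utf-8", errors="surrogateescape")
round_trip = as_text.encode("utf-8", errors="surrogateescape")

print(strict_ok)                # False
print(ascii(as_text))           # 'caf\udce9.txt'
print(round_trip == raw_name)   # True
```

The resulting string holds a code point that is not a valid character, which is the sense in which a path name is not always text.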

Re: How to waste computer memory?

2016-03-18 Thread Random832
On Fri, Mar 18, 2016, at 20:55, Chris Angelico wrote: > On Sat, Mar 19, 2016 at 9:03 AM, Marko Rauhamaa wrote: > > Also, special-casing '\0' and '/' is > > lame. Why can't I have "Results 1/2016" as a filename? > > Would you be allowed to have a directory named "Results 1" as