In Python 2.x, is it possible to make unicode as default like in Python 3.x?

2011-06-08 Thread G00gle and Python Lover
Hello. I like almost everything in Python. Code shrinking, logic of processes, libraries, code design etc. But, we... - everybody knows that Python 2.x lacks unicode support. In Python 3.x, this has been fixed :) And I like 3.x more than 2.x But, still major applications haven't been ported

Re: In Python 2.x, is it possible to make unicode as default like in Python 3.x?

2011-06-08 Thread Benjamin Kaplan
...' # And finally, most annoying thing isn't it? Thanks... I think you're misunderstanding what Unicode support means. Python 2 does have unicode support, but it doesn't do Unicode by default. And a lack of Unicode by default does not mean ASCII either. There are two ways of looking at strings: as a sequence

Re: unicode by default

2011-05-14 Thread harrismh777
Terry Reedy wrote: Is there a unix linux package that can be installed that drops at least 'one' default standard font that will be able to render all or 'most' (whatever I mean by that) code points in unicode? Is this a Python issue at all? Easy, practical use of unicode is still a work in

Re: unicode by default

2011-05-14 Thread Nobody
On Fri, 13 May 2011 14:53:50 -0500, harrismh777 wrote: The unicode consortium is very careful to make sure that thousands of symbols have a unique code point (that's great !) but how do these thousands of symbols actually get displayed if there is no font consortium? Are there

Re: unicode by default

2011-05-14 Thread jmfauth
On 14 mai, 09:41, harrismh777 harrismh...@charter.net wrote: ... I'm getting much closer here, ... You should really understand, that Unicode is a domain per se. It is independent from any os's, programming languages or applications. It is up to these tools to be unicode compliant. Working

Re: unicode by default

2011-05-14 Thread Terry Reedy
On 5/14/2011 3:41 AM, harrismh777 wrote: Terry Reedy wrote: Easy, practical use of unicode is still a work in progress. Apparently... the good news for me is that SBL provides their unicode font here: http://www.sbl-site.org/educational/biblicalfonts.aspx I'm getting much closer here, but

Re: unicode by default

2011-05-14 Thread Ben Finney
Terry Reedy tjre...@udel.edu writes: You need what is called, at least with Windows, an IME -- Input Method Editor. For a GNOME or KDE environment you want an input method framework; I recommend IBus URL:http://code.google.com/p/ibus/ which comes with the major GNU+Linux operating systems

Re: unicode by default

2011-05-13 Thread jmfauth
On 12 mai, 18:17, Ian Kelly ian.g.ke...@gmail.com wrote: ... to worry about encodings are when you're encoding unicode characters to byte strings, or decoding bytes to unicode characters A small but important correction/clarification: In Unicode, unicode does not encode a *character*. It

Re: unicode by default

2011-05-13 Thread harrismh777
jmfauth wrote: to worry about encodings are when you're encoding unicode characters to byte strings, or decoding bytes to unicode characters A small but important correction/clarification: In Unicode, unicode does not encode a *character*. It encodes a *code point*, a number, the integer
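The character/code point distinction in this message can be seen directly in Python 3. The following is only a minimal sketch; the variable names are made up for illustration and do not come from the thread.

```python
# Hypothetical illustration: a character and its code point are related by ord() and chr().
pound = "\u00a3"              # the pound sign, written as a Unicode escape
code_point = ord(pound)       # the integer that Unicode assigns to this character
same_char = chr(code_point)   # going back from the integer to the character

print(hex(code_point), same_char)
```

No bytes are involved at this stage; an encoding only comes into play when the code point has to be written out as a byte sequence.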

Re: unicode by default

2011-05-13 Thread Robert Kern
On 5/13/11 2:53 PM, harrismh777 wrote: The unicode consortium is very careful to make sure that thousands of symbols have a unique code point (that's great !) but how do these thousands of symbols actually get displayed if there is no font consortium? Are there collections of 'standard' fonts

Re: unicode by default

2011-05-13 Thread Terry Reedy
On 5/13/2011 3:53 PM, harrismh777 wrote: The unicode consortium is very careful to make sure that thousands of symbols have a unique code point (that's great !) but how do these thousands of symbols actually get displayed if there is no font consortium? Are there collections of 'standard' fonts

Re: unicode by default

2011-05-12 Thread harrismh777
John Machin wrote: On Thu, May 12, 2011 2:14 pm, Benjamin Kaplan wrote: If the file you're writing to doesn't specify an encoding, Python will default to locale.getdefaultencoding(), No such attribute. Perhaps you mean locale.getpreferredencoding() import locale

Re: unicode by default

2011-05-12 Thread harrismh777
Ben Finney wrote: I'd phrase that as: * Text is a sequence of characters. Most inputs to the program, including files, sockets, etc., contain a sequence of bytes. * Always know whether you're dealing with text or with bytes. No object can be both. * In Python 2, ‘str’ is the type
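The text/bytes rule quoted here is often applied as "decode on input, work on text, encode on output". A minimal Python 3 sketch of that pattern follows; the helper `shout` and the sample data are hypothetical, and UTF-8 is assumed as the encoding only for the sake of the example.

```python
def shout(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: decode on the way in, encode on the way out."""
    text = raw.decode(encoding)    # bytes -> text at the input boundary
    text = text.upper()            # all processing happens on text (str)
    return text.encode(encoding)   # text -> bytes at the output boundary


incoming = b"caf\xc3\xa9"          # UTF-8 bytes for "café"
result = shout(incoming)
print(result)
```

The point of keeping the two types apart is that the program never has to guess, in the middle of its logic, whether a given object is text or bytes.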

Re: unicode by default

2011-05-12 Thread harrismh777
Terry Reedy wrote: It does not matter how Python stored the unicode internally. Does this help? Your intent is signalled by how you open the file. Very much, actually, thanks. I was missing the 'internal' piece, and did not realize that if I didn't specify the encoding on the open that

Re: unicode by default

2011-05-12 Thread John Machin
On Thu, May 12, 2011 4:31 pm, harrismh777 wrote: So, the UTF-16 UTF-32 is INTERNAL only, for Python NO. See one of my previous messages. UTF-16 and UTF-32, like UTF-8 are encodings for the EXTERNAL representation of Unicode characters in byte streams. I also was not aware that UTF-8 chars
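To illustrate the point that UTF-8, UTF-16 and UTF-32 are all external byte representations of the same code points, here is a small Python 3 sketch. The little-endian codec names are chosen only to avoid a byte-order mark in the output; the variable names are illustrative.

```python
pound = "\u00a3"  # one code point, U+00A3

# Encode the same one-character string with three different encodings.
encoded = {name: pound.encode(name) for name in ("utf-8", "utf-16-le", "utf-32-le")}

for name, data in encoded.items():
    print(name, len(data), data)
```

The string is the same in every case; only the bytes produced for the outside world differ.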

Re: unicode by default

2011-05-12 Thread TheSaint
John Machin wrote: On Thu, May 12, 2011 2:14 pm, Benjamin Kaplan wrote: If the file you're writing to doesn't specify an encoding, Python will default to locale.getdefaultencoding(), No such attribute. Perhaps you mean locale.getpreferredencoding() what about sys.getfilesystemencoding()
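The functions named in this exchange can be compared side by side. The sketch below is for Python 3 and only prints what the running interpreter reports; the values are platform-dependent, so no particular output is assumed. As far as I understand it, `locale.getpreferredencoding()` is what text-mode `open()` traditionally falls back to when no `encoding` is given, `sys.getdefaultencoding()` is the default for `str.encode()` and `bytes.decode()`, and `sys.getfilesystemencoding()` applies to file names rather than file contents.

```python
import locale
import sys

# Collect the three "default" encodings that tend to get confused with each other.
defaults = {
    "locale.getpreferredencoding": locale.getpreferredencoding(False),
    "sys.getdefaultencoding": sys.getdefaultencoding(),
    "sys.getfilesystemencoding": sys.getfilesystemencoding(),
}

for name, value in defaults.items():
    print(name, "->", value)
```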

Re: unicode by default

2011-05-12 Thread Ian Kelly
On Thu, May 12, 2011 at 1:58 AM, John Machin sjmac...@lexicon.net wrote: On Thu, May 12, 2011 4:31 pm, harrismh777 wrote: So, the UTF-16 UTF-32 is INTERNAL only, for Python NO. See one of my previous messages. UTF-16 and UTF-32, like UTF-8 are encodings for the EXTERNAL representation of

Re: unicode by default

2011-05-12 Thread Terry Reedy
On 5/12/2011 12:17 PM, Ian Kelly wrote: On Thu, May 12, 2011 at 1:58 AM, John Machin sjmac...@lexicon.net wrote: On Thu, May 12, 2011 4:31 pm, harrismh777 wrote: So, the UTF-16 UTF-32 is INTERNAL only, for Python NO. See one of my previous messages. UTF-16 and UTF-32, like UTF-8 are

Re: unicode by default

2011-05-12 Thread Ian Kelly
On Thu, May 12, 2011 at 2:42 PM, Terry Reedy tjre...@udel.edu wrote: On 5/12/2011 12:17 PM, Ian Kelly wrote: Right. *Under the hood* Python uses UCS-2 (which is not exactly the same thing as UTF-16, by the way) to represent Unicode strings. I know some people say that, but according to the

unicode by default

2011-05-11 Thread harrismh777
hi folks, I am puzzled by unicode generally, and within the context of python specifically. For one thing, what do we mean that unicode is used in python 3.x by default. (I know what default means, I mean, what changed?) I think part of my problem is that I'm spoiled (American, ascii

Re: unicode by default

2011-05-11 Thread Ian Kelly
that UCS-2 has always been the default unicode width for CPython, although the exact representation used internally is an implementation detail. The books say that the .py sources are UTF-8 by default... and that 3.x is either UCS-2 or UCS-4. If I use the file handling capabilities of Python

Re: unicode by default

2011-05-11 Thread Benjamin Kaplan
On Wed, May 11, 2011 at 2:37 PM, harrismh777 harrismh...@charter.net wrote: hi folks, I am puzzled by unicode generally, and within the context of python specifically. For one thing, what do we mean that unicode is used in python 3.x by default. (I know what default means, I mean, what

Re: unicode by default

2011-05-11 Thread harrismh777
Ian Kelly wrote: Ian, Benjamin, thanks much. The `unicode' class was renamed to `str', and a stripped-down version of the 2.X `str' class was renamed to `bytes'. ... thank you, this is very helpful. If I do not specify any code points above ascii 0xFF does any of this matter

Re: unicode by default

2011-05-11 Thread John Machin
On Thu, May 12, 2011 8:51 am, harrismh777 wrote: Is it true that if I am working without using bytes sequences that I will not need to care about the encoding anyway, unless of course I need to specify a unicode code point? Quite the contrary. (1) You cannot work without using bytes

Re: unicode by default

2011-05-11 Thread harrismh777
John Machin wrote: (1) You cannot work without using bytes sequences. Files are byte sequences. Web communication is in bytes. You need to (know / assume / be able to extract / guess) the input encoding. You need to encode your output using an encoding that is expected by the consumer (or use an

Re: unicode by default

2011-05-11 Thread MRAB
On 12/05/2011 02:22, harrismh777 wrote: John Machin wrote: (1) You cannot work without using bytes sequences. Files are byte sequences. Web communication is in bytes. You need to (know / assume / be able to extract / guess) the input encoding. You need to encode your output using an encoding

Re: unicode by default

2011-05-11 Thread Steven D'Aprano
On Thu, 12 May 2011 03:31:18 +0100, MRAB wrote: Another question... in mail I'm receiving many small blocks that look like sprites with four small hex codes, scattered about the mail... mostly punctuation, maybe? ... guessing, are these unicode code points, and if so what is the best way to

Re: unicode by default

2011-05-11 Thread harrismh777
Steven D'Aprano wrote: You need to understand the difference between characters and bytes. http://www.joelonsoftware.com/articles/Unicode.html is also a good resource. Thanks for being patient guys, here's what I've done: astr=pound sign asym= \u00A3 afile=open(myfile, mode='w')

Re: unicode by default

2011-05-11 Thread John Machin
On Thu, May 12, 2011 11:22 am, harrismh777 wrote: John Machin wrote: (1) You cannot work without using bytes sequences. Files are byte sequences. Web communication is in bytes. You need to (know / assume / be able to extract / guess) the input encoding. You need to encode your output using

Re: unicode by default

2011-05-11 Thread Ben Finney
MRAB pyt...@mrabarnett.plus.com writes: You need to understand the difference between characters and bytes. Yep. Those who don't need to join us in the third millennium, and the resources pointed out in this thread are good to help that. A string contains characters, a file contains bytes.

Re: unicode by default

2011-05-11 Thread Terry Reedy
On 5/11/2011 11:44 PM, harrismh777 wrote: Steven D'Aprano wrote: You need to understand the difference between characters and bytes. http://www.joelonsoftware.com/articles/Unicode.html is also a good resource. Thanks for being patient guys, here's what I've done: astr=pound sign asym=

Re: unicode by default

2011-05-11 Thread John Machin
On Thu, May 12, 2011 1:44 pm, harrismh777 wrote: By default it looks like Python3 is writing output with UTF-8 as default... and I thought that by default Python3 was using either UTF-16 or UTF-32. So, I'm confused here... also, I used the character sequence \u00A3 which I thought was
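The confusion described here (internal representation versus the bytes that end up in the file) goes away if the encoding is stated when the file is opened. A minimal Python 3 sketch, assuming a throwaway temporary directory and a made-up file name:

```python
import os
import tempfile

# Hypothetical file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "pound.txt")

# State the encoding explicitly instead of relying on the platform default.
with open(path, mode="w", encoding="utf-8") as f:
    f.write("\u00a3")

# Read the file back in binary mode to see the bytes that were actually written.
with open(path, mode="rb") as f:
    raw = f.read()

print(raw)
```

The escape `\u00A3` names a code point; which bytes land on disk is decided by the `encoding` argument, not by how the interpreter stores the string internally.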

Re: unicode by default

2011-05-11 Thread Benjamin Kaplan
On Wed, May 11, 2011 at 8:44 PM, harrismh777 harrismh...@charter.net wrote: Steven D'Aprano wrote: You need to understand the difference between characters and bytes. http://www.joelonsoftware.com/articles/Unicode.html is also a good resource. Thanks for being patient guys, here's what

Re: unicode by default

2011-05-11 Thread John Machin
On Thu, May 12, 2011 2:14 pm, Benjamin Kaplan wrote: If the file you're writing to doesn't specify an encoding, Python will default to locale.getdefaultencoding(), No such attribute. Perhaps you mean locale.getpreferredencoding() -- http://mail.python.org/mailman/listinfo/python-list

Re: Unicode again ... default codec ...

2009-10-30 Thread Gabriel Genellina
that dictionaries in Python are used almost everywhere, breaking this basic assumption is a really bad problem. Of course, all of this applies to Python 2.x; in Python 3.0 the problem was solved differently; strings are unicode by default, and the default encoding IS utf-8. As far as I've seen

Re: Unicode again ... default codec ...

2009-10-22 Thread Lele Gaifax
Gabriel Genellina gagsl-...@yahoo.com.ar writes: En Wed, 21 Oct 2009 06:24:55 -0300, Lele Gaifax l...@metapensiero.it escribió: Gabriel Genellina gagsl-...@yahoo.com.ar writes: nosetest should do nothing special. You should configure the environment so Python *knows* that your console

Re: Unicode again ... default codec ...

2009-10-22 Thread Gabriel Genellina
En Thu, 22 Oct 2009 05:25:16 -0300, Lele Gaifax l...@metapensiero.it escribió: Gabriel Genellina gagsl-...@yahoo.com.ar writes: En Wed, 21 Oct 2009 06:24:55 -0300, Lele Gaifax l...@metapensiero.it escribió: Gabriel Genellina gagsl-...@yahoo.com.ar writes: nosetest should do nothing

Re: Unicode again ... default codec ...

2009-10-22 Thread Lele Gaifax
Gabriel Genellina gagsl-...@yahoo.com.ar writes: En Thu, 22 Oct 2009 05:25:16 -0300, Lele Gaifax l...@metapensiero.it escribió: Who is the culprit here? unittest, or ultimately, this bug: http://bugs.python.org/issue4947 Thank you. In particular I found

Re: Unicode again ... default codec ...

2009-10-22 Thread Wolodja Wentland
On Thu, Oct 22, 2009 at 13:59 +0200, Lele Gaifax wrote: Gabriel Genellina gagsl-...@yahoo.com.ar writes: unittest, or ultimately, this bug: http://bugs.python.org/issue4947 http://bugs.python.org/issue4947#msg87637 as the best fit, I think You might also want to have a look at:

Re: Unicode again ... default codec ...

2009-10-21 Thread Lele Gaifax
Gabriel Genellina gagsl-...@yahoo.com.ar writes: DON'T do that. Really. Changing the default encoding is a horrible, horrible hack and causes a lot of problems. ... More reasons: http://tarekziade.wordpress.com/2008/01/08/syssetdefaultencoding-is-evil/ See also this recent thread in

Re: Unicode again ... default codec ...

2009-10-21 Thread Gabriel Genellina
En Wed, 21 Oct 2009 06:24:55 -0300, Lele Gaifax l...@metapensiero.it escribió: Gabriel Genellina gagsl-...@yahoo.com.ar writes: DON'T do that. Really. Changing the default encoding is a horrible, horrible hack and causes a lot of problems. ... More reasons:

Unicode again ... default codec ...

2009-10-20 Thread Stef Mientki
hello, As someone else already said, every time I think: now I understand it completely, and a few weeks later ... From the thread how to write a unicode string to a file ? and my specific situation: - reading data from Excel, Delphi and other Windows programs and unicode Python - using

Re: Unicode again ... default codec ...

2009-10-20 Thread Gabriel Genellina
En Tue, 20 Oct 2009 17:13:52 -0300, Stef Mientki stef.mien...@gmail.com escribió: From the thread how to write a unicode string to a file ? and my specific situation: - reading data from Excel, Delphi and other Windows programs and unicode Python - using wxPython, which forces unicode -

Re: why isn't Unicode the default encoding?

2006-03-21 Thread Jon Ribbens
In article [EMAIL PROTECTED], Martin v. Löwis wrote: In any case, it doesn't matter what encoding the document is in: read(2) always returns two bytes. It returns *up to* two bytes. Sorry to be picky but I think it's relevant to the topic because it illustrates how it's difficult to change the

why isn't Unicode the default encoding?

2006-03-20 Thread John Salerno
Forgive my newbieness, but I don't quite understand why Unicode is still something that needs special treatment in Python (and perhaps elsewhere). I'm reading Dive Into Python right now, and it constantly refers to a 'regular string' versus a 'Unicode string' and how you need to convert back

Re: why isn't Unicode the default encoding?

2006-03-20 Thread Robert Kern
or the encoding tricks? It would break a hell of a lot of code. Try using the -U command line argument to the Python interpreter. That makes unicode strings default. [~]$ python -U Python 2.4.1 (#2, Mar 31 2005, 00:05:10) [GCC 3.3 20030304 (Apple Computer, Inc. build 1666)] on darwin Type help, copyright

Re: why isn't Unicode the default encoding?

2006-03-20 Thread John Salerno
a hell of a lot of code. Try using the -U command line argument to the Python interpreter. That makes unicode strings default. I figured this might have something to do with it, but then again I thought that Unicode was created as a subset of ASCII and Latin-1 so that they would be compatible

Re: why isn't Unicode the default encoding?

2006-03-20 Thread Jan Niklas Fingerle
John Salerno [EMAIL PROTECTED] wrote: to convert back and forth. But why isn't Unicode considered a regular string by now? Is it for historical reasons that we still use ASCII and Latin-1? The point is, that, with a regular string, you don't know its encoding or whether it has an encoding

Re: why isn't Unicode the default encoding?

2006-03-20 Thread Robert Kern
either. Why can't Unicode replace them so we no longer need the 'u' prefix or the encoding tricks? It would break a hell of a lot of code. Try using the -U command line argument to the Python interpreter. That makes unicode strings default. I figured this might have something to do

Re: why isn't Unicode the default encoding?

2006-03-20 Thread Jan Niklas Fingerle
Robert Kern [EMAIL PROTECTED] wrote: I see UTF-8 a lot, but this particular book also mentions that UTF-16 is the most common. Is that true? I think it unlikely, but I have no numbers to give. And I'll bet that that book doesn't either. I haven't got any numbers, but my guess would be

Re: why isn't Unicode the default encoding?

2006-03-20 Thread John Salerno
Robert Kern wrote: I figured this might have something to do with it, but then again I thought that Unicode was created as a subset of ASCII and Latin-1 so that they would be compatible...but I guess it's never that easy. :) No, it isn't. You seem to be somewhat confused about Unicode. At

Re: why isn't Unicode the default encoding?

2006-03-20 Thread John Salerno
Robert Kern wrote: http://www.joelonsoftware.com/articles/Unicode.html That was fascinating. Thank you. So as it turns out, Unicode and UTF-8 are not the same thing? Am I right to say that UTF-8 stores the first 128 Unicode code points in a single byte, and then stores higher code points
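The variable-length behaviour asked about here is easy to check. The sketch below, in Python 3, encodes four sample characters chosen for illustration (they are not from the thread) and counts the UTF-8 bytes each one needs.

```python
# Sample characters: U+0041, U+00A3, U+20AC, U+1F600.
samples = ["A", "\u00a3", "\u20ac", "\U0001f600"]

# Number of bytes UTF-8 uses for each single code point.
lengths = [len(ch.encode("utf-8")) for ch in samples]

for ch, n in zip(samples, lengths):
    print("U+%04X" % ord(ch), n)
```

Code points below 128 take one byte and match ASCII byte for byte; higher code points take two, three or four bytes.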

Re: why isn't Unicode the default encoding?

2006-03-20 Thread Martin v. Löwis
John Salerno wrote: Robert Kern wrote: http://www.joelonsoftware.com/articles/Unicode.html That was fascinating. Thank you. So as it turns out, Unicode and UTF-8 are not the same thing? Am I right to say that UTF-8 stores the first 128 Unicode code points in a single byte, and then

Re: why isn't Unicode the default encoding?

2006-03-20 Thread Martin v. Löwis
I figured this might have something to do with it, but then again I thought that Unicode was created as a subset of ASCII and Latin-1 so that they would be compatible...but I guess it's never that easy. :) The real problem is that the Python string type is used to represent two very

Re: why isn't Unicode the default encoding?

2006-03-20 Thread John Salerno
Martin v. Löwis wrote: John Salerno wrote: Robert Kern wrote: http://www.joelonsoftware.com/articles/Unicode.html That was fascinating. Thank you. So as it turns out, Unicode and UTF-8 are not the same thing? Am I right to say that UTF-8 stores the first 128 Unicode code points in a

Re: why isn't Unicode the default encoding?

2006-03-20 Thread John Salerno
Martin v. Löwis wrote: The real problem is that the Python string type is used to represent two very different concepts: bytes, and characters. You can't just drop the current Python string type, and use the Unicode type instead - then you would have no good way to represent sequences of

Re: why isn't Unicode the default encoding?

2006-03-20 Thread and-google
John Salerno wrote: So as it turns out, Unicode and UTF-8 are not the same thing? Well yes. UTF-8 is one scheme in which the whole Unicode character repertoire can be represented as bytes. Confusion arises because Windows uses the name 'Unicode' in character encoding lists, to mean UTF-16_LE,

Re: why isn't Unicode the default encoding?

2006-03-20 Thread Matt Goodall
John Salerno wrote: Martin v. Löwis wrote: The real problem is that the Python string type is used to represent two very different concepts: bytes, and characters. You can't just drop the current Python string type, and use the Unicode type instead - then you would have no good way to

Re: why isn't Unicode the default encoding?

2006-03-20 Thread Martin v. Löwis
John Salerno wrote: Interesting. So then the read() method, if given a numeric argument for bytes to read, would act differently depending on if you were using Unicode or not? The read method currently returns a byte string, not a Unicode string. It's not clear to me how the numeric argument
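The message above describes Python 2, where `read()` returns a byte string. For comparison, here is a sketch of how the numeric argument behaves in Python 3, where it counts bytes on a binary stream and characters on a text stream. In-memory streams from the `io` module stand in for real files; the sample data is made up.

```python
import io

data = "\u00a35".encode("utf-8")   # pound sign followed by "5": three bytes in UTF-8

# Binary stream: read(2) returns at most two bytes.
as_bytes = io.BytesIO(data).read(2)

# Text stream over the same bytes: read(2) returns at most two characters.
as_text = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8").read(2)

print(as_bytes, as_text)
```

Here the two bytes read in binary mode make up only the pound sign, while the two characters read in text mode consume all three bytes.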