Hello.
I like almost everything in Python: code shrinking, the logic of processes,
libraries, code design, etc.
But, we... - everybody knows that Python 2.x lacks Unicode support.
In Python 3.x, this has been fixed :) And I like 3.x more than 2.x.
But still, major applications haven't been ported
...
# And finally, most annoying thing isn't it?
Thanks...
I think you're misunderstanding what Unicode support means. Python 2
does have unicode support, but it doesn't do Unicode by default. And a
lack of Unicode by default does not mean ASCII either.
There are two ways of looking at strings: as a sequence
Terry Reedy wrote:
Is there a unix linux package that can be installed that
drops at least 'one' default standard font that will be able to render
all or 'most' (whatever I mean by that) code points in unicode? Is this
a Python issue at all?
Easy, practical use of unicode is still a work in
On Fri, 13 May 2011 14:53:50 -0500, harrismh777 wrote:
The unicode consortium is very careful to make sure that thousands
of symbols have a unique code point (that's great !) but how do these
thousands of symbols actually get displayed if there is no font
consortium? Are there
On 14 May, 09:41, harrismh777 harrismh...@charter.net wrote:
...
I'm getting much closer here,
...
You should really understand, that Unicode is a domain per
se. It is independent from any os's, programming languages
or applications. It is up to these tools to be unicode
compliant.
Working
On 5/14/2011 3:41 AM, harrismh777 wrote:
Terry Reedy wrote:
Easy, practical use of unicode is still a work in progress.
Apparently... the good news for me is that SBL provides their unicode
font here:
http://www.sbl-site.org/educational/biblicalfonts.aspx
I'm getting much closer here, but
Terry Reedy tjre...@udel.edu writes:
You need what is called, at least with Windows, an IME -- Input Method
Editor.
For a GNOME or KDE environment you want an input method framework; I
recommend IBus <URL:http://code.google.com/p/ibus/> which comes with the
major GNU+Linux operating systems
On 12 May, 18:17, Ian Kelly ian.g.ke...@gmail.com wrote:
...
to worry about encodings are when you're encoding unicode characters
to byte strings, or decoding bytes to unicode characters
A small but important correction/clarification:
In Unicode, unicode does not encode a *character*. It
jmfauth wrote:
to worry about encodings are when you're encoding unicode characters
to byte strings, or decoding bytes to unicode characters
A small but important correction/clarification:
In Unicode, unicode does not encode a *character*. It
encodes a *code point*, a number, the integer
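
The distinction between a code point and a character can be sketched in Python 3 as below. This is only an illustration, not part of the original message; the variable names are made up for the example.

```python
import unicodedata

# A code point is just an integer; ord() and chr() convert between that
# integer and the one-code-point string Python 3 uses to hold it.
pound = "\u00A3"            # POUND SIGN
code_point = ord(pound)     # the integer 0xA3
same_pound = chr(code_point)

# What a reader sees as one character may be more than one code point:
# "e" followed by COMBINING ACUTE ACCENT (U+0301) is displayed as "é".
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)  # single code point U+00E9

print(hex(code_point), len(decomposed), len(composed))
```

Here len() counts code points, which is why the two visually identical strings report different lengths.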
On 5/13/11 2:53 PM, harrismh777 wrote:
The unicode consortium is very careful to make sure that thousands of symbols
have a unique code point (that's great !) but how do these thousands of symbols
actually get displayed if there is no font consortium? Are there collections of
'standard' fonts
On 5/13/2011 3:53 PM, harrismh777 wrote:
The unicode consortium is very careful to make sure that thousands of
symbols have a unique code point (that's great !) but how do these
thousands of symbols actually get displayed if there is no font
consortium? Are there collections of 'standard' fonts
John Machin wrote:
On Thu, May 12, 2011 2:14 pm, Benjamin Kaplan wrote:
If the file you're writing to doesn't specify an encoding, Python will
default to locale.getdefaultencoding(),
No such attribute. Perhaps you mean locale.getpreferredencoding()
import locale
Ben Finney wrote:
I'd phrase that as:
* Text is a sequence of characters. Most inputs to the program,
including files, sockets, etc., contain a sequence of bytes.
* Always know whether you're dealing with text or with bytes. No object
can be both.
* In Python 2, ‘str’ is the type
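
The points above can be sketched in Python 3, where the text type is str and the byte type is bytes. The sample data is invented for illustration, and UTF-8 is an assumed input encoding, not something the original post specifies.

```python
# Bytes as they might arrive from a file or a socket (assumed to be UTF-8).
raw = b"caf\xc3\xa9"

# Decode once, at the boundary, to get text.
text = raw.decode("utf-8")

# Encode again only when the data leaves the program.
encoded_again = text.encode("utf-8")

# No object is both: Python 3 refuses to mix the two types implicitly.
try:
    text + raw
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

print(repr(text), repr(encoded_again), mixing_allowed)
```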
Terry Reedy wrote:
It does not matter how Python stored the unicode internally. Does this
help? Your intent is signalled by how you open the file.
Very much, actually, thanks. I was missing the 'internal' piece, and
did not realize that if I didn't specify the encoding on the open that
On Thu, May 12, 2011 4:31 pm, harrismh777 wrote:
So, the UTF-16 UTF-32 is INTERNAL only, for Python
NO. See one of my previous messages. UTF-16 and UTF-32, like UTF-8 are
encodings for the EXTERNAL representation of Unicode characters in byte
streams.
I also was not aware that UTF-8 chars
John Machin wrote:
On Thu, May 12, 2011 2:14 pm, Benjamin Kaplan wrote:
If the file you're writing to doesn't specify an encoding, Python will
default to locale.getdefaultencoding(),
No such attribute. Perhaps you mean locale.getpreferredencoding()
what about sys.getfilesystemencoding()
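
For comparison, a small sketch of the functions named in this exchange, as they exist in Python 3. The printed values depend on the platform and locale, so none are shown here.

```python
import locale
import sys

# The locale module has no getdefaultencoding(); the function that does
# exist there is getpreferredencoding().
has_misnamed_function = hasattr(locale, "getdefaultencoding")

# Normally the encoding open() uses in text mode when none is given.
preferred = locale.getpreferredencoding(False)

# The encoding used to convert file *names* (not file contents).
filesystem = sys.getfilesystemencoding()

# The interpreter-wide default for str/bytes conversions.
default = sys.getdefaultencoding()

print(has_misnamed_function, preferred, filesystem, default)
```

Passing an explicit encoding= argument to open() avoids depending on any of these.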
On Thu, May 12, 2011 at 1:58 AM, John Machin sjmac...@lexicon.net wrote:
On Thu, May 12, 2011 4:31 pm, harrismh777 wrote:
So, the UTF-16 UTF-32 is INTERNAL only, for Python
NO. See one of my previous messages. UTF-16 and UTF-32, like UTF-8 are
encodings for the EXTERNAL representation of
On 5/12/2011 12:17 PM, Ian Kelly wrote:
On Thu, May 12, 2011 at 1:58 AM, John Machin sjmac...@lexicon.net wrote:
On Thu, May 12, 2011 4:31 pm, harrismh777 wrote:
So, the UTF-16 UTF-32 is INTERNAL only, for Python
NO. See one of my previous messages. UTF-16 and UTF-32, like UTF-8 are
On Thu, May 12, 2011 at 2:42 PM, Terry Reedy tjre...@udel.edu wrote:
On 5/12/2011 12:17 PM, Ian Kelly wrote:
Right. *Under the hood* Python uses UCS-2 (which is not exactly the
same thing as UTF-16, by the way) to represent Unicode strings.
I know some people say that, but according to the
hi folks,
I am puzzled by unicode generally, and within the context of python
specifically. For one thing, what do we mean that unicode is used in
python 3.x by default. (I know what default means, I mean, what changed?)
I think part of my problem is that I'm spoiled (American, ascii
that UCS-2 has always been the default unicode width for
CPython, although the exact representation used internally is an
implementation detail.
The books say that the .py sources are UTF-8 by default... and that 3.x is
either UCS-2 or UCS-4. If I use the file handling capabilities of Python
On Wed, May 11, 2011 at 2:37 PM, harrismh777 harrismh...@charter.net wrote:
hi folks,
I am puzzled by unicode generally, and within the context of python
specifically. For one thing, what do we mean that unicode is used in python
3.x by default. (I know what default means, I mean, what
Ian Kelly wrote:
Ian, Benjamin, thanks much.
The `unicode' class was renamed to `str', and a stripped-down version
of the 2.X `str' class was renamed to `bytes'.
... thank you, this is very helpful.
If I do not specify any code points above ascii 0xFF does any of this
matter
On Thu, May 12, 2011 8:51 am, harrismh777 wrote:
Is it true that if I am
working without using bytes sequences that I will not need to care about
the encoding anyway, unless of course I need to specify a unicode code
point?
Quite the contrary.
(1) You cannot work without using bytes
John Machin wrote:
(1) You cannot work without using bytes sequences. Files are byte
sequences. Web communication is in bytes. You need to (know / assume / be
able to extract / guess) the input encoding. You need to encode your
output using an encoding that is expected by the consumer (or use an
On 12/05/2011 02:22, harrismh777 wrote:
John Machin wrote:
(1) You cannot work without using bytes sequences. Files are byte
sequences. Web communication is in bytes. You need to (know / assume / be
able to extract / guess) the input encoding. You need to encode your
output using an encoding
On Thu, 12 May 2011 03:31:18 +0100, MRAB wrote:
Another question... in mail I'm receiving many small blocks that look
like sprites with four small hex codes, scattered about the mail...
mostly punctuation, maybe? ... guessing, are these unicode code points,
and if so what is the best way to
Steven D'Aprano wrote:
You need to understand the difference between characters and bytes.
http://www.joelonsoftware.com/articles/Unicode.html
is also a good resource.
Thanks for being patient guys, here's what I've done:
astr="pound sign"
asym=" \u00A3"
afile=open("myfile", mode='w')
On Thu, May 12, 2011 11:22 am, harrismh777 wrote:
John Machin wrote:
(1) You cannot work without using bytes sequences. Files are byte
sequences. Web communication is in bytes. You need to (know / assume / be
able to extract / guess) the input encoding. You need to encode your
output using
MRAB pyt...@mrabarnett.plus.com writes:
You need to understand the difference between characters and bytes.
Yep. Those who don't need to join us in the third millennium, and the
resources pointed out in this thread are good to help that.
A string contains characters, a file contains bytes.
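
A minimal Python 3 sketch of that statement, loosely following the pound-sign experiment discussed in this thread. The temporary file path and the choice of UTF-8 are assumptions made for the example.

```python
import os
import tempfile

text = "pound sign \u00A3"          # a string: 12 characters

# Hypothetical scratch file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")

# Writing text means choosing an encoding; here it is stated explicitly.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Reading in binary mode shows what the file really contains: bytes.
with open(path, "rb") as f:
    raw = f.read()

print(len(text), len(raw), raw)
```

The pound sign is one character in the string but two bytes in the UTF-8 file, so the two lengths differ by one.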
On 5/11/2011 11:44 PM, harrismh777 wrote:
Steven D'Aprano wrote:
You need to understand the difference between characters and bytes.
http://www.joelonsoftware.com/articles/Unicode.html
is also a good resource.
Thanks for being patient guys, here's what I've done:
astr=pound sign
asym=
On Thu, May 12, 2011 1:44 pm, harrismh777 wrote:
By
default it looks like Python3 is writing output with UTF-8 as default...
and I thought that by default Python3 was using either UTF-16 or UTF-32.
So, I'm confused here... also, I used the character sequence \u00A3
which I thought was
On Wed, May 11, 2011 at 8:44 PM, harrismh777 harrismh...@charter.net wrote:
Steven D'Aprano wrote:
You need to understand the difference between characters and bytes.
http://www.joelonsoftware.com/articles/Unicode.html
is also a good resource.
Thanks for being patient guys, here's what
On Thu, May 12, 2011 2:14 pm, Benjamin Kaplan wrote:
If the file you're writing to doesn't specify an encoding, Python will
default to locale.getdefaultencoding(),
No such attribute. Perhaps you mean locale.getpreferredencoding()
that dictionaries in Python are used almost everywhere,
breaking this basic assumption is a really bad problem.
Of course, all of this applies to Python 2.x; in Python 3.0 the problem
was solved differently; strings are unicode by default, and the default
encoding IS utf-8.
As far as I've seen
Gabriel Genellina gagsl-...@yahoo.com.ar writes:
On Wed, 21 Oct 2009 06:24:55 -0300, Lele Gaifax l...@metapensiero.it
wrote:
Gabriel Genellina gagsl-...@yahoo.com.ar writes:
nosetest should do nothing special. You should configure the environment
so Python *knows* that your console
On Thu, 22 Oct 2009 05:25:16 -0300, Lele Gaifax l...@metapensiero.it
wrote:
Gabriel Genellina gagsl-...@yahoo.com.ar writes:
On Wed, 21 Oct 2009 06:24:55 -0300, Lele Gaifax l...@metapensiero.it
wrote:
Gabriel Genellina gagsl-...@yahoo.com.ar writes:
nosetest should do nothing
Gabriel Genellina gagsl-...@yahoo.com.ar writes:
On Thu, 22 Oct 2009 05:25:16 -0300, Lele Gaifax l...@metapensiero.it
wrote:
Who is the culprit here?
unittest, or ultimately, this bug: http://bugs.python.org/issue4947
Thank you. In particular I found
On Thu, Oct 22, 2009 at 13:59 +0200, Lele Gaifax wrote:
Gabriel Genellina gagsl-...@yahoo.com.ar writes:
unittest, or ultimately, this bug: http://bugs.python.org/issue4947
http://bugs.python.org/issue4947#msg87637 as the best fit, I think
You might also want to have a look at:
Gabriel Genellina gagsl-...@yahoo.com.ar writes:
DON'T do that. Really. Changing the default encoding is a horrible,
horrible hack and causes a lot of problems.
...
More reasons:
http://tarekziade.wordpress.com/2008/01/08/syssetdefaultencoding-is-evil/
See also this recent thread in
On Wed, 21 Oct 2009 06:24:55 -0300, Lele Gaifax l...@metapensiero.it
wrote:
Gabriel Genellina gagsl-...@yahoo.com.ar writes:
DON'T do that. Really. Changing the default encoding is a horrible,
horrible hack and causes a lot of problems.
...
More reasons:
hello,
As someone else already said,
every time I think : now I understand it completely, and a few weeks
later ...
From the thread how to write a unicode string to a file ?
and my specific situation:
- reading data from Excel, Delphi and other Windows programs and unicode
Python
- using
On Tue, 20 Oct 2009 17:13:52 -0300, Stef Mientki stef.mien...@gmail.com
wrote:
From the thread how to write a unicode string to a file ?
and my specific situation:
- reading data from Excel, Delphi and other Windows programs and unicode
Python
- using wxPython, which forces unicode
-
In article [EMAIL PROTECTED], Martin v. Löwis wrote:
In any case, it doesn't matter what encoding the document is in:
read(2) always returns two bytes.
It returns *up to* two bytes. Sorry to be picky but I think it's
relevant to the topic because it illustrates how it's difficult
to change the
Forgive my newbieness, but I don't quite understand why Unicode is still
something that needs special treatment in Python (and perhaps
elsewhere). I'm reading Dive Into Python right now, and it constantly
refers to a 'regular string' versus a 'Unicode string' and how you need
to convert back
or the encoding tricks?
It would break a hell of a lot of code. Try using the -U command line argument
to the Python interpreter. That makes unicode strings default.
[~]$ python -U
Python 2.4.1 (#2, Mar 31 2005, 00:05:10)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1666)] on darwin
Type help, copyright
a hell of a lot of code. Try using the -U command line argument
to the Python interpreter. That makes unicode strings default.
I figured this might have something to do with it, but then again I
thought that Unicode was created as a subset of ASCII and Latin-1 so
that they would be compatible
John Salerno [EMAIL PROTECTED] wrote:
to convert back and forth. But why isn't Unicode considered a regular
string by now? Is it for historical reasons that we still use ASCII and
Latin-1?
The point is, that, with a regular string, you don't know its encoding
or whether it has an encoding
either.
Why can't Unicode replace them so we no longer need the 'u'
prefix or the encoding tricks?
It would break a hell of a lot of code. Try using the -U command line argument
to the Python interpreter. That makes unicode strings default.
I figured this might have something to do
Robert Kern [EMAIL PROTECTED] wrote:
I see UTF-8 a lot, but this particular book also mentions that UTF-16 is
the most common. Is that true?
I think it unlikely, but I have no numbers to give. And I'll bet that that
book doesn't either.
I haven't got any numbers, but my guess would be
Robert Kern wrote:
I figured this might have something to do with it, but then again I
thought that Unicode was created as a subset of ASCII and Latin-1 so
that they would be compatible...but I guess it's never that easy. :)
No, it isn't. You seem to be somewhat confused about Unicode. At
Robert Kern wrote:
http://www.joelonsoftware.com/articles/Unicode.html
That was fascinating. Thank you. So as it turns out, Unicode and UTF-8
are not the same thing? Am I right to say that UTF-8 stores the first
128 Unicode code points in a single byte, and then stores higher code
points
John Salerno wrote:
Robert Kern wrote:
http://www.joelonsoftware.com/articles/Unicode.html
That was fascinating. Thank you. So as it turns out, Unicode and UTF-8
are not the same thing? Am I right to say that UTF-8 stores the first
128 Unicode code points in a single byte, and then
I figured this might have something to do with it, but then again I
thought that Unicode was created as a subset of ASCII and Latin-1 so
that they would be compatible...but I guess it's never that easy. :)
The real problem is that the Python string type is used to represent
two very
Martin v. Löwis wrote:
John Salerno wrote:
Robert Kern wrote:
http://www.joelonsoftware.com/articles/Unicode.html
That was fascinating. Thank you. So as it turns out, Unicode and UTF-8
are not the same thing? Am I right to say that UTF-8 stores the first
128 Unicode code points in a
Martin v. Löwis wrote:
The real problem is that the Python string type is used to represent
two very different concepts: bytes, and characters. You can't just drop
the current Python string type, and use the Unicode type instead - then
you would have no good way to represent sequences of
John Salerno wrote:
So as it turns out, Unicode and UTF-8 are not the same thing?
Well yes. UTF-8 is one scheme in which the whole Unicode character
repertoire can be represented as bytes.
Confusion arises because Windows uses the name 'Unicode' in character
encoding lists, to mean UTF-16_LE,
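
To make the difference concrete, here is a rough Python 3 sketch that encodes a few sample code points under both schemes. The sample characters are chosen only for illustration.

```python
# Sample code points: ASCII, Latin-1 range, rest of the BMP, beyond the BMP.
samples = ["A", "\u00A3", "\u20AC", "\U0001F600"]

# Map each sample to (UTF-8 byte count, UTF-16-LE byte count).
sizes = {}
for ch in samples:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-le")
    sizes[ch] = (len(utf8), len(utf16))
    print("U+%04X" % ord(ch), utf8, utf16)
```

The first 128 code points take one byte in UTF-8, higher ones take two to four; UTF-16 uses two bytes for most code points and four for those outside the Basic Multilingual Plane.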
John Salerno wrote:
Martin v. Löwis wrote:
The real problem is that the Python string type is used to represent
two very different concepts: bytes, and characters. You can't just drop
the current Python string type, and use the Unicode type instead - then
you would have no good way to
John Salerno wrote:
Interesting. So then the read() method, if given a numeric argument for
bytes to read, would act differently depending on if you were using
Unicode or not?
The read method currently returns a byte string, not a Unicode string.
It's not clear to me how the numeric argument
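
For reference, this is how the question plays out in Python 3, which postdates this exchange: the numeric argument counts bytes on a binary stream and characters on a text stream. The sketch uses in-memory streams from the io module rather than real files, and the sample data is invented.

```python
import io

data = "\u00A3100".encode("utf-8")    # b'\xc2\xa3100': 5 bytes, 4 characters

# Binary stream: read(2) returns up to two *bytes*.
first_two_bytes = io.BytesIO(data).read(2)

# Text stream: read(2) returns up to two *characters*.
text_stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")
first_two_chars = text_stream.read(2)

# "Up to": fewer items come back when the stream runs out.
short_read = io.BytesIO(b"x").read(2)

print(first_two_bytes, repr(first_two_chars), short_read)
```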