Hi Peter Otten
re:
There is no assignment
soup_atag = whatever
but there is one to atag. The whole session should work when you omit the
offending line
atag = soup_atag.a
or insert
soup_atag = soup
before it.
Marko Rauhamaa wrote:
Steven D'Aprano steve+comp.lang.pyt...@pearwood.info:
Marko Rauhamaa wrote:
Py3's byte strings are still strings, though.
Hm. I don't think so. In a plain English sense, maybe, but that kind of
usage can lead to confusion.
Only if you are determined to confuse
On Tue, Nov 25, 2014 at 10:56 PM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
I think this conversation is going nowhere, so it's probably best to end it.
\0
ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
Marko Rauhamaa wrote:
Py3's byte strings are still strings, though.
Hm. I don't think so. In a plain English sense, maybe, but that kind of
usage can lead to confusion.
Only if you are determined to confuse yourself.
People are quite capable of interpreting correctly sentences like:
My
On Tue, Nov 25, 2014 at 9:56 AM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
In all cases apart from an explicit byte string, the word string is
always used for the native array-of-characters type delimited by plain
quotation marks, as used for error messages, user prompts,
Steven D'Aprano steve+comp.lang.pyt...@pearwood.info:
Marko Rauhamaa wrote:
Py3's byte strings are still strings, though.
Hm. I don't think so. In a plain English sense, maybe, but that kind of
usage can lead to confusion.
Only if you are determined to confuse yourself.
[...]
In
On Mon, Nov 24, 2014 at 3:33 AM, Dennis Lee Bieber
wlfr...@ix.netcom.com wrote:
On Sat, 22 Nov 2014 20:52:37 -0500, random...@fastmail.us declaimed the
following:
On Sat, Nov 22, 2014, at 18:38, Mark Lawrence wrote:
...
That is a standard Windows build. He is again conflating problems with
On Sun, Nov 23, 2014, at 11:33, Dennis Lee Bieber wrote:
Why would that be possible? Many truetype fonts only supply glyphs for
single-byte encodings (ISO-Latin-1, for example -- pop up the Windows
character map utility and see what some of the font files contain.
With a bitmap font
On 11/23/2014 01:13 PM, random...@fastmail.us wrote:
On Sun, Nov 23, 2014, at 11:33, Dennis Lee Bieber wrote:
Why would that be possible? Many truetype fonts only supply glyphs for
single-byte encodings (ISO-Latin-1, for example -- pop up the Windows
character map utility and see what
On Mon, Nov 24, 2014 at 7:31 AM, Dave Angel d...@davea.name wrote:
On 11/23/2014 01:13 PM, random...@fastmail.us wrote:
On Sun, Nov 23, 2014, at 11:33, Dennis Lee Bieber wrote:
Why would that be possible? Many truetype fonts only supply
glyphs for
single-byte encodings (ISO-Latin-1,
Marko Rauhamaa wrote:
"Unicode strings" is not wrong but the technical emphasis on Unicode is as
strange as a "tire car" or "rectangular door" when "car" and "door" are
what you usually mean.
The reason Unicode gets emphasised so much is that
until relatively recently, it *wasn't* what string
usually
On Mon, Nov 24, 2014 at 9:51 AM, Gregory Ewing
greg.ew...@canterbury.ac.nz wrote:
Marko Rauhamaa wrote:
"Unicode strings" is not wrong but the technical emphasis on Unicode is as
strange as a "tire car" or "rectangular door" when "car" and "door" are
what you usually mean.
The reason Unicode gets
On Sun, Nov 23, 2014, at 15:31, Dave Angel wrote:
I didn't realize Windows shell (DOS box) had that bug. Course I don't
use Windows much the last few years.
it's one thing to not display it properly. It's quite another to supply
faulty data to the clipboard. Especially since the Windows
Gregory Ewing greg.ew...@canterbury.ac.nz:
Marko Rauhamaa wrote:
"Unicode strings" is not wrong but the technical emphasis on Unicode is as
strange as a "tire car" or "rectangular door" when "car" and "door" are
what you usually mean.
The reason Unicode gets emphasised so much is that until relatively
On Mon, Nov 24, 2014 at 5:57 PM, Marko Rauhamaa ma...@pacujo.net wrote:
Yes, people call strings "Unicode strings" because Python2 *did have*
unicode strings separate from regular strings:
Python2    Python3
------------------
string     bytes
Chris Angelico ros...@gmail.com:
Py3's byte strings are still strings, though.
Hm. I don't think so. In a plain English sense, maybe, but that kind of
usage can lead to confusion.
For example,
A subscription selects an item of a sequence (string, tuple or list)
or mapping (dictionary)
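The practical difference behind this terminology point can be seen in Python 3 itself, where subscripting a text string and subscripting a bytes object give different kinds of result. A minimal sketch; the variable names are illustrative and not taken from the thread:

```python
# Python 3: subscripting a text string yields a one-character string,
# while subscripting a bytes object yields an integer.
text = "abc"
data = b"abc"

first_char = text[0]   # a str of length 1
first_byte = data[0]   # an int in range(256)

print(type(first_char).__name__, repr(first_char))
print(type(first_byte).__name__, repr(first_byte))
```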
Marko Rauhamaa wrote:
Steven D'Aprano steve+comp.lang.pyt...@pearwood.info:
In Python, we have Unicode strings and byte strings.
No, you don't. You have strings and bytes:
Python has strings of Unicode code points, a.k.a. Unicode strings,
or text strings, and strings of bytes, a.k.a. byte
On Sun, Nov 23, 2014 at 12:50 AM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
Tire car makes no sense. Rectangular door makes perfect sense, and in a
world where there are dozens of legacy non-rectangular doors, it would be
very sensible to specify the kind of door. Just as we
Steven D'Aprano steve+comp.lang.pyt...@pearwood.info:
You haven't given any good reason for objecting to calling Unicode
strings by what they are. Maybe you think that it is an implementation
detail, and that some version of Python might suddenly and without
warning change to only supporting
In article 87y4r348uf@elektro.pacujo.net,
Marko Rauhamaa ma...@pacujo.net wrote:
Steven D'Aprano steve+comp.lang.pyt...@pearwood.info:
You haven't given any good reason for objecting to calling Unicode
strings by what they are. Maybe you think that it is an implementation
detail,
Roy Smith r...@panix.com:
For that matter, we will eventually get to the point where when people
say, just plain text, they will mean Unicode, in the same way that
just plain text today really means ASCII (and the text/plain MIME
type will become a historical curiosity).
MIME has:
On Saturday, November 22, 2014 8:14:15 PM UTC+5:30, Roy Smith wrote:
Marko Rauhamaa wrote:
Steven D'Aprano:
You haven't given any good reason for objecting to calling Unicode
strings by what they are. Maybe you think that it is an implementation
detail, and that some version of
wxjmfa...@gmail.com:
- By chance, I found on the web a German py dev who was commenting and
he had not an updated DUDEN (a German dictionnary).
That... leaves me utterly speachless!
Marko
On 22/11/2014 17:49, Marko Rauhamaa wrote:
wxjmfa...@gmail.com:
- By chance, I found on the web a German py dev who was commenting and
he had not an updated DUDEN (a German dictionnary).
That... leaves me utterly speachless!
Marko
Please don't feed him. Your average troll is bad enough
On Sun, Nov 23, 2014 at 5:17 AM, Mark Lawrence breamore...@yahoo.co.uk wrote:
Please don't feed him. Your average troll is bad enough but he really takes
the biscuit.
... someone was feeding him biscuits?
ChrisA
On 22/11/2014 20:17, Chris Angelico wrote:
On Sun, Nov 23, 2014 at 5:17 AM, Mark Lawrence breamore...@yahoo.co.uk wrote:
Please don't feed him. Your average troll is bad enough but he really takes
the biscuit.
... someone was feeding him biscuits?
ChrisA
Surely it's better than feeding
On Sun, Nov 23, 2014 at 9:04 AM, Mark Lawrence breamore...@yahoo.co.uk wrote:
My favourite find thousand and one ways to make Python crashing or
failing. but I don't recall a single bug report in the last two years from
anybody regarding problems with the FSR, or have I missed something?
What
On 22/11/2014 22:31, Chris Angelico wrote:
On Sun, Nov 23, 2014 at 9:04 AM, Mark Lawrence breamore...@yahoo.co.uk wrote:
My favourite find thousand and one ways to make Python crashing or
failing. but I don't recall a single bug report in the last two years from
anybody regarding problems with
On Fri, Nov 21, 2014, at 23:38, Steven D'Aprano wrote:
I really don't understand what bothers you about this. In Python, we have
Unicode strings and byte strings. In computing in general, strings can
consist of Unicode characters, ASCII characters, Tron characters, EBCDIC
characters,
On Sat, Nov 22, 2014, at 18:38, Mark Lawrence wrote:
...
That is a standard Windows build. He is again conflating problems with
using the Windows command line for a given code page with the FSR.
The thing is, with a truetype font selected, a correctly written win32
console program should be
On Sun, Nov 23, 2014 at 12:52 PM, random...@fastmail.us wrote:
On Sat, Nov 22, 2014, at 18:38, Mark Lawrence wrote:
...
That is a standard Windows build. He is again conflating problems with
using the Windows command line for a given code page with the FSR.
The thing is, with a truetype
On Sat, Nov 22, 2014, at 21:11, Chris Angelico wrote:
Is that true? Does WriteConsoleW support every Unicode character? It's
not obvious from the docs whether it uses UCS-2 or UTF-16 (or maybe
something else).
I was defining every unicode character loosely. There are certainly
display problems
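The UCS-2 versus UTF-16 distinction raised here only matters for code points outside the Basic Multilingual Plane. A small Python 3 sketch of what that means at the code-unit level; it says nothing about what WriteConsoleW actually does, and the sample character is an arbitrary choice:

```python
# One code point outside the Basic Multilingual Plane (U+1F600).
ch = "\U0001F600"

# UTF-16 represents it as a surrogate pair: two 16-bit code units.
utf16 = ch.encode("utf-16-le")
code_units = len(utf16) // 2

# UCS-2 has no surrogate mechanism, so a strictly UCS-2 API could not
# treat these two units as one character.
print(len(ch), code_units, utf16)
```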
random...@fastmail.us wrote:
On Fri, Nov 21, 2014, at 23:38, Steven D'Aprano wrote:
I really don't understand what bothers you about this. In Python, we have
Unicode strings and byte strings. In computing in general, strings can
consist of Unicode characters, ASCII characters, Tron
On Sun, Nov 23, 2014 at 5:17 PM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
If Python treated the character set as an implementation detail, the
programmer would have no way of knowing whether
s = u"ö"
is legal or not, since you cannot know whether or not ö is a supported
Chris Angelico ros...@gmail.com:
Then you need to read more about Unicode. The *codepoint* for the
letter 'A' is 65. That is not Unicode, that is one part of the Unicode
spec.
I don't think Python users need to know anything more about Unicode than
they need to know about IEEE-754.
How many
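The code point versus encoding distinction under discussion can be illustrated in a few lines of Python 3. This is only a sketch; the choice of encodings is arbitrary:

```python
# The code point is an abstract number assigned by Unicode ...
ch = "A"
code_point = ord(ch)

# ... while each encoding turns that number into a different byte sequence.
encodings = {name: ch.encode(name) for name in ("utf-8", "utf-16-le", "utf-32-le")}

print(code_point)
for name, encoded in encodings.items():
    print(name, encoded)
```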
On Fri, Nov 21, 2014 at 7:16 PM, Marko Rauhamaa ma...@pacujo.net wrote:
Chris Angelico ros...@gmail.com:
Then you need to read more about Unicode. The *codepoint* for the
letter 'A' is 65. That is not Unicode, that is one part of the Unicode
spec.
I don't think Python users need to know
Chris Angelico wrote:
On Fri, Nov 21, 2014 at 11:32 AM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
(E.g. there are millions of existing files across the world containing
text which use legacy encodings that are not compatible with Unicode.)
Not compatible with Unicode?
On Sat, Nov 22, 2014 at 2:23 AM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
Chris Angelico wrote:
On Fri, Nov 21, 2014 at 11:32 AM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
(E.g. there are millions of existing files across the world containing
text which
On 2014-11-22 02:23, Steven D'Aprano wrote:
LATIN SMALL LETTER E
COMBINING CIRCUMFLEX ACCENT
then my application should treat that as a single character and
display it as:
LATIN SMALL LETTER E WITH CIRCUMFLEX
which looks like this: ê
rather than two distinct characters eˆ
Now,
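The composition described in this message corresponds to Unicode normalization form NFC, which Python's standard `unicodedata` module exposes. A minimal Python 3 sketch, assuming NFC is the behaviour the application wants:

```python
import unicodedata

# LATIN SMALL LETTER E followed by COMBINING CIRCUMFLEX ACCENT (U+0302):
# two code points.
decomposed = "e\u0302"

# NFC composes the pair into the single precomposed code point where one exists.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed), unicodedata.name(composed))
```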
On Friday, November 21, 2014 12:06:54 PM UTC+5:30, Marko Rauhamaa wrote:
Chris Angelico :
On Fri, Nov 21, 2014 at 5:56 AM, Marko Rauhamaa wrote:
I don't really like it how Unicode is equated with text, or even
character strings.
[...]
Do you have actual text that you're unable to
On 11/20/2014 04:15 PM, Chris Angelico wrote:
On Fri, Nov 21, 2014 at 1:14 AM, Francis Moreau francis.m...@gmail.com
wrote:
Hi,
Thanks for the from __future__ import unicode_literals trick, it makes
that switch much less intrusive.
However it seems that I will suddenly be trapped by all
On Sat, Nov 22, 2014 at 3:11 AM, Francis Moreau francis.m...@gmail.com wrote:
Yes I finally used str() since only setlocale() reported to have some
issues with unicode_literals active in my application.
Thanks Chris for your useful insight.
My pleasure. Unicode is a bit of a hobby-horse of
Rustom Mody rustompm...@gmail.com:
Likewise in 2014, and given the arguments, inconsistencies, etc
remembering the nuts-n-bolts below the strings-represented-as-unicode
abstraction may be in order.
No need to hide Unicode, but talking about a
Unicode string
is like talking about an
On Sat, Nov 22, 2014 at 3:36 AM, Marko Rauhamaa ma...@pacujo.net wrote:
No need to hide Unicode, but talking about a
Unicode string
is like talking about an
electronic computer
visible spectrum display
mouse user interface
ethernet socket
magnetic file
Marko Rauhamaa wrote:
Rustom Mody rustompm...@gmail.com:
Likewise in 2014, and given the arguments, inconsistencies, etc
remembering the nuts-n-bolts below the strings-represented-as-unicode
abstraction may be in order.
No need to hide Unicode, but talking about a
Unicode string
Steven D'Aprano steve+comp.lang.pyt...@pearwood.info:
In Python, we have Unicode strings and byte strings.
No, you don't. You have strings and bytes:
Textual data in Python is handled with str objects, or strings.
Strings are immutable sequences of Unicode code points. String
literals
Hello,
My application is using gettext module to do the translation
stuff. Translated messages are unicode on both python 2 and
3 (with python2.7 I had to explicitly ask for unicode).
A problem arises when formatting those messages before logging
them. For example:
log.debug("%s: %s" %
On Thu, Nov 20, 2014 at 8:40 PM, Francis Moreau francis.m...@gmail.com wrote:
My question is: how should this be fixed properly ?
A simple solution would be to force all strings passed to the
logger to be unicode:
log.debug(u"%s: %s" % ...)
and more generally force all strings in my code to
Francis Moreau wrote:
Hello,
My application is using gettext module to do the translation
stuff. Translated messages are unicode on both python 2 and
3 (with python2.7 I had to explicitly ask for unicode).
A problem arises when formatting those messages before logging
them. For
On Thu, Nov 20, 2014 at 11:35 PM, Peter Otten __pete...@web.de wrote:
You don't need to change an all-ascii bytestring to unicode.
Lo and behold:
>>> "%s %s" % (u"üblich", u"ähnlich")
u'\xfcblich \xe4hnlich'
>>> u"%s %s" % (u"üblich", u"ähnlich")
u'\xfcblich \xe4hnlich'
Only non-ascii bytestrings mean
On Thu, Nov 20, 2014, at 07:35, Peter Otten wrote:
>>> "%s nötig %s" % (u"üblich", u"ähnlich")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4:
ordinal not in range(128)
This is surprising to me - why is it
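For context, Python 2 implicitly decodes the byte-string format with the ASCII codec when a unicode argument is mixed in, and that hidden step is what raises the error. The same failure can be reproduced explicitly in Python 3. This sketch assumes the byte string was UTF-8 encoded, which is consistent with the 0xc3 byte in the traceback:

```python
# The format string from the traceback, as the UTF-8 bytes Python 2 would hold.
fmt_bytes = "%s nötig %s".encode("utf-8")

error = None
try:
    # The decode that Python 2 performs implicitly when mixing str and unicode.
    fmt_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)
```

The reported position is the index of the first byte of "ö" within the byte string.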
Hi,
On 11/20/2014 11:47 AM, Chris Angelico wrote:
On Thu, Nov 20, 2014 at 8:40 PM, Francis Moreau francis.m...@gmail.com
wrote:
My question is: how should this be fixed properly ?
A simple solution would be to force all strings passed to the
logger to be unicode:
log.debug(u"%s: %s" %
On Fri, Nov 21, 2014 at 12:59 AM, random...@fastmail.us wrote:
On Thu, Nov 20, 2014, at 07:35, Peter Otten wrote:
>>> "%s nötig %s" % (u"üblich", u"ähnlich")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4:
On Fri, Nov 21, 2014 at 1:14 AM, Francis Moreau francis.m...@gmail.com wrote:
Hi,
Thanks for the from __future__ import unicode_literals trick, it makes
that switch much less intrusive.
However it seems that I will suddenly be trapped by all modules which
are not prepared to handle unicode.
random...@fastmail.us wrote:
On Thu, Nov 20, 2014, at 07:35, Peter Otten wrote:
>>> "%s nötig %s" % (u"üblich", u"ähnlich")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4:
ordinal not in range(128)
This is
On Fri, Nov 21, 2014 at 2:40 AM, Peter Otten __pete...@web.de wrote:
I think that you may get a Unicode/Encode/Error when you try to /decode/ a
unicode string is more confusing...
Hang on a minute, what does it even mean to decode a Unicode string?
That's where the problem is. Fortunately
Chris Angelico wrote:
On Fri, Nov 21, 2014 at 2:40 AM, Peter Otten __pete...@web.de wrote:
I think that you may get a Unicode/Encode/Error when you try to /decode/
a unicode string is more confusing...
Hang on a minute, what does it even mean to decode a Unicode string?
Let's not get
On Fri, Nov 21, 2014 at 3:32 AM, Peter Otten __pete...@web.de wrote:
Chris Angelico wrote:
On Fri, Nov 21, 2014 at 2:40 AM, Peter Otten __pete...@web.de wrote:
I think that you may get a Unicode/Encode/Error when you try to /decode/
a unicode string is more confusing...
Hang on a minute,
On 11/20/2014 09:32 AM, Peter Otten wrote:
Chris Angelico wrote:
On Fri, Nov 21, 2014 at 2:40 AM, Peter Otten __pete...@web.de wrote:
I think that you may get a Unicode/Encode/Error when you try to /decode/
a unicode string is more confusing...
Hang on a minute, what does it even mean to
Chris Angelico wrote:
On Fri, Nov 21, 2014 at 3:32 AM, Peter Otten __pete...@web.de wrote:
Chris Angelico wrote:
On Fri, Nov 21, 2014 at 2:40 AM, Peter Otten __pete...@web.de wrote:
I think that you may get a Unicode/Encode/Error when you try to
/decode/ a unicode string is more
On Thu, Nov 20, 2014, at 09:59, Chris Angelico wrote:
On Fri, Nov 21, 2014 at 12:59 AM, random...@fastmail.us wrote:
On Thu, Nov 20, 2014, at 07:35, Peter Otten wrote:
>>> "%s nötig %s" % (u"üblich", u"ähnlich")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
On Thu, Nov 20, 2014 at 10:42 AM, random...@fastmail.us wrote:
and it means you can't safely
blindly use %s with an unknown object.
You can't safely do this anyway. Whether it's %s with a str and a
unicode, or %s with a unicode and a str, *something* is going to have
to be implicitly encoded
On Thu, Nov 20, 2014 at 11:06 AM, Ian Kelly ian.g.ke...@gmail.com wrote:
On Thu, Nov 20, 2014 at 10:42 AM, random...@fastmail.us wrote:
and it means you can't safely
blindly use %s with an unknown object.
You can't safely do this anyway. Whether it's %s with a str and a
unicode, or %s with
random...@fastmail.us wrote:
On Thu, Nov 20, 2014, at 09:59, Chris Angelico wrote:
On Fri, Nov 21, 2014 at 12:59 AM, random...@fastmail.us wrote:
On Thu, Nov 20, 2014, at 07:35, Peter Otten wrote:
>>> "%s nötig %s" % (u"üblich", u"ähnlich")
Traceback (most recent call last):
  File "<stdin>", line
Michael Torrie torr...@gmail.com:
Unicode can only be encoded to bytes.
Bytes can only be decoded to unicode.
I don't really like it how Unicode is equated with text, or even
character strings.
There's barely any difference between the truth value of these
statements:
Python strings are
On 20/11/2014 18:06, Ian Kelly wrote:
On Thu, Nov 20, 2014 at 10:42 AM, random...@fastmail.us wrote:
and it means you can't safely
blindly use %s with an unknown object.
You can't safely do this anyway. Whether it's %s with a str and a
unicode, or %s with a unicode and a str, *something* is
On 11/20/2014 07:53 AM, Chris Angelico wrote:
On Fri, Nov 21, 2014 at 2:40 AM, Peter Otten __pete...@web.de wrote:
I think that you may get a Unicode/Encode/Error when you try to /decode/ a
unicode string is more confusing...
Hang on a minute, what does it even mean to decode a Unicode
Ethan Furman et...@stoneleaf.us:
If your unicode string happens to contain a base64 encoded .png, then
you could decode that into bytes. ;)
You could embed your PNG file in XML in binary form as CDATA. Then, your
characters would represent 8- or 16-bit integers. You just need to
replace all
On Thu, Nov 20, 2014, at 16:29, Ethan Furman wrote:
If your unicode string happens to contain a base64 encoded .png, then you
could decode that into bytes. ;)
Bytes of the PNG, or of the raw pixels?
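As a rough illustration of the base64 case: decoding the text gives back the bytes of the encoded file, not pixel data, which would need a separate image-decoding step. A Python 3 sketch using only the 8-byte PNG signature as a stand-in for a real file:

```python
import base64

# The 8-byte PNG file signature, standing in for a whole .png file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Text (str) carrying the base64 form of those bytes.
text = base64.b64encode(PNG_SIGNATURE).decode("ascii")

# "Decoding" this str yields the original file bytes, not raw pixels.
raw = base64.b64decode(text)

print(text, raw)
```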
On Fri, Nov 21, 2014 at 4:42 AM, random...@fastmail.us wrote:
On Thu, Nov 20, 2014, at 09:59, Chris Angelico wrote:
Why should it encode to bytes?
Because a bytes format string suggests a bytes result. Why does unicode
always win, rather than the type of the format string always winning?
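For comparison, Python 3 takes the position asked for here: the type of the format string decides the type of the result, and nothing is converted implicitly. A sketch of that behaviour; bytes formatting with `%` requires Python 3.5 or later, which postdates this thread:

```python
# A str format string always produces str; a bytes argument is shown via repr().
text_result = "%s" % b"abc"

# A bytes format string produces bytes (Python 3.5+).
bytes_result = b"%s" % b"abc"

# Mixing a bytes format with a str argument is refused rather than
# implicitly encoded.
mixed_error = None
try:
    b"%s" % "abc"
except TypeError as exc:
    mixed_error = str(exc)

print(repr(text_result), repr(bytes_result), mixed_error)
```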
On Fri, Nov 21, 2014 at 5:56 AM, Marko Rauhamaa ma...@pacujo.net wrote:
Michael Torrie torr...@gmail.com:
Unicode can only be encoded to bytes.
Bytes can only be decoded to unicode.
I don't really like it how Unicode is equated with text, or even
character strings.
There's barely any
Marko Rauhamaa wrote:
Michael Torrie torr...@gmail.com:
Unicode can only be encoded to bytes.
Bytes can only be decoded to unicode.
I don't really like it how Unicode is equated with text, or even
character strings.
That surely depends on the context. To be technically correct, Unicode
On Fri, Nov 21, 2014 at 11:32 AM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
(E.g. there are millions of existing files across the world containing text
which use legacy encodings that are not compatible with Unicode.)
Not compatible with Unicode? There aren't many character
On Thu, Nov 20, 2014, at 20:10, Chris Angelico wrote:
2) Languages which use a different alphabet (eg Cyrillic - Russian,
Bulgarian). You could possibly cram them into an eight-bit encoding
without tipping ASCII out, but I'm not sure. In Unicode, these
languages are all easily supported by the
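Eight-bit Cyrillic encodings that keep ASCII intact do exist; KOI8-R and Windows-1251 are two that ship with Python's codecs. A short Python 3 sketch (the sample text is made up) showing that each character fits in one byte, while the two encodings disagree on which byte:

```python
# Mixed Cyrillic and ASCII sample text (illustrative only).
word = "Привет, world"

koi8 = word.encode("koi8_r")
cp1251 = word.encode("cp1251")
utf8 = word.encode("utf-8")

# One byte per character in both legacy encodings; the ASCII tail is unchanged.
print(len(word), len(koi8), len(cp1251), len(utf8))
print(koi8)
print(cp1251)
```

The mismatch between the two byte sequences is why the encoding has to be known before such bytes can be decoded correctly.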
On Fri, Nov 21, 2014 at 12:31 PM, random...@fastmail.us wrote:
On Thu, Nov 20, 2014, at 20:10, Chris Angelico wrote:
2) Languages which use a different alphabet (eg Cyrillic - Russian,
Bulgarian). You could possibly cram them into an eight-bit encoding
without tipping ASCII out, but I'm not
Chris Angelico ros...@gmail.com:
On Fri, Nov 21, 2014 at 5:56 AM, Marko Rauhamaa ma...@pacujo.net wrote:
I don't really like it how Unicode is equated with text, or even
character strings.
[...]
Do you have actual text that you're unable to represent in Unicode?
Not my point at all.
I'm
On Fri, Nov 21, 2014 at 5:36 PM, Marko Rauhamaa ma...@pacujo.net wrote:
Chris Angelico ros...@gmail.com:
On Fri, Nov 21, 2014 at 5:56 AM, Marko Rauhamaa ma...@pacujo.net wrote:
I don't really like it how Unicode is equated with text, or even
character strings.
[...]
Do you have actual text
Chris Angelico ros...@gmail.com:
On Fri, Nov 21, 2014 at 5:36 PM, Marko Rauhamaa ma...@pacujo.net wrote:
I'm saying equating an abstract data type (string) with its
representation (Unicode vector) is bad taste.
What about sequence of Unicode code points is representation? What
is your
On Fri, Nov 21, 2014 at 6:14 PM, Marko Rauhamaa ma...@pacujo.net wrote:
Chris Angelico ros...@gmail.com:
On Fri, Nov 21, 2014 at 5:36 PM, Marko Rauhamaa ma...@pacujo.net wrote:
I'm saying equating an abstract data type (string) with its
representation (Unicode vector) is bad taste.
What