New submission from Bill Fenner <fen...@gmail.com>:
In python 2.5, shlex handled unicode input fine:
Python 2.5.1 (r251:54863, Jun 15 2008, 18:24:51)
[GCC 4.3.0 20080428 (Red Hat 4.3.0-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import shlex
>>> shlex.split(
Bill Fenner <fen...@gmail.com> added the comment:
A colleague pointed out that the bad behavior was introduced in 2.5.2:
Python 2.5.2 (r252:60911, Sep 30 2008, 15:42:03)
[GCC 4.3.2 20080917 (Red Hat 4.3.2-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import shlex
Amaury Forgeot d'Arc <amaur...@gmail.com> added the comment:
I'll take the opposite point of view:
the bad behavior was introduced with 2.5.1 (issue1548891, r52302), and
reverted for 2.5.2 because it broke backwards compatibility with
arbitrary read buffers (issue1730114, r53831)
The difference
Bill Fenner <fen...@gmail.com> added the comment:
so, just to be clear, your position is that the output of shlex.split(
u'Hello, World!' ) is *supposed* to be
['H\x00\x00\x00e\x00\x00\x00l\x00\x00\x00l\x00\x00\x00o\x00\x00\x00,\x00\x00\x00',
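The three NUL bytes after each character are what a byte-by-byte read of the unicode object's internal buffer would yield on a wide (UCS-4), little-endian build. That the interpreter above is such a build is an assumption, inferred only from the output. A minimal Python 3 sketch that imitates the effect by encoding to UTF-32-LE and treating each byte as one character (the variable names are illustrative, not from the report):

```python
import shlex

text = "Hello, World!"

# Assumption: a wide (UCS-4) little-endian build stores each code point
# as 4 bytes, which is the same layout as UTF-32-LE.
raw = text.encode("utf-32-le")

# Map every byte to its own character, as a byte-oriented reader would see it.
as_chars = raw.decode("latin-1")

# shlex splits on the single space byte, so the NUL padding ends up
# inside the tokens on either side of it.
tokens = shlex.split(as_chars)
print(repr(tokens[0]))
```

The first token printed this way has the same shape as the one quoted above, which is consistent with the buffer-read explanation but does not by itself prove it.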
Antoine Pitrou <pit...@free.fr> added the comment:
Hm, while the StringIO behaviour supposedly cannot be changed for
backwards-compatibility reasons, we can probably improve shlex behaviour
with unicode strings.
--
nosy: +pitrou
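Until shlex itself deals with unicode input, one caller-side workaround is to encode the text before splitting and decode each token afterwards. The sketch below assumes UTF-8 is acceptable as the intermediate encoding; `split_text` is a hypothetical helper name, not part of shlex. On Python 3, where shlex accepts str directly, the helper simply delegates.

```python
import shlex
import sys


def split_text(text, encoding="utf-8"):
    """Split text with shlex without exposing the unicode buffer.

    Hypothetical helper for illustration only.
    """
    if sys.version_info[0] >= 3:
        # Python 3: shlex works on str, so no round trip is needed.
        return shlex.split(text)
    # Python 2: hand shlex a byte string, then decode each token.
    encoded = text.encode(encoding)
    return [token.decode(encoding) for token in shlex.split(encoded)]


print(split_text(u"Hello, World!"))
```

Encoding first keeps shlex on the byte-string path it was written for, at the cost of choosing an encoding up front.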
Amaury Forgeot d'Arc <amaur...@gmail.com> added the comment:
(Presented this way, my opinion becomes difficult to defend...
OTOH the docs say that the module does not support Unicode, so it's not
strictly a bug)
http://docs.python.org/library/shlex.html
Yes, shlex could be improved and encode
Marc-Andre Lemburg <m...@egenix.com> added the comment:
Amaury Forgeot d'Arc wrote:
> Amaury Forgeot d'Arc <amaur...@gmail.com> added the comment:
> (Presented this way, my opinion becomes difficult to defend...
> OTOH the docs say that the module does not support Unicode, so it's not
> strictly a