[issue6988] shlex.split() converts unicode input to UCS-4 output with varying byte order

2009-09-24 Thread Bill Fenner
New submission from Bill Fenner fen...@gmail.com:
In Python 2.5, shlex handled unicode input fine:
Python 2.5.1 (r251:54863, Jun 15 2008, 18:24:51)
[GCC 4.3.0 20080428 (Red Hat 4.3.0-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import shlex
>>> shlex.split(

[issue6988] shlex.split() converts unicode input to UCS-4 output with varying byte order

2009-09-24 Thread Bill Fenner
Bill Fenner fen...@gmail.com added the comment:
A colleague pointed out that the bad behavior was introduced in 2.5.2:
Python 2.5.2 (r252:60911, Sep 30 2008, 15:42:03)
[GCC 4.3.2 20080917 (Red Hat 4.3.2-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import shlex

[issue6988] shlex.split() converts unicode input to UCS-4 output with varying byte order

2009-09-24 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc amaur...@gmail.com added the comment: I'll take the opposite point of view: the bad behavior was introduced with 2.5.1 (issue1548891, r52302), and reverted for 2.5.2 because it broke backwards compatibility with arbitrary read buffers (issue1730114, r53831) The difference

[issue6988] shlex.split() converts unicode input to UCS-4 output with varying byte order

2009-09-24 Thread Bill Fenner
Bill Fenner fen...@gmail.com added the comment: so, just to be clear, your position is that the output of shlex.split( u'Hello, World!' ) is *supposed* to be ['H\x00\x00\x00e\x00\x00\x00l\x00\x00\x00l\x00\x00\x00o\x00\x00\x00,\x00\x00\x00',
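
The byte pattern quoted above (each character followed by three NUL bytes) is what little-endian UCS-4/UTF-32 text looks like when read as a plain byte string. The following is a minimal Python 3 sketch, not the Python 2 session from the report, that reproduces the pattern for the first token and shows how the byte order changes the result; the variable names are illustrative only.

```python
# Python 3 illustration: encode text as UTF-32 in both byte orders.
text = "Hello,"

# Four bytes per code point, least significant byte first.
little = text.encode("utf-32-le")
# Four bytes per code point, most significant byte first.
big = text.encode("utf-32-be")

print(little)
print(big)
```

On a little-endian machine the first form matches the token shown in the comment; a big-endian machine would presumably produce the second form, which would explain the "varying byte order" in the issue title.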

[issue6988] shlex.split() converts unicode input to UCS-4 output with varying byte order

2009-09-24 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Hm, while the StringIO behaviour supposedly cannot be changed for backwards-compatibility reasons, we can probably improve shlex behaviour with unicode strings.

[issue6988] shlex.split() converts unicode input to UCS-4 output with varying byte order

2009-09-24 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc amaur...@gmail.com added the comment: (Presented this way, my opinion becomes difficult to stand... OTOH the docs say that the module does not support Unicode, so it's not strictly a bug) http://docs.python.org/library/shlex.html Yes, shlex could be improved and encode
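
The comment above is cut off, so what exactly shlex would encode to is not stated. As a hedged sketch of the general idea, a caller could encode unicode input to a byte string before splitting and decode the tokens afterwards. The helper name `split_unicode` and the UTF-8 default are assumptions made for illustration; they are not taken from the thread or from shlex itself.

```python
import shlex
import sys


def split_unicode(text, encoding="utf-8"):
    """Hypothetical helper: split unicode text into unicode tokens.

    On Python 3, shlex.split() already accepts str, so it is called directly.
    On Python 2, the text is encoded first (assumed UTF-8) so that shlex
    only ever sees a byte string, and each token is decoded afterwards.
    """
    if sys.version_info[0] >= 3:
        return shlex.split(text)
    return [tok.decode(encoding) for tok in shlex.split(text.encode(encoding))]


print(split_unicode(u"Hello, World!"))
```

Doing the conversion in the caller keeps shlex untouched, at the cost of every caller having to pick an encoding.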

[issue6988] shlex.split() converts unicode input to UCS-4 output with varying byte order

2009-09-24 Thread Marc-Andre Lemburg
Marc-Andre Lemburg m...@egenix.com added the comment:
Amaury Forgeot d'Arc wrote:
> Amaury Forgeot d'Arc amaur...@gmail.com added the comment:
> (Presented this way, my opinion becomes difficult to stand... OTOH the docs say that the module does not support Unicode, so it's not strictly a