On Aug 27, 5:42 am, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> George Sakkis wrote:
> >> if you meant to write "encode", you can indeed safely do
> >> [s.encode('utf8') for s in strings] as long as all strings are returned
> >> by an ET implementation.
>
> > I was replying to the general assertion
George Sakkis wrote:
if you meant to write "encode", you can indeed safely do
[s.encode('utf8') for s in strings] as long as all strings are returned
by an ET implementation.
I was replying to the general assertion that "in 2.x ASCII byte
strings and unicode strings are compatible", not specif
On Aug 25, 4:45 pm, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> George Sakkis wrote:
> > It depends on what you mean by "compatible"; e.g. you can't safely do
> > [s.decode('utf8') for s in strings] if you have byte strings mixed
> > with unicode.
>
> why would you want to decode strings given to yo
George Sakkis wrote:
It depends on what you mean by "compatible"; e.g. you can't safely do
[s.decode('utf8') for s in strings] if you have byte strings mixed
with unicode.
why would you want to decode strings given to you by a library that
returns decoded strings?
if you meant to write "enc
On Aug 24, 1:12 am, Stefan Behnel <[EMAIL PROTECTED]> wrote:
> George Sakkis wrote:
> > On Aug 21, 1:48 am, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
>
> >> George Sakkis wrote:
> >>> It's interesting that the element text attributes after a successful
> >>> parse do not necessarily have the same ty
George Sakkis wrote:
> It seems xml.etree.cElementTree.iterparse() is not unicode aware:
>
from StringIO import StringIO
from xml.etree.cElementTree import iterparse
s = u'\u03a0\u03b1\u03bd\u03b1\u03b3\u03b9\u03ce\u03c4\u03b7\u03c2'
for event,elem in iterparse(StringIO(s
George Sakkis wrote:
> On Aug 21, 1:48 am, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
>
>> George Sakkis wrote:
>>> It's interesting that the element text attributes after a successful
>>> parse do not necessarily have the same type, i.e. all be str or all
>>> unicode. I ported some text extraction
On Aug 21, 1:48 am, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> George Sakkis wrote:
> > It's interesting that the element text attributes after a successful
> > parse do not necessarily have the same type, i.e. all be str or all
> > unicode. I ported some text extraction code from BeautifulSoup (
George Sakkis wrote:
> Thank you both for the suggestions. I made a few more experiments to
> understand how iterparse behaves with respect to three dimensions:
Spending time researching undefined behaviour is pretty pointless. ET
parsers expect byte streams, because that's what XML files are.
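
A minimal sketch of the byte-stream approach described above, assuming the text is wrapped in a hypothetical `<root>` element (the thread does not show the full document) and that UTF-8 is the chosen encoding. It uses `io.BytesIO` and the pure-Python `xml.etree.ElementTree` module so that it runs on both 2.x and current Python; the thread itself used `StringIO` and `cElementTree`.

```python
from io import BytesIO
from xml.etree.ElementTree import iterparse

# The Greek name from the original report.
name = u'\u03a0\u03b1\u03bd\u03b1\u03b3\u03b9\u03ce\u03c4\u03b7\u03c2'

# Hypothetical wrapper element; an XML document needs a root element.
document = u'<root>' + name + u'</root>'

# Encode to bytes *before* handing the data to the parser.
# With no XML declaration, the parser assumes UTF-8.
xml_bytes = document.encode('utf-8')

# iterparse reports "end" events by default; collect each element's text.
texts = [elem.text for event, elem in iterparse(BytesIO(xml_bytes))]
```

Here the parser receives bytes, decodes them itself, and hands back the decoded text, so `texts` holds the original name.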
George Sakkis wrote:
Traceback (most recent call last):
  File "", line 1, in
  File "", line 64, in __iter__
UnicodeEncodeError: 'ascii' codec can't encode characters in position 6-15: ordinal not in range(128)
Am I using it incorrectly, or does it not currently support unicode?
iterparse pa
Thank you both for the suggestions. I made a few more experiments to
understand how iterparse behaves with respect to three dimensions:
a. Is the encoding declared in the header (if there is one)?
b. Is the text ascii-encodable (i.e. within range(128))?
c. Does the passed file object's read() me
On Wed, 2008-08-20 at 15:36 -0700, George Sakkis wrote:
> It seems xml.etree.cElementTree.iterparse() is not unicode aware:
>
> >>> from StringIO import StringIO
> >>> from xml.etree.cElementTree import iterparse
> >>> s = u'\u03a0\u03b1\u03bd\u03b1\u03b3\u03b9\u03ce\u03c4\u03b7\u03c2'
> >>
On Aug 21, 8:36 am, George Sakkis <[EMAIL PROTECTED]> wrote:
> It seems xml.etree.cElementTree.iterparse() is not unicode aware:
>
> >>> from StringIO import StringIO
> >>> from xml.etree.cElementTree import iterparse
> >>> s = u'\u03a0\u03b1\u03bd\u03b1\u03b3\u03b9\u03ce\u03c4\u03b7\u03c2'
It seems xml.etree.cElementTree.iterparse() is not unicode aware:
>>> from StringIO import StringIO
>>> from xml.etree.cElementTree import iterparse
>>> s = u'\u03a0\u03b1\u03bd\u03b1\u03b3\u03b9\u03ce\u03c4\u03b7\u03c2'
>>> for event,elem in iterparse(StringIO(s)):
...     print elem.text
..
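
One plausible reading of the reported traceback is that the unicode data was implicitly encoded with the ASCII codec somewhere on its way to the parser. The sketch below reproduces only that encoding step, not the parser itself. The `<root>` wrapper is an assumption, chosen because it places the ten Greek characters at positions 6 through 15, which matches the error message quoted in the thread.

```python
# The Greek name from the original report.
s = u'\u03a0\u03b1\u03bd\u03b1\u03b3\u03b9\u03ce\u03c4\u03b7\u03c2'

# Hypothetical wrapper element (an assumption, not shown in the thread).
document = u'<root>' + s + u'</root>'

try:
    # An ASCII encode cannot represent the Greek characters.
    document.encode('ascii')
    failure = None
except UnicodeEncodeError as exc:
    # exc.end is exclusive, so the last offending index is exc.end - 1.
    failure = (exc.start, exc.end - 1)   # (6, 15)

# An explicit UTF-8 encode succeeds and yields bytes a parser can read.
utf8_bytes = document.encode('utf-8')
```

Encoding explicitly keeps the choice of codec in the caller's hands instead of leaving it to an implicit default.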