[issue1140] re.sub returns str when processing empty unicode string

2007-09-10 Thread Fredrik Lundh
Fredrik Lundh added the comment: Looks good to me. I still subscribe to the idea that robust code should accept 8-bit *ASCII* strings anywhere it accepts Unicode (especially when the 8-bit string is empty), but that's me. Feel free to check this in (or assign back to you if you don'

[issue1140] re.sub returns str when processing empty unicode string

2007-09-10 Thread Fredrik Lundh
Fredrik Lundh added the comment: (is there a way to just add a comment in the new tracker, btw, or is everything a "change note", even if nothing has changed?) __ Tracker <[EMAIL PROTECTED]> <http://bugs.p

[issue1140] re.sub returns str when processing empty unicode string

2007-09-10 Thread Fredrik Lundh
Fredrik Lundh added the comment: Well, I spent a minute hunting around for a "comment" field or an "add comment" button. Guess this is a "you only need to learn this once" thing...

[issue1140] re.sub returns str when processing empty unicode string

2007-09-10 Thread Fredrik Lundh
Changes by Fredrik Lundh: __ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1140> __ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mai

[issue1123] split(None, maxsplit) does not strip whitespace correctly

2007-09-10 Thread Fredrik Lundh
Fredrik Lundh added the comment: Looks like a *documentation* bug to me; at the implementation level, None just means "no empty parts, treat runs of whitespace as separators". -- nosy: +effbot

[issue1123] split(None, maxsplit) does not strip whitespace correctly

2007-09-10 Thread Fredrik Lundh
Fredrik Lundh added the comment: But wasn't your complaint that the implementation didn't match the documentation? As I said, the *implementation* treats "runs of whitespace" as separators, except for whitespace at the beginning or end (or in other words, it never returns e
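The behaviour described in these two comments can be checked with a minimal sketch. The sample string is invented for illustration, and the results are what a current CPython produces, not output quoted from the thread:

```python
# Sketch only: the sample string is made up for illustration.
text = "  a  b   c  "

# sep=None: runs of whitespace act as one separator, and leading or
# trailing whitespace never produces empty parts.
parts_none = text.split(None)

# With maxsplit, the remainder is returned untouched once the limit is
# reached, so trailing whitespace survives in the last part.
parts_max = text.split(None, 1)

# An explicit separator, by contrast, keeps the empty parts.
parts_space = text.split(" ")

print(parts_none)
print(parts_max)
print(parts_space)
```

The second call is the case the issue title complains about: the last part is not stripped, yet no empty strings are returned.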

[issue1143] Updated to latest ElementTree in 2.6

2007-09-10 Thread Fredrik Lundh
New submission from Fredrik Lundh: The xml.etree package should be updated to ElementTree 1.3/cElementTree 1.0.6 (or later). -- assignee: effbot components: XML messages: 55811 nosy: effbot priority: normal severity: minor status: open title: Updated to latest ElementTree in 2.6 type

[issue1602189] Suggest a textlist() method for ElementTree

2007-09-10 Thread Fredrik Lundh
Fredrik Lundh added the comment: ElementTree 1.3 provides a variant of this (tentatively called "itertext"). -- resolution: -> accepted superseder: -> Updated to latest ElementTree in 2.6

[issue1143] Update to latest ElementTree in Python 2.6

2007-09-10 Thread Fredrik Lundh
Changes by Fredrik Lundh: -- title: Updated to latest ElementTree in 2.6 -> Update to latest ElementTree in Python 2.6

[issue1602189] Suggest a textlist() method for ElementTree

2007-09-10 Thread Fredrik Lundh
Changes by Fredrik Lundh: -- status: open -> closed

[issue1745722] please add wsgi to SimpleXMLRPCServer

2007-09-10 Thread Fredrik Lundh
Fredrik Lundh added the comment: A proper patch, including tests (if possible) and documentation, would be nice. (Also note that SimpleXMLRPCServer was written by Brian Quinlan.) -- assignee: effbot ->

[issue1690840] xmlrpclib methods submit call on __str__, __repr__

2007-09-10 Thread Fredrik Lundh
Fredrik Lundh added the comment: I'm trying to think of a reason for actually providing __repr__ over RPC, but I cannot find any. Not quite as sure about __str__, though; I suggest adding a __repr__ method, but leaving the rest as is. -- assignee: effbot -> coll

[issue814253] Grouprefs in lookbehind assertions

2007-09-10 Thread Fredrik Lundh
Changes by Fredrik Lundh: -- type: -> behavior versions: +Python 2.4, Python 2.5

[issue1160] Medium size regexp crashes python

2007-09-23 Thread Fredrik Lundh
Fredrik Lundh added the comment: Well, I'm not sure 81k qualifies as "medium sized", really. If you look at the size distribution for typical RE:s (which are usually handwritten, not machine generated), that's one or two orders of magnitude larger than "medium

[issue1700] Regular Expression inline flags not handled correctly for some unicode characters

2008-01-02 Thread Fredrik Lundh
Fredrik Lundh added the comment: Looks like the wrong execution flags are being passed to the function that creates the actual pattern object; the SRE compiler does the right thing, but the engine isn't running with the right flags in the last case. Changing the call to _sre.compi

[issue1327] Python 2.4+ spends too much time in PyEval_EvalFrame w/ xmlrpmclib

2008-01-03 Thread Fredrik Lundh
Fredrik Lundh added the comment: That changes to ceval should have introduced some kind of XML-RPC package limit seems a bit unlikely. If you can still reproduce this, can you try instrumenting the xmlrpclib.py library to see where it gets stuck? (passing in verbose=True to the Server[Proxy

[issue1698167] xml.etree document element.tag

2008-01-03 Thread Fredrik Lundh
Fredrik Lundh added the comment: This is fixed in the development version, so I'm closing this for now. The updated docs can be found here: http://docs.python.org/dev/library/xml.etree.elementtree.html -- resolution: -> fixed status: open -

[issue1761] Bug in re.sub()

2008-01-08 Thread Fredrik Lundh
Fredrik Lundh added the comment: re.findall has the same behaviour. Without looking at the code, I'm not sure if this is a bug in the code or in the documentation, really.

[issue1761] Bug in re.sub()

2008-01-09 Thread Fredrik Lundh
Fredrik Lundh added the comment: For the record, $ is defined to match "before a newline at the end of the string, or at the end of the string" in normal mode, and "before any newline, or at the end of the string" in multiline mode. (and I have a vague memory that t
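The definition of $ quoted above can be illustrated with a short sketch. The input strings are made up for illustration, and the behaviour shown is that of a current Python 3 `re` module rather than anything quoted from the report:

```python
import re

# Sketch only: sample input invented for illustration.
text = "one\ntwo\n"

# Normal mode: $ matches before the newline at the end of the string
# and at the very end of the string.
default_hits = [m.start() for m in re.finditer(r"$", text)]

# Multiline mode: $ also matches before every other newline.
multiline_hits = [m.start() for m in re.finditer(r"$", text, re.MULTILINE)]

# This is why a substitution on $ can insert the replacement twice.
substituted = re.sub(r"$", "#", "foo\n")

print(default_hits, multiline_hits, repr(substituted))
```

Anchoring with `\Z` instead of `$` matches only at the very end of the string, which avoids the double match when that is not wanted.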

[issue1777] ElementTree/cElementTree findtext inconsistency

2008-01-09 Thread Fredrik Lundh
Fredrik Lundh added the comment: Looks like the mechanisms used to decide when to invoke the full ElementPath machinery differ somewhat. I've added this to the TODO list for ET 1.3; in the meantime, my advice is "don't do that". (adding a check for '.' to the PAT

[issue1327] Python 2.4+ spends too much time in PyEval_EvalFrame w/ xmlrpmclib

2008-01-15 Thread Fredrik Lundh
Fredrik Lundh added the comment: Can you switch on verbose mode in xmlrpclib, so you can see *where* the transfer hangs? Arguing that a hanging Python program must be caused by a bug in the code that *executes* the Python program isn't that meaningful, really. After all, that code is us

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-11 Thread Fredrik Lundh
Fredrik Lundh added the comment: The "no header" thing is very much done on purpose, and it's documented in the upstream ElementTree documentation. I suggest dropping this "Python 3 exists in its own universe" nonsense; it's not very professional, and it's

[issue7114] HTMLParser doesn't handle

2010-03-11 Thread Fredrik Lundh
Fredrik Lundh added the comment: And to clarify, XHTML is a reformulation of HTML4 using XML syntax, so you should use an XML parser to parse it, not an HTML parser. The formats are related, but not identical.

[issue5100] ElementTree.iterparse and Element.tail confusion

2010-03-11 Thread Fredrik Lundh
Fredrik Lundh added the comment: Footnote: "iterparse" does things this way mostly to keep the implementation simple and fast; due to buffering, the tree builder is usually ahead of the event generation by up to 16k. See the note on this page: http://effbot.org/zone/element-ite

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-11 Thread Fredrik Lundh
Fredrik Lundh added the comment: > if I don't specify an encoding, I get unicode. If I do specify an encoding, > I get encoded bytes. You're confusing the XML document encoding with character set encoding. A serialized (unparsed) XML document is a byte stream, not a s
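The distinction drawn here between a serialized byte stream and a text string can be seen in the `xml.etree.ElementTree` that ships with current Python 3 releases. This is a minimal sketch under that assumption; the element name and text are invented, and the thread itself predates this API shape:

```python
import xml.etree.ElementTree as ET

# Sketch only: element name and content are made up for illustration.
elem = ET.Element("greeting")
elem.text = "hello"

# Default serialization yields an encoded byte stream.
as_bytes = ET.tostring(elem)

# Asking for the special encoding name "unicode" yields a str instead.
as_text = ET.tostring(elem, encoding="unicode")

print(type(as_bytes).__name__, type(as_text).__name__)
```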

[issue6472] Update ElementTree with upstream changes

2010-03-11 Thread Fredrik Lundh
Fredrik Lundh added the comment: W00t!

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-11 Thread Fredrik Lundh
Fredrik Lundh added the comment:
>>> import array
>>> array.array("i", [1, 2, 3]).tostring()
b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00'

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-11 Thread Fredrik Lundh
Fredrik Lundh added the comment: So now it's the domain experts against some hypothetical people that might exist? Tricky.

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-12 Thread Fredrik Lundh
Fredrik Lundh added the comment: "'None' has always been the documented default for the encoding parameter" That's probably mostly by accident at least in original ET, but the 1.3 draft docs at effbot.org/elementtree do spell it out explicitly for the 'write

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-12 Thread Fredrik Lundh
Fredrik Lundh added the comment: (what's the Python 3 replacement for the array module, btw?)

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-12 Thread Fredrik Lundh
Fredrik Lundh added the comment: "Yes, the feature has been implemented deep down in the _encode() helper function, so it impacts the entire serialiser, not only its API" Ouch. >>> import locale >>> locale.getpreferredencoding() == "utf-8" False >>

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-12 Thread Fredrik Lundh
Fredrik Lundh added the comment: "I wouldn't raise much opposition against tobytes() as an alias for tostring(), although that sounds more like duplicating an otherwise simple API." Adding an alias would be a way to address the 2.X/3.X terminology overlap; string traditionally i

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-12 Thread Fredrik Lundh
Changes by Fredrik Lundh : -- ___ Python tracker <http://bugs.python.org/issue8047> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/m

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-12 Thread Fredrik Lundh
Fredrik Lundh added the comment: Interesting. But isn't the problem with 3.1 that it relies on the standard encoding, which results in code that may or may not work depending on a global platform setting? Who's doing the encoding in the new version? And what ends up i

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-12 Thread Fredrik Lundh
Fredrik Lundh added the comment: Oops :) Yeah, that was a pretty lousy way to show what encoding I was using for that test:
>>> import locale
>>> locale.getpreferredencoding()
'cp1252'
>>>
(Somewhat related, it would be nice if Python actually normalized

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-21 Thread Fredrik Lundh
Fredrik Lundh added the comment: Hmm. I'm not entirely sure about giving False a meaning when None has traditionally had a different (and documented) meaning. And sleeping on it hasn't convinced me in either direction :-( (well, I'd say no, but the compatibility argum

[issue6488] ElementTree documentation refers to "path" with no explanation, and inconsistently

2010-04-01 Thread Fredrik Lundh
Fredrik Lundh added the comment: > As per PEP 257, “Returns” should become “Return” (it’s a command, not a > description). Upstream ET uses JavaDoc conventions, where the conventions are designed by technical writers, not hackers. In JavaDoc, descriptions are 3rd person declarative

[issue6488] ElementTree documentation refers to "path" with no explanation, and inconsistently

2010-04-01 Thread Fredrik Lundh
Fredrik Lundh added the comment: The missing/extra words in the findtext description are just a case of sloppy copy-editing, most likely after a quick reformatting. Not sure why you're spending all this energy arguing about commas, t

[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2010-05-01 Thread Fredrik Lundh
Fredrik Lundh added the comment: Namespaces are a fundamental part of the XML information model (both xpath and infoset) and all modern XML document formats, so I'm not sure what problem you're trying to solve by pretending that they don't exist. It's a bit like modif

[issue2892] improve cElementTree iterparse error handling

2010-06-11 Thread Fredrik Lundh
Fredrik Lundh added the comment: Note that this was fixed in upstream 1.3 (and verified by the selftests), but the fix and test were apparently lost when that code was merged into 2.7. Since 2.7 is supposed to ship with 1.3, this is a regression, not a feature request. (But 2.7 is in rc, and

[issue6266] cElementTree.iterparse & ElementTree.iterparse return differently encoded strings

2009-06-20 Thread Fredrik Lundh
Fredrik Lundh added the comment: Converting from UTF-8 to Unicode is the right thing to do, but converting back to Latin-1 is not correct -- note that ET returns a Unicode string, not an 8-bit string. There's a "makestring" helper that does the right thing in the library

[issue6266] cElementTree.iterparse & ElementTree.iterparse return differently encoded strings

2009-06-21 Thread Fredrik Lundh
Fredrik Lundh added the comment: It should definitely give what's intended (either a Unicode string, or, if the content is plain ASCII, an 8-bit string). What did you get instead?

[issue6233] ElementTree (py3k) doesn't properly encode characters that can't be represented in the specified encoding

2009-06-21 Thread Fredrik Lundh
Fredrik Lundh added the comment: Umm. Isn't _encode used to encode tags and attribute names? The charref syntax is only valid in CDATA sections and attribute values, which are encoded by the corresponding _escape functions. I suspect this patch will make things blow up on a non-ASCI

[issue6233] ElementTree (py3k) doesn't properly encode characters that can't be represented in the specified encoding

2009-06-21 Thread Fredrik Lundh
Fredrik Lundh added the comment: Did you look at the 1.3 alpha code base when you came up with this idea? Unfortunately, 1.3's _encode is used for a different purpose... I don't have time to test it tonight, but I suspect that 1.3's escape_data/escape_attrib functions mi

[issue6233] ElementTree (py3k) doesn't properly encode characters that can't be represented in the specified encoding

2009-06-24 Thread Fredrik Lundh
Fredrik Lundh added the comment: That's backwards, unless I'm missing something here: charrefs represent Unicode characters, not UTF-8 byte values. The character "LATIN SMALL LETTER A WITH TILDE" with the character value 227 should be represented as "&#227;" if
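The point about character references can be sketched with the standard codec error handler and with the serializer in current Python 3 releases. The element name is invented for illustration, and this is not the patch under discussion:

```python
import xml.etree.ElementTree as ET

# LATIN SMALL LETTER A WITH TILDE, code point 227.
char = "\u00e3"

# A charref names the Unicode code point...
charref = char.encode("ascii", "xmlcharrefreplace")

# ...not the individual UTF-8 byte values (195 and 163 here).
utf8_byte_values = list(char.encode("utf-8"))

# Sketch only: the element name "tag" is made up. Serializing to an
# encoding that cannot represent the character uses the same charref.
elem = ET.Element("tag")
elem.text = char
serialized = ET.tostring(elem, encoding="us-ascii")

print(charref, utf8_byte_values, serialized)
```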

[issue5166] ElementTree and minidom don't prevent creation of not well-formed XML

2009-06-24 Thread Fredrik Lundh
Fredrik Lundh added the comment: For ET, that's very much on purpose. Validating data provided by every single application would kill performance for all of them, even if only a small minority would ever try to serialize data that cannot be represented i

[issue6562] OverflowError in RLock.acquire()

2009-08-04 Thread Fredrik Lundh
Fredrik Lundh added the comment: PIL is completely thread-agnostic, so I'm not sure there's anything PIL can do to fix this. (and ImageQt is of course an interface to PyQt, which is an interface to Qt, which consists of a *lot* more than 50

[issue7139] ElementTree: Incorrect serialization of end-of-line characters in attribute values

2009-11-02 Thread Fredrik Lundh
Fredrik Lundh added the comment: The real problem here is that XML attributes weren't really designed to hold data that doesn't survive normalization. One would have thought that making it difficult to do that, and easy to store such things as character data, would have made people t

[issue3475] _elementtree.c import can fail silently

2009-11-08 Thread Fredrik Lundh
Fredrik Lundh added the comment: Note that "fail silently" is a bit of a misnomer - if the embedded import doesn't work, portions of the library will fail pretty loudly. Feel free to use some variation of the suggested patch, or just wait until the next upstream release ge

[issue7462] Implement fastsearch algorithm for rfind/rindex

2010-01-04 Thread Fredrik Lundh
Fredrik Lundh added the comment: Thanks Florent! > Are there any simple, common cases that are made slower by this patch? The original fastsearch implementation has a couple of special cases to make sure it's faster than the original code in all cases. The reason it wasn't im

[issue2842] Dictionary methods: inconsistency

2008-05-12 Thread Fredrik Lundh
Fredrik Lundh <[EMAIL PROTECTED]> added the comment: Eh? Why did you add *everyone* involved in the project to the nosy list? (I'll leave explaining why breaking almost all Python programs in the name of "consistency" is an absurd idea to someone else). -- n

[issue2842] Dictionary methods: inconsistency

2008-05-12 Thread Fredrik Lundh
Changes by Fredrik Lundh <[EMAIL PROTECTED]>: -- nosy: -effbot

[issue3299] invalid object destruction in re.finditer()

2008-07-06 Thread Fredrik Lundh
Fredrik Lundh <[EMAIL PROTECTED]> added the comment: This report makes no sense to me; at least in Python 2.X, PyObject_Del removes a chunk of memory from the object heap. It's designed to be used from dealloc implementations, to release the actual memory (either directly, or as

[issue3353] make built-in tokenizer available via Python C API

2008-07-14 Thread Fredrik Lundh
New submission from Fredrik Lundh <[EMAIL PROTECTED]>: CPython provides a Python-level API to the parser, but not to the tokenizer itself. Somewhat annoyingly, it does provide a nice C API, but that's not properly exposed for external modules. To fix this, the tokenizer.h fil

[issue3299] invalid object destruction in re.finditer()

2008-07-19 Thread Fredrik Lundh
Fredrik Lundh <[EMAIL PROTECTED]> added the comment: Reducing priority to normal; this bug has been around since Python 2.2, and only affects code that doesn't work anyway when running on debug builds. -- priority: critical -> normal

[issue3353] make built-in tokenizer available via Python C API

2008-07-21 Thread Fredrik Lundh
Fredrik Lundh <[EMAIL PROTECTED]> added the comment: There are a few things in the struct that need to be public, but that's nothing that cannot be handled by documentation. No need to complicate the API just in case.

[issue3409] ElementPath.Path.findall problem with unicode input

2008-07-22 Thread Fredrik Lundh
Fredrik Lundh <[EMAIL PROTECTED]> added the comment: Hmm. That's embarrassing. What was I thinking? Guess it's time to update the 2.X codebase to ET 1.2.8.

[issue3353] make built-in tokenizer available via Python C API

2008-07-24 Thread Fredrik Lundh
Fredrik Lundh <[EMAIL PROTECTED]> added the comment: That should be all that's needed to expose the existing API, as is. If you want to verify the build, you can grab the pytoken.c and setup.py files from this directory, and try building the module. http://svn.effbot.org/publ

[issue3475] _elementtree.c import can fail silently

2008-08-03 Thread Fredrik Lundh
Fredrik Lundh <[EMAIL PROTECTED]> added the comment: This is fixed in the ET 1.3-compatible codebase. Since it's too late to add ET 1.3 to 2.6, I guess it's time to make a new 1.2 bugfix release for 2.6.

[issue648658] xmlrpc can't do proxied HTTP

2008-08-19 Thread Fredrik Lundh
Fredrik Lundh <[EMAIL PROTECTED]> added the comment: It's a missing feature, not a bug in the existing code. But if you're desperate, why not just use the transport implementation that's attached to this issue?

[issue3811] Update Unicode database to 5.1.0

2008-09-10 Thread Fredrik Lundh
Fredrik Lundh <[EMAIL PROTECTED]> added the comment: The patch looks fine to me (assuming that I didn't miss something critical hidden among the large table diffs). (I'd probably have named the "NODELTA" flag after what it is rather than what it isn't, but I cann

[issue3825] Major reworking of Python 2.5.2 re module

2008-09-13 Thread Fredrik Lundh
Fredrik Lundh <[EMAIL PROTECTED]> added the comment: A bit more information on the changes to the core engine that are responsible for the 2x speedup (on what?) would be nice to have, I think (especially since you seem to have removed the KMP prefix scanner). (Isn't there a RE benc

[issue3865] explain that profilers should be used for profiling, not benchmarking

2008-09-14 Thread Fredrik Lundh
New submission from Fredrik Lundh <[EMAIL PROTECTED]>: You often see people using the profiler for benchmarking instead of profiling. I suggest adding a note that explains that the profiler modules are designed to provide an execution profile for a given program, not for benchmarking dif

[issue3865] explain that profilers should be used for profiling, not benchmarking

2008-09-14 Thread Fredrik Lundh
Fredrik Lundh <[EMAIL PROTECTED]> added the comment: (the reason this is extra bad for C modules is that the profilers introduce overhead for Python code, but not for C-level functions. For example, using the standard profiler to benchmark parser performance for xml.etree.ElementT

[issue3547] Ctypes is confused by bitfields of varying integer types

2008-09-24 Thread Fredrik Lundh
Fredrik Lundh <[EMAIL PROTECTED]> added the comment: Looks fine to me, except for the comment in the test suite. Should +# MS compilers do NOT combine c_short and c_int into +# one field, gcc doesn't. perhaps be +# MS compilers do NOT combine c_short and

[issue3547] Ctypes is confused by bitfields of varying integer types

2008-09-24 Thread Fredrik Lundh
Fredrik Lundh <[EMAIL PROTECTED]> added the comment: "Do" should be "does", right. Not enough coffee today :)

[issue433029] SRE: posix classes aren't supported

2008-09-27 Thread Fredrik Lundh
Fredrik Lundh <[EMAIL PROTECTED]> added the comment: Yes, this refers to the POSIX character classes as described here: http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html (Ideally, there should be an (internal) API that lets you register class definitions from the

[issue4100] xml.etree.ElementTree does not read xml-text over page bonderies

2008-11-01 Thread Fredrik Lundh
Fredrik Lundh <[EMAIL PROTECTED]> added the comment: Roland's right - "iterparse" only guarantees that it has seen the ">" character of a starting tag when it emits a "start" event, so the attributes are defined, but the contents of the text and
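The guarantee described in this comment suggests reading attributes on "start" and text on "end". A minimal sketch follows, assuming the standard `xml.etree.ElementTree.iterparse`; the document and the `item` tag are invented for illustration:

```python
import io
import xml.etree.ElementTree as ET

# Sketch only: the document below is made up for illustration.
doc = io.BytesIO(
    b"<root><item id='1'>hello</item><item id='2'>world</item></root>"
)

seen_ids = []
texts = []
for event, elem in ET.iterparse(doc, events=("start", "end")):
    if elem.tag != "item":
        continue
    if event == "start":
        # Attributes are defined as soon as the start tag has been seen.
        seen_ids.append(elem.get("id"))
    else:
        # Text content is only guaranteed to be complete at the end event.
        texts.append(elem.text)

print(seen_ids, texts)
```

On a small in-memory document the text may already be present at the "start" event because of buffering, which is why code that reads it there can appear to work and then fail on larger inputs.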

[issue1777] ElementTree/cElementTree findtext inconsistency

2009-01-10 Thread Fredrik Lundh
Fredrik Lundh added the comment: Forgot to mention that this is fixed in the cElementTree trunk (public as of today's 1.0.6 preview release). Will merge with Python trunk when I find the time...

[issue1143] Update to latest ElementTree in Python 2.7

2009-04-02 Thread Fredrik Lundh
Fredrik Lundh added the comment: ET 1.3 is still in alpha, though. Hopefully, that'll sort itself out over the next few weeks.

[issue1538691] Patch cElementTree to export CurrentLineNumber

2009-04-02 Thread Fredrik Lundh
Fredrik Lundh added the comment: In the upstream 1.0.6, the ParseError exception has a position attribute that contains a (line, column) tuple.
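The same attribute is available on `xml.etree.ElementTree.ParseError` in current Python releases. This is a minimal sketch under that assumption, with a deliberately malformed document made up for illustration; it does not show the upstream 1.0.6 code itself:

```python
import xml.etree.ElementTree as ET

# Sketch only: <child> is never closed, so the parser fails on line 3.
broken = "<root>\n  <child>\n</root>"

line = column = None
try:
    ET.fromstring(broken)
except ET.ParseError as exc:
    # position is a (line, column) tuple describing where parsing failed.
    line, column = exc.position

print(line, column)
```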

[issue5767] xmlrpclib loads invalid documents

2009-04-16 Thread Fredrik Lundh
Fredrik Lundh added the comment: sgmlop doesn't do much validation; to quote the homepage: "[sgmlop] is tolerant, and happily accepts XML-like data that are not well-formed. If you need strictness, use another parser." But given that Python ships with cElementTree