[issue3565] array documentation, method names not 3.x-compliant

2011-07-12 Thread Matt Giuca
Matt Giuca added the comment: There are still some inconsistencies in the documentation (in particular, incorrectly using the word "string" to refer to a bytes object, which made sense in Python 2 but not 3), which I fixed in my doc-only.patch file that's coming up to its

[issue8821] Range check on unicode repr

2010-12-29 Thread Matt Giuca
Matt Giuca added the comment: > I think that we have good reasons to not remove the NUL character. Please note: Nobody is suggesting that we remove the NUL character. I was merely suggesting that we don't rely on it where it is unnecessary. Returning to my original patch: If the

[issue8821] Range check on unicode repr

2010-08-02 Thread Matt Giuca
Matt Giuca added the comment: OK, I finally had time to review this issue again. Firstly, granted the original fix broke a test case, shouldn't we figure out why it broke and fix it, rather than just revert the change and continue relying on this tenuous assumption? Surely it's be

[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-21 Thread Matt Giuca
Matt Giuca added the comment: If you're going the way of option 2, I would strongly advise against relying on the KeyError. The fact that a KeyError is raised by urllib.quote is not part of its specification; it's a bug/quirk in the implementation (which is now unlikely to

[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-19 Thread Matt Giuca
Matt Giuca added the comment: OK sure, there are some other things broken, but they are mostly not dealing with string data, but binary data (for example, zlib expects a sequence of bytes, not characters). Just one quick point: > urllib.urlretrieve("file:///tmp/hé") > U

[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-19 Thread Matt Giuca
Matt Giuca added the comment: > I think everyone assumed that the parameter should be a "str" object > and nothing else. Apparently some people used it accidentally with > some unicode strings and it "worked" until these strings contained > non-ASCII characters.

[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-19 Thread Matt Giuca
Matt Giuca added the comment: > Well, isn't it a new feature you're adding? You had a function which raised a confusing and unintentional KeyError when given non-ASCII Unicode input. Now it doesn't. That's the bug fix part. What I assume you're referring to a

[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-19 Thread Matt Giuca
Matt Giuca added the comment: From http://mail.python.org/pipermail/python-checkins/2010-July/095350.html: > Looking at the issue (which in itself was quite old), you could as well > have fixed the robotparser module instead. It isn't an issue with robotparser. The original rep

[issue1712522] urllib.quote throws exception on Unicode URL

2010-07-18 Thread Matt Giuca
Matt Giuca added the comment: Thanks for doing that, Senthil. -- ___ Python tracker <http://bugs.python.org/issue1712522> ___ ___ Python-bugs-list mailin

[issue8987] Distutils doesn't quote Windows command lines properly

2010-06-12 Thread Matt Giuca
Changes by Matt Giuca : -- type: -> behavior

[issue8987] Distutils doesn't quote Windows command lines properly

2010-06-12 Thread Matt Giuca
New submission from Matt Giuca : I discovered this investigating a bug report that python-cjson doesn't compile properly on Windows (http://pypi.python.org/pypi/python-cjson). Cjson's setup.py asks distutils to compile with the flag '-DMODULE_VERSION="1.0.5"

[issue8821] Range check on unicode repr

2010-05-25 Thread Matt Giuca
New submission from Matt Giuca : In unicodeobject.c's unicodeescape_string, in UCS2 builds, if the last character of the string is the start of a UTF-16 surrogate pair (between '\ud800' and '\udfff'), there is a slight overrun problem. For example: >>> re

[issue8143] urlparse has a duplicate of urllib.unquote

2010-03-15 Thread Matt Giuca
Matt Giuca added the comment: What about the alternative (newmodule) patch? That doesn't have threading issues, or break backwards compatibility.

[issue8135] urllib.unquote doesn't decode mixed-case percent escapes

2010-03-15 Thread Matt Giuca
Matt Giuca added the comment: Thanks very much. Importantly, note that unquote is currently duplicated between urllib and urlparse. I have a bug on it (#8143) but in the meantime, you will have to commit this fix to both modules.

[issue8143] urlparse has a duplicate of urllib.unquote

2010-03-14 Thread Matt Giuca
Matt Giuca added the comment: If this patch is rejected, then at the very least, the urllib.unquote function needs a comment at the top explaining that it is duplicated in urlparse, so any changes should be made to both. Note that urlparse.unquote is not a documented function, or in the

[issue8135] urllib.unquote doesn't decode mixed-case percent escapes

2010-03-14 Thread Matt Giuca
Matt Giuca added the comment: Tiny fix to patch2 -- replaced list comprehension with generator expression in dictionary construction. -- Added file: http://bugs.python.org/file16552/urllib-unquote-mixcase.patch2

[issue8135] urllib.unquote doesn't decode mixed-case percent escapes

2010-03-14 Thread Matt Giuca
Changes by Matt Giuca : Removed file: http://bugs.python.org/file16551/urllib-unquote-mixcase.patch2

[issue8135] urllib.unquote doesn't decode mixed-case percent escapes

2010-03-14 Thread Matt Giuca
Matt Giuca added the comment: I thought more about it, and wrote a different patch which doesn't remove the dictionary. I just replaced the dictionary creation code -- now it includes keys for all combinations of upper and lower case (for two-letter hex codes). This dictionary isn'
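
The lookup table described in this comment can be sketched roughly as follows. This is an illustration only, not the patch itself: the names HEX_DIGITS and HEX_TO_CHAR are hypothetical, and a Latin-1 str stands in for the byte string that Python 2's unquote returned.

```python
import itertools

# Hypothetical names; the actual patch is not quoted in this listing.
HEX_DIGITS = "0123456789ABCDEFabcdef"

# Map every two-character hex code, in any mix of upper and lower case,
# to the single character it stands for.
HEX_TO_CHAR = {
    a + b: chr(int(a + b, 16))
    for a, b in itertools.product(HEX_DIGITS, repeat=2)
}
```

With 22 possible digits per position the table holds 484 keys, so "%Fc" and "%fC" resolve through the same plain dictionary lookup as "%FC".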

[issue8136] urllib.unquote decodes percent-escapes with Latin-1

2010-03-14 Thread Matt Giuca
Matt Giuca added the comment: Oh, I just discovered that urlparse contains a copy of unquote, which will also need to be patched. I've submitted a patch to remove the duplicate (#8143) -- if that is accepted first then there's no need to worr

[issue8135] urllib.unquote doesn't decode mixed-case percent escapes

2010-03-14 Thread Matt Giuca
Matt Giuca added the comment: Oh, I just discovered that urlparse contains a copy of unquote, which will also need to be patched. I've submitted a patch to remove the duplicate (#8143) -- if that is accepted first then there's no need to worr

[issue8143] urlparse has a duplicate of urllib.unquote

2010-03-14 Thread Matt Giuca
New submission from Matt Giuca : urlparse contains a complete copy of the urllib.unquote function. This is extremely nasty code duplication -- I have two patches pending on urllib.unquote (#8135 and #8136) and I only just realised that I missed urlparse.unquote! The reason given for this is

[issue8135] urllib.unquote doesn't decode mixed-case percent escapes

2010-03-14 Thread Matt Giuca
Matt Giuca added the comment: > Note: I've also backported the remainder of the 'unquote' test cases > from Python 3 but I found another bug, so I will report that separately, > with a patch. Filed under issue #8136.

[issue8136] urllib.unquote decodes percent-escapes with Latin-1

2010-03-14 Thread Matt Giuca
Matt Giuca added the comment: New version of "explain" patch -- fixed comment linking to the wrong bug ID -- now links to this bug ID (#8136). -- Added file: http://bugs.python.org/file16545/urllib-unquote-explain.patch

[issue8136] urllib.unquote decodes percent-escapes with Latin-1

2010-03-14 Thread Matt Giuca
Changes by Matt Giuca : Removed file: http://bugs.python.org/file16542/urllib-unquote-explain.patch

[issue8136] urllib.unquote decodes percent-escapes with Latin-1

2010-03-14 Thread Matt Giuca
Matt Giuca added the comment: Alternative patch which fixes test cases and documentation without changing the behaviour. -- Added file: http://bugs.python.org/file16542/urllib-unquote-explain.patch

[issue8136] urllib.unquote decodes percent-escapes with Latin-1

2010-03-14 Thread Matt Giuca
New submission from Matt Giuca : The 'unquote' function has some very strange behaviour on Unicode input. My proposed fix will, I am sure, be contentious, because it could change existing behaviour (only on unicode strings), but I think it's worth it for a sane unquote

[issue8135] urllib.unquote doesn't decode mixed-case percent escapes

2010-03-14 Thread Matt Giuca
New submission from Matt Giuca : urllib.unquote fails to decode a percent-escape with mixed case. To demonstrate: >>> unquote("%fc") '\xfc' >>> unquote("%FC") '\xfc' >>> unquote("%Fc") '%Fc' >>> u
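
For comparison, a small check against the Python 3 equivalent, urllib.parse.unquote_to_bytes, which decodes every case combination the same way. This is a sketch for current Python 3, not a reproduction of the Python 2 session in the report.

```python
from urllib.parse import unquote_to_bytes

# Decode the same escape in all four case combinations.
escapes = ("%fc", "%FC", "%Fc", "%fC")
decoded = {escape: unquote_to_bytes(escape) for escape in escapes}
```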

[issue1712522] urllib.quote throws exception on Unicode URL

2010-03-14 Thread Matt Giuca
Matt Giuca added the comment: I've finally gotten around to a complete analysis of this code. I have a code/test/documentation patch which fixes the issue without any code breakage. There is another bug in quote which I've found and fixed with this patch: If the 'safe'

[issue5827] os.path.normpath doesn't preserve unicode

2009-11-18 Thread Matt Giuca
Matt Giuca added the comment: Thanks Ezio. I've updated the patch to incorporate your suggestions. Note that I too have only tested it on Linux, but I tested both posixpath and ntpath (and there is no OS-specific code, except for the filenames themselves). I'm not sure if using

[issue6118] urllib.parse.quote_plus ignores optional arguments

2009-05-26 Thread Matt Giuca
New submission from Matt Giuca : urllib.parse.quote_plus will ignore its encoding and errors arguments if its input string has a space in it. Intended behaviour: >>> urllib.parse.quote_plus("\xa2\xd8 \xff", encoding='latin-1') '%A2%D8+%FF' Observed behav
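
The intended behaviour quoted in this report can be checked directly on a current Python 3, where quote_plus honours the encoding argument even when the input contains a space. A minimal sketch; the variable names are only for illustration.

```python
from urllib.parse import quote_plus

text = "\xa2\xd8 \xff"

# With an explicit encoding, each character becomes one Latin-1 byte.
latin1_quoted = quote_plus(text, encoding="latin-1")

# With the default (UTF-8), each non-ASCII character becomes two bytes.
utf8_quoted = quote_plus(text)
```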

[issue1712522] urllib.quote throws exception on Unicode URL

2009-05-26 Thread Matt Giuca
Matt Giuca added the comment: The issue of urllib.quote was discussed at extreme length in issue 3300, which was specific to Python 3. http://bugs.python.org/issue3300 In the end, I rewrote the entire family of urllib.quote and unquote functions; they're now Unicode compliant and a

[issue3565] array documentation, method names not 3.0 compliant

2009-04-23 Thread Matt Giuca
Matt Giuca added the comment: I agree with that -- too big a change to make now. But can we please get the documentation patch accepted? It's been waiting here for eight months with corrections to clearly-incorrect documentation.

[issue5827] os.path.normpath doesn't preserve unicode

2009-04-23 Thread Matt Giuca
Changes by Matt Giuca : -- type: -> behavior

[issue5827] os.path.normpath doesn't preserve unicode

2009-04-23 Thread Matt Giuca
New submission from Matt Giuca : In the Python 2.x branch, os.path.normpath will sometimes return a str even if given a unicode. This is not an issue in the Python 3.0 branch. This happens specifically when it throws away all string data and constructs its own: >>> os.path.n

[issue3565] array documentation, method names not 3.0 compliant

2009-04-23 Thread Matt Giuca
Matt Giuca added the comment: Full method renaming patch. -- Added file: http://bugs.python.org/file13756/doc+bytesmethods.patch

[issue3565] array documentation, method names not 3.0 compliant

2009-04-23 Thread Matt Giuca
Matt Giuca added the comment: OK since the patches I submitted are now eight months old, I just did an update and re-applied them. I am submitting new patch files which don't change anything, but are patches against revision 71822 (should be much easier to apply). I'd still like

[issue3613] base64.encodestring does not actually accept strings

2009-04-23 Thread Matt Giuca
Matt Giuca added the comment: Now, base64.encodestring and decodestring seem a bit weird because the Base64 encoded string is also required to be a bytes. It seems to me that once something is Base64-encoded, it's considered to be ASCII text, not just some byte string, and therefore it s

[issue3613] base64.encodestring does not actually accept strings

2009-04-23 Thread Matt Giuca
Matt Giuca added the comment: I've attached a patch which renames encodestring to encodebytes (keeping encodestring around as an alias). Updated test and documentation. I also renamed decodestring to decodebytes, because it also refuses to accept a string (only a bytes). I have an altern
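
A short sketch of the renamed functions as they exist in current Python 3, showing that both directions work on bytes and that a str argument is refused. The sample data is arbitrary.

```python
import base64

# Both functions take and return bytes.
encoded = base64.encodebytes(b"hello")
decoded = base64.decodebytes(encoded)

# Passing a str is rejected with a TypeError.
try:
    base64.encodebytes("hello")
    str_rejected = False
except TypeError:
    str_rejected = True
```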

[issue3565] array documentation, method names not 3.0 compliant

2009-03-17 Thread Matt Giuca
Matt Giuca added the comment: Note that, irrespective of the changes to the library itself, the documentation is out of date since it still uses the old "string/unicode" nomenclature, rather than the new "bytes/string". I have provided a separate documentation patch which

[issue3803] Comparison operators - New rules undocumented in Python 3.0

2008-09-07 Thread Matt Giuca
New submission from Matt Giuca <[EMAIL PROTECTED]>: I've noticed that in Python 3.0, the <, >, <= and >= operators now raise a TypeError when comparing objects of different types, rather than ordering them "consistently but arbitrarily". The documentation doesn'
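
The rule described here is easy to demonstrate in Python 3. In this sketch the ordering comparison raises, while equality between unrelated types still simply evaluates to False.

```python
# Ordering an int against a str raises TypeError in Python 3.
try:
    1 < "one"
except TypeError as exc:
    ordering_error = exc
else:
    ordering_error = None

# Equality between unrelated types does not raise.
equal = (1 == "one")
```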

[issue3794] __div__ still documented in Python 3

2008-09-06 Thread Matt Giuca
New submission from Matt Giuca <[EMAIL PROTECTED]>: The "special method names" section of the Python 3.0 documentation still mentions the __div__ method. I believe this method has been totally removed in Python 3 in favour of __truediv__. (Perhaps I am mistaken, but 'int
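
To illustrate the point about __truediv__, here is a minimal sketch with a hypothetical Ratio class. In Python 3 the / operator dispatches to __truediv__, and built-in int has no __div__ attribute.

```python
class Ratio:
    """Hypothetical wrapper type, used only to show operator dispatch."""

    def __init__(self, value):
        self.value = value

    def __truediv__(self, other):
        # Called for the / operator in Python 3.
        return Ratio(self.value / other.value)

    def __floordiv__(self, other):
        # Called for the // operator.
        return Ratio(self.value // other.value)


true_result = (Ratio(7) / Ratio(2)).value
floor_result = (Ratio(7) // Ratio(2)).value
int_has_div = hasattr(int, "__div__")
```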

[issue3793] Small RST fix in datamodel.rst

2008-09-06 Thread Matt Giuca
New submission from Matt Giuca <[EMAIL PROTECTED]>: A missing blank line under the heading for __bool__ in datamodel.rst (in Python 3.0 docs) causes the following line to appear in the output HTML. ".. index:: single: __len__() (mapping object method)" Visible here: http://doc

[issue3565] array documentation, method names not 3.0 compliant

2008-09-03 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: Can I just remind people that I have a documentation patch ready here (and has been for about a month)? Of course the doc+bytesmethods.patch may be debatable and probably too late to go in 3.0. But you should be able to commit doc-only

[issue600362] relocate cgi.parse_qs() into urlparse

2008-08-25 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: It seems like parse_multipart and parse_header are very strongly related to parse_qs. (eg. if you want to process HTTP requests you'll want to call parse_qs for x-www-form-urlencoded and parse_multipart for multipart/form-data). Shou
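
For reference, parse_qs as it is exposed from urllib.parse in Python 3. This sketch shows the x-www-form-urlencoded case mentioned above; the query string is made up for illustration.

```python
from urllib.parse import parse_qs

# Repeated keys are collected into a list, in order of appearance.
fields = parse_qs("lang=python&tag=urllib&tag=unicode")
```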

[issue3565] array documentation, method names not 3.0 compliant

2008-08-20 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: A similar issue came up in another bug (http://bugs.python.org/issue3613), and Guido said: "IMO it's okay to add encodebytes(), but let's leave encodestring() around with a deprecation warning, since it's so late in th

[issue3613] base64.encodestring does not actually accept strings

2008-08-20 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: > > it should be renamed to encodestring > Huh ? It is already called that :) Um ... yes. I mean encodebytes :) > > Best we can do is document them. > Oh well. But I don't know the rules. People are saying thing

[issue3300] urllib.quote and unquote - Unicode issues

2008-08-20 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: Thanks for pointing that out, Antoine. I just commented on that bug.

[issue3613] base64.encodestring does not actually accept strings

2008-08-20 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: Hi Dmitry, RE the method behaviour: I think it probably is correct to NOT accept a string. Given that it's base64 encoding it, it only makes sense to encode bytes, not arbitrary Unicode characters which have no well-defined binary rep

[issue3609] does parse_header really belong in CGI module?

2008-08-19 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: These functions are for generic MIME headers and bodies, so are applicable to CGI, HTTP, Email, and any other protocols based on MIME. So I think having them in email.header makes about as much sense as having them in cgi. Isn't

[issue3300] urllib.quote and unquote - Unicode issues

2008-08-18 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: Hi, Sorry to bump this, but you (Guido) said you wanted this closed by Wednesday. Is this patch committable yet? (There are no more unresolved issues that I am aware of).

[issue3576] Demo/embed builds against old version

2008-08-17 Thread Matt Giuca
New submission from Matt Giuca <[EMAIL PROTECTED]>: The Python 2.6 version of Demo/embed/Makefile builds against libpython2.5.a, which doesn't exist in this version. Quick patch to let it build against libpython2.6.a. Commit log: Fixed Demo/embed/Makefile to build against lib

[issue3564] making partial functions comparable

2008-08-16 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: It's highly debatable whether these should compare true. (Note: saying "they aren't comparable" is a misnomer -- they are comparable, they just don't compare equal). From a mathematical standpoint, the

[issue3565] array documentation, method names not 3.0 compliant

2008-08-16 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: Oops .. forgot to update the array.rst docs with the new method names. Replaced doc+bytesmethods.patch with a fixed version. Added file: http://bugs.python.org/file11123/doc+bytesmethods.patch

[issue3565] array documentation, method names not 3.0 compliant

2008-08-16 Thread Matt Giuca
Changes by Matt Giuca <[EMAIL PROTECTED]>: Removed file: http://bugs.python.org/file11122/doc+bytesmethods.patch

[issue3565] array documentation, method names not 3.0 compliant

2008-08-16 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: I renamed tostring/fromstring to tobytes/frombytes in the array module, as described above. I then grepped the entire py3k tree for "tostring" and "fromstring", and carefully replaced all references which pertain to arra
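
The renamed methods proposed here are available on array objects in current Python 3. A brief round-trip sketch with arbitrary sample values:

```python
from array import array

original = array("B", [1, 2, 3])

# tobytes() returns the raw machine representation as a bytes object.
raw = original.tobytes()

# frombytes() appends items decoded from a bytes object.
restored = array("B")
restored.frombytes(raw)
```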

[issue3565] array documentation, method names not 3.0 compliant

2008-08-16 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: (Fixed issue title) -- title: array documentation, method names not 3.0 compliant -> array documentation, method names not 3.0 compliant

[issue3565] array documentation, method names not 3.0 compliant

2008-08-16 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: > I'm not a native speaker (of English), but my understanding is that the > noun "string", in itself, can very well be used to describe this type: > the result is a "byte string", as opposed to a "ch

[issue3565] array documentation, method names not 3.0 compliant

2008-08-16 Thread Matt Giuca
New submission from Matt Giuca <[EMAIL PROTECTED]>: A few weeks ago I fixed the struct module's documentation which wasn't 3.0 compliant (basically renaming "strings" to "bytes" and "unicode" to "string"). Now I've had a l

[issue3547] Ctypes is confused by bitfields of varying integer types

2008-08-15 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: Confirmed in HEAD for Python 2.6 and 3.0, on Linux. Python 2.6b2+ (trunk:65708, Aug 16 2008, 15:04:13) [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2 Python 3.0b2+ (py3k:65708, Aug 16 2008, 15:09:19) [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu

[issue3557] Segfault in sha1

2008-08-14 Thread Matt Giuca
New submission from Matt Giuca <[EMAIL PROTECTED]>: Continuing the discussion from Issue 3552 (http://bugs.python.org/issue3552). r65676 makes changes to Modules/md5module.c and Modules/sha1module.c, to allow them to read mutable buffers. There's a segfault in sha1module if given

[issue3552] uuid - exception on uuid3/uuid5

2008-08-14 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: So are you saying that if I had libopenssl (or whatever the name is) installed and linked with Python, it would bypass the use of _md5 and _sha1, and call the hash functions in libopenssl instead? And all the buildbots _do_ have it linked?

[issue3300] urllib.quote and unquote - Unicode issues

2008-08-14 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: Antoine: > I think if you move the line defining "str" out of the loop, relative > timings should change quite a bit. Chances are that the random > functions are not very fast, since they are written in pure Pytho

[issue3300] urllib.quote and unquote - Unicode issues

2008-08-14 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: New patch (patch10). Details on Rietveld review tracker (http://codereview.appspot.com/2827). Another update on the remaining "outstanding issues": Resolved issues since last time: > Should unquote accept a bytes/bytearray a

[issue3552] uuid - exception on uuid3/uuid5

2008-08-14 Thread Matt Giuca
New submission from Matt Giuca <[EMAIL PROTECTED]>: The test suite breaks on the Lib/test/test_uuid.py, as of r65661. This is because uuid3 and uuid5 now raise exceptions. TypeError: new() argument 1 must be bytes or read-only buffer, not bytearray The problem is due to the changes in t

[issue3300] urllib.quote and unquote - Unicode issues

2008-08-14 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: OK I implemented the defaultdict solution. I got curious so ran some rough speed tests, using the following code. import random, urllib.parse for i in range(0, 10): str = ''.join(chr(random.randint(0, 0x10)) f
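
One way a defaultdict-based quoting cache can work is sketched below. It is an assumption-laden illustration, not the code from the patch: the names ALWAYS_SAFE, Quoter and quote_bytes are hypothetical. The key point is that overriding __missing__ gives access to the missing key, which a plain default_factory does not.

```python
from collections import defaultdict

# Hypothetical set of byte values that never need quoting.
ALWAYS_SAFE = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789_.-"
)


class Quoter(defaultdict):
    """Cache mapping a byte value (int) to its percent-quoted form."""

    def __init__(self, safe):
        super().__init__()  # no default_factory; __missing__ does the work
        self.safe = ALWAYS_SAFE.union(safe)

    def __missing__(self, byte):
        # Called once per unseen byte value; the result is then cached.
        result = chr(byte) if byte in self.safe else "%{:02X}".format(byte)
        self[byte] = result
        return result


def quote_bytes(data, safe=b"/"):
    """Percent-quote a bytes object, leaving `safe` bytes untouched."""
    quoter = Quoter(safe)
    return "".join(quoter[byte] for byte in data)


quoted = quote_bytes("hé llo/".encode("utf-8"))
```

Each distinct byte value is computed only once, so long inputs with a small alphabet mostly hit the dictionary's fast path.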

[issue3300] urllib.quote and unquote - Unicode issues

2008-08-14 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: Ah cheers Antoine, for the tip on using defaultdict (I was confused as to how I could access the key just by passing defaultfactory, as the manual suggests).

[issue3300] urllib.quote and unquote - Unicode issues

2008-08-13 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: > I'm OK with replace for unquote() ... > For quote() I think strict is better There's just an odd inconsistency there, but it's only a tiny "gotcha"; and I agree with all your other arguments. I'

[issue3300] urllib.quote and unquote - Unicode issues

2008-08-13 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: > I have no strong opinion on the very remaining points you listed, > except that IMHO encode_rfc2231 with charset=None should not try to > use UTF8 by default. But someone with more mail protocol skills > should comment :) O

[issue3300] urllib.quote and unquote - Unicode issues

2008-08-12 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: By the way, what is the current status of this bug? Is anybody waiting on me to do anything? (Re: Patch 9) To recap my previous list of outstanding issues raised by the review: > Should unquote accept a bytes/bytearray as well

[issue3300] urllib.quote and unquote - Unicode issues

2008-08-12 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: Bill, this debate is getting snippy, and going nowhere. We could argue about what is the "pure" and "correct" thing to do, but we have a limited time frame here, so I suggest we just look at the important facts. 1

[issue3532] bytes.tohex method

2008-08-10 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: So I assumed. In that case, why is there a "fromhex"? (Was that put in there before the notion of transform/untransform?) As I've been saying, it's weird to have a fromhex but not a tohex. Anyway, assuming we g

[issue3532] bytes.tohex method

2008-08-10 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: OK thanks. Well I still can't really see what transform/untransform are about. Is it OK to keep this issue open (and listed as 3.1) until more information becomes available on those methods?

[issue3532] bytes.tohex method

2008-08-10 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: Oh, where's the information on those? (A brief search of the peps and bug tracker shows nothing).

[issue3300] urllib.quote and unquote - Unicode issues

2008-08-10 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: > Invalid user input? What if the query string comes from filling > a form? > For example if I search the word "numéro" in a latin1 Web site, > I get the following URL: > http://www.le-tigre.net/spip.php?page=re

[issue3532] bytes.tohex method

2008-08-10 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: > Except, when we look at the context. This is bytes class > method returns a bytes or bytearray object, decoding the given > string object. > Do we require an opposite in the bytes class method? Where will > we use it? No,

[issue3300] urllib.quote and unquote - Unicode issues

2008-08-10 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: Made a bunch of requested changes (I've reverted the "all safe" patch for now since it caused so much grief; see above). * quote: Fixed encoding illegal % sequences (and lots of new test cases to prove it). * quote now thr

[issue3300] urllib.quote and unquote - Unicode issues

2008-08-10 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: Guido suggested that quote's "safe" parameter should allow any character, not just ASCII range. I've implemented this now. It was a lot messier than I imagined. The problem is that in my older patches, both 's'

[issue3300] urllib.quote and unquote - Unicode issues

2008-08-09 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: I've been thinking more about the errors="strict" default. I think this was Guido's suggestion. I've decided I'd rather stick with errors="replace". I changed errors="replace" to errors=

[issue3300] urllib.quote and unquote - Unicode issues

2008-08-09 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: > Bill's main concern is with a policy decision; I doubt he would > object to using your code once that is resolved. But his patch does the same basic operations as mine, just implemented differently and with the heap of issues

[issue3532] bytes.tohex method

2008-08-09 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: You did the 3.1 thing again! We can accept a new feature like this before 3.0b3, can we not? > Hmm. There are probably many modules that you haven't used yet. Snap :) Well, I didn't know about the community's prefe

[issue3300] urllib.quote and unquote - Unicode issues

2008-08-09 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: Bill, I had a look at your patch. I see you've decided to make quote_as_string the default? In that case, I don't know why you had to rewrite everything to implement the same basic behaviour as my patch. (My latest few patches su

[issue3532] bytes.tohex method

2008-08-09 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: > I recommend to use binascii.hexlify. Ah, see I did not know about this! Thanks for pointing it out. * However, it is *very* obscure. I've been using Python for a year and I didn't know about it. * And, it requires import

[issue3532] bytes.tohex method

2008-08-09 Thread Matt Giuca
New submission from Matt Giuca <[EMAIL PROTECTED]>: I haven't been able to find a way to encode a bytes object in hexadecimal, where in Python 2.x I'd go "str.encode('hex')". I recommend adding a bytes.tohex() method (in the same vein as the existing bytes.f
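
The round trip discussed in this thread can be sketched with binascii.hexlify (the workaround suggested in a later comment) and the existing bytes.fromhex. The bytes.hex() call at the end assumes Python 3.5 or later, where such a method is available.

```python
import binascii

data = b"\x01\xab\xff"

# hexlify returns the hex digits as a bytes object.
hex_bytes = binascii.hexlify(data)

# bytes.fromhex takes a str, so decode the digits first.
round_trip = bytes.fromhex(hex_bytes.decode("ascii"))

# On Python 3.5+, bytes objects also offer a hex() method returning str.
hex_text = data.hex()
```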

[issue3300] urllib.quote and unquote - Unicode issues

2008-08-07 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: I'm also attaching a "metapatch" - diff from patch 7 to patch 8. This is to give a rough idea of what I changed since the review. (Sorry - This is actually a diff between the two patches, so it's pretty hard to read. I

[issue3300] urllib.quote and unquote - Unicode issues

2008-08-07 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: Following Guido and Antoine's reviews, I've written a new patch which fixes *most* of the issues raised. The ones I didn't fix I have noted below, and commented on the review site (http://codereview.appspot.com/2827/). Note

[issue3300] urllib.quote and unquote - Unicode issues

2008-08-07 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: > The important is that the defaults are safe. If users want to override > the defaults and produce potentially invalid URIs, there is no reason to > discourage them. OK I think that's a fairly valid argument. I'm about

[issue3300] urllib.quote and unquote - Unicode issues

2008-08-07 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: A reply to a point on GvR's review, I'd like to open for discussion. This relates to whether or not quote's "safe" argument should allow non-ASCII characters. > Using errors='ignore' seems like a mis

[issue3300] urllib.quote and unquote - Unicode issues

2008-08-07 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: Dear GvR, New code review comments by mgiuca have been published. Please go to http://codereview.appspot.com/2827 to read them. Message: Hi Guido, Thanks very much for this very detailed review. I've replied to the comments. I wi

[issue3478] Documentation for struct module is out of date in 3.0

2008-07-31 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: Thanks for the props!

[issue3300] urllib.quote and unquote - Unicode issues

2008-07-31 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: Hmm ... seems patch 6 I just checked in fails a test case! Sorry! (It's minor, gives a harmless BytesWarning if you run with -b, which "make test" does, so I only picked it up after submitting). I've slightly chang

[issue3478] Documentation for struct module is out of date in 3.0

2008-07-31 Thread Matt Giuca
New submission from Matt Giuca <[EMAIL PROTECTED]>: The documentation for the "struct" module still uses the term "string" even though the struct module itself deals entirely in bytes objects in Python 3.0. I propose updating the documentation to reflect the 3.0 ter
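
A small sketch of the point being made: in Python 3, struct.pack returns a bytes object and struct.unpack consumes one. The format string and values here are arbitrary.

```python
import struct

# "<HI": little-endian, unsigned 16-bit then unsigned 32-bit, no padding.
packed = struct.pack("<HI", 1, 2)
unpacked = struct.unpack("<HI", packed)
```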

[issue3300] urllib.quote and unquote - Unicode issues

2008-07-31 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: OK after a long discussion on the mailing list, Guido gave this the OK, with the provision that there are str->bytes and bytes->str versions of these functions as well. So I've written those. http://mail.python.org/pipermail/pyt

[issue3348] Cannot start wsgiref simple server in Py3k

2008-07-22 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: Wow, I read the WSGI spec. That seems very strange that it says "HTTP does not directly support Unicode, and neither does this interface." Clearly HTTP *does* support Unicode, because it allows you to specify an encoding. I assu

[issue3348] Cannot start wsgiref simple server in Py3k

2008-07-22 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: Are you saying the stream passed to _write SHOULD always be a binary stream, and hence the test case is wrong, because it opens a text stream? (I'm not sure where the stream comes from, but we should guarantee it's a binary str

[issue3300] urllib.quote and unquote - Unicode issues

2008-07-12 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: So today I grepped for "urllib" in the entire library in an effort to track down every dependency on quote and unquote to see exactly how my patch breaks other code. I've now investigated every module in the library which u

[issue3348] Cannot start wsgiref simple server in Py3k

2008-07-12 Thread Matt Giuca
New submission from Matt Giuca <[EMAIL PROTECTED]>: The wsgiref "simple server" module has a demo server, which fails to start in Python 3.0 for a bunch of reasons. To verify this, just go into the Lib/wsgiref directory, and run: python3.0 ./simple_server.py (which launches

[issue3347] urllib.robotparser doesn't work in Py3k

2008-07-12 Thread Matt Giuca
New submission from Matt Giuca <[EMAIL PROTECTED]>: urllib.robotparser is broken in Python 3.0, due to a bytes object appearing where a str is expected. Example: >>> import urllib.robotparser >>> r = urllib.robotparser.RobotFileParser('http://www.python.org/robots

[issue3300] urllib.quote and unquote - Unicode issues

2008-07-12 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: OK, I spent a while writing test cases for quote and unquote, encoding and decoding various Unicode strings with different encodings. As a result, I found a bunch of issues in my previous patch, so I've rewritten the patches to bot

[issue3300] urllib.quote and unquote - Unicode issues

2008-07-11 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: Since I got a complaint that my last reply was too long, I'll summarize it. It's a bug report, not a feature request. I can't get a simple web app to be properly Unicode-aware in Python 3, which worked fine in Python 2. T

[issue3300] urllib.quote and unquote - Unicode issues

2008-07-11 Thread Matt Giuca
Matt Giuca <[EMAIL PROTECTED]> added the comment: > 3.0b1 has been released, so no new features can be added to 3.0. While my proposal is no doubt going to cause a lot of code breakage, I hardly consider it a "new feature". This is very definitely a bug. As I understand it,
