[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-29 Thread Georg Brandl
Georg Brandl ge...@python.org added the comment: I'd really like someone else to throw a pair of eyes at the code changes before it is committed. But yes, I will allow this into rc2, and we'll buy some stabilizing time by adding an rc3 to the cycle. --

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-29 Thread Georg Brandl
Changes by Georg Brandl ge...@python.org: -- Removed message: http://bugs.python.org/msg127391 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9124 ___

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-29 Thread Georg Brandl
Georg Brandl ge...@python.org added the comment: I'd really like someone else to throw a pair of eyes at the code changes before it is committed. But yes, I will allow this into rc2, since a completely broken module isn't really what a minor release is about. --

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-29 Thread Steffen Daode Nurpmeso
Steffen Daode Nurpmeso sdao...@googlemail.com added the comment: RDM: it seems i was too tired to get your messages right last evening! Indeed it's now completely my fault, i should inspect the content further in respect to the str/bytes etc. stuff! Thus - i will now need three or four days

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-29 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: +if isinstance(message, io.TextIOWrapper): +# Backward compatibility hack. +message = message.buffer Is it a good thing to parse a mailbox using a text file? If not, we should emit a

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-29 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: Victor: yes, I was thinking that when I added that comment but forgot to come back to it. Thanks for spotting that. Another thing I forgot about yesterday is that I activated the commented out statements that do linesep transformations

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-29 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: OK, I've added deprecation warnings for using StringIO or text mode files as input. I found one bug thereby, but it is a bug that pre-existed the patch (see issue 11062). I've completed my code review. To address Victor's question

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-29 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: The last step is running the tests on Windows. Attached is the updated patch. mailbox4.patch doesn't pass on Windows, Raymond is working on a patch. -- ___ Python tracker

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-29 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: (I hope you meant I was working on a patch :) Patch is done, but there is one remaining test failure that I'm not sure how to handle. The test is test_add_text_file_warns. The code checks to see if a file is a subclass of

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-29 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: Benjamin suggested using hasattr(message, 'buffer'), and that works great. The test revealed a bug in the patch, which is now fixed. All tests pass on windows. As far as I'm concerned the patch is ready to go. Other reviews would of

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-29 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: Committed (with RM approval on IRC) in r88252. Note that this does not necessarily solve the performance problem. A new issue should be opened for that if it still exist. -- resolution: - fixed stage: commit review -

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread Steffen Daode Nurpmeso
Steffen Daode Nurpmeso sdao...@googlemail.com added the comment: After cloning branches/py3k (i now have three different detached repo snakes in my arena (2.7,3.1,py3k), by the way - not bad for a greenhorn, huh?). I've applied RDMs patch from msg127245. Note: the test mails are *malformed*!

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: Steffen: thanks for testing. Do those error messages have tracebacks? Can you post them? Can you post example messages and a short program that demonstrates the problem? I'm going to be creating some non-ascii test cases, but any

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread Steffen Daode Nurpmeso
Steffen Daode Nurpmeso sdao...@googlemail.com added the comment: Indeed i tried to create tracebacks (even with import traceback), but these all end up in my code (and all the time). I have not yet figured out how to create tracebacks which leave my code and reach the source, which surely

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: I don't see those error messages in the mailbox source. I'm guess your application isn trapping the errors in a try/except. In that case, just do a bare 'raise' in the except clause, and you should get the full traceback. I'm sure

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread A.M. Kuchling
Changes by A.M. Kuchling li...@amk.ca: -- nosy: -akuchling ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9124 ___ ___ Python-bugs-list mailing

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread Steffen Daode Nurpmeso
Steffen Daode Nurpmeso sdao...@googlemail.com added the comment: You're indeed right, i've overseen a try..catch! I'll even be able to give you some fast code hints now (and i'll be offline the next few hours - the mails are simply mails with illegal charsets, say): Traceback (most recent call

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: What is the data type returned by your get_msg? I bet it is string, and email can't handle messages in string format that have non-ASCII characters (I'm adding an explicit error message for this). You either need to use a Message

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: I'm updating the patch to contain a couple tests using non-ASCII. More are needed. Before this patch, one could process a file containing non-ASCII characters as text, and if your default encoding happened to be able to decode it,

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread R. David Murray
Changes by R. David Murray rdmur...@bitdance.com: Removed file: http://bugs.python.org/file20565/mailbox3.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9124 ___

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread Steffen Daode Nurpmeso
Steffen Daode Nurpmeso sdao...@googlemail.com added the comment: What is the data type returned by your get_msg? I bet it is string, and email can't handle messages in string format that have non-ASCII characters (Now i see that the local names 'box', 'mbox' and 'mailbox' have become

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: Well, that's a bunch of code, and I'm afraid I don't know what your answer to my question was. What error do you get now if you use the new version of mailbox3.patch? If you feed the new mailbox/email bytes, it will preserve the bytes

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: Added two more tests of non-ASCII. I think the tests now cover the necessary cases. I still want to do a full code review tomorrow, but I think the patch is in final form if anyone else is available to do a review as well. Georg, are

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread R. David Murray
Changes by R. David Murray rdmur...@bitdance.com: Added file: http://bugs.python.org/file20585/mailbox3.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9124 ___

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread R. David Murray
Changes by R. David Murray rdmur...@bitdance.com: Removed file: http://bugs.python.org/file20579/mailbox3.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9124 ___

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread Steffen Daode Nurpmeso
Steffen Daode Nurpmeso sdao...@googlemail.com added the comment: I missed your mailbox3.patch, but now i've merged it in. One error changed, it now happens when a re.search is applied to a header value and thus seems to match what you say. I'm not able to understand this error this evening,

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: If you are using the most recent mailbox3 patch (I should have renamed it, sorry...I've no done so to make it clear) you should be getting an error message that tells you to use binary or Message. So I don't understand how you are

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-27 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: Attached is a patch that builds on Victor's patch, but takes the approach I discussed of maintaining backward compatibility (for the most part; see below). The test suite in this version is substantially unchanged. The major changes

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-26 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: I reverted r88197 because it was incorrect and caused an email test to fail. Once I come up with a test for it I'll fix it correctly. test_mailbox is a good (indirect) test suite for this change. The problem of r88197 is that it

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-26 Thread Steffen Daode Nurpmeso
Steffen Daode Nurpmeso sdao...@googlemail.com added the comment: This message will not help anyone. And it's not a chat, but because it seems i really made the horses gone grazy (direct translation of a german proverb): - msg127002: RDM, i didn't know all of that and i am really sorry. Now I

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-26 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: All test_email and test_mailbox pass with mailbox.patch+BytesGenerator_handle_text.patch on Windows except one test: == ERROR: test_set_item

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-26 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: There's a missing conversion in mailbox.patch. Running with -bb shows the issue. Here is an updated patch. -- Added file: http://bugs.python.org/file20532/mailbox2.patch ___ Python tracker

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-26 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: Haypo: yeah, in an ideal world Generator would use get_payload() and not _payload, but I had to make some compromises with purity of model separation in order to achieve the practical goal of handling bytes usefully. I'd like to fix

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-26 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: pitrou There's a missing conversion in mailbox.patch. pitrou Running with -bb shows the issue. pitrou Here is an updated patch. Good catch: test_mailbox now pass on Windows. -- Some remarks on mailbox2.patch. get_string() returns

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-26 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: I haven't looked at the items Haypo has pointed to yet, but I have looked at the API issues (get_string, add, etc). It seems to me that we have to make a decision here: do we break API backward compatibility and convert to consuming

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-25 Thread Steffen Daode Nurpmeso
Steffen Daode Nurpmeso sdao...@googlemail.com added the comment: Re-New to Python - Re-Started with Py3K in 2011. 'Found myself in a dead-end after 10 days of work because a KOI8-R spam mail causes the file I/O decoding process to fail - and there is NO WAY TO HANDLE THIS with mailbox.py! (Went

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-25 Thread Antoine Pitrou
Changes by Antoine Pitrou pit...@free.fr: -- priority: high - critical versions: +Python 3.3 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9124 ___

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-25 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: I'm afraid so. The python3 uptake process was expected to take five years overall, and we are only up to about the second year at this point. So while you may have been away from Python for 6 years, you came back right in the middle

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-25 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: That should have been too late to make API changes for 3.2. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9124 ___

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-25 Thread Raymond Hettinger
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: ISTM an API change is okay if it fixes a critical usability bug. Also, if this is going to ship as-is, the docs should get a big warning right at the top. Perhaps the source code should also emit a notice that the module

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-25 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: mailbox.patch: - open files in binary mode not as text - parse as bytes not as Unicode - replace email.generator.Generator() by email.generator.BytesGenerator() - use .message_from_bytes() instead of .message_from_str() - use

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-25 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: While working on this issue, I found and fixed two bugs in the email binary parser: r88196 and r88197. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9124

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-25 Thread Raymond Hettinger
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: Nice. Thanks Victor. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9124 ___

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-25 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: Thanks, Victor, you beat me to it :) I'll see if I can review this tomorrow, or if not I can probably do it Thursday. I reverted r88197 because it was incorrect and caused an email test to fail. Once I come up with a test for it I'll

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-24 Thread STINNER Victor
Changes by STINNER Victor victor.stin...@haypocalc.com: -- nosy: +haypo ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9124 ___ ___

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-12 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: The aforementioned python-dev thread (available at http://mail.python.org/pipermail/python-dev/2010-June/101186.html ) explains things quite well. The mailbox module needs to be modified to use binary I/O, both for functionality and for speed.

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-12 Thread Raymond Hettinger
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: RDM, you were suggested for this by Thomas Wouters (who wrote much of the existing code). Are you up for it? -- assignee: - r.david.murray nosy: +r.david.murray ___ Python

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-12 Thread Raymond Hettinger
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: With the module being so slow as to be unusable, this can be considered a bugfix, so it is okay if the fix goes into 3.2.1. -- versions: -Python 3.3 ___ Python tracker

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-12 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: I've been intending to take a look at this issue at some point, but am not sure when I'd get to it. I took a quick look. It does seems to me that it is true that for data-validity purposes the message files need to be opened in binary