Re: [Python-Dev] Security implications of pep 383

2011-03-30 Thread Terry Reedy
On 3/30/2011 6:39 PM, Toshio Kuratomi wrote: Really, surrogates are a red herring to this whole issue. The issue is that the original code was trying to compare two different transformations of byte sequences and expecting them to be equal. Let's say that you have the following byte value::

Re: [Python-Dev] Security implications of pep 383

2011-03-30 Thread Toshio Kuratomi
On Wed, Mar 30, 2011 at 08:36:43AM +0200, Lennart Regebro wrote: > On Wed, Mar 30, 2011 at 07:54, Toshio Kuratomi wrote: > > Lennart is missing that you just need to use the same encoding > > + surrogateescape (or stick with bytes) for decoding the byte strings that > > you are comparing. > > You
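
A minimal sketch of the point about using the same encoding plus surrogateescape on both sides of a comparison (or staying with bytes). The variable names and the sample string are illustrative assumptions, not code from the thread or the blog post:

```python
# Hypothetical example: a file name whose bytes are Big5-encoded.
raw_name = '\u4f60\u597d'.encode('big5')

# Two different transformations of the same bytes.
via_big5 = raw_name.decode('big5')
via_utf8_escape = raw_name.decode('utf-8', 'surrogateescape')

# Comparing the results of different transformations fails.
different_transformations_match = (via_big5 == via_utf8_escape)

# Decoding both sides the same way makes the comparison meaningful.
blacklist_entry = raw_name.decode('utf-8', 'surrogateescape')
same_transformation_match = (blacklist_entry == via_utf8_escape)

# Alternatively, go back to bytes and compare those directly.
round_tripped = via_utf8_escape.encode('utf-8', 'surrogateescape')
bytes_match = (round_tripped == raw_name)
```

The surrogateescape handler maps each undecodable byte to a lone surrogate, so encoding with the same codec and handler recovers the original bytes.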

Re: [Python-Dev] Security implications of pep 383

2011-03-30 Thread Terry Reedy
On 3/30/2011 2:57 AM, Gregory P. Smith wrote: http://blog.omega-prime.co.uk/?p=107 I posted link to this as comment, with my summary of thread. I don't see your comment on the blog post. So either the author is moderating comments and hasn't seen yours yet (likely) My comment and Nick's

Re: [Python-Dev] Security implications of pep 383

2011-03-30 Thread Glenn Linderman
On 3/29/2011 12:10 PM, Toshio Kuratomi wrote: The possible flaw in python is this: Code like the blog poster wrote passes python3 without an error or a warning. This gives the programmer no feedback that they're doing something wrong until it actually bites them in the foot in deployed code.

Re: [Python-Dev] Security implications of pep 383

2011-03-30 Thread Nick Coghlan
On Wed, Mar 30, 2011 at 4:57 PM, Gregory P. Smith wrote: > I don't see your comment on the blog post.  So either the author is > moderating comments and hasn't seen yours yet (likely) or they don't want > disagreement in their comments. ;) My comment was sitting in the moderation queue last time

Re: [Python-Dev] Security implications of pep 383

2011-03-30 Thread Gregory P. Smith
On Tue, Mar 29, 2011 at 4:07 PM, Terry Reedy wrote: > On 3/29/2011 2:23 PM, Michael Foord wrote: > > Not sure how real the security risk is here: >> >> http://blog.omega-prime.co.uk/?p=107 >> >> Basically he is saying that if you store a list of blacklisted files >> with names encoded in big-5 (

Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Lennart Regebro
On Tue, Mar 29, 2011 at 23:17, "Martin v. Löwis" wrote: > I think the whole blacklist example is artificial. The string in the > blacklist is actually a Chinese "hello" greeting, so it surely isn't > the string being blacklisted. For proper blacklisting, you would likely > use substring searches,

Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Lennart Regebro
On Wed, Mar 30, 2011 at 07:54, Toshio Kuratomi wrote: > Lennart is missing that you just need to use the same encoding > + surrogateescape (or stick with bytes) for decoding the byte strings that > you are comparing. You lost me here. I need to do this for what? //Lennart

Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Toshio Kuratomi
On Tue, Mar 29, 2011 at 10:55:47PM +0200, Victor Stinner wrote: > On Tuesday, 29 March 2011 at 22:40 +0200, Lennart Regebro wrote: > > The lesson here seems to be "if you have to use blacklists, and you > > use unicode strings for those blacklists, also make sure the string > > you compare with doesn

Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Terry Reedy
On 3/29/2011 2:23 PM, Michael Foord wrote: Not sure how real the security risk is here: http://blog.omega-prime.co.uk/?p=107 Basically he is saying that if you store a list of blacklisted files with names encoded in big-5 (or some other non-utf8 compatible encoding) if those names are passed a

Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Martin v. Löwis
> '\N{LATIN SMALL LETTER O}\N{COMBINING DIAERESIS}' != '\N{LATIN SMALL > LETTER O WITH DIAERESIS}' > > I guess the filesystem shouldn't treat these as the same (even though > they are), but what if some webservice does? I suspect you should > normalize both strings before comparing them in any bla
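
The combining-character case quoted above can be sketched with the standard `unicodedata` module. The choice of NFC is an assumption made for illustration; the thread snippet does not say which normalization form to use:

```python
import unicodedata

decomposed = '\N{LATIN SMALL LETTER O}\N{COMBINING DIAERESIS}'
precomposed = '\N{LATIN SMALL LETTER O WITH DIAERESIS}'

# The raw strings differ even though they render the same.
raw_equal = (decomposed == precomposed)

# Normalizing both sides to the same form before comparing.
normalized_equal = (
    unicodedata.normalize('NFC', decomposed)
    == unicodedata.normalize('NFC', precomposed)
)
```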

Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Victor Stinner
On Tuesday, 29 March 2011 at 22:45 +0200, Lennart Regebro wrote: > On Tue, Mar 29, 2011 at 22:40, Lennart Regebro wrote: > > The lesson here seems to be "if you have to use blacklists, and you > > use unicode strings for those blacklists, also make sure the string > > you compare with doesn't have s

Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Antoine Pitrou
On Tue, 29 Mar 2011 22:40:01 +0200 Lennart Regebro wrote: > The lesson here seems to be "if you have to use blacklists, and you > use unicode strings for those blacklists, also make sure the string > you compare with doesn't have surrogates". Not really. As everyone said, this can happen even wit

Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Victor Stinner
On Tuesday, 29 March 2011 at 22:40 +0200, Lennart Regebro wrote: > The lesson here seems to be "if you have to use blacklists, and you > use unicode strings for those blacklists, also make sure the string > you compare with doesn't have surrogates". No. '\u4f60\u597d'.encode('big5').decode('latin1')
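
The expression in the snippet above can be expanded into a small check. It suggests that a mismatch can occur with no surrogates involved at all. The variable names are illustrative, and the rest of the original message is truncated, so this is only a sketch of what the expression appears to demonstrate:

```python
original = '\u4f60\u597d'

# Encode as Big5, then decode the same bytes with the wrong codec.
mis_decoded = original.encode('big5').decode('latin1')

# latin1 maps every byte to a code point below 256, so no surrogates appear.
has_surrogates = any(0xD800 <= ord(ch) <= 0xDFFF for ch in mis_decoded)

# The comparison still fails, with no surrogate in sight.
strings_match = (mis_decoded == original)

# Reversing the wrong decoding recovers the original text.
recovered = mis_decoded.encode('latin1').decode('big5')
```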

Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Lennart Regebro
The lesson here seems to be "if you have to use blacklists, and you use unicode strings for those blacklists, also make sure the string you compare with doesn't have surrogates". //Lennart

Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Lennart Regebro
On Tue, Mar 29, 2011 at 22:40, Lennart Regebro wrote: > The lesson here seems to be "if you have to use blacklists, and you > use unicode strings for those blacklists, also make sure the string > you compare with doesn't have surrogates". > For that matter, what happens with combining characters?

Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Victor Stinner
On Tuesday, 29 March 2011 at 19:23 +0100, Michael Foord wrote: > Hey all, > > Not sure how real the security risk is here: > > http://blog.omega-prime.co.uk/?p=107 > > Basically he is saying that if you store a list of blacklisted files > with names encoded in big-5 (or some other non-utf8

Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Toshio Kuratomi
On Tue, Mar 29, 2011 at 07:23:25PM +0100, Michael Foord wrote: > Hey all, > > Not sure how real the security risk is here: > > http://blog.omega-prime.co.uk/?p=107 > > Basically he is saying that if you store a list of blacklisted files > with names encoded in big-5 (or some other non-utf8

Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Laura Creighton
In a message of Tue, 29 Mar 2011 19:23:25 BST, Michael Foord writes: >Hey all, > >Not sure how real the security risk is here: > > http://blog.omega-prime.co.uk/?p=107 > >Basically he is saying that if you store a list of blacklisted files >with names encoded in big-5 (or some other non-utf8

Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Martin v. Löwis
> Not sure how real the security risk is here: > > http://blog.omega-prime.co.uk/?p=107 > > Basically he is saying that if you store a list of blacklisted files > with names encoded in big-5 (or some other non-utf8 compatible encoding) > if those names are passed at the command line, or othe

Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Antoine Pitrou
On Tue, 29 Mar 2011 19:23:25 +0100 Michael Foord wrote: > Hey all, > > Not sure how real the security risk is here: > > http://blog.omega-prime.co.uk/?p=107 > > Basically he is saying that if you store a list of blacklisted files > with names encoded in big-5 (or some other non-utf8 comp

[Python-Dev] Security implications of pep 383

2011-03-29 Thread Michael Foord
Hey all, Not sure how real the security risk is here: http://blog.omega-prime.co.uk/?p=107 Basically he is saying that if you store a list of blacklisted files with names encoded in big-5 (or some other non-utf8 compatible encoding) if those names are passed at the command line, or other