Re: Correct handling of case in unicode and regexps

2013-02-24 Thread jmfauth
On 23 Feb, 15:26, Devin Jeanpierre wrote:
> Hi folks,
>
> I'm pretty unsure of myself when it comes to unicode. As I understand
> it, you're generally supposed to compare things in a case insensitive
> manner by case folding, right? So instead of a.lower() == b.lower()
> (the ASCII way), you do a.
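A minimal Python 3 sketch of the comparison being asked about (`str.casefold()` exists since Python 3.3). The helper names `caseless_equal` and `canonical_caseless_equal` are hypothetical. The second variant adds NFD normalization around the fold, which is what Unicode's "canonical caseless matching" calls for; without it, a precomposed 'É' and 'e' plus a combining acute accent compare unequal even after folding.

```python
import unicodedata

def caseless_equal(a, b):
    """Hypothetical helper: compare two strings using full case folding only."""
    return a.casefold() == b.casefold()

def canonical_caseless_equal(a, b):
    """Hypothetical helper: NFD-normalize, casefold, then NFD-normalize again."""
    def key(s):
        return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())
    return key(a) == key(b)

# lower() leaves 'ß' alone, so these spellings compare unequal...
print("Straße".lower() == "STRASSE".lower())   # False
# ...while casefold() maps 'ß' to 'ss', so they compare equal.
print(caseless_equal("Straße", "STRASSE"))     # True

# Precomposed 'É' (U+00C9) vs 'e' + COMBINING ACUTE ACCENT (U+0301):
print(caseless_equal("e\u0301", "\u00c9"))            # False
print(canonical_caseless_equal("e\u0301", "\u00c9"))  # True
```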

Re: Correct handling of case in unicode and regexps

2013-02-23 Thread MRAB
On 2013-02-23 18:57, Devin Jeanpierre wrote:
> On Sat, Feb 23, 2013 at 1:12 PM, MRAB wrote:
>> The basic rule is that a series of characters in the regex must match a
>> series of characters in the text, with no partial matches in either.
>>
>> For example, 'ss' can match 'ß', but 's' can't match 'ß' becaus

Re: Correct handling of case in unicode and regexps

2013-02-23 Thread Devin Jeanpierre
On Sat, Feb 23, 2013 at 1:12 PM, MRAB wrote:
> The basic rule is that a series of characters in the regex must match a
> series of characters in the text, with no partial matches in either.
>
> For example, 'ss' can match 'ß', but 's' can't match 'ß' because that
> would be matching part of 'ß'.
>
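A sketch of one simplified reading of that rule, for a plain literal with no regex metacharacters and treating the whole literal as a single "series": the folded literal may only start and end on boundaries between whole characters of the text, so it can never cover just part of an expanded character such as 'ß' → 'ss'. The function names are hypothetical, and this is not how the `regex` module implements full case folding.

```python
def folded_with_boundaries(text):
    """Casefold text and record where each original character's folded form starts/ends."""
    pieces = []
    bounds = [0]  # offsets into the folded string that lie between whole original characters
    for ch in text:
        f = ch.casefold()
        pieces.append(f)
        bounds.append(bounds[-1] + len(f))
    return "".join(pieces), bounds

def literal_matches(literal, text):
    """Hypothetical: does literal match a run of whole characters of text under full folding?"""
    needle = literal.casefold()
    folded, bounds = folded_with_boundaries(text)
    allowed = set(bounds)
    for start in bounds[:-1]:
        end = start + len(needle)
        # A match must also END on a character boundary, so 's' cannot stop halfway through 'ß'.
        if end in allowed and folded[start:end] == needle:
            return True
    return False

print(literal_matches("ss", "ß"))           # True
print(literal_matches("s", "ß"))            # False: would match only part of 'ß'
print(literal_matches("s", "ßs"))           # True: matches the trailing 's'
print(literal_matches("STRASSE", "straße")) # True
```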

Re: Correct handling of case in unicode and regexps

2013-02-23 Thread MRAB
On 2013-02-23 17:51, Devin Jeanpierre wrote:
> On Sat, Feb 23, 2013 at 12:41 PM, MRAB wrote:
>> Getting full case folding to work can be tricky. There's always going to
>> be a limit to what's worth doing.
>>
>> There are also areas where it's not clear what the result should be.
>> You've already mentioned ma

Re: Correct handling of case in unicode and regexps

2013-02-23 Thread Devin Jeanpierre
On Sat, Feb 23, 2013 at 12:41 PM, MRAB wrote:
> Getting full case folding to work can be tricky. There's always going to
> be a limit to what's worth doing.
>
> There are also areas where it's not clear what the result should be.
> You've already mentioned matching 's' against 'ß' (fails) and matc
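Part of what makes full case folding tricky is that some characters fold to more than one code point, so the folded text is longer than the original and offsets shift. A small sketch that collects every such code point using Python's own `str.casefold()`; the exact count depends on the Unicode version bundled with the interpreter, so none is stated here.

```python
import sys

def code_points(s):
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

# Map each code point whose full case folding expands to more than one code point.
expanding = {}
for cp in range(sys.maxunicode + 1):
    folded = chr(cp).casefold()
    if len(folded) > 1:
        expanding[chr(cp)] = folded

print(len(expanding))  # varies with the Unicode version
# 'ß', the 'fi' ligature, and capital I with dot above:
for ch in ("\u00df", "\ufb01", "\u0130"):
    print(code_points(ch), "->", code_points(expanding[ch]))
```

The last case shows another ambiguity of the kind described above: U+0130 folds to 'i' followed by a combining dot, so a pattern containing just 'i' could only ever match part of it.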

Re: Correct handling of case in unicode and regexps

2013-02-23 Thread MRAB
On 2013-02-23 15:30, Devin Jeanpierre wrote:
> On Sat, Feb 23, 2013 at 10:26 AM, Devin Jeanpierre wrote:
>> However, regex has the same behavior.
>
> My apologies, I forgot to set the VERSION1 flag.
>
> Interesting. 'ss' matches 'ß', but 's+' does not. Is this desirable behavior?
>
Getting full case fol

Re: Correct handling of case in unicode and regexps

2013-02-23 Thread Devin Jeanpierre
On Sat, Feb 23, 2013 at 10:26 AM, Devin Jeanpierre wrote:
> However, regex has the same behavior.

My apologies, I forgot to set the VERSION1 flag.

Interesting. 'ss' matches 'ß', but 's+' does not. Is this desirable behavior?

--
Devin
--
http://mail.python.org/mailman/listinfo/python-list
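For comparison, a sketch using the standard library's `re` rather than the third-party `regex` module: under `re.IGNORECASE`, `re` only relates characters one to one, so neither 'ss' nor 's+' matches 'ß'. Casefolding the text before matching is one workaround, but it gives a different answer to the question above: once 'ß' has become 'ss', both patterns match. `prefold_fullmatch` is a hypothetical helper, and its caveats are noted in the comments.

```python
import re

SHARP_S = "\u00df"  # 'ß'

# Stdlib re: IGNORECASE never expands 'ß' into 'ss', so both of these print None.
print(re.fullmatch("ss", SHARP_S, re.IGNORECASE))
print(re.fullmatch("s+", SHARP_S, re.IGNORECASE))

def prefold_fullmatch(pattern, text):
    """Hypothetical helper: run pattern against the casefolded text."""
    # Caveat 1: match offsets index into text.casefold(), not into the original text.
    # Caveat 2: the pattern is deliberately NOT casefolded, because folding a pattern
    # would also rewrite escapes (r"\S".casefold() == r"\s"). Write literals in folded form.
    return re.fullmatch(pattern, text.casefold())

print(prefold_fullmatch("ss", SHARP_S).span())  # (0, 2), although len(SHARP_S) == 1
print(prefold_fullmatch("s+", SHARP_S).span())  # (0, 2): 's+' matches too after folding
```

Whether 's+' ought to match 'ß' is exactly the open question in this thread; pre-folding simply answers "yes", at the cost of losing offsets into the original string.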

Re: Correct handling of case in unicode and regexps

2013-02-23 Thread Devin Jeanpierre
On Sat, Feb 23, 2013 at 10:11 AM, Vlastimil Brom wrote:
> you may check the new regex implementation
> https://pypi.python.org/pypi/regex
> which does support casefolding in case insensitive matches (beyond
> many other features and improvements comparing to re)

Good point, I've been looking only

Re: Correct handling of case in unicode and regexps

2013-02-23 Thread Vlastimil Brom
2013/2/23 Devin Jeanpierre:
> Hi folks,
>
> I'm pretty unsure of myself when it comes to unicode. As I understand
> it, you're generally supposed to compare things in a case insensitive
> manner by case folding, right? So instead of a.lower() == b.lower()
> (the ASCII way), you do a.casefold() ==