Re: Case-insensitive string equality

2017-09-05 Thread Steve D'Aprano
On Wed, 6 Sep 2017 12:27 am, Grant Edwards wrote: > On 2017-09-03, Gregory Ewing wrote: >> Stefan Ram wrote: >>> But of >>> course, actually the rules of orthography require "Maße" or >>> "Masse" and do not allow "MASSE" or "MASZE", just as in >>> English, "English" has to be written "Eng

Re: Case-insensitive string equality

2017-09-05 Thread Chris Angelico
On Wed, Sep 6, 2017 at 12:27 AM, Grant Edwards wrote: > On 2017-09-03, Gregory Ewing wrote: >> Stefan Ram wrote: >>> But of >>> course, actually the rules of orthography require "Maße" or >>> "Masse" and do not allow "MASSE" or "MASZE", just as in >>> English, "English" has to be written

Re: Case-insensitive string equality

2017-09-05 Thread Grant Edwards
On 2017-09-03, Gregory Ewing wrote: > Stefan Ram wrote: >> But of >> course, actually the rules of orthography require "Maße" or >> "Masse" and do not allow "MASSE" or "MASZE", just as in >> English, "English" has to be written "English" and not >> "english" or "ENGLISH". > > While "engl

Re: Case-insensitive string equality

2017-09-05 Thread Chris Angelico
On Tue, Sep 5, 2017 at 6:05 PM, Stefan Behnel wrote: > Steve D'Aprano wrote on 02.09.2017 at 02:31: >> - the German eszett, ß, which has two official[1] uppercase forms: 'SS' >> and an uppercase eszett > > I wonder if there is an equivalent to Godwin's Law with respect to > character case relate

Re: Case-insensitive string equality

2017-09-05 Thread Stefan Behnel
Steve D'Aprano wrote on 02.09.2017 at 02:31: > - the German eszett, ß, which has two official[1] uppercase forms: 'SS' > and an uppercase eszett I wonder if there is an equivalent to Godwin's Law with respect to character case related discussions and the German ß. Stefan -- https://mail.pytho

Re: Case-insensitive string equality

2017-09-04 Thread Rick Johnson
Steven D'Aprano wrote: [...] > (1) Add a new string method, which performs a case- > insensitive equality test. Here is a potential > implementation, written in pure Python: > > def equal(self, other): > if self is other: > return True > if not isinstance(other, str): > rai

Re: Case-insensitive string equality

2017-09-04 Thread Tim Chase
On 2017-09-02 12:21, Steve D'Aprano wrote: > On Fri, 1 Sep 2017 01:29 am, Tim Chase wrote: > > I'd want to have an optional parameter to take locale into > > consideration. E.g. > > Does regular case-sensitive equality take the locale into > consideration? No. Python says that .casefold() ht

Re: Capital ß [was Re: Case-insensitive string equality]

2017-09-04 Thread MRAB
On 2017-09-04 03:28, Steve D'Aprano wrote: On Sat, 2 Sep 2017 01:48 pm, Stefan Ram wrote: Steve D'Aprano writes: [1] I believe that the German government has now officially recognised the uppercase form of ß. [skip to the last paragraph for some "ß" content, unless you want to read deta

Capital ß [was Re: Case-insensitive string equality]

2017-09-03 Thread Steve D'Aprano
On Sat, 2 Sep 2017 01:48 pm, Stefan Ram wrote: > Steve D'Aprano writes: >>[1] I believe that the German government has now officially recognised the >>uppercase form of ß. > > [skip to the last paragraph for some "ß" content, > unless you want to read details about German spelling rules.] >

Re: Case-insensitive string equality

2017-09-03 Thread Steve D'Aprano
On Mon, 4 Sep 2017 09:10 am, Gregory Ewing wrote: > Stefan Ram wrote: >> But of >> course, actually the rules of orthography require "Maße" or >> "Masse" and do not allow "MASSE" or "MASZE", just as in >> English, "English" has to be written "English" and not >> "english" or "ENGLISH". >

Re: Case-insensitive string equality

2017-09-03 Thread Gregory Ewing
Stefan Ram wrote: But of course, actually the rules of orthography require "Maße" or "Masse" and do not allow "MASSE" or "MASZE", just as in English, "English" has to be written "English" and not "english" or "ENGLISH". While "english" is wrong in English, there's no rule against usin

Re: Case-insensitive string equality

2017-09-03 Thread Pavol Lisy
On 9/3/17, Steve D'Aprano wrote: > On Sun, 3 Sep 2017 05:17 pm, Stephan Houben wrote: > >> Generally speaking, the more you learn about case normalization, >> the more attractive case sensitivity looks > > Just because something is hard doesn't mean it's not worth doing. > > And just because you ca

Re: Case-insensitive string equality

2017-09-03 Thread Steve D'Aprano
On Sun, 3 Sep 2017 05:17 pm, Stephan Houben wrote: > Generally speaking, the more you learn about case normalization, > the more attractive case sensitivity looks Just because something is hard doesn't mean it's not worth doing. And just because you can't please all the people all the time doesn'

Re: Case-insensitive string equality

2017-09-03 Thread Chris Angelico
On Sun, Sep 3, 2017 at 5:17 PM, Stephan Houben wrote: > Generally speaking, the more you learn about case normalization, > the more attractive case sensitivity looks ;-) Absolutely agreed. My general recommendation is to have two vastly different concepts: "equality matching" and "searching". Equ

Re: Case-insensitive string equality

2017-09-03 Thread Stephan Houben
On 2017-09-02, Pavol Lisy wrote: > But problem is that if somebody like to have stable API it has to be > changed to "do what the Unicode consortium said (at X.Y. )" :/ It is even more exciting. Presumably a reason to have case-insensitivity is to be compatible with existing popular case-inse

Re: Case-insensitive string equality

2017-09-02 Thread Pavol Lisy
On 9/2/17 at 4:21, Steve D'Aprano wrote: > If regular case-sensitive string comparisons don't support the locale, why > should case-insensitive comparisons be required to? I think that Chris answered this very well before: On 9/2/17 at 2:53 AM, Chris Angelico wrote: > On Sat, Sep 2, 2017 at 10:31 AM

Re: Case-insensitive string equality

2017-09-01 Thread Chris Angelico
On Sat, Sep 2, 2017 at 12:21 PM, Steve D'Aprano wrote: > On Fri, 1 Sep 2017 01:29 am, Tim Chase wrote: > >> On 2017-08-31 07:10, Steven D'Aprano wrote: >>> So I'd like to propose some additions to 3.7 or 3.8. >> >> Adding my "yes, a case-insensitive equality-check would be useful" >> with the foll

Re: Case-insensitive string equality

2017-09-01 Thread Steve D'Aprano
On Fri, 1 Sep 2017 01:29 am, Tim Chase wrote: > On 2017-08-31 07:10, Steven D'Aprano wrote: >> So I'd like to propose some additions to 3.7 or 3.8. > > Adding my "yes, a case-insensitive equality-check would be useful" > with the following concerns: > > I'd want to have an optional parameter to

Re: Case-insensitive string equality

2017-09-01 Thread Chris Angelico
On Sat, Sep 2, 2017 at 10:31 AM, Steve D'Aprano wrote: > On Sat, 2 Sep 2017 01:41 am, Chris Angelico wrote: > >> Aside from lower(), which returns the string unchanged, the case >> conversion rules say that this contains two letters. > > Do you have a reference to that? > > I mean, where in the Un

Re: Case-insensitive string equality

2017-09-01 Thread Steve D'Aprano
On Sat, 2 Sep 2017 01:41 am, Chris Angelico wrote: > Aside from lower(), which returns the string unchanged, the case > conversion rules say that this contains two letters. Do you have a reference to that? I mean, where in the Unicode case conversion rules is that stated? You cannot take the beh

Re: Case-insensitive string equality

2017-09-01 Thread Chris Angelico
On Sat, Sep 2, 2017 at 10:09 AM, Steve D'Aprano wrote: > The question wasn't what "\N{LATIN SMALL LIGATURE FI}".upper() would find, > but "\N{LATIN SMALL LIGATURE FI}". > > Nor did they ask about > > "\N{LATIN SMALL LIGATURE FI}".replace("\N{LATIN SMALL LIGATURE > FI}", "Surprise!") > >> So what's

Re: Case-insensitive string equality

2017-09-01 Thread Steve D'Aprano
On Sat, 2 Sep 2017 01:41 am, Chris Angelico wrote: > On Fri, Sep 1, 2017 at 11:22 PM, Steve D'Aprano > wrote: >> On Fri, 1 Sep 2017 09:53 am, MRAB wrote: >> >>> What would you expect the result would be for: >>> >>> "\N{LATIN SMALL LIGATURE FI}".case_insensitive_find("F") >>> >>> "\N{LA

Re: Case-insensitive string equality

2017-09-01 Thread Chris Angelico
On Fri, Sep 1, 2017 at 11:22 PM, Steve D'Aprano wrote: > On Fri, 1 Sep 2017 09:53 am, MRAB wrote: > >> What would you expect the result would be for: >> >> "\N{LATIN SMALL LIGATURE FI}".case_insensitive_find("F") >> >> "\N{LATIN SMALL LIGATURE FI}".case_insensitive_find("I") > > That's ea

Re: Case-insensitive string equality

2017-09-01 Thread Steve D'Aprano
On Thu, 31 Aug 2017 08:15 pm, Rhodri James wrote: > I'd quibble about the name and the implementation (length is not > preserved under casefolding), Yes, I'd forgotten about that. > but I'd go for this. The number of times > I've written something like this in different languages... [...] >

Re: Case-insensitive string equality

2017-09-01 Thread Steve D'Aprano
On Fri, 1 Sep 2017 09:53 am, MRAB wrote: > What would you expect the result would be for: > > "\N{LATIN SMALL LIGATURE FI}".case_insensitive_find("F") > > "\N{LATIN SMALL LIGATURE FI}".case_insensitive_find("I") That's easy. -1 in both cases, since neither "F" nor "I" is found in eit
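To make the ligature question concrete, here is a small sketch using only existing `str` methods (the `case_insensitive_find` method discussed in the thread is a proposal and does not exist). It shows that a plain `find` on the ligature fails, while case conversion expands the single code point into two letters, so any index found afterwards refers to the converted string rather than the original.

```python
lig = "\N{LATIN SMALL LIGATURE FI}"

# The ligature is one code point; neither "F" nor "I" occurs in it.
plain_f = lig.find("F")          # -1
plain_i = lig.find("I")          # -1

# Case conversion expands it into two letters.
folded = lig.casefold()          # "fi"
uppered = lig.upper()            # "FI"

# Searching the folded text succeeds, but the index refers to the
# folded string (length 2), not to the original (length 1).
folded_i = folded.find("I".casefold())

print(plain_f, plain_i, repr(folded), repr(uppered), folded_i)
```

Whether position 1 in the folded text should be reported as 0, 1, or "not found" for the original string is exactly the design question the thread is arguing about.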

Re: Case-insensitive string equality

2017-08-31 Thread Pete Forman
Steven D'Aprano writes: > Three times in the last week the devs where I work accidentally > introduced bugs into our code because of a mistake with case-insensitive > string comparisons. They managed to demonstrate three different failures: > > # 1 > a = something().upper() # normalise string >

Re: Case-insensitive string equality

2017-08-31 Thread Tim Chase
On 2017-09-01 00:53, MRAB wrote: > What would you expect the result would be for: > >>> "\N{LATIN SMALL LIGATURE FI}".case_insensitive_find("F") 0 >>> "\N{LATIN SMALL LIGATURE FI}".case_insensitive_find("I") 0.5 >>> "\N{LATIN SMALL LIGATURE FFI}".case_insensitive_find("I") 0.6

Re: Case-insensitive string equality

2017-08-31 Thread MRAB
On 2017-08-31 16:29, Tim Chase wrote: On 2017-08-31 07:10, Steven D'Aprano wrote: So I'd like to propose some additions to 3.7 or 3.8. Adding my "yes, a case-insensitive equality-check would be useful" with the following concerns: I'd want to have an optional parameter to take locale into con

Re: Case-insensitive string equality

2017-08-31 Thread Tim Chase
On 2017-08-31 07:10, Steven D'Aprano wrote: > So I'd like to propose some additions to 3.7 or 3.8. Adding my "yes, a case-insensitive equality-check would be useful" with the following concerns: I'd want to have an optional parameter to take locale into consideration. E.g. "i".case_insensitiv
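The locale concern is usually illustrated with the Turkish dotted and dotless i. The sketch below is an illustration of that general issue, not a reconstruction of the truncated example above; it only uses built-in `str` methods, which are locale-independent in Python 3.

```python
dotless_i = "\N{LATIN SMALL LETTER DOTLESS I}"              # ı
dotted_cap_i = "\N{LATIN CAPITAL LETTER I WITH DOT ABOVE}"  # İ

# Uppercasing the dotless i gives plain "I" ...
upper_matches = dotless_i.upper() == "I"

# ... but casefolding "I" gives the dotted "i", so the pair does not
# compare equal under casefold().
fold_matches = dotless_i.casefold() == "I".casefold()

# Lowercasing İ yields "i" plus a combining dot above: two code points.
lowered_len = len(dotted_cap_i.lower())

print(upper_matches, fold_matches, lowered_len)
```

A Turkish-aware comparison would need tailored mappings that the built-in methods do not provide, which is why an optional locale parameter comes up in the discussion.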

Re: Case-insensitive string equality

2017-08-31 Thread Pavol Lisy
On 8/31/17, Steve D'Aprano wrote: >> Additionally: a proper "case insensitive comparison" should almost >> certainly start with a Unicode normalization. But should it be NFC/NFD >> or NFKC/NFKD? IMO that's a good reason to leave it in the hands of the >> application. > > Normalisation is orthogo

Re: Case-insensitive string equality

2017-08-31 Thread John Gordon
Serhiy Storchaka writes: > > But when there is a common source of mistakes, we can help prevent > > that mistake. > How can you do this? I know only one way -- teaching and practicing. Modify the environment so that the mistake simply can't happen (or at least happens much less frequently.

Re: Case-insensitive string equality

2017-08-31 Thread Tim Chase
On 2017-08-31 18:17, Peter Otten wrote: > A quick and dirty fix would be a naming convention: > > upcase_a = something().upper() I tend to use a "_u" suffix as my convention: something_u = something.upper() which keeps the semantics of the original variable-name while hinting at the normaliza

Re: Case-insensitive string equality

2017-08-31 Thread Peter Otten
Steven D'Aprano wrote: > Three times in the last week the devs where I work accidentally > introduced bugs into our code because of a mistake with case-insensitive > string comparisons. They managed to demonstrate three different failures: > > # 1 > a = something().upper() # normalise string > .

Re: Case-insensitive string equality

2017-08-31 Thread Tim Chase
On 2017-08-31 23:30, Chris Angelico wrote: > The method you proposed seems a little odd - it steps through the > strings character by character and casefolds them separately. How is > it superior to the two-line function? And it still doesn't solve any > of your other cases. It also breaks when ca
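The objection to a character-by-character comparison can be sketched as follows. `equal_per_char` is a hypothetical stand-in written for this illustration, not the implementation proposed in the thread (which is truncated in this archive); `equal_whole` is the two-line function quoted elsewhere in the thread.

```python
def equal_per_char(s: str, t: str) -> bool:
    """Hypothetical per-character comparison: fold each pair separately."""
    if len(s) != len(t):
        return False
    return all(a.casefold() == b.casefold() for a, b in zip(s, t))


def equal_whole(s: str, t: str) -> bool:
    """Fold both strings in full, then compare."""
    return s.casefold() == t.casefold()


# "ß" folds to "ss", so the two spellings differ in length before folding.
per_char = equal_per_char("straße", "STRASSE")
whole = equal_whole("straße", "STRASSE")

print(per_char, whole)
```

Because casefolding does not preserve length, any approach that pairs up characters before folding can miss matches that the whole-string version finds.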

Re: Case-insensitive string equality

2017-08-31 Thread Rhodri James
On 31/08/17 15:03, Chris Angelico wrote: On Thu, Aug 31, 2017 at 11:53 PM, Stefan Ram wrote: Chris Angelico writes: On Thu, Aug 31, 2017 at 10:49 PM, Steve D'Aprano wrote: On Thu, 31 Aug 2017 05:51 pm, Serhiy Storchaka wrote: 31.08.17 10:10, Steven D'Aprano wrote: def equal(s, t): ret

Re: Case-insensitive string equality

2017-08-31 Thread Serhiy Storchaka
ormed. What are you discussing? Without knowing what problem you are solving and what solution you are proposing it is hard to discuss it. The easy one-line function solves the problem of testing case-insensitive string equality. True. Except that when a problem is as common as case-insens

Re: Case-insensitive string equality

2017-08-31 Thread Steve D'Aprano
solves the problem of testing case-insensitive string equality. True. Except that when a problem is as common as case-insensitive comparisons, there should be a standard solution, instead of having to re-invent the wheel over and over again. Even when the wheel is only two or three lines. This

Re: Case-insensitive string equality

2017-08-31 Thread Chris Angelico
On Fri, Sep 1, 2017 at 12:27 AM, Steve D'Aprano wrote: >> Additionally: a proper "case insensitive comparison" should almost >> certainly start with a Unicode normalization. But should it be NFC/NFD >> or NFKC/NFKD? IMO that's a good reason to leave it in the hands of the >> application. > > Norma
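The NFC/NFD versus NFKC/NFKD choice matters in practice, as this sketch suggests. The helper name `fold` and its normalize-fold-normalize recipe are assumptions made for illustration; they are one reasonable approach, not something specified in the thread.

```python
import unicodedata


def fold(s: str, form: str = "NFC") -> str:
    """Hypothetical helper: normalize, casefold, then normalize again."""
    return unicodedata.normalize(form, unicodedata.normalize(form, s).casefold())


composed = "caf\u00e9"      # "é" as a single code point
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent

# Casefolding alone does not reconcile the two canonical forms.
raw_equal = composed.casefold() == decomposed.casefold()
canonical_equal = fold(composed) == fold(decomposed)

# Compatibility forms (NFKC/NFKD) go further: a fullwidth "A" matches a
# plain "a" only under compatibility normalization.
fullwidth_a = "\N{FULLWIDTH LATIN CAPITAL LETTER A}"
nfc_equal = fold(fullwidth_a, "NFC") == fold("a", "NFC")
nfkc_equal = fold(fullwidth_a, "NFKC") == fold("a", "NFKC")

print(raw_equal, canonical_equal, nfc_equal, nfkc_equal)
```

Which of these results counts as "correct" depends on the application, which is the argument for leaving the choice to the caller.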

Re: Case-insensitive string equality

2017-08-31 Thread Steve D'Aprano
On Fri, 1 Sep 2017 12:03 am, Chris Angelico wrote: > On Thu, Aug 31, 2017 at 11:53 PM, Stefan Ram wrote: >> Chris Angelico writes: >>>The method you proposed seems a little odd - it steps through the >>>strings character by character and casefolds them separately. How is >>>it superior to the t

Re: Case-insensitive string equality

2017-08-31 Thread Chris Angelico
On Thu, Aug 31, 2017 at 11:53 PM, Stefan Ram wrote: > Chris Angelico writes: >>On Thu, Aug 31, 2017 at 10:49 PM, Steve D'Aprano >> wrote: >>> On Thu, 31 Aug 2017 05:51 pm, Serhiy Storchaka wrote: 31.08.17 10:10, Steven D'Aprano wrote: > def equal(s, t): > return s.casefold() == t.

Re: Case-insensitive string equality

2017-08-31 Thread Serhiy Storchaka
The easy two-line function doesn't even come close to solving the problem of case-insensitive string operations; It is not clear what is your problem exactly. The easy one-line function solves the problem of testing case-insensitive string equality. Regular expressions solve the problem o

Re: Case-insensitive string equality

2017-08-31 Thread Chris Angelico
On Thu, Aug 31, 2017 at 10:49 PM, Steve D'Aprano wrote: > On Thu, 31 Aug 2017 05:51 pm, Serhiy Storchaka wrote: > >> 31.08.17 10:10, Steven D'Aprano wrote: >>> (iii) Not every two line function needs to be in the standard library. >>> Just add this to the top of every module: >>> >>> def equal(s, t

Re: Case-insensitive string equality

2017-08-31 Thread Steve D'Aprano
On Thu, 31 Aug 2017 05:51 pm, Serhiy Storchaka wrote: > 31.08.17 10:10, Steven D'Aprano wrote: >> (iii) Not every two line function needs to be in the standard library. >> Just add this to the top of every module: >> >> def equal(s, t): >> return s.casefold() == t.casefold() > > This is my a

Re: Case-insensitive string equality

2017-08-31 Thread Rhodri James
On 31/08/17 08:10, Steven D'Aprano wrote: So I'd like to propose some additions to 3.7 or 3.8. If the feedback here is positive, I'll take it to Python-Ideas for the negative feedback :-) (1) Add a new string method, which performs a case-insensitive equality test. Here is a potential implement

Re: Case-insensitive string equality

2017-08-31 Thread Serhiy Storchaka
31.08.17 10:10, Steven D'Aprano wrote: (iii) Not every two line function needs to be in the standard library. Just add this to the top of every module: def equal(s, t): return s.casefold() == t.casefold() This is my answer. Unsolved problems: This proposal doesn't help with sets and dic
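The two-line function quoted above runs as written. The sketch below lays it out as code, contrasts it with a `lower()`-based comparison, and shows the set limitation that the reply points out; the sample strings are chosen for illustration only.

```python
def equal(s: str, t: str) -> bool:
    # The function quoted in the thread.
    return s.casefold() == t.casefold()


# lower() leaves "ß" alone, so it misses the match that casefold() finds.
lower_match = "Maße".lower() == "MASSE".lower()
fold_match = equal("Maße", "MASSE")

# Sets and dicts still hash the original strings, so both spellings
# are kept as distinct members.
names = {"Spam", "SPAM"}
distinct = len(names)

# A common workaround is to store folded keys explicitly.
folded_names = {name.casefold() for name in names}

print(lower_match, fold_match, distinct, folded_names)
```

Storing folded keys works, but it discards the original spelling, which is one reason a dedicated wrapper type is suggested elsewhere in the thread.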

Re: Case-insensitive string equality

2017-08-31 Thread Antoon Pardon
IMO this should be solved by a library used company-wide and I would go in the direction of a Normalized_String class. This has the advantages (1) that the company can choose whatever normalization suits them, not all cases are suited by comparing case insensitively, (2) individual devs in the com
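One possible shape for such a class is sketched below. The name `NormalizedString`, its constructor signature, and the default of `str.casefold` are all assumptions for illustration; the post does not show an implementation.

```python
from typing import Callable


class NormalizedString:
    """Hypothetical wrapper that compares and hashes by a normalized key."""

    def __init__(self, text: str, normalize: Callable[[str], str] = str.casefold):
        self.text = text                 # original spelling, kept for display
        self._key = normalize(text)      # normalized form used for comparison

    def __eq__(self, other: object) -> bool:
        if isinstance(other, NormalizedString):
            return self._key == other._key
        return NotImplemented

    def __hash__(self) -> int:
        return hash(self._key)

    def __repr__(self) -> str:
        return f"NormalizedString({self.text!r})"


# Equal keys hash alike, so instances work as dict keys and set members.
headers = {NormalizedString("Content-Type"): "text/html"}
looked_up = headers[NormalizedString("content-type")]
unique = len({NormalizedString("Spam"), NormalizedString("SPAM")})

print(looked_up, unique)
```

Comparisons against plain `str` return `NotImplemented` here, so mixing normalized and raw strings compares unequal instead of silently falling back to case-sensitive matching. This sketch also assumes all instances share one normalization function; mixing normalizers would need an extra check.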

Case-insensitive string equality

2017-08-31 Thread Steven D'Aprano
Three times in the last week the devs where I work accidentally introduced bugs into our code because of a mistake with case-insensitive string comparisons. They managed to demonstrate three different failures: # 1 a = something().upper() # normalise string ... much later on if a == b.lower():
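The first failure mode can be reproduced in a few lines. The strings below are placeholders chosen for illustration, since the original post's `something()` is not shown.

```python
a = "Hello World".upper()   # normalised with upper() in one place
b = "hello world"

# Much later, the other side is normalised with lower() instead,
# so the comparison can never succeed for text containing letters.
mismatched = a == b.lower()

# Applying the same normalisation to both sides avoids the mismatch.
consistent = a.casefold() == b.casefold()

print(mismatched, consistent)
```

The bug is easy to miss because each line looks reasonable on its own; the mismatch only shows up when the two normalisations are read side by side.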