Must be whatever the ICU collating sequence does. It apparently sorts into an order you like, but does not sort the characters as being "the same". Unless they are "the same" they will not be in the same group.
The unifuzz "unaccented" collation does sort the two characters as "the same" (note that it is just happenstance what character you got for "S"): sqlite> select substr(name collate unaccented,1,1) from contributors group by substr(name collate unaccented,1,1); QUERY PLAN |--SCAN TABLE contributors USING COVERING INDEX contributorsNameIndex (~48 rows) `--USE TEMP B-TREE FOR GROUP BY addr opcode p1 p2 p3 p4 p5 comment ---- ------------- ---- ---- ---- ------------- -- ------------- 0 Init 0 47 0 00 Start at 47 1 SorterOpen 1 2 0 k(1,UNACCENTED) 00 2 Integer 0 3 0 00 r[3]=0; clear abort flag 3 Null 0 6 6 00 r[6..6]=NULL 4 Gosub 5 43 0 00 5 OpenRead 2 5 0 k(2,,) 00 root=5 iDb=0; contributorsNameIndex 6 ColumnsUsed 2 0 0 1 00 7 Explain 7 0 0 SCAN TABLE contributors USING COVERING INDEX contributorsNameIndex (~48 rows) 00 8 Noop 0 0 0 00 Begin WHERE-loop0: contributors 9 Rewind 2 18 8 0 00 10 Noop 0 0 0 00 Begin WHERE-core 11 Column 2 0 10 00 r[10]=contributors.name 12 Function0 6 10 8 substr(3) 03 r[8]=func(r[10..12]) 13 Column 2 0 9 00 r[9]=contributors.name 14 MakeRecord 8 2 13 00 r[13]=mkrec(r[8..9]) 15 SorterInsert 1 13 0 00 key=r[13] 16 Noop 0 0 0 00 End WHERE-core 17 Next 2 10 0 01 18 Noop 0 0 0 00 End WHERE-loop0: contributors 19 OpenPseudo 3 13 2 00 2 columns in r[13] 20 SorterSort 1 46 0 00 GROUP BY sort 21 SorterData 1 13 3 00 r[13]=data 22 Column 3 0 7 00 r[7]= 23 Compare 6 7 1 k(1,UNACCENTED) 00 r[6] <-> r[7] 24 Jump 25 29 25 00 25 Move 7 6 1 00 r[6]=r[7] 26 Gosub 4 37 0 00 output one row 27 IfPos 3 46 0 00 if r[3]>0 then r[3]-=0, goto 46; check abort flag 28 Gosub 5 43 0 00 reset accumulator 29 If 2 31 0 00 30 Column 3 1 1 00 r[1]= 31 Integer 1 2 0 00 r[2]=1; indicate data in accumulator 32 SorterNext 1 21 0 00 33 Gosub 4 37 0 00 output final row 34 Goto 0 46 0 00 35 Integer 1 3 0 00 r[3]=1; set abort flag 36 Return 4 0 0 00 37 IfPos 2 39 0 00 if r[2]>0 then r[2]-=0, goto 39; Groupby result generator entry point 38 Return 4 0 0 00 39 Copy 1 15 0 00 r[15]=r[1] 40 Function0 6 15 14 substr(3) 03 r[14]=func(r[15..17]) 41 ResultRow 14 1 0 00 output=r[14] 42 Return 4 0 0 00 end groupby result generator 43 Null 0 1 1 00 r[1..1]=NULL 44 Integer 0 2 0 00 r[2]=0; indicate accumulator empty 45 Return 5 0 0 00 46 Halt 0 0 0 00 47 Transaction 0 0 518 0 01 usesStmtJournal=0 48 Integer 1 11 0 00 r[11]=1 49 Integer 1 12 0 00 r[12]=1 50 Integer 1 16 0 00 r[16]=1 51 Integer 1 17 0 00 r[17]=1 52 Goto 0 1 0 00 A B C D E F G I J K L M P R S T W sqlite> --- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-----Original Message----- >From: sqlite-users [mailto:sqlite-users- >boun...@mailinglists.sqlite.org] On Behalf Of Michael Herger >Sent: Friday, 8 February, 2019 02:13 >To: sqlite-users@mailinglists.sqlite.org >Subject: Re: [sqlite] GROUP BY and ICU collation > >Hi Keith, > >thanks for your response (which partly goes beyond my understanding, >but >I'm learning :-)). > >> But you are not using the same "expression" for selecting, sorting, >and grouping. That is, you need to specify: >> >> SELECT expression, count(distinct id) >> FROM artists >> GROUP BY expression >> ORDER BY expression; >> >> where expression is the expression that you want to use >> >> SELECT substr(name collate de_DE, 1, 1), count(distinct id) >> FROM artists >> GROUP BY substr(name collate de_DE, 1, 1) >> ORDER BY substr(name collate de_DE, 1, 1); >> >> If you do not do so then you cannot expect the "GROUP BY" to be >using the same ordering as the "ORDER BY" nor the select to be >returning the same value that was used to do the grouping and >sorting. > >Ok, tried that: > >SELECT SUBSTR(contributors.namesort COLLATE de_DE , 1, 1) >FROM contributors >GROUP BY SUBSTR(contributors.namesort COLLATE de_DE , 1, 1) >ORDER BY SUBSTR(contributors.namesort COLLATE de_DE , 1, 1) > >Resulted in: > >"A" >"C" >"D" >"P" >"R" >"S" >"Š" >"T" >"W" > >Whereas this: > >SELECT contributors.name, contributors.namesort COLLATE de_DE >FROM contributors >ORDER BY contributors.namesort COLLATE de_DE > >...resulted in this: > >... > ["Roddy Woomble & Band", "RODDY WOOMBLE BAND"], > ["Samuel Yirga", "SAMUEL YIRGA"], > ["Stephin Merritt", "STEPHIN MERRITT"], > ["Šuma Čovjek", "ŠUMA ČOVJEK"], > ["Syriana", "SYRIANA"], > ["Tom Griesgraber", "TOM GRIESGRABER"], >... > >So despite my using the same expression for the GROUP as for the >ORDER, >the grouping seems to ignore the collation, whereas the sorting alone >would not. > >I'm using the ICU extension (otherwise the sorting wouldn't work >either, >right?). Could that extension be responsible for this odd behaviour? > >I've put a small sample .db file in my dropbox: >https://www.dropbox.com/s/w4h6n3cyamt6yc0/library.db?dl=0 > > >> >> (This substitutes the collation NOCASE for the de_DE since I no >have a de_DE collation: >> >> sqlite> select substr(name collate nocase, 1, 1), count(distinct >id) >> ...> FROM artists >> ...> group by substr(name collate nocase, 1, 1) >> ...> order by substr(name collate nocase, 1, 1); >> QUERY PLAN >> |--SCAN TABLE artists (~1048576 rows) >> `--USE TEMP B-TREE FOR GROUP BY >> addr opcode p1 p2 p3 p4 p5 comment >> ---- ------------- ---- ---- ---- ------------- -- --------- >---- >> 0 Init 0 57 0 00 Start at >57 >> 1 Noop 1 4 0 00 >> 2 SorterOpen 3 3 0 k(1,NOCASE) 00 >> 3 Integer 0 5 0 00 r[5]=0; >clear abort flag >> 4 Null 0 8 8 00 >r[8..8]=NULL >> 5 Gosub 7 52 0 00 >> 6 OpenRead 0 3 0 2 00 root=3 >iDb=0; artists >> 7 ColumnsUsed 0 0 0 3 00 >> 8 Explain 8 0 0 SCAN TABLE artists (~1048576 >rows) 00 >> 9 Noop 0 0 0 00 Begin >WHERE-loop0: artists >> 10 Rewind 0 20 0 00 >> 11 Noop 0 0 0 00 Begin >WHERE-core >> 12 Column 0 1 13 00 >r[13]=artists.name >> 13 Function0 6 13 10 substr(3) 03 >r[10]=func(r[13..15]) >> 14 Column 0 1 11 00 >r[11]=artists.name >> 15 Column 0 0 12 00 >r[12]=artists.id >> 16 MakeRecord 10 3 16 00 >r[16]=mkrec(r[10..12]) >> 17 SorterInsert 3 16 0 00 >key=r[16] >> 18 Noop 0 0 0 00 End >WHERE-core >> 19 Next 0 11 0 01 >> 20 Noop 0 0 0 00 End >WHERE-loop0: artists >> 21 OpenPseudo 4 16 3 00 3 columns >in r[16] >> 22 SorterSort 3 56 0 00 GROUP BY >sort >> 23 SorterData 3 16 4 00 >r[16]=data >> 24 Column 4 0 9 00 r[9]= >> 25 Compare 8 9 1 k(1,NOCASE) 00 r[8] <- >> r[9] >> 26 Jump 27 31 27 00 >> 27 Move 9 8 1 00 >r[8]=r[9] >> 28 Gosub 6 44 0 00 output >one row >> 29 IfPos 5 56 0 00 if >r[5]>0 then r[5]-=0, goto 56; check abort flag >> 30 Gosub 7 52 0 00 reset >accumulator >> 31 Column 4 2 17 00 r[17]= >> 32 Found 2 36 17 1 00 >key=r[17] >> 33 MakeRecord 17 1 18 00 >r[18]=mkrec(r[17]) >> 34 IdxInsert 2 18 17 1 10 >key=r[18] >> 35 AggStep 0 17 2 count(1) 01 >accum=r[2] step(r[17]) >> 36 If 4 38 0 00 >> 37 Column 4 1 1 00 r[1]= >> 38 Integer 1 4 0 00 r[4]=1; >indicate data in accumulator >> 39 SorterNext 3 23 0 00 >> 40 Gosub 6 44 0 00 output >final row >> 41 Goto 0 56 0 00 >> 42 Integer 1 5 0 00 r[5]=1; >set abort flag >> 43 Return 6 0 0 00 >> 44 IfPos 4 46 0 00 if r[4]>0 >then r[4]-=0, goto 46; Groupby result generator entry point >> 45 Return 6 0 0 00 >> 46 AggFinal 2 1 0 count(1) 00 >accum=r[2] N=1 >> 47 Copy 1 21 0 00 >r[21]=r[1] >> 48 Function0 6 21 19 substr(3) 03 >r[19]=func(r[21..23]) >> 49 Copy 2 20 0 00 >r[20]=r[2] >> 50 ResultRow 19 2 0 00 >output=r[19..20] >> 51 Return 6 0 0 00 end >groupby result generator >> 52 Null 0 1 3 00 >r[1..3]=NULL >> 53 OpenEphemeral 2 0 0 k(1,B) 00 nColumn=0 >> 54 Integer 0 4 0 00 r[4]=0; >indicate accumulator empty >> 55 Return 7 0 0 00 >> 56 Halt 0 0 0 00 >> 57 Transaction 0 0 2 0 01 >usesStmtJournal=0 >> 58 Integer 1 14 0 00 r[14]=1 >> 59 Integer 1 15 0 00 r[15]=1 >> 60 Integer 1 22 0 00 r[22]=1 >> 61 Integer 1 23 0 00 r[23]=1 >> 62 Goto 0 1 0 00 >> sqlite> >> >>> >>> Michael >>> >>> >>>> >>>> That is >>>> >>>> select name collate nocase, count(distinct id) from x group by >name >>> collate nocase order by name collate nocase >>>> >>>> whill produce cased output not the value that was used for the >>> sorting. >>>> >>>> >>>> select lower(name collate nocase), count(distinct id) from x >group >>> by name collate nocase order by name collate nocase; >>>> >>>> to transmorgificate name into a "caseless" representation. So >you >>> would need to do something like this: >>>> >>>> select de_DE(substr(name collate de_DE,1,1)), count(distinct id) >>>from artists >>>> group by substr(name collate de_DE,1,1) >>>> order by by substr(name collate de_DE,1,1) >>>> >>>> and the function de_DE would have to transmorgificate its value >to >>> the result you want to see. >>>> >>>> --- >>>> The fact that there's a Highway to Hell but only a Stairway to >>> Heaven says a lot about anticipated traffic volume. >>>> >>>>> -----Original Message----- >>>>> From: sqlite-users [mailto:sqlite-users- >>>>> boun...@mailinglists.sqlite.org] On Behalf Of li...@herger.net >>>>> Sent: Thursday, 7 February, 2019 05:12 >>>>> To: sqlite-users@mailinglists.sqlite.org >>>>> Subject: [sqlite] GROUP BY and ICU collation >>>>> >>>>>>> Hi there, >>>>>>> >>>>>>> I'm trying to create a list with an index list. Eg. I have >>>>> artists: >>>>>>> >>>>>>> Sting >>>>>>> Šuma Čovjek >>>>>>> Suzanne Vega >>>>>>> >>>>>>> That's the sort order I'd get using an ICU collation. "Šuma >>>>> Čovjek" >>>>>>> would be sorted as "Suma..." as expected. >>>>>>> >>>>>>> Now I'd like to create an index bar by providing groups of the >>>>> first >>>>>>> character: >>>>>>> >>>>>>> SELECT SUBSTR(name,1,1), count(distinct id) FROM artists GROUP >>> BY >>>>>>> SUBSTR(name,1,1) ORDER BY name COLLATE de_DE>> >>>>>> Aren't you missing a COLLATE clause after the GROUP BY term? >>>>>> >>>>>> ... GROUP BY SUBSTR(name,1,1) COLLATE de_DE ... >>>>> >>>>> TBH: I didn't even know about this. I thought the COLLATE at the >>> end >>>>> of >>>>> the statement would do it for all. >>>>> >>>>> Alas, tried again to no avail. No matter whether I add it after >>> the >>>>> GROUP BY or not, the result is the same. >>>>> >>>>> I should probably have added some version information: I'm using >>> the >>>>> Perl DBD::SQLite 1.58 (sqlite 3.22.0). I didn't see any mention >of >>>>> related changes in the changelog for SQLite. What would be the >>>>> easiest >>>>> (and most reliable) way to try to reproduce this without Perl? >Is >>>>> there >>>>> a HowTo use collations with the CLI sqlite? >>>>> >>>>> -- >>>>> >>>>> Michael >> >> --- >> The fact that there's a Highway to Hell but only a Stairway to >Heaven says a lot about anticipated traffic volume. >> >> >> >> >> _______________________________________________ >> sqlite-users mailing list >> sqlite-users@mailinglists.sqlite.org >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite- >users >> >_______________________________________________ >sqlite-users mailing list >sqlite-users@mailinglists.sqlite.org >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users