So 'S' might be before 'Š', but in words it might go
'Sam'
'Šam'
'Skunk'
'Škunk'
'Sudden'
'Šudden'
rather than a simple character-at-a-time order of
'Sam'
'Skunk'
'Sudden'
'Šam'
'Škunk'
'Šudden'
-----Original Message-----
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On
Behalf Of Michael Herger
Sent: Friday, February 08, 2019 4:13 AM
To: sqlite-users@mailinglists.sqlite.org
Subject: Re: [sqlite] GROUP BY and ICU collation
Hi Keith,
thanks for your response (which partly goes beyond my understanding, but
I'm learning :-)).
But you are not using the same "expression" for selecting, sorting, and
grouping. That is, you need to specify:
SELECT expression, count(distinct id)
FROM artists
GROUP BY expression
ORDER BY expression;
where expression is the expression that you want to use
SELECT substr(name collate de_DE, 1, 1), count(distinct id)
FROM artists
GROUP BY substr(name collate de_DE, 1, 1)
ORDER BY substr(name collate de_DE, 1, 1);
If you do not do so then you cannot expect the "GROUP BY" to be using the same ordering
as the "ORDER BY" nor the select to be returning the same value that was used to do the
grouping and sorting.
Ok, tried that:
SELECT SUBSTR(contributors.namesort COLLATE de_DE , 1, 1)
FROM contributors
GROUP BY SUBSTR(contributors.namesort COLLATE de_DE , 1, 1)
ORDER BY SUBSTR(contributors.namesort COLLATE de_DE , 1, 1)
Resulted in:
"A"
"C"
"D"
"P"
"R"
"S"
"Š"
"T"
"W"
Whereas this:
SELECT contributors.name, contributors.namesort COLLATE de_DE
FROM contributors
ORDER BY contributors.namesort COLLATE de_DE
...resulted in this:
...
["Roddy Woomble & Band", "RODDY WOOMBLE BAND"],
["Samuel Yirga", "SAMUEL YIRGA"],
["Stephin Merritt", "STEPHIN MERRITT"],
["Šuma Čovjek", "ŠUMA ČOVJEK"],
["Syriana", "SYRIANA"],
["Tom Griesgraber", "TOM GRIESGRABER"],
...
So despite my using the same expression for the GROUP as for the ORDER,
the grouping seems to ignore the collation, whereas the sorting alone
would not.
I'm using the ICU extension (otherwise the sorting wouldn't work either,
right?). Could that extension be responsible for this odd behaviour?
I've put a small sample .db file in my dropbox:
https://www.dropbox.com/s/w4h6n3cyamt6yc0/library.db?dl=0
(This substitutes the collation NOCASE for the de_DE since I no have a de_DE
collation:
sqlite> select substr(name collate nocase, 1, 1), count(distinct id)
...> FROM artists
...> group by substr(name collate nocase, 1, 1)
...> order by substr(name collate nocase, 1, 1);
QUERY PLAN
|--SCAN TABLE artists (~1048576 rows)
`--USE TEMP B-TREE FOR GROUP BY
addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Init 0 57 0 00 Start at 57
1 Noop 1 4 0 00
2 SorterOpen 3 3 0 k(1,NOCASE) 00
3 Integer 0 5 0 00 r[5]=0; clear abort
flag
4 Null 0 8 8 00 r[8..8]=NULL
5 Gosub 7 52 0 00
6 OpenRead 0 3 0 2 00 root=3 iDb=0; artists
7 ColumnsUsed 0 0 0 3 00
8 Explain 8 0 0 SCAN TABLE artists (~1048576 rows) 00
9 Noop 0 0 0 00 Begin WHERE-loop0:
artists
10 Rewind 0 20 0 00
11 Noop 0 0 0 00 Begin WHERE-core
12 Column 0 1 13 00 r[13]=artists.name
13 Function0 6 13 10 substr(3) 03
r[10]=func(r[13..15])
14 Column 0 1 11 00 r[11]=artists.name
15 Column 0 0 12 00 r[12]=artists.id
16 MakeRecord 10 3 16 00
r[16]=mkrec(r[10..12])
17 SorterInsert 3 16 0 00 key=r[16]
18 Noop 0 0 0 00 End WHERE-core
19 Next 0 11 0 01
20 Noop 0 0 0 00 End WHERE-loop0:
artists
21 OpenPseudo 4 16 3 00 3 columns in r[16]
22 SorterSort 3 56 0 00 GROUP BY sort
23 SorterData 3 16 4 00 r[16]=data
24 Column 4 0 9 00 r[9]=
25 Compare 8 9 1 k(1,NOCASE) 00 r[8] <-> r[9]
26 Jump 27 31 27 00
27 Move 9 8 1 00 r[8]=r[9]
28 Gosub 6 44 0 00 output one row
29 IfPos 5 56 0 00 if r[5]>0 then
r[5]-=0, goto 56; check abort flag
30 Gosub 7 52 0 00 reset accumulator
31 Column 4 2 17 00 r[17]=
32 Found 2 36 17 1 00 key=r[17]
33 MakeRecord 17 1 18 00 r[18]=mkrec(r[17])
34 IdxInsert 2 18 17 1 10 key=r[18]
35 AggStep 0 17 2 count(1) 01 accum=r[2]
step(r[17])
36 If 4 38 0 00
37 Column 4 1 1 00 r[1]=
38 Integer 1 4 0 00 r[4]=1; indicate
data in accumulator
39 SorterNext 3 23 0 00
40 Gosub 6 44 0 00 output final row
41 Goto 0 56 0 00
42 Integer 1 5 0 00 r[5]=1; set abort flag
43 Return 6 0 0 00
44 IfPos 4 46 0 00 if r[4]>0 then
r[4]-=0, goto 46; Groupby result generator entry point
45 Return 6 0 0 00
46 AggFinal 2 1 0 count(1) 00 accum=r[2] N=1
47 Copy 1 21 0 00 r[21]=r[1]
48 Function0 6 21 19 substr(3) 03 r[19]=func(r[21..23])
49 Copy 2 20 0 00 r[20]=r[2]
50 ResultRow 19 2 0 00 output=r[19..20]
51 Return 6 0 0 00 end groupby result
generator
52 Null 0 1 3 00 r[1..3]=NULL
53 OpenEphemeral 2 0 0 k(1,B) 00 nColumn=0
54 Integer 0 4 0 00 r[4]=0; indicate
accumulator empty
55 Return 7 0 0 00
56 Halt 0 0 0 00
57 Transaction 0 0 2 0 01 usesStmtJournal=0
58 Integer 1 14 0 00 r[14]=1
59 Integer 1 15 0 00 r[15]=1
60 Integer 1 22 0 00 r[22]=1
61 Integer 1 23 0 00 r[23]=1
62 Goto 0 1 0 00
sqlite>
Michael
That is
select name collate nocase, count(distinct id) from x group by name
collate nocase order by name collate nocase
whill produce cased output not the value that was used for the
sorting.
select lower(name collate nocase), count(distinct id) from x group
by name collate nocase order by name collate nocase;
to transmorgificate name into a "caseless" representation. So you
would need to do something like this:
select de_DE(substr(name collate de_DE,1,1)), count(distinct id)
>from artists
group by substr(name collate de_DE,1,1)
order by by substr(name collate de_DE,1,1)
and the function de_DE would have to transmorgificate its value to
the result you want to see.
---
The fact that there's a Highway to Hell but only a Stairway to
Heaven says a lot about anticipated traffic volume.
-----Original Message-----
From: sqlite-users [mailto:sqlite-users-
boun...@mailinglists.sqlite.org] On Behalf Of li...@herger.net
Sent: Thursday, 7 February, 2019 05:12
To: sqlite-users@mailinglists.sqlite.org
Subject: [sqlite] GROUP BY and ICU collation
Hi there,
I'm trying to create a list with an index list. Eg. I have
artists:
Sting
Šuma Čovjek
Suzanne Vega
That's the sort order I'd get using an ICU collation. "Šuma
Čovjek"
would be sorted as "Suma..." as expected.
Now I'd like to create an index bar by providing groups of the
first
character:
SELECT SUBSTR(name,1,1), count(distinct id) FROM artists GROUP
BY
SUBSTR(name,1,1) ORDER BY name COLLATE de_DE>>
Aren't you missing a COLLATE clause after the GROUP BY term?
... GROUP BY SUBSTR(name,1,1) COLLATE de_DE ...
TBH: I didn't even know about this. I thought the COLLATE at the
end
of
the statement would do it for all.
Alas, tried again to no avail. No matter whether I add it after
the
GROUP BY or not, the result is the same.
I should probably have added some version information: I'm using
the
Perl DBD::SQLite 1.58 (sqlite 3.22.0). I didn't see any mention of
related changes in the changelog for SQLite. What would be the
easiest
(and most reliable) way to try to reproduce this without Perl? Is
there
a HowTo use collations with the CLI sqlite?
--
Michael
---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a
lot about anticipated traffic volume.
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users