Hi all,

Sorry for being late to the party, finally got some time get my hands on moving 
from FB2.5 to FB3.

I'm looking at creating collations in order to get WHERE and ORDER BY working 
as my needs, preferably also handling of numeric stored as text sorted in 
numeric order.
Still haven't decided yet if I should go for ISO8859_1 or for UTF8 though, 
weighing pros and cons regarding features and impact of bytes.

Living in Sweden I think most of you know that I have to deal with åäö.

However, I have found some quirks that currently prevent my progress.

First I have downloaded the snapshot for FB 3.0.3 for Windows 
(Firebird-3.0.3.32802-0_x64).
In order to get UNICODE collation with Swedish to work I went to 
http://site.icu-project.org/download and downloaded icu4c-52_1-Win64-msvc10.zip 
from where I got the icu*52.dll's and extracted them to FB3. (After I first had 
deleted all icu*.* files in FB3).
No changes done in intl\fbintl.conf or anything else, is it really that simple?

Then I created a database, created some collations and a table with some data:

SET NAMES UTF8;
SET SQL DIALECT 3;

CREATE DATABASE 'C:\Data\Test\Collation_CI.fdb'
USER 'SYSDBA' PASSWORD 'masterkey'
PAGE_SIZE 16384
DEFAULT CHARACTER SET UTF8;

SHOW VERSION;
ISQL Version: WI-V3.0.3.32802 Firebird 3.0
Server version:
Firebird/Windows/AMD/Intel/x64 (access method), version "WI-V3.0.3.32802 
Firebird 3.0"
on disk structure version 12.0

CREATE COLLATION UTF_SV_CI
       FOR UTF8
       FROM UNICODE
       CASE INSENSITIVE
      'LOCALE=sv_SE';

CREATE COLLATION UTF_SV_CI_NUM
       FOR UTF8
       FROM UNICODE
       CASE INSENSITIVE
      'LOCALE=sv_SE;NUMERIC-SORT=1';

CREATE COLLATION ISO_SV_CI
       FOR ISO8859_1
       FROM SV_SV
       CASE INSENSITIVE;

COMMIT;

CREATE TABLE TABLE_T (
    FIELD_UTF_SV_CI     VARCHAR(5) CHARACTER SET UTF8      COLLATE UTF_SV_CI,
    FIELD_UTF_SV_CI_NUM VARCHAR(5) CHARACTER SET UTF8      COLLATE 
UTF_SV_CI_NUM,
    FIELD_ISO_SV_CI     VARCHAR(5) CHARACTER SET ISO8859_1 COLLATE ISO_SV_CI
);

COMMIT;

INSERT INTO TABLE_T VALUES ('1',  '1',  '1' );
INSERT INTO TABLE_T VALUES ('2',  '2',  '2' );
INSERT INTO TABLE_T VALUES ('10', '10', '10');
INSERT INTO TABLE_T VALUES ('20', '20', '20');
INSERT INTO TABLE_T VALUES ('a',  'a',  'a' );
INSERT INTO TABLE_T VALUES ('AA', 'AA', 'AA');
INSERT INTO TABLE_T VALUES ('aa', 'aa', 'aa');
INSERT INTO TABLE_T VALUES ('Aa', 'Aa', 'Aa');
INSERT INTO TABLE_T VALUES ('A',  'A',  'A' );
INSERT INTO TABLE_T VALUES ('b',  'b',  'b' );
INSERT INTO TABLE_T VALUES ('B',  'B',  'B' );
INSERT INTO TABLE_T VALUES ('o',  'o',  'o' );
INSERT INTO TABLE_T VALUES ('O',  'O',  'O' );
INSERT INTO TABLE_T VALUES ('x',  'x',  'x' );
INSERT INTO TABLE_T VALUES ('X',  'X',  'X' );
INSERT INTO TABLE_T VALUES ('å',  'å',  'å' );
INSERT INTO TABLE_T VALUES ('Å',  'Å',  'Å' );
INSERT INTO TABLE_T VALUES ('ä',  'ä',  'ä' );
INSERT INTO TABLE_T VALUES ('Ä',  'Ä',  'Ä' );
INSERT INTO TABLE_T VALUES ('ö',  'ö',  'ö' );
INSERT INTO TABLE_T VALUES ('Ö',  'Ö',  'Ö' );

COMMIT;


Now some tests.


SELECT FIELD_UTF_SV_CI
FROM   TABLE_T
WHERE  FIELD_UTF_SV_CI BETWEEN '' AND 'b'
ORDER BY 1;

FIELD_UTF_SV_CI
===============
1
10
2
20
a
A
aa
Aa
AA
b
B
===================
Result as expected


SELECT FIELD_UTF_SV_CI_NUM
FROM   TABLE_T
WHERE  FIELD_UTF_SV_CI_NUM BETWEEN '' AND 'b'
ORDER BY 1;

FIELD_UTF_SV_CI_NUM
===================
1
2
10
20
a
A
aa
Aa
AA
b
B
===================
Result as expected


SELECT FIELD_ISO_SV_CI
FROM   TABLE_T
WHERE  FIELD_ISO_SV_CI BETWEEN '' AND 'b'
ORDER BY 1;

FIELD_ISO_SV_CI
===============
1
10
2
20
A
a
Ä
Å
Ö
ä
å
ö
AA
Aa
aa
B
b
===================
Not the expected result.
Both wrong order and records that should not be there.


Let's add some index:

COMMIT;

CREATE INDEX IDX_UTF_SV_CI     ON TABLE_T (FIELD_UTF_SV_CI);
CREATE INDEX IDX_UTF_SV_CI_NUM ON TABLE_T (FIELD_UTF_SV_CI_NUM);
CREATE INDEX IDX_ISO_SV_CI     ON TABLE_T (FIELD_ISO_SV_CI);

COMMIT;


And repeat the queries from before:


SELECT FIELD_UTF_SV_CI
FROM   TABLE_T
WHERE  FIELD_UTF_SV_CI BETWEEN '' AND 'b'
ORDER BY 1;

FIELD_UTF_SV_CI
===============
1
10
2
20
a
A
aa
Aa
AA
b
B
===================
Result as expected


SELECT FIELD_UTF_SV_CI_NUM
FROM   TABLE_T
WHERE  FIELD_UTF_SV_CI_NUM BETWEEN '' AND 'b'
ORDER BY 1;

FIELD_UTF_SV_CI_NUM
===================
1
2
10
20
a
A
aa
Aa
AA
b
===================
Not the expected result.
Where did the 'B' go?


SELECT FIELD_ISO_SV_CI
FROM   TABLE_T
WHERE  FIELD_ISO_SV_CI BETWEEN '' AND 'b'
ORDER BY 1;

FIELD_ISO_SV_CI
===============
1
10
2
20
a
A
å
Å
ä
Ä
ö
Ö
AA
aa
Aa
b
B
===================
Not the expected result.
Both wrong order and records that should not be there.
Although the order differs slightly from the same query without index.


I have done the same tests with the release version of FB 3.0.2 
(Firebird-3.0.2.32703-0_x64) as well as tested with icu 53 and 57 for them both 
with the same result, no differences there.

Using the original icu files that comes with Firebird-3.0.2.32703-0_x64 and 
Firebird-3.0.3.32802-0_x64 and doing the same tests for FIELD_ISO_SV_CI gives 
the same result, so no difference there if the icu library is replaced or not.

I have also done some test with ACCENT INSENSITIVE but I will leave that beast 
for now.

Can anyone shed some light on this or point me to what I may be missing?

Thanks for reading,
Magnus Johansson
  • [firebird-... 'Magnus Johansson (nimajo)' mag...@nimajo.se [firebird-support]
    • [fire... danymar...@gmail.com [firebird-support]
      • R... Magnus Johansson mag...@nimajo.se [firebird-support]

Reply via email to