On Mon, Dec 07, 2009 at 05:35:49PM -0500, Igor Tandetnik wrote:
> Alexey Pechnikov <pechni...@mobigroup.ru>
> wrote: 
> > The normalization is now performed by every string operation. But it
> > would be faster and more useful to do it once, when the data is stored.
> 
> So, which normalization form should the data store choose for me? And
> what if I need a different one?
> 
> I'd rather the database store my data exactly the way I put it in. I
> really don't want it to decide for me what my data should look like.

I believe the right thing to do is to normalize strings when creating
index entries, but to leave the table data unnormalized.  You'd have to
make the equality operator also normalize though.

That way you can have a unique text column and it will accept ´
only one way, composed or decomposed, but not both.
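
A minimal sketch of that behaviour, using Python's sqlite3 module and a
user-defined collation.  The collation name NFC_INSENSITIVE, the helper
nfc_collation, the table layout and the choice of NFC are all
assumptions made for illustration; this is not a built-in SQLite
feature.

```python
import sqlite3
import unicodedata


def nfc_collation(a: str, b: str) -> int:
    """Compare two strings after NFC normalization (hypothetical helper)."""
    na = unicodedata.normalize("NFC", a)
    nb = unicodedata.normalize("NFC", b)
    return (na > nb) - (na < nb)


conn = sqlite3.connect(":memory:")
# The collation name is an assumption; any name can be registered.
conn.create_collation("NFC_INSENSITIVE", nfc_collation)
conn.execute("CREATE TABLE names (name TEXT COLLATE NFC_INSENSITIVE UNIQUE)")

composed = "\u00e1"     # a with acute as a single code point
decomposed = "a\u0301"  # a followed by a combining acute accent

# Store the decomposed form; the table keeps it exactly as given.
conn.execute("INSERT INTO names VALUES (?)", (decomposed,))

# The composed form compares equal under the collation, so the unique
# index rejects it.
try:
    conn.execute("INSERT INTO names VALUES (?)", (composed,))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Equality on the column also uses the collation, and the value that
# comes back is the original, unnormalized string.
stored = conn.execute(
    "SELECT name FROM names WHERE name = ?", (composed,)
).fetchone()[0]
```

Note that a collation registered this way belongs to the connection, so
every application that opens the database would have to register the
same function before touching the column.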

I.e., normalization-insensitive matching, normalization-preserving
storage.  That provides the best user experience.  If multiple systems'
input methods
produce text in different normalization forms, or even unnormalized,
users will still find their data -- no surprises.  And given that
whatever systems the users are using likely can display the strings
produced by their input modes, preserving those strings unmodified gives
you the highest likelihood that the strings returned will display
properly.

(This is what Solaris implements for NFSv4, CIFS and local ZFS
filesystem access, for example.  ZFS hashes directories, and it
normalizes filenames prior to hashing, both on create and lookup, but
the directory entries are left unnormalized.)
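
The same idea can be sketched as a toy directory, purely to illustrate
the technique: the key used for lookup is normalized, the stored name is
not.  The Directory class, its methods and the use of NFC are
hypothetical; nothing here reflects the actual ZFS code or the form it
normalizes to.

```python
import unicodedata


class Directory:
    """Toy directory: normalization-insensitive lookup, names preserved."""

    def __init__(self):
        # Maps normalized name -> (name as originally given, value).
        self._entries = {}

    @staticmethod
    def _key(name):
        # NFC is an assumption; any single form works if used consistently.
        return unicodedata.normalize("NFC", name)

    def create(self, name, value):
        key = self._key(name)
        if key in self._entries:
            raise FileExistsError(name)
        # Keep the caller's spelling of the name untouched.
        self._entries[key] = (name, value)

    def lookup(self, name):
        # Returns the original name together with the stored value.
        return self._entries[self._key(name)]
```

Creating "a\u0301.txt" and then looking up "\u00e1.txt" finds the entry
and hands back the name in its original, decomposed spelling.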

To do this right requires support in SQLite3, even if it's provided by
an extension.  I don't recall if user-defined collation functions
provide everything you need to support this.

> >>> Maybe automatically dropping the BOM for
> >>> ICU-collated fields is the more correct way.
> >> 
> >> Why don't you do just that in your application?
> > 
> > Yes, I fix it in my application, but this problem can be produced in
> > any application. 
> 
> One person's problem is another's feature. If that other application
> doesn't want BOM in its strings, it should strip it, just like yours
> now does.

+1
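
For what it's worth, stripping it on the application side is a
one-liner.  A sketch, assuming the strings are already decoded to
Unicode and that only a leading U+FEFF counts as a BOM (strip_bom is a
made-up helper name):

```python
BOM = "\ufeff"


def strip_bom(text: str) -> str:
    """Drop a single leading U+FEFF, leaving the rest of the string alone."""
    return text[1:] if text.startswith(BOM) else text
```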

Nico