On Mon, Dec 07, 2009 at 05:35:49PM -0500, Igor Tandetnik wrote:
> Alexey Pechnikov <pechni...@mobigroup.ru>
> wrote:
> > The normalization is now performed by any string operation. But more
> > fast and useful to do it once at data store.
>
> So, which normalization form should the data store choose for me? And
> what if I need a different one?
>
> I'd rather the database store my data exactly the way I put it in. I
> really don't want it to decide for me what my data should look like.

I believe the right thing to do is to normalize strings when creating
index entries, but to leave the table data unnormalized.  You'd have to
make the equality operator normalize as well, though.  That way you can
have a unique text column that will accept a given accented character
only one way, composed or decomposed, but not both.

I.e., normalization-insensitive matching, normalization-preserving
storage.  This provides the best user experience.  If multiple systems'
input methods produce text in different normalization forms, or even
unnormalized, users will still find their data -- no surprises.  And
given that whatever systems the users are using can likely display the
strings produced by their input methods, preserving those strings
unmodified gives you the highest likelihood that the strings returned
will display properly.

(This is what Solaris implements for NFSv4, CIFS and local ZFS
filesystem access, for example.  ZFS hashes directories, and it
normalizes filenames prior to hashing, both on create and lookup, but
the directory entries are left unnormalized.)

To do this right requires support in SQLite3, even if it's provided by
an extension.  I don't recall if user-defined collation functions
provide everything you need to support this.

> >>> May be automatically dropping the BOM for
> >>> ICU collated fields is more correct way.
> >>
> >> Why don't you do just that in your application?
> >
> > Yes, I fix it in my application, but this problem can be produced in
> > any application.
>
> One person's problem is another's feature. If that other application
> doesn't want BOM in its strings, it should strip it, just like yours
> now does.

+1

Nico
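
On the open question of whether a user-defined collation is enough, here is a minimal sketch, assuming Python's standard sqlite3 and unicodedata modules. The collation name nfc_insensitive, the table names, and the helper nfc_compare are hypothetical and chosen only for illustration. The collation compares the NFC forms of both operands, so the UNIQUE index and the = operator on that column become normalization-insensitive while the stored text is left untouched.

```python
import sqlite3
import unicodedata


def nfc_compare(a: str, b: str) -> int:
    """Compare two strings by their NFC forms; return -1, 0 or 1."""
    na = unicodedata.normalize("NFC", a)
    nb = unicodedata.normalize("NFC", b)
    return (na > nb) - (na < nb)


conn = sqlite3.connect(":memory:")
# Register the collation before any statement refers to it.
conn.create_collation("nfc_insensitive", nfc_compare)
# The column's collation is used by its UNIQUE index and by comparisons
# in which the column is an operand.
conn.execute("CREATE TABLE names (name TEXT COLLATE nfc_insensitive UNIQUE)")

composed = "caf\u00e9"      # e-acute as one precomposed code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

# Store the decomposed spelling exactly as the user typed it.
conn.execute("INSERT INTO names VALUES (?)", (decomposed,))

# The composed spelling collates equal, so the UNIQUE constraint rejects it.
try:
    conn.execute("INSERT INTO names VALUES (?)", (composed,))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Looking up by the composed spelling finds the row, and the value
# returned is still the original decomposed string.
row = conn.execute(
    "SELECT name FROM names WHERE name = ?", (composed,)
).fetchone()
```

This only covers comparisons that go through the collation. Operators and functions that ignore collations (LIKE, length(), substr() and so on) still see the raw bytes, and the comparison re-normalizes on every call, so it is a sketch of the matching behaviour and not of an efficient implementation.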