On Wed, 2010-05-05 at 12:53 +0200, Aleksander Morgado wrote:
> Hi Jamie & all,
>
> >
> > I will modify the libunistring and libicu based algorithms tomorrow so
> > that if ASCII-7 only, normalization and casefolding is not done, just a
> > tolower() of each character. That would make the values more approximate
> > to the glib/custom parser.
> >
>
> Just finished the ASCII-only improvement in both libunistring and
> libicu, and here are the new results. This time instead of the mean
> value of several tests, I took the minimum one.
>
> For the 50k ASCII-only file:
>  * glib/pango: 0.062
>  * libicu: 0.060
>  * libunistring: 0.057
>
> For the 200k ASCII-only file:
>  * glib/pango: 0.189
>  * libicu: 0.200
>  * libunistring: 0.119
>
> And for the 182k mixed english/chinese/japanese file:
>  * glib/pango: 21.4
>  * libicu: 0.220
>  * libunistring: 0.175
>
> So, with this improvement considering ASCII-only words a special case,
> libunistring really beats them all.
>

yeah libunistring looks like good stuff - I must check the source!
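
For anyone following along, the ASCII-only special case quoted above could be sketched in C roughly as below. This is only an illustration under assumptions, not the code from the actual patch: the helper names `word_is_ascii` and `ascii_fast_lower` are made up here, and the caller is assumed to fall back to the full libunistring or libicu normalization and casefolding path whenever the fast path declines the word.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if every byte of the word is 7-bit ASCII. */
static bool word_is_ascii(const char *word, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if ((unsigned char) word[i] >= 0x80)
            return false;
    }
    return true;
}

/* Hypothetical helper: lowercases an ASCII-only word in place and
 * returns true. Returns false and leaves the word untouched when it
 * contains non-ASCII bytes, so the caller can run the full
 * normalization + casefolding path instead.
 * Assumes the default "C" locale for tolower(). */
static bool ascii_fast_lower(char *word, size_t len)
{
    if (!word_is_ascii(word, len))
        return false;

    for (size_t i = 0; i < len; i++)
        word[i] = (char) tolower((unsigned char) word[i]);

    return true;
}
```

The point of the split is that the cheap byte scan decides up front which words can skip the Unicode machinery entirely, which is presumably why the ASCII-only timings move closer to the glib/custom parser.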
I still note you need to apply word filtering rules on words beginning
with numbers or symbols - I'm sure that's easy to do?

thanks

jamie

_______________________________________________
tracker-list mailing list
tracker-list@gnome.org
http://mail.gnome.org/mailman/listinfo/tracker-list
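
As a rough idea of the word filtering rule mentioned above (skipping words that begin with a number or a symbol), a minimal C sketch might look like this. The function name `word_starts_with_letter` is hypothetical, and treating any non-ASCII lead byte as the start of a letter is a simplifying assumption; a real implementation would decode the first character and ask the Unicode library for its category.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical filter: returns true if the word may be indexed,
 * i.e. it does not begin with a digit or a symbol. */
static bool word_starts_with_letter(const char *word)
{
    unsigned char first = (unsigned char) word[0];

    if (first == '\0')
        return false;           /* empty word: nothing to index */

    if (first >= 0x80)
        return true;            /* assumption: non-ASCII lead byte starts a letter */

    return isalpha(first) != 0; /* ASCII: accept letters only */
}
```

With this sketch, "tracker" would pass while "2010" and "_foo" would be filtered out.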