Re: [Tracker] libicu & libunistring based parsers (was:Re: libunistring-based parser in libtracker-fts)

2010-05-05 Thread Jamie McCracken
On Wed, 2010-05-05 at 19:56 +0100, Martyn Russell wrote: > On Tue, 2010-05-04 at 20:11 +0200, Aleksander Morgado wrote: > > Hi all again, > > Hi, > > > > > > > I've been playing with substituting the two word break algorithms in > > > libtracker-fts (custom for non-CJK and pango-based for CJK) w

Re: [Tracker] libicu & libunistring based parsers (was:Re: libunistring-based parser in libtracker-fts)

2010-05-05 Thread Martyn Russell
On Tue, 2010-05-04 at 20:11 +0200, Aleksander Morgado wrote: > Hi all again, Hi, > > > > I've been playing with substituting the two word break algorithms in > > libtracker-fts (custom for non-CJK and pango-based for CJK) with a > > single one using GNU libunistring (LGPLv3). Note that libicu (I

Re: [Tracker] libicu & libunistring based parsers (was:Re: libunistring-based parser in libtracker-fts)

2010-05-05 Thread Jamie McCracken
On Wed, 2010-05-05 at 18:39 +0200, Aleksander Morgado wrote: > > > So, with this improvement considering ASCII-only words a special case, > > > libunistring really beats them all. > > > > > > > yeah libunistring looks like good stuff - I must check the source! > > > > I still note you need to a

Re: [Tracker] libicu & libunistring based parsers (was:Re: libunistring-based parser in libtracker-fts)

2010-05-05 Thread Aleksander Morgado
> > So, with this improvement considering ASCII-only words a special case, > > libunistring really beats them all. > > > > yeah libunistring looks like good stuff - I must check the source! > > I still note you need to apply word filtering rules on words beginning > with numbers or symbols - I

Re: [Tracker] libicu & libunistring based parsers (was:Re: libunistring-based parser in libtracker-fts)

2010-05-05 Thread Jamie McCracken
On Wed, 2010-05-05 at 12:53 +0200, Aleksander Morgado wrote: > Hi Jamie & all, > > > > > I will modify the libunistring and libicu based algorithms tomorrow so > > that if ASCII-7 only, normalization and casefolding is not done, just a > > tolower() of each character. That would make the values m

Re: [Tracker] [PATCH] TST improvements, ...

2010-05-05 Thread Tshepang Lekhonkhobe
> On Tue, May 4, 2010 at 12:29, Martyn Russell wrote: >>> 38 - some clean-ups and consistency fixes (metadata tile) >> >> You use '%s' for URIs, please don't we use \"%s\" everywhere else in the >> code base and also it breaks for URIs which use ' in their name. >> >> We usually escape strings too

Re: [Tracker] [PATCH] TST improvements, ...

2010-05-05 Thread Tshepang Lekhonkhobe
On Wed, May 5, 2010 at 13:10, Martyn Russell wrote: > On Wed, 2010-05-05 at 12:29 +0200, Tshepang Lekhonkhobe wrote: >> On Tue, May 4, 2010 at 12:29, Martyn Russell wrote: >> >> 38 - some clean-ups and consistency fixes (metadata tile) >> > >> > You use '%s' for URIs, please don't we use \"%s\" e

Re: [Tracker] [PATCH] misc

2010-05-05 Thread Martyn Russell
On Wed, 2010-05-05 at 12:40 +0200, Tshepang Lekhonkhobe wrote: > Hi, find attached some small fixes > > 002 removes "%u" which is deprecated Committed, thanks. > 003 is a typo fix Committed, thanks. -- Regards, Martyn

Re: [Tracker] [PATCH] TST improvements, ...

2010-05-05 Thread Martyn Russell
On Wed, 2010-05-05 at 12:29 +0200, Tshepang Lekhonkhobe wrote: > On Tue, May 4, 2010 at 12:29, Martyn Russell wrote: > >> 38 - some clean-ups and consistency fixes (metadata tile) > > > > You use '%s' for URIs, please don't we use \"%s\" everywhere else in the > > code base and also it breaks for

Re: [Tracker] libicu & libunistring based parsers (was:Re: libunistring-based parser in libtracker-fts)

2010-05-05 Thread Aleksander Morgado
Hi Jamie & all, > > I will modify the libunistring and libicu based algorithms tomorrow so > that if ASCII-7 only, normalization and casefolding is not done, just a > tolower() of each character. That would make the values more approximate > to the glib/custom parser. > Just finished the ASCII-
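For readers following the thread, the ASCII-only special case described above (skip normalization and casefolding, just lowercase each character) could look roughly like the sketch below. This is not the code from the actual patch: the function names `word_is_ascii`, `ascii_tolower` and `process_word_fast_path` are hypothetical, and the sketch uses a locale-independent lowercase helper in place of the `tolower()` mentioned in the message. The non-ASCII branch only copies the word; the caller is assumed to hand it to the full libunistring or libicu path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true when every byte of the word is 7-bit ASCII. */
static bool word_is_ascii(const char *word, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if ((unsigned char) word[i] > 0x7F)
            return false;
    }
    return true;
}

/* Lowercase one ASCII byte without depending on the current locale. */
static char ascii_tolower(char c)
{
    return (c >= 'A' && c <= 'Z') ? (char) (c - 'A' + 'a') : c;
}

/*
 * Hypothetical helper (not from the Tracker sources).
 * Writes the processed word into `out`, which must hold len + 1 bytes.
 * Returns true if the ASCII fast path was taken. Returns false if the
 * word contains non-ASCII bytes; in that case the word is copied
 * unchanged and the caller is expected to run full Unicode
 * normalization and casefolding on it.
 */
static bool process_word_fast_path(const char *word, size_t len, char *out)
{
    assert(word != NULL && out != NULL);

    if (!word_is_ascii(word, len)) {
        memcpy(out, word, len);
        out[len] = '\0';
        return false;
    }

    for (size_t i = 0; i < len; i++)
        out[i] = ascii_tolower(word[i]);
    out[len] = '\0';
    return true;
}
```

The point of the split is that the ASCII check is a single cheap pass over the bytes, so plain English words never pay for normalization or casefolding, while anything with a byte above 0x7F still goes through the full Unicode path.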

[Tracker] [PATCH] misc

2010-05-05 Thread Tshepang Lekhonkhobe
Hi, find attached some small fixes 002 removes "%u" which is deprecated 003 is a typo fix -- my place on the web: floss-and-misc.blogspot.com From be150f26d514002e0c06ae3b961561f6286cb482 Mon Sep 17 00:00:00 2001 From: Tshepang Lekhonkhobe Date: Wed, 5 May 2010 02:37:53 +0200 Subject: [PATCH 2

Re: [Tracker] [PATCH] TST improvements, ...

2010-05-05 Thread Tshepang Lekhonkhobe
On Tue, May 4, 2010 at 12:29, Martyn Russell wrote: >> 38 - some clean-ups and consistency fixes (metadata tile) > > You use '%s' for URIs, please don't we use \"%s\" everywhere else in the > code base and also it breaks for URIs which use ' in their name. > > We usually escape strings too before