On Fri, 2010-04-23 at 09:17 +0100, Martyn Russell wrote:
> Thanks Aleksander.
>
> I think it makes sense to fix this. Just to be clear, does this mean we
> don't need Pango in libtracker-fts/tracker-parser.c to determine word
> breaks for CJK?
That's not broken, so I would not recommend trying to
Hi Martyn,
>
> I think it makes sense to fix this. Just to be clear, does this mean we
> don't need Pango in libtracker-fts/tracker-parser.c to determine word
> breaks for CJK?
>
Well, of course I'm not sure about this. I understand the need for
word-breaking in libtracker-fts, but I could also un
On 22/04/10 17:34, Aleksander Morgado wrote:
Hi all!
Hi,
Word breaks:
When text content is extracted from several doc types (msoffice, oasis,
pdf...), a simple word break algorithm is used, basically looking for
letters. This algorithm is far from perfect, as it doesn't follow the
common rul
Hi Jamie,
>
> word break detection is done in
> http://git.gnome.org/browse/tracker/tree/src/libtracker-fts/tracker-parser.c
>
> This is highly optimised and does checks for plain ASCII/Latin/CJK
> encodings to determine which word breaking algorithm to use
>
> For CJK we always use pango to wo