Am 03.09.2010 13:27, schrieb Dan Kennedy:
>
> On Sep 2, 2010, at 6:37 PM, Lukas Haase wrote:
>
>> Hi,
>>
>> I use FTS3 (SQLITE_ENABLE_FTS3) with enhanced query syntax
>> (SQLITE_ENABLE_FTS3_PARENTHESIS).
>>
>> Now if I search for a string like '2002/91/AH' there are lots of items
>> which do NOT contain this string. This is a query:
>>
>> SELECT rowid, content FROM fulltext WHERE content MATCH '2002/91/AH';
>>
>> In my case, there are only 10 items which actually contain the string
>> '2002/91/AH' but the query above gives me 162 (!) matches!
>>
>> I can not find any reason for this. Some of the topics contain
>> "similar"
>> strings like 2002/96/AH or even 94/31/EG. But in fact, these strings
>> must not be matched :-(
>>
>> Does the slash have a special meaning in the query syntax? Does a
>> query
>> like 2002/91/AH have a special meaning?
>
> The '/' characters are serving as token separators. So
> you are searching for (2002 OR 91 OR ah). If you enclose
> the date in double quotes:
>
>     ... MATCH '"2002/91/AH"'
>
> you will be searching for the phrase "2002 91 ah", which
> is as close as you can get to what you want without writing
> a custom tokenizer:
>
>     http://www.sqlite.org/fts3.html#section_5_1

Oh great! Thank you for your hint! With double quotes it works as I 
would expect it.

In general writing a custom tokenizer would not be a problem BUT in my 
case it is complicated because the database (including the fulltext 
table) is created on a different machine with SQLites packages from a 
distributor (Debian stable).

The data is queried using a custom C++ application, so on client side I 
may alter the code.

Until now I use the simple tokenizer. At first I thought I can make it 
work if I just remove the slash '/' from the separator list. Therefore I 
added the following line to simpleCreate:

t->delim['/'] = 0;

Now, nothing was found with content MATCH '2002/91/AH'; and first I did 
not understand why.

After your explanation it is clear why: 2002, 91 and AH are separate 
tokens in my database and MATCH '"2002/91/AH'" would normally search for 
the tokens 2002, 91, AH next to each other. But after the patch above, 
2002/91/AH is a single token which is not found.

So my questions is: Is there any other way to achieve my goal?

Regards,
Luke


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to