Hi Paul, and thank you for your reply.
The trouble I have is that in my query, not all of the keywords have to be
present for a successful match to be made. SQLite's FTS only seems to match
if all the keywords are present, which I don't require.
I am not familiar with Perl, but am working exclusively in C++.
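One possible workaround, offered as a sketch rather than a definitive recipe: FTS3/FTS4 treat space-separated terms as an implicit AND, but the query syntax also accepts an explicit OR operator, which must be written in upper case. The helper below joins keywords with OR so that a row can match when any one keyword is present. The name `build_or_query` is hypothetical, and the sketch assumes the keywords are already stemmed and contain no quotes or FTS operators.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: joins keywords into an FTS MATCH expression such as
// "hello OR world OR robot", so that any single keyword is enough to match.
// Assumes keywords are already stemmed and contain no quotes or FTS operators.
std::string build_or_query(const std::vector<std::string>& keywords) {
    std::string query;
    for (const std::string& kw : keywords) {
        if (kw.empty()) continue;              // skip blanks left over from stripping
        if (!query.empty()) query += " OR ";   // OR must be upper case in FTS
        query += kw;
    }
    return query;
}
```

The resulting string would be bound as the parameter of a query along the lines of `SELECT ... FROM messages WHERE messages MATCH ?`, where the table name `messages` is an assumption. Ranking the rows that come back is a separate step.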
The input I am processing is arbitrary, and so is the data that I am
searching through in the index. The incoming data is user messages, and the
index contains old messages that the robot has given to users (stemmed and
stripped in various ways to make matches more probable), and then there's
another column which contains an appropriate answer if that query is
matched. I want it to match as many keywords as possible but not necessarily
all, and order by:
1. How many keywords were matched, with some minimum threshold below which
no match is made.
2. How well the ordering matched.
Do you have any tips?
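To make the two criteria concrete, here is a rough C++ sketch of one way they could be scored outside of SQLite. It is an assumption about what "how well the ordering matched" might mean, not an established method: ordering is measured as the longest run of matched keywords that appear in the same relative order as in the query. The names `MatchScore`, `score_candidate`, `ranks_higher` and `min_matched` are all hypothetical, and the query keywords are assumed to be distinct.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct MatchScore {
    int matched = 0;        // query keywords found in the candidate
    int in_order = 0;       // longest run of matched keywords in query order
    bool accepted = false;  // true when matched reaches the minimum threshold
};

// Scores one candidate (already split into stemmed words) against the query.
MatchScore score_candidate(const std::vector<std::string>& query,
                           const std::vector<std::string>& candidate,
                           int min_matched) {
    // First position in the candidate of each matched keyword, in query order.
    std::vector<int> positions;
    for (const std::string& kw : query) {
        auto it = std::find(candidate.begin(), candidate.end(), kw);
        if (it != candidate.end())
            positions.push_back(static_cast<int>(it - candidate.begin()));
    }

    MatchScore s;
    s.matched = static_cast<int>(positions.size());

    // Longest strictly increasing subsequence of positions (simple O(n^2) DP).
    std::vector<int> lis(positions.size(), 1);
    for (std::size_t i = 0; i < positions.size(); ++i) {
        for (std::size_t j = 0; j < i; ++j)
            if (positions[j] < positions[i])
                lis[i] = std::max(lis[i], lis[j] + 1);
        s.in_order = std::max(s.in_order, lis[i]);
    }

    s.accepted = s.matched >= min_matched;
    return s;
}

// Sort key: more matched keywords first, then better ordering.
bool ranks_higher(const MatchScore& a, const MatchScore& b) {
    if (a.matched != b.matched) return a.matched > b.matched;
    return a.in_order > b.in_order;
}
```

Candidates whose `accepted` flag is false would be dropped, and the rest sorted with `ranks_higher`.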
Kind regards,
Philip Bennefall
----- Original Message -----
From: <pc...@sympatico.ca>
To: <sqlite-users@sqlite.org>
Sent: Thursday, June 14, 2012 7:01 PM
Subject: [sqlite] Full text search without full phrase matches
I had to implement something like this for comparing passages from statutes
(see the Introduction in Douglas Hay and Paul Craven, *Masters, Servants and
Magistrates in Britain and the Empire, 1562-1955* [UNC Press, 2004] for an
illustration).
You need to isolate the keywords, in whatever order, count them, and measure
the distances (number of words) between them. SQLite is great for managing
the tables of keywords, the lists of texts that contain them, and tables of
distances. But it is not the optimal tool for breaking down the texts and
extracting the keywords and distances. I used Perl for this job, and found
that I could easily adapt recipes from the Perl Cookbook and similar
repositories to build my routines. I wrote the disaggregated lists of
keywords, distances and texts as SQL tables and analysed them in SQLite.
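The original routines were written in Perl and are not reproduced here. Purely as an illustration, the same kind of breakdown might look like this in C++: split a text into words, record where the keywords occur, and measure the gaps between consecutive hits. The names `tokenize`, `KeywordHit`, `find_keywords` and `distances` are hypothetical, the tokenizer handles ASCII only, and "distance" is taken to mean the difference between word positions, which is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Splits text into lower-cased alphanumeric words (ASCII only, an assumption).
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            current += static_cast<char>(std::tolower(c));
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

struct KeywordHit {
    std::string keyword;
    int position;  // zero-based word index within the text
};

// Returns every keyword occurrence, in the order it appears in the text.
std::vector<KeywordHit> find_keywords(const std::vector<std::string>& words,
                                      const std::set<std::string>& keywords) {
    std::vector<KeywordHit> hits;
    for (std::size_t i = 0; i < words.size(); ++i)
        if (keywords.count(words[i]))
            hits.push_back({words[i], static_cast<int>(i)});
    return hits;
}

// Difference in word positions between each pair of consecutive hits.
std::vector<int> distances(const std::vector<KeywordHit>& hits) {
    std::vector<int> d;
    for (std::size_t i = 1; i < hits.size(); ++i)
        d.push_back(hits[i].position - hits[i - 1].position);
    return d;
}
```

The hits and distances produced this way are the kind of rows that could then be written to tables and analysed with ordinary SQL.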
Paul Craven
York University
----------------------------------
Date: Wed, 13 Jun 2012 23:09:35 +0200
From: Philip Bennefall <phi...@blastbay.com>
To: <sqlite-users@sqlite.org>
Subject: [sqlite] Full text search without full phrase matches
Message-ID: <A12309DB130E42BBA0590D664F66922A@chicken>
Content-Type: text/plain; charset="iso-8859-1"
Hi all,
I am new to this mailing list and to SQLite, so I wanted to start by thanking
all of those who make this project a reality. It is a great tool.
Now, to my question. I am trying to use the full text search feature to find
rough matches for a chat robot. Basically I want to match as many keywords
as possible, but not necessarily all of them. The results should be sorted
based on how many keywords were found in the phrase and how closely ordered
they are to the query. In other words the ordering doesn't have to be exact,
but the closer it is, the higher the result should rank. Similarly, even if
only one or two words in the phrase are found it should match, but rank
higher the more of the words that are present. I have read the reference and
I see the NEAR statement and the matchinfo function, as well as the example
of how to use it, but I cannot figure out how to apply this knowledge to my
specific problem. Does anyone have any suggestions?
Thanks in advance for your help.
Kind regards,
Philip Bennefall
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users