On Sat, Jun 5, 2010 at 10:29 AM, Han-Teng Liao <hant...@gmail.com> wrote:
>    I intend to use my existing datasets stored in sqlite3 database for some
> linguistic analysis for Chinese language. After I have successfully
> installed and run the FTS3 Extension and ICU Extension, I am curious whether
> it is theoretically possible to generate the tf-idf matrix from the FTS3
> table?  If so, please do not hesitate to point me to the rough direction
> that I should take.

For each term the index encodes the documents containing the term and
the hits within the document, so it is certainly possible.  You might
look at adapting the testing function dumpDoclistFunc() in fts3.c.
The cost would be about the same as doing a search on that term, since
either way all of the term's data gets read.  So you wouldn't want to
do this in an attempt to optimize queries, but it might make sense as
a ranking input.  If you're working with a static data set, run
optimize() to consolidate the indices once you've loaded the data.
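
As a rough illustration of the arithmetic only, here is a minimal
Python sketch that turns per-term doclists into tf-idf weights.  It
assumes you have already dumped the index into a plain mapping of
term -> {docid: hit count}; that mapping, the tf_idf() helper and the
sample data are all hypothetical and not part of FTS3, and the
weighting used (raw hit count times log(N / document frequency)) is
just one common variant.

```python
import math

def tf_idf(doclists, n_docs):
    """Map (term, docid) -> weight, given term -> {docid: hit_count}."""
    weights = {}
    for term, postings in doclists.items():
        # Document frequency is the number of docids in the term's doclist.
        idf = math.log(n_docs / len(postings))
        for docid, hits in postings.items():
            # Term frequency is the raw number of hits in that document.
            weights[(term, docid)] = hits * idf
    return weights

# Hypothetical doclists, shaped like what a dump of the index might give.
doclists = {
    "sqlite": {1: 2, 2: 1, 3: 1},
    "icu": {1: 1},
    "fts3": {1: 1, 3: 3},
}
matrix = tf_idf(doclists, n_docs=3)
```

A term that appears in every document (here "sqlite") gets weight
zero under this variant, which is the usual behaviour for an
unsmoothed idf.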

-scott
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
