On Sat, Jun 5, 2010 at 10:29 AM, Han-Teng Liao <hant...@gmail.com> wrote:
> I intend to use my existing datasets stored in an sqlite3 database for some
> linguistic analysis of the Chinese language. Now that I have successfully
> installed and run the FTS3 Extension and ICU Extension, I am curious whether
> it is theoretically possible to generate the tf-idf matrix from the FTS3
> table. If so, please do not hesitate to point me in the rough direction
> that I should take.
For each term, the index encodes the documents containing that term and the
hits (positions) within each document, so it is certainly possible. You might
look at adapting the testing function dumpDoclistFunc() in fts3.c. The cost
would be about the same as running a search on that term. You wouldn't want
to do this as a query optimization, because it touches all of the term's data
just as a query would. But it might make sense as a ranking input.

If you're working with a static data set, use optimize() to consolidate the
indices once you've loaded the data.

-scott
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
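As a rough illustration of why the doclist data is enough: once you have, for each term, the docids containing it and the hit positions in each document, tf is the hit count and df is the number of docids in the doclist. The sketch below assumes you have already dumped the doclists (for example, with something adapted from dumpDoclistFunc()) into a plain Python mapping; the `Doclists` shape and the `tfidf_matrix` name are hypothetical, and the raw-count tf with log(N/df) idf is just one common tf-idf variant.

```python
import math
from typing import Dict, List

# Hypothetical in-memory mirror of FTS3 doclists (not an SQLite API):
# term -> {docid: [hit positions of the term in that document]}
Doclists = Dict[str, Dict[int, List[int]]]


def tfidf_matrix(doclists: Doclists, n_docs: int) -> Dict[int, Dict[str, float]]:
    """Return a sparse tf-idf matrix as {docid: {term: weight}}.

    tf  = number of hits of the term in the document (raw count)
    idf = log(n_docs / df), where df = number of docids in the term's doclist
    """
    matrix: Dict[int, Dict[str, float]] = {}
    for term, postings in doclists.items():
        df = len(postings)
        if df == 0:
            continue  # empty doclist: the term occurs nowhere
        idf = math.log(n_docs / df)
        for docid, hits in postings.items():
            tf = len(hits)
            matrix.setdefault(docid, {})[term] = tf * idf
    return matrix


if __name__ == "__main__":
    # Toy doclists for three documents (docids 1..3).
    doclists = {
        "中文": {1: [0, 5], 2: [3]},
        "分析": {1: [2]},
        "語言": {1: [1], 2: [0], 3: [4]},
    }
    m = tfidf_matrix(doclists, n_docs=3)
    print(round(m[1]["中文"], 4))
```

The matrix is kept sparse (a dict per document) because most terms do not occur in most documents. A term that appears in every document, like "語言" above, gets an idf of zero and therefore a weight of zero everywhere.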