One further question:

In fts3.c there is a comment that describes how the result format depends on the compile-time settings:

 * Result formats differ with the setting of DL_DEFAULTS.  Examples:
 **
 **     DL_DOCIDS: [1] [3] [7]
 **     DL_POSITIONS: [1 0[0 4] 1[17]] [3 1[5]]
 **     DL_POSITIONS_OFFSETS: [1 0[0,0,3 4,23,26] 1[17,102,105]] [3 1[5,20,23]]

I also found one functional limitation if we use only DL_DOCIDS in order to reduce the overall size:

/*
** By default, only positions and not offsets are stored in the doclists.
** To change this so that offsets are stored too, compile with
**
**          -DDL_DEFAULT=DL_POSITIONS_OFFSETS
**
** If DL_DEFAULT is set to DL_DOCIDS, your table can only be inserted
** into (no deletes or updates).
*/
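To make the three result formats quoted above concrete, here is a minimal sketch that renders the same set of term occurrences at each detail level, using the bracket notation from the comment. It only mimics the textual notation; it is not the byte-level encoding used by fts3.c. The names `DetailLevel`, `Hit`, `Buf`, `put` and `render_doclist` are hypothetical and exist only for this illustration; the three `DL_*` constants are taken from the comment.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Detail levels, named after the constants in the quoted comment. */
typedef enum { DL_DOCIDS, DL_POSITIONS, DL_POSITIONS_OFFSETS } DetailLevel;

/* One occurrence of a term (hypothetical struct, for illustration only). */
typedef struct {
  long docid;  /* document that contains the term */
  int column;  /* column index within the table */
  int pos;     /* token position within that column */
  int start;   /* byte offset where the token starts */
  int end;     /* byte offset where the token ends */
} Hit;

/* Fixed-size output buffer that truncates instead of overflowing. */
typedef struct { char *s; size_t cap; size_t len; } Buf;

static void put(Buf *b, const char *fmt, ...)
{
  va_list ap;
  int n;
  if (b->len + 1 >= b->cap) return;            /* already full */
  va_start(ap, fmt);
  n = vsnprintf(b->s + b->len, b->cap - b->len, fmt, ap);
  va_end(ap);
  if (n < 0) return;
  b->len += (size_t)n;
  if (b->len >= b->cap) b->len = b->cap - 1;   /* output was truncated */
}

/* Write hits[0..n-1] into out using the bracket notation of the comment.
** The hits must be sorted by docid, then column, then position. */
static void render_doclist(const Hit *hits, int n, DetailLevel level,
                           char *out, size_t cap)
{
  Buf b;
  int i;
  assert(out != NULL && cap > 0);
  b.s = out; b.cap = cap; b.len = 0;
  out[0] = '\0';
  for (i = 0; i < n; i++) {
    int new_doc = (i == 0 || hits[i].docid != hits[i - 1].docid);
    int new_col = new_doc || hits[i].column != hits[i - 1].column;
    if (new_doc) {
      /* Close the previous document (and its last column, if any). */
      if (i > 0) put(&b, "%s", level == DL_DOCIDS ? "] " : "]] ");
      put(&b, "[%ld", hits[i].docid);
    }
    if (level == DL_DOCIDS) continue;          /* docids only: drop the rest */
    if (new_col) {
      /* Close the previous column of the same document, then open a new one. */
      put(&b, "%s %d[", new_doc ? "" : "]", hits[i].column);
    } else {
      put(&b, " ");
    }
    put(&b, "%d", hits[i].pos);
    if (level == DL_POSITIONS_OFFSETS) {
      put(&b, ",%d,%d", hits[i].start, hits[i].end);
    }
  }
  if (n > 0) put(&b, "%s", level == DL_DOCIDS ? "]" : "]]");
}
```

Called with the four occurrences from the comment's example (docid 1, column 0, positions 0 and 4; docid 1, column 1, position 17; docid 3, column 1, position 5, each with its offsets), `render_doclist` writes `[1] [3]` for DL_DOCIDS, `[1 0[0 4] 1[17]] [3 1[5]]` for DL_POSITIONS, and `[1 0[0,0,3 4,23,26] 1[17,102,105]] [3 1[5,20,23]]` for DL_POSITIONS_OFFSETS. The DL_DOCIDS output keeps only the docids; the column and position data is simply absent.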
Are there any other functional drawbacks if we go for DOCIDS only, e.g. when searching for "term1 term2" in a document?

Best
Martin

________________________________
From: D. Richard Hipp <d...@hwaci.com>
To: General Discussion of SQLite Database <sqlite-users@sqlite.org>
Sent: Tuesday, May 26, 2009, 12:27:59
Subject: Re: [sqlite] FTS3

On May 26, 2009, at 5:03 AM, Martin Pfeifle wrote:

> Dear all,
> we need full and fuzzy text search for addresses.
> Currently we are looking into Lucene and SQLite's FTS extension.
> For us it is crucial to understand the file structures and the
> concepts behind the libraries.
> Is there a self-contained, comprehensive document for FTS3 (besides
> the comments in fts3.c) ?

There is no information on FTS3 apart from the code comments and the README files in the source tree.

The file formats for FTS3 and Lucene are completely different at the byte level. But if you dig deeper, you will find that they both use the same underlying concepts and ideas and really are two different implementations of the same algorithm.

During development, we were constantly testing the performance and index size of FTS3 against CLucene using the Enron email corpus. Our goal was for FTS3 to run significantly faster than CLucene and to generate an index that was no larger in size. That goal was easily met at the time, though we have not tested FTS3 against CLucene lately to see if anything has changed.

One of the issues with CLucene that FTS3 sought to address was that when inserting new elements into the index, the insertion time was unpredictable. Usually the insertions would be very fast. But Lucene will occasionally take a very long time for a single insertion in order to merge multiple smaller indices into larger indices. This was seen as undesirable. FTS3 strives to give much better worst-case insertion times by doing index merges incrementally and spreading the cost of index merges across many inserts.

D. Richard Hipp
d...@hwaci.com

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users