Re[2]: [sqlite] FTS: index only, no text storage - Was: [sqlite] FTS: Custom Tokenizer / Stop Words

2007-03-15 Ion Silvestru
Just for information: a full-text indexer based only on the SQLite BTree index, not using tables: http://www.codeproject.com/useritems/Text_Indexer.asp

Re: [sqlite] FTS: index only, no text storage - Was: [sqlite] FTS: Custom Tokenizer / Stop Words

2007-03-14 Ralf Junker
Scott Hess wrote: >>I am optimistic that the proper implementation will use even less than 50%: > >Indeed :-). Glad to read this ;-) >>I found that _not_ adding the original text turned out to be a great time >>saver. This makes sense if we know that the original text is about 4 times >>the si

Re: [sqlite] FTS: index only, no text storage - Was: [sqlite] FTS: Custom Tokenizer / Stop Words

2007-03-13 Scott Hess
On 3/13/07, Ralf Junker <[EMAIL PROTECTED]> wrote: Scott Hess wrote: >Keeping track of that information would probably double the >size of the index. With your estimate, the SQLite full text index (without document storage) would still take up only 50% of the documents' size. In my opinion, this

Re: [sqlite] FTS: index only, no text storage - Was: [sqlite] FTS: Custom Tokenizer / Stop Words

2007-03-13 Ralf Junker
Hello Scott, I was hoping that you would read my message, many thanks for your reply! >UPDATE and DELETE need to have the previous document text, because the >docids are embedded in the index, and there is no docid->term index >(or, put another way, the previous document text _is_ the docid->term
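Scott's point about UPDATE and DELETE can be illustrated with a toy in-memory index. The sketch below is hypothetical and is not FTS2's on-disk doclist format. The class name, the naive whitespace tokenizer and the dict layout are all assumptions. It maps term to docid only. Because there is no reverse docid-to-term map, removing or replacing a document requires its old text to know which posting lists to touch.

```python
from collections import defaultdict


class InvertedIndex:
    """Hypothetical toy index: term -> {docid: [positions]}, no docid -> term map."""

    def __init__(self):
        # term -> {docid: [token positions within that document]}
        self.postings = defaultdict(dict)

    @staticmethod
    def tokenize(text):
        # Naive lower-cased whitespace tokenizer (an assumption for illustration).
        return text.lower().split()

    def insert(self, docid, text):
        for pos, term in enumerate(self.tokenize(text)):
            self.postings[term].setdefault(docid, []).append(pos)

    def delete(self, docid, old_text):
        # The old text tells us which posting lists mention docid; without it
        # we would have to scan every term in the index.
        for term in set(self.tokenize(old_text)):
            docs = self.postings.get(term)  # .get() does not create entries
            if docs is not None:
                docs.pop(docid, None)
                if not docs:
                    del self.postings[term]

    def update(self, docid, old_text, new_text):
        # An UPDATE is a DELETE of the old terms followed by an INSERT.
        self.delete(docid, old_text)
        self.insert(docid, new_text)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), {}))
```

For example, after inserting docid 1 as "full text index", updating it to "inverted index" needs the string "full text index" again. That is the same reason FTS needs the previous document text, or some other docid-to-term record, to maintain its index.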

Re: [sqlite] FTS: index only, no text storage - Was: [sqlite] FTS: Custom Tokenizer / Stop Words

2007-03-13 Ralf Junker
Ion Silvestru wrote: >Just a question: did you eliminate stop-words in your tests? No, I did not eliminate any stop-words. The two test runs were identical except for the small changes in FTS 2. My stop-words question was not intended for source code but for human-language texts. Ralf

Re: [sqlite] FTS: index only, no text storage - Was: [sqlite] FTS: Custom Tokenizer / Stop Words

2007-03-13 Scott Hess
On 3/13/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: Ion Silvestru <[EMAIL PROTECTED]> wrote: > To Ralf: > >As a side effect, the offsets() and snippet() functions stopped working, > >as they seem to rely on the presence of the full document text in the > >current implementation. > > Did you

Re: [sqlite] FTS: index only, no text storage - Was: [sqlite] FTS: Custom Tokenizer / Stop Words

2007-03-13 drh
Ion Silvestru <[EMAIL PROTECTED]> wrote: > To Ralf: > > >As a side effect, the offsets() and snippet() functions stopped working, as > >they seem to rely on the presence of the full document text in the current > >implementation. > > Did you test "phrase" searching on the index-only version,

Re: [sqlite] FTS: index only, no text storage - Was: [sqlite] FTS: Custom Tokenizer / Stop Words

2007-03-13 Ion Silvestru
To Ralf: >As a side effect, the offsets() and snippet() functions stopped working, as >they seem to rely on the presence of the full document text in the current >implementation. Did you test "phrase" searching on the index-only version? Doesn't this kind of search rely on offsets()?
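The truncated replies do not show how FTS2 evaluates phrases. However, the question can be explored with a small sketch. It assumes the index records each term's token positions per document, and the function names and tokenizer are hypothetical. Under that assumption, a phrase matches when consecutive terms occur at consecutive positions. This can be checked from the index alone, whereas offsets() and snippet() report on or extract pieces of the document text itself.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Hypothetical helper: map term -> {docid: [positions]} for {docid: text}."""
    index = defaultdict(dict)
    for docid, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(docid, []).append(pos)
    return index


def phrase_search(index, phrase):
    """Return sorted docids containing the phrase words consecutively.

    Only stored positions are consulted; the original text is never read.
    """
    terms = phrase.lower().split()
    if not terms:
        return []
    # A candidate document must contain every term of the phrase.
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))
    hits = []
    for docid in sorted(candidates):
        # The phrase starts at p if term i occurs at position p + i for all i.
        starts = set(index[terms[0]][docid])
        for i, term in enumerate(terms[1:], start=1):
            starts &= {p - i for p in index[term][docid]}
        if starts:
            hits.append(docid)
    return hits
```

With documents "the index stores positions" (1) and "positions the index stores" (2), the phrase "index stores" matches both. In contrast, "stores positions" matches only document 1, because in document 2 "positions" precedes "stores".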

Re[2]: [sqlite] FTS: index only, no text storage - Was: [sqlite] FTS: Custom Tokenizer / Stop Words

2007-03-13 Ion Silvestru
>Just a question: did you eliminate stop-words in your tests? Sorry, you specified that you indexed source code files, so stop-words are not applicable here.

Re: [sqlite] FTS: index only, no text storage - Was: [sqlite] FTS: Custom Tokenizer / Stop Words

2007-03-13 Ion Silvestru
Thank you. Just a question: did you eliminate stop-words in your tests? >Concluding: Given the great database size savings possible by separating full >text index from data storage, I wish that >developers would consider adding such an option to the SQLite FTS interface. If such an option wil

[sqlite] FTS: index only, no text storage - Was: [sqlite] FTS: Custom Tokenizer / Stop Words

2007-03-13 Ralf Junker
>But what about: > >I am very interested to know if it would be possible to use an FTS indexing >module to store the inverted index only, but >not the document's text. This would save disk space if the text to index is >stored on disk rather than inside the database. This is possible with just
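For readers on later SQLite releases, FTS4 (which postdates this 2007 thread) accepts a `content=""` option that creates a "contentless" table. Such a table stores only the full-text index. The following is a minimal sketch using Python's `sqlite3` module. It assumes the linked SQLite library was compiled with FTS4 support, which is common but optional. The table name `src_index`, the column `body` and the helper functions are made up for illustration. Per the FTS3/FTS4 documentation, contentless tables need an explicit docid on INSERT and do not support UPDATE, DELETE, snippet() or offsets(). These limits line up with the ones raised in this thread.

```python
import sqlite3


def build_contentless_index(files):
    """Hypothetical helper: index (docid, text) pairs without storing the text.

    The text itself is assumed to live elsewhere, e.g. in files on disk.
    """
    conn = sqlite3.connect(":memory:")
    # content="" makes the FTS4 table contentless: only the index is kept.
    conn.execute('CREATE VIRTUAL TABLE src_index USING fts4(content="", body)')
    # Contentless tables require an explicit docid on every INSERT.
    conn.executemany("INSERT INTO src_index(docid, body) VALUES (?, ?)", files)
    conn.commit()
    return conn


def find(conn, query):
    # Only the docid is read back; the caller maps it to the stored document.
    rows = conn.execute(
        "SELECT docid FROM src_index WHERE src_index MATCH ? ORDER BY docid",
        (query,),
    )
    return [docid for (docid,) in rows]
```

The application keeps its own docid-to-file mapping and re-reads the file whenever it needs the text, for example to build a snippet.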