On Fri, Apr 16, 2010 at 3:24 AM, Alexey Pechnikov
<pechni...@mobigroup.ru> wrote:
> You can also use my patches for zlib compression in FTS3. I'm planning to make
> an "fts3z" extension because I want to use both the original FTS3
> and FTS3 with compression together.

Back when I was working up fts1, I experimented with compression and
found it useful, but ran up against the problem of SQLite itself not
having inbuilt support for compression.  Bummer!

Anyhow, having a distinct fts3z for compression would be sub-optimal,
I think, because it would fall behind as fts3 itself changes.  Maybe
you could implement it as a compile-time option to fts3.c which allows
it to export both fts3 and fts3z?

You may also wish to experiment with how intrusive it would be to add
externally-specified processing functions to the virtual table.  I'd
imagine something like:

   CREATE VIRTUAL TABLE t USING fts3(STORE FUNCTION compress,
      RETRIEVE FUNCTION uncompress, title, body);

The table would not be accessible if you tried to load it on a SQLite
build which didn't have the uncompress function, but that should quickly
become obvious when you look at the schema.  Another option would be
to do it like REGEXP, where SQLite calls a user function with a fixed name:

   CREATE VIRTUAL TABLE t USING fts3(COMPRESSED, title, body);

When COMPRESSED is specified, the select and update queries would
include fts3_compress() and fts3_uncompress() calls.  If the SQLite
embedder has not defined those functions, the queries will fail with
errors.
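
As a rough illustration of the embedder's side of either spelling, here is a minimal Python sketch using the standard sqlite3 and zlib modules. It uses an ordinary table rather than fts3, because the COMPRESSED and STORE FUNCTION keywords are only proposals in this thread. The function names follow the text above; the choice of zlib, the docs table, and the helper names register_codecs() and roundtrip() are assumptions made for the example.

```python
import sqlite3
import zlib


def register_codecs(conn):
    # Hypothetical embedder-supplied pair; zlib is an assumption,
    # not something fts3 itself provides.
    conn.create_function(
        "fts3_compress", 1,
        lambda text: zlib.compress(text.encode("utf-8")))
    conn.create_function(
        "fts3_uncompress", 1,
        lambda blob: zlib.decompress(blob).decode("utf-8"))


def roundtrip(text):
    conn = sqlite3.connect(":memory:")
    # Ordinary table standing in for the proposed compressed fts3 table.
    conn.execute("CREATE TABLE docs(body BLOB)")

    # Before the functions are registered, the statement cannot be prepared.
    try:
        conn.execute("INSERT INTO docs(body) VALUES (fts3_compress(?))", (text,))
        error = None
    except sqlite3.OperationalError as exc:
        error = str(exc)

    # After registration, the same statement works and the data round-trips.
    register_codecs(conn)
    conn.execute("INSERT INTO docs(body) VALUES (fts3_compress(?))", (text,))
    restored = conn.execute("SELECT fts3_uncompress(body) FROM docs").fetchone()[0]
    conn.close()
    return error, restored
```

Calling roundtrip("some document text") returns the error text from the first attempt along with the restored string, which shows both the failure mode and the normal path.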

I have no veto, here, but my preference would be the first version,
where the specific functions are listed.  The second version is easier
to code, but it means that distinct implementations could find
themselves unable to read each other's tables because they define
fts3_*compress() differently.  The first version _could_ have that
problem, but at least allows for the possibility of not having it.

Hmm.  You could also define the function to take a flag to control
compress/uncompress:

  CREATE VIRTUAL TABLE t USING fts3(STORE WITH storefn, title, body);

where storefn(0, original) returns the compressed form and
storefn(1, compressed) returns the original, or something like that.
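
A single-function variant could look roughly like the Python sketch below. The flag convention (0 to store, 1 to retrieve) is one reading of the line above, and zlib is again only an assumption; nothing here is existing fts3 behavior.

```python
import sqlite3
import zlib


def storefn(flag, data):
    # flag 0: data is the original text, return the compressed bytes.
    # flag 1: data is the compressed bytes, return the original text.
    if flag == 0:
        return zlib.compress(data.encode("utf-8"))
    if flag == 1:
        return zlib.decompress(data).decode("utf-8")
    raise ValueError("flag must be 0 or 1")


conn = sqlite3.connect(":memory:")
conn.create_function("storefn", 2, storefn)

# Store direction: text in, BLOB out.
stored = conn.execute("SELECT storefn(0, ?)", ("some text",)).fetchone()[0]
# Retrieve direction: BLOB in, text out.
restored = conn.execute("SELECT storefn(1, ?)", (stored,)).fetchone()[0]
```

One function keeps the two directions from drifting apart, at the cost of a less self-describing schema.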

-----

Of course, here I'm ignoring the entire problem of separate
compressors for the document data versus the index data, or separate
compressors for different columns.  I could imagine:

   CREATE VIRTUAL TABLE t USING fts3(title, body STORE WITH storefn);

but at some point it just gets too hard to hold everything together.
There's no per-column tokenizer, either :-).  That level of
configurability would probably be better served by refactoring fts to
allow the index and data to be distinct.  Then you could perhaps layer
an fts index over a table with views and triggers to accomplish
compression.
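
The views-and-triggers half of that idea can be sketched with plain SQLite today. The Python example below covers only the compression layer, not the fts index on top; the table name t_content, the view t, the trigger name, and the zlib-backed compress()/uncompress() functions are all assumptions for illustration.

```python
import sqlite3
import zlib

conn = sqlite3.connect(":memory:")
# Hypothetical codec functions supplied by the embedder.
conn.create_function("compress", 1,
                     lambda text: zlib.compress(text.encode("utf-8")))
conn.create_function("uncompress", 1,
                     lambda blob: zlib.decompress(blob).decode("utf-8"))

conn.executescript("""
    -- Backing table holds the compressed body.
    CREATE TABLE t_content(id INTEGER PRIMARY KEY, title TEXT, body BLOB);

    -- Readers see uncompressed text through the view.
    CREATE VIEW t AS
        SELECT id, title, uncompress(body) AS body FROM t_content;

    -- Writers insert into the view; the trigger compresses on the way in.
    CREATE TRIGGER t_insert INSTEAD OF INSERT ON t
    BEGIN
        INSERT INTO t_content(id, title, body)
        VALUES (NEW.id, NEW.title, compress(NEW.body));
    END;
""")

conn.execute("INSERT INTO t(title, body) VALUES (?, ?)",
             ("greeting", "hello " * 100))
row = conn.execute("SELECT title, body FROM t").fetchone()
stored_size = conn.execute("SELECT length(body) FROM t_content").fetchone()[0]
```

Here row holds the original title and body, while stored_size is the byte length of the compressed BLOB actually kept in t_content.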

[Note that the "STORE WITH" variant above could also be a route to this:
  storefn(table_name, column_name, in_out, data)
then storefn() could do the conversion from "t" to "t_content" and
build the queries.  I think performance might end up contrary to the
goals of using compression, though :-).]
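
The four-argument signature would at least allow per-column policy. The Python sketch below shows only that dispatch, not the query building mentioned above; the COMPRESSED_COLUMNS set, the 0/1 meaning of in_out, and the use of zlib are assumptions.

```python
import sqlite3
import zlib

# Hypothetical policy: compress only t.body, pass everything else through.
COMPRESSED_COLUMNS = {("t", "body")}


def storefn(table_name, column_name, in_out, data):
    if (table_name, column_name) not in COMPRESSED_COLUMNS:
        return data
    if in_out == 0:
        # Storing: text in, compressed bytes out.
        return zlib.compress(data.encode("utf-8"))
    # Retrieving: compressed bytes in, text out.
    return zlib.decompress(data).decode("utf-8")


conn = sqlite3.connect(":memory:")
conn.create_function("storefn", 4, storefn)

title = conn.execute("SELECT storefn('t', 'title', 0, ?)", ("abc",)).fetchone()[0]
packed = conn.execute("SELECT storefn('t', 'body', 0, ?)", ("x" * 50,)).fetchone()[0]
body = conn.execute("SELECT storefn('t', 'body', 1, ?)", (packed,)).fetchone()[0]
```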

Moving on,
scott
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users