Re: [sqlite] Indexing virtual tables

Scott Hess Mon, 19 May 2008 13:23:28 -0700

It sounds like you're still trying to push your index outside the virtual table.


My suggestion was to have your virtual table t internally create a
regular SQLite table t_aux (or something of the sort) which replicates
the data you want to index from your flat file, and puts an index on
it.  Then your virtual table xBestIndex implementation can use that
table to find things more quickly, if the inputs are appropriate.

It will be easiest if you just replicate the data to be indexed in a
regular SQLite table that the virtual table owns, and use regular
SQLite calls to set the index up.  The only way you can currently have
an index without the data in SQLite is to implement it yourself, which
could be painful.  It sounds like the size of the data you need in
your index is much much smaller than the size of the entire dataset,
so it may not be worth the effort to do more than use regular SQLite
tables and indices in your virtual-table implementation.  [Put another
way, implement it and see how it works!]

---

In terms of proving out whether the system can work, it may be easier
to take the virtual table implementation you currently have, and build
the INDEX1_SORT type of table you propose, then do joins.  This is a
bit clunky, but if this is not fast enough, or is using too much space
or something, then it's likely that an implementation pushed into the
virtual table will _also_ not be fast enough or will use too much
space, or whatever the problem is.  [Virtual tables aren't magic!
They just let you rearrange things.]

---

So far as optimal... virtual tables (and SQLite in general) are just
tools for solving problems.  I think you need to be determining if
they can be used to create a good-enough solution to your problem,
rather than an optimal solution.

-scott


On Sat, May 17, 2008 at 7:41 AM, Aladdin Lampé <[EMAIL PROTECTED]> wrote:
>
> Hi! Here is what I'm still trying to achieve:
> - I have a custom file format having "records" and file offsets.
> - Each record in that custom file format has the same number of fields, but 
> the records itself are variable length, that's why I need a file offset to 
> quickly locate a record. One other way (if you can wait for a very long 
> time...) is to walk sequentially the records list to get the desired record.
> - I've implemented a working SQLite "virtual table" in order to be able to 
> read and query my custom file format through SQLite.
> - Now, basically what I'd like would be to "CREATE INDEX" on a field of my 
> virtual table to take advantage of it in the xBestIndex callback. But the 
> documentation says that we cannot use "CREATE INDEX" on virtual tables.
>
> Let's say the data in the field "F1" of my virtual table "VFILE", and the 
> file offsets are the following:
> F1              fileoffset
> ----------------------
> a               10
> b               21
> z               34
> x               45
> a               51
> x               69
> z               73
> a               88
> x               94
>
> I want to index the column F1, to be able to have a quick response to queries 
> like:
> select * from VFILE where F1='x'
>
> At this point, I think I have only 3 possible strategies:
> 1. Use SQLite tables to "fake" a standard index using SQLite tables
> 2. Use internal SQLite B-Tree routines to implement my index 
> (sqlite3BtreeCreateTable and stuff)
> 3. Implement my own B-Tree and sort algorithms to achieve this, externally to 
> SQLite
>
> Strategy 3 is precisely what I'm trying to avoid (too much work and testing 
> :-) ).
> Strategy 2 is strongly discouraged by DRH.
>
> Then strategy 1 seems to be (like you've just said) the only way to go:
>
> a) Duplicate the data to be indexed (and the file offsets to use)
>
> create table INDEX1_SORT as select F1, fileoffset from VFILE order by F1, 
> fileoffset
>
> INDEX1_SORT: physical table
> F1              fileoffset
> ----------------------
> a               10
> a               51
> a               88
> b               21
> x               45
> x               69
> x               94
> z               34
> z               73
>
> b) Create an index on that data
>
> OPTION 1: Use SQLite CREATE INDEX at this point.
> b.1.1) create index on INDEX1_SORT(F1)
>
> OPTION 2: Fake index with custom tables
> b.2.1) create table INDEX2_SUM as select F1, min(INDEX1_SORT.rowid) as 
> 'minrow', max(INDEX1_SORT.rowid) as 'maxrow' from INDEX1_SORT group by F1
>
> INDEX2_SUM: physical table
> F1         minrow          maxrow
> ---------------------------------------
> a           1                  3
> b           4                  4
> x           5                  7
> z           8                  9
>
> b.2.2) create index on INDEX_2_SUM(F1)
>
>
> *********************************************************************
>
> Usage for option 2:
> - Use INDEX2_SUM to fetch the requested value in the initial query (select * 
> from VFILE where F1='x')
> - Get data from table INDEX1_SORT between rowid "minrow" (5) and "maxrow" (7)
> - For each line, use the given file offset to locate the real data in the 
> custom file format file.
> - Read 3 records at fileoffet = 45,69,94 and return them to SQLite.
>
> I really feel like all this is not very optimal.
> What is the best strategy to achieve optimal speed and needed storage?
> Am I missing a trivial point?
>
> Thank you for any help on that!
> Aladdin
>
>> Date: Mon, 12 May 2008 15:37:22 -0700
>> From: [EMAIL PROTECTED]
>> To: [email protected]
>> Subject: Re: [sqlite] Indexing virtual tables
>>
>> I'm not quite clear on your question - why wouldn't you just create
>> any indices you need within the virtual-table implementation itself?
>> Sort of like how fts uses SQLite tables to implement data-storage for
>> the full-text index.
>>
>> -scott
>>
>>
>> On Mon, May 5, 2008 at 10:13 AM, Aladdin Lampé  wrote:
>>>
>>> Just thinking again about indexing strategies on virtual tables, I'm 
>>> wondering why virtual tables could not be indexed using the "normal" SQLite 
>>> command "INDEX". Indeed, I just expected that the data inside the column of 
>>> the virtual table could be sequentially scanned (using the "xColumn" 
>>> callback), producing the same result as if it were a real table. Is that 
>>> way of seeing things flawed?
>>>
>>> Any hook allowing to use SQLite internal indexing techniques for virtual 
>>> tables? Maybe using direct b-tree manipulation (even if I know it's not 
>>> recommended)? I'm not very keen on developing my own from stratch. Dealing 
>>> with "big" tables that don't fit into memory does not seem so easy because 
>>> I'll have to use a temporary disk file...
>>>
>>> Some help would be greatly appreciated!
>>> Aladdin
>>>
>>> _________________________________________________________________
>>> Faites vous de nouveaux amis grâce à l'annuaire des profils Messenger !
>>> http://home.services.spaces.live.com/search/?page=searchresults&ss=true&FormId=AdvPeopleSearch&form=SPXFRM&tp=3&sc=2&pg=0&Search.DisplayName=Nom+public&search.gender=&search.age=&Search.FirstName=Pr%C3%A9nom&Search.LastName=Nom&search.location=Lieu&search.occupation=Profession&search.interests=amis&submit=Rechercher
>>> _______________________________________________
>>> sqlite-users mailing list
>>> [email protected]
>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>>
>> _______________________________________________
>> sqlite-users mailing list
>> [email protected]
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
> _________________________________________________________________
> Faites vous de nouveaux amis grâce à l'annuaire des profils Messenger !
> http://home.services.spaces.live.com/search/?page=searchresults&ss=true&FormId=AdvPeopleSearch&form=SPXFRM&tp=3&sc=2&pg=0&Search.DisplayName=Nom+public&search.gender=&search.age=&Search.FirstName=Pr%C3%A9nom&Search.LastName=Nom&search.location=Lieu&search.occupation=Profession&search.interests=amis&submit=Rechercher
> _______________________________________________
> sqlite-users mailing list
> [email protected]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Indexing virtual tables

Reply via email to