It sounds like you're still trying to push your index outside the virtual table.
My suggestion was to have your virtual table t internally create a regular SQLite table t_aux (or something of the sort) which replicates the data you want to index from your flat file, and puts an index on it. Then your virtual table xBestIndex implementation can use that table to find things more quickly, if the inputs are appropriate. It will be easiest if you just replicate the data to be indexed in a regular SQLite table that the virtual table owns, and use regular SQLite calls to set the index up. The only way you can currently have an index without the data in SQLite is to implement it yourself, which could be painful. It sounds like the size of the data you need in your index is much much smaller than the size of the entire dataset, so it may not be worth the effort to do more than use regular SQLite tables and indices in your virtual-table implementation. [Put another way, implement it and see how it works!] --- In terms of proving out whether the system can work, it may be easier to take the virtual table implementation you currently have, and build the INDEX1_SORT type of table you propose, then do joins. This is a bit clunky, but if this is not fast enough, or is using too much space or something, then it's likely that an implementation pushed into the virtual table will _also_ not be fast enough or will use too much space, or whatever the problem is. [Virtual tables aren't magic! They just let you rearrange things.] --- So far as optimal... virtual tables (and SQLite in general) are just tools for solving problems. I think you need to be determining if they can be used to create a good-enough solution to your problem, rather than an optimal solution. -scott On Sat, May 17, 2008 at 7:41 AM, Aladdin Lampé <[EMAIL PROTECTED]> wrote: > > Hi! Here is what I'm still trying to achieve: > - I have a custom file format having "records" and file offsets. > - Each record in that custom file format has the same number of fields, but > the records itself are variable length, that's why I need a file offset to > quickly locate a record. One other way (if you can wait for a very long > time...) is to walk sequentially the records list to get the desired record. > - I've implemented a working SQLite "virtual table" in order to be able to > read and query my custom file format through SQLite. > - Now, basically what I'd like would be to "CREATE INDEX" on a field of my > virtual table to take advantage of it in the xBestIndex callback. But the > documentation says that we cannot use "CREATE INDEX" on virtual tables. > > Let's say the data in the field "F1" of my virtual table "VFILE", and the > file offsets are the following: > F1 fileoffset > ---------------------- > a 10 > b 21 > z 34 > x 45 > a 51 > x 69 > z 73 > a 88 > x 94 > > I want to index the column F1, to be able to have a quick response to queries > like: > select * from VFILE where F1='x' > > At this point, I think I have only 3 possible strategies: > 1. Use SQLite tables to "fake" a standard index using SQLite tables > 2. Use internal SQLite B-Tree routines to implement my index > (sqlite3BtreeCreateTable and stuff) > 3. Implement my own B-Tree and sort algorithms to achieve this, externally to > SQLite > > Strategy 3 is precisely what I'm trying to avoid (too much work and testing > :-) ). > Strategy 2 is strongly discouraged by DRH. > > Then strategy 1 seems to be (like you've just said) the only way to go: > > a) Duplicate the data to be indexed (and the file offsets to use) > > create table INDEX1_SORT as select F1, fileoffset from VFILE order by F1, > fileoffset > > INDEX1_SORT: physical table > F1 fileoffset > ---------------------- > a 10 > a 51 > a 88 > b 21 > x 45 > x 69 > x 94 > z 34 > z 73 > > b) Create an index on that data > > OPTION 1: Use SQLite CREATE INDEX at this point. > b.1.1) create index on INDEX1_SORT(F1) > > OPTION 2: Fake index with custom tables > b.2.1) create table INDEX2_SUM as select F1, min(INDEX1_SORT.rowid) as > 'minrow', max(INDEX1_SORT.rowid) as 'maxrow' from INDEX1_SORT group by F1 > > INDEX2_SUM: physical table > F1 minrow maxrow > --------------------------------------- > a 1 3 > b 4 4 > x 5 7 > z 8 9 > > b.2.2) create index on INDEX_2_SUM(F1) > > > ********************************************************************* > > Usage for option 2: > - Use INDEX2_SUM to fetch the requested value in the initial query (select * > from VFILE where F1='x') > - Get data from table INDEX1_SORT between rowid "minrow" (5) and "maxrow" (7) > - For each line, use the given file offset to locate the real data in the > custom file format file. > - Read 3 records at fileoffet = 45,69,94 and return them to SQLite. > > I really feel like all this is not very optimal. > What is the best strategy to achieve optimal speed and needed storage? > Am I missing a trivial point? > > Thank you for any help on that! > Aladdin > >> Date: Mon, 12 May 2008 15:37:22 -0700 >> From: [EMAIL PROTECTED] >> To: [email protected] >> Subject: Re: [sqlite] Indexing virtual tables >> >> I'm not quite clear on your question - why wouldn't you just create >> any indices you need within the virtual-table implementation itself? >> Sort of like how fts uses SQLite tables to implement data-storage for >> the full-text index. >> >> -scott >> >> >> On Mon, May 5, 2008 at 10:13 AM, Aladdin Lampé wrote: >>> >>> Just thinking again about indexing strategies on virtual tables, I'm >>> wondering why virtual tables could not be indexed using the "normal" SQLite >>> command "INDEX". Indeed, I just expected that the data inside the column of >>> the virtual table could be sequentially scanned (using the "xColumn" >>> callback), producing the same result as if it were a real table. Is that >>> way of seeing things flawed? >>> >>> Any hook allowing to use SQLite internal indexing techniques for virtual >>> tables? Maybe using direct b-tree manipulation (even if I know it's not >>> recommended)? I'm not very keen on developing my own from stratch. Dealing >>> with "big" tables that don't fit into memory does not seem so easy because >>> I'll have to use a temporary disk file... >>> >>> Some help would be greatly appreciated! >>> Aladdin >>> >>> _________________________________________________________________ >>> Faites vous de nouveaux amis grâce à l'annuaire des profils Messenger ! >>> http://home.services.spaces.live.com/search/?page=searchresults&ss=true&FormId=AdvPeopleSearch&form=SPXFRM&tp=3&sc=2&pg=0&Search.DisplayName=Nom+public&search.gender=&search.age=&Search.FirstName=Pr%C3%A9nom&Search.LastName=Nom&search.location=Lieu&search.occupation=Profession&search.interests=amis&submit=Rechercher >>> _______________________________________________ >>> sqlite-users mailing list >>> [email protected] >>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>> >> _______________________________________________ >> sqlite-users mailing list >> [email protected] >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > _________________________________________________________________ > Faites vous de nouveaux amis grâce à l'annuaire des profils Messenger ! > http://home.services.spaces.live.com/search/?page=searchresults&ss=true&FormId=AdvPeopleSearch&form=SPXFRM&tp=3&sc=2&pg=0&Search.DisplayName=Nom+public&search.gender=&search.age=&Search.FirstName=Pr%C3%A9nom&Search.LastName=Nom&search.location=Lieu&search.occupation=Profession&search.interests=amis&submit=Rechercher > _______________________________________________ > sqlite-users mailing list > [email protected] > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list [email protected] http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

