Folks;
 
I am building a system which uses the virtual tables facilityto join some
non-relational data (coming from system APIs) to relational data.  I am
using the virtual table mechanism in sqlite, and in particular am creating
'eponymous' virtual tables to create 'table valued functions'.
 
I have a problem that puzzles me regarding the appropriate implementation of
the xBestIndex method, in particular in dealing with constraints that are
required for the table.  In my case, it is required that there be a "where
PARAMCOL = 'xxx'" condition somewhere in the query, or the table valued
function makes no sense.  The problem arises where the sqlite engine asks
for advice during the query planning phase, and sometimes sqlite chooses to
take a plan where NO constraints are applied.  A concrete example may help
clarify:
 
Given:
  1)  vtable 'fsdirinfo' projecting columns PATH, FNAME
      'PATH' is a directory, and is required, because this virtual table
does a directory listing of files in a particular path, and you can't get a
directory listing without supplying the path.
  2)  vtable 'verinfo' projecting columns FQPATH, FILEVER, COMPANYNAME,
PRODUCT, COMMENTS
      'FQPATH' is required, because you can't open a file (to read it's
version information) without having a [fully qualified] path to that file.
 
Many times, for simple queries like "select * from fsdirinfo where path =
'c:\windows\system32';"
This works as expected:  xBestIndex is called, it is noticed that there is
an equality constraint on path (as required), and that column is marked
thusly:
 
  pIdxInfo->aConstraintUsage[nIdx].argvIndex = nArgvIndex;
  pIdxInfo->aConstraintUsage[nIdx].omit = 1; //meaning 'sqlite does not need
to help filter more'

(please verify my interpretation of these fields).
 
I also went as far as adding some validation logic where if the required
constraints are not met, I emit an error:
 
  //see if we have met our requirements; it is required that we have an
equality
  //operation on column 1 (PATH), otherwise we can cope
  if ( SQLITE_INDEX_CONSTRAINT_EQ != afsde[1]._nOp )
  {
   /*XXX apparently, this can happen regardless of user SQL
   //must have required columns in constraint list. must must must.
   sqlite3_free( pThis->zErrMsg );
   pThis->zErrMsg = sqlite3_mprintf( VTBLNAME": There must be a constraint
on PATH" );
   return SQLITE_CONSTRAINT; //XXX something better?
  */
  }

but as you can see I commented that out, because sometimes sqlite asks 'what
if I just gave you no constraints, what would you think about that', and
then the error I emit causes the whole query to fail, even though I have
already indicated a query plan that would work.  (I.e., sqlite asks once
"how about constraining on 'path'" to which I reply "yes please", and then
it asked "well, what about constraining on nothing", to which I reply
"error".  Sqlite does not back off and use the other plan that was
'approved'.)
 
As a work-around I masked the invalid unconstrained condition in my xFilter
by simply returning no rows.  However, that approach has problems, too.
Consider this query, which consists of a join between the two vtables I
mentioned:
 
  select * from fsdirinfo fs, verinfo ( fs.PATH || '\' || fs.FNAME ) vi 
  where 
   fs.path = 'C:\Windows\System32' 
   and ( fs.fname glob '*.exe' or fs.fname glob '*.dll' );
 
As it is presently, this returns.... No rows!  Huh!  Well, I stepped through
the code of course before writing in this question, and here's what happens:
1)  fsdirinfo is requested to query on path via equality and fname via glob,
and I can use both in this case
2)  verinfo is requested to query on fqpath via equality, and I can take
that
3)  verinfo gets requested a second time to query on nothing at all.  I have
no choice to say 'uhm, OK' and setup to return no rows, since my
communicating that is a error will abort the query
 
And sqlite then selects the item 3 approach.  Upon sqlite3_step(), fsdirinfo
is run through all the rows as expected, and verinfo is run through, but of
course gets no FQPATH value, and therefor cannot return any rows.  Net
result:  the join results in no rows.  (Incidentally, I have tried this
query functionally in three ways:  1) hidden columns for params (depicted
above), 2) constraints on projected columns (i.e. ye olde
cartesian-product-with-where-clause), 3) join syntax.  All three conventions
work fine, and in the same way.  Alas, that way yields no results, haha.)
 
OK, so with some context, now I can ask some questions (sorry for so much
exposition above).  Questions:
1)  is there an orthodox method of indicating that a query plan request from
xBestIndex is a no-go, for wahtever reason (in my case there must be
constraints on some columns), but does not abort the entire query, and
simply causes one of the other accepted plans to be used.  (I think this is
the single best option if it does exist).
2)  am I using the 'pIdxInfo->aConstraintUsage[nIdx].omit' variable
appropriately?  My interpretation is that means 'the vtable can fully handle
the constraint, sqlite does not need to do a double-check on it afterwards'.
I.e., in the example above, the vtable can totally handle it, but if there
was something like a regex on fname, the vtable can help narrow the results
down, but sqlite needs to do a final fer-real regex test to reject some that
select through anyway.  In that case omit would be 0.
 
Lastly, the third question regards my work-around.  I feel dirty doing this,
but if the query plan fails requirements, then instead of communicating an
error value, what I'm doing is setting the cost to infinity, e.g.
 
3)  is something like this the only hope:
 
 if ( 0 == pIdxInfo->idxNum )
 {
    pIdxInfo->estimatedCost = DBL_MAX;
 }
 else
 {
    pIdxInfo->estimatedCost = 10;
 }
 
Doing this did keep sqlite picking the 'approved' plan, but it just doesn't
feel like a deterministic solution to fiddle with query costs to avoid
catastrophe.
 
And I guess as a bonus 4th question:  What is the established orthodoxy in
picking estimatedCost anyway?  It seems from the source comments that it is
intended to mean 'approximate number of disk accesses', which I understand
qualitatively, but what is a 'disk access' quantitatively.  And how would I
compare that to, say, an API call that I am using as source data for my
vtable, which is purely in-memory, but could conceivably be quite expensive
(e.g. network stuff), so how would I tweak that?
 
OK!  Thanks so much if you read this far!  And thanks even more if anyone
can advise on how to deal with xBestIndex and required constraints on an
eponymous vtable acting as a table-valued function!
 
Cheers;
-dave


_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to