On Thu, Aug 19, 2010 at 5:53 PM, Kees Nuyt <k.n...@zonnet.nl> wrote:
> On Thu, 19 Aug 2010 17:39:14 -0400, Eric Smith
> <eas....@gmail.com> wrote:
>
>>Am I missing something?
>
> You could add a "deleted" column with value range (0,1) and
> create an index on it if benchmarks show that makes it
> faster. As a bonus it is easier to code and maintain than a
> separate table with references and triggers.
>
> Alternatively, you can create an composite index with the
> "deleted" column as one of the components.
>
> From a theoretical view, if you care about the visibility of
> a row, you should express it as an attribute of the entity.
> The solutions above comply with that notion.
> --
>  (  Kees Nuyt

I think you've missed the point. I believe what he's getting at is this:

>> CREATE INDEX foo ON bar (to_be_deleted) <<

Imagine if he had 100 million rows in his table, and 100 of them were
marked "to_be_deleted".
His index will have 100 million rows, probably 500MB or 900MB (not
sure if rowid is 32- or 64-bit), consisting of 99,999,900 "0"s and 100
"1"s.

If he could create what MSSQL calls a "filtered index", using a syntax
like this:

>> CREATE INDEX foo_filtered ON bar (to_be_deleted) WHERE to_be_deleted = 1 <<

he could speed up the statement

>> DELETE FROM bar WHERE to_be_deleted = 1 <<

using that index, just like he could with the unfiltered "foo" index.
The only difference is that where foo has 100 million rows,
foo_filtered only contains 100 rows, taking up only 500-900 bytes
(thus actually having like 300% overhead due to page sizes!)


Now, in order to implement this, the following changes would have to be made:

1. Conditional logic would have to be generated inside the VDBE
programs for INSERT statements.  This is pretty straightforward.

2. Conditional logic would have to be generated inside the VDBE
programs for UPDATE statements. Care must be taken to make sure that
the index is updated properly when the column(s) referenced in the
WHERE clause are updated, but other than that, it's probably pretty
straightforward.

3. Depending on how the IdxDelete operator handles "key not found in
index" errors, the VDBE code generated for DELETE statements may also
need to be updated.

4. The statement parser needs to be modified to parse this syntax.

5. The schema parser needs to be modified to decode this syntax.

6. The optimizer needs to flatten and check that every possible branch
of the WHERE clause on a SELECT/DML statement is compatible with the
WHERE clause of the index, before it can use that index.

Now, I personally could do #1-3, because they're pretty easy.
I could probably even manage #4 and #5 if I spent a week familiarizing
myself with the code.
But #6, as far as I can tell, is a LOT harder.  Consider the following examples:

create index ix1 on Bar (baz) where quux between 30 and 95;

select * from baz where quux = 35; -- Index is viable
select * from baz where quux between 31 and 94; -- Index is viable
select * from baz where quux = 38 or quux between 80 and 90; -- Index is viable
select * from baz where quux in (40,50,60,70); -- again, index is viable
select * from baz where quux between 25 and 35; -- index is NOT viable
select * from baz where quux = 38 or baz = 5; -- index is NOT viable



-- 
-- Stevie-O
Real programmers use COPY CON PROGRAM.EXE
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to