On 2016/01/28 8:17 PM, Yannick Duch?ne wrote:
> Hi people,
>
> I though automatic index was controlled by the pragma `automatic_index`, 
> which seems to do nothing for me, unless I'm doing it the wrong way, while 
> `WITHOUT ROWID` on table creations, seems to effectively prevent automatic 
> indexes, the latter with or without the `PRAGMA automatic_index=0`.
>
> Is this expected or is this me doing something wrong?
>
> I'm concerned about it mainly for file size, more than for speed, while it 
> matters too, as the program is far more slow than expected. For a program I'm 
> running I got this:
>
>    * With ROWID: execution time is 22 seconds and DB file size is 1.9MB?;
>    * Without ROWID, execution time is 19 seconds and DB file size is 788KB 
> (2.4 times smaller)?;
>    * With insertion disabled, execution time is 6 seconds, so most of the 
> execution time is spent in rows insertion.
>
> The program indexes HTML files (*), the test runs on only 52 average sized 
> HTML files, so the difference will be even more important with bigger files 
> set.
>
> I'm feeling lost about this: the pragma seems to do nothing and it's `WITHOUT 
> ROWID` on table creations, which seems to do it instead. The pragma is 
> executed right after the cursor creation. I'm using APSW (**). Pragma seems 
> to works, as I tested with `PRAGMA synchronous=0` (just for a test, I really 
> don't want to use this).

I think you are misunderstanding the Pragma and the idea of automatic 
indices. An automatic Index might be created on a table that doesn't 
have an adequately assigned primary key. It might also be created during 
a query (mostly SELECT or sub-SELECT) for which there is no useful Index 
created by the table designer and the Query planner figures it will be 
quicker to make an Index than to do table scans through the query. This 
habit of creating indices during select queries can be forced to not 
happen by setting the "PRAGMA automatic_index=0;", but this needs to 
happen when you open the DB connection, or at a minimum, before you try 
any query - not after the cursor is created,  by that time the index 
might already be made. (This is why you are not seeing any speed 
improvement).

I only mention all the above so you understand what the automatic 
indexing is about, but it has almost certainly nothing to do with your 
query slowness, and even if you switch it off at a more opportune time, 
I would be surprised if it changes the query speed.

>
>
> I don't know if it's OK to ask for this kind of question here. If it's not, 
> just tell me, I won't bother.

It's always Ok to ask here. Welcome to SQLite-World. :)


>
>
> (*) There are bout 103 000 insertions queries ending into 15 600 rows, using 
> this:

103k insertions to produce 15k rows... that is ~15% efficiency - the 
opposite of good design. Perhaps we can help you find better SQL to 
solve your problem. Let's see...

>
>      CREATE TABLE element_attribute_values
>        (element TEXT NOT NULL,
>         attribute TEXT NOT NULL,
>         value TEXT NOT NULL
>           CHECK ((attribute IS NOT '') OR (value IS '')),
>         count INTEGER NOT NULL DEFAULT 1,
>         PRIMARY KEY (element, attribute, value))
>
>      INSERT OR REPLACE INTO element_attribute_values
>        (element, attribute, value, count)
>        VALUES (:element, :attribute, :value,
>          (SELECT count + 1 FROM element_attribute_values
>            WHERE (element IS :element)
>              AND (attribute IS :attribute)
>              AND (value IS :value)))
>
> (**) Standard Python SQLite3 binding seems perceptibly slow. With APSW, the 
> program is 17 times faster than with standard Python SQLite3 binding?? 
> execution time is around 370 seconds in the latter case, whatever how I 
> manage transactions.
>
>

APSW is far better and maintained by Roger Binns (whom you might see 
post here sometimes). Using that is your best bet, but it also cannot 
help heavy-set queries. The best strategy is to make better queries.

I do not know the shape of your data (it matters), but I'm guessing 
"element" represents standard HTML tags with "attribute" and "value" 
giving basic expansion of the attributes list. it'd probably be safer to 
use non case-sensitive values and use standard equation tests in Selects.

Could you try these in your Query loop perhaps:


-- Table: Renamed field count to cnt because "count" is an SQL reserved 
word (though it will work, just a better habit)
CREATE TABLE element_attribute_values (
   element TEXT COLLATE NOCASE NOT NULL,
   attribute TEXT COLLATE NOCASE NOT NULL,
   value TEXT COLLATE NOCASE NOT NULL CHECK ((attribute <> '') OR (value 
= '')),
   cnt INTEGER NOT NULL DEFAULT 0,
   PRIMARY KEY (element, attribute, value)
);

-- Insert Loop start:
-- This will simply fail if the PK already exists, else start the line 
with 0 count.
INSERT OR IGNORE INTO element_attribute_values VALUES (:element, 
:attribute, :value, 0);

-- This might be faster since it uses only one lookup loop, but it might 
also not be.
-- If you share a list of example data to be inserted, we can find a 
faster way. Try it and let us know...
WITH EAV(id,icnt) AS (
   SELECT rowid,cnt+1
     FROM element_attribute_values
    WHERE (element = :element) AND (attribute = :attribute) AND (value = 
:value)
    LIMIT 1
)
UPDATE element_attribute_values SET cnt = (SELECT icnt FROM EAV)
  WHERE rowid = (SELECT id FROM EAV)
;
-- Insert Loop end.



Reply via email to