On Thu, 28 Jan 2016 22:08:02 +0200
R Smith <rsmith at rsweb.co.za> wrote:
> I think you are misunderstanding the Pragma and the idea of automatic
> indices. An automatic Index might be created on a table that doesn't
> have an adequately assigned primary key. It might also be created during
> a query (mostly SELECT or sub-SELECT) for which there is no useful Index
> created by the table designer and the Query planner figures it will be
> quicker to make an Index than to do table scans through the query. This
> habit of creating indices during select queries can be forced to not
> happen by setting the "PRAGMA automatic_index=0;", but this needs to
> happen when you open the DB connection, or at a minimum, before you try
> any query - not after the cursor is created, by that time the index
> might already be made. (This is why you are not seeing any speed
> improvement).
>
> I only mention all the above so you understand what the automatic
> indexing is about, but it has almost certainly nothing to do with your
> query slowness, and even if you switch it off at a more opportune time,
> I would be surprised if it changes the query speed.
Indeed, I was misunderstanding. Are these indexes visible? Is there any
relation to the ones I see in sqlitebrowser?
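For my own reference, here is a minimal sketch (using Python's standard sqlite3 module and a hypothetical in-memory DB) of setting the pragma right after opening the connection, before any query runs, as you suggest:

```python
import sqlite3

# Hypothetical in-memory DB, just for illustration.
con = sqlite3.connect(":memory:")

# Disable automatic indices immediately after connecting,
# before any SELECT has a chance to build one.
con.execute("PRAGMA automatic_index=0;")

# Reading the pragma back confirms the current setting.
(value,) = con.execute("PRAGMA automatic_index;").fetchone()
print(value)  # 0
con.close()
```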
> 103k insertions to produce 15k rows... that is ~15% efficiency - the
> opposite of good design. Perhaps we can help you find better SQL to
> solve your problem. Let's see...
(red-face)
>
> I do not know the shape of your data (it matters), but I'm guessing
> "element" represents standard HTML tags with "attribute" and "value"
> giving basic expansion of the attributes list. it'd probably be safer to
> use non case-sensitive values and use standard equation tests in Selects.
>
The shape will change; it's far from final. The tables at that step depend on
the next steps in the overall procedure, which are not entirely fixed yet.
> Could you try these in your Query loop perhaps:
>
>
> -- Table: Renamed field count to cnt because "count" is an SQL reserved
> word (though it will work, just a better habit)
> CREATE TABLE element_attribute_values (
> element TEXT COLLATE NOCASE NOT NULL,
> attribute TEXT COLLATE NOCASE NOT NULL,
> value TEXT COLLATE NOCASE NOT NULL CHECK ((attribute <> '') OR (value
> = '')),
> cnt INTEGER NOT NULL DEFAULT 0,
> PRIMARY KEY (element, attribute, value)
> );
>
> -- Insert Loop start:
> -- This will simply fail if the PK already exists, else start the line
> with 0 count.
> INSERT OR IGNORE INTO element_attribute_values VALUES (:element,
> :attribute, :value, 0);
>
> -- This might be faster since it uses only one lookup loop, but it might
> also not be.
> -- If you share a list of example data to be inserted, we can find a
> faster way. Try it and let us know...
> WITH EAV(id,icnt) AS (
> SELECT rowid,cnt+1
> FROM element_attribute_values
> WHERE (element = :element) AND (attribute = :attribute) AND (value =
> :value)
> LIMIT 1
> )
> UPDATE element_attribute_values SET cnt = (SELECT icnt FROM EAV)
> WHERE rowid = (SELECT id FROM EAV)
> ;
> -- Insert Loop end.
>
The default may even be omitted, and maybe the `LIMIT 1` too, as each triplet
is unique.
I tried it, with that variant (no `LIMIT 1`), and it's about the same, even a
bit slower (not by much, one to two seconds longer). Speed is not my only
concern though; perhaps I should not care that much about the DB file size
(the other thing that matters).
It's close to what I had at the beginning, which gave similar timings (I just
changed the column name to match your recommendation about "cnt"):
INSERT OR IGNORE INTO element_attribute_values
VALUES (:element, :attribute, :value, 0);
UPDATE element_attribute_values
SET cnt = cnt + 1
WHERE (element = :element)
AND (attribute = :attribute)
AND (value = :value);
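Wrapped in a self-contained Python sqlite3 sketch (with hypothetical sample triplets; the real data comes from my parsing step), the loop behaves like this:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE element_attribute_values (
        element   TEXT COLLATE NOCASE NOT NULL,
        attribute TEXT COLLATE NOCASE NOT NULL,
        value     TEXT COLLATE NOCASE NOT NULL,
        cnt       INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (element, attribute, value)
    )
""")

# Hypothetical sample data, just for illustration.
triplets = [("a", "href", "x"), ("a", "href", "x"), ("img", "src", "y")]

for element, attribute, value in triplets:
    params = {"element": element, "attribute": attribute, "value": value}
    # First occurrence creates the row with cnt = 0; later ones are ignored.
    con.execute(
        "INSERT OR IGNORE INTO element_attribute_values "
        "VALUES (:element, :attribute, :value, 0)", params)
    # Every occurrence (including the first) bumps the counter.
    con.execute(
        "UPDATE element_attribute_values SET cnt = cnt + 1 "
        "WHERE (element = :element) AND (attribute = :attribute) "
        "AND (value = :value)", params)
con.commit()

rows = con.execute(
    "SELECT element, attribute, value, cnt "
    "FROM element_attribute_values ORDER BY element").fetchall()
print(rows)  # [('a', 'href', 'x', 2), ('img', 'src', 'y', 1)]
```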
> -- If you share a list of example data to be inserted, we can find a
> faster way. Try it and let us know...
I don't mind, I can upload an archive somewhere. I guess you mean the table's
content?
I'm aware this use case may be a bit pathological, as I could use a Python
dictionary. However, I decided not to, for three reasons:
* I wanted to see what it's like to use an SQLite DB as an application data
container (file or memory);
* using a persistent DB is better for an incremental process (it may stop and
resume later);
* persistent data is welcome for human review (I'm thinking of defining views
in sqlitebrowser to dissect the results).
For persistence, I first tried CSV files, but that turned out to be
inadequate. An SQL DB with a DB browser looks better than CSV for this use
case.
--
Yannick Duchêne