On 2016/01/28 11:44 PM, Yannick Duchêne wrote:
> On Thu, 28 Jan 2016 22:08:02 +0200
> R Smith <rsmith at rsweb.co.za> wrote:
>
> Indeed, I was misunderstanding. Are these indexes visible? Is there 
> any relation with the ones I see from sqlitebrowser? 

If you mean you are looking at the DB schema and seeing automatic 
indices there, then those are mostly ones SQLite created where the table 
doesn't contain a suitable index (or a usable rowid). If you mean the 
automatic indices the query planner decides to add while running a 
SELECT to make the lookup faster - those can't be seen in the schema, 
but you can inspect the output of "EXPLAIN QUERY PLAN" or just "EXPLAIN" 
(added to the front of your query). These show the query planner's scan 
plan and the low-level steps respectively (including any such automatic 
indices it might use).
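For illustration, here is a minimal sketch (Python's sqlite3 module, with made-up table and column names) of how "EXPLAIN QUERY PLAN" exposes such an automatic index:

```python
import sqlite3

# Hypothetical tables with no indices: joining them on unindexed columns
# typically makes the planner build a transient automatic index.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1 (a INTEGER, b INTEGER)")
con.execute("CREATE TABLE t2 (c INTEGER, d INTEGER)")

plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t1 JOIN t2 ON t1.a = t2.c"
).fetchall()
for row in plan:
    print(row)  # look for 'AUTOMATIC COVERING INDEX' in the detail column
```

Whether the planner actually builds the transient index depends on its cost estimates, but with no suitable index on the join column it usually does.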

>
> The shape will change, it's far from final. The tables at that step depend 
> on the next steps in the overall procedure, which is not entirely fixed 
> yet.

Good to know.

>> Could you try these in your Query loop perhaps:
>>
>>
>> -- Table: Renamed field count to cnt because "count" is an SQL reserved
>> word (though it will work, just a better habit)
>> CREATE TABLE element_attribute_values (
>>     element TEXT COLLATE NOCASE NOT NULL,
>>     attribute TEXT COLLATE NOCASE NOT NULL,
>>     value TEXT COLLATE NOCASE NOT NULL CHECK ((attribute <> '') OR (value
>> = '')),
>>     cnt INTEGER NOT NULL DEFAULT 0,
>>     PRIMARY KEY (element, attribute, value)
>> );
>>
>> -- Insert Loop start:
>> -- This will simply fail if the PK already exists, else start the line
>> with 0 count.
>> INSERT OR IGNORE INTO element_attribute_values VALUES (:element,
>> :attribute, :value, 0);
>>
>> -- This might be faster since it uses only one lookup loop, but it might
>> also not be.
>> -- If you share a list of example data to be inserted, we can find a
>> faster way. Try it and let us know...
>> WITH EAV(id,icnt) AS (
>>     SELECT rowid,cnt+1
>>       FROM element_attribute_values
>>      WHERE (element = :element) AND (attribute = :attribute) AND (value =
>> :value)
>>      LIMIT 1
>> )
>> UPDATE element_attribute_values SET cnt = (SELECT icnt FROM EAV)
>>    WHERE rowid = (SELECT id FROM EAV)
>> ;
>> -- Insert Loop end.
>>
> The default may even be omitted, and may be the `LIMIT 1` too, as each 
> triplet is unique.

Yeah, the LIMIT 1 is just to force a single-row result, which the 
sub-query requires. It is, however, not possible to get multiple rows 
here, so the limit is superfluous in this case, but I like keeping it in 
for legibility.
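As a sanity check, the two statements above can be driven from Python (sqlite3 module; the schema and statements are from the message, the sample triplets are made up):

```python
import sqlite3

# Sketch of the insert loop: seed each (element, attribute, value) row
# with cnt = 0, then bump its counter via the CTE-based UPDATE.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE element_attribute_values (
        element   TEXT COLLATE NOCASE NOT NULL,
        attribute TEXT COLLATE NOCASE NOT NULL,
        value     TEXT COLLATE NOCASE NOT NULL
                  CHECK ((attribute <> '') OR (value = '')),
        cnt       INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (element, attribute, value)
    )""")

triplets = [("p", "class", "intro"), ("p", "class", "intro"), ("a", "href", "x")]
for e, a, v in triplets:
    params = {"element": e, "attribute": a, "value": v}
    # Seed the row with cnt = 0 if it does not exist yet.
    con.execute("INSERT OR IGNORE INTO element_attribute_values "
                "VALUES (:element, :attribute, :value, 0)", params)
    # Bump the counter of the (now existing) row.
    con.execute("""
        WITH EAV(id, icnt) AS (
            SELECT rowid, cnt + 1
              FROM element_attribute_values
             WHERE element = :element AND attribute = :attribute
               AND value = :value
             LIMIT 1
        )
        UPDATE element_attribute_values SET cnt = (SELECT icnt FROM EAV)
         WHERE rowid = (SELECT id FROM EAV)""", params)

rows = con.execute("SELECT element, attribute, value, cnt "
                   "FROM element_attribute_values ORDER BY element").fetchall()
print(rows)  # [('a', 'href', 'x', 1), ('p', 'class', 'intro', 2)]
```

(Requires SQLite 3.8.3 or later for the WITH clause.)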

> I tried it, with the variant (no `LIMIT 1`), and it's about the same, even a 
> bit slower (not by much, one to two seconds longer). Speed is not the only 
> concern though, and perhaps I should not care that much about the DB file 
> size (the other matter).
>
> It's close to what I had at the beginning, which gave similar timings (just 
> changed to match your recommendation about the "cnt" name):

Yeah, I did not have a test-bed, but I suspected it might not improve 
much. Post a table somewhere with typical values (the 103k rows) needing 
insertion and we might find a really fast way.


>
> I'm aware this use case may be a bit pathological, as I could use a Python 
> dictionary. However, I decided not to, for three reasons:
>
>   * I wanted to see what it's like to use an SQLite DB as an application data 
> container (file or memory);
>   * using a persistent DB is better for an incremental process (it may stop and 
> resume later);
>   * persistent data is welcome for human review (I'm thinking of defining views 
> in sqlitebrowser to dissect the results).
>
> For persistence, I first tried CSV files, but that turned out to be inadequate. 
> An SQL DB plus a DB browser looks better than CSV for this use case.
>

I think the dictionary would be faster for this use-case (mostly because 
it runs in-memory and we do not require all the DB data guarantees for 
the initial sort). However, the fastest way in SQL would be to use a 
temporary table - do you have any problem with that? It would mean you 
do the inserts in an insert loop and then simply calculate the incidence 
values and insert them into the real table, so it would be a two-step 
process.
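For comparison, the dictionary approach could be sketched with collections.Counter (made-up sample triplets; lower-casing the keys roughly approximates COLLATE NOCASE):

```python
from collections import Counter

# Made-up sample triplets standing in for the parsed
# (element, attribute, value) data.
triplets = [("P", "class", "intro"), ("p", "class", "intro"), ("a", "href", "x")]

# Lower-case the key parts so "P" and "p" count as the same element,
# roughly mirroring SQLite's COLLATE NOCASE (which only folds ASCII).
counts = Counter((e.lower(), a.lower(), v.lower()) for e, a, v in triplets)
print(counts[("p", "class", "intro")])  # 2
```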

In fact, let me quickly write the SQL here. The table create is just 
like before, and then:

BEGIN TRANSACTION;

   -- Make Temporary table
CREATE TEMPORARY TABLE eav (
   e TEXT COLLATE NOCASE,
   a TEXT COLLATE NOCASE,
   v TEXT COLLATE NOCASE
);

   -- The insert loop:
INSERT INTO eav VALUES (:element, :attribute, :value);
   -- end of Insert loop


CREATE INDEX Idx_eav ON eav(e,a,v);

INSERT OR IGNORE INTO element_attribute_values (element,attribute,value,cnt)
   SELECT e,a,v,COUNT(*)
     FROM eav
    GROUP BY e,a,v;

   -- Cleanup
DROP TABLE eav;

COMMIT;


If that is not a LOT faster then I will be very surprised...
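Translated to a runnable Python sketch (sqlite3 module in autocommit mode so the explicit BEGIN/COMMIT pass through, with made-up sample triplets standing in for the real insert loop):

```python
import sqlite3

# isolation_level=None lets the explicit BEGIN/COMMIT below reach SQLite.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("""
    CREATE TABLE element_attribute_values (
        element   TEXT COLLATE NOCASE NOT NULL,
        attribute TEXT COLLATE NOCASE NOT NULL,
        value     TEXT COLLATE NOCASE NOT NULL,
        cnt       INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (element, attribute, value)
    )""")

con.execute("BEGIN TRANSACTION")
con.execute("CREATE TEMPORARY TABLE eav ("
            "e TEXT COLLATE NOCASE, a TEXT COLLATE NOCASE, v TEXT COLLATE NOCASE)")

# Stand-in for the real insert loop.
triplets = [("p", "class", "intro"), ("p", "class", "intro"), ("a", "href", "x")]
con.executemany("INSERT INTO eav VALUES (?, ?, ?)", triplets)

con.execute("CREATE INDEX Idx_eav ON eav(e, a, v)")
con.execute("""
    INSERT OR IGNORE INTO element_attribute_values (element, attribute, value, cnt)
        SELECT e, a, v, COUNT(*) FROM eav GROUP BY e, a, v""")
con.execute("DROP TABLE eav")
con.execute("COMMIT")

rows = con.execute("SELECT element, cnt FROM element_attribute_values "
                   "ORDER BY element").fetchall()
print(rows)  # [('a', 1), ('p', 2)]
```

The key difference from the per-row loop is that executemany only appends to the unindexed temp table, and the counting happens once, in a single GROUP BY pass.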
