On Mar 26, 2019, at 8:35 AM, Wout Mertens <wout.mert...@gmail.com> wrote:
> 
> what would be the best way to answer these, given `CREATE TABLE foo(id
> TEXT, json JSON);` and json is always a json object:
> 
>   - all rows with a given key bar
>      - SELECT * FROM foo WHERE json_extract(json, '$.bar') IS NOT NULL;
>   - all rows where there are only any of the given keys a,b in the object
>      - SELECT * FROM foo WHERE json_remove(json,'$.a','$.b') ='{}';
>   - all rows where there are all the given keys a,b and no others in the
>   object
>      - SELECT * FROM foo WHERE json_remove(json,'$.a','$.b') ='{}' and
>      json_extract(json, '$.a') IS NOT NULL and json_extract(json,
> '$.b') IS NOT
>      NULL;

The biggest problem with these isn’t the use of JSON per se, it’s that each one 
is unindexed, so they’ll all be a full table scan.  That makes your use of 
SQLite here scarcely better than storing all the data in a single JSON object, 
stored in a simple text file, querying it with something like jq.

If we’re talking about hundreds or a few thousand records, the parsing cost 
probably doesn’t matter, but if we’re talking millions or billions of records, 
it’s a pretty expensive way to go.

You should be able to create indices on expressions like “json_extract(json, 
'$.bar')”, but you’ll want to test it to find out whether the space it takes is 
worth the speed increase.

You could also think about partially denormalizing the data: on record 
insertion, pre-parse the JSON and extract the key fields as SQLite fields.

Even better would be to simply choose not to store all of your data in JSON 
form: keep the key fields in normal SQLite fields.  Then you can do indexed 
lookups on the actual records you require and do JSON extraction only on those 
records for each query.

> these queries seem pretty onerous to me

SQLite’s JSON extension is pretty quick.  A recent test I did here did a full 
table scan with JSON parsing on each of ~10000 records in about 80ms.

On the one hand, that’s impressively fast, but on the other, it’s up in 
human-scale time, so I ended up caching the query results for a few minutes to 
avoid making this expensive query too often.

Keep in mind that there’s nothing you can’t encode in plain old SQLite, in 
principle.  JSON is a luxury, not an absolute requirement.  Transforming the 
data to be more relational than hierarchical could be a very worthwhile 
investment, if you have enough records or a low enough latency budget.
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to