Hello,

today I've stumbled over a problem that I can't seem to find an answer for.

I have the following tables:

CREATE TABLE object (
        object_id INTEGER PRIMARY KEY
);

CREATE TABLE properties (
        object_id INTEGER,
        property_id INTEGER,
        value_table INTEGER
);

CREATE TABLE string_values (
        property_id INTEGER PRIMARY KEY,
        value VARCHAR(255)
);

CREATE TABLE int_values (
        property_id INTEGER PRIMARY KEY,
        value INTEGER
);

In English:
I have a list of objects. Each object can have 0..n properties. A property is either a 
string or an integer (and later maybe ever more different data types). In the 
'properties' table, the 'value_table' field indicates in which table the value of this 
property is stored: if it's 1, the value is stored in 'string_values'; if it's 2, the 
value is stored in 'int_values'.

Please note that I have left out the indexes to keep it simple; just assume that I 
have an index on each field that I use in WHERE and JOIN-ON statements.

The follwing SQL statements populate the database with an object, an int-property and 
a string-property:

-- Insert one object
INSERT INTO object VALUES (1);

-- Insert one string property
INSERT INTO properties VALUES (1, 100, 1);
INSERT INTO string_values VALUES (100, "Hallo");

-- Insert one int property
INSERT INTO properties VALUES (1, 100, 2);
INSERT INTO int_values VALUES (100, 123);

As you can see, the same values can be used in 'string_values.value' and 
'int_values.value'.
This means, that only the combination of 'value_table'/'property_id' uniquely 
identifies a property value.


When loading the objects into memory, I query for all properties of all objects. For this I use the following select statement:

SELECT
        object.object_id,
        properties.value_table,
        string_values.value,
        int_values.value
        
FROM object
LEFT OUTER JOIN properties ON object.object_id = properties.object_id
LEFT OUTER JOIN string_values ON
        (properties.value_table = 1) AND
        (properties.property_id = string_values.property_id)
        
LEFT OUTER JOIN int_values ON
        (properties.value_table = 2) AND
        (properties.property_id = int_values.property_id)
;       


The format of the result should look like this: +-----------+-------------+---------------------+------------------+ | object_id | value_table | string_values.value | int_values.value | +-----------+-------------+---------------------+------------------+

Using the SQL statement from above, the result I would like to get is this:
+-----------+-------------+---------------------+------------------+
| object_id | value_table | string_values.value | int_values.value |
+-----------+-------------+---------------------+------------------+
|         1 |           1 |               Hallo |             NULL |
|         1 |           2 |               NULL  |              123 |
+-----------+-------------+---------------------+------------------+

This works well with MySQL (which is where I first developed the select statement).
When using it with SQLite, however, I get the following result:
+-----------+-------------+---------------------+------------------+
| object_id | value_table | string_values.value | int_values.value |
+-----------+-------------+---------------------+------------------+
|         1 |        NULL |               NULL  |             NULL |
+-----------+-------------+---------------------+------------------+

I'm by no means a SQL expert (I've basically just started to play around with it), but my theory is this:
When MySQL processes the query, it performs the join from left to right.
First, it finds all objects.
For each object, it finds all properties.
For each property,
- it finds all 'string values'
- it finds all 'int values'
This would match the result of the query.


But it looks like SQLite performs the query in another order, which results in the 
NULL results.

I could find a select statement that produces the desired result in SQLite:

SELECT
        object.object_id,
        properties.value_table,
        string_values.value,
        int_values.value
        
FROM object
LEFT OUTER JOIN properties ON object.object_id = properties.object_id
LEFT OUTER JOIN string_values ON
        (
                (properties.value_table = 1) AND
                (properties.property_id = string_values.property_id)
        ) OR
        (properties.value_table IS NULL)
        
LEFT OUTER JOIN int_values ON
        (
                (properties.value_table = 2) AND
                (properties.property_id = int_values.property_id)
        ) OR
        (properties.value_table IS NULL)
        
;

However, this query is *very* slow.
For a single object with a few properties, this doesn't matter. When using a database 
with
100.000 entries, each having a few properties, the second select is several orders of 
magnitude slower than the first one, since it can't make full use of indexes and has 
to iterate over all values of all tables (according to MySQL's 'EXPLAIN' statement).

Now I'm searching for help. It would be great if someone could answer my questions:
- Why does the first select work with MySQL, but not with SQLite. Is it a bug in 
SQLite, or is MySQL by pure luck behaving as I want it to behave?
- Is there a way to perform such a query in SQLite, with both the desired result and 
high speed. Something like 'properties.value_table IS NULL' part of the second select 
can't be used, as it forces the database to iterate over all entries.

I know that I could use two select statements to first query for all string 
properties, then for all int properties. However, two queries are slower than one 
query, and the speed really matters (I have tried this with MySQL and SQLite).


Thanks for your time and help.

Rolf Schäuble



Reply via email to