Felix Schwarz wrote:

Hi,

I have to decide on a table layout for storing the data of my upcoming project.

Yesterday I have made my way through the excellent presentation at http://www.sqlite.org/php2004/page-036.html and read the sentence

"Put small and frequently used columns early in the table to minimize the need to follow the overflow chain."

Now, that's interesting! And I'm wondering whether there is a big performance hit for a simple

    SELECT binarydata FROM entries WHERE somehash = 27817298;

when I use

    CREATE TABLE entries(
           entry_id INTEGER PRIMARY KEY,
           somehash  INTEGER,
           property1 INTEGER,
           property2  VARCHAR(255),
           property3  VARCHAR(255),
           binarydata BLOB
    );

instead of splitting the binary data (around 40K each) into two tables like this:

    CREATE TABLE entries(
           entry_id INTEGER PRIMARY KEY,
           somehash  INTEGER,
           property1 INTEGER,
           property2  VARCHAR(255),
           property3  VARCHAR(255),
           binary_id INTEGER
    );

    CREATE TABLE binaries(
           binary_id INTEGER PRIMARY KEY,
           binarydata BLOB
    );

and then use a select of this form:

SELECT binarydata FROM binaries WHERE binary_id = (SELECT binary_id FROM entries WHERE somehash = 27817298);

Also, could the usage of VIEWs speed up the SELECTing of data in the second example? Or does it just use SELECTs under the hood itself, i.e. without any caching of data?

Thanks in advance for any feedback.

Felix

Felix,

I don't think you will see much difference between the two layouts. You have your search field, somehash, located early in your records so SQLite does not need to follow the overflow chain to read its value.

The split tables should actually take a little longer because after SQLite has found the correct hash value, it must locate the binary in the binaries table using the binary_id from the entries table. This operation will be relatively fast, but is not gaining you anything.

The hint from the web page is pointing out the added time it takes to scan the overflow chain to find the value of a field located after a large field in a record. Your 40K binary fields will be split into multiple overflow blocks in either table.

To speed this query you would be much better off adding an index on the somehash field. Then SQLite could use the index to quickly locate the correct record in your entries table. It will only scan the overflow chain for queries that return the value of the binarydata field.

HTH
Dennis Cote

Reply via email to