On 27.06.2006, at 18:06, Dennis Cote wrote:

Jens Miltner wrote:
I have a schema similar to this:

CREATE TABLE foo (id integer primary key, name text);
CREATE TABLE bar (id integer primary key, foo_id integer, something text);

CREATE INDEX bar_idx on bar(foo_id, something);


When I run a query like

SELECT
    foo.id AS foo_id,
    bar.id AS bar_id
FROM
    bar
LEFT JOIN
    foo ON foo.id=bar.foo_id
ORDER BY
    bar.something
;

SQLite will only use the primary key index of foo
(as the output of 'explain query plan' shows:

0|0|TABLE bar
1|1|TABLE foo USING PRIMARY KEY

)


If I try to disable the foo primary key index by prefixing foo.id with a '+' sign, no index will be used.

I'd have expected the bar_idx index to be used for the ORDER BY clause. Or is this an unreasonable assumption?
Is there a way to enforce this?

Thanks,
</jum>


Jens,

Your query is not using any index for the ORDER BY clause; it is sorting the intermediate results.

There is no need to join the foo table in your query since the foo_id is available in the bar table. The following is equivalent:

   SELECT
       foo_id,
       bar.id AS bar_id
   FROM
       bar
   ORDER BY
       bar.something
   ;
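To make the sorting behavior concrete, here is a small sketch using Python's sqlite3 module against the schema from the thread. The key point is that bar_idx (foo_id, something) cannot satisfy ORDER BY something on its own, because 'something' is not the index's leftmost column; an additional index whose leftmost column matches the ORDER BY removes the sort step. The index name bar_something_idx is my own invention for illustration, not something from the original mails:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE bar (id INTEGER PRIMARY KEY, foo_id INTEGER, something TEXT);
CREATE INDEX bar_idx ON bar(foo_id, something);
""")

# bar_idx (foo_id, something) cannot deliver rows ordered by 'something'
# alone, so SQLite falls back to an explicit sort (a temp b-tree).
plan_before = " ".join(
    row[3] for row in
    con.execute("EXPLAIN QUERY PLAN "
                "SELECT foo_id, id FROM bar ORDER BY something")
)
print(plan_before)

# Hypothetical extra index: its leftmost column matches the ORDER BY,
# and it also covers foo_id, so the query is answered from the index
# alone, in the requested order, with no sort step.
con.execute("CREATE INDEX bar_something_idx ON bar(something, foo_id)")
plan_after = " ".join(
    row[3] for row in
    con.execute("EXPLAIN QUERY PLAN "
                "SELECT foo_id, id FROM bar ORDER BY something")
)
print(plan_after)
```

The first plan includes a "USE TEMP B-TREE FOR ORDER BY" step; the second scans bar_something_idx instead.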


You're right, of course, but then again, this was just a (bad) example of our table relations... In reality, our tables are far larger, and we do indeed need to join them, because not all the information is available from the base table:

SELECT
    foo.name AS name,
    bar.something AS something
FROM
    bar
LEFT JOIN
    foo ON foo.id=bar.foo_id
ORDER BY
    bar.something
;



Is there any way to improve the ORDER BY performance for joined queries? From your answer that the intermediate results are sorted, I take it no index will ever be used for an ORDER BY in a join query?

Is there a way to rewrite the queries so we don't take the penalty of sorting without an index?
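One approach that may help here, sketched as an assumption rather than taken from the thread: because the LEFT JOIN forces bar to be the outer table anyway, an index on bar whose leftmost column matches the ORDER BY lets SQLite walk bar in output order and probe foo by its integer primary key, so no sort of the joined result is needed. The index name bar_order_idx and the sample rows are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE bar (id INTEGER PRIMARY KEY, foo_id INTEGER, something TEXT);
CREATE INDEX bar_order_idx ON bar(something, foo_id);  -- hypothetical
""")
con.executemany("INSERT INTO foo VALUES (?, ?)",
                [(1, "alpha"), (2, "beta")])
con.executemany("INSERT INTO bar VALUES (?, ?, ?)",
                [(1, 2, "zz"), (2, 1, "aa"), (3, 1, "mm")])

query = """
SELECT foo.name, bar.something
FROM bar LEFT JOIN foo ON foo.id = bar.foo_id
ORDER BY bar.something
"""
rows = con.execute(query).fetchall()
print(rows)

# The plan walks bar via bar_order_idx (already in ORDER BY order) and
# looks up each foo row by primary key; no temp b-tree sort appears.
plan = " ".join(r[3] for r in con.execute("EXPLAIN QUERY PLAN " + query))
print(plan)
```

Whether this wins in practice depends on the real schema and data distribution, but it avoids sorting several hundred thousand joined rows at the end.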

In our case, the 'bar' table may have several hundred thousand entries, and the 'foo' table probably has far fewer (say, on the order of a thousand).

With a minimal query (i.e. one where we return only a minimal amount of data, so that pure data shuffling doesn't get in the way), a 'bar' table with 250,000 entries, and a 'foo' table with around 10 entries (so the 250,000 entries in the 'bar' table relate to only a few records in the 'foo' table), a query like the above takes 10-20 minutes to run (depending on the ORDER BY clauses used)!

(Side-note: The table does have quite a few columns, and apparently the amount of data per row also massively affects performance... when I tried the same table with not all columns filled in - i.e. less data - performance was much better. I also tried increasing the page size, hoping that less paging would happen, but this didn't make a noticeable difference.)
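On the page-size point: a minimal sketch of the relevant pragmas, again via Python's sqlite3 module. The specific values (8 KiB pages, a 10,000-page cache) are arbitrary examples, not recommendations from the thread; note that page_size only takes effect if set before the database contains any tables, and that enlarging the page cache often helps big sorts and joins more than enlarging the pages themselves:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# page_size only takes effect when set before the first table is created
con.execute("PRAGMA page_size = 8192")
# cache_size is counted in pages: 10000 pages x 8 KiB is roughly 80 MB,
# far more working room than the small default cache
con.execute("PRAGMA cache_size = 10000")

page_size = con.execute("PRAGMA page_size").fetchone()[0]
cache_size = con.execute("PRAGMA cache_size").fetchone()[0]
print(page_size, cache_size)
```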

I would really appreciate any hints on how to improve performance with this kind of setup...

Thanks,

</jum>
