On 24/04/2009 2:16 AM, Marco Bambini wrote:
> I have a query that takes ages ... my app remains in the first  
> sqlite_step for more than 30 minutes (with 100% CPU time).
> Query is:
> 
> SELECT reseaux.name, reseaux.id_reseau, reseaux.insee_id,  
> panneaux.insee_id,  
> panneaux 
> .reseau_id 
>   ,insee 
> .commune 
> ,afficheurs 
> .societe,reseaux.name,reseaux.nb,reseaux.tarif,agglomerations.commune  
> AS aggloName,agglomerations.codeInsee AS  
> aggloInsee 
> ,reseaux 
> .deroulant 
> ,reseaux 
> .eclairage 
> ,reseaux 
> .emprise 
> ,reseaux.partDeVoie,insee.population,SUM(reseaux_insee.population) AS  
> popTouch,SUM(formats.trivision) AS trivision, insee.agglo_id,  
> agglomerations.id_agglomeration FROM  
> reseaux,reseaux_insee,panneaux,formats,typeFormats,afficheurs,insee,  
> agglomerations WHERE insee.agglo_id = agglomerations.id_agglomeration  
> AND panneaux.reseau_id = reseaux.id_reseau AND (panneaux.insee_id =  
> 38656) AND formats.reseau_id = reseaux.id_reseau AND  
> ((formats.startDate >= '2008-02-09' AND formats.endDate <=  
> '2008-02-16') OR (formats.endDate >= '2008-02-09' AND  
> formats.startDate <= '2008-02-09') OR (formats.endDate >= '2008-02-16'  
> AND formats.startDate <= '2008-02-16') OR (formats.startDate <=  
> '2008-02-09' AND formats.endDate >= '2008-02-16')) AND  
> formats.typeFormat_id = typeFormats.id_typeFormat AND  
> afficheurs.id_afficheur = reseaux.afficheur_id AND insee.id_insee =  
> reseaux.insee_id GROUP BY reseaux.id_reseau ORDER BY reseaux.name
> 
> and tables are:

[snipped]

> 
> Can someone suggest me the best index/indexes to use for such a query?  

Do some experimentation with indexes on formats.startDate or 
formats.endDate. Note that if you have multiple indexes on one table, 
SQLite3 will choose one and only one of those indexes to use in the query.

> or the best way to rewrite it in a way more manageable by sqlite?

Never mind sqlite, the first step would be to reformat the SQL so that 
it is understandable by humans e.g.

SELECT reseaux.name, reseaux.id_reseau, reseaux.insee_id,
[snip]
     agglomerations.id_agglomeration
FROM
     reseaux, reseaux_insee, panneaux, formats, typeFormats,
     afficheurs, insee, agglomerations
-- **NOTE** it helps enormously when checking for errors
-- (and preventing errors!)
-- if one uses JOIN clauses instead of
-- listing out the participating tables and hoping
-- they get mentioned in the WHERE clause!!
-- You have 8 tables listed above, but only 6 joining
-- conditions. The reseaux_insee table is an orphan.
-- The result is the (slow) generation
-- of a Cartesian product, *AND* (I would expect)
-- incorrect results. Have you checked the answers,
-- especially "SUM(reseaux_insee.population) AS popTouch"?
WHERE insee.agglo_id = agglomerations.id_agglomeration -- *** INNER JOIN
AND panneaux.reseau_id = reseaux.id_reseau  -- *** INNER JOIN
AND formats.reseau_id = reseaux.id_reseau -- *** INNER JOIN
AND formats.typeFormat_id = typeFormats.id_typeFormat -- *** INNER JOIN
AND afficheurs.id_afficheur = reseaux.afficheur_id -- *** INNER JOIN
AND insee.id_insee = reseaux.insee_id -- *** INNER JOIN
AND (panneaux.insee_id =  38656)
-- the following looks "interesting"
AND (
     (formats.startDate >= '2008-02-09' AND formats.endDate <= '2008-02-16')
     OR (formats.endDate >= '2008-02-09' AND  formats.startDate <= 
'2008-02-09')
     OR (formats.endDate >= '2008-02-16' AND formats.startDate <= 
'2008-02-16')
     OR (formats.startDate <= '2008-02-09' AND formats.endDate >= 
'2008-02-16')
     )
-- very very "interesting"
GROUP BY reseaux.id_reseau
ORDER BY reseaux.name

Then let's focus on the big bunch of "OR"s -- they are worth a good look 
at; I don't recall an example of a query running slower because "a OR b 
OR c ..." was replaced by something logically equivalent.

In this case it appears that you are trying to express this:

Definition: a format is "active" on date D if and only if
    D BETWEEN formats.startDate and formats.endDate
is true

Query: include only if the format is "active" on any date in the period 
2008-02-09 to 2008-02-16 (both inclusive).

This can be expressed rather simply by replacing the big
    AND (four alternative conditions)
by
    AND formats.startDate <= '2008-02-16'
    AND formats.endDate >= '2008-02-09'

This may not be equivalent if formats.startDate > formats.endDate but I 
assume that you've got that kind of problem under control :-)

HTH,
John

_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to