Re: [sqlite] Performance Problems with joining and subqueries

Da Martian Thu, 23 Oct 2008 07:56:34 -0700

I have two tables as follows:

CREATE TABLE ROT_DIM_CONTRACT_LINE (ContractNo Integer,ContractLine
Integer,StartDate Integer,EndDate Integer,NCRAmt Float,Item
Integer,BusinessArea Text,ProductGroup Text,ProductStyle Text,Site
Text,Customer Integer,PrincipalContractNo Text,SASContract
Text,ContractMonths Float,StartMonths Float,FullMonths Float,EndMonths
Float,MonthlyAmortAmt Float,FirstAmortAmt Float,LastAmort
Float,BalancingAmortAmt Float,RolloutToContractDiff Float, PRIMARY KEY
(ContractNo ASC,ContractLine ASC))
-CREATE UNIQUE INDEX IDX_ROT_DIM_CONTRACT_LINE_1 ON ROT_DIM_CONTRACT_LINE
(ContractNo ASC,ContractLine ASC)


CREATE TABLE ROT_FACT_REV_ROLLOUT (Period_ID Integer,ContractNo
Integer,ContractLine Integer,Revenue_Amount Float, PRIMARY KEY (Period_ID
ASC,ContractNo ASC,ContractLine ASC))
CREATE UNIQUE INDEX IDX_ROT_FACT_REV_ROLLOUT_1 ON ROT_FACT_REV_ROLLOUT
(Period_ID ASC,ContractNo ASC,ContractLine ASC)


ROT_DIM_CONTRACT_LINE has 131,747 records
ROT_FACT_REV_ROLLOUT has 3,971,369 records

The process I am doing is two fold:
1) Joining the data for complete list
2) Splitting the data by date (Period_ID) into two catagories based on a
threshold date.
    so if my threshold is Sep 2008 (200809) I want all records after Sep
2008 to be displayed AND
    I want a total column for all records prior to Sep 2008.

No case statements in SQLite so two subqueries does the job:

> SEP 2008
select * from
(SELECT cl.Customer as Customer, cl.PrincipalContractNo as
PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea,
cl.ProductGroup, cl.Site, rr.period_id as period, sum(cl.NCRAmt) as NCRAmt,
sum(rr.revenue_amount) as RevenueAmt
FROM
  ROT_DIM_CONTRACT_LINE cl join
  ROT_FACT_REV_ROLLOUT rr on
    (cl.ContractNo = rr.ContractNo and
     cl.ContractLine = rr.ContractLine)
where
    rr.period_id > 200809
group by
   cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
cl.ProductGroup, cl.Site, rr.period_id

<= SEP 2008
SELECT cl.Customer as Customer, cl.PrincipalContractNo as
PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea,
cl.ProductGroup, cl.Site, sum(cl.NCRAmt) as NCRAmt
FROM
    ROT_DIM_CONTRACT_LINE cl join
    ROT_FACT_REV_ROLLOUT rr on
       (cl.ContractNo = rr.ContractNo and
        cl.ContractLine = rr.ContractLine)
where
   rr.period_id <= 200809
group by
   cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
cl.ProductGroup, cl.Site) ncritd on
   (fut.Customer = ncritd.Customer and
    fut.PrincipalContractNo = ncritd.PrincipalContractNo and
    fut.SASContractNo = ncritd.SASContractNo and
    fut.BusinessArea = ncritd.BusinessArea and
    fut.ProductGroup = ncritd.ProductGroup and
    fut.Site = ncritd.Site)
order by
   fut.Customer, fut.PrincipalContractNo, fut.SASContractNo,
fut.BusinessArea, fut.ProductGroup, fut.Site, fut.Period

Ignoring joining for the moment. Each of these queries on there own takes
around 1 minute. If however I create an index to satisfy the group by, say :

create index IDX_ROT_DIM_CONTRACT_LINE_2 on ROT_DIM_CONTRACT_LINE (Customer
ASC, PrincipalContractNo ASC, SASContract ASC, BusinessArea ASC,
ProductGroup ASC, Site ASC)
create index IDX_ROT_DIM_CONTRACT_LINE_3 on ROT_DIM_CONTRACT_LINE
(ContractNo ASC, ContractLine ASC, Customer ASC, PrincipalContractNo ASC,
SASContract ASC, BusinessArea ASC, ProductGroup ASC, Site ASC)
analyze

The query will run for 13 muntes + (killed after 13 minutes last time).
Explain Query plan confirms the use of IDX_ROT_DIM_CONTRACT_LINE_2

Now back to joining, without any extra indices and an empty sqlite_Stat1
table the following query performs terribly (in my opinion). Its simply a
join between the above two subqueries:

select * from
(SELECT cl.Customer as Customer, cl.PrincipalContractNo as
PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea,
cl.ProductGroup, cl.Site, rr.period_id as period, sum(cl.NCRAmt) as NCRAmt,
sum(rr.revenue_amount) as RevenueAmt
FROM
  ROT_DIM_CONTRACT_LINE cl join
  ROT_FACT_REV_ROLLOUT rr on
    (cl.ContractNo = rr.ContractNo and
     cl.ContractLine = rr.ContractLine)
where
    rr.period_id > 200809
group by
   cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
cl.ProductGroup, cl.Site, rr.period_id) fut join
(SELECT cl.Customer as Customer, cl.PrincipalContractNo as
PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea,
cl.ProductGroup, cl.Site, sum(cl.NCRAmt) as NCRAmt
FROM
    ROT_DIM_CONTRACT_LINE cl join
    ROT_FACT_REV_ROLLOUT rr on
       (cl.ContractNo = rr.ContractNo and
        cl.ContractLine = rr.ContractLine)
where
   rr.period_id <= 200809
group by
   cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
cl.ProductGroup, cl.Site) ncritd on
   (fut.Customer = ncritd.Customer and
    fut.PrincipalContractNo = ncritd.PrincipalContractNo and
    fut.SASContractNo = ncritd.SASContractNo and
    fut.BusinessArea = ncritd.BusinessArea and
    fut.ProductGroup = ncritd.ProductGroup and
    fut.Site = ncritd.Site)
order by
   fut.Customer, fut.PrincipalContractNo, fut.SASContractNo,
fut.BusinessArea, fut.ProductGroup, fut.Site, fut.Period


However if I change the first query to be a temp table:
drop table fut
create temp table fut as
SELECT cl.Customer as Customer, cl.PrincipalContractNo as
PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea as
BusinessArea, cl.ProductGroup as ProductGroup, cl.Site as Site, rr.period_id
as period, sum(cl.NCRAmt) as NCRAmt, sum(rr.revenue_amount) as RevenueAmt
FROM
  ROT_DIM_CONTRACT_LINE cl join
  ROT_FACT_REV_ROLLOUT rr on
    (cl.ContractNo = rr.ContractNo and
     cl.ContractLine = rr.ContractLine)
where
    rr.period_id > 200809
group by
   cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
cl.ProductGroup, Site, rr.period_id;

I know get around 2 minutes for the temp table combined with the second
subquery:

select fut.Customer as Customer, fut.PrincipalContractNo as
PrincipalContractNo, fut.SASContractNo as SASContractNo, fut.BusinessArea as
BusinessArea, fut.ProductGroup as ProductGroup, fut.Site as Site,
ncritd.NCRAmt as MCR_ITD, fut.Period as Period, fut.RevenueAmt as
RevenueAmt
from
 fut fut join
(SELECT cl.Customer as Customer, cl.PrincipalContractNo as
PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea,
cl.ProductGroup, cl.Site, sum(cl.NCRAmt) as NCRAmt
FROM
    ROT_DIM_CONTRACT_LINE cl join
    ROT_FACT_REV_ROLLOUT rr on
       (cl.ContractNo = rr.ContractNo and
        cl.ContractLine = rr.ContractLine)
where
   rr.period_id <= 200809
group by
   cl.Customer, PrincipalContractNo, SASContract, BusinessArea,
ProductGroup, Site) ncritd on
   (fut.Customer = ncritd.Customer and
    fut.PrincipalContractNo = ncritd.PrincipalContractNo and
    fut.SASContractNo = ncritd.SASContractNo and
    fut.BusinessArea = ncritd.BusinessArea and
    fut.ProductGroup = ncritd.ProductGroup and
    fut.Site = ncritd.Site)
order by
   fut.Customer, fut.PrincipalContractNo, fut.SASContractNo,
fut.BusinessArea, fut.ProductGroup, fut.Site, fut.Period ;


So subqueries appear far more inefficient than temp tables.

Thanks,





On Thu, Oct 23, 2008 at 1:05 PM, Igor Tandetnik <[EMAIL PROTECTED]> wrote:

> "Da Martian" <[EMAIL PROTECTED]> wrote
> in message
> news:[EMAIL PROTECTED]<[EMAIL PROTECTED]>
> > Does anyone have any ideas on how to optimise this type of process in
> > SQLite?
>
> What type of process? Show your tables and your query.
>
> Igor Tandetnik
>
>
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Performance Problems with joining and subqueries

Reply via email to