I have two tables as follows:
CREATE TABLE ROT_DIM_CONTRACT_LINE (ContractNo Integer,ContractLine
Integer,StartDate Integer,EndDate Integer,NCRAmt Float,Item
Integer,BusinessArea Text,ProductGroup Text,ProductStyle Text,Site
Text,Customer Integer,PrincipalContractNo Text,SASContract
Text,ContractMonths Float,StartMonths Float,FullMonths Float,EndMonths
Float,MonthlyAmortAmt Float,FirstAmortAmt Float,LastAmort
Float,BalancingAmortAmt Float,RolloutToContractDiff Float, PRIMARY KEY
(ContractNo ASC,ContractLine ASC))
-CREATE UNIQUE INDEX IDX_ROT_DIM_CONTRACT_LINE_1 ON ROT_DIM_CONTRACT_LINE
(ContractNo ASC,ContractLine ASC)
CREATE TABLE ROT_FACT_REV_ROLLOUT (Period_ID Integer,ContractNo
Integer,ContractLine Integer,Revenue_Amount Float, PRIMARY KEY (Period_ID
ASC,ContractNo ASC,ContractLine ASC))
CREATE UNIQUE INDEX IDX_ROT_FACT_REV_ROLLOUT_1 ON ROT_FACT_REV_ROLLOUT
(Period_ID ASC,ContractNo ASC,ContractLine ASC)
ROT_DIM_CONTRACT_LINE has 131,747 records
ROT_FACT_REV_ROLLOUT has 3,971,369 records
The process I am doing is two fold:
1) Joining the data for complete list
2) Splitting the data by date (Period_ID) into two catagories based on a
threshold date.
so if my threshold is Sep 2008 (200809) I want all records after Sep
2008 to be displayed AND
I want a total column for all records prior to Sep 2008.
No case statements in SQLite so two subqueries does the job:
> SEP 2008
select * from
(SELECT cl.Customer as Customer, cl.PrincipalContractNo as
PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea,
cl.ProductGroup, cl.Site, rr.period_id as period, sum(cl.NCRAmt) as NCRAmt,
sum(rr.revenue_amount) as RevenueAmt
FROM
ROT_DIM_CONTRACT_LINE cl join
ROT_FACT_REV_ROLLOUT rr on
(cl.ContractNo = rr.ContractNo and
cl.ContractLine = rr.ContractLine)
where
rr.period_id > 200809
group by
cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
cl.ProductGroup, cl.Site, rr.period_id
<= SEP 2008
SELECT cl.Customer as Customer, cl.PrincipalContractNo as
PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea,
cl.ProductGroup, cl.Site, sum(cl.NCRAmt) as NCRAmt
FROM
ROT_DIM_CONTRACT_LINE cl join
ROT_FACT_REV_ROLLOUT rr on
(cl.ContractNo = rr.ContractNo and
cl.ContractLine = rr.ContractLine)
where
rr.period_id <= 200809
group by
cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
cl.ProductGroup, cl.Site) ncritd on
(fut.Customer = ncritd.Customer and
fut.PrincipalContractNo = ncritd.PrincipalContractNo and
fut.SASContractNo = ncritd.SASContractNo and
fut.BusinessArea = ncritd.BusinessArea and
fut.ProductGroup = ncritd.ProductGroup and
fut.Site = ncritd.Site)
order by
fut.Customer, fut.PrincipalContractNo, fut.SASContractNo,
fut.BusinessArea, fut.ProductGroup, fut.Site, fut.Period
Ignoring joining for the moment. Each of these queries on there own takes
around 1 minute. If however I create an index to satisfy the group by, say :
create index IDX_ROT_DIM_CONTRACT_LINE_2 on ROT_DIM_CONTRACT_LINE (Customer
ASC, PrincipalContractNo ASC, SASContract ASC, BusinessArea ASC,
ProductGroup ASC, Site ASC)
create index IDX_ROT_DIM_CONTRACT_LINE_3 on ROT_DIM_CONTRACT_LINE
(ContractNo ASC, ContractLine ASC, Customer ASC, PrincipalContractNo ASC,
SASContract ASC, BusinessArea ASC, ProductGroup ASC, Site ASC)
analyze
The query will run for 13 muntes + (killed after 13 minutes last time).
Explain Query plan confirms the use of IDX_ROT_DIM_CONTRACT_LINE_2
Now back to joining, without any extra indices and an empty sqlite_Stat1
table the following query performs terribly (in my opinion). Its simply a
join between the above two subqueries:
select * from
(SELECT cl.Customer as Customer, cl.PrincipalContractNo as
PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea,
cl.ProductGroup, cl.Site, rr.period_id as period, sum(cl.NCRAmt) as NCRAmt,
sum(rr.revenue_amount) as RevenueAmt
FROM
ROT_DIM_CONTRACT_LINE cl join
ROT_FACT_REV_ROLLOUT rr on
(cl.ContractNo = rr.ContractNo and
cl.ContractLine = rr.ContractLine)
where
rr.period_id > 200809
group by
cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
cl.ProductGroup, cl.Site, rr.period_id) fut join
(SELECT cl.Customer as Customer, cl.PrincipalContractNo as
PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea,
cl.ProductGroup, cl.Site, sum(cl.NCRAmt) as NCRAmt
FROM
ROT_DIM_CONTRACT_LINE cl join
ROT_FACT_REV_ROLLOUT rr on
(cl.ContractNo = rr.ContractNo and
cl.ContractLine = rr.ContractLine)
where
rr.period_id <= 200809
group by
cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
cl.ProductGroup, cl.Site) ncritd on
(fut.Customer = ncritd.Customer and
fut.PrincipalContractNo = ncritd.PrincipalContractNo and
fut.SASContractNo = ncritd.SASContractNo and
fut.BusinessArea = ncritd.BusinessArea and
fut.ProductGroup = ncritd.ProductGroup and
fut.Site = ncritd.Site)
order by
fut.Customer, fut.PrincipalContractNo, fut.SASContractNo,
fut.BusinessArea, fut.ProductGroup, fut.Site, fut.Period
However if I change the first query to be a temp table:
drop table fut
create temp table fut as
SELECT cl.Customer as Customer, cl.PrincipalContractNo as
PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea as
BusinessArea, cl.ProductGroup as ProductGroup, cl.Site as Site, rr.period_id
as period, sum(cl.NCRAmt) as NCRAmt, sum(rr.revenue_amount) as RevenueAmt
FROM
ROT_DIM_CONTRACT_LINE cl join
ROT_FACT_REV_ROLLOUT rr on
(cl.ContractNo = rr.ContractNo and
cl.ContractLine = rr.ContractLine)
where
rr.period_id > 200809
group by
cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
cl.ProductGroup, Site, rr.period_id;
I know get around 2 minutes for the temp table combined with the second
subquery:
select fut.Customer as Customer, fut.PrincipalContractNo as
PrincipalContractNo, fut.SASContractNo as SASContractNo, fut.BusinessArea as
BusinessArea, fut.ProductGroup as ProductGroup, fut.Site as Site,
ncritd.NCRAmt as MCR_ITD, fut.Period as Period, fut.RevenueAmt as
RevenueAmt
from
fut fut join
(SELECT cl.Customer as Customer, cl.PrincipalContractNo as
PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea,
cl.ProductGroup, cl.Site, sum(cl.NCRAmt) as NCRAmt
FROM
ROT_DIM_CONTRACT_LINE cl join
ROT_FACT_REV_ROLLOUT rr on
(cl.ContractNo = rr.ContractNo and
cl.ContractLine = rr.ContractLine)
where
rr.period_id <= 200809
group by
cl.Customer, PrincipalContractNo, SASContract, BusinessArea,
ProductGroup, Site) ncritd on
(fut.Customer = ncritd.Customer and
fut.PrincipalContractNo = ncritd.PrincipalContractNo and
fut.SASContractNo = ncritd.SASContractNo and
fut.BusinessArea = ncritd.BusinessArea and
fut.ProductGroup = ncritd.ProductGroup and
fut.Site = ncritd.Site)
order by
fut.Customer, fut.PrincipalContractNo, fut.SASContractNo,
fut.BusinessArea, fut.ProductGroup, fut.Site, fut.Period ;
So subqueries appear far more inefficient than temp tables.
Thanks,
On Thu, Oct 23, 2008 at 1:05 PM, Igor Tandetnik <[EMAIL PROTECTED]> wrote:
> "Da Martian" <[EMAIL PROTECTED]> wrote
> in message
> news:[EMAIL PROTECTED]<[EMAIL PROTECTED]>
> > Does anyone have any ideas on how to optimise this type of process in
> > SQLite?
>
> What type of process? Show your tables and your query.
>
> Igor Tandetnik
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [email protected]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users