Re: [web2py] Re: self join on postgres

paolo.vall...@gmail.com Tue, 08 Jan 2013 00:28:08 -0800

Thanks for this explanations! To add more info, I can say that the elements
on left column are naturally ordered, I have thousands of
records(~5000record/day) for each station but only a few
station_id(~5-10stations totally).
Instead of making a complex query, a cleaner approach would be to run the
query with only the join and then do a loop in python to remove the
unnecessary lines.


Moreover, right now, the worst think is that since I switched to postgres
the initial query (the one with only the leftjoin) is taking minutes to be
executed (almost 4m) against just few seconds on sqlite. I set an index on
the mac field through pgAdmin and the results are really better,
unfortunately this hack is not portable, is there a way to define indexes
from web2py?

Paolo


2013/1/8 Niphlod <niph...@gmail.com>

> after a full day at work I may lack the usual fantasy, but what you're
> trying to do can't be even achieved by a "relatively simple" windowed
> function....
> What you're trying to do is recursing..... way out of DAL reach (if you
> want to do it in a single query)
> Trying to explain better....
>
> Real data helps.....
> For every station_id record with the same mac address you want to find the
> "min gathered_on" record from the same table (with another station_id) and
> "subtract it" for every next possible match.
>
> One thing is requiring
>
> "2013-01-21 11:23:35";"a";127167;"2013-01-21 11:23:45";"a";127168
> "2013-01-21 11:23:00";"a";127169;"2013-01-21 11:23:45";"a";127168
> That can be accomplished by something like this
>
> select * from (
>     select    start_point.gathered_on,start_point.mac,start_point.id,
> end_point.gathered_on,end_point.mac,end_point.id,
> row_number() over (partition by start_point.id order by end_point.
> gathered_on) as filter_field
> from record as start_point
> inner join
> record as end_point
> on start_point.mac = end_point.mac
> and start_point.gathered_on <= end_point.gathered_on
> where start_point.station_id = 13
> and end_point.station_id = 14
> ) as q
> where q.filter_field = 1
>
> because for the record 127167 the next record with another station_id is
> 127168, but then for the 127169 you don't want the 127168, you want 127170
> because 127168 "has been booked before" by 127169.
>
> Honestly, (beware of the lack of fantasy :P) I'd do a loop in python
> instead of using recursing in the db itself unless you have zillions of
> "windows" (i.e. you have 1000 station_id = 13 and 1000 station_id = 14, and
> 1000 distinct station_id), just because it's more readable than what it
> would be needed in raw sql....
>
> --
>
>
>
>



-- 
 Paolo

--

Re: [web2py] Re: self join on postgres

Reply via email to