Try basing the interval query on a join against a subquery of the filtered query, so instead of

    q = session.query(
            column,
            func.row_number().over(order_by=column).label('rownum')
        ).from_self(column)


try

    subquery = base_query.subquery()

    q = session.query(
            column,
            func.row_number().over(order_by=column).label('rownum')
        ).join(
            subquery, subquery.columns[column.name] == column
        ).from_self(column)
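
For context, here is a rough, untested sketch of how this could slot into the
recipe's column_windows / windowed_query helpers. The base_query parameter is
the new piece; the rest follows the recipe:

    from sqlalchemy import and_, func, text

    def column_windows(session, column, windowsize, base_query):
        """Yield WHERE clauses that split `column` into windows of
        `windowsize` rows, restricted to the rows matched by base_query."""
        def int_for_range(start_id, end_id):
            if end_id:
                return and_(column >= start_id, column < end_id)
            return column >= start_id

        # Join the row_number() query against the filtered query so that
        # rownum is computed over the filtered rows only.
        subquery = base_query.subquery()
        q = session.query(
                column,
                func.row_number().over(order_by=column).label('rownum')
            ).join(
                subquery, subquery.columns[column.name] == column
            ).from_self(column)
        if windowsize > 1:
            q = q.filter(text("rownum %% %d=1" % windowsize))

        intervals = [id for id, in q]
        while intervals:
            start = intervals.pop(0)
            end = intervals[0] if intervals else None
            yield int_for_range(start, end)

    def windowed_query(q, column, windowsize):
        """Break a Query into windows on the given column, passing the
        query itself in as the filter for the interval computation."""
        for whereclause in column_windows(q.session, column, windowsize, q):
            for row in q.filter(whereclause).order_by(column):
                yield row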


This should give a SQL statement of

    SELECT anon_1.widget_id AS anon_1_widget_id
    FROM (
     SELECT widget.id AS widget_id, row_number() OVER (ORDER BY widget.id) AS rownum
     FROM widget JOIN (
       SELECT widget.id AS id, widget.data AS data
       FROM widget
       WHERE widget.id < :id_1) AS anon_2
       ON anon_2.id = widget.id) AS anon_1
    WHERE rownum % 1000=1


where base_query is a filtered query on the widget table. In this example I
used the following to take the first half of the widget table:

    base_query = session.query(Widget).filter(Widget.id < 5000)
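
Usage would then look something like this (process_widget is a stand-in for
whatever per-row work you're doing):

    # Iterate the first half of the table in windows of 1000 rows,
    # using the sketched helpers above.
    for widget in windowed_query(base_query, Widget.id, 1000):
        process_widget(widget)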

For reference, the SQL statement in the original post is

    SELECT anon_1.widget_id AS anon_1_widget_id
    FROM (
     SELECT widget.id AS widget_id, row_number() OVER (ORDER BY widget.id) AS rownum
     FROM widget) AS anon_1
    WHERE rownum % 1000=1

On Tuesday, August 5, 2014 12:12:42 AM UTC+1, Jonathan Vanasco wrote:
>
> I've been working on a migration script, and turned to my trusty friend 
> the WindowedRangeQuery recipe:
>
>       
> https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/WindowedRangeQuery
>
> I ran into a small problem, and I'm hoping someone else may have done this 
> and saved me some work.
>
> I'm trying to work out some edge cases on a large table, and the default 
> performance is pretty bad in Postgres.
>
> The two big issues are:
>
> 1. The initial query to create an iterable range is painful.  It takes me 
> about 2 minutes.  This seems to be due to:
>     a - loading a lot of data into memory
>     b - sequentially scanning the 'id' column. This function won't use 
> the primary key, or any other index, for id; it seq-scans the whole table.
>
> 2. Window ranges are created for all ids in the table.  If I only want to 
> work on half the records, with a min/max, the 'iterable' query doesn't use 
> the min/max filter values; it generates 100% of potential ranges based on 
> the id, and the filters are only used for the 'where' clause.  
>
> Has anyone worked out an improved version of this technique, or am I 
> cobbling something together myself?
>
