Try basing the interval query on a correlated subquery, so instead of

    q = session.query(
            column,
            func.row_number().\
                over(order_by=column).\
                label('rownum')
        ).\
        from_self(column)
try

    subquery = base_query.subquery()
    q = session.query(
            column,
            func.row_number().\
                over(order_by=column).\
                label('rownum')
        ).\
        join(subquery, subquery.columns[column.name] == column).\
        from_self(column)

This should give a SQL statement of

    SELECT anon_1.widget_id AS anon_1_widget_id
    FROM (
        SELECT widget.id AS widget_id,
               row_number() OVER (ORDER BY widget.id) AS rownum
        FROM widget
        JOIN (
            SELECT widget.id AS id, widget.data AS data
            FROM widget
            WHERE widget.id < :id_1) AS anon_2
        ON anon_2.id = widget.id) AS anon_1
    WHERE rownum % 1000 = 1

where base_query is a filtered query on the widget table. In this example I used the following to take the first half of the widget table:

    base_query = session.query(Widget).filter(Widget.id < 5000)

For reference, the SQL statement in the original post is

    SELECT anon_1.widget_id AS anon_1_widget_id
    FROM (
        SELECT widget.id AS widget_id,
               row_number() OVER (ORDER BY widget.id) AS rownum
        FROM widget) AS anon_1
    WHERE rownum % 1000 = 1

On Tuesday, August 5, 2014 12:12:42 AM UTC+1, Jonathan Vanasco wrote:
>
> I've been working on a migration script, and turned to my trusty friend
> the WindowedRangeQuery recipe:
>
> https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/WindowedRangeQuery
>
> I ran into a small problem and am hoping someone else may have done this
> and saved me some work.
>
> I'm trying to work out some edge cases on a large table, and the default
> performance is pretty bad in Postgres.
>
> The two big issues are:
>
> 1. The initial query to create an iterable range is painful. It takes me
> about 2 minutes. This seems to be due to:
>    a - loading a lot of data into memory
>    b - sequentially scanning the 'id' column. This function won't use the
>    primary key, or any other index, for id; it seq scans the whole table.
>
> 2. Window ranges are created for all ids in the table.
> If I only want to work on half the records, with a min/max, the 'iterable'
> query doesn't use the min/max filter values; it generates 100% of the
> potential ranges based on the id, and the filters are only used for the
> 'where' clause.
>
> Has anyone worked out an improved version of this technique, or am I
> cobbling something together myself?
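For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the subquery-join idea against an in-memory SQLite database (assuming SQLAlchemy 1.4+ and SQLite 3.25+ for window-function support). The table size, the `id < 11` filter, and the modulus of 3 are demo choices, not part of the recipe; I've also replaced the deprecated from_self() step with an explicit .subquery(), so the outer SELECT is built by hand:

```python
from sqlalchemy import Column, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Widget(Base):
    __tablename__ = 'widget'
    id = Column(Integer, primary_key=True)
    data = Column(String)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all([Widget(id=i, data='w%d' % i) for i in range(1, 21)])
session.commit()

column = Widget.id

# Filtered base query: only the first half of the table.
base_query = session.query(Widget).filter(Widget.id < 11)
subquery = base_query.subquery()

# Number only the filtered rows, so the window starts are drawn from
# the filtered set instead of the whole table.
numbered = (
    session.query(
        column.label('id'),
        func.row_number().over(order_by=column).label('rownum'),
    )
    .join(subquery, subquery.c.id == column)
    .subquery()
)

# Keep every 3rd numbered row as a window start.
starts = [
    row.id
    for row in session.query(numbered.c.id)
    .filter(numbered.c.rownum % 3 == 1)
    .order_by(numbered.c.rownum)
]
print(starts)  # [1, 4, 7, 10] -- only ids from the filtered half
```

With the per-window modulus bumped back up (e.g. 1000) this produces the same shape of statement as the one shown above, but the row numbering only ever sees the rows that survive the base query's filter.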