Hello,
I'm trying to optimize some queries in Hive that were recently switched to Tez.. had a general question regarding reducer parallelism .. A lot of our queries do the following style of simultaneous windowing .. SELECT row_number() OVER( PARTITION BY app, user, type ORDER BY ts ) as a_number, row_number() OVER( PARTITION BY day, app, user, type ORDER BY ts ) as type_rank, row_number() OVER( PARTITION BY day, app, user ORDER BY ts ) as dau_rank, FROM messages WHERE ... Since each OVER / PARTITION-By clause is independent they can the put into parallelized Reducer phases. But what I see is that these get serialized into M1 -> R1 -> R2 -> R3 .. instead of M1 -> [ R1, R2, R3 ] Is this something that Tez tries to do at all or an optimization that I can use to my benefit ? Cheers, -Gautam.