You could try caching the table. This would avoid the double read, but not
the shuffle (at least today with the current optimizer).
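
A minimal sketch of what I mean, assuming Spark 1.3.x with a HiveContext. The
table name and the query text are placeholders passed in as arguments, not
taken from your job:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object CachedSelfJoin {
  def main(args: Array[String]): Unit = {
    // Hypothetical inputs: the Hive table name and the self-join SQL text.
    val tableName = args(0)
    val selfJoinSql = args(1)

    val sc = new SparkContext(new SparkConf().setAppName("cached-self-join"))
    val hiveContext = new HiveContext(sc)

    // Mark the table for in-memory caching so that both sides of the
    // self join can read the cached data instead of scanning Hive twice.
    hiveContext.cacheTable(tableName)

    // Run the self join; the shuffle for the join still happens.
    val result = hiveContext.sql(selfJoinSql)
    result.show()

    // Release the cached data once it is no longer needed.
    hiveContext.uncacheTable(tableName)
    sc.stop()
  }
}
```

As far as I know, cacheTable is lazy in this version, so the table is only
materialized in memory the first time a query scans it.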
On Tue, Sep 29, 2015 at 5:21 PM, Data Science Education <
datasci...@gmail.com> wrote:
> As part of fairly complex processing, I am executing a self join query
> using HiveContext against a Hive table to find the latest Transaction,
> oldest Transaction, etc., for a given set of Attributes. I am still using
> v1.3.1 and so Window functions are not an option. The simplified query
> looks like bel