Hi Cheng,
Are you saying that by setting up the lineage

    schemaRdd.keyBy(_.getString(1)).partitionBy(new HashPartitioner(n)).values.applySchema(schema)

then Spark SQL will know that an SQL “group by” on Customer Code will not have
to shuffle?
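To make the question concrete, here is a sketch of the pre-partitioning idea. This is an assumption-laden illustration, not a confirmed recipe: it assumes the Spark 1.2-era API (SchemaRDD, SQLContext.applySchema), that column index 1 is Customer Code as in the snippet above, and an arbitrary partition count n; the table name "customers" is hypothetical.

```scala
import org.apache.spark.HashPartitioner

// Assumed to be in scope, as in the snippet above:
//   sqlContext: org.apache.spark.sql.SQLContext
//   schemaRdd:  SchemaRDD whose column 1 is Customer Code
//   schema:     the corresponding StructType
val n = 8 // number of partitions (arbitrary choice for the sketch)

// Key each Row by Customer Code, co-locate equal keys with a
// HashPartitioner, then drop the keys and re-attach the schema.
val prepared = sqlContext.applySchema(
  schemaRdd
    .keyBy(_.getString(1))            // key = Customer Code
    .partitionBy(new HashPartitioner(n))
    .values,
  schema)

prepared.registerTempTable("customers")
// Whether the planner can exploit this co-location and skip the
// exchange for GROUP BY customer_code is exactly the question above.
```

Note that applySchema produces a fresh SchemaRDD, so the open question is whether the partitioning information survives into Spark SQL's physical plan.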
But the prepared RDD will have already shuffled so we p
Cheng, would you like to discuss?
Thanks
Mick
> On 30 Dec 2014, at 17:40, Michael Davies wrote:
>
> Hi Michael,
>
> I’ve looked through the example and the test cases and I think I understand
> what we need to do - so I’ll give it a go.
>
> I think what I’d like to try to do is allow files to be added at anytime, so
> perhaps I can cache partition info, and also what may be useful for us would be
> to d