If you solve the issue Patrick's way, adding random numbers to the
entity_id column may help you avoid errors such as exhausting executor
and driver memory.
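
One way to read this suggestion is as key salting: a random suffix spreads the rows of a heavy entity_id over several groups, which are aggregated separately and then merged. The sketch below shows the two stages in plain Python so it runs without a cluster. The row layout (entity_id, feature_idx, value), the helper name salted_group and the salt count are all assumptions, since the thread does not show the schema.

```python
import random
from collections import defaultdict

NUM_SALTS = 4  # hypothetical value; in practice tuned to the amount of skew


def salted_group(rows, num_salts=NUM_SALTS, seed=0):
    """Two-stage grouping: by (entity_id, salt) first, then by entity_id."""
    rng = random.Random(seed)

    # Stage 1: a random salt splits each entity's rows over up to num_salts groups.
    partial = defaultdict(list)
    for entity_id, feature_idx, value in rows:
        salt = rng.randrange(num_salts)
        partial[(entity_id, salt)].append((feature_idx, value))

    # Stage 2: drop the salt and merge the partial groups per entity.
    merged = defaultdict(list)
    for (entity_id, _salt), pairs in partial.items():
        merged[entity_id].extend(pairs)
    return partial, merged


# Assumed long-format rows: (entity_id, feature_idx, value)
rows = [("a", 3, 1.0), ("a", 0, 2.0), ("b", 7, 5.0), ("a", 5, 0.5)]
partial, merged = salted_group(rows)
```

The second stage still brings each entity's pairs together, so salting mainly helps when the first stage can shrink the data (for example by pre-summing) before the merge.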
On Sat, Oct 31, 2020 at 12:42 AM Daniel Chalef
wrote:
Yes, the resulting matrix would be sparse. Thanks for the suggestion. Will
explore ways of doing this using an agg and UDF.
On Fri, Oct 30, 2020 at 6:26 AM Patrick McCarthy
wrote:
That's a very large vector. Is it sparse? Perhaps you'd have better luck
performing an aggregate instead of a pivot, and assembling the vector using
a UDF.
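
A minimal sketch of the aggregate-then-assemble idea, again in plain Python so it runs anywhere. The row layout (entity_id, feature_idx, value), the helper names, and the choice to sum duplicate indices are assumptions, not something stated in the thread. In PySpark the grouping step would presumably be a groupBy with collect_list, and assemble_sparse would be the body of a UDF returning a SparseVector.

```python
from collections import defaultdict


def assemble_sparse(pairs, size):
    """Turn (index, value) pairs into a (size, indices, values) sparse triple."""
    merged = {}
    for idx, val in pairs:
        if not 0 <= idx < size:
            raise ValueError(f"index {idx} out of range for size {size}")
        # Assumption: duplicate indices are summed.
        merged[idx] = merged.get(idx, 0.0) + val
    indices = sorted(merged)  # sparse formats usually expect sorted indices
    return size, indices, [merged[i] for i in indices]


def aggregate_rows(rows, size):
    """Group long-format rows by entity and build one sparse vector per entity."""
    grouped = defaultdict(list)
    for entity_id, feature_idx, value in rows:
        grouped[entity_id].append((feature_idx, value))
    return {eid: assemble_sparse(pairs, size) for eid, pairs in grouped.items()}


# Assumed long-format rows: (entity_id, feature_idx, value)
rows = [("a", 3, 1.0), ("a", 0, 2.0), ("b", 7, 5.0), ("a", 3, 0.5)]
vectors = aggregate_rows(rows, size=10)
```

Only the non-zero entries are ever materialized, which is the point of avoiding a pivot when the vector is very large and sparse.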
On Thu, Oct 29, 2020 at 10:19 PM Daniel Chalef
wrote:
> Hello,
>
> I have a very large long-format dataframe (several billion rows) that