Hello,
I was going through the Spark strategies class and found that, by default,
sort merge join is preferred over shuffled hash join. The
preferSortMergeJoin flag needs to be explicitly set to false if we have to
force a shuffled hash join.
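For reference, here is a minimal sketch of disabling that preference at session
build time. The config key spark.sql.join.preferSortMergeJoin is the one backing
the flag above; disabling auto-broadcast as well (so a broadcast join doesn't win
first) is my assumption about how to make the hash join actually show up:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: let the planner consider a shuffled hash join instead of
// sort merge join. The planner still requires the build side to be
// small enough per partition, otherwise it falls back to sort merge.
val spark = SparkSession.builder()
  .appName("join-strategy-demo")
  .config("spark.sql.join.preferSortMergeJoin", "false")
  // Prevent a broadcast join from being chosen before the hash join.
  .config("spark.sql.autoBroadcastJoinThreshold", "-1")
  .getOrCreate()
```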
1) why is Sort merge join preferred over hash join?
2) are
Hello,
I recently noticed that Spark doesn't optimize joins when we limit the
result.
Say when we have
payment.join(customer,Seq("customerId"), "left").limit(1).explain(true)
Spark doesn't optimize it; the limit is only applied after the full join:
> == Physical Plan ==
> CollectLimit 1
> +- *(5) Project [customerId#29, paymentId#28,
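One manual workaround (a sketch, reusing the payment and customer datasets from
the example above) is to push the limit below the join yourself. This is only
safe here because every left-side row of a left outer join produces at least one
output row; the outer limit is kept because a payment row can still match several
customer rows:

```scala
// Sketch: shuffle only one payment row instead of the whole table.
val firstPayment = payment
  .limit(1)                                  // limit the left input first
  .join(customer, Seq("customerId"), "left") // join only the limited rows
  .limit(1)                                  // guard against key fan-out

firstPayment.explain(true)
```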
What is the key difference between typed and untyped transformations in the
Dataset API?
How do I determine whether a transformation is typed or untyped?
Any gotchas about when to use which, apart from the reason that it does the
job for me?
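To illustrate the distinction (a sketch against a hypothetical Person case
class): typed transformations such as map take your domain objects and return
a Dataset[T] via an Encoder, while untyped transformations such as select
work on column names and return a DataFrame (an alias for Dataset[Row]). The
return type in the API signature is the giveaway:

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

case class Person(name: String, age: Int)

val spark = SparkSession.builder()
  .appName("typed-vs-untyped")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val people: Dataset[Person] = Seq(Person("Ann", 34), Person("Bo", 19)).toDS()

// Typed: the lambda sees Person objects and is checked at compile time;
// the result is Dataset[Int].
val ages: Dataset[Int] = people.map(_.age)

// Untyped: select works on column names and returns a DataFrame;
// a misspelled column name only fails at runtime.
val agesDf: DataFrame = people.select("age")
```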
> In general, if you use Dataset you miss out on some optimizations. Also,
> Encoders are not very pleasant to work with.
>
>> On Mon, Feb 18, 2019 at 9:09 PM Akhilanand wrote:
>>
>> Hello,
>>
>> I have recently been exploring Datasets and DataFrames
I couldn't find anything that states it specifically. If it's just for
Datasets, does that mean we miss out on the Project Tungsten optimisations
for DataFrames?
Regards,
Akhilanand BV