from:"Akhilanand"

Join selection

2019-03-04 Thread Akhilanand

Hello, I was going through the SparkStrategies class and found that sort merge join is preferred over shuffled hash join by default. preferSortMergeJoin needs to be explicitly set to false to force a shuffled hash join. 1) Why is sort merge join preferred over hash join? 2) are

Spark sql join optimizations

2019-02-26 Thread Akhilanand

Hello, I recently noticed that Spark doesn't optimize joins when a limit is applied to the result. For example, with payment.join(customer, Seq("customerId"), "left").limit(1).explain(true), Spark doesn't optimize the plan.
> == Physical Plan ==
> CollectLimit 1
> +- *(5) Project [customerId#29, paymentId#28,

Difference between Typed and untyped transformation in dataset API

2019-02-21 Thread Akhilanand

What is the key difference between typed and untyped transformations in the Dataset API? How do I determine whether a transformation is typed or untyped? Are there any gotchas about when to use which, beyond whether it does the job for me?

Re: Difference between dataset and dataframe

2019-02-18 Thread Akhilanand

. in general if you use
> Dataset you miss out on some optimizations. also Encoders are not very
> pleasant to work with.
>
>> On Mon, Feb 18, 2019 at 9:09 PM Akhilanand wrote:
>>
>> Hello,
>>
>> I have been recently exploring about dataset and datafram

Difference between dataset and dataframe

2019-02-18 Thread Akhilanand

couldn’t find anything that tells it specifically. If it's just for Datasets, does that mean we miss out on the Project Tungsten optimisation for DataFrames? Regards, Akhilanand BV

5 matches
