Hi,
How does Spark decide internally when to use a
BroadcastHashJoin vs. a SortMergeJoin? Is there any way we can control, from
outside or through options, which join to use?
I ask because in my case, when I try to do a join, Spark plans that join as a
BroadcastHashJoin internally and when j
Hi,
I was trying to port my code from Spark 1.5.2 to Spark 2.0, but I ran into
some OutOfMemory issues. On drilling down, I could see that the OOM is caused by
a join, because removing the join fixes the issue. I then created a small
Spark app to reproduce this:
(48 cores, 300 GB RAM, divided among 4 worke
Thanks
So,
1) For joins (stream-batch): are all types of joins supported (inner,
left outer, etc.), or only specific ones?
Also, what is the timeline for complete support, i.e. stream-stream joins?
2) So now outputMode is exposed via DataFrameWriter but will work in
specific cases as you mention
I accidentally deleted the original post,
so I am just pasting the response from Tathagata Das:
Joins are supported, but only stream-batch joins.
Output modes were added late last week; currently append mode is supported for
non-aggregation queries and complete mode for aggregation queries.
And with complet
Hi,
I am Ravi, a computer scientist at Adobe Systems. We have been actively using
Spark for our internal projects. Recently we needed to do ETL on streaming
data, so we were exploring Spark 2.0 for that.
*But as far as I could see, streaming DataFrames do not support basic
operations like joins, group