from:"raaggarw"

How spark decides whether to do BroadcastHashJoin or SortMergeJoin

2016-07-20 Thread raaggarw

Hi, how does Spark decide internally when to use a BroadcastHashJoin vs a SortMergeJoin? Is there any way to guide, from outside or through options, which join to use? In my case, when I try to do a join, Spark plans that join as a BroadcastHashJoin internally, and when j
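To make the question above concrete, here is a minimal sketch of a size-based choice between the two strategies. It is a simplified model, not Spark's planner code: the function name `choose_join` and its byte-size arguments are hypothetical. The only Spark-specific facts it relies on are that the setting `spark.sql.autoBroadcastJoinThreshold` defaults to 10 MB and that setting it to -1 disables size-based broadcasting. The real planner also considers join type, hints, and estimated (not exact) table sizes.

```python
# Simplified model of a size-based join choice; NOT Spark's actual planner code.
# 10 MB is the documented default of spark.sql.autoBroadcastJoinThreshold.
DEFAULT_BROADCAST_THRESHOLD = 10 * 1024 * 1024


def choose_join(left_bytes, right_bytes, threshold=DEFAULT_BROADCAST_THRESHOLD):
    """Return the join strategy this simplified model would pick.

    left_bytes / right_bytes are hypothetical estimated sizes of the two sides.
    """
    # A negative threshold (-1 in Spark) turns off size-based broadcasting.
    if threshold >= 0 and min(left_bytes, right_bytes) <= threshold:
        return "BroadcastHashJoin"
    return "SortMergeJoin"


if __name__ == "__main__":
    print(choose_join(5 * 1024 * 1024, 10**9))   # small side fits under the threshold
    print(choose_join(10**9, 10**9))             # both sides too large to broadcast
    print(choose_join(1, 1, threshold=-1))       # broadcasting disabled
```

Under this model, lowering the threshold (or setting it to -1) is the "option from outside" that pushes the choice away from a broadcast join; in Spark itself the same setting can be changed on the session configuration.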

OutOfMemory when doing joins in spark 2.0 while same code runs fine in spark 1.5.2

2016-06-09 Thread raaggarw

Hi, I was trying to port my code from Spark 1.5.2 to Spark 2.0; however, I ran into some OutOfMemory issues. On drilling down, I could see that the OOM is caused by a join, because removing the join fixes the issue. I then created a small Spark app to reproduce this: (48 cores, 300 GB RAM - divided among 4 worke

Re: Timeline for supporting basic operations like groupBy, joins etc on Streaming DataFrames

2016-06-05 Thread raaggarw

Thanks. So, 1) For joins (stream-batch): are all types of joins supported (inner, left outer, etc.) or only specific ones? Also, what is the timeline for complete support, i.e. stream-stream joins? 2) So now outputMode is exposed via DataFrameWriter but will work in specific cases as you mention

Re: Timeline for supporting basic operations like groupBy, joins etc on Streaming DataFrames

2016-06-05 Thread raaggarw

I accidentally deleted the original post, so I am just pasting the response from Tathagata Das: Join is supported, but only stream-batch joins. Output modes were added late last week; currently append mode is supported for non-aggregation queries and complete mode for aggregation queries. And with complet

Timeline for supporting basic operations like groupBy, joins etc on Streaming DataFrames

2016-06-05 Thread raaggarw

Hi, I am Ravi, Computer Scientist @ Adobe Systems. We have been actively using Spark for our internal projects. Recently we had a need for ETL on streaming data, so we were exploring Spark 2.0 for that. But as far as I could see, the streaming DataFrames do not support basic operations like joins, group

Timeline for supporting basic operations like groupBy, joins etc on Streaming DataFrames

2016-06-05 Thread raaggarw

Hi, I am Ravi, Computer Scientist @ Adobe Systems. We have been actively using Spark for our internal projects. Recently we had a need for ETL on streaming data, so we were exploring Spark 2.0 for that. But as far as I could see, the streaming DataFrames do not support basic operations like joins, group

6 matches
