Re: spark 1.4 GC issue

2015-11-15 Thread Ted Yu
Please take a look at http://www.infoq.com/articles/tuning-tips-G1-GC Cheers On Sat, Nov 14, 2015 at 10:03 PM, Renu Yadav wrote: > I have tried with G1 GC. Please, if anyone can provide their settings for GC. > At code level I am: > 1. reading orc table using dataframe > 2.map

Re: Are map tasks spilling data to disk?

2015-11-15 Thread Reynold Xin
It depends on what the next operator is. If the next operator is just an aggregation, then no, the hash join won't write anything to disk. It will just stream the data through to the next operator. If the next operator is shuffle (exchange), then yes. On Sun, Nov 15, 2015 at 10:52 AM, gsvic
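
A rough way to picture the streaming case described above, as a minimal plain-Python sketch rather than Spark's actual implementation: the small side is built into an in-memory hash table, the large side is probed one row at a time through a generator, and an aggregation consumes the joined rows without the join output ever being materialized. The function names `broadcast_hash_join` and `count_by_key` and the sample data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def broadcast_hash_join(small, large):
    """Yield (key, small_value, large_value) rows one at a time."""
    # Build phase: the small ("broadcast") side is held fully in memory.
    table = defaultdict(list)
    for key, value in small:
        table[key].append(value)

    # Probe phase: each large-side row is joined and handed on immediately;
    # nothing is buffered or written out between the join and its consumer.
    for key, value in large:
        for match in table.get(key, ()):
            yield key, match, value


def count_by_key(rows):
    """Aggregation that consumes joined rows as they are produced."""
    counts = defaultdict(int)
    for key, _small_value, _large_value in rows:
        counts[key] += 1
    return dict(counts)


# Hypothetical sample data: the large side is itself a lazy generator.
small = [(1, "a"), (2, "b")]
large = ((k % 3, k) for k in range(9))

result = count_by_key(broadcast_hash_join(small, large))
print(result)
```

If the consumer were a shuffle instead of an aggregation, the joined rows would have to be partitioned and written out for other tasks to fetch, which is the case where disk writes come in.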

Re: A proposal for Spark 2.0

2015-11-15 Thread Prashant Sharma
Hey Matei, > Regarding Scala 2.12, we should definitely support it eventually, but I > don't think we need to block 2.0 on that because it can be added later too. > Has anyone investigated what it would take to run on there? I imagine we > don't need many code changes, just maybe some REPL

Re: Support for local disk columnar storage for DataFrames

2015-11-15 Thread Reynold Xin
This (updates) is something we are going to think about in the next release or two. On Thu, Nov 12, 2015 at 8:57 AM, Cristian O wrote: > Sorry, apparently only replied to Reynold, meant to copy the list as well, > so I'm self replying and taking the opportunity

Re: Hive on Spark Vs Spark SQL

2015-11-15 Thread Reynold Xin
It's a completely different path. On Sun, Nov 15, 2015 at 10:37 PM, kiran lonikar wrote: > I would like to know if Hive on Spark uses or shares the execution code > with Spark SQL or DataFrames? > > More specifically, does Hive on Spark benefit from the changes made to >

Re: Hive on Spark Vs Spark SQL

2015-11-15 Thread kiran lonikar
So it does not benefit from Project Tungsten, right? On Mon, Nov 16, 2015 at 12:07 PM, Reynold Xin wrote: > It's a completely different path. > > > On Sun, Nov 15, 2015 at 10:37 PM, kiran lonikar wrote: > >> I would like to know if Hive on Spark uses or

Re: Hive on Spark Vs Spark SQL

2015-11-15 Thread Reynold Xin
No, it does not -- although it'd benefit from some of the work to make shuffle more robust. On Sun, Nov 15, 2015 at 10:45 PM, kiran lonikar wrote: > So it does not benefit from Project Tungsten, right? > > > On Mon, Nov 16, 2015 at 12:07 PM, Reynold Xin

Map Tasks - Disk Spill (?)

2015-11-15 Thread gsvic
According to this paper, Spark's map tasks write their results to disk. My actual question is, in BroadcastHashJoin

Are map tasks spilling data to disk?

2015-11-15 Thread gsvic
According to this paper, Spark's map tasks write their results to disk. My actual question is, in BroadcastHashJoin