Re: Concurrency does not improve for Spark Jobs with Same Spark Context

2016-02-18 Thread Prabhu Joseph
Fair Scheduler. The YARN queue has the entire cluster resource as maxResource, so preemption does not come into the picture during the test case; all the Spark jobs got the resources they requested. The concurrent jobs with different Spark contexts run fine, so resource contention does not appear to be the cause.

Re: Concurrency does not improve for Spark Jobs with Same Spark Context

2016-02-18 Thread Jörn Franke
How did you configure YARN queues? What scheduler? Preemption?
> On 19 Feb 2016, at 06:51, Prabhu Joseph wrote:
>
> Hi All,
>
> When running concurrent Spark Jobs on YARN (Spark-1.5.2) which share a single Spark Context, the jobs take more time to complete

Concurrency does not improve for Spark Jobs with Same Spark Context

2016-02-18 Thread Prabhu Joseph
Hi All, When running concurrent Spark jobs on YARN (Spark-1.5.2) which share a single Spark Context, the jobs take more time to complete compared with when they run with different Spark Contexts. The Spark jobs are submitted on different threads. Test Case: A. 3 spark jobs submitted
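
A comparison like the one described above needs per-job and wall-clock timings for jobs started on separate threads. What follows is only a minimal sketch of such a timing harness in plain Python, not the poster's actual test. The names `run_concurrently` and `placeholder_job` are hypothetical, and the placeholder just sleeps; in a real test each callable would trigger a Spark action on the shared context.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(jobs):
    """Run each zero-argument callable on its own thread.

    Returns (per-job elapsed seconds in input order, total wall-clock seconds).
    """
    def timed(job):
        start = time.perf_counter()
        job()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    # One worker per job so that no job waits for a free thread.
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        per_job = list(pool.map(timed, jobs))
    return per_job, time.perf_counter() - wall_start


def placeholder_job():
    # Hypothetical stand-in for a Spark action on the shared context.
    time.sleep(0.2)


if __name__ == "__main__":
    per_job, wall = run_concurrently([placeholder_job] * 3)
    print("per-job seconds:", [round(t, 2) for t in per_job])
    print("wall-clock seconds:", round(wall, 2))
```

Running the same harness once against a shared context and once against separate contexts would give the two sets of numbers being compared in this thread.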

Re: How to run PySpark tests?

2016-02-18 Thread Holden Karau
Great - I'll update the wiki.
On Thu, Feb 18, 2016 at 8:34 PM, Jason White wrote:
> Compiling with `build/mvn -Pyarn -Phadoop-2.4 -Phive -Dhadoop.version=2.4.0 -DskipTests clean package` followed by `python/run-tests` seemed to do the trick! Thanks!

Re: How to run PySpark tests?

2016-02-18 Thread Jason White
Compiling with `build/mvn -Pyarn -Phadoop-2.4 -Phive -Dhadoop.version=2.4.0 -DskipTests clean package` followed by `python/run-tests` seemed to do the trick! Thanks!

Re: How to run PySpark tests?

2016-02-18 Thread Holden Karau
I've run into some problems with the Python tests in the past when I haven't built with Hive support. You might want to build your assembly with Hive support and see if that helps.
On Thursday, February 18, 2016, Jason White wrote:
> Hi,
>
> I'm trying to finish up a PR

Re: Ability to auto-detect input data for datasources (by file extension).

2016-02-18 Thread Reynold Xin
Thanks for the email. Don't make it that complicated. We just want to simplify the common cases (e.g. csv/parquet), and don't need this to work for everything out there.
On Thu, Feb 18, 2016 at 9:25 PM, Hyukjin Kwon wrote:
> Hi all,
>
> I am planning to submit a PR for

Ability to auto-detect input data for datasources (by file extension).

2016-02-18 Thread Hyukjin Kwon
Hi all, I am planning to submit a PR for https://issues.apache.org/jira/browse/SPARK-8000. Currently, the file format is not detected from the file extension, unlike compression codecs, which are. I am thinking of introducing another interface (a function) at DataSourceRegister just like
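
To make the idea concrete, here is a rough sketch of extension-based format detection in plain Python. It is not Spark's API and not the interface proposed in the PR. The function name `detect_format` and both lookup tables are assumptions made up for illustration, limited to a few common cases.

```python
import os

# Hypothetical lookup tables for illustration; not Spark's actual registry.
FORMAT_BY_EXTENSION = {".csv": "csv", ".json": "json", ".parquet": "parquet"}
COMPRESSION_EXTENSIONS = {".gz", ".bz2", ".snappy"}


def detect_format(path, default=None):
    """Guess a data source name from a file name, or return `default`."""
    name = os.path.basename(path).lower()
    root, ext = os.path.splitext(name)
    # Skip one trailing compression suffix, e.g. "events.csv.gz" -> ".csv".
    if ext in COMPRESSION_EXTENSIONS:
        root, ext = os.path.splitext(root)
    return FORMAT_BY_EXTENSION.get(ext, default)


if __name__ == "__main__":
    for p in ["data/events.csv.gz", "part-0000.parquet", "README"]:
        print(p, "->", detect_format(p))
```

Returning a default for unknown extensions keeps the lookup limited to common cases rather than trying to cover every format.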

Re: Welcoming two new committers

2016-02-18 Thread 刘畅
Awesome! Congrats and welcome!!
2016-02-18 11:26 GMT+08:00 Cheng Lian:
> Awesome! Congrats and welcome!!
>
> Cheng
>
> On Tue, Feb 9, 2016 at 2:55 AM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:
>
>> Congrats!!! Herman and Wenchen!!!
>>
>> On Mon, Feb 8,

How to run PySpark tests?

2016-02-18 Thread Jason White
Hi, I'm trying to finish up a PR (https://github.com/apache/spark/pull/10089) which is currently failing PySpark tests. The instructions to run the test suite seem a little dated. I was able to find these: https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals

Re: DataFrame API and Ordering

2016-02-18 Thread Reynold Xin
You are correct and we should document that. Any suggestions on where we should document this? In DoubleType and FloatType?
On Tuesday, February 16, 2016, Maciej Szymkiewicz wrote:
> I am not sure if I've missed something obvious but as far as I can tell DataFrame API

Re: Kafka connector mention in Matei's keynote

2016-02-18 Thread Reynold Xin
I think Matei was referring to the Kafka direct streaming source added in 2015.
On Thu, Feb 18, 2016 at 11:59 AM, Cody Koeninger wrote:
> I saw this slide:

Kafka connector mention in Matei's keynote

2016-02-18 Thread Cody Koeninger
I saw this slide: http://image.slidesharecdn.com/east2016v2matei-160217154412/95/2016-spark-summit-east-keynote-matei-zaharia-5-638.jpg?cb=1455724433 Didn't see the talk - was this just referring to the existing work on the spark-streaming-kafka subproject, or is someone actually working on

Re: SPARK-9559

2016-02-18 Thread Daniel Darabos
YARN may be a workaround.
On Thu, Feb 18, 2016 at 4:13 PM, Ashish Soni wrote:
> Hi All,
>
> Just wanted to know if there is any workaround or resolution for the issue below in standalone mode
>
> https://issues.apache.org/jira/browse/SPARK-9559
>
> Ashish

SPARK-9559

2016-02-18 Thread Ashish Soni
Hi All, just wanted to know if there is any workaround or resolution for the issue below in standalone mode: https://issues.apache.org/jira/browse/SPARK-9559 Ashish