Re: Spark Job on YARN Hogging the entire Cluster resource

2016-02-24 Thread Prabhu Joseph
YARN-2026 has fixed the issue. On Thu, Feb 25, 2016 at 4:17 AM, Prabhu Joseph wrote: > You are right, Hamel. It should get 10 TB / 2. In hadoop-2.7.0 it is > working fine, but in hadoop-2.5.1 it gets only 10 TB / 230. The same > configuration is used in both versions.

Re: [build system] additional jenkins downtime next thursday

2016-02-24 Thread shane knapp
The security update has been released, and it's a doozy! https://wiki.jenkins-ci.org/display/SECURITY/Security+Advisory+2016-02-24 I will be putting Jenkins into quiet mode ~7am PST tomorrow morning for the upgrade, and expect to be back up and building by 9am PST at the latest.

Re: Spark 1.6.1

2016-02-24 Thread Yin Yang
Have you tried using scp? scp file i...@people.apache.org Thanks On Wed, Feb 24, 2016 at 5:04 PM, Michael Armbrust wrote: > Unfortunately I don't think that's sufficient, as they don't seem to support > sftp in the same way they did before. We'll still need to update

Re: Spark 1.6.1

2016-02-24 Thread Michael Armbrust
Unfortunately I don't think that's sufficient, as they don't seem to support sftp in the same way they did before. We'll still need to update our release scripts. On Wed, Feb 24, 2016 at 2:09 AM, Yin Yang wrote: > Looks like access to people.apache.org has been restored. > >

Spark HANA jdbc connection issue

2016-02-24 Thread Dushyant Rajput
Hi, Will this be resolved in any forthcoming release? https://issues.apache.org/jira/browse/SPARK-10625 Rgds, Dushyant.

Re: Spark Job on YARN Hogging the entire Cluster resource

2016-02-24 Thread Prabhu Joseph
You are right, Hamel. It should get 10 TB / 2. In hadoop-2.7.0 it is working fine, but in hadoop-2.5.1 it gets only 10 TB / 230. The same configuration is used in both versions. So I think a JIRA fixed after hadoop-2.5.1 could have resolved the issue. On Thu, Feb 25, 2016 at 1:28 AM, Hamel Kothari

how about a custom coalesce() policy?

2016-02-24 Thread Nezih Yigitbasi
Hi Spark devs, I sent an email about my problem some time ago: I want to merge a large number of small files with Spark. Currently I am using Hive with CombineHiveInputFormat, and I can control the size of the output files with the max split size parameter (which is used for
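The policy being asked for (packing many small inputs into output partitions of a bounded size, the way CombineHiveInputFormat's max split size does) can be sketched as a greedy first-fit pass over file sizes. This is a pure-Python illustration of the idea, not a Spark API; the function name and sizes are mine:

```python
def pack_files(file_sizes, max_split_size):
    """Greedy first-fit: group file sizes into splits whose total stays
    under max_split_size (a single file larger than the limit gets its
    own split). This mirrors a size-based custom coalesce() policy."""
    splits, current, current_size = [], [], 0
    for size in file_sizes:
        if current and current_size + size > max_split_size:
            splits.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        splits.append(current)
    return splits

sizes = [10, 20, 120, 30, 40, 50, 5]                 # input file sizes in MB
splits = pack_files(sizes, max_split_size=100)
# 7 small files are packed into 4 output splits; the 120 MB file
# exceeds the limit and stays alone in its own split.
```

A real implementation would hand each split's file list to one output task; the greedy pass keeps every multi-file split at or under the configured maximum.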

Spark Summit (San Francisco, June 6-8) call for presentations due in less than a week

2016-02-24 Thread Reynold Xin
Just want to send a reminder in case people don't know about it. If you are working on (or with) Spark, consider submitting your work to Spark Summit, coming up in June in San Francisco. https://spark-summit.org/2016/call-for-presentations/ Cheers.

Re: ORC file writing hangs in pyspark

2016-02-24 Thread James Barney
Thank you for the suggestions. We looked at the live Spark UI and the YARN app logs and found what we think is the issue: in Spark 1.5.2, the FPGrowth algorithm doesn't require you to specify the number of partitions for your input data. Without specifying it, FPGrowth puts all of its data into one
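The fix described above amounts to passing numPartitions (in pyspark.mllib.fpm this is the numPartitions argument to FPGrowth.train) so the mining work is spread over many tasks instead of serializing on one. A pure-Python sketch of the effect, using round-robin placement as a stand-in for RDD repartitioning; the function and data are illustrative, not Spark code:

```python
def split_into_partitions(transactions, num_partitions):
    """Round-robin records into partitions, mimicking how repartitioning
    spreads an RDD's records (and hence its tasks) across executors."""
    buckets = [[] for _ in range(num_partitions)]
    for i, t in enumerate(transactions):
        buckets[i % num_partitions].append(t)
    return buckets

transactions = [["a", "b"], ["b", "c"], ["a", "c"], ["a", "b", "c"]] * 25

one = split_into_partitions(transactions, 1)    # all 100 records on one task
eight = split_into_partitions(transactions, 8)  # at most 13 records per task

largest_single = max(len(p) for p in one)
largest_spread = max(len(p) for p in eight)
```

With a single partition the largest task holds every record, which is exactly the "one partition does everything" hang observed here; with eight, no task holds more than an eighth of the data (rounded up).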

Re: Spark Job on YARN Hogging the entire Cluster resource

2016-02-24 Thread Hamel Kothari
The instantaneous fair share is what Queue B should get according to the code (and my experience). Assuming your queues are all equal, it would be 10 TB / 2. I can't help much more unless I can see your config files and ideally also the YARN Scheduler UI to get an idea of what your queues/actual

Re: Spark Job on YARN Hogging the entire Cluster resource

2016-02-24 Thread Prabhu Joseph
Hi Hamel, Thanks for looking into the issue. What I am not understanding is: after preemption, what share does the second queue get if the first queue holds the entire cluster resource without releasing it, the instantaneous fair share or the steady-state fair share? Queue A and B are

Re: Build fails

2016-02-24 Thread Marcelo Vanzin
The error is right there. Just read the output more carefully. On Wed, Feb 24, 2016 at 11:37 AM, Minudika Malshan wrote: > [INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-versions) @ > spark-parent_2.11 --- > [WARNING] Rule 0:

Re: Build fails

2016-02-24 Thread Minudika Malshan
Here is the full stack trace. @Yin: yeah, it seems like a problem with the Maven version. I am going to update Maven. @Marcelo: yes, I couldn't decide what was wrong at first :) Thanks for your help! [INFO] Scanning for projects... [INFO]

Re: Build fails

2016-02-24 Thread Marcelo Vanzin
Well, did you do what the message instructed you to do and look above the message you copied for more specific messages about why the build failed? On Wed, Feb 24, 2016 at 11:28 AM, Minudika Malshan wrote: > Hi, > > I am trying to build from the Spark source code which was

Build fails

2016-02-24 Thread Minudika Malshan
Hi, I am trying to build the Spark source code cloned from https://github.com/apache/spark.git, but it fails with the following error. [ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce (enforce-versions) on project spark-parent_2.11: Some Enforcer

Re: Spark Job on YARN Hogging the entire Cluster resource

2016-02-24 Thread Hamel Kothari
If all queues are identical, this behavior should not be happening. Preemption as designed in the fair scheduler (IIRC) takes place based on the instantaneous fair share, not the steady-state fair share. The fair scheduler docs
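For reference, a minimal fair-scheduler.xml with two equal-weight queues might look like the following sketch (queue names are illustrative; preemption itself is switched on separately with yarn.scheduler.fair.preemption=true in yarn-site.xml, and exact supported elements vary between Hadoop versions). With both queues active, each queue's instantaneous fair share is half the cluster, which is the 10 TB / 2 figure discussed in this thread:

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml: two equal-weight queues. With both active,
     each queue's instantaneous fair share is half the cluster. -->
<allocations>
  <queue name="queueA">
    <weight>1.0</weight>
  </queue>
  <queue name="queueB">
    <weight>1.0</weight>
  </queue>
</allocations>
```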

Re: Spark 1.6.1

2016-02-24 Thread Yin Yang
Looks like access to people.apache.org has been restored. FYI On Mon, Feb 22, 2016 at 10:07 PM, Luciano Resende wrote: > > > On Mon, Feb 22, 2016 at 9:08 PM, Michael Armbrust > wrote: > >> An update: people.apache.org has been shut down so the