Re: Use mvn to build Spark 1.2.0 failed

2014-12-22 Thread Sean Owen
I just tried the exact same command and do not see any error. Maybe you can make sure you're starting from a clean extraction of the distro, and check your environment. I'm on OSX, Maven 3.2, and Java 8, but I don't know that any of those would be relevant. On Mon, Dec 22, 2014 at 4:10 AM, wyphao.2007
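
For anyone retracing this, a clean rebuild of the 1.2.0 source release would look something like the following; the flags are the standard ones from the Spark 1.2 build documentation, not anything specific to this report:

    tar xzf spark-1.2.0.tgz
    cd spark-1.2.0
    export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
    mvn -DskipTests clean package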

Spark exception when sending message to akka actor

2014-12-22 Thread Priya Ch
Hi All, I have Akka remote actors running on 2 nodes. I submitted a Spark application from node1. In the Spark code, in one of the RDDs, I am sending a message to the actor running on node1. My Spark code is as follows: class ActorClient extends Actor with Serializable { import context._ val curre
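
The message is cut off, but the scenario it describes (a Serializable actor referenced from RDD code) is a common pitfall: actor state generally can't be shipped inside task closures. A sketch of the usual workaround for the Akka 2.3-era remoting Spark 1.2 shipped with; all names, hosts, and ports here are illustrative assumptions, not the poster's actual code:

    import akka.actor.{Actor, ActorSystem, Props}
    import com.typesafe.config.ConfigFactory

    // An actor that receives status messages from tasks.
    class StatusActor extends Actor {
      def receive = {
        case msg: String => println(s"received: $msg")
      }
    }

    object DriverSide {
      def main(args: Array[String]): Unit = {
        // On node1: an actor system that listens for remote messages.
        val system = ActorSystem("driver", ConfigFactory.parseString(
          """akka.actor.provider = "akka.remote.RemoteActorRefProvider"
            |akka.remote.netty.tcp.hostname = "node1"
            |akka.remote.netty.tcp.port = 2552
          """.stripMargin))
        system.actorOf(Props[StatusActor], "status")
        // A Spark task on an executor would then reach this actor by path,
        // e.g. from inside rdd.foreachPartition, creating a small local
        // ActorSystem and calling:
        //   localSystem.actorSelection(
        //     "akka.tcp://driver@node1:2552/user/status") ! "partition done"
        // rather than closing over driver-side actor state, which typically
        // fails serialization.
      }
    }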

Tuning Spark Streaming jobs

2014-12-22 Thread Gerard Maas
Hi, After facing performance issues with some of our Spark Streaming jobs, we invested quite some effort in figuring out the factors that affect the performance characteristics of a Streaming job. We defined an empirical model that helps us reason about Streaming jobs and applied it to tune
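
The write-up itself lives outside this thread, but the parameter any such model revolves around is the batch interval: on average, each micro-batch must complete within it, or work queues up and latency grows without bound. A minimal sketch of where that knob is set (app name illustrative):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("streaming-tuning-sketch")
    // The batch interval is the time budget every micro-batch must fit inside.
    val ssc = new StreamingContext(conf, Seconds(10))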

Re: Tuning Spark Streaming jobs

2014-12-22 Thread Timothy Chen
Hi Gerard, Really nice guide! I'm particularly interested in the Mesos scheduling side, to more evenly distribute cores across the cluster. I wonder if you are using coarse-grained mode or fine-grained mode? I'm making changes to the Spark Mesos scheduler and I think we can propose a best way to achi
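
For readers following along: the two modes differ in how executors acquire Mesos resources, and in Spark 1.2 the switch is a single property. A minimal sketch using the documented settings (the values are illustrative):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.mesos.coarse", "true") // coarse-grained: long-lived executors with fixed resources
      .set("spark.cores.max", "12")      // cap total cores so one job doesn't monopolize the cluster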

Re: Use mvn to build Spark 1.2.0 failed

2014-12-22 Thread Patrick Wendell
I also couldn't reproduce this issue. On Mon, Dec 22, 2014 at 2:24 AM, Sean Owen wrote: > I just tried the exact same command and do not see any error. Maybe > you can make sure you're starting from a clean extraction of the > distro, and check your environment. I'm on OSX, Maven 3.2, Java 8 but

cleaning up cache files left by SPARK-2713

2014-12-22 Thread Cody Koeninger
Is there a reason not to go ahead and move the _cache and _lock files created by Utils.fetchFiles into the work directory, so they can be cleaned up more easily? I saw comments to that effect in the discussion of the PR for 2713, but it doesn't look like it got done. And no, I didn't just have a

Re: cleaning up cache files left by SPARK-2713

2014-12-22 Thread Marcelo Vanzin
https://github.com/apache/spark/pull/3705 On Mon, Dec 22, 2014 at 10:19 AM, Cody Koeninger wrote: > Is there a reason not to go ahead and move the _cache and _lock files > created by Utils.fetchFiles into the work directory, so they can be cleaned > up more easily? I saw comments to that effect

Re: spark-yarn_2.10 1.2.0 artifacts

2014-12-22 Thread David McWhorter
Thank you, Sean, using spark-network-yarn seems to do the trick. On 12/19/2014 12:13 PM, Sean Owen wrote: I believe spark-yarn does not exist from 1.2 onwards. Have a look at spark-network-yarn for where some of that went, I believe. On Fri, Dec 19, 2014 at 5:09 PM, David McWhorter wrote: Hi
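
For anyone making the same switch, the replacement dependency would look roughly like this in sbt (assuming the 1.2.0 artifact published to Maven Central):

    // spark-yarn is gone as of 1.2; the YARN shuffle-service pieces moved here.
    libraryDependencies += "org.apache.spark" %% "spark-network-yarn" % "1.2.0"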

Re: Tuning Spark Streaming jobs

2014-12-22 Thread Gerard Maas
Hi Tim, That would be awesome. We have seen some really disparate Mesos allocations for our Spark Streaming jobs (like (7,4,1) over 3 executors for 4 Kafka consumers instead of the ideal (3,3,3,3)). For network-dependent consumers, achieving an even deployment would provide a reliable and reproduc

Re: Data source interface for making multiple tables available for query

2014-12-22 Thread Michael Armbrust
I agree and this is something that we have discussed in the past. Essentially I think instead of creating a RelationProvider that returns a single table, we'll have something like an external catalog that can return multiple base relations. On Sun, Dec 21, 2014 at 6:43 PM, Venkata ramana gollamudi
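
To make the idea concrete, such a hook might look something like the sketch below. This is purely illustrative (ExternalCatalogProvider and its method are invented names, not an existing or proposed Spark API); only SQLContext and BaseRelation are real types from org.apache.spark.sql:

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.sources.BaseRelation

    // Hypothetical: where RelationProvider yields one table per invocation,
    // a catalog-level provider could surface many named relations at once.
    trait ExternalCatalogProvider {
      def createCatalog(
          sqlContext: SQLContext,
          parameters: Map[String, String]): Map[String, BaseRelation]
    }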

Announcing Spark Packages

2014-12-22 Thread Xiangrui Meng
Dear Spark users and developers, I’m happy to announce Spark Packages (http://spark-packages.org), a community package index to track the growing number of open source packages and libraries that work with Apache Spark. Spark Packages makes it easy for users to find, discuss, rate, and install pac

More general submitJob API

2014-12-22 Thread Alessandro Baretta
Fellow Sparkers, I'm rather puzzled by the submitJob API. I can't quite figure out how it is supposed to be used. Is there any more documentation about it? Also, is there any simpler way to multiplex jobs on the cluster, such as starting multiple computations in as many threads in the driver and

Re: Announcing Spark Packages

2014-12-22 Thread Andrew Ash
Hi Xiangrui, That link is currently returning a 503 Over Quota error message. Would you mind pinging back out when the page is back up? Thanks! Andrew On Mon, Dec 22, 2014 at 12:37 PM, Xiangrui Meng wrote: > Dear Spark users and developers, > > I’m happy to announce Spark Packages (http://spa

Re: Announcing Spark Packages

2014-12-22 Thread Patrick Wendell
Xiangrui asked me to report that it's back and running :) On Mon, Dec 22, 2014 at 3:21 PM, peng wrote: > Me 2 :) > > > On 12/22/2014 06:14 PM, Andrew Ash wrote: > > Hi Xiangrui, > > That link is currently returning a 503 Over Quota error message. Would you > mind pinging back out when the page i

Re: Announcing Spark Packages

2014-12-22 Thread Hitesh Shah
Hello Xiangrui, If you have not already done so, you should look at http://www.apache.org/foundation/marks/#domains for the policy on use of ASF trademarked terms in domain names. thanks — Hitesh On Dec 22, 2014, at 12:37 PM, Xiangrui Meng wrote: > Dear Spark users and developers, > > I’m

Re: More general submitJob API

2014-12-22 Thread Andrew Ash
Hi Alex, SparkContext.submitJob() is marked as experimental -- most client programs shouldn't be using it. What are you looking to do? For multiplexing jobs, one thing you can do is have multiple threads in your client JVM each submit jobs on your SparkContext. This is described here in the

Re: More general submitJob API

2014-12-22 Thread Alessandro Baretta
Andrew, Thanks, yes, this is what I wanted: basically just to start multiple jobs concurrently in threads. Alex On Mon, Dec 22, 2014 at 4:04 PM, Andrew Ash wrote: > > Hi Alex, > > SparkContext.submitJob() is marked as experimental -- most client programs > shouldn't be using it. What are you l

Re: More general submitJob API

2014-12-22 Thread Patrick Wendell
A SparkContext is thread-safe, so you can just have different threads that create their own RDDs and do actions, etc. - Patrick On Mon, Dec 22, 2014 at 4:15 PM, Alessandro Baretta wrote: > Andrew, > > Thanks, yes, this is what I wanted: basically just to start multiple jobs > concurrently in th
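
Putting the thread's advice together, a minimal sketch of driving two independent jobs from one SparkContext (the jobs themselves are placeholders):

    import java.util.concurrent.{Executors, TimeUnit}
    import org.apache.spark.{SparkConf, SparkContext}

    object ConcurrentJobs {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("concurrent-jobs"))
        val pool = Executors.newFixedThreadPool(2)

        // Each task drives an independent job on the shared, thread-safe
        // context; Spark's scheduler interleaves them.
        pool.submit(new Runnable {
          def run(): Unit = println(sc.parallelize(1 to 100000).sum())
        })
        pool.submit(new Runnable {
          def run(): Unit = println(sc.parallelize(1 to 100000).filter(_ % 2 == 0).count())
        })

        pool.shutdown()
        pool.awaitTermination(10, TimeUnit.MINUTES)
        sc.stop()
      }
    }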

Re: Announcing Spark Packages

2014-12-22 Thread Nicholas Chammas
Hitesh, From your link: You may not use ASF trademarks such as “Apache” or “ApacheFoo” or “Foo” in your own domain names if that use would be likely to confuse a relevant consumer about the source of software or services provided through your web

Re: Announcing Spark Packages

2014-12-22 Thread Patrick Wendell
Hey Nick, I think Hitesh was just trying to be helpful and point out the policy - not necessarily saying there was an issue. We've taken a close look at this and I think we're in good shape here vis-à-vis this policy. - Patrick On Mon, Dec 22, 2014 at 5:29 PM, Nicholas Chammas wrote: > Hitesh, >

Re: Announcing Spark Packages

2014-12-22 Thread Nicholas Chammas
Okie doke! (I just assumed there was an issue since the policy was brought up.) On Mon Dec 22 2014 at 8:33:53 PM Patrick Wendell wrote: > Hey Nick, > > I think Hitesh was just trying to be helpful and point out the policy > - not necessarily saying there was an issue. We've taken a close look >

Re: [ANNOUNCE] Requiring JIRA for inclusion in release credits

2014-12-22 Thread Nicholas Chammas
Does this include contributions made against the spark-ec2 repo? On Wed Dec 17 2014 at 12:29:19 AM Patrick Wendell wrote: > Hey All, > > Due to the very high volume of contributions, we're switching to an > automated process for generating release credits. Th

Re: [ANNOUNCE] Requiring JIRA for inclusion in release credits

2014-12-22 Thread Patrick Wendell
Hey Josh, We don't explicitly track contributions to spark-ec2 in the Apache Spark release notes. The main reason is that usually updates to spark-ec2 include a corresponding update to Spark, so we get it there. This may not always be the case though, so let me know if you think there is something

Re: [ANNOUNCE] Requiring JIRA for inclusion in release credits

2014-12-22 Thread Patrick Wendell
s/Josh/Nick/ - sorry! On Mon, Dec 22, 2014 at 10:52 PM, Patrick Wendell wrote: > Hey Josh, > > We don't explicitly track contributions to spark-ec2 in the Apache > Spark release notes. The main reason is that usually updates to > spark-ec2 include a corresponding update to spark so we get it ther