Re: standard way of running a compiled jar

2014-02-24 Thread Sandy Ryza
nt to > create a script for it specifically. Maybe the same script could also allow > submitting to the cluster or something. > > Matei > > On Feb 23, 2014, at 1:55 PM, Sandy Ryza wrote: > > > Is the client=driver mode still a supported option (outside of the > R

Re: standard way of running a compiled jar

2014-02-23 Thread Sandy Ryza
hat takes into account all the > deploy modes though, because it would be confusing to use it one way on > YARN and another way on standalone for instance. Already the YARN submit > client kind of does what you're looking for. > > Matei > > On Feb 22, 2014, at 2:08 PM, Sa

standard way of running a compiled jar

2014-02-22 Thread Sandy Ryza
Hey All, I've encountered some confusion about how to run a Spark app from a compiled jar and wanted to bring up the recommended way. It seems like the current standard options are: * Build an uber jar that contains the user jar and all of Spark. * Explicitly include the locations of the Spark ja

Re: Signal/Noise Ratio

2014-02-22 Thread Sandy Ryza
Hadoop subprojects (MR, YARN, HDFS) each have a "dev" list that contains discussion as well as a single email whenever a JIRA is filed, and an "issues" list with all the JIRA activity. I think this works out pretty well. Subscribing just to the dev list, I can keep up with changes that are going

Re: [VOTE] Graduation of Apache Spark from the Incubator

2014-02-10 Thread Sandy Ryza
+1 On Mon, Feb 10, 2014 at 9:57 PM, Mark Hamstra wrote: > +1 > > > On Mon, Feb 10, 2014 at 8:27 PM, Chris Mattmann > wrote: > > > Hi Everyone, > > > > This is a new VOTE to decide if Apache Spark should graduate > > from the Incubator. Please VOTE on the resolution pasted below > > the ballot.

Re: Proposal for JIRA and Pull Request Policy

2014-02-06 Thread Sandy Ryza
+1 On Thu, Feb 6, 2014 at 4:10 PM, Mridul Muralidharan wrote: > +1 > > Would be great if the JIRA tag was 'clickable' to go to the actual JIRA :-) > > Regards, > Mridul > > > On Fri, Feb 7, 2014 at 5:35 AM, Patrick Wendell > wrote: > > As a break out from the other thread. I'd like to propose t

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Sandy Ryza
If the APIs are usable, stability and continuity are much more important than perfection. With many already relying on the current APIs, I think trying to clean them up will just cause pain for users and integrators. Hadoop made this mistake when they decided the original MapReduce APIs were ugly

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Sandy Ryza
Not codifying binary compatibility as a hard rule sounds fine to me. Would it make sense to put something in that . I.e. avoid making needless changes to class hierarchies. Whether Spark considers itself stable or not, users are beginning to treat it so. A responsible project will acknowledge th

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Sandy Ryza
*Would it make sense to put in something that strongly discourages binary incompatible changes when possible? On Thu, Feb 6, 2014 at 11:03 AM, Sandy Ryza wrote: > Not codifying binary compatibility as a hard rule sounds fine to me. > Would it make sense to put something in that . I.e.

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Sandy Ryza
Bleh, hit send too early again. My second paragraph was to argue for 1.0.0 instead of 0.10.0, not to hammer on the binary compatibility point. On Thu, Feb 6, 2014 at 11:04 AM, Sandy Ryza wrote: > *Would it make sense to put in something that strongly discourages binary > incompatible c

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Sandy Ryza
Thanks for all this Patrick. I like Heiko's proposal that requires every pull request to reference a JIRA. This is how things are done in Hadoop and it makes it much easier to, for example, find out whether an issue you came across when googling for an error is in a release. I agree with Mridul

Re: [VOTE] Graduation of Apache Spark

2014-01-26 Thread Sandy Ryza
+1! On Sun, Jan 26, 2014 at 1:49 PM, Matei Zaharia wrote: > Hi guys, > > Discussion has proceeded positively, so I'm calling for a community VOTE > for the graduation of Apache Spark (incubating) into a top level project. > If this VOTE is successful, then I'll call an Incubator PMC VOTE in 72 >

Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc2)

2014-01-19 Thread Sandy Ryza
Has anybody tested against YARN 2.2? I tried it out against a pseudo-distributed cluster and ran into an issue I just filed as SPARK-1031 . thanks, Sandy On Sun, Jan 19, 2014 at 12:55 AM, Reynold Xin wrote: > +1 > > > On Sat, Jan 18, 2014

Should Spark on YARN example include --addJars?

2014-01-18 Thread Sandy Ryza
Hey All, I ran into an issue when trying to run SparkPi as described in the Spark on YARN doc. 14/01/18 10:52:09 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: spark-examples-assembly-0.9.0-incubating-SNAPSHOT.jar (No such file or directory)), was the --addJars option

Re: About Spark job web ui persist(JIRA-969)

2014-01-07 Thread Sandy Ryza
dump a series of named paths each with a JSON file. Then the > history server could load those paths and pass them through the second > rendering stage (JSON => XML) to create each page. > > It would be good if SPARK-969 had a good design doc before anyone > starts working on it. >

Re: About Spark job web ui persist(JIRA-969)

2014-01-07 Thread Sandy Ryza
As a side note, it would be nice to make sure that whatever is done here will work with the YARN Application History Server (YARN-321), a generic history server that functions similarly to MapReduce's JobHistoryServer. It will eventually have the ability to store application-specific data. -Sandy O

Re: Spark streaming quantile?

2013-12-09 Thread Sandy Ryza
> by Twitter's Algebird< > https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/QTree.scala > >. > > Due to the associative properties of Algebird's SemiGroup it is ideally > > suited for streaming computat

Spark streaming quantile?

2013-12-04 Thread Sandy Ryza
Hi All, We're working on a Spark application that could make use of computing quantiles in a streaming fashion. Something in the vein of what DataFu has for Pig http://linkedin.github.io/datafu/docs/current/datafu/pig/stats/StreamingQuantile.html . Does anything like this exist in the Spark ec

Re: issue regarding akka, protobuf and Hadoop version

2013-11-06 Thread Sandy Ryza
For my own understanding, is this summary correct? Spark will move to Scala 2.10, which means it can support Akka 2.3-M1, which supports protobuf 2.5, which will allow Spark to run on Hadoop 2.2. What will be the first Spark version with these changes? Are the Akka features that Spark relies on s

Re: Current master doesn't compile against CDH 4.4.0

2013-10-29 Thread Sandy Ryza
Because YARN was considered unstable, APIs have been rapidly evolving, both in CDH4 and the Apache 2.0.x releases it is based on. CDH5, which was released in beta today, as well as the upstream 2.2 GA release it's based on, will maintain YARN API compatibility going forward. -Sandy On Tue, Oct