Notes on writing complex spark applications

2014-11-23 Thread Evan R. Sparks
Hi all, Shivaram Venkataraman, Joseph Gonzalez, Tomer Kaftan, and I have been working on a short document about writing high performance Spark applications based on our experience developing MLlib, GraphX, ml-matrix, pipelines, etc. It may be a useful document both for users and new Spark

Re: Notes on writing complex spark applications

2014-11-23 Thread andy petrella
Cool!

Re: Notes on writing complex spark applications

2014-11-23 Thread Sam Bessalah
Thanks Evan, this is great.

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-23 Thread Debasish Das
-1 from me... the same FetchFailed issue that Hector saw... I am running the Netflix dataset and dumping out recommendations for all users. It shuffles around 100 GB of data on disk to run a reduceByKey per user on utils.BoundedPriorityQueue... The code runs fine with the MovieLens1m dataset... I gave Spark 10
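The per-user top-K step mentioned above (a reduceByKey whose values are bounded priority queues) can be illustrated with a small plain-Python sketch. It does not use Spark's internal `utils.BoundedPriorityQueue`. Instead, it assumes a min-heap from `heapq` that is capped at k entries. The names `bounded_insert`, `merge_bounded` and `top_k_per_user` are hypothetical, and the dictionary loop stands in for the shuffle. In PySpark, the merge function would be the one passed to `reduceByKey`.

```python
import heapq

def bounded_insert(heap, item, k):
    """Push item into a min-heap, keeping only the k largest entries (assumes k >= 1)."""
    if len(heap) < k:
        heapq.heappush(heap, item)
    elif item > heap[0]:
        # Evict the current smallest entry and insert the new one in a single step.
        heapq.heapreplace(heap, item)
    return heap

def merge_bounded(a, b, k):
    """Merge two bounded heaps. Usable as a reduceByKey-style combine function."""
    merged = list(a)  # copying a valid heap yields a valid heap
    for item in b:
        bounded_insert(merged, item, k)
    return merged

def top_k_per_user(records, k):
    """Local stand-in for mapping each (user, item, score) to a one-element heap,
    then reducing by user. Returns each user's top-k (score, item) pairs, best first."""
    acc = {}
    for user, item, score in records:
        single = [(score, item)]
        acc[user] = merge_bounded(acc[user], single, k) if user in acc else single
    return {user: sorted(heap, reverse=True) for user, heap in acc.items()}
```

Because each merge keeps at most k entries per user, the per-key state stays bounded no matter how many items a user has been scored against.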

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-23 Thread Patrick Wendell
+1 (binding). Don't see any evidence of regressions at this point. The issue reported by Hector was not related to this release.

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-23 Thread Stephen Haberman
Hi, I wanted to try 1.1.1-rc2 because we're running into SPARK-3633, but the rc releases not being tagged with -rcX means the pre-built artifacts are basically useless to me. (Pedantically, to test a release, I have to upload it into our internal repo, to compile jobs, start clusters, etc.

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-23 Thread Matei Zaharia
Interesting, perhaps we could publish each one with two IDs, of which the rc one is unofficial. The problem is indeed that you have to vote on a hash for a potentially final artifact. Matei

Re: Notes on writing complex spark applications

2014-11-23 Thread Inkyu Lee
Very helpful! Thank you very much!

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-23 Thread Patrick Wendell
Hey Stephen, Thanks for bringing this up. Technically when we call a release vote it needs to be on the exact commit that will be the final release. However, one thing I've thought of doing for a while would be to publish the maven artifacts using a version tag with $VERSION-rcX even if the

Re: Notes on writing complex spark applications

2014-11-23 Thread Patrick Wendell
Hey Evan, It might be nice to merge this into existing documentation. In particular, a lot of this could serve to update the current tuning section and programming guides. It could also work to paste this wholesale as a reference for Spark users, but in that case it's less likely to get updated

2 spark streaming questions

2014-11-23 Thread tian zhang
Hi, Dear Spark Streaming Developers and Users, We are prototyping with Spark Streaming and have hit the following 2 issues, on which I would like to seek your expertise. 1) We have a Spark Streaming application in Scala that reads data from Kafka into a DStream, does some processing and output a

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-23 Thread Stephen Haberman
Awesome, sounds great, guys; thanks for understanding. Depending on how badly I need 1.1.1-rc2 (I'll check my jobs tomorrow), I'll just build a local version for now. Should be easy, it's just been a while. :-) Thanks, Stephen

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-23 Thread Stephen Haberman
http://maven.apache.org/plugins/maven-install-plugin/examples/specific-local-repo.html Hm, I didn't know about that plugin--assuming it does all of the jar/pom/sources/etc., then yes, that could work... At first glance, I'm not sure it'll bring over the pom with all of the transitive