Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-11-30 Thread GuoQiang Li
+1 (non-binding) -- Original -- From: "Patrick Wendell"; Date: Sat, Nov 29, 2014 01:16 PM To: "dev@spark.apache.org"; Subject: [VOTE] Release Apache Spark 1.2.0 (RC1) Please vote on releasing the following candidate as Apache Spark version 1.2.0! The

Spurious test failures, testing best practices

2014-11-30 Thread Ryan Williams
In the course of trying to make contributions to Spark, I have had a lot of trouble running Spark's tests successfully. The main pain points I've experienced are: 1) frequent, spurious test failures 2) high latency of running tests 3) difficulty running specific tests in an iterative f
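(For context, the Maven path a new contributor would typically start from, per the building-with-maven docs of the time, looks roughly like the sketch below; the -DwildcardSuites flag belongs to the scalatest-maven-plugin, and the module and suite names are illustrative.)

    mvn -DskipTests clean package    # full build, no tests
    mvn test                         # entire suite: the high-latency path described above
    mvn -pl core -DwildcardSuites=org.apache.spark.rdd.RDDSuite test    # one suite in one module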

Re: Spurious test failures, testing best practices

2014-11-30 Thread York, Brennon
+1, you aren't alone in this. I certainly would like some clarity in these things as well, but, as its been said on this listserv a few times (and you noted), most developers use `sbt` for their day-to-day compilations to greatly speed up the iterative testing process. I personally use `sbt` for all b
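(A minimal sketch of that day-to-day sbt loop, assuming Spark's bundled launcher; the ~ prefix is sbt's standard continuous-execution trigger, and test-quick is sbt 0.13's testQuick task.)

    sbt/sbt          # start the interactive console once
    > ~compile       # recompile automatically on each source change
    > ~test-quick    # re-run only the tests affected by that change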

Re: Spurious test failures, testing best practices

2014-11-30 Thread Matei Zaharia
Hi Ryan, As a tip (and maybe this isn't documented well), I normally use SBT for development to avoid the slow build process, and use its interactive console to run only specific tests. The nice advantage is that SBT can keep the Scala compiler loaded and JITed across builds, making it faster t
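(A sketch of the session being described, matching the commands quoted later in this thread; the suite and test names are just examples, and -z is ScalaTest's substring filter, passed through test-only's -- separator.)

    sbt/sbt
    > core/test-only org.apache.spark.rdd.RDDSuite
    > core/test-only org.apache.spark.rdd.RDDSuite -- -z "basic operations"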

Re: Spurious test failures, testing best practices

2014-11-30 Thread Ryan Williams
thanks for the info, Matei and Brennon. I will try to switch my workflow to using sbt. Other potential action items: - currently the docs only contain information about building with maven, and even then don't cover many important cases, as I described in my previous email. If SBT is as much bette

Re: Spurious test failures, testing best practices

2014-11-30 Thread Patrick Wendell
Hey Ryan, A few more things here. You should feel free to send patches to Jenkins to test them, since this is the reference environment in which we regularly run tests. This is the normal workflow for most developers and we spend a lot of effort provisioning/maintaining a very large jenkins cluste
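(The flow being described is the ordinary GitHub one; a sketch, with illustrative branch and message names. Once a pull request is open against apache/spark, the Jenkins pull request builder picks it up without further action.)

    git checkout -b fix-flaky-test
    git commit -am "SPARK-XXXX: fix flaky test"    # JIRA number illustrative
    git push origin fix-flaky-test
    # then open a pull request against apache/spark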

Re: Spurious test failures, testing best practices

2014-11-30 Thread Nicholas Chammas
> - currently the docs only contain information about building with maven, and even then don’t cover many important cases
All other points aside, I just want to point out that the docs document both how to use Maven and SBT and clearly state

Re: Spurious test failures, testing best practices

2014-11-30 Thread Mark Hamstra
> - Start the SBT interactive console with sbt/sbt
> - Build your assembly by running the "assembly" target in the assembly project: assembly/assembly
> - Run all the tests in one module: core/test
> - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite (this also supports tab

Re: Spurious test failures, testing best practices

2014-11-30 Thread Ryan Williams
Thanks Nicholas, glad to hear that some of this info will be pushed to the main site soon, but this brings up yet another point of confusion that I've struggled with, namely whether the documentation on github or that on spark.apache.org should be considered the primary reference for people seeking

Re: Spurious test failures, testing best practices

2014-11-30 Thread Patrick Wendell
Hey Ryan, The existing JIRA also covers publishing nightly docs: https://issues.apache.org/jira/browse/SPARK-1517 - Patrick On Sun, Nov 30, 2014 at 5:53 PM, Ryan Williams wrote: > Thanks Nicholas, glad to hear that some of this info will be pushed to the > main site soon, but this brings up yet

Re: Spurious test failures, testing best practices

2014-11-30 Thread Patrick Wendell
Btw - the documentation on github represents the source code of our docs, which is versioned with each release. Unfortunately github will always try to render ".md" files so it could look to a passerby like this is supposed to represent published docs. This is a feature limitation of github, AFAIK

Re: Spurious test failures, testing best practices

2014-11-30 Thread Ryan Williams
Thanks Mark, most of those commands are things I've been using and used in my original post except for "Start zinc". I now see the section about it on the "unpublished" building-spark page and wil
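(For reference, a sketch of the zinc setup that page describes, assuming a zinc binary on the PATH; once the server is running, the scala-maven-plugin used by the Maven build finds it automatically.)

    zinc -start                      # launch the long-lived incremental-compilation server
    mvn -DskipTests clean package    # later Maven builds reuse the warm compiler
    zinc -shutdown                   # stop the server when finished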

Re: [RESULT] [VOTE] Designating maintainers for some Spark components

2014-11-30 Thread Matei Zaharia
An update on this: After adding the initial maintainer list, we got feedback to add more maintainers for some components, so we added four others (Josh Rosen for core API, Mark Hamstra for scheduler, Shivaram Venkataraman for MLlib and Xiangrui Meng for Python). We also decided to lower the "tim

Re: Spurious test failures, testing best practices

2014-11-30 Thread Ryan Williams
Thanks Patrick, great to hear that docs-snapshots-via-jenkins is already JIRA'd; you can interpret some of this thread as a gigantic +1 from me on prioritizing that, which it looks like you are doing :) I do understand the limitations of the "github vs. official site" status quo; I was mostly resp

Re: Spurious test failures, testing best practices

2014-11-30 Thread Ganelin, Ilya
Hi, Patrick - with regards to testing on Jenkins, is the process for this to submit a pull request for the branch, or is there another interface we can use to submit a build to Jenkins for testing? On 11/30/14, 6:49 PM, "Patrick Wendell" wrote: >Hey Ryan, > >A few more things here. You should fee

Re: Spurious test failures, testing best practices

2014-11-30 Thread Patrick Wendell
Hi Ilya - you can just submit a pull request and the way we test them is to run it through jenkins. You don't need to do anything special. On Sun, Nov 30, 2014 at 8:57 PM, Ganelin, Ilya wrote: > Hi, Patrick - with regards to testing on Jenkins, is the process for this > to submit a pull request f
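(A practical corollary, since Jenkins keys off the pull request itself: pushing further commits to the same branch retriggers a test run. A sketch, names illustrative.)

    git commit -am "address review comments"
    git push origin fix-flaky-test    # the PR builder re-tests the updated branch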

Re: [mllib] Which is the correct package to add a new algorithm?

2014-11-30 Thread Yu Ishikawa
Hi Joseph, Thank you for your nice work and telling us the draft! > During the next development cycle, new algorithms should be contributed to > spark.mllib. Optionally, wrappers for new (and old) algorithms can be > contributed to spark.ml. I understand that we should contribute new algori
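(To make the split concrete, a hedged Scala sketch of the two styles being contrasted, as of the 1.2-era APIs; names follow the public API of the time but are shown only as illustration. spark.mllib exposes train-style entry points over RDDs, while the alpha spark.ml package wraps algorithms as pipeline Estimators.)

    // spark.mllib: the algorithm itself, invoked directly on an RDD[LabeledPoint]
    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
    val model = LogisticRegressionWithSGD.train(labeledPoints, 100)

    // spark.ml: a thin wrapper exposing the same algorithm as an Estimator
    import org.apache.spark.ml.classification.LogisticRegression
    val lr = new LogisticRegression().setMaxIter(100)
    val fitted = lr.fit(training)    // training: a SchemaRDD in Spark 1.2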