Re: Guidance for becoming Spark contributor

2015-04-10 Thread Nicholas Chammas
Have you reviewed this guide? https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark Nick On Fri, Apr 10, 2015 at 7:29 PM Nitin Mathur ntnmat...@gmail.com wrote: Hi Spark Dev Team, I want to start contributing to Spark Open source. This is the first time I will be doing

Contributor CLAs

2015-04-07 Thread Nicholas Chammas
I've seen many other OSS projects ask contributors to sign CLAs. I've never seen us do that. I assume it's not an issue, since people opening PRs generally understand what it means. But legally I'm sure there's some danger in taking an implied vs. explicit license to do something. So: Do we need

Re: Contributor CLAs

2015-04-07 Thread Nicholas Chammas
made a contribution, didn't state anything about the license, but did not intend somehow that the work could be licensed as the rest of the project is. For reference Apache projects do not in general require a CLA. On Tue, Apr 7, 2015 at 8:59 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote

Windows CI infra

2015-04-03 Thread Nicholas Chammas
I've seen other projects use Appveyor http://www.appveyor.com/ for CI on Windows. Has anyone used them before? I've seen on more than one occasion something break on Windows without us knowing, so it might be worth looking into using something like this if it's relatively straightforward. Nick

Test all the things (Was: Unit test logs in Jenkins?)

2015-04-02 Thread Nicholas Chammas
:54 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: This is secondary to Marcelo’s question, but I wanted to comment on this: Its main limitation is more cultural than technical: you need to get people to care about intermittent test runs, otherwise you can end up with failures

Re: spark-ec2 default to Hadoop 2

2015-03-02 Thread Nicholas Chammas
shift towards 2.x at least as defaults. On Sun, Mar 1, 2015 at 10:59 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: https://github.com/apache/spark/blob/fd8d283eeb98e310b1e85ef8c3a8af9e547ab5e0/ec2/spark_ec2.py#L162-L164 Is there any reason we shouldn't update the default Hadoop

spark-ec2 default to Hadoop 2

2015-03-01 Thread Nicholas Chammas
https://github.com/apache/spark/blob/fd8d283eeb98e310b1e85ef8c3a8af9e547ab5e0/ec2/spark_ec2.py#L162-L164 Is there any reason we shouldn't update the default Hadoop major version in spark-ec2 to 2? Nick

Re: [jenkins infra -- pls read ] installing anaconda, moving default python from 2.6 - 2.7

2015-02-23 Thread Nicholas Chammas
The first concern for Spark will probably be to ensure that we still build and test against Python 2.6, since that's the minimum version of Python we support. Otherwise this seems OK. We use numpy and other Python packages in PySpark, but I don't think we're pinned to any particular version of

Re: Improving metadata in Spark JIRA

2015-02-22 Thread Nicholas Chammas
it advance the house-cleaning a bit more, but I'm sure we'd rediscover some important work and issues that need attention. On Sun, Feb 22, 2015 at 7:54 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: As of right now, there are no more open JIRA issues without an assigned component

Git Achievements

2015-02-22 Thread Nicholas Chammas
For fun: http://acha-acha.co/#/repo/https://github.com/apache/spark I just added Spark to this site. Some of these “achievements” are hilarious. Leo Tolstoy: More than 10 lines in a commit message Dangerous Game: Commit after 6PM Friday Nick

Re: textFile() ordering and header rows

2015-02-22 Thread Nicholas Chammas
I guess on a technicality the docs just say first item in this RDD, not first line in the source text file. AFAIK there is no way apart from filtering to remove header lines http://stackoverflow.com/a/24734612/877069. As long as first() always returns the same value for a given RDD, I think it's

Re: Improving metadata in Spark JIRA

2015-02-21 Thread Nicholas Chammas
for the cleanup! Nick On Sat Feb 07 2015 at 8:29:42 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: Oh derp, missed the YARN component. JIRA does allow admins to make fields mandatory: https://confluence.atlassian.com/display/JIRA/Specifying+Field

Re: Building Spark with Pants

2015-02-14 Thread Nicholas Chammas
FYI: Here is the matching discussion over on the Pants dev list. https://groups.google.com/forum/#!topic/pants-devel/rTaU-iIOIFE On Mon Feb 02 2015 at 4:50:33 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: To reiterate, I'm asking from

numpy on PyPy - potential benefit to PySpark

2015-02-11 Thread Nicholas Chammas
Random question for the PySpark and Python experts/enthusiasts on here: How big of a deal would it be for PySpark and PySpark users if you could run numpy on PyPy? PySpark already supports running on PyPy https://github.com/apache/spark/pull/2144, but libraries like MLlib that use numpy are not

Re: 1.2.1 start-all.sh broken?

2015-02-11 Thread Nicholas Chammas
Found it: https://github.com/apache/spark/compare/v1.2.0...v1.2.1#diff-73058f8e51951ec0b4cb3d48ade91a1fR73 GRRR BASH WORD SPLITTING My path has a space in it... Nick On Wed Feb 11 2015 at 2:37:39 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: This is what I get: spark-1.2.1-bin

Re: 1.2.1 start-all.sh broken?

2015-02-11 Thread Nicholas Chammas
lol yeah, I changed the path for the email... turned out to be the issue itself. On Wed Feb 11 2015 at 2:43:09 PM Ted Yu yuzhih...@gmail.com wrote: I see. '/path/to/spark-1.2.1-bin-hadoop2.4' didn't contain space :-) On Wed, Feb 11, 2015 at 2:41 PM, Nicholas Chammas nicholas.cham

Re: 1.2.1 start-all.sh broken?

2015-02-11 Thread Nicholas Chammas
The tragic thing here is that I was asked to review the patch that introduced this https://github.com/apache/spark/pull/3377#issuecomment-68077315, and totally missed it... :( On Wed Feb 11 2015 at 2:46:35 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: lol yeah, I changed the path

Re: 1.2.1 start-all.sh broken?

2015-02-11 Thread Nicholas Chammas
got the following working (against a directory with space in its name): #!/usr/bin/env bash OLDIFS=$IFS # save it IFS= # don't split on any white space dir=$1/* for f in $dir; do cat $f done IFS=$OLDIFS # restore IFS Cheers On Wed, Feb 11, 2015 at 2:47 PM, Nicholas Chammas
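For reference, a minimal sketch of the word-splitting problem discussed in this thread and of the quoting alternative to the IFS workaround quoted above. The directory name, file names, and the count_args helper are hypothetical, made up for illustration; this is not the actual fix that went into the Spark scripts. It assumes bash (or another POSIX sh) and a temp directory whose own path contains no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical directory whose name contains a space, created under a temp dir.
workdir="$(mktemp -d)"
dir="$workdir/spark 1.2.1"
mkdir -p "$dir"
printf 'a\n' > "$dir/one.txt"
printf 'b\n' > "$dir/two.txt"

# Helper (illustration only) that reports how many arguments it received.
count_args() { echo "$#"; }

# Unquoted expansion: the shell splits the path at the space into two words.
unquoted_count=$(count_args $dir)

# Quoted expansion: the path stays a single word.
quoted_count=$(count_args "$dir")

# Quote the directory but keep the glob outside the quotes so it still expands.
contents=""
for f in "$dir"/*; do
  contents="$contents$(cat "$f")"
done

echo "unquoted=$unquoted_count quoted=$quoted_count contents=$contents"
rm -rf "$workdir"
```

Quoting each expansion avoids having to save and restore IFS, at the cost of having to remember the quotes everywhere the path is used.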

Re: 1.2.1 start-all.sh broken?

2015-02-11 Thread Nicholas Chammas
-hadoop2.4.0.jar FYI On Wed, Feb 11, 2015 at 2:27 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: I just downloaded 1.2.1 pre-built for Hadoop 2.4+ and ran sbin/start-all.sh on my OS X. Failed to find Spark assembly in /path/to/spark-1.2.1-bin-hadoop2.4/lib You need to build Spark

1.2.1 start-all.sh broken?

2015-02-11 Thread Nicholas Chammas
I just downloaded 1.2.1 pre-built for Hadoop 2.4+ and ran sbin/start-all.sh on my OS X. Failed to find Spark assembly in /path/to/spark-1.2.1-bin-hadoop2.4/lib You need to build Spark before running this program. Did the same for 1.2.0 and it worked fine. Nick

Re: Keep or remove Debian packaging in Spark?

2015-02-09 Thread Nicholas Chammas
+1 to an official deprecation + redirecting users to some other project that will or already is taking this on. Nate? On Mon Feb 09 2015 at 10:08:27 AM Patrick Wendell pwend...@gmail.com wrote: I have wondered whether we should sort of deprecate it more officially, since otherwise I think

Re: Improving metadata in Spark JIRA

2015-02-08 Thread Nicholas Chammas
: I think we already have a YARN component. https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20component%20%3D%20YARN I don't think JIRA allows it to be mandatory, but if it does, that would be useful. On Sat, Feb 7, 2015 at 5:08 PM, Nicholas Chammas nicholas.cham

Re: Improving metadata in Spark JIRA

2015-02-08 Thread Nicholas Chammas
at 11:53 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Do we need some new components to be added to the JIRA project? Like: - scheduler - YARN - spark-submit - ...? Nick On Fri Feb 06 2015 at 10:50:41 AM Nicholas Chammas nicholas.cham

Re: Using CUDA within Spark / boosting linear algebra

2015-02-08 Thread Nicholas Chammas
Lemme butt in randomly here and say there is an interesting discussion on this Spark PR https://github.com/apache/spark/pull/4448 about netlib-java, JBLAS, Breeze, and other things I know nothing of, that y'all may find interesting. Among the participants is the author of netlib-java. On Sun Feb

Re: Improving metadata in Spark JIRA

2015-02-06 Thread Nicholas Chammas
Do we need some new components to be added to the JIRA project? Like: - scheduler - YARN - spark-submit - …? Nick On Fri Feb 06 2015 at 10:50:41 AM Nicholas Chammas nicholas.cham...@gmail.com wrote: +9000 on cleaning up JIRA. Thank you Sean for laying out some

PSA: Maven supports parallel builds

2015-02-05 Thread Nicholas Chammas
Y’all may already know this, but I haven’t seen it mentioned anywhere in our docs on here and it’s a pretty easy win. Maven supports parallel builds https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3 with the -T command line option. For example: ./build/mvn -T 1C

Re: [VOTE] Release Apache Spark 1.2.1 (RC3)

2015-02-03 Thread Nicholas Chammas
I believe this was changed for 1.2.1. Here are the relevant JIRA issues https://issues.apache.org/jira/browse/SPARK-5289?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%201.2.1%20AND%20text%20~%20%22publish%22%20order%20by%20priority . On Tue Feb 03 2015 at 10:43:59 AM Dirceu Semighini Filho

Re: Welcoming three new committers

2015-02-03 Thread Nicholas Chammas
Congratulations guys! On Tue Feb 03 2015 at 2:36:12 PM Matei Zaharia matei.zaha...@gmail.com wrote: Hi all, The PMC recently voted to add three new committers: Cheng Lian, Joseph Bradley and Sean Owen. All three have been major contributors to Spark in the past year: Cheng on Spark SQL,

Re: Building Spark with Pants

2015-02-02 Thread Nicholas Chammas
for sbt and with a little bit of tweaking with maven as well. 2015-02-02 16:25 GMT-08:00 Nicholas Chammas nicholas.cham...@gmail.com: Does anyone here have experience with Pants http://pantsbuild.github.io/index.html or interest in trying to build Spark with it? Pants has an interesting story

Re: Building Spark with Pants

2015-02-02 Thread Nicholas Chammas
2015 at 4:40:45 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: I'm asking from an experimental standpoint; this is not happening anytime soon. Of course, if the experiment turns out very well, Pants would replace both sbt and Maven (like it has at Twitter, for example). Pants also works

Building Spark with Pants

2015-02-02 Thread Nicholas Chammas
Does anyone here have experience with Pants http://pantsbuild.github.io/index.html or interest in trying to build Spark with it? Pants has an interesting story. It was born at Twitter to help them build their Scala, Java, and Python projects as several independent components in one monolithic

Re: [VOTE] Release Apache Spark 1.2.1 (RC2)

2015-01-31 Thread Nicholas Chammas
Do we have any open JIRA issues to add automated testing on Windows to Jenkins? I assume that's something we want to do. On Sat Jan 31 2015 at 10:37:42 PM Matei Zaharia matei.zaha...@gmail.com wrote: This looks like a pretty serious problem, thanks! Glad people are testing on Windows. Matei

Re: Extending Scala style checks

2015-01-28 Thread Nicholas Chammas
...@databricks.com wrote: Thanks. I added one. On Wed, Oct 8, 2014 at 8:49 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: I've created SPARK-3849: Automate remaining Scala style rules https://issues.apache.org/jira/browse/SPARK-3849. Please create sub-tasks on this issue for rules

Standardized Spark dev environment

2015-01-20 Thread Nicholas Chammas
What do y'all think of creating a standardized Spark development environment, perhaps encoded as a Vagrantfile, and publishing it under `dev/`? The goal would be to make it easier for new developers to get started with all the right configs and tools pre-installed. If we use something like

Re: Standardized Spark dev environment

2015-01-20 Thread Nicholas Chammas
Message - From: Nicholas Chammas nicholas.cham...@gmail.com To: Spark dev list dev@spark.apache.org Sent: Tuesday, January 20, 2015 6:13:31 PM Subject: Standardized Spark dev environment What do y'all think of creating a standardized Spark development environment, perhaps encoded

Re: Results of tests

2015-01-09 Thread Nicholas Chammas
Just created: Integrate Python unit tests into Jenkins https://issues.apache.org/jira/browse/SPARK-5178 Nick On Fri Jan 09 2015 at 2:48:48 PM Josh Rosen rosenvi...@gmail.com wrote: The Test Result pages for Jenkins builds shows some nice statistics for the test run, including individual

Re: Spark development with IntelliJ

2015-01-08 Thread Nicholas Chammas
Side question: Should this section https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-IDESetup in the wiki link to Useful Developer Tools https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools? On Thu Jan 08 2015 at 6:19:55 PM Sean Owen

Re: Sample Spark Program Error

2014-12-30 Thread Nicholas Chammas
You sent this to the dev list. Please send it instead to the user list. We use the dev list to discuss development on Spark itself, new features, fixes to known bugs, and so forth. The user list is to discuss issues using Spark, which I believe is what you are looking for. Nick On Tue Dec 30

Re: ANNOUNCE: New build script ./build/mvn

2014-12-27 Thread Nicholas Chammas
Linkies for the curious: - SPARK-4501 https://issues.apache.org/jira/browse/SPARK-4501: Create build/mvn to automatically download maven/zinc/scalac - https://github.com/apache/spark/pull/3707 - New build folder (mvn and sbt): https://github.com/apache/spark/tree/master/build Nick

SQL specification for reference during Spark SQL development

2014-12-26 Thread Nicholas Chammas
Do we have access to the SQL specification (say, SQL-92) for reference during Spark SQL development? I know it's not freely available on the web. Usually, you can only access drafts. I know that, generally, we look to other systems (especially Hive) when figuring out how something in Spark SQL

Re: Starting with Spark

2014-12-25 Thread Nicholas Chammas
://github.com/apache/spark Search with Build Spark with Maven On Thu, Dec 25, 2014 at 1:49 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: The correct docs link is: https://spark.apache.org/docs/1.2.0/building-spark.html Where did you get that bad link from? Nick On Thu Dec 25

Re: Starting with Spark

2014-12-24 Thread Nicholas Chammas
The correct docs link is: https://spark.apache.org/docs/1.2.0/building-spark.html Where did you get that bad link from? Nick On Thu Dec 25 2014 at 12:00:53 AM Naveen Madhire vmadh...@umail.iu.edu wrote: Hi All, I am starting to use Spark. I am having trouble getting the latest code from

Re: Announcing Spark Packages

2014-12-22 Thread Nicholas Chammas
a close look at this and I think we're in good shape here vis-a-vis this policy. - Patrick On Mon, Dec 22, 2014 at 5:29 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Hitesh, From your link: You may not use ASF trademarks such as Apache or ApacheFoo or Foo in your own

Re: [ANNOUNCE] Requiring JIRA for inclusion in release credits

2014-12-22 Thread Nicholas Chammas
Does this include contributions made against the spark-ec2 https://github.com/mesos/spark-ec2 repo? On Wed Dec 17 2014 at 12:29:19 AM Patrick Wendell pwend...@gmail.com wrote: Hey All, Due to the very high volume of contributions, we're switching to an automated process for generating

Re: Handling stale PRs

2014-12-21 Thread Nicholas Chammas
Nicholas Chammas nicholas.cham...@gmail.com wrote: I recently came across this blog post, which reminded me of this thread. How to Discourage Open Source Contributions http://danluu.com/discourage-oss/ We are currently at 320+ open PRs, many of which haven't been updated in over a month. We

Re: Scala's Jenkins setup looks neat

2014-12-16 Thread Nicholas Chammas
to give us OAuth keys with repo:status access? Nick On Sat Sep 06 2014 at 1:29:53 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: Aww, that's a bummer... On Sat, Sep 6, 2014 at 1:10 PM, Reynold Xin r...@databricks.com wrote: that would require github hooks permission and unfortunately asf

Re: Scala's Jenkins setup looks neat

2014-12-16 Thread Nicholas Chammas
at 6:02 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: News flash! From the latest version of the GitHub API https://developer.github.com/v3/repos/statuses/: Note that the repo:status OAuth scope https://developer.github.com/v3/oauth/#scopes grants targeted access

Re: Scala's Jenkins setup looks neat

2014-12-16 Thread Nicholas Chammas
://issues.apache.org/jira/browse/INFRA-7918 On Tue, Dec 16, 2014 at 6:23 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Actually, reading through the existing issue opened for this https://issues.apache.org/jira/browse/INFRA-7367 back in February, I don’t see any explanation from ASF

Archiving XML test reports for analysis

2014-12-15 Thread Nicholas Chammas
Every time we run a test cycle on our Jenkins cluster, we generate hundreds of XML reports covering all the tests we have (e.g. `streaming/target/test-reports/org.apache.spark.streaming.util.WriteAheadLogSuite.xml`). These reports contain interesting information about whether tests succeeded or

Re: Archiving XML test reports for analysis

2014-12-15 Thread Nicholas Chammas
request builder? what others? On Mon, Dec 15, 2014 at 1:33 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Every time we run a test cycle on our Jenkins cluster, we generate hundreds of XML reports covering all the tests we have (e.g. `streaming/target/test-reports

Re: Spark JIRA Report

2014-12-14 Thread Nicholas Chammas
, 2014 at 11:31 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: What do y’all think of a report like this emailed out to the dev list on a monthly basis? The goal would be to increase visibility into our open issues and encourage developers to tend to our issue tracker more frequently

Re: Spark JIRA Report

2014-12-14 Thread Nicholas Chammas
https://issues.apache.org/jira/browse/SPARK-636: Add mechanism to run system management/configuration tasks on all workers Andrew, Does that seem more useful? Nick On Sun Dec 14 2014 at 3:20:54 AM Nicholas Chammas nicholas.cham...@gmail.com wrote: I formatted this report using

Spark JIRA Report

2014-12-13 Thread Nicholas Chammas
What do y’all think of a report like this emailed out to the dev list on a monthly basis? The goal would be to increase visibility into our open issues and encourage developers to tend to our issue tracker more frequently. Nick There are 1,236 unresolved issues

Re: Is Apache JIRA down?

2014-12-10 Thread Nicholas Chammas
Nevermind, seems to be back up now. On Wed Dec 10 2014 at 7:46:30 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: For example: https://issues.apache.org/jira/browse/SPARK-3431 Where do we report/track issues with JIRA itself being down? Nick

Re: Is this a little bug in BlockTransferMessage ?

2014-12-09 Thread Nicholas Chammas
So all this time the tests that Jenkins has been running via Jenkins and SBT + ScalaTest... those haven't been running any of the Java unit tests? SPARK-4159 https://issues.apache.org/jira/browse/SPARK-4159 only mentions Maven as a problem, but I'm wondering how these tests got through Jenkins

Re: Is this a little bug in BlockTransferMessage ?

2014-12-09 Thread Nicholas Chammas
on SPARK-4159. On Tue, Dec 9, 2014 at 11:30 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: So all this time the tests that Jenkins has been running via Jenkins and SBT + ScalaTest... those haven't been running any of the Java unit tests? SPARK-4159 only mentions Maven

Re: Handling stale PRs

2014-12-08 Thread Nicholas Chammas
went out to the dev list once a week that a) reported the number of stale PRs, and b) directly linked to the 5 least recently updated PRs? Nick On Sat Aug 30 2014 at 3:41:39 AM Nicholas Chammas nicholas.cham...@gmail.com wrote: On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell pwend...@gmail.com

Re: Handling stale PRs

2014-12-08 Thread Nicholas Chammas
-obvious things we (as contributors) could do to make the committers' lives easier? Thanks! -Ilya On 12/8/14, 11:58 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: I recently came across this blog post, which reminded me of this thread. How to Discourage Open Source Contributions

Re: Unit tests in 5 minutes

2014-12-06 Thread Nicholas Chammas
Ted, I posted some updates https://issues.apache.org/jira/browse/SPARK-3431?focusedCommentId=14236540&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14236540 on JIRA on my progress (or lack thereof) getting SBT to parallelize test suites properly. I'm currently stuck

zinc invocation examples

2014-12-04 Thread Nicholas Chammas
https://github.com/apache/spark/blob/master/docs/building-spark.md#speeding-up-compilation-with-zinc Could someone summarize how they invoke zinc as part of a regular build-test-etc. cycle? I'll add it in to the aforelinked page if appropriate. Nick

Re: zinc invocation examples

2014-12-04 Thread Nicholas Chammas
to do anything for each build. On Wed, Dec 3, 2014 at 3:44 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: https://github.com/apache/spark/blob/master/docs/building-spark.md#speeding-up-compilation-with-zinc Could someone summarize how they invoke zinc as part of a regular build

Re: Unit tests in 5 minutes

2014-12-04 Thread Nicholas Chammas
it (either on this thread or in the JIRA issue). Nick On Sun Sep 07 2014 at 8:28:51 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: On Fri, Aug 8, 2014 at 1:12 PM, Reynold Xin r...@databricks.com wrote: Nick, Would you like to file a ticket to track this? SPARK-3431 https

Re: Spurious test failures, testing best practices

2014-11-30 Thread Nicholas Chammas
- currently the docs only contain information about building with maven, and even then don’t cover many important cases All other points aside, I just want to point out that the docs document both how to use Maven and SBT and clearly state

Re: Time taken to merge Spark PR's?

2014-11-27 Thread Nicholas Chammas
1.1.1 was just released, and 1.2 is close to a release. That, plus Thanksgiving in the US (where most Spark committers AFAIK are located), probably means a temporary lull in committer activity on non-critical items should be expected. On Mon Nov 24 2014 at 9:33:27 AM York, Brennon

Re: Troubleshooting JVM OOM during Spark Unit Tests

2014-11-22 Thread Nicholas Chammas
, Nicholas Chammas nicholas.cham...@gmail.com wrote: Howdy folks, I’m trying to understand why I’m getting “insufficient memory” errors when trying to run Spark unit tests within a CentOS Docker container. I’m building Spark and running the tests as follows: # build sbt/sbt -Pyarn -Phadoop

Troubleshooting JVM OOM during Spark Unit Tests

2014-11-21 Thread Nicholas Chammas
Howdy folks, I’m trying to understand why I’m getting “insufficient memory” errors when trying to run Spark unit tests within a CentOS Docker container. I’m building Spark and running the tests as follows: # build sbt/sbt -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl -Phive

Re: mvn or sbt for studying and developing Spark?

2014-11-17 Thread Nicholas Chammas
The docs on using sbt are here: https://github.com/apache/spark/blob/master/docs/building-spark.md#building-with-sbt They'll be published with 1.2.0 presumably. On Mon, Nov 17, 2014 at 2:49 PM Michael Armbrust mich...@databricks.com wrote: * I moved from sbt to maven in June specifically due

Re: JIRA + PR backlog

2014-11-11 Thread Nicholas Chammas
Yeah, kudos to Josh for putting that together. On Tue, Nov 11, 2014 at 3:26 AM, Yu Ishikawa yuu.ishikawa+sp...@gmail.com wrote: Great jobs! I didn't know Spark PR Dashboard. Thanks Yu Ishikawa

Re: JIRA + PR backlog

2014-11-11 Thread Nicholas Chammas
or the wiki... On Tue, Nov 11, 2014 at 12:23 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Yeah, kudos to Josh for putting that together. On Tue, Nov 11, 2014 at 3:26 AM, Yu Ishikawa yuu.ishikawa+sp...@gmail.com wrote: Great jobs! I didn't know Spark PR Dashboard. Thanks

Re: Replacing Spark's native scheduler with Sparrow

2014-11-10 Thread Nicholas Chammas
On Sun, Nov 9, 2014 at 1:51 AM, Tathagata Das tathagata.das1...@gmail.com wrote: This causes a scalability vs. latency tradeoff - if your limit is 1000 tasks per second (simplifying from 1500), you could either configure it to use 100 receivers at 100 ms batches (10 blocks/sec), or 1000
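To make the tradeoff in the quoted message concrete, here is a small worked calculation. It is only a sketch, and it assumes that each block a receiver produces turns into one task; the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Assumption: each block produced by a receiver becomes one task.
task_limit=1000          # tasks per second (the simplified limit from the message)
receivers=100
block_interval_ms=100

# A receiver that emits one block every 100 ms produces 10 blocks per second.
blocks_per_sec=$(( 1000 / block_interval_ms ))

# Total task rate across all receivers.
tasks_per_sec=$(( receivers * blocks_per_sec ))

echo "blocks/sec per receiver: $blocks_per_sec"
echo "tasks/sec overall: $tasks_per_sec (limit: $task_limit)"
```

Under that assumption, 100 receivers at 100 ms already sit exactly at the 1000 tasks/sec limit, so adding receivers means lengthening the interval and vice versa.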

Re: EC2 clusters ready in launch time + 30 seconds

2014-11-08 Thread Nicholas Chammas
AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Thanks for posting that script, Patrick. It looks like a good place to start. Regarding Docker vs. Packer, as I understand it you can use Packer to create Docker containers at the same time as AMIs and other image types. Nick

Replacing Spark's native scheduler with Sparrow

2014-11-07 Thread Nicholas Chammas
I just watched Kay's talk from 2013 on Sparrow https://www.youtube.com/watch?v=ayjH_bG-RC0. Is replacing Spark's native scheduler with Sparrow still on the books? The Sparrow repo https://github.com/radlab/sparrow hasn't been updated recently, and I don't see any JIRA issues about it. It would

Re: Replacing Spark's native scheduler with Sparrow

2014-11-07 Thread Nicholas Chammas
If, for example, you have a cluster of 100 machines, this means the scheduler can launch 150 tasks per machine per second. Did you mean 15 tasks per machine per second here? Or alternatively, 10 machines? I don't know of any existing Spark clusters that have a large enough number of
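The arithmetic behind that question can be checked quickly. This sketch assumes the scheduler rate under discussion is the 1500 tasks per second figure mentioned elsewhere in this thread; the snippet above does not state it, so treat it as an assumption.

```shell
#!/usr/bin/env bash
# Assumed figure: 1500 tasks per second for the whole scheduler
# (taken from elsewhere in the thread, not from the snippet above).
scheduler_rate=1500

# Spread over 100 machines versus 10 machines.
per_machine_100=$(( scheduler_rate / 100 ))
per_machine_10=$(( scheduler_rate / 10 ))

echo "100 machines: $per_machine_100 tasks per machine per second"
echo "10 machines:  $per_machine_10 tasks per machine per second"
```

With that rate, 150 tasks per machine per second only works out for 10 machines, which is why the two readings are offered as alternatives.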

Re: Replacing Spark's native scheduler with Sparrow

2014-11-07 Thread Nicholas Chammas
Sounds good. I'm looking forward to tracking improvements in this area. Also, just to connect some more dots here, I just remembered that there is currently an initiative to add an IndexedRDD https://issues.apache.org/jira/browse/SPARK-2365 interface. Some interesting use cases mentioned there

Re: Replacing Spark's native scheduler with Sparrow

2014-11-07 Thread Nicholas Chammas
to complete in milliseconds. So it looks like I misunderstood the current cost of task initialization. It's already as low as 5ms (and not 100ms)? Nick On Fri, Nov 7, 2014 at 11:15 PM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote: On Fri, Nov 7, 2014 at 8:04 PM, Nicholas Chammas

Re: JIRA + PR backlog

2014-11-06 Thread Nicholas Chammas
I think better tooling will make it much easier for committers to trim the list of stale JIRA issues and PRs. Convenience enables action. - Spark PR Dashboard https://spark-prs.appspot.com/: Additional filters for stale PRs https://github.com/databricks/spark-pr-dashboard/issues/1 or PRs

Re: Using partitioning to speed up queries in Shark

2014-11-06 Thread Nicholas Chammas
Did you mean to send this to the user list? This is the dev list, where we discuss things related to development on Spark itself. On Thu, Nov 6, 2014 at 5:01 PM, Gordon Benjamin gordon.benjami...@gmail.com wrote: Hi All, I'm using Spark/Shark as the foundation for some reporting that I'm

Re: Surprising Spark SQL benchmark

2014-11-05 Thread Nicholas Chammas
On Fri, Oct 31, 2014 at 3:45 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: I believe that benchmark has a pending certification on it. See http://sortbenchmark.org under Process. Regarding this comment, Reynold has just announced that this benchmark is now certified

Re: Surprising Spark SQL benchmark

2014-11-05 Thread Nicholas Chammas
Steve Nunez, I believe the information behind the links below should address your concerns earlier about Databricks's submission to the Daytona Gray benchmark. On Wed, Nov 5, 2014 at 6:43 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: On Fri, Oct 31, 2014 at 3:45 PM, Nicholas Chammas

Re: Surprising Spark SQL benchmark

2014-11-05 Thread Nicholas Chammas
forgetting to mention that the last record was held by a 2001 Toyota Celica. - Steve From: Nicholas Chammas nicholas.cham...@gmail.com Date: Wednesday, November 5, 2014 at 15:56 To: Steve Nunez snu...@hortonworks.com Cc: Patrick Wendell pwend...@gmail.com, dev dev@spark.apache.org Subject: Re

create_image.sh contains broken hadoop web link

2014-11-05 Thread Nicholas Chammas
As part of my work for SPARK-3821 https://issues.apache.org/jira/browse/SPARK-3821, I tried building an AMI today using create_image.sh. This line https://github.com/mesos/spark-ec2/blob/f6773584dd71afc49f1225be48439653313c0341/create_image.sh#L68 appears to be broken now (it wasn’t a week or so

Re: create_image.sh contains broken hadoop web link

2014-11-05 Thread Nicholas Chammas
? http://search-hadoop.com/m/LgpTk2Pnw6O/andrew+apache+mirror&subj=Re+All+mirrored+download+links+from+the+Apache+Hadoop+site+are+broken Cheers On Wed, Nov 5, 2014 at 7:36 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: As part of my work for SPARK-3821 https://issues.apache.org/jira

Re: create_image.sh contains broken hadoop web link

2014-11-05 Thread Nicholas Chammas
Yup, I just stumbled on that. I'll submit a PR to fix that link. Thanks Ted. On Wed, Nov 5, 2014 at 11:13 PM, Ted Yu yuzhih...@gmail.com wrote: The artifacts are in archive: http://archive.apache.org/dist/hadoop/common/hadoop-2.4.1/ Cheers On Nov 5, 2014, at 8:07 PM, Nicholas Chammas

Re: Build fails on master (f90ad5d)

2014-11-04 Thread Nicholas Chammas
FWIW, the official build instructions are here: https://github.com/apache/spark#building-spark On Tue, Nov 4, 2014 at 5:11 PM, Ted Yu yuzhih...@gmail.com wrote: I built based on this commit today and the build was successful. What command did you use ? Cheers On Tue, Nov 4, 2014 at 2:08

Re: Build fails on master (f90ad5d)

2014-11-04 Thread Nicholas Chammas
is not available at port 3030 - reverting to normal incremental compile Alex On Tue, Nov 4, 2014 at 3:11 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: FWIW, the official build instructions are here: https://github.com/apache/spark#building-spark On Tue, Nov 4, 2014 at 5:11 PM, Ted Yu

Re: Build fails on master (f90ad5d)

2014-11-04 Thread Nicholas Chammas
in downloading and building dependencies. Anyway, if sbt is supported it would be great to add docs about somewhere, especially since, as you point out, most devs are using it. Thanks for your help. Alex On Tue, Nov 4, 2014 at 5:42 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote

Re: branch-1.2 has been cut

2014-11-03 Thread Nicholas Chammas
Minor question, but when would be the right time to update the default Spark version https://github.com/apache/spark/blob/76386e1a23c55a58c0aeea67820aab2bac71b24b/ec2/spark_ec2.py#L42 in the EC2 script? On Mon, Nov 3, 2014 at 3:55 AM, Patrick Wendell pwend...@gmail.com wrote: Hi All, I've

Re: Surprising Spark SQL benchmark

2014-11-01 Thread Nicholas Chammas
into documenting it, it's still hard to reproduce :( On Friday, October 31, 2014, Nicholas Chammas nicholas.cham...@gmail.com wrote: I believe that benchmark has a pending certification on it. See http://sortbenchmark.org under Process. It's true

Re: Surprising Spark SQL benchmark

2014-11-01 Thread Nicholas Chammas
.; we'll share the code on the list as soon as we're done. -Kay On Fri, Oct 31, 2014 at 12:45 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: I believe that benchmark has a pending certification on it. See http

Re: Surprising Spark SQL benchmark

2014-10-31 Thread Nicholas Chammas
to this or any other similar vendor benchmark. - Patrick On Fri, Oct 31, 2014 at 10:38 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: I know we don't want to be jumping at every benchmark someone posts out there, but this one surprised me: http://www.citusdata.com

Re: Surprising Spark SQL benchmark

2014-10-31 Thread Nicholas Chammas
and publish enough information and code to let others repeat the exercise easily. - Steve On 10/31/14, 11:30, Nicholas Chammas nicholas.cham...@gmail.com wrote: Thanks for the response, Patrick. I guess the key takeaways are 1) the tuning/config details are everything

Building and Running Spark on OS X

2014-10-20 Thread Nicholas Chammas
If one were to put together a short but comprehensive guide to setting up Spark to run locally on OS X, would it look like this? # Install Maven. On OS X, we suggest using Homebrew. brew install maven # Set some important Java and Maven environment variables. export

Re: Building and Running Spark on OS X

2014-10-20 Thread Nicholas Chammas
usually use SBT on Mac and that one doesn't require any setup ... On Mon, Oct 20, 2014 at 4:43 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: If one were to put together a short but comprehensive guide to setting up Spark to run locally on OS X, would it look like this? # Install

Re: Building and Running Spark on OS X

2014-10-20 Thread Nicholas Chammas
even have to brew install it. Surely SBT isn't in the dev tools even? I recall I had to install it. I'd be surprised to hear it required zero setup. On Mon, Oct 20, 2014 at 8:04 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Yeah, I would use sbt too, but I thought if I wanted to publish

Re: Building and Running Spark on OS X

2014-10-20 Thread Nicholas Chammas
So back to my original question... :) If we wanted to post this guide to the user list or to a gist for easy reference, would we rather have Maven or SBT listed? And is there anything else about the steps that should be modified? Nick On Mon, Oct 20, 2014 at 8:25 PM, Sean Owen

Using Docker to Parallelize Tests

2014-10-17 Thread Nicholas Chammas
https://news.ycombinator.com/item?id=8471812 The parent thread has lots of interesting use cases for Docker, and the linked comment seems most relevant to our testing predicament. I might look into this after I finish something presentable with Packer and our EC2 scripts, but if anyone else is

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-16 Thread Nicholas Chammas
is that the frequency that these happen has decreased significantly (3 in the past ~18hr). seems like the git plugin downgrade has helped relieve the problem, but hasn't fixed it. i'll be looking in to this more today. On Wed, Oct 15, 2014 at 7:05 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-16 Thread Nicholas Chammas
On Thu, Oct 16, 2014 at 3:55 PM, shane knapp skn...@berkeley.edu wrote: i really, truly hate non-deterministic failures. Amen bruddah.

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-15 Thread Nicholas Chammas
I support this effort. :thumbsup: On Wed, Oct 15, 2014 at 4:52 PM, shane knapp skn...@berkeley.edu wrote: i'm going to be downgrading our git plugin (from 2.2.7 to 2.2.2) to see if that helps w/the git fetch timeouts. this will require a short downtime (~20 mins for builds to finish, ~20

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-15 Thread Nicholas Chammas
. :crossestoes: :) On Wed, Oct 15, 2014 at 2:19 PM, shane knapp skn...@berkeley.edu wrote: ok, we're up and building... :crossesfingersfortheumpteenthtime: On Wed, Oct 15, 2014 at 1:59 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: I support this effort. :thumbsup
