Re: Which part of the code deals with communication?

2014-10-22 Thread Patrick Wendell
The best documentation about communication interfaces is the SecurityManager doc written by Tom Graves. With this as a starting point I'd recommend digging through the code for each component.

Re: scalastyle annoys me a little bit

2014-10-23 Thread Patrick Wendell
Hey Koert, I think disabling the style checks in maven package could be a good idea for the reason you point out. I was sort of mixed on that when it was proposed for this exact reason. It's just annoying to developers. In terms of changing the global limit, this is more religion than anything

Spark 1.2 feature freeze on November 1

2014-10-23 Thread Patrick Wendell
Hey All, Just a reminder that as planned [1] we'll go into a feature freeze on November 1. On that date I'll cut a 1.2 release branch and make the up-or-down call on any patches that go into that branch, along with individual committers. It is common for us to receive a very large volume of

Re: your weekly git timeout update! TL;DR: i'm now almost certain we're not hitting rate limits.

2014-10-24 Thread Patrick Wendell
Thanks for the update Shane. As a point of process, for things like this where we're debugging specific issues - can we use JIRA instead of notifying everyone on the spark-dev list? I'd prefer if ops/infra announcements on the dev list are restricted to things that are widely applicable to

Re: Moving PR Builder to mvn

2014-10-24 Thread Patrick Wendell
compilation to be much better than before I had it running. Is the sbt build still faster (sorry, long time since I did a build with sbt). Thanks, Hari On Fri, Oct 24, 2014 at 1:46 PM, Patrick Wendell pwend...@gmail.com wrote: Overall I think this would be a good idea. The main blocker

Re: Support Hive 0.13.1 in Spark SQL

2014-10-28 Thread Patrick Wendell
Hey Cheng, Right now we aren't using stable API's to communicate with the Hive Metastore. We didn't want to drop support for Hive 0.12 so right now we are using a shim layer to support compiling for 0.12 and 0.13. This is very costly to maintain. If Hive has a stable meta-data API for talking to

Re: HiveShim not found when building in Intellij

2014-10-28 Thread Patrick Wendell
Hey Stephen, In some cases in the maven build we now have pluggable source directories based on profiles using the maven build helper plug-in. This is necessary to support cross building against different Hive versions, and there will be additional instances of this due to supporting scala 2.11

Re: HiveShim not found when building in Intellij

2014-10-28 Thread Patrick Wendell
. Anyone who knows how to handle this - a quick note here would be appreciated. 2014-10-28 20:20 GMT-07:00 Patrick Wendell pwend...@gmail.com: Hey Stephen, In some cases in the maven build we now have pluggable source directories based on profiles using the maven build helper plug

Re: HiveShim not found when building in Intellij

2014-10-28 Thread Patrick Wendell
Patrick Wendell pwend...@gmail.com: Hey Stephen, In some cases in the maven build we now have pluggable source directories based on profiles using the maven build helper plug-in. This is necessary to support cross building against different Hive versions, and there will be additional

Re: HiveShim not found when building in Intellij

2014-10-28 Thread Patrick Wendell
to do this? On Tue, Oct 28, 2014 at 9:57 PM, Patrick Wendell pwend...@gmail.com wrote: I just started a totally fresh IntelliJ project importing from our root pom. I used all the default options and I added hadoop-2.4, hive, hive-0.13.1 profiles. I was able to run spark core tests from within

Re: HiveShim not found when building in Intellij

2014-10-28 Thread Patrick Wendell
Oops - I actually should have added v0.13.0 (i.e. to match whatever I did in the profile). On Tue, Oct 28, 2014 at 10:05 PM, Patrick Wendell pwend...@gmail.com wrote: Cheng - to make it recognize the new HiveShim for 0.12 I had to click on spark-hive under packages in the left pane, then go

Re: How to run tests properly?

2014-10-29 Thread Patrick Wendell
One thing is you need to do a maven package before you run tests. The local-cluster tests depend on Spark already being packaged. - Patrick On Wed, Oct 29, 2014 at 10:02 AM, Niklas Wilcke 1wil...@informatik.uni-hamburg.de wrote: Hi Sean, thanks for your reply. The tests still don't work. I

Re: How to run tests properly?

2014-10-30 Thread Patrick Wendell
packaging would only build the jar and place it in the target folder. How does that affect the tests? If tests depend on the assembly a mvn install would be more sensible to me. Probably I misunderstand the maven build life-cycle. Thanks, Niklas On 29.10.2014 19:01, Patrick Wendell wrote: One

Re: Surprising Spark SQL benchmark

2014-10-31 Thread Patrick Wendell
Hey Nick, Unfortunately Citus Data didn't contact any of the Spark or Spark SQL developers when running this. It is really easy to make one system look better than others when you are running a benchmark yourself because tuning and sizing can lead to a 10X performance improvement. This benchmark

Changes to Spark's networking subsystem

2014-11-01 Thread Patrick Wendell
== Short version == A recent commit replaces Spark's networking subsystem with one based on Netty rather than raw sockets. Users running off of master can disable this change by setting spark.shuffle.blockTransferService=nio. We will be testing with this during the QA period for Spark 1.2. The new
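The override named in this message can be supplied when a job is launched. Below is a minimal Python sketch, assuming the job is started with `spark-submit` and its `--conf key=value` option; the helper name `submit_args` and the application file `my_app.py` are hypothetical and used only for illustration.

```python
def submit_args(app, overrides):
    """Build a spark-submit argument list with one --conf flag per override."""
    args = ["spark-submit"]
    for key, value in sorted(overrides.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)  # hypothetical application file
    return args


# Select the older NIO-based transfer service, as the message describes.
args = submit_args("my_app.py", {"spark.shuffle.blockTransferService": "nio"})
print(" ".join(args))
```

The resulting list can be handed to `subprocess.run` as-is; building it as a list rather than a string avoids shell quoting problems.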

Re: sbt scala compiler crashes on spark-sql

2014-11-02 Thread Patrick Wendell
Does this happen if you clean and recompile? I've seen failures on and off, but haven't been able to find one that I could reproduce from a clean build such that we could hand it to the scala team. - Patrick On Sun, Nov 2, 2014 at 7:25 PM, Imran Rashid im...@therashids.com wrote: I'm finding

Re: sbt scala compiler crashes on spark-sql

2014-11-02 Thread Patrick Wendell
versa. A clean rebuild can always solve this. On Mon, Nov 3, 2014 at 11:28 AM, Patrick Wendell pwend...@gmail.com wrote: Does this happen if you clean and recompile? I've seen failures on and off, but haven't been able to find one that I could reproduce from a clean build such that we

branch-1.2 has been cut

2014-11-03 Thread Patrick Wendell
Hi All, I've just cut the release branch for Spark 1.2, consistent with the end of the scheduled feature window for the release. New commits to master will need to be explicitly merged into branch-1.2 in order to be in the release. This begins the transition into a QA period for Spark 1.2, with

Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Patrick Wendell
I'm a +1 on this as well, I think it will be a useful model as we scale the project in the future and recognizes some informal process we have now. To respond to Sandy's comment: for changes that fall in between the component boundaries or are straightforward, my understanding of this model is

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Patrick Wendell
I think new committers might or might not be maintainers (it would depend on the PMC vote). I don't think it would affect what you could merge, you can merge in any part of the source tree, you just need to get sign off if you want to touch a public API or make major architectural changes. Most

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Patrick Wendell
Hey Greg, Regarding subversion - I think the reference is to partial vs full committers here: https://subversion.apache.org/docs/community-guide/roles.html - Patrick On Thu, Nov 6, 2014 at 4:18 PM, Greg Stein gst...@gmail.com wrote: -1 (non-binding) This is an idea that runs COMPLETELY

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Patrick Wendell
In fact, if you look at the subversion committer list, the majority of people here have commit access only for particular areas of the project: http://svn.apache.org/repos/asf/subversion/trunk/COMMITTERS On Thu, Nov 6, 2014 at 4:26 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Greg

Re: Should new YARN shuffle service work with yarn-alpha?

2014-11-07 Thread Patrick Wendell
I bet it doesn't work. +1 on isolating its inclusion to only the newer YARN API's. - Patrick On Fri, Nov 7, 2014 at 11:43 PM, Sean Owen so...@cloudera.com wrote: I noticed that this doesn't compile: mvn -Pyarn-alpha -Phadoop-0.23 -Dhadoop.version=0.23.7 -DskipTests clean package [error]

Re: Should new YARN shuffle service work with yarn-alpha?

2014-11-08 Thread Patrick Wendell
refactoring needed? Either to support YARN alpha as a separate shuffle module, or sever this dependency? Of course this goes away when yarn-alpha goes away too. On Sat, Nov 8, 2014 at 7:45 AM, Patrick Wendell pwend...@gmail.com wrote: I bet it doesn't work. +1 on isolating its inclusion

Re: Should new YARN shuffle service work with yarn-alpha?

2014-11-08 Thread Patrick Wendell
. That makes yarn-alpha work. I'll run tests and open a quick JIRA / PR for the change. On Sat, Nov 8, 2014 at 8:23 AM, Patrick Wendell pwend...@gmail.com wrote: This second error is something else. Maybe you are excluding network-shuffle instead of spark-network-yarn

Re: JIRA + PR backlog

2014-11-11 Thread Patrick Wendell
I wonder if we should be linking to that dashboard somewhere from our official docs or the wiki... On Tue, Nov 11, 2014 at 12:23 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Yeah, kudos to Josh for putting that together. On Tue, Nov 11, 2014 at 3:26 AM, Yu Ishikawa

[NOTICE] [BUILD] Minor changes to Spark's build

2014-11-11 Thread Patrick Wendell
Hey All, I've just merged a patch that adds support for Scala 2.11 which will have some minor implications for the build. These are due to the complexities of supporting two versions of Scala in a single project. 1. The JDBC server will now require a special flag to build -Phive-thriftserver on

Re: [NOTICE] [BUILD] Minor changes to Spark's build

2014-11-12 Thread Patrick Wendell
, IntelliJ will temporarily think things like the Kafka module are being removed. Say 'no' when it asks if you want to remove them. - Can we go straight to Scala 2.11.4? On Wed, Nov 12, 2014 at 5:47 AM, Patrick Wendell pwend...@gmail.com wrote: Hey All, I've just merged a patch that adds

Re: [NOTICE] [BUILD] Minor changes to Spark's build

2014-11-12 Thread Patrick Wendell
scrapco...@gmail.com wrote: One thing we can do is print a helpful error and break. I don't know about how this can be done, but since now I can write groovy inside maven build so we have more control. (Yay!!) Prashant Sharma On Thu, Nov 13, 2014 at 12:05 PM, Patrick Wendell pwend

Re: [NOTICE] [BUILD] Minor changes to Spark's build

2014-11-12 Thread Patrick Wendell
: Currently there are no mandatory profiles required to build Spark. I.e. mvn package just works. It seems sad that we would need to break this. On Wed, Nov 12, 2014 at 10:59 PM, Patrick Wendell pwend...@gmail.com wrote: I think printing an error that says -Pscala-2.10 must be enabled

Has anyone else observed this build break?

2014-11-14 Thread Patrick Wendell
A recent patch broke clean builds for me, I am trying to see how widespread this issue is and whether we need to revert the patch. The error I've seen is this when building the examples project: spark-examples_2.10: Could not resolve dependencies for project

Re: Has anyone else observed this build break?

2014-11-14 Thread Patrick Wendell
A work around for this fix is identified here: http://dbknickerbocker.blogspot.com/2013/04/simple-fix-to-missing-toolsjar-in-jdk.html However, if this affects more users I'd prefer to just fix it properly in our build. On Fri, Nov 14, 2014 at 12:17 PM, Patrick Wendell pwend...@gmail.com wrote

Re: Has anyone else observed this build break?

2014-11-14 Thread Patrick Wendell
this can fix it? Thanks, Hari On Fri, Nov 14, 2014 at 12:21 PM, Patrick Wendell pwend...@gmail.com wrote: A work around for this fix is identified here: http://dbknickerbocker.blogspot.com/2013/04/simple-fix-to-missing-toolsjar-in-jdk.html However, if this affects more users I'd prefer

Re: Has anyone else observed this build break?

2014-11-15 Thread Patrick Wendell
Server VM (build 24.60-b09, mixed mode) Let me see if the problem can be solved upstream in HBase hbase-annotations module. Cheers On Fri, Nov 14, 2014 at 12:32 PM, Patrick Wendell pwend...@gmail.com wrote: I think in this case we can probably just drop that dependency, so there is a simpler

Re: mvn or sbt for studying and developing Spark?

2014-11-16 Thread Patrick Wendell
Neither is strictly optimal which is why we ended up supporting both. Our reference build for packaging is Maven so you are less likely to run into unexpected dependency issues, etc. Many developers use sbt as well. It's somewhat religion and the best thing might be to try both and see which you

Re: [VOTE] Release Apache Spark 1.1.1 (RC1)

2014-11-17 Thread Patrick Wendell
Hey Kevin, If you are upgrading from 1.0.X to 1.1.X check out the upgrade notes here [1] - it could be that default changes caused a regression for your workload. Do you still see a regression if you restore the configuration changes? It's great to hear specifically about issues like this, so

Build break

2014-11-19 Thread Patrick Wendell
Hey All, Just a heads up. I merged this patch last night which caused the Spark build to break: https://github.com/apache/spark/commit/397d3aae5bde96b01b4968dde048b6898bb6c914 The patch itself was fine and previously had passed on Jenkins. The issue was that other intermediate changes merged

Spark development with IntelliJ

2014-11-20 Thread Patrick Wendell
Hi All, I noticed people sometimes struggle to get Spark set up in IntelliJ. I'd like to maintain comprehensive instructions on our Wiki to make this seamless for future developers. Due to some nuances of our build, getting to the point where you can build + test every module from within the IDE

Automated github closing of issues is not working

2014-11-21 Thread Patrick Wendell
After we merge pull requests in Spark they are closed via a special message we put in each commit description (Closes #XXX). This feature stopped working around 21 hours ago causing already-merged pull requests to display as open. I've contacted Github support with the issue. No word from them
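To make the "Closes #XXX" convention concrete, here is a small Python sketch of how such a marker could be pulled out of a commit description. It is an illustration only, not GitHub's implementation: the regular expression covers just the "Closes #NNN" form, and the commit message, JIRA key, and pull request number below are invented.

```python
import re

# Assumption: a simplified pattern for the "Closes #NNN" convention only.
CLOSES = re.compile(r"\bCloses\s+#(\d+)", re.IGNORECASE)


def closed_pull_requests(commit_message):
    """Return the pull request numbers a commit message asks to close."""
    return [int(number) for number in CLOSES.findall(commit_message)]


# Hypothetical commit description in the style the message describes.
message = "[SPARK-0000] Example change\n\nCloses #1234 from someuser/branch."
print(closed_pull_requests(message))
```

A check like this could be run over recent merge commits to list pull requests that should have been closed while the automatic closing was not working.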

Re: How spark and hive integrate in long term?

2014-11-22 Thread Patrick Wendell
There are two distinct topics when it comes to hive integration. Part of the 1.3 roadmap will likely be better defining the plan for Hive integration as Hive adds future versions. 1. Ability to interact with Hive metastore's from different versions == I.e. if a user has a metastore, can Spark SQL

Re: Apache infra github sync down

2014-11-22 Thread Patrick Wendell
Hi All, Unfortunately this went back down again. I've opened a new JIRA to track it: https://issues.apache.org/jira/browse/INFRA-8688 - Patrick On Tue, Nov 18, 2014 at 10:24 PM, Patrick Wendell pwend...@gmail.com wrote: Hey All, The Apache--github mirroring is not working right now

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-23 Thread Patrick Wendell
+1 (binding). Don't see any evidence of regressions at this point. The issue reported by Hector was not related to this release. On Sun, Nov 23, 2014 at 9:50 AM, Debasish Das debasish.da...@gmail.com wrote: -1 from me...same FetchFailed issue as what Hector saw... I am running Netflix dataset

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-23 Thread Patrick Wendell
Hey Stephen, Thanks for bringing this up. Technically when we call a release vote it needs to be on the exact commit that will be the final release. However, one thing I've thought of doing for a while would be to publish the maven artifacts using a version tag with $VERSION-rcX even if the

Re: Notes on writing complex spark applications

2014-11-23 Thread Patrick Wendell
Hey Evan, It might be nice to merge this into existing documentation. In particular, a lot of this could serve to update the current tuning section and programming guides. It could also work to paste this wholesale as a reference for Spark users, but in that case it's less likely to get updated

[VOTE] Release Apache Spark 1.2.0 (RC1)

2014-11-28 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.2.0! The tag to be voted on is v1.2.0-rc1 (commit 1056e9ec1): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=1056e9ec13203d0c51564265e94d77a054498fdb The release files, including signatures, digests, etc.

Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-11-29 Thread Patrick Wendell
Patrick, unfortunately you got some of the text here wrong, saying 1.1.0 instead of 1.2.0. Not sure it will matter since there can well be another RC after testing, but we should be careful. Matei On Nov 28, 2014, at 9:16 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing

Re: Trouble testing after updating to latest master

2014-11-29 Thread Patrick Wendell
Thanks for reporting this. One thing to try is to just do a git clean to make sure you have a totally clean working space (git clean -fdx will blow away any differences you have from the repo, of course only do that if you don't have other files around). Can you reproduce this if you just run
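Because `git clean -fdx` is destructive, it can help to preview the result first. The Python sketch below builds either the preview command (`git clean -ndx`, where `-n` requests a dry run) or the real one; the helper name `git_clean_command` is hypothetical, and the sketch assumes it is run from the root of the working tree.

```python
def git_clean_command(dry_run=True):
    """Return the git clean invocation: a preview by default, forced otherwise."""
    # -d includes untracked directories, -x also includes ignored files.
    flags = "-ndx" if dry_run else "-fdx"
    return ["git", "clean", flags]


print(" ".join(git_clean_command()))               # preview only
print(" ".join(git_clean_command(dry_run=False)))  # actually deletes files
```

Either list can be passed to `subprocess.run(..., check=True)`; running the preview first shows exactly which files the forced variant would remove.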

Re: Spurious test failures, testing best practices

2014-11-30 Thread Patrick Wendell
for the Contributing to Spark https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark guide. Nick On Sun Nov 30 2014 at 6:50:55 PM Patrick Wendell pwend...@gmail.com wrote: Hey Ryan, A few more things here. You should feel free to send patches to Jenkins to test them, since

Re: Spurious test failures, testing best practices

2014-11-30 Thread Patrick Wendell
larger than for snapshot docs). However we could just add /snapshot and publish docs there. - Patrick On Sun, Nov 30, 2014 at 6:15 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Ryan, The existing JIRA also covers publishing nightly docs: https://issues.apache.org/jira/browse/SPARK-1517

Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-01 Thread Patrick Wendell
additional logging in order to help with debugging. I just wanted to give an early heads up about this issue and to get more eyes on it in case anyone else has run into it or wants to help with debugging. - Josh On November 28, 2014 at 9:18:09 PM, Patrick Wendell (pwend...@gmail.com

Re: keeping PR titles / descriptions up to date

2014-12-02 Thread Patrick Wendell
Also a note on this for committers - it's possible to re-word the title during merging, by just running git commit -a --amend before you push the PR. - Patrick On Tue, Dec 2, 2014 at 12:50 PM, Mridul Muralidharan mri...@gmail.com wrote: I second that ! Would also be great if the JIRA was

Re: Spurious test failures, testing best practices

2014-12-02 Thread Patrick Wendell
Hey Ryan, What if you run a single mvn install to install all libraries locally - then can you mvn compile -pl core? I think this may be the only way to make it work. - Patrick On Tue, Dec 2, 2014 at 2:40 PM, Ryan Williams ryan.blake.willi...@gmail.com wrote: Following on Mark's Maven

Re: Ooyala Spark JobServer

2014-12-04 Thread Patrick Wendell
Hey Jun, The Ooyala server is being maintained by its original author (Evan Chan) here: https://github.com/spark-jobserver/spark-jobserver This is likely to stay as a standalone project for now, since it builds directly on Spark's public API's. - Patrick On Wed, Dec 3, 2014 at 9:02 PM, Jun

Re: Exception adding resource files in latest Spark

2014-12-04 Thread Patrick Wendell
Thanks for flagging this. I reverted the relevant YARN fix in Spark 1.2 release. We can try to debug this in master. On Thu, Dec 4, 2014 at 9:51 PM, Jianshi Huang jianshi.hu...@gmail.com wrote: I created a ticket for this: https://issues.apache.org/jira/browse/SPARK-4757 Jianshi On Fri,

Re: zinc invocation examples

2014-12-05 Thread Patrick Wendell
One thing I created a JIRA for a while back was to have a similar script to sbt/sbt that transparently downloads Zinc, Scala, and Maven in a subdirectory of Spark and sets it up correctly. I.e. build/mvn. Outside of brew for MacOS there aren't good Zinc packages, and it's a pain to figure out how

Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-05 Thread Patrick Wendell
the web UI. On Sat, Nov 29, 2014 at 2:16 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.2.0! The tag to be voted on is v1.2.0-rc1 (commit 1056e9ec1): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h

Re: Is this a little bug in BlockTransferMessage ?

2014-12-09 Thread Patrick Wendell
Hey Nick, Thanks for bringing this up. I believe these Java tests are running in the sbt build right now, the issue is that this particular bug was flagged by the triggering of a runtime Java assert (not a normal Junit test assertion) and those are not enabled in our sbt tests. It would be good

Re: Build Spark 1.2.0-rc1 encounter exceptions when running HiveContext - Caused by: java.lang.ClassNotFoundException: com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy

2014-12-10 Thread Patrick Wendell
Hi Andrew, It looks like somehow you are including jars from the upstream Apache Hive 0.13 project on your classpath. For Spark 1.2 Hive 0.13 support, we had to modify Hive to use a different version of Kryo that was compatible with Spark's Kryo version.

[RESULT] [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-10 Thread Patrick Wendell
This vote is closed in favor of RC2. On Fri, Dec 5, 2014 at 2:02 PM, Patrick Wendell pwend...@gmail.com wrote: Hey All, Thanks all for the continued testing! The issue I mentioned earlier SPARK-4498 was fixed earlier this week (hat tip to Mark Hamstra who contributed to fix

[VOTE] Release Apache Spark 1.2.0 (RC2)

2014-12-10 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.2.0! The tag to be voted on is v1.2.0-rc2 (commit a428c446e2): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=a428c446e23e628b746e0626cc02b7b3cadf588e The release files, including signatures, digests, etc.

Re: Is Apache JIRA down?

2014-12-10 Thread Patrick Wendell
I believe many apache services are/were down due to an outage. On Wed, Dec 10, 2014 at 5:24 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Nevermind, seems to be back up now. On Wed Dec 10 2014 at 7:46:30 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: For example:

Re: zinc invocation examples

2014-12-12 Thread Patrick Wendell
) and would be great to get your initial read on it. Per this thread I need to add in the -scala-home call to zinc, but it's close to ready for a PR. On 12/5/14, 2:10 PM, Patrick Wendell pwend...@gmail.com wrote: One thing I created a JIRA for a while back was to have a similar script to sbt/sbt

Re: Test failures after Jenkins upgrade

2014-12-15 Thread Patrick Wendell
/apache/spark/pull/3701 We might be close to fixing this via one of those PRs, so maybe we should try using one of those instead? On December 15, 2014 at 10:51:46 AM, Patrick Wendell (pwend...@gmail.com) wrote: Hey All, It appears that a single test suite is failing after the jenkins upgrade

[RESULT] [VOTE] Release Apache Spark 1.2.0 (RC2)

2014-12-16 Thread Patrick Wendell
This vote has PASSED with 12 +1 votes (8 binding) and no 0 or -1 votes: +1: Matei Zaharia* Madhu Siddalingaiah Reynold Xin* Sandy Ryza Josh Rozen* Mark Hamstra* Denny Lee Tom Graves* GuiQiang Li Nick Pentreath* Sean McNamara* Patrick Wendell* 0: -1: I'll finalize and package this release

Re: [VOTE] Release Apache Spark 1.2.0 (RC2)

2014-12-16 Thread Patrick Wendell
...@databricks.com wrote: +1 Tested on OS X. On Wednesday, December 10, 2014, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.2.0! The tag to be voted on is v1.2.0-rc2 (commit a428c446e2): https://git-wip-us.apache.org/repos/asf?p

[ANNOUNCE] Requiring JIRA for inclusion in release credits

2014-12-16 Thread Patrick Wendell
Hey All, Due to the very high volume of contributions, we're switching to an automated process for generating release credits. This process relies on JIRA for categorizing contributions, so it's not possible for us to provide credits in the case where users submit pull requests with no associated

Re: Which committers care about Kafka?

2014-12-18 Thread Patrick Wendell
Hey Cody, Thanks for reaching out with this. The lead on streaming is TD - he is traveling this week though so I can respond a bit. To the high level point of whether Kafka is important - it definitely is. Something like 80% of Spark Streaming deployments (anecdotally) ingest data from Kafka.

Re: [RESULT] [VOTE] Release Apache Spark 1.2.0 (RC2)

2014-12-18 Thread Patrick Wendell
Update: An Apache infrastructure issue prevented me from pushing this last night. The issue was resolved today and I should be able to push the final release artifacts tonight. On Tue, Dec 16, 2014 at 9:20 PM, Patrick Wendell pwend...@gmail.com wrote: This vote has PASSED with 12 +1 votes (8

Announcing Spark 1.2!

2014-12-19 Thread Patrick Wendell
I'm happy to announce the availability of Spark 1.2.0! Spark 1.2.0 is the third release on the API-compatible 1.X line. It is Spark's largest release ever, with contributions from 172 developers and more than 1,000 commits! This release brings operational and performance improvements in Spark

Re: Announcing Spark 1.2!

2014-12-19 Thread Patrick Wendell
to different commits in https://github.com/apache/spark/releases Best Regards, Shixiong Zhu 2014-12-19 16:52 GMT+08:00 Patrick Wendell pwend...@gmail.com: I'm happy to announce the availability of Spark 1.2.0! Spark 1.2.0 is the third release on the API-compatible 1.X line. It is Spark's largest

Re: Use mvn to build Spark 1.2.0 failed

2014-12-22 Thread Patrick Wendell
I also couldn't reproduce this issue. On Mon, Dec 22, 2014 at 2:24 AM, Sean Owen so...@cloudera.com wrote: I just tried the exact same command and do not see any error. Maybe you can make sure you're starting from a clean extraction of the distro, and check your environment. I'm on OSX, Maven

Re: Announcing Spark Packages

2014-12-22 Thread Patrick Wendell
Xiangrui asked me to report that it's back and running :) On Mon, Dec 22, 2014 at 3:21 PM, peng pc...@uowmail.edu.au wrote: Me 2 :) On 12/22/2014 06:14 PM, Andrew Ash wrote: Hi Xiangrui, That link is currently returning a 503 Over Quota error message. Would you mind pinging back out

Re: More general submitJob API

2014-12-22 Thread Patrick Wendell
A SparkContext is thread safe, so you can just have different threads that create their own RDD's and do actions, etc. - Patrick On Mon, Dec 22, 2014 at 4:15 PM, Alessandro Baretta alexbare...@gmail.com wrote: Andrew, Thanks, yes, this is what I wanted: basically just to start multiple jobs
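The pattern described here, several threads sharing one context and each running its own job, can be sketched in Python as follows. The message refers to Spark's own SparkContext; to keep the sketch runnable without Spark, `FakeContext` is a hypothetical stand-in whose `parallelize` simply returns a list, and `run_jobs_concurrently` is an illustrative helper rather than any Spark API.

```python
from concurrent.futures import ThreadPoolExecutor


def run_jobs_concurrently(context, jobs, max_workers=4):
    """Run each job (a callable taking the shared context) on a worker thread."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(job, context) for job in jobs]
        # Results come back in submission order, whichever job finishes first.
        return [future.result() for future in futures]


class FakeContext:
    """Hypothetical stand-in for a shared SparkContext."""

    def parallelize(self, data):
        return list(data)


jobs = [
    lambda sc: sum(sc.parallelize(range(10))),
    lambda sc: len(sc.parallelize(range(5))),
]
results = run_jobs_concurrently(FakeContext(), jobs)
print(results)
```

With a real context, each callable would build its own RDDs and call an action; the thread pool only controls how many jobs are submitted at once.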

Re: Announcing Spark Packages

2014-12-22 Thread Patrick Wendell
Hey Nick, I think Hitesh was just trying to be helpful and point out the policy - not necessarily saying there was an issue. We've taken a close look at this and I think we're in good shape here vis-a-vis this policy. - Patrick On Mon, Dec 22, 2014 at 5:29 PM, Nicholas Chammas

Re: [ANNOUNCE] Requiring JIRA for inclusion in release credits

2014-12-22 Thread Patrick Wendell
missing we should add. - Patrick On Mon, Dec 22, 2014 at 6:17 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Does this include contributions made against the spark-ec2 repo? On Wed Dec 17 2014 at 12:29:19 AM Patrick Wendell pwend...@gmail.com wrote: Hey All, Due to the very high

Re: [ANNOUNCE] Requiring JIRA for inclusion in release credits

2014-12-22 Thread Patrick Wendell
s/Josh/Nick/ - sorry! On Mon, Dec 22, 2014 at 10:52 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Josh, We don't explicitly track contributions to spark-ec2 in the Apache Spark release notes. The main reason is that usually updates to spark-ec2 include a corresponding update to spark so

Re: Problems with large dataset using collect() and broadcast()

2014-12-24 Thread Patrick Wendell
Hi Will, When you call collect() the item you are collecting needs to fit in memory on the driver. Is it possible your driver program does not have enough memory? - Patrick On Wed, Dec 24, 2014 at 9:34 PM, Will Yang era.ye...@gmail.com wrote: Hi all, In my occasion, I have a huge
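A rough back-of-the-envelope check can show whether a `collect()` is likely to fit on the driver. The Python sketch below is only an estimate under stated assumptions: the record count, average record size, and driver memory are invented numbers, and the `overhead` multiplier of 2.0 is an arbitrary placeholder for the extra space that in-memory objects take compared with raw data.

```python
GIB = 1024 ** 3


def fits_on_driver(num_records, avg_record_bytes, driver_memory_bytes, overhead=2.0):
    """Estimate whether collecting num_records would fit in driver memory."""
    estimated_bytes = num_records * avg_record_bytes * overhead
    return estimated_bytes <= driver_memory_bytes


# Hypothetical workloads checked against a hypothetical 4 GiB driver.
large = fits_on_driver(50_000_000, 200, 4 * GIB)
small = fits_on_driver(1_000_000, 200, 4 * GIB)
print(large, small)
```

When the estimate does not fit, the usual options are to give the driver more memory or to avoid bringing the whole dataset back, for example by aggregating or sampling on the cluster first.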

Re: Question on saveAsTextFile with overwrite option

2014-12-24 Thread Patrick Wendell
: Patrick Wendell [mailto:pwend...@gmail.com] Sent: Thursday, December 25, 2014 3:22 PM To: Shao, Saisai Cc: u...@spark.apache.org; dev@spark.apache.org Subject: Re: Question on saveAsTextFile with overwrite option Is it sufficient to set spark.hadoop.validateOutputSpecs to false? http

ANNOUNCE: New build script ./build/mvn

2014-12-27 Thread Patrick Wendell
Hi All, A consistent piece of feedback from Spark developers has been that the Maven build is very slow. Typesafe provides a tool called Zinc which improves Scala compilation speed substantially with Maven, but is difficult to install and configure, especially for platforms other than Mac OS.

Re: Spark driver main thread hanging after SQL insert

2015-01-02 Thread Patrick Wendell
Hi Alessandro, Can you create a JIRA for this rather than reporting it on the dev list? That's where we track issues like this. Thanks! - Patrick On Wed, Dec 31, 2014 at 8:48 PM, Alessandro Baretta alexbare...@gmail.com wrote: Here's what the console shows: 15/01/01 01:12:29 INFO

Re: 1.2.1-rc3 - Avro input format for Hadoop 2 broken/fix?

2015-02-04 Thread Patrick Wendell
Hi Markus, That won't be included in 1.2.1 most likely because the release votes have already started, and at that point we don't hold the release except for major regression issues from 1.2.0. However, if this goes through we can backport it into the 1.2 branch and it will end up in a future

Re: multi-line comment style

2015-02-04 Thread Patrick Wendell
Personally I have no opinion, but agree it would be nice to standardize. - Patrick On Wed, Feb 4, 2015 at 1:58 PM, Sean Owen so...@cloudera.com wrote: One thing Marcelo pointed out to me is that the // style does not interfere with commenting out blocks of code with /* */, which is a small

Re: Spark Master Maven with YARN build is broken

2015-02-02 Thread Patrick Wendell
It's my fault, I'm sending a hot fix now. On Mon, Feb 2, 2015 at 1:44 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/ Is this a known issue? It seems to have

Temporary jenkins issue

2015-02-02 Thread Patrick Wendell
Hey All, I made a change to the Jenkins configuration that caused most builds to fail (attempting to enable a new plugin), I've reverted the change effective about 10 minutes ago. If you've seen recent build failures like below, this was caused by that change. Sorry about that. ERROR:

[RESULT] [VOTE] Release Apache Spark 1.2.1 (RC2)

2015-02-02 Thread Patrick Wendell
This is cancelled in favor of RC3. On Mon, Feb 2, 2015 at 8:50 PM, Patrick Wendell pwend...@gmail.com wrote: The windows issue reported only affects actually running Spark on Windows (not job submission). However, I agree it's worth cutting a new RC. I'm going to cancel this vote and propose

Re: [VOTE] Release Apache Spark 1.2.1 (RC2)

2015-02-02 Thread Patrick Wendell
The windows issue reported only affects actually running Spark on Windows (not job submission). However, I agree it's worth cutting a new RC. I'm going to cancel this vote and propose RC3 with a single additional patch. Let's try to vote that through so we can ship Spark 1.2.1. - Patrick On Sat,

[VOTE] Release Apache Spark 1.2.1 (RC3)

2015-02-02 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.2.1! The tag to be voted on is v1.2.1-rc3 (commit b6eaf77): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=b6eaf77d4332bfb0a698849b1f5f917d20d70e97 The release files, including signatures, digests, etc.

Re: Improving metadata in Spark JIRA

2015-02-06 Thread Patrick Wendell
Per Nick's suggestion I added two components: 1. Spark Submit 2. Spark Scheduler I figured I would just add these since if we decide later we don't want them, we can simply merge them into Spark Core. On Fri, Feb 6, 2015 at 11:53 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Do we

[ANNOUNCE] branch-1.3 has been cut

2015-02-03 Thread Patrick Wendell
Hey All, Just wanted to announce that we've cut the 1.3 branch which will become the 1.3 release after community testing. There are still some features that will go in (in higher level libraries, and some stragglers in spark core), but overall this indicates the end of major feature development

Re: PSA: Maven supports parallel builds

2015-02-05 Thread Patrick Wendell
I've done this in the past, but back when I wasn't using Zinc it didn't make a big difference. It's worth doing this in our Jenkins environment though. - Patrick On Thu, Feb 5, 2015 at 4:52 PM, Dirceu Semighini Filho dirceu.semigh...@gmail.com wrote: Thanks Nicholas, I didn't know this.

Re: Improving metadata in Spark JIRA

2015-02-08 Thread Patrick Wendell
:52 PM Patrick Wendell pwend...@gmail.com wrote: Per Nick's suggestion I added two components: 1. Spark Submit 2. Spark Scheduler I figured I would just add these since if we decide later we don't want them, we can simply merge them into Spark Core. On Fri, Feb 6, 2015 at 11:53 AM

[RESULT] [VOTE] Release Apache Spark 1.2.1 (RC3)

2015-02-08 Thread Patrick Wendell
This vote passes with 5 +1 votes (3 binding) and no 0 or -1 votes. +1 Votes: Krishna Sankar Sean Owen* Chip Senkbeil Matei Zaharia* Patrick Wendell* 0 Votes: (none) -1 Votes: (none) On Fri, Feb 6, 2015 at 5:12 PM, Patrick Wendell pwend...@gmail.com wrote: I'll add a +1 as well. On Fri, Feb

Re: [VOTE] Release Apache Spark 1.2.1 (RC3)

2015-02-08 Thread Patrick Wendell
I'll add a +1 as well. On Fri, Feb 6, 2015 at 2:38 PM, Matei Zaharia matei.zaha...@gmail.com wrote: +1 Tested on Mac OS X. Matei On Feb 2, 2015, at 8:57 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.2.1

Unit tests

2015-02-08 Thread Patrick Wendell
Hey All, The tests are in a not-amazing state right now due to a few compounding factors: 1. We've merged a large volume of patches recently. 2. The load on Jenkins has been relatively high, exposing races and other behavior not seen at lower load. For those not familiar, the main issue is

Re: Standardized Spark dev environment

2015-01-21 Thread Patrick Wendell
, this will at least serve as an up-to-date list of packages/versions they should try to install locally in whatever environment they have. - Patrick On Wed, Jan 21, 2015 at 5:42 AM, Will Benton wi...@redhat.com wrote: - Original Message - From: Patrick Wendell pwend...@gmail.com To: Sean

Re: Bouncing Mails

2015-01-17 Thread Patrick Wendell
Akhil, Those are handled by ASF infrastructure, not anyone in the Spark project. So this list is not the appropriate place to ask for help. - Patrick On Sat, Jan 17, 2015 at 12:56 AM, Akhil Das ak...@sigmoidanalytics.com wrote: My mails to the mailing list are getting rejected, have opened a

Semantics of LGTM

2015-01-17 Thread Patrick Wendell
Hey All, Just wanted to ping about a minor issue - but one that ends up having consequences given Spark's volume of reviews and commits. As much as possible, I think that we should try and gear towards Google Style LGTM on reviews. What I mean by this is that LGTM has the following semantics: I

Re: Semantics of LGTM

2015-01-19 Thread Patrick Wendell
The wiki does not seem to be operational ATM, but I will do this when it is back up. On Mon, Jan 19, 2015 at 12:00 PM, Patrick Wendell pwend...@gmail.com wrote: Okay - so given all this I was going to put the following on the wiki tentatively: ## Reviewing Code Community code review

Re: Semantics of LGTM

2015-01-19 Thread Patrick Wendell
the latter unless qualified in some other way. I don't have any opinion on the specific characters, but I agree with Aaron that it would be nice to have some sort of abbreviation for both the strong and weak forms of approval. -Sandy On Jan 17, 2015, at 7:25 PM, Patrick Wendell

Re: Standardized Spark dev environment

2015-01-21 Thread Patrick Wendell
If the goal is a reproducible test environment then I think that is what Jenkins is. Granted you can only ask it for a test. But presumably you get the same result if you start from the same VM image as Jenkins and run the same steps. But the issue is when users can't reproduce Jenkins
