Re: Working Formula for Hive 0.13?

2014-07-28 Thread Patrick Wendell
It would be great if the hive team can fix that issue. If not, we'll have to continue forking our own version of Hive to change the way it publishes artifacts. - Patrick On Mon, Jul 28, 2014 at 9:34 AM, Ted Yu wrote: > Talked with Owen offline. He confirmed that as of 0.13, hive-exec is still >

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Patrick Wendell
f hive repo & github mirror. > > > On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell wrote: > >> It would be great if the hive team can fix that issue. If not, we'll >> have to continue forking our own version of Hive to change the way it >> publishes artifact

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Patrick Wendell
be able to directly >> pull in hive-exec-core.jar >> >> Cheers >> >> >> On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell >> wrote: >> >> > It would be great if the hive team can fix that issue. If not, we'll >> > have to continue

Re: 'Proper' Build Tool

2014-07-28 Thread Patrick Wendell
Yeah for packagers we officially recommend using maven. Spark's dependency graph is very complicated and Maven and SBT use different conflict resolution strategies, so we've opted to officially support Maven. SBT is still around though and it's used more often by day-to-day developers. - Patrick

Github mirroring is running behind

2014-07-28 Thread Patrick Wendell
https://issues.apache.org/jira/browse/INFRA-8116 Just a heads up, the github mirroring is running behind. You can follow that JIRA to keep up to date on the fix. In the meantime you can use the Apache git itself: https://git-wip-us.apache.org/repos/asf/spark.git Some people have reported issue

Re: replacement for SPARK_JAVA_OPTS

2014-07-30 Thread Patrick Wendell
Cody - in your example you are using the '=' character, but in our documentation and tests we use a whitespace to separate the key and value in the defaults file. docs: http://spark.apache.org/docs/latest/configuration.html spark.driver.extraJavaOptions -Dfoo.bar.baz=23 I'm not sure if the java
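To illustrate the whitespace-separated format described above, here is a minimal Python sketch of how such a line could be split into key and value. It is not Spark's actual parser: `parse_defaults_line` is a hypothetical helper, and it assumes the key contains no whitespace, that everything after the first run of whitespace is the value, and that blank lines and `#` comments are skipped.

```python
def parse_defaults_line(line):
    """Split one defaults-file line into (key, value) on the first run of whitespace.

    Hypothetical helper for illustration only; not Spark's own parsing logic.
    Returns None for blank lines and lines starting with '#' (an assumption).
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    # Split only once, so any '=' or spaces inside the value are preserved.
    parts = stripped.split(None, 1)
    key = parts[0]
    value = parts[1] if len(parts) > 1 else ""
    return key, value


# The example line from the message above.
entry = parse_defaults_line("spark.driver.extraJavaOptions -Dfoo.bar.baz=23")
print(entry)
```

Because the split happens only on the first whitespace, the `=` inside `-Dfoo.bar.baz=23` stays part of the value rather than being treated as a separator.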

Re: replacement for SPARK_JAVA_OPTS

2014-07-30 Thread Patrick Wendell
v --class >> org.apache.spark.repl.Main >> >> >> Here's an example of it when the command line --driver-java-options is >> used (and thus things work): >> >> >> $ ps -ef | grep spark >> 514 5392 2058 0 21:15 pts/200:00:00 bash ./bin/spark-sh

Re: replacement for SPARK_JAVA_OPTS

2014-07-30 Thread Patrick Wendell
The third issue may be related to this: https://issues.apache.org/jira/browse/SPARK-2022 We can take a look at this during the bug fix period for the 1.1 release next week. If we come up with a fix we can backport it into the 1.0 branch also. On Wed, Jul 30, 2014 at 11:31 PM, Patrick Wendell

Re: Compiling Spark master (284771ef) with sbt/sbt assembly fails on EC2

2014-08-01 Thread Patrick Wendell
This is a Scala bug - I filed something upstream, hopefully they can fix it soon and/or we can provide a work around: https://issues.scala-lang.org/browse/SI-8772 - Patrick On Fri, Aug 1, 2014 at 3:15 PM, Holden Karau wrote: > Currently scala 2.10.2 can't be pulled in from maven central it se

Re: Exception in Spark 1.0.1: com.esotericsoftware.kryo.KryoException: Buffer underflow

2014-08-01 Thread Patrick Wendell
Andrew - I think Spark is using Guava 14... are you using Guava 16 in your user app (i.e. you inverted the versions in your earlier e-mail)? - Patrick On Fri, Aug 1, 2014 at 4:15 PM, Colin McCabe wrote: > On Fri, Aug 1, 2014 at 2:45 PM, Andrew Ash wrote: > > After several days of debugging, w

ASF JIRA is down for maintenance

2014-08-01 Thread Patrick Wendell
Please don't let this prevent you from merging patches, just keep a list and we can update the JIRA later. - Patrick

branch-1.1 of Spark has been cut

2014-08-02 Thread Patrick Wendell
Hey All, I'm happy to announce branch-1.1 of Spark [1] - this branch will eventually become the 1.1 release. Committers: new patches will need to be explicitly back-ported into this branch in order to appear in the 1.1 release. Thanks so much to all the committers and contributors who were extrem

Re: -1s on pull requests?

2014-08-03 Thread Patrick Wendell
> >1. Include the commit hash in the "tests have started/completed" >messages, so that it's clear what code exactly is/has been tested for > each >test cycle. > Great idea - I think this is easy to do given the current architecture. We already have access to the commit ID in the same s

Re: Low Level Kafka Consumer for Spark

2014-08-03 Thread Patrick Wendell
I'll let TD chime in on this one, but I'm guessing this would be a welcome addition. It's great to see community effort on adding new streams/receivers; adding a Java API for receivers was something we did specifically to allow this :) - Patrick On Sat, Aug 2, 2014 at 10:09 AM, Dibyendu Bhattach

Re: Scala 2.11 external dependencies

2014-08-03 Thread Patrick Wendell
Hey Anand, Thanks for looking into this - it's great to see momentum towards Scala 2.11 and I'd love it if this landed in Spark 1.2. For the external dependencies, it would be good to create a sub-task of SPARK-1812 to track our efforts encouraging other projects to upgrade. In certain cases (e.g. Kaf

Re: -1s on pull requests?

2014-08-03 Thread Patrick Wendell
Sure thing - feel free to ping me off list if you need pointers. The script just does string concatenation and a curl to post the comment... I think it should be pretty accessible! - Patrick On Sun, Aug 3, 2014 at 9:12 PM, Nicholas Chammas wrote: > On Sun, Aug 3, 2014 at 11:29 PM, Patr
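A minimal Python sketch of the string-concatenation step mentioned above, with the commit hash included in the message as proposed earlier in this thread. The function name, the message wording, the `body` field, and the PR number are all assumptions for illustration; the actual script and the endpoint it posts to are not shown in this archive.

```python
import json


def build_test_comment(status, commit_hash, pr_number):
    """Build a JSON payload for a pull request comment.

    Hypothetical layout: the real script's message format is not known here.
    """
    message = (
        "Tests have " + status
        + " for PR " + str(pr_number)
        + " at commit " + commit_hash + "."
    )
    # json.dumps takes care of quoting and escaping the message text.
    return json.dumps({"body": message})


# Placeholder PR number; the commit hash is only an example value.
payload = build_test_comment("started", "284771ef", 1234)
print(payload)
```

The resulting string is what a script of this kind would hand to curl as the request body.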

Re: Issues with HDP 2.4.0.2.1.3.0-563

2014-08-04 Thread Patrick Wendell
For hortonworks, I believe it should work to just link against the corresponding upstream version. I.e. just set the Hadoop version to "2.4.0" Does that work? - Patrick On Mon, Aug 4, 2014 at 12:13 AM, Ron's Yahoo! wrote: > Hi, > Not sure whose issue this is, but if I run make-distribution

Re: Issues with HDP 2.4.0.2.1.3.0-563

2014-08-04 Thread Patrick Wendell
4 -Dhadoop.version=2.4.0.2.1.1.0-385 > -DskipTests clean package > > I haven't tried building a distro, but it should be similar. > > > - SteveN > > On 8/4/14, 1:25, "Sean Owen" wrote: > > For any Hadoop 2.4 distro, yes, set hadoop.version but also set > -Phadoop

Re: Issues with HDP 2.4.0.2.1.3.0-563

2014-08-04 Thread Patrick Wendell
> > Thanks, > Ron > > On Aug 4, 2014, at 10:01 AM, Ron's Yahoo! wrote: > > That failed since it defaulted the versions for yarn and hadoop > I'll give it a try with just 2.4.0 for both yarn and hadoop... > > Thanks, > Ron > > On Aug 4, 2014, at 9:44

[SNAPSHOT] Snapshot1 of Spark 1.1.0 has been posted

2014-08-06 Thread Patrick Wendell
Hi All, I've packaged and published a snapshot release of Spark 1.1 for testing. This is being distributed to the community for QA and preview purposes. It is not yet an official RC for voting. Going forward, we'll do preview releases like this for testing ahead of official votes. The tag of this

Re: [SNAPSHOT] Snapshot1 of Spark 1.1.0 has been posted

2014-08-06 Thread Patrick Wendell
Minor correction: the encoded URL in the staging repo link was wrong. The correct repo is: https://repository.apache.org/content/repositories/orgapachespark-1025/ On Wed, Aug 6, 2014 at 11:23 PM, Patrick Wendell wrote: > > Hi All, > > I've packaged and published a snapshot rel

Re: Unit test best practice for Spark-derived projects

2014-08-07 Thread Patrick Wendell
In the past I've found if I do a jstack when running some tests, it sits forever inside of a hostname resolution step or something. I never narrowed it down, though. - Patrick On Thu, Aug 7, 2014 at 10:45 AM, Dmitriy Lyubimov wrote: > Thanks. > > let me check this hypothesis (i have dhcp connect

Re: replacement for SPARK_JAVA_OPTS

2014-08-07 Thread Patrick Wendell
t escaping equals sign, it doesn't >>> affect >>> > the >>> > >> results. >>> > >> >>> > >> 2. Yeah, exporting SPARK_SUBMIT_OPTS from spark-env.sh works for >>> getting >>> > >> syste

Re: Fine-Grained Scheduler on Yarn

2014-08-07 Thread Patrick Wendell
The current YARN is equivalent to what is called "fine grained" mode in Mesos. The scheduling of tasks happens totally inside of the Spark driver. On Thu, Aug 7, 2014 at 7:50 PM, Jun Feng Liu wrote: > Any one know the answer? > > Best Regards > > > *Jun Feng Liu* > IBM China Systems & Technolog

Re: Fine-Grained Scheduler on Yarn

2014-08-07 Thread Patrick Wendell
duling requires scheduling at the granularity of individual cores. On Thu, Aug 7, 2014 at 9:43 PM, Patrick Wendell wrote: > The current YARN is equivalent to what is called "fine grained" mode in > Mesos. The scheduling of tasks happens totally inside of the Spark driver. > >

Re: Unit tests in < 5 minutes

2014-08-08 Thread Patrick Wendell
I dug around this a bit a while ago, I think if someone sat down and profiled the tests it's likely we could find some things to optimize. In particular, there may be overheads in starting up a local spark context that could be minimized and speed up all the tests. Also, there are some tests (espec

Re: Unit tests in < 5 minutes

2014-08-08 Thread Patrick Wendell
while also > running another Spark shell, I've noticed that the test logs fill up with > errors when the web UI attempts to bind to the default port, fails, and > tries a higher one. > > - Josh > > On August 8, 2014 at 11:54:24 AM, Patrick Wendell (pwend...@gmail.com) > wr

Re: spark-shell is broken! (bad option: '--master')

2014-08-08 Thread Patrick Wendell
Cheng Lian also has a fix for this. I've asked him to make a PR - he is on China time so it probably won't come until tonight: https://github.com/liancheng/spark/compare/apache:master...liancheng:spark-2894 On Fri, Aug 8, 2014 at 3:46 PM, Sandy Ryza wrote: > Hi Chutium, > > This is currently bei

Re: Pull requests will be automatically linked to JIRA when submitted

2014-08-11 Thread Patrick Wendell
lugins.jira-bitbucket-connector-plugin/86ff1a21-44fb-4227-aa4f-44c77aec2c97.png > > > > that might be nice to have for heavy JIRA users. > > > > Nick > > > > > > > > On Sun, Jul 20, 2014 at 12:50 PM, Patrick Wendell > > wrote: > > >

Re: [SPARK-3050] Spark program running with 1.0.2 jar cannot run against a 1.0.1 cluster

2014-08-14 Thread Patrick Wendell
I commented on the bug. For driver mode, you'll need to get the corresponding version of spark-submit for Spark 1.0.2. On Thu, Aug 14, 2014 at 3:43 PM, Gary Malouf wrote: > To be clear, is it 'compiled' against 1.0.2 or it packaged with it? > > > On Thu, Aug 14, 2014 at 6:39 PM, Mingyu Kim wro

Tests failing

2014-08-15 Thread Patrick Wendell
Hi All, I noticed that all PR tests run overnight had failed due to timeouts. I believe the patch that updates the netty shuffle somehow inflated the build time significantly. That patch had been tested, but one change was made before it was merged that was not tested. I've reverted the patch

Re: Tests failing

2014-08-15 Thread Patrick Wendell
8 PM, Shivaram Venkataraman < > shiva...@eecs.berkeley.edu> wrote: > >> Also I think Jenkins doesn't post build timeouts to github. Is there >> anyway >> we can fix that ? >> On Aug 15, 2014 9:04 AM, "Patrick Wendell" wrote: >> >> >

Re: [SPARK-3050] Spark program running with 1.0.2 jar cannot run against a 1.0.1 cluster

2014-08-15 Thread Patrick Wendell
Hadoop, as opposed to currently being able to use the Spark package with > CDH4 against most of the CDH4 Hadoop clusters. > > Is it correct that Spark is focusing and prioritizing around the > spark-submit use cases than the aforementioned use cases? I just wanted to > better

Re: Tests failing

2014-08-15 Thread Patrick Wendell
, >>> >>> Can you point us to an example of that happening? The Jenkins console >>> output, that is. >>> >>> Nick >>> >>> >>> On Fri, Aug 15, 2014 at 2:28 PM, Shivaram Venkataraman < >>> shiva...@eecs.berkeley.edu>

Re: Tests failing

2014-08-15 Thread Patrick Wendell
ng a build can run. Okie doke. > > Perhaps then I'll wrap the run-tests step as you suggest and limit it to > 100 minutes or something, and cleanly report if it times out. > > Sound good? > > > On Fri, Aug 15, 2014 at 4:43 PM, Patrick Wendell > wrote: > >> H
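The idea of wrapping the test step in a time limit and reporting a timeout cleanly can be sketched in Python as follows. This is only an illustration under stated assumptions: `run_with_timeout` is a hypothetical helper, the demo command is a stand-in for the real test script, and the short limit is for demonstration (the thread suggests something on the order of 100 minutes).

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run a command with a time limit; return (timed_out, return_code).

    Hypothetical wrapper for illustration; not the actual Jenkins setup.
    """
    try:
        completed = subprocess.run(command, timeout=timeout_seconds)
        return False, completed.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return True, None


# Stand-in for a long-running test step: sleeps 5 seconds, limit is 0.5 seconds.
slow_command = [sys.executable, "-c", "import time; time.sleep(5)"]
timed_out, code = run_with_timeout(slow_command, 0.5)
if timed_out:
    print("Tests timed out")
else:
    print("Tests finished with exit code %d" % code)
```

Returning a flag instead of letting the exception propagate lets the caller post a distinct "timed out" message rather than a generic failure.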

Re: Spark 1.1.0 Progress

2014-08-18 Thread Patrick Wendell
Hey Gary, There are a couple of blockers in Spark core and SQL - but we're quite close. The goal was to have rc1 on Friday (ish) of last week... I think by tonight I will be able to cut one. If not, I'll cut a preview release tonight that does a full package but doesn't trigger an official vote yet

Re: Akka usage in Spark

2014-08-20 Thread Patrick Wendell
Hey Deb, Can you be specific about which changes you are referring to? We have not, to my knowledge, made major architectural changes around akka use. I think in general we don't want people to be using Spark's actor system directly - it is an internal communication component in Spark and could e.g. be re

[SNAPSHOT] Snapshot2 of Spark 1.1 has been posted

2014-08-21 Thread Patrick Wendell
Hi All, I've packaged and published a snapshot release of Spark 1.1 for testing. This is very close to RC1 and we are distributing it for testing. Please test this and report any issues on this thread. The tag of this release is v1.1.0-snapshot1 (commit e1535ad3): *https://git-wip-us.apache.org/r

Re: [SNAPSHOT] Snapshot2 of Spark 1.1 has been posted

2014-08-21 Thread Patrick Wendell
The docs for this release are also available here: http://people.apache.org/~pwendell/spark-1.1.0-snapshot2-docs/ On Thu, Aug 21, 2014 at 1:12 AM, Patrick Wendell wrote: > Hi All, > > I've packaged and published a snapshot release of Spark 1.1 for testing. > This is very cl

Re: reference to dstream in package org.apache.spark.streaming which is not available

2014-08-22 Thread Patrick Wendell
Hey All, We can sort this out ASAP. Many of the Spark committers were at a company offsite for the last 72 hours, so sorry that it is broken. - Patrick On Fri, Aug 22, 2014 at 4:07 PM, Hari Shreedharan wrote: > Sean - I think only the ones in 1726 are enough. It is weird that any > class that

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread Patrick Wendell
One other idea - when things freeze up, try to run jstack on the spark shell process and on the executors and attach the results. It could be that somehow you are encountering a deadlock somewhere. On Mon, Aug 25, 2014 at 1:26 PM, Matei Zaharia wrote: > Was the original issue with Spark 1.1 (i.

Re: Pull requests will be automatically linked to JIRA when submitted

2014-08-25 Thread Patrick Wendell
INFRA to install/configure the >> JIRA-GitHub plugin while we continue to use the Python script we have? I >> wouldn't mind opening that JIRA issue with them. >> >> Nick >> >> >> On Mon, Aug 11, 2014 at 12:52 PM, Patrick Wendell >> wrote: >>

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread Patrick Wendell
Hey Amnon, So just to make sure I understand - you also saw the same issue with 1.0.2? Just asking because whether or not this regresses the 1.0.2 behavior is important for our own bug tracking. - Patrick On Mon, Aug 25, 2014 at 10:22 PM, Amnon Khen wrote: > There were no failures nor excepti

Re: Handling stale PRs

2014-08-25 Thread Patrick Wendell
Hey Nicholas, Thanks for bringing this up. There are a few dimensions to this... one is that it's actually procedurally difficult for us to close pull requests. I've proposed several different solutions to ASF infra to streamline the process, but thus far they haven't been open to any of my ideas:

Submit to the "Powered By Spark" Page!

2014-08-26 Thread Patrick Wendell
Hi All, I want to invite users to submit to the Spark "Powered By" page. This page is a great way for people to learn about Spark use cases. Since Spark activity has increased a lot in the higher level libraries and people often ask who uses each one, we'll include information about which componen

Re: Handling stale PRs

2014-08-27 Thread Patrick Wendell
Hey Nishkam, To some extent we already have this process - many community members help review patches and some earn a reputation where committers will take an LGTM from them seriously. I'd be interested in seeing if any other projects recognize people who do this. - Patrick On Wed, Aug 27, 2014

[VOTE] Release Apache Spark 1.1.0 (RC1)

2014-08-28 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.1.0! The tag to be voted on is v1.1.0-rc1 (commit f0718324): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=f07183249b74dd857069028bf7d570b35f265585 The release files, including signatures, digests, etc. ca

Re: [VOTE] Release Apache Spark 1.1.0 (RC1)

2014-08-28 Thread Patrick Wendell
ARK-3277 applicable to 1.1 ? > If yes, until it is fixed, I am -1 on the release (I am on break, so can't > verify or help fix, sorry). > > Regards > Mridul > > On 28-Aug-2014 9:33 pm, "Patrick Wendell" wrote: >> >> Please vote on releasing

Re: [VOTE] Release Apache Spark 1.1.0 (RC1)

2014-08-28 Thread Patrick Wendell
Okay I'm cancelling this vote in favor of RC2. On Thu, Aug 28, 2014 at 3:27 PM, Mridul Muralidharan wrote: > Thanks for being on top of this Patrick ! And apologies for not being able > to help more. > > Regards, > Mridul > > On Aug 29, 2014 1:30 AM, "Patric

[VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-28 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.1.0! The tag to be voted on is v1.1.0-rc2 (commit 711aebb3): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=711aebb329ca28046396af1e34395a0df92b5327 The release files, including signatures, digests, etc. ca

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-28 Thread Patrick Wendell
I'll kick off the vote with a +1. On Thu, Aug 28, 2014 at 7:14 PM, Patrick Wendell wrote: > Please vote on releasing the following candidate as Apache Spark version > 1.1.0! > > The tag to be voted on is v1.1.0-rc2 (commit 711aebb3): > https://git-wip-us.apache.org/rep

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-28 Thread Patrick Wendell
vendor rightly noted this could look like favoritism. They > changed to remove vendor releases. > > On Fri, Aug 29, 2014 at 3:14 AM, Patrick Wendell wrote: >> Please vote on releasing the following candidate as Apache Spark version >> 1.1.0!

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-28 Thread Patrick Wendell
ved to be too connected to other vendors. I'd like > to maximize Spark's distribution and there's some argument you do this > by not making vendor profiles. But as I say a different question to > just think about over time... > > (oh and PS for my part I think it

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Patrick Wendell
vendors. I'd like > > to maximize Spark's distribution and there's some argument you do this > > by not making vendor profiles. But as I say a different question to > > just think about over time... > > > > (oh and PS for my part I think it's a good

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Patrick Wendell
ting and typographical errors in the SQL docs that > I've fixed in this PR. Dunno if we want to roll that into the release. > > > On Fri, Aug 29, 2014 at 12:17 PM, Patrick Wendell > wrote: >> >> Okay I'll plan to add cdh4 binary as well for the final release! >> >

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Patrick Wendell
default Spark version in spark-ec2 be updated for this release? > > Nick > > > > On Fri, Aug 29, 2014 at 12:55 PM, Patrick Wendell > wrote: >> >> Hey Nicholas, >> >> Thanks for this, we can merge in doc changes outside of the actual >> release t

Re: Compie error with XML elements

2014-08-29 Thread Patrick Wendell
In some cases IntelliJ's Scala compiler can't compile valid Scala source files. Hopefully they fix (or have fixed) this in a newer version. - Patrick On Fri, Aug 29, 2014 at 11:38 AM, Yi Tian wrote: > Hi, Devl! > > I got the same problem. > > You can try to upgrade your scala plugins to 0.41.2

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-30 Thread Patrick Wendell
Thanks to Nick Chammas and Cheng Lian who pointed out two issues with the release candidate. I'll cancel this in favor of RC3. On Fri, Aug 29, 2014 at 1:33 PM, Jeremy Freeman wrote: > +1. Validated several custom analysis pipelines on a private cluster in > standalone mode. Tested new PySpark sup

[VOTE] Release Apache Spark 1.1.0 (RC3)

2014-08-30 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.1.0! The tag to be voted on is v1.1.0-rc3 (commit b2d0493b): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=b2d0493b223c5f98a593bb6d7372706cc02bebad The release files, including signatures, digests, etc. ca

Re: [VOTE] Release Apache Spark 1.1.0 (RC3)

2014-08-31 Thread Patrick Wendell
For my part I'm +1 on this, though Sean it would be great separately to fix the test environment. For those who voted on rc2, this is almost identical, so feel free to +1 unless you think there are issues with the two minor bug fixes. On Sun, Aug 31, 2014 at 10:18 AM, Sean Owen wrote: > Fantasti

Re: Run the "Big Data Benchmark" for new releases

2014-09-01 Thread Patrick Wendell
Yeah, this wasn't detected in our performance tests. We even have a test in PySpark that I would have thought might catch this (it just schedules a bunch of really small tasks, similar to the regression case). https://github.com/databricks/spark-perf/blob/master/pyspark-tests/tests.py#L51 Anyways,

Re: hey spark developers! intro from shane knapp, devops engineer @ AMPLab

2014-09-02 Thread Patrick Wendell
Hey Shane, Thanks for your work so far and I'm really happy to see investment in this infrastructure. This is a key productivity tool for us and something we'd love to expand over time to improve the development process of Spark. - Patrick On Tue, Sep 2, 2014 at 10:47 AM, Nicholas Chammas wrote

Re: [VOTE] Release Apache Spark 1.1.0 (RC3)

2014-09-02 Thread Patrick Wendell
s not appear to be serious. > > > On Sun, Aug 31, 2014 at 5:14 PM, Nicholas Chammas > wrote: >> >> -1: I believe I've found a regression from 1.0.2. The report is captured >> in SPARK-3333. >> >> >> On Sat, Aug 30, 2014 at 6:07 PM, Patrick Wendel

Re: [VOTE] Release Apache Spark 1.1.0 (RC3)

2014-09-03 Thread Patrick Wendell
I'm cancelling this release in favor of RC4. Happy voting! On Tue, Sep 2, 2014 at 9:55 PM, Patrick Wendell wrote: > Thanks everyone for voting on this. Two minor issues (one a > blocker) were found that warrant cutting a new RC. For those who voted > +1 on this release,

[VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.1.0! The tag to be voted on is v1.1.0-rc4 (commit 2f9b2bd): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=2f9b2bd7844ee8393dc9c319f4fefedf95f5e460 The release files, including signatures, digests, etc. can

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Patrick Wendell
I'll kick it off with a +1 On Wed, Sep 3, 2014 at 12:24 AM, Patrick Wendell wrote: > Please vote on releasing the following candidate as Apache Spark version > 1.1.0! > > The tag to be voted on is v1.1.0-rc4 (commit 2f9b2bd): > https://git-wip-us.apache.org/repos/asf?p=

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Patrick Wendell
Hey Nick, Yeah we'll put those in the release notes. On Wed, Sep 3, 2014 at 7:23 AM, Nicholas Chammas wrote: > On Wed, Sep 3, 2014 at 3:24 AM, Patrick Wendell wrote: >> >> == What default changes should I be aware of? == >> 1. The default value of "spark.io.co

Re: memory size for caching RDD

2014-09-03 Thread Patrick Wendell
Changing this is not supported; it is immutable, similar to other spark configuration settings. On Wed, Sep 3, 2014 at 8:13 PM, 牛兆捷 wrote: > Dear all: > > Spark uses memory to cache RDD and the memory size is specified by > "spark.storage.memoryFraction". > > Once the Executor starts, does Spark su

Re: amplab jenkins is down

2014-09-04 Thread Patrick Wendell
Hm yeah it seems that it hasn't been polling since 3:45. On Thu, Sep 4, 2014 at 4:21 PM, Nicholas Chammas wrote: > It appears that our main man is having trouble hearing new requests. > > Do we need some smelling salts? > > > On Thu, Sep 4, 2014 at 5:49 PM, shane knapp wrote: >> >> i'd ping the

Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread Patrick Wendell
Hey There, I believe this is on the roadmap for the next release (1.2). But Xiangrui can comment on this. - Patrick On Fri, Sep 5, 2014 at 9:18 AM, Yu Ishikawa wrote: > Hi Evan, > > That sounds interesting. > > Here is the ticket which I created. > https://issues.apache.org/jira/browse/SPARK-34

[RESULT] [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-07 Thread Patrick Wendell
This vote passes with 8 binding +1 votes and no -1 votes. I'll post the final release in the next 48 hours... just finishing the release notes and packaging (which now takes a long time given the number of contributors!). +1: Reynold Xin* Michael Armbrust* Xiangrui Meng* Andrew Or* Sean Owen Matth

RFC: Deprecating YARN-alpha API's

2014-09-09 Thread Patrick Wendell
Hi Everyone, This is a call to the community for comments on SPARK-3445 [1]. In a nutshell, we are trying to figure out timelines for deprecation of the YARN-alpha API's as Yahoo is now moving off of them. It's helpful for us to have a sense of whether anyone else uses these. Please comment on th

Re: parquet predicate / projection pushdown into unionAll

2014-09-09 Thread Patrick Wendell
I think what Michael means is people often use this to read existing partitioned Parquet tables that are defined in a Hive metastore rather than data generated directly from within Spark and then reading it back as a table. I'd expect the latter case to become more common, but for now most users co

Re: [RESULT] [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-10 Thread Patrick Wendell
count. Next time we can pipeline this work to avoid a delay. I did cut the v1.1.0 tag today. We should be able to do the full announce tomorrow. Thanks, Patrick On Sun, Sep 7, 2014 at 5:50 PM, Patrick Wendell wrote: > This vote passes with 8 binding +1 votes and no -1 votes. I'll post >

Announcing Spark 1.1.0!

2014-09-11 Thread Patrick Wendell
I am happy to announce the availability of Spark 1.1.0! Spark 1.1.0 is the second release on the API-compatible 1.X line. It is Spark's largest release ever, with contributions from 171 developers! This release brings operational and performance improvements in Spark core including a new implement

Re: Use Case of mutable RDD - any ideas around will help.

2014-09-12 Thread Patrick Wendell
[moving to user@] This would typically be accomplished with a union() operation. You can't mutate an RDD in-place, but you can create a new RDD with a union() which is an inexpensive operator. On Fri, Sep 12, 2014 at 5:28 AM, Archit Thakur wrote: > Hi, > > We have a use case where we are plannin

Re: Adding abstraction in MLlib

2014-09-12 Thread Patrick Wendell
We typically post design docs on JIRAs before major work starts. For instance, pretty sure SPARK-1856 will have a design doc posted shortly. On Fri, Sep 12, 2014 at 12:10 PM, Erik Erlandson wrote: > > Are interface designs being captured anywhere as documents that the community > can follow alo

Tests and Test Infrastructure

2014-09-13 Thread Patrick Wendell
Hey All, Wanted to send a quick update about test infrastructure. With the number of contributors we have and the rate of development, maintaining a well-oiled test infra is really important. Every time a flaky test fails a legitimate pull request, it wastes developer time and effort. 1. Master

Re: Wiki page for Operations/Monitoring tools?

2014-09-16 Thread Patrick Wendell
Hey Otis, Could you describe a bit more about what your program is? Is it an open source project? A product? This would help us understand a bit where it should go. - Patrick On Mon, Sep 15, 2014 at 6:49 PM, Otis Gospodnetic wrote: > Hi, > > I'm looking for a suitable place on the Wiki to add some

Re: greeting from new member and jira 3489

2014-09-16 Thread Patrick Wendell
Hi Mohit, Welcome to the Spark community! We normally look at feature proposals using github pull requests - mind submitting one? The contribution process is covered here: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark On Tue, Sep 16, 2014 at 9:16 PM, Mohit Jaggi wrote: >

Re: Spark spilling location

2014-09-18 Thread Patrick Wendell
Yes - I believe we use the local dirs for spilling as well. On Thu, Sep 18, 2014 at 7:57 AM, Tom Hubregtsen wrote: > Hi all, > > Just one line of context, since last post mentioned this would help: > I'm currently writing my masters thesis (Computer Engineering) on storage > and memory in both Sp

Re: BlockManager issues

2014-09-21 Thread Patrick Wendell
Hey the numbers you mentioned don't quite line up - did you mean PR 2711? On Sun, Sep 21, 2014 at 8:45 PM, Reynold Xin wrote: > It seems like you just need to raise the ulimit? > > > On Sun, Sep 21, 2014 at 8:41 PM, Nishkam Ravi wrote: > >> Recently upgraded to 1.1.0. Saw a bunch of fetch failur

Re: BlockManager issues

2014-09-21 Thread Patrick Wendell
29 PM, Patrick Wendell wrote: > Hey the numbers you mentioned don't quite line up - did you mean PR 2711? > > On Sun, Sep 21, 2014 at 8:45 PM, Reynold Xin wrote: >> It seems like you just need to raise the ulimit? >> >> >> On Sun, Sep 21, 2014 at 8:41 PM, Ni

Re: hash vs sort shuffle

2014-09-22 Thread Patrick Wendell
Hey Cody, In terms of Spark 1.1.1 - we wouldn't change a default value in a spot release. Changing this to default is slotted for 1.2.0: https://issues.apache.org/jira/browse/SPARK-3280 - Patrick On Mon, Sep 22, 2014 at 9:08 AM, Cody Koeninger wrote: > Unfortunately we were somewhat rushed to

Re: do MIMA checking before all test cases start?

2014-09-24 Thread Patrick Wendell
Have you considered running the mima checks locally? We prefer people not use Jenkins for very frequent checks since it takes resources away from other people trying to run tests. On Wed, Sep 24, 2014 at 6:44 PM, Nan Zhu wrote: > Hi, all > > It seems that, currently, Jenkins makes MIMA checking a

Re: do MIMA checking before all test cases start?

2014-09-25 Thread Patrick Wendell
> -- >> Nan Zhu >> >> >> On Thursday, September 25, 2014 at 12:04 AM, Patrick Wendell wrote: >> >> > Have you considered running the mima checks locally? We prefer people >> > not use Jenkins for very frequent checks since it takes resources awa

Re: Extending Scala style checks

2014-10-01 Thread Patrick Wendell
Hey Nick, We can always take built-in rules. Back when we added this Prashant Sharma actually did some great work that lets us write our own style rules in cases where rules don't exist. You can see some existing rules here: https://github.com/apache/spark/tree/master/project/spark-style/src/main

Re: EC2 clusters ready in launch time + 30 seconds

2014-10-03 Thread Patrick Wendell
Hey All, Just a couple notes. I recently posted a shell script for creating the AMI's from a clean Amazon Linux AMI. https://github.com/mesos/spark-ec2/blob/v3/create_image.sh I think I will update the AMI's soon to get the most recent security updates. For spark-ec2's purpose this is probably s

Re: Unneeded branches/tags

2014-10-07 Thread Patrick Wendell
Actually - weirdly - we can delete old tags and it works with the mirroring. Nick if you put together a list of un-needed tags I can delete them. On Tue, Oct 7, 2014 at 6:27 PM, Reynold Xin wrote: > Those branches are no longer active. However, I don't think we can delete > branches from github d

Re: Scalastyle improvements / large code reformatting

2014-10-12 Thread Patrick Wendell
Another big problem with these patches is that they make it almost impossible to backport changes to older branches cleanly (the chance of a merge conflict becomes close to 100%). One proposal is to do this: 1. We only consider new style rules at the end of a release cycle, when there is the smalle

Re: Scalastyle improvements / large code reformatting

2014-10-13 Thread Patrick Wendell
own development and >> backporting for trivial reasons. Let's not do that at this point, the style >> of the current code is quite consistent and we have plenty of other things >> to worry about. Instead, what you can do is as you edit a file when you're >> workin

Re: Get attempt number in a closure

2014-10-20 Thread Patrick Wendell
There is a deeper issue here which is AFAIK we don't even store a notion of attempt inside of Spark, we just use a new taskId with the same index. On Mon, Oct 20, 2014 at 12:38 PM, Yin Huai wrote: > Yeah, seems we need to pass the attempt id to executors through > TaskDescription. I have created

Re: something wrong with Jenkins or something untested merged?

2014-10-20 Thread Patrick Wendell
The failure is in the Kinesis component, can you reproduce this if you build with -Pkinesis-asl? - Patrick On Mon, Oct 20, 2014 at 5:08 PM, shane knapp wrote: > hmm, strange. i'll take a look. > > On Mon, Oct 20, 2014 at 5:11 PM, Nan Zhu wrote: > >> yes, I can compile locally, too >> >> but it

Re: something wrong with Jenkins or something untested merged?

2014-10-20 Thread Patrick Wendell
sorry about that! > > shane > > > On Mon, Oct 20, 2014 at 5:16 PM, Patrick Wendell wrote: >> >> The failure is in the Kinesis component, can you reproduce this if you >> build with -Pkinesis-asl? >> >> - Patrick >> >> On Mon, Oct 20, 2014 at

Re: something wrong with Jenkins or something untested merged?

2014-10-20 Thread Patrick Wendell
I created an issue to fix this: https://issues.apache.org/jira/browse/SPARK-4021 On Mon, Oct 20, 2014 at 5:32 PM, Patrick Wendell wrote: > Thanks Shane - we should fix the source code issues in the Kinesis > code that made stricter Java compilers reject it. > > - Patrick >

Re: something wrong with Jenkins or something untested merged?

2014-10-21 Thread Patrick Wendell
clipse Java Compiler 0.894_R34x, 3.4.2 release, Copyright IBM >> > > > > Corp 2000, 2008. All rights reserved. >> > > > > >> > > > >> > > > >> > > > Which JDK is actually used by Jenkins? >> > > > >> > >

Re: Which part of the code deals with communication?

2014-10-22 Thread Patrick Wendell
The best documentation about communication interfaces is the SecurityManager doc written by Tom Graves. With this as a starting point I'd recommend digging through the code for each component. https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SecurityManager.scala#L5

Re: scalastyle annoys me a little bit

2014-10-23 Thread Patrick Wendell
Hey Koert, I think disabling the style checks in maven package could be a good idea for the reason you point out. I was sort of mixed on that when it was proposed for this exact reason. It's just annoying to developers. In terms of changing the global limit, this is more religion than anything el

Spark 1.2 feature freeze on November 1

2014-10-23 Thread Patrick Wendell
Hey All, Just a reminder that as planned [1] we'll go into a feature freeze on November 1. On that date I'll cut a 1.2 release branch and make the up-or-down call on any patches that go into that branch, along with individual committers. It is common for us to receive a very large volume of patch

Re: Moving PR Builder to mvn

2014-10-24 Thread Patrick Wendell
Overall I think this would be a good idea. The main blocker is just that I think the Maven build is much slower right now than the SBT build. However, if we were able to e.g. parallelize the test build on Jenkins that might make up for it. I'd actually like to have a trigger where we could tests p
