Re: Standardized Spark dev environment

2015-01-20 Thread Patrick Wendell
To respond to the original suggestion by Nick: I always thought it would be useful to have a Docker image on which we run the tests and build releases, so that we could have a consistent environment that other packagers or people trying to exhaustively run Spark tests could replicate (or at least

[ANNOUNCE] Spark 1.3.0 Snapshot 1

2015-02-11 Thread Patrick Wendell
Hey All, I've posted Spark 1.3.0 snapshot 1. At this point the 1.3 branch is ready for community testing and we are strictly merging fixes and documentation across all components. The release files, including signatures, digests, etc can be found at:

Re: Replacing Jetty with TomCat

2015-02-17 Thread Patrick Wendell
Hey Niranda, It seems to me a lot of effort to support multiple libraries inside of Spark like this, so I'm not sure that's a great solution. If you are building an application that embeds Spark, is it not possible for you to continue to use Jetty for Spark's internal servers and use tomcat for

Merging code into branch 1.3

2015-02-18 Thread Patrick Wendell
Hey Committers, Now that Spark 1.3 rc1 is cut, please restrict branch-1.3 merges to the following: 1. Fixes for issues blocking the 1.3 release (i.e. 1.2.X regressions) 2. Documentation and tests. 3. Fixes for non-blocker issues that are surgical, low-risk, and/or outside of the core. If there

[VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-18 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.3.0! The tag to be voted on is v1.3.0-rc1 (commit f97b0d4a): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=f97b0d4a6b26504916816d7aefcf3132cd1da6c2 The release files, including signatures, digests, etc.

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-18 Thread Patrick Wendell
UISeleniumSuite: *** RUN ABORTED *** java.lang.NoClassDefFoundError: org/w3c/dom/ElementTraversal ... This is a newer test suite. There is something flaky about it, we should definitely fix it, IMO it's not a blocker though. Patrick this link gives a 404:

Re: driver fail-over in Spark streaming 1.2.0

2015-02-12 Thread Patrick Wendell
It will create and connect to new executors. The executors are mostly stateless, so the program can resume with new executors. On Wed, Feb 11, 2015 at 11:24 PM, lin kurtt@gmail.com wrote: Hi, all In Spark Streaming 1.2.0, when the driver fails and a new driver starts with the most updated

Re: Re: Sort Shuffle performance issues about using AppendOnlyMap for large data sets

2015-02-12 Thread Patrick Wendell
The map will start with a capacity of 64, but will grow to accommodate new data. Are you using the groupBy operator in Spark or are you using Spark SQL's group by? This usually happens if you are grouping or aggregating in a way that doesn't sufficiently condense the data created from each input
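To illustrate the "starts at 64, grows to accommodate new data" behavior described above, here is a minimal sketch of a growable open-addressing hash map. It is a toy model, not Spark's AppendOnlyMap: the class name `ToyAppendOnlyMap`, the linear probing, and the 0.7 load factor are all assumptions chosen for illustration.

```python
class ToyAppendOnlyMap:
    """Toy open-addressing map (hypothetical; not Spark's implementation)."""

    def __init__(self, initial_capacity=64, load_factor=0.7):
        # Capacity must be a power of two so that masking works as a modulo.
        self._capacity = initial_capacity
        self._load_factor = load_factor
        self._slots = [None] * initial_capacity
        self._size = 0

    @property
    def capacity(self):
        return self._capacity

    def _find_slot(self, slots, key):
        # Linear probing: walk forward until we hit the key or an empty slot.
        mask = len(slots) - 1
        pos = hash(key) & mask
        while slots[pos] is not None and slots[pos][0] != key:
            pos = (pos + 1) & mask
        return pos

    def update(self, key, value):
        pos = self._find_slot(self._slots, key)
        if self._slots[pos] is None:
            self._size += 1
        self._slots[pos] = (key, value)
        # Grow once the table is fuller than the load factor allows.
        if self._size > self._capacity * self._load_factor:
            self._grow()

    def get(self, key, default=None):
        entry = self._slots[self._find_slot(self._slots, key)]
        return default if entry is None else entry[1]

    def _grow(self):
        # Double the table and re-insert every existing entry.
        new_slots = [None] * (self._capacity * 2)
        for entry in self._slots:
            if entry is not None:
                new_slots[self._find_slot(new_slots, entry[0])] = entry
        self._slots = new_slots
        self._capacity *= 2
```

If an aggregation does not condense its input, every distinct key occupies a slot, so a structure like this keeps doubling as data arrives.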

Re: How to track issues that must wait for Spark 2.x in JIRA?

2015-02-12 Thread Patrick Wendell
Yeah my preference is also to have a more open-ended 2+ for issues that are clearly desirable but blocked by compatibility concerns. What I would really want to avoid is major feature proposals sitting around in our JIRA and tagged under some 2.X version. IMO JIRA isn't the place for thoughts about

[VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-26 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.2.1! The tag to be voted on is v1.2.1-rc1 (commit 3e2d7d3): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=3e2d7d310b76c293b9ac787f204e6880f508f6ec The release files, including signatures, digests, etc.

Re: [VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-27 Thread Patrick Wendell
, Sean On Jan 27, 2015, at 12:04 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.2.1! The tag to be voted on is v1.2.1-rc1 (commit 3e2d7d3): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h

Re: [VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-27 Thread Patrick Wendell
, at 11:35 AM, Patrick Wendell pwend...@gmail.com wrote: Hey Sean, Right now we don't publish every 2.11 binary to avoid combinatorial explosion of the number of build artifacts we publish (there are other parameters such as whether hive is included, etc). We can revisit this in future feature

Re: [VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-27 Thread Patrick Wendell
create: <configuration> <createChecksum>true</createChecksum> </configuration> As for the key issue, I think it's just a matter of uploading the new key in both places. We should all of course test the release anyway. On Tue, Jan 27, 2015 at 5:55 PM, Patrick Wendell pwend...@gmail.com wrote: Hey

Friendly reminder/request to help with reviews!

2015-01-27 Thread Patrick Wendell
Hey All, Just a reminder, as always around release time we have a very large volume of patches show up near the deadline. One thing that can help us maximize the number of patches we get in is to have community involvement in performing code reviews. And in particular, doing a thorough review

[RESULT] [VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-28 Thread Patrick Wendell
And Scale OK Fixed : org.apache.spark.SparkException in zip ! 2.5. rdd operations OK State of the Union Texts - MapReduce, Filter,sortByKey (word count) 2.6. recommendation OK Cheers k/ On Mon, Jan 26, 2015 at 11:02 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote

Re: [VOTE] Release Apache Spark 1.2.1 (RC2)

2015-01-28 Thread Patrick Wendell
://issues.apache.org/jira/browse/SPARK-5144 Thanks, Aniket On Wed Jan 28 2015 at 15:39:43 Patrick Wendell [via Apache Spark Developers List] ml-node+s1001551n1031...@n3.nabble.com wrote: Minor typo in the above e-mail - the tag is named v1.2.1-rc2 (not v1.2.1-rc1). On Wed, Jan 28, 2015 at 2:06 AM

[VOTE] Release Apache Spark 1.2.1 (RC2)

2015-01-28 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.2.1! The tag to be voted on is v1.2.1-rc1 (commit b77f876): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=b77f87673d1f9f03d4c83cf583158227c551359b The release files, including signatures, digests, etc.

Re: [VOTE] Release Apache Spark 1.2.1 (RC2)

2015-01-28 Thread Patrick Wendell
Minor typo in the above e-mail - the tag is named v1.2.1-rc2 (not v1.2.1-rc1). On Wed, Jan 28, 2015 at 2:06 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.2.1! The tag to be voted on is v1.2.1-rc1 (commit b77f876

Re: spark akka fork : is the source anywhere?

2015-01-28 Thread Patrick Wendell
It's maintained here: https://github.com/pwendell/akka/tree/2.2.3-shaded-proto Over time, this is something that would be great to get rid of, per rxin On Wed, Jan 28, 2015 at 3:33 PM, Reynold Xin r...@databricks.com wrote: Hopefully problems like this will go away entirely in the next couple

Re: Questions about Spark standalone resource scheduler

2015-02-02 Thread Patrick Wendell
Hey Jerry, I think standalone mode will still add more features over time, but the goal isn't really for it to become equivalent to what Mesos/YARN are today. Or at least, I doubt Spark Standalone will ever attempt to manage _other_ frameworks outside of Spark and become a general purpose

Re: Job priority

2015-01-11 Thread Patrick Wendell
Priority scheduling isn't something we've supported in Spark and we've opted to support FIFO and Fair scheduling and asked users to try and fit these to the needs of their applications. In practice from what I've seen of priority schedulers, such as the linux CPU scheduler, is that strict

Re: Spark development with IntelliJ

2015-01-08 Thread Patrick Wendell
Nick - yes. Do you mind moving it? I should have put it in the Contributing to Spark page. On Thu, Jan 8, 2015 at 3:22 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Side question: Should this section

Re: Spark development with IntelliJ

2015-01-08 Thread Patrick Wendell
Actually I went ahead and did it. On Thu, Jan 8, 2015 at 10:25 PM, Patrick Wendell pwend...@gmail.com wrote: Nick - yes. Do you mind moving it? I should have put it in the Contributing to Spark page. On Thu, Jan 8, 2015 at 3:22 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Side

Re: Mail to u...@spark.apache.org failing

2015-02-09 Thread Patrick Wendell
Ah - we should update it to suggest mailing the dev@ list (and if there is enough traffic maybe do something else). I'm happy to add you if you can give an organization name, URL, a list of which Spark components you are using, and a short description of your use case.. On Mon, Feb 9, 2015 at

Re: New Metrics Sink class not packaged in spark-assembly jar

2015-02-09 Thread Patrick Wendell
Hi Judy, If you have added source files in the sink/ source folder, they should appear in the assembly jar when you build. One thing I noticed is that you are looking inside the /dist folder. That only gets populated if you run make-distribution. The normal development process is just to do mvn

Re: Powered by Spark: Concur

2015-02-10 Thread Patrick Wendell
Thanks Paolo - I've fixed it. On Mon, Feb 9, 2015 at 11:10 PM, Paolo Platter paolo.plat...@agilelab.it wrote: Hi, I checked the powered by wiki too and Agile Labs should be Agile Lab. The link is wrong too, it should be www.agilelab.it. The description is correct. Thanks a lot Paolo

Re: Keep or remove Debian packaging in Spark?

2015-02-09 Thread Patrick Wendell
I have wondered whether we should sort of deprecate it more officially, since otherwise I think people have the reasonable expectation based on the current code that Spark intends to support complete Debian packaging as part of the upstream build. Having something that's sort-of maintained but no

Re: multi-line comment style

2015-02-09 Thread Patrick Wendell
can find Thanks Shivaram [1] https://google-styleguide.googlecode.com/svn/trunk/javaguide.html#s4.8.6.1-block-comment-style On Wed, Feb 4, 2015 at 2:05 PM, Patrick Wendell pwend...@gmail.com wrote: Personally I have no opinion, but agree it would be nice to standardize. - Patrick

[ANNOUNCE] Apache Spark 1.2.1 Released

2015-02-09 Thread Patrick Wendell
Hi All, I've just posted the 1.2.1 maintenance release of Apache Spark. We recommend all 1.2.0 users upgrade to this release, as this release includes stability fixes across all components of Spark. - Download this release: http://spark.apache.org/downloads.html - View the release notes:

Re: Spark UI history job duration is wrong

2015-01-05 Thread Patrick Wendell
Thanks for reporting this - it definitely sounds like a bug. Please open a JIRA for it. My guess is that we define the start or end time of the job based on the current time instead of looking at data encoded in the underlying event stream. That would cause it to not work properly when loading
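The guess above is that the duration is derived from the wall clock at load time rather than from timestamps stored in the event log. A minimal sketch of the second approach follows; the event shape (`event`, `job_id`, `timestamp` keys) and the names `JobStart` and `JobEnd` are hypothetical and are not Spark's event log format.

```python
def duration_from_events(events, job_id):
    """Return the job duration in ms using only timestamps recorded in the events.

    Returns None if either the start or the end event is missing.
    """
    start = None
    end = None
    for e in events:
        if e.get("job_id") != job_id:
            continue
        if e["event"] == "JobStart":
            start = e["timestamp"]
        elif e["event"] == "JobEnd":
            end = e["timestamp"]
    if start is None or end is None:
        return None
    return end - start


# Replaying the same log later gives the same answer, because nothing
# here depends on the current time.
log = [
    {"event": "JobStart", "job_id": 0, "timestamp": 1_420_000_000_000},
    {"event": "JobStart", "job_id": 1, "timestamp": 1_420_000_001_000},
    {"event": "JobEnd", "job_id": 0, "timestamp": 1_420_000_004_000},
]
```

Because the result depends only on the recorded events, it stays stable no matter when the history is loaded.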

Re: [Performance] Possible regression in rdd.take()?

2015-02-18 Thread Patrick Wendell
I believe the heuristic governing the way that take() decides to fetch partitions changed between these versions. It could be that in certain cases the new heuristic is worse, but it might be good to just look at the source code and see, for your number of elements taken and number of partitions,

Re: Wrong version on the Spark documentation page

2015-03-15 Thread Patrick Wendell
Cheng - what if you hold shift+refresh? For me the /latest link correctly points to 1.3.0 On Sun, Mar 15, 2015 at 10:40 AM, Cheng Lian lian.cs@gmail.com wrote: It's still marked as 1.2.1 here http://spark.apache.org/docs/latest/ But this page is updated (1.3.0)

Re: enum-like types in Spark

2015-03-16 Thread Patrick Wendell
{ private[this] case object _MemoryOnly extends StorageLevel final val MemoryOnly: StorageLevel = _MemoryOnly private[this] case object _DiskOnly extends StorageLevel final val DiskOnly: StorageLevel = _DiskOnly } On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell pwend...@gmail.com wrote: I

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-09 Thread Patrick Wendell
on yarn on hadoop 2.6 in cluster and client mode. Tom On Thursday, March 5, 2015 8:53 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.3.0! The tag to be voted on is v1.3.0-rc2 (commit 4aaf48d4): https://git-wip

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-06 Thread Patrick Wendell
Hey Sean, SPARK-5310 Update SQL programming guide for 1.3 SPARK-5183 Document data source API SPARK-6128 Update Spark Streaming Guide for Spark 1.3 For these, the issue is that they are documentation JIRA's, which don't need to be timed exactly with the release vote, since we can update the

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-06 Thread Patrick Wendell
affects a subset of build profiles. On Fri, Mar 6, 2015 at 6:43 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Sean, SPARK-5310 Update SQL programming guide for 1.3 SPARK-5183 Document data source API SPARK-6128 Update Spark Streaming Guide for Spark 1.3 For these, the issue

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-08 Thread Patrick Wendell
We probably want to revisit the way we do binaries in general for 1.4+. IMO, something worth forking a separate thread for. I've been hesitating to add new binaries because people (understandably) complain if you ever stop packaging older ones, but on the other hand the ASF has complained that we

Re: Block Transfer Service encryption support

2015-03-08 Thread Patrick Wendell
I think that yes, longer term we want to have encryption of all communicated data. However Jeff, can you open a JIRA to discuss the design before opening a pull request (it's fine to link to a WIP branch if you'd like)? I'd like to better understand the performance and operational complexity of

Re: Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-08 Thread Patrick Wendell
. There the vendors can add the latest downloads - for example when 1.4 is released, HDP can build a release of HDP Spark 1.4 bundle. Cheers k/ On Sun, Mar 8, 2015 at 2:11 PM, Patrick Wendell pwend...@gmail.com wrote: We probably want to revisit the way we do binaries in general for 1.4+. IMO, something

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-06 Thread Patrick Wendell
, Mar 6, 2015 at 9:17 PM, Patrick Wendell pwend...@gmail.com wrote: Sean, The docs are distributed and consumed in a fundamentally different way than Spark code itself. So we've always considered the deadline for doc changes to be when the release is finally posted. If there are small

[ANNOUNCE] Announcing Spark 1.3!

2015-03-13 Thread Patrick Wendell
Hi All, I'm happy to announce the availability of Spark 1.3.0! Spark 1.3.0 is the fourth release on the API-compatible 1.X line. It is Spark's largest release ever, with contributions from 172 developers and more than 1,000 commits! Visit the release notes [1] to read about the new features, or

Re: hadoop input/output format advanced control

2015-03-25 Thread Patrick Wendell
and pullreq when i have some time. On Wed, Mar 25, 2015 at 1:23 AM, Patrick Wendell pwend...@gmail.com wrote: I see - if you look, in the saving functions we have the option for the user to pass an arbitrary Configuration. https://github.com/apache/spark/blob/master/core/src/main/scala/org

Re: [VOTE] Release Apache Spark 1.3.0 (RC2)

2015-03-04 Thread Patrick Wendell
consider https://issues.apache.org/jira/browse/SPARK-6144 a serious regression from 1.2 (since it affects existing addFile() functionality if the URL is hdfs:...). Will test other parts separately. On Tue, Mar 3, 2015 at 8:19 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote

Re: Task result is serialized twice by serializer and closure serializer

2015-03-04 Thread Patrick Wendell
Hey Mingyu, I think it's broken out separately so we can record the time taken to serialize the result. Once we serialize it once, the second serialization should be really simple since it's just wrapping something that has already been turned into a byte buffer. Do you see a specific issue
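The two-step serialization described above can be sketched as follows. This is a toy model using Python's `pickle`, not Spark's serializer or closure serializer; the function names and the envelope layout are assumptions for illustration.

```python
import pickle
import time


def serialize_task_result(value):
    """Serialize the value first (timed), then wrap the bytes in an envelope."""
    t0 = time.perf_counter()
    value_bytes = pickle.dumps(value)  # first pass: the potentially expensive one
    elapsed = time.perf_counter() - t0
    # Second pass: the payload is already a byte string, so the outer
    # serializer only has to copy it alongside a small amount of metadata.
    return pickle.dumps({"value_bytes": value_bytes, "serialize_seconds": elapsed})


def deserialize_task_result(envelope):
    """Undo both passes and return (value, recorded serialization time)."""
    outer = pickle.loads(envelope)
    return pickle.loads(outer["value_bytes"]), outer["serialize_seconds"]
```

Splitting the work this way lets the time of the first pass be measured and shipped along with the result, which is the motivation given in the message.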

Re: enum-like types in Spark

2015-03-04 Thread Patrick Wendell
I like #4 as well and agree with Aaron's suggestion. - Patrick On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson ilike...@gmail.com wrote: I'm cool with #4 as well, but make sure we dictate that the values should be defined within an object with the same name as the enumeration (like we do for

Re: enum-like types in Spark

2015-03-05 Thread Patrick Wendell
= _MemoryOnly private[this] case object _DiskOnly extends StorageLevel final val DiskOnly: StorageLevel = _DiskOnly } On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell pwend...@gmail.com wrote: I like #4 as well and agree with Aaron's suggestion. - Patrick On Wed, Mar 4, 2015

Re: Task result is serialized twice by serializer and closure serializer

2015-03-04 Thread Patrick Wendell
for the serialized task result shouldn't account for the majority of memory footprint anyways, I'm okay with leaving it as is, then. Thanks, Mingyu On 3/4/15, 5:07 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Mingyu, I think it's broken out separately so we can record the time taken

[VOTE] Release Apache Spark 1.3.0 (RC2)

2015-03-03 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.3.0! The tag to be voted on is v1.3.0-rc2 (commit 3af2687): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=3af26870e5163438868c4eb2df88380a533bb232 The release files, including signatures, digests, etc.

[RESULT] [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-03-03 Thread Patrick Wendell
, Patrick Wendell pwend...@gmail.com wrote: Hey All, Just a quick update on this thread. Issues have continued to trickle in. Not all of them are blocker level but enough to warrant another RC: I've been keeping the JIRA dashboard up and running with the latest status (sorry, long link): https

Re: spark-ec2 default to Hadoop 2

2015-03-01 Thread Patrick Wendell
Yeah calling it Hadoop 2 was a very bad naming choice (of mine!), this was back when CDH4 was the only real distribution available with some of the newer Hadoop API's and packaging. I think to not surprise people using this, it's best to keep v1 as the default. Overall, we try not to change

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-23 Thread Patrick Wendell
So actually, the list of blockers on JIRA is a bit outdated. These days I won't cut RC1 unless there are no known issues that I'm aware of that would actually block the release (that's what the snapshot ones are for). I'm going to clean those up and push others to do so also. The main issues I'm

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-23 Thread Patrick Wendell
, Patrick Wendell pwend...@gmail.com wrote: So actually, the list of blockers on JIRA is a bit outdated. These days I won't cut RC1 unless there are no known issues that I'm aware of that would actually block the release (that's what the snapshot ones are for). I'm going to clean those up and push

Re: UnusedStubClass in 1.3.0-rc1

2015-02-25 Thread Patrick Wendell
This has been around for multiple versions of Spark, so I am a bit surprised to see it not working in your build. - Patrick On Wed, Feb 25, 2015 at 9:41 AM, Patrick Wendell pwend...@gmail.com wrote: Hey Cody, What build command are you using? In any case, we can actually comment out

Re: UnusedStubClass in 1.3.0-rc1

2015-02-25 Thread Patrick Wendell
Hey Cody, What build command are you using? In any case, we can actually comment out the unused thing now in the root pom.xml. It existed just to ensure that at least one dependency was listed in the shade plugin configuration (otherwise, some work we do that requires the shade plugin does not

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-25 Thread Patrick Wendell
:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) any ideas on this? Tom On Wednesday, February 18, 2015 2:14 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote

Re: Any guidance on when to back port and how far?

2015-03-24 Thread Patrick Wendell
My philosophy has been basically what you suggested, Sean. One thing you didn't mention though is if a bug fix seems complicated, I will think very hard before back-porting it. This is because fixes can introduce their own new bugs, in some cases worse than the original issue. It's really bad to

Re: hadoop input/output format advanced control

2015-03-24 Thread Patrick Wendell
Yeah - to Nick's point, I think the way to do this is to pass in a custom conf when you create a Hadoop RDD (that's AFAIK why the conf field is there). Is there anything you can't do with that feature? On Tue, Mar 24, 2015 at 11:50 AM, Nick Pentreath nick.pentre...@gmail.com wrote: Imran, on

Experience using binary packages on various Hadoop distros

2015-03-24 Thread Patrick Wendell
Hey All, For a while we've published binary packages with different Hadoop clients pre-bundled. We currently have three interfaces to a Hadoop cluster (a) the HDFS client (b) the YARN client (c) the Hive client. Because (a) and (b) are supposed to be backwards compatible interfaces. My working

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-06 Thread Patrick Wendell
I'll kick it off with a +1. On Thu, Mar 5, 2015 at 6:52 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.3.0! The tag to be voted on is v1.3.0-rc2 (commit 4aaf48d4): https://git-wip-us.apache.org/repos/asf?p=spark.git

Re: hadoop input/output format advanced control

2015-03-25 Thread Patrick Wendell
Great - that's even easier. Maybe we could have a simple example in the doc. On Wed, Mar 25, 2015 at 7:06 PM, Sandy Ryza sandy.r...@cloudera.com wrote: Regarding Patrick's question, you can just do new Configuration(oldConf) to get a cloned Configuration object and add any new properties to it.

Re: enum-like types in Spark

2015-03-23 Thread Patrick Wendell
If the official solution from the Scala community is to use Java enums, then it seems strange they aren't generated in scaladoc? Maybe we can just fix that w/ Typesafe's help and then we can use them. On Mon, Mar 23, 2015 at 1:46 PM, Sean Owen so...@cloudera.com wrote: Yeah the fully realized #4,

Re: Unit test logs in Jenkins?

2015-04-01 Thread Patrick Wendell
Hey Marcelo, Great question. Right now, some of the more active developers have an account that allows them to log into this cluster to inspect logs (we copy the logs from each run to a node on that cluster). The infrastructure is maintained by the AMPLab. I will put you in touch with someone

Re: Spark 1.2.2 prebuilt release for Hadoop 2.4 didn't get deployed

2015-04-21 Thread Patrick Wendell
Good catch Olivier - I'll take care of it. Tracking this on SPARK-7027. On Tue, Apr 21, 2015 at 6:06 AM, Olivier Girardot o.girar...@lateral-thoughts.com wrote: Hi everyone, It seems that some of the Spark 1.2.2 prebuilt versions (I tested mainly for Hadoop 2.4 and later) didn't get deployed on

Re: Is spark-ec2 for production use?

2015-04-21 Thread Patrick Wendell
It could be a good idea to document this a bit. The original goals were to give people an easy way to get started with Spark and also to provide a consistent environment for our own experiments and benchmarking of Spark at the AMPLab. Over time I've noticed a huge amount of scope increase in terms

Reminder about Spark 1.4.0 deadline of May 1st

2015-04-25 Thread Patrick Wendell
Hey All, Just a friendly reminder that May 1st is the feature freeze for Spark 1.4, meaning major outstanding changes will need to land in the next week. After May 1st we'll package a release for testing and then go into the normal triage process where bugs are prioritized and some smaller

Re: Contributing Documentation Changes

2015-04-25 Thread Patrick Wendell
It is true that in the past we've posted community tutorials on the site. Spark has grown a lot since then and it might be a better fit at this point to curate community tutorials on the wiki (something like the powered by page) and link to them from the documentation website. The documentation

Re: WebUI shows poor locality when task scheduling

2015-04-26 Thread Patrick Wendell
Hi Eric - please direct this to the user@ list. This list is for development of Spark itself. On Sun, Apr 26, 2015 at 1:12 AM, eric wong win19...@gmail.com wrote: Hi developers, I have sent this to the user mailing list but got no response... When running an experimental KMeans job for an experiment, the

Re: Design docs: consolidation and discoverability

2015-04-26 Thread Patrick Wendell
are on Google Docs. Perhaps Apache should consider opening up parts of the wiki to a larger group, to better serve this use case. Punya On Fri, Apr 24, 2015 at 5:01 PM Patrick Wendell pwend...@gmail.com wrote: Using our ASF git repository as a working area for design docs, it seems potentially

Re: Should we let everyone set Assignee?

2015-04-22 Thread Patrick Wendell
One overarching issue is that it's pretty unclear what Assigned to X in JIRA means from a process perspective. Personally I actually feel it's better for this to be more historical - i.e. who ended up submitting a patch for this feature that was merged - rather than creating an exclusive

Re: Should we let everyone set Assignee?

2015-04-24 Thread Patrick Wendell
It's a bit of a digression - but Steve's suggestion that we have a mailing list for new issues is a great idea and we can do it easily. We could have new-issues@s.a.o or something (we already have issues@s.a.o). - Patrick On Fri, Apr 24, 2015 at 9:50 AM, Ted Yu yuzhih...@gmail.com wrote: bq.

Re: Should we let everyone set Assignee?

2015-04-22 Thread Patrick Wendell
: Agreed. The Spark project and community that Vinod describes do not resemble the ones with which I am familiar. On Wed, Apr 22, 2015 at 1:20 PM, Patrick Wendell pwend...@gmail.com wrote: Hi Vinod, Thanks for your thoughts - However, I do not agree with your sentiment

Re: Should we let everyone set Assignee?

2015-04-22 Thread Patrick Wendell
at Apache. +Vinod On Apr 22, 2015, at 12:32 PM, Patrick Wendell pwend...@gmail.com wrote: One overarching issue is that it's pretty unclear what Assigned to X in JIRA means from a process perspective. Personally I actually feel it's better for this to be more historical - i.e. who ended up

Re: Design docs: consolidation and discoverability

2015-04-24 Thread Patrick Wendell
Using our ASF git repository as a working area for design docs, it seems potentially concerning to me. It's difficult process wise because all commits need to go through committers and also, we'd pollute our git history a lot with random incremental design updates. The git history is used a lot

Re: [discuss] ending support for Java 6?

2015-04-30 Thread Patrick Wendell
I'd also support this. In general, I think it's good that we try to have Spark support different versions of things (Hadoop, Hive, etc). But at some point you need to weigh the costs of doing so against the number of users affected. In the case of Java 6, we are seeing increasing cost from this.

[ANNOUNCE] Spark branch-1.4

2015-05-04 Thread Patrick Wendell
Hi Devs, Just an announcement that I've cut Spark's branch 1.4 to form the basis of the 1.4 release. Other than a few stragglers, this represents the end of active feature development for Spark 1.4. Per usual, if committers are merging any features, please be in touch so I can help coordinate.

Re: [discuss] ending support for Java 6?

2015-05-04 Thread Patrick Wendell
If we just set JAVA_HOME in dev/run-test-jenkins, I think it should work. On Mon, May 4, 2015 at 7:20 PM, shane knapp skn...@berkeley.edu wrote: ...and now the workers all have java6 installed. https://issues.apache.org/jira/browse/SPARK-1437 sadly, the built-in jenkins jdk management

Re: Mima test failure in the master branch?

2015-04-30 Thread Patrick Wendell
I reverted the patch that I think was causing this: SPARK-5213 Thanks On Thu, Apr 30, 2015 at 7:59 PM, zhazhan zzh...@hortonworks.com wrote: Any PR open for this? -- View this message in context:

Re: What is the location in the source code of the computation of the elements in a map transformation?

2015-05-02 Thread Patrick Wendell
Maybe I can help a bit. What happens when you call .map(my func) is that you create a MapPartitionsRDD that has a reference to that closure in its compute() function. When a job is run (jobs are run as the result of RDD actions):
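The idea described above, where map() only records the closure and the closure is applied later inside compute() once an action runs, can be sketched with a toy model. The classes `ToyRDD` and `ToyMappedRDD` are hypothetical stand-ins and do not reflect Spark's actual class hierarchy or scheduler.

```python
class ToyRDD:
    """Toy stand-in for an RDD: a list of partitions, each a list of items."""

    def __init__(self, partitions):
        self._partitions = partitions

    def num_partitions(self):
        return len(self._partitions)

    def compute(self, index):
        # Produce an iterator over one partition's elements.
        return iter(self._partitions[index])

    def map(self, func):
        # A transformation: nothing is computed here, we only wrap the parent
        # and remember the closure.
        return ToyMappedRDD(self, func)

    def collect(self):
        # An action: this is what finally drives compute() on each partition.
        out = []
        for i in range(self.num_partitions()):
            out.extend(self.compute(i))
        return out


class ToyMappedRDD(ToyRDD):
    """Holds a reference to the parent and the closure; applies it in compute()."""

    def __init__(self, parent, func):
        self._parent = parent
        self._func = func

    def num_partitions(self):
        return self._parent.num_partitions()

    def compute(self, index):
        # The closure runs here, element by element, when a job asks for
        # this partition.
        return (self._func(x) for x in self._parent.compute(index))
```

In this sketch, calling `ToyRDD([[1, 2], [3]]).map(f)` does not invoke `f` at all; `f` runs only when `collect()` iterates the partitions.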

Thanking Test Partners

2015-05-04 Thread Patrick Wendell
Hey All, Community testing during the QA window is an important part of the release cycle in Spark. It helps us deliver higher quality releases by vetting out issues not covered by our unit tests. I was thinking that from now on, it would be nice to recognize the organizations that donate time

Re: [discuss] ending support for Java 6?

2015-05-05 Thread Patrick Wendell
If there is broad consensus here to drop Java 1.6 in Spark 1.5, should we do an ANNOUNCE to user and dev? On Mon, May 4, 2015 at 7:24 PM, shane knapp skn...@berkeley.edu wrote: sgtm On Mon, May 4, 2015 at 11:23 AM, Patrick Wendell pwend...@gmail.com wrote: If we just set JAVA_HOME in dev/run

Pull request builder errors (taking Jenkins worker 3 offline)

2015-05-05 Thread Patrick Wendell
For unknown reasons, pull requests on Jenkins worker 3 have been failing with an exception[1]. After trying to fix this by clearing the ivy and maven caches on the node, I've given up and simply blacklisted that worker. [error] oro#oro;2.0.8!oro.jar origin location must be absolute:

Adding/Using More Resolution Types on JIRA

2015-05-12 Thread Patrick Wendell
In Spark we sometimes close issues as something other than Fixed, and this is an important part of maintaining our JIRA. The current resolution types we use are the following: Won't Fix - bug fix or (more often) feature we don't want to add Invalid - issue is underspecified or not appropriate

[IMPORTANT] Committers please update merge script

2015-05-12 Thread Patrick Wendell
Due to an ASF infrastructure change (bug?) [1] the default JIRA resolution status has switched to Pending Closed. I've made a change to our merge script to coerce the correct status of Fixed when resolving [2]. Please upgrade the merge script to master. I've manually corrected JIRA's that were

Re: How to link code pull request with JIRA ID?

2015-05-14 Thread Patrick Wendell
Yeah I wrote the original script and I intentionally made it easy for other projects to use (you'll just need to tweak some variables at the top). You just need somewhere to run it... we were using a jenkins cluster to run it every 5 minutes. BTW - I looked and there is one instance where it hard

Re: Tentative due dates for Spark 1.3.2 release

2015-05-15 Thread Patrick Wendell
Hi Niranda, Maintenance releases are not done on a predetermined schedule but instead according to which fixes show up and their severity. Since we just did a 1.3.1 release I'm not sure I see 1.3.2 on the immediate horizon. However, the maintenance releases are simply builds at the head of the

Re: [IMPORTANT] Committers please update merge script

2015-05-13 Thread Patrick Wendell
Hi All - unfortunately the fix introduced another bug, which is that fixVersion was not updated properly. I've updated the script and had one other person test it. So committers please pull from master again thanks! - Patrick On Tue, May 12, 2015 at 6:25 PM, Patrick Wendell pwend...@gmail.com

Re: Change for submitting to yarn in 1.3.1

2015-05-13 Thread Patrick Wendell
Sent from my iPad On May 12, 2015, at 20:54, Patrick Wendell pwend...@gmail.com wrote: Hey Kevin and Ron, So is the main shortcoming of the launcher library the inability to get an app ID back from YARN? Or are there other issues here that fundamentally regress things for you

Re: Adding/Using More Resolution Types on JIRA

2015-05-15 Thread Patrick Wendell
it mean too many TODOs are filed and forgotten? That's no comment on the current state, just something to watch. So: yes I like the idea. On May 12, 2015 8:50 AM, Patrick Wendell pwend...@gmail.com wrote: In Spark we sometimes close issues as something other than Fixed

Re: Recent Spark test failures

2015-05-15 Thread Patrick Wendell
The PR builder currently builds against Hadoop 2.3. - Patrick On Fri, May 15, 2015 at 11:40 AM, Marcelo Vanzin van...@cloudera.com wrote: Funny thing, since I asked this question in a PR a few minutes ago... Ignoring the rotation suggestion for a second, can the PR builder at least cover

Re: Recent Spark test failures

2015-05-15 Thread Patrick Wendell
, Patrick Wendell pwend...@gmail.com wrote: The PR builder currently builds against Hadoop 2.3. - Patrick On Fri, May 15, 2015 at 11:40 AM, Marcelo Vanzin van...@cloudera.com wrote: Funny thing, since I asked this question in a PR a few minutes ago... Ignoring the rotation suggestion for a second

[RESULT] [VOTE] Release Apache Spark 1.2.2

2015-04-16 Thread Patrick Wendell
on OS X +1 Sean On Apr 14, 2015, at 10:59 PM, Patrick Wendell pwend...@gmail.com wrote: I'd like to close this vote to coincide with the 1.3.1 release, however, it would be great to have more people test this release first. I'll leave it open for a bit longer and see if others can give

Re: [VOTE] Release Apache Spark 1.2.2

2015-04-14 Thread Patrick Wendell
I'd like to close this vote to coincide with the 1.3.1 release; however, it would be great to have more people test this release first. I'll leave it open for a bit longer and see if others can give a +1. On Tue, Apr 14, 2015 at 9:55 PM, Patrick Wendell pwend...@gmail.com wrote: +1 from me ass

Re: [VOTE] Release Apache Spark 1.3.1 (RC3)

2015-04-14 Thread Patrick Wendell
+1 from myself as well On Mon, Apr 13, 2015 at 8:35 PM, GuoQiang Li wi...@qq.com wrote: +1 (non-binding) -- Original -- From: Patrick Wendell;pwend...@gmail.com; Date: Sat, Apr 11, 2015 02:05 PM To: dev@spark.apache.orgdev@spark.apache.org; Subject

[RESULT] [VOTE] Release Apache Spark 1.3.1 (RC3)

2015-04-14 Thread Patrick Wendell
This vote passes with 10 +1 votes (5 binding) and no 0 or -1 votes. +1: Sean Owen* Reynold Xin* Krishna Sankar Denny Lee Mark Hamstra* Sean McNamara* Sree V Marcelo Vanzin GuoQiang Li Patrick Wendell* 0: -1: I will work on packaging this release in the next 48 hours. - Patrick
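
As a quick check of the tally above, the counts can be reproduced from the listed names, assuming (as the result message suggests) that a trailing asterisk marks a binding vote. The helper name `tally` is illustrative only.

```python
# Voter names as listed in the result message; "*" is assumed to mark a binding vote.
PLUS_ONE = [
    "Sean Owen*", "Reynold Xin*", "Krishna Sankar", "Denny Lee",
    "Mark Hamstra*", "Sean McNamara*", "Sree V", "Marcelo Vanzin",
    "GuoQiang Li", "Patrick Wendell*",
]

def tally(voters):
    """Return (total, binding), counting names that end in '*' as binding."""
    binding = sum(1 for name in voters if name.endswith("*"))
    return len(voters), binding

total, binding = tally(PLUS_ONE)
print(f"{total} +1 votes ({binding} binding)")  # prints "10 +1 votes (5 binding)"
```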

Re: [VOTE] Release Apache Spark 1.2.2

2015-04-14 Thread Patrick Wendell
,1/14/15 SPARK-4888,Spark EC2 doesn't mount local disks for i2.8xlarge instances,,Open,1/27/15 SPARK-4879,Missing output partitions after job completes with speculative execution,Josh Rosen,Open,3/5/15 SPARK-4568,Publish release candidates under $VERSION-RCX instead of $VERSION,Patrick Wendell

Announcing Spark 1.3.1 and 1.2.2

2015-04-17 Thread Patrick Wendell
Hi All, I'm happy to announce the Spark 1.3.1 and 1.2.2 maintenance releases. We recommend all users on the 1.3 and 1.2 Spark branches upgrade to these releases, which contain several important bug fixes. Download Spark 1.3.1 or 1.2.2: http://spark.apache.org/downloads.html Release notes:

Re: [VOTE] Release Apache Spark 1.3.1

2015-04-06 Thread Patrick Wendell
,,Open,3/24/15 SPARK-5098,Number of running tasks become negative after tasks lost,,Open,1/14/15 SPARK-4925,Publish Spark SQL hive-thriftserver maven artifact,Patrick Wendell,Reopened,3/23/15 SPARK-4922,Support dynamic allocation for coarse-grained Mesos,,Open,3/31/15 SPARK-4888,Spark EC2

Re: 1.3 Build Error with Scala-2.11

2015-04-06 Thread Patrick Wendell
as possible. On Mon, Apr 6, 2015 at 7:31 PM Patrick Wendell pwend...@gmail.com wrote: What if you don't run zinc? I.e. just download maven and run that mvn package It might take longer, but I wonder if it will work. On Mon, Apr 6, 2015 at 10:26 PM, mjhb sp...@mjhb.com wrote: Similar

Re: 1.3 Build Error with Scala-2.11

2015-04-06 Thread Patrick Wendell
attempt. Trying to build as clean as possible. On Mon, Apr 6, 2015 at 7:31 PM Patrick Wendell pwend...@gmail.com wrote: What if you don't run zinc? I.e. just download maven and run that mvn package It might take longer, but I wonder if it will work. On Mon, Apr 6, 2015 at 10:26 PM, mjhb

Re: 1.3 Build Error with Scala-2.11

2015-04-06 Thread Patrick Wendell
What if you don't run zinc? I.e. just download maven and run that: mvn package. It might take longer, but I wonder if it will work. On Mon, Apr 6, 2015 at 10:26 PM, mjhb sp...@mjhb.com wrote: Similar problem on 1.2 branch: [ERROR] Failed to execute goal on project spark-core_2.11: Could not
