Re: Python Spark Improvements (forked from Spark Improvement Proposals)

2016-10-31 Thread Holden Karau
I believe Bryan is also working on this a little - and I'm a little busy with the other stuff but would love to stay in the loop on Arrow progress :) On Monday, October 31, 2016, mariusvniekerk wrote: > So I've been working on some very, very early-stage Apache Arrow

Re: Python Spark Improvements (forked from Spark Improvement Proposals)

2016-10-31 Thread mariusvniekerk
So I've been working on some very, very early-stage Apache Arrow integration. My current plan is to emulate some of how the R function execution works. If there are any other people working on similar efforts, it would be a good idea to combine efforts. I can see how much effort is involved in

Re: [VOTE] Release Apache Spark 2.0.2 (RC1)

2016-10-31 Thread Denny Lee
Oh, I forgot to note that when downloading and running against the Spark 2.0.2 "without Hadoop" binaries, I got a JNI error caused by an exception involving org/slf4j/Logger (i.e. the org.slf4j.Logger class is not found). On Mon, Oct 31, 2016 at 4:35 PM Reynold Xin wrote: > OK I

Re: [VOTE] Release Apache Spark 2.0.2 (RC1)

2016-10-31 Thread Reynold Xin
OK I will cut a new RC tomorrow. Any other issues people have seen? On Fri, Oct 28, 2016 at 2:58 PM, Shixiong(Ryan) Zhu wrote: > -1. > > The history server is broken because of some refactoring work in > Structured Streaming:

Re: Odp.: Spark Improvement Proposals

2016-10-31 Thread Marcelo Vanzin
The proposal looks OK to me. I assume, even though it's not explicitly called out, that voting would happen by e-mail? A template for the proposal document (instead of just a bullet list) would also be nice, but that can be done at any time. BTW, shameless plug: I filed SPARK-18085 which I consider a

Re: Odp.: Spark Improvement Proposals

2016-10-31 Thread Ryan Blue
I agree, we should push forward on this. I think there is enough consensus to call a vote, unless someone else thinks that there is more to discuss? rb On Mon, Oct 31, 2016 at 10:34 AM, Cody Koeninger wrote: > Now that Spark Summit Europe is over, are any committers

Updating Parquet dep to 1.9

2016-10-31 Thread Michael Allman
Hi All, Is anyone working on updating Spark's Parquet library dep to 1.9? If not, I can at least get started on it and publish a PR. Cheers, Michael

Re: Issue with repartition and cache

2016-10-31 Thread ankits
Hi, Did you ever figure this one out? I'm seeing the same behavior: Calling cache() after a repartition() makes Spark cache the version of the RDD BEFORE the repartition, which means a shuffle every time it is accessed. However, calling cache() before the repartition() seems to work fine, the

Re: JIRA Components for Streaming

2016-10-31 Thread Reynold Xin
Maybe just streaming or SS in GitHub? On Monday, October 31, 2016, Cody Koeninger wrote: > Makes sense to me. > > I do wonder if e.g. > > [SPARK-12345][STRUCTUREDSTREAMING][KAFKA] > > is going to leave any room in the GitHub PR form for actual title content? > > On Mon, Oct

Re: JIRA Components for Streaming

2016-10-31 Thread Cody Koeninger
Makes sense to me. I do wonder if e.g. [SPARK-12345][STRUCTUREDSTREAMING][KAFKA] is going to leave any room in the GitHub PR form for actual title content? On Mon, Oct 31, 2016 at 1:37 PM, Michael Armbrust wrote: > I'm planning to do a little maintenance on JIRA to
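
To put rough numbers on the title-length concern, here is a minimal sketch that compares the tag prefix quoted above with the shorter "SS" form suggested in the reply. The helper name tag_prefix is hypothetical and only for illustration; nothing here reflects an actual GitHub limit or an agreed Spark convention.

```python
def tag_prefix(jira_id, components):
    """Build a '[SPARK-N][COMPONENT]...' style PR title prefix (hypothetical helper)."""
    return "".join(f"[{part}]" for part in [jira_id, *components])


# Prefix as quoted in the thread, and the abbreviated variant using "SS".
long_prefix = tag_prefix("SPARK-12345", ["STRUCTUREDSTREAMING", "KAFKA"])
short_prefix = tag_prefix("SPARK-12345", ["SS", "KAFKA"])

print(long_prefix, len(long_prefix))
print(short_prefix, len(short_prefix))
print("characters saved:", len(long_prefix) - len(short_prefix))
```

The full component name spends 41 characters on tags alone, versus 24 with the abbreviation, before any actual title text is written.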

JIRA Components for Streaming

2016-10-31 Thread Michael Armbrust
I'm planning to do a little maintenance on JIRA to hopefully improve the visibility into the progress / gaps in Structured Streaming. In particular, while we share a lot of optimization / execution logic with SQL, the set of desired features and bugs is fairly different. Proposal: - Structured

Re: Odp.: Spark Improvement Proposals

2016-10-31 Thread Cody Koeninger
Now that Spark Summit Europe is over, are any committers interested in moving forward with this? https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md Or are we going to let this discussion die on the vine? On Mon, Oct 17, 2016 at 10:05 AM, Tomasz Gawęda

Re: Interesting in contributing to spark

2016-10-31 Thread Reynold Xin
Welcome! This is the best guide to get started: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark On Mon, Oct 31, 2016 at 5:09 AM, Zak H wrote: > Hi, > > I'd like to introduce myself. My name is Zak and I'm a software engineer. > I'm interested

Interesting in contributing to spark

2016-10-31 Thread Zak H
Hi, I'd like to introduce myself. My name is Zak and I'm a software engineer. I'm interested in contributing to Spark as a way to learn more. I've signed up to the mailing list and hope to learn more about Spark. What do you recommend I start on as my first bug? I have a working knowledge of

Re: Spark has a compile dependency on scalatest

2016-10-31 Thread Sean Owen
SBT and Maven resolution rules do differ. I thought SBT was generally latest-first though, which should make 3.0 take priority. Maven is more like closest-first, which means you can pretty much always override things in your own build. An exclusion is the right way to go in this case because the