Starting the process for a Spark 2.1.2 Release

2017-09-12 Thread Holden Karau
Hi Spark Developers, After the discussion around the need for a Spark 2.1.2 release, I'd like to get the ball rolling. If you are a developer on a specific component in Spark, now is a good time to look and see if there are any important bug fixes that should be backported into 2.1.2 and

Re: Easy way to get offset metatada with Spark Streaming API

2017-09-12 Thread Michael Armbrust
In the checkpoint directory there is a file /offsets/$batchId that holds the offsets serialized as JSON. I would not consider this a public stable API though. Really, the only important thing for getting exactly-once processing is that you must ensure whatever operation you are doing downstream is idempotent
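
A minimal sketch of what "idempotent downstream" could look like when each batch carries a batchId. It uses plain Python and no Spark at all; the class `IdempotentSink` and its method `add_batch` are hypothetical names chosen for illustration and are not Spark's Sink API. The idea is only that a batch replayed after a failure is recognized by its id and skipped.

```python
class IdempotentSink:
    """Hypothetical in-memory sink; not part of Spark's API."""

    def __init__(self):
        self.rows = []
        # Highest batch id whose rows have already been applied.
        self.last_committed_batch_id = -1

    def add_batch(self, batch_id, rows):
        """Apply a batch once; return False if it was a replay."""
        # A batch with the same or an older id was already applied, so skip it.
        if batch_id <= self.last_committed_batch_id:
            return False
        # In a real store, the data write and the batch-id update would need
        # to be committed atomically (for example, in a single transaction).
        self.rows.extend(rows)
        self.last_committed_batch_id = batch_id
        return True


sink = IdempotentSink()
sink.add_batch(0, ["a", "b"])
sink.add_batch(1, ["c"])
sink.add_batch(1, ["c"])  # replay after a simulated failure; skipped
```

The sketch assumes batch ids increase monotonically and that a replayed batch contains the same rows as the original attempt.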

Re: Easy way to get offset metatada with Spark Streaming API

2017-09-12 Thread Dmitry Naumenko
Thanks for the response, Michael. > You should still be able to get exactly once processing by using the batchId that is passed to the Sink. Could you explain this in more detail, please? Is there some kind of offset manager API that works as a get-offset-by-batch-id lookup table? Dmitry 2017-09-12

Re: Easy way to get offset metatada with Spark Streaming API

2017-09-12 Thread Michael Armbrust
I think that we are going to have to change the Sink API as part of SPARK-20928, which is why I linked these tickets together. I'm still targeting an initial version for Spark 2.3, which should happen sometime towards the end of the year.

Re: 2.1.2 maintenance release?

2017-09-12 Thread Holden Karau
Sounds good. I have a little more experience with the Jenkins jobs packaging from helping debug the Python packaging issues so I'll get started and look at updating the docs as I go until I get stuck. On Tue, Sep 12, 2017 at 2:29 AM Sean Owen wrote: > I think you could just

Re: 2.1.2 maintenance release?

2017-09-12 Thread Sean Owen
I think you could just dive into the steps at http://spark.apache.org/release-process.html and see how far you get before you need assistance to execute steps like tagging and publishing artifacts. I think a secondary goal of this process is to update and expand those release documents, as a

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-12 Thread Takuya UESHIN
This vote passes with 4 binding +1 votes, 6 non-binding +1 votes, no +0 votes, and no -1 votes. Thanks all! +1 votes (binding): Reynold Xin, Wenchen Fan, Yin Huai, Matei Zaharia. +1 votes (non-binding): Felix Cheung, Bryan Cutler, Sameer Agarwal, Hyukjin Kwon, Xiao Li, Liang-Chi Hsieh. On Tue, Sep 12,

Re: Easy way to get offset metatada with Spark Streaming API

2017-09-12 Thread Dmitry Naumenko
Thanks, Cody. Unfortunately, it seems there is no active development right now. Maybe I can step in and help with it somehow? Dmitry 2017-09-11 21:01 GMT+03:00 Cody Koeninger : > https://issues-test.apache.org/jira/browse/SPARK-18258 > > On Mon, Sep 11, 2017 at 7:15