The Spark 2.0 RC5 vote passed last Friday night, so if I had to guess it will probably be released early this week.
On Mon, Jul 25, 2016 at 12:23 PM, Bryan Jeffrey <bryan.jeff...@gmail.com> wrote:

> All,
>
> I had three questions:
>
> (1) Is there a timeline for a stable Spark 2.0 release? I know the
> 'preview' build is out there, but I was curious what the timeline was for
> the full release. Jira seems to indicate that there should be a release on
> 7/27.
>
> (2) There has been a lot of discussion around 'continuous' datasets. One
> item that came up in tickets was the idea that 'count()' and other
> functions do not apply to continuous datasets:
> https://github.com/apache/spark/pull/12080. In that case, what is the
> intended procedure for calculating a streaming statistic over an interval
> (e.g. counting the number of records in a 2-minute window every 2 minutes)?
>
> (3) In previous releases (1.6.1), calling DStream / RDD repartition with
> the number of partitions set to zero silently deletes data. I have looked
> in Jira for a similar issue, but I do not see one. I would like to address
> this (and would likely be willing to fix it myself). Should I just create
> a ticket?
>
> Thank you,
>
> Bryan Jeffrey

--
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni
ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
Github: github.com/EntilZha | LinkedIn: https://www.linkedin.com/in/pedrorodriguezscience
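On question (2): in Spark 2.0's Structured Streaming, this kind of statistic is expressed as an event-time window aggregation (a groupBy over a window of the timestamp column). Conceptually, a tumbling-window count amounts to flooring each record's timestamp to the start of its window and counting per window. Here is a minimal plain-Python sketch of that idea; the function name and the integer-epoch-second representation are illustrative, not Spark API:

```python
from collections import Counter

def tumbling_window_counts(events, window_seconds=120):
    """Count records per fixed (tumbling) window.

    events: iterable of integer epoch-second timestamps.
    Returns a {window_start_epoch: count} dict for each non-empty
    window, e.g. a 2-minute (120 s) window every 2 minutes.
    """
    counts = Counter()
    for ts in events:
        # Floor the timestamp to the start of its window.
        window_start = ts // window_seconds * window_seconds
        counts[window_start] += 1
    return dict(counts)
```

The same grouping key (floored timestamp) is what a streaming engine maintains incrementally as records arrive, emitting or updating one count per window.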
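On question (3): the failure mode Bryan describes is easy to see with a toy hash partitioner. With zero target partitions there are no buckets to assign records to, so every record vanishes without an error. A plain-Python sketch of the concept; this is not Spark's actual implementation, just an illustration of why repartition-to-zero silently loses data:

```python
def repartition(records, num_partitions):
    """Toy hash partitioner: distribute records across num_partitions
    buckets. With num_partitions == 0 there are no buckets at all, so
    every record is silently dropped rather than raising an error.
    """
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        if num_partitions > 0:
            partitions[hash(record) % num_partitions].append(record)
        # else: the record is discarded without any warning
    return partitions
```

Raising an error for `num_partitions <= 0` (rather than returning an empty result) is the kind of fix the proposed Jira ticket would cover.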