Re: Watermarking in Structured Streaming to drop late data

2016-10-27 Thread kostas papageorgopoylos
Hi all, I would highly recommend that all users and devs interested in the design suggestions / discussions for Structured Streaming Spark API watermarking take a look at the following links along with the design document. It would help to understand the notions of watermark, out-of-order data and po

[VOTE] Release Apache Spark 2.0.2 (RC1)

2016-10-27 Thread Reynold Xin
Greetings from Spark Summit Europe at Brussels. Please vote on releasing the following candidate as Apache Spark version 2.0.2. The vote is open until Sun, Oct 30, 2016 at 00:30 PDT and passes if a majority of at least 3+1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.0.2 [ ]

Re: [VOTE] Release Apache Spark 2.0.2 (RC1)

2016-10-27 Thread Herman van Hövell tot Westerflier
+1 On Thu, Oct 27, 2016 at 9:18 AM, Reynold Xin wrote: > Greetings from Spark Summit Europe at Brussels. > > Please vote on releasing the following candidate as Apache Spark version > 2.0.2. The vote is open until Sun, Oct 30, 2016 at 00:30 PDT and passes if > a majority of at least 3+1 PMC vote

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-27 Thread Sean Owen
Seems OK by me. How about Hadoop < 2.6, Python 2.6? Those seem more removeable. I'd like to add that to a list of things that will begin to be unsupported 6 months from now. On Wed, Oct 26, 2016 at 8:49 PM Koert Kuipers wrote: > that sounds good to me > > On Wed, Oct 26, 2016 at 2:26 PM, Reynold

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-27 Thread Steve Loughran
On 27 Oct 2016, at 10:03, Sean Owen wrote: Seems OK by me. How about Hadoop < 2.6, Python 2.6? Those seem more removeable. I'd like to add that to a list of things that will begin to be unsupported 6 months from now. If you go to java 8 only, then hadoop 2.6+ is ma

Re: Watermarking in Structured Streaming to drop late data

2016-10-27 Thread Ofir Manor
Assaf, I think you are using the term "window" differently than Structured Streaming,... Also, you didn't consider groupBy. Here is an example: I want to maintain, for every minute over the last six hours, a computation (trend or average or stddev) on a five-minute window (from t-4 to t). So, 1. My
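A minimal sketch of the windowing described in this message, in plain Python rather than Spark: a five-minute window (minutes t-4 through t) is evaluated for every minute t, so each event contributes to five overlapping windows. The function names and the use of integer minutes as timestamps are assumptions made for illustration; this is not the Structured Streaming API.

```python
from collections import defaultdict
from statistics import mean

WINDOW_MINUTES = 5  # each window covers minutes t-4 .. t


def window_ends_for(event_minute):
    """Return every window end t whose span [t-4, t] contains event_minute."""
    return [event_minute + offset for offset in range(WINDOW_MINUTES)]


def sliding_averages(events):
    """Average values per sliding window.

    events: iterable of (event_minute, value) pairs, in any arrival order.
    Returns {window_end_minute: average of the values in [t-4, t]}.
    """
    buckets = defaultdict(list)
    for minute, value in events:
        # One event lands in every overlapping window that covers it.
        for end in window_ends_for(minute):
            buckets[end].append(value)
    return {end: mean(values) for end, values in sorted(buckets.items())}
```

Because the grouping key is the event's own timestamp rather than its arrival time, out-of-order input produces the same result as ordered input in this sketch.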

Re: Watermarking in Structured Streaming to drop late data

2016-10-27 Thread Tathagata Das
Hello Assaf, I think you are missing the fact that we want to compute over event-time of the data (e.g. data generation time), which may arrive at Spark out-of-order and late. And we want to aggregate over late data. The watermark is an estimate made by the system that there won't be any data later
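To make the idea concrete, here is a rough sketch in plain Python, not Spark's implementation: the watermark is assumed to be the largest event time seen so far minus an allowed delay, and records whose event time falls behind the watermark are dropped as too late. The class name, the `allowed_delay` parameter, and the per-record rule for advancing the watermark are all hypothetical choices made for illustration.

```python
class WatermarkFilter:
    """Toy event-time watermark (hypothetical; not Spark's implementation)."""

    def __init__(self, allowed_delay):
        self.allowed_delay = allowed_delay  # how late a record may be
        self.max_event_time = None          # largest event time seen so far

    @property
    def watermark(self):
        """Estimated time before which no more data is expected."""
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_delay

    def accept(self, event_time):
        """Return True if the record is kept, False if dropped as too late."""
        wm = self.watermark
        if wm is not None and event_time < wm:
            return False
        # Only records that are kept can advance the watermark.
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        return True
```

With `allowed_delay=10`, a record at event time 95 arriving after one at 100 is still aggregated, while a record at 89 is dropped. This sketch advances the watermark on every record for simplicity; an engine could just as well advance it at coarser intervals.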

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-27 Thread Reynold Xin
I created a JIRA ticket to track this: https://issues.apache.org/jira/browse/SPARK-18138 On Thu, Oct 27, 2016 at 10:19 AM, Steve Loughran wrote: > > On 27 Oct 2016, at 10:03, Sean Owen wrote: > > Seems OK by me. > How about Hadoop < 2.6, Python 2.6? Those seem more removeable. I'd like > to a

RE: Watermarking in Structured Streaming to drop late data

2016-10-27 Thread assaf.mendelson
Thanks. This article is excellent. It completely explains everything. I would add it as a reference to any and all explanations of structured streaming (and in the case of watermarking, I simply didn’t understand the definition before reading this). Thanks, Assaf. From: kostas

Re: Watermarking in Structured Streaming to drop late data

2016-10-27 Thread Tathagata Das
Assaf, thanks for the feedback! On Thu, Oct 27, 2016 at 3:28 AM, assaf.mendelson wrote: > Thanks. > > This article is excellent. It completely explains everything. > > I would add it as a reference to any and all explanations of structured > streaming (and in the case of watermarking, I simply d

Spark 2.0 on HDP

2016-10-27 Thread Deenar Toraskar
Hi, has anyone tried running Spark 2.0 on HDP? I have managed to get around the issues with the timeline service (by turning it off), but am now stuck because YARN cannot find org.apache.spark.deploy.yarn.ExecutorLauncher. Error: Could not find or load main class org.apache.spark.deploy.yarn.Exec

Re: [VOTE] Release Apache Spark 2.0.2 (RC1)

2016-10-27 Thread Sean Owen
+1 from me. All the sigs and licenses and hashes check out. It builds and passes tests with -Phadoop-2.7 -Pyarn -Phive -Phive-thriftserver on Ubuntu 16 + Java 8. Here are the open issues for 2.0.2 BTW. No blockers, but some marked Critical FYI. Just making sure nobody really meant for one of these

Re: [VOTE] Release Apache Spark 2.0.2 (RC1)

2016-10-27 Thread Koert Kuipers
+1 non binding compiled and unit tested in-house libraries against 2.0.2-rc1 successfully was able to build, deploy and launch on cdh 5.7 yarn cluster on a side note... these artifacts on staging repo having version 2.0.2 instead of 2.0.2-rc1 makes it somewhat dangerous to test against it in exi

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-27 Thread Yanbo Liang
+1 On Thu, Oct 27, 2016 at 3:15 AM, Reynold Xin wrote: > I created a JIRA ticket to track this: https://issues.apache.org/jira/browse/SPARK-18138 > > > > On Thu, Oct 27, 2016 at 10:19 AM, Steve Loughran > wrote: > >> >> On 27 Oct 2016, at 10:03, Sean Owen wrote: >> >> Seems OK by me. >> How

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-27 Thread Matei Zaharia
Just to comment on this, I'm generally against removing these types of things unless they create a substantial burden on project contributors. It doesn't sound like Python 2.6 and Java 7 do that yet -- Scala 2.10 might, but then of course we need to wait for 2.12 to be out and stable. In genera

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-27 Thread Amit Tank
+1 for Matei's point. On Thursday, October 27, 2016, Matei Zaharia wrote: > Just to comment on this, I'm generally against removing these types of > things unless they create a substantial burden on project contributors. It > doesn't sound like Python 2.6 and Java 7 do that yet -- Scala 2.10 mig

Re: [VOTE] Release Apache Spark 2.0.2 (RC1)

2016-10-27 Thread Michael Armbrust
+1 On Oct 27, 2016 12:19 AM, "Reynold Xin" wrote: > Greetings from Spark Summit Europe at Brussels. > > Please vote on releasing the following candidate as Apache Spark version > 2.0.2. The vote is open until Sun, Oct 30, 2016 at 00:30 PDT and passes if > a majority of at least 3+1 PMC votes are

Re: [VOTE] Release Apache Spark 2.0.2 (RC1)

2016-10-27 Thread Davies Liu
+1 On Thu, Oct 27, 2016 at 12:18 AM, Reynold Xin wrote: > Greetings from Spark Summit Europe at Brussels. > > Please vote on releasing the following candidate as Apache Spark version > 2.0.2. The vote is open until Sun, Oct 30, 2016 at 00:30 PDT and passes if a > majority of at least 3+1 PMC vote

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-27 Thread Davies Liu
+1 for Matei's point. On Thu, Oct 27, 2016 at 8:36 AM, Matei Zaharia wrote: > Just to comment on this, I'm generally against removing these types of > things unless they create a substantial burden on project contributors. It > doesn't sound like Python 2.6 and Java 7 do that yet -- Scala 2.10 mi

Re: [VOTE] Release Apache Spark 2.0.2 (RC1)

2016-10-27 Thread vaquar khan
+1 On Thu, Oct 27, 2016 at 11:56 AM, Davies Liu wrote: > +1 > > On Thu, Oct 27, 2016 at 12:18 AM, Reynold Xin wrote: > > Greetings from Spark Summit Europe at Brussels. > > > > Please vote on releasing the following candidate as Apache Spark version > > 2.0.2. The vote is open until Sun, Oct

[JIRA] (SPARK-575) Maintain a cache of JARs on each node to avoid unnecessary copying

2016-10-27 Thread Anonymous (JIRA)
Anonymous started wor

encoders for more complex types

2016-10-27 Thread Koert Kuipers
i have been pushing my luck a bit and started using ExpressionEncoder for more complex types like sequences of case classes etc. (where the case classes only had primitives and Strings). it all seems to work but i think the wheels come off in certain cases in the code generation. i guess this is n

Re: encoders for more complex types

2016-10-27 Thread Herman van Hövell tot Westerflier
What kind of difficulties are you experiencing? On Thu, Oct 27, 2016 at 9:57 PM, Koert Kuipers wrote: > i have been pushing my luck a bit and started using ExpressionEncoder for > more complex types like sequences of case classes etc. (where the case > classes only had primitives and Strings). >

Re: encoders for more complex types

2016-10-27 Thread Koert Kuipers
well i was using Aggregators that returned sequences of structs, or structs with sequences inside etc. and got compilation errors on the codegen. i didn't bother trying to reproduce it so far, or post it, since what i did was beyond the supposed usage anyhow. On Thu, Oct 27, 2016 at 4:02 PM, Herm

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-27 Thread Felix Cheung
+1 on Matei's. From: Davies Liu Sent: Thursday, October 27, 2016 9:58 AM Subject: Re: Straw poll: dropping support for things like Scala 2.10 To: Matei Zaharia Cc: Reynold Xin

Re: [VOTE] Release Apache Spark 2.0.2 (RC1)

2016-10-27 Thread Kousuke Saruta
+1 - Kousuke On 2016/10/28 2:07, vaquar khan wrote: +1 On Thu, Oct 27, 2016 at 11:56 AM, Davies Liu wrote: +1 On Thu, Oct 27, 2016 at 12:18 AM, Reynold Xin wrote: > Greetings from Spark Summit Europe at Brussels.

Re: encoders for more complex types

2016-10-27 Thread Michael Armbrust
I would categorize these as bugs. We should (but probably don't fully yet) support arbitrary nesting as long as you use basic collections / case classes / primitives. Please do open JIRAs as you find problems. On Thu, Oct 27, 2016 at 1:05 PM, Koert Kuipers wrote: > well i was using Aggregators

Re: encoders for more complex types

2016-10-27 Thread Koert Kuipers
ok will do On Thu, Oct 27, 2016 at 4:51 PM, Michael Armbrust wrote: > I would categorize these as bugs. We should (but probably don't fully > yet) support arbitrary nesting as long as you use basic collections / case > classes / primitives. Please do open JIRAs as you find problems. > > On Thu

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-27 Thread Sean Owen
The burden may be a little more apparent when dealing with the day to day merging and fixing of breaks. The upside is maybe the more compelling argument though. For example, lambda-fying all the Java code, supporting java.time, and taking advantage of some newer Hadoop/YARN APIs is a moderate win f

Re: [VOTE] Release Apache Spark 2.0.2 (RC1)

2016-10-27 Thread Dongjoon Hyun
+1 non-binding. Built and tested CentOS 6.6 / OpenJDK 1.8.0_111. Cheers, Dongjoon. - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-27 Thread Ofir Manor
I totally agree with Sean, just a small correction: Java 7 and Python 2.6 are already deprecated since Spark 2.0 (after a lengthy discussion), so there is no need to discuss whether they should become deprecated in 2.1 http://spark.apache.org/releases/spark-release-2-0-0.html#deprecations The dis

Re: [VOTE] Release Apache Spark 1.6.3 (RC1)

2016-10-27 Thread Dongjoon Hyun
Hi, All. Last time, RC1 passed the tests with only the timezone testcase failure. Now, it's backported, too. I'm wondering if we have other issues to block releasing Apache Spark 1.6.3. Bests, Dongjoon.

Re: [VOTE] Release Apache Spark 2.0.2 (RC1)

2016-10-27 Thread Ricardo Almeida
+1 (non-binding) built and tested without regressions from 2.0.1. On 27 October 2016 at 19:07, vaquar khan wrote: > +1 > > > > On Thu, Oct 27, 2016 at 11:56 AM, Davies Liu > wrote: > >> +1 >> >> On Thu, Oct 27, 2016 at 12:18 AM, Reynold Xin >> wrote: >> > Greetings from Spark Summit Europe

Re: encoders for more complex types

2016-10-27 Thread Koert Kuipers
https://issues.apache.org/jira/browse/SPARK-18147 On Thu, Oct 27, 2016 at 4:55 PM, Koert Kuipers wrote: > ok will do > > On Thu, Oct 27, 2016 at 4:51 PM, Michael Armbrust > wrote: > >> I would categorize these as bugs. We should (but probably don't fully >> yet) support arbitrary nesting as lo

Re: [VOTE] Release Apache Spark 2.0.2 (RC1)

2016-10-27 Thread Denny Lee
+1 (non-binding) On Thu, Oct 27, 2016 at 3:36 PM Ricardo Almeida < ricardo.alme...@actnowib.com> wrote: > +1 (non-binding) > > built and tested without regressions from 2.0.1. > > > > On 27 October 2016 at 19:07, vaquar khan wrote: > > +1 > > > > On Thu, Oct 27, 2016 at 11:56 AM, Davies Liu >

Re: [VOTE] Release Apache Spark 2.0.2 (RC1)

2016-10-27 Thread Luciano Resende
+1 (non-binding) On Thu, Oct 27, 2016 at 9:18 AM, Reynold Xin wrote: > Greetings from Spark Summit Europe at Brussels. > > Please vote on releasing the following candidate as Apache Spark version > 2.0.2. The vote is open until Sun, Oct 30, 2016 at 00:30 PDT and passes if > a majority of at leas

Re: SparkR issue with array types in gapply()

2016-10-27 Thread Felix Cheung
This is an R native data.frame behavior. While arr is a character vector of length = 2, > arr [1] "rows= 50" "cols= 2" > length(arr) [1] 2 when it is set as R data.frame the character vector is split into 2 rows > data.frame(key, strings = arr, stringsAsFactors = F) key strings 1 a rows= 5

Re: [VOTE] Release Apache Spark 2.0.2 (RC1)

2016-10-27 Thread Jagadeesan As
+1 (non binding) Ubuntu 14.04.2/openjdk "1.8.0_72" (-Pyarn -Phadoop-2.7 -Psparkr -Pkinesis-asl -Phive-thriftserver) Cheers, Jagadeesan A S From: Reynold Xin To: "dev@spark.apache.org" Date: 27-10-16 12:49 PM Subject: [VOTE] Release Apache Spark 2.0.2 (RC1) Greetings from