LIMIT statement on SparkSQL

2016-10-26 Thread Liz Bai
Hi all, We used Parquet and Spark 2.0 to do the testing. The table below is a summary of what we have found about the `Limit` keyword. Query-2 reveals that SparkSQL stops early once it has adequate results. But we are curious about Query-1 and Query-2. It seems that, either writing result RDD a
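
(A minimal sketch of the kind of test described above, assuming a hypothetical Parquet path and view name; the summary table itself did not survive the archive:)

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("limit-test").getOrCreate()
val df = spark.read.parquet("/path/to/table")   // hypothetical path
df.createOrReplaceTempView("t")
// With a bare LIMIT, Spark SQL can stop scanning once enough rows are collected:
spark.sql("SELECT * FROM t LIMIT 10").collect()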

Using SPARK_WORKER_INSTANCES and SPARK-15781

2016-10-26 Thread assaf.mendelson
As of applying SPARK-15781, the documentation of SPARK_WORKER_INSTANCES has been removed. This was due to a warning in spark-submit which suggested: WARN SparkConf: SPARK_WORKER_INSTANCES was detected (set to '4'). This is deprecated in Spark 1.0+. Please instead use: - ./spark-submit with --num-
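
(The truncated warning points at per-application settings; a minimal sketch of the usual replacement, assuming the --num-executors flag and spark.executor.instances config the warning goes on to name:)

import org.apache.spark.SparkConf

// Per-application equivalent of the old SPARK_WORKER_INSTANCES=4 intent:
val conf = new SparkConf().set("spark.executor.instances", "4")
// or, on the command line: spark-submit --num-executors 4 ...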

Re: LIMIT statement on SparkSQL

2016-10-26 Thread Liz Bai
Sorry for the typo in the last mail. Compared with Query-2, we have questions about Query-1 and Query-3. Also, may I know the difference between CollectLimit and BaseLimit? Thanks so much. Best, Liz > On 26 Oct 2016, at 7:25 PM, Liz Bai wrote: > > Hi all, > > We used Parquet and Spark 2.0 to do t
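
(One way to see which limit operator a query actually gets, a minimal sketch assuming a registered view t. Roughly, in the Spark 2.0 source, LocalLimitExec and GlobalLimitExec share the BaseLimitExec trait, while CollectLimitExec is a separate operator that returns the first rows to the driver:)

spark.sql("SELECT * FROM t LIMIT 10").explain()
// The physical plan shows either a CollectLimit at the root, or a
// GlobalLimit/LocalLimit pair when the limit sits below other operators.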

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-26 Thread Dongjoon Hyun
Hi, All. That's great; it's progress. Then, at least in 2017, Spark 2.2.0 will be out with JDK 8 and Scala 2.11/2.12, right? Bests, Dongjoon.

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-26 Thread Daniel Siegmann
Is the deprecation of JDK 7 and Scala 2.10 documented anywhere outside the release notes for Spark 2.0.0? I do not consider release notes to be sufficient public notice for deprecation of supported platforms - this should be noted in the documentation somewhere. Here are the only mentions I coul

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-26 Thread Dongjoon Hyun
Hi, Daniel. I guess that kind of work will start in earnest in 2.1.0 after the PMC's announcement/reminder on the mailing list. Bests, Dongjoon. On Wednesday, October 26, 2016, Daniel Siegmann <dsiegm...@securityscorecard.io> wrote: > Is the deprecation of JDK 7 and Scala 2.10 documented anywhere o

Re: getting encoder implicits to be more accurate

2016-10-26 Thread Michael Armbrust
Hmm, that is unfortunate. Maybe the best solution is to add support for sets? I don't think that would be super hard. On Tue, Oct 25, 2016 at 8:52 PM, Koert Kuipers wrote: > I am trying to use encoders as a typeclass where if it fails to find an > ExpressionEncoder it falls back to KryoEncoder

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-26 Thread Reynold Xin
We can do the following concrete proposal: 1. Plan to remove support for Java 7 / Scala 2.10 in Spark 2.2.0 (Mar/Apr 2017). 2. In Spark 2.1.0 release, aggressively and explicitly announce the deprecation of Java 7 / Scala 2.10 support. (a) It should appear in release notes, documentation that m

Re: getting encoder implicits to be more accurate

2016-10-26 Thread Ryan Blue
Isn't the problem that Option is a Product and the class it contains isn't checked? Adding support for Set fixes the example, but the problem would happen with any class there isn't an encoder for, right? On Wed, Oct 26, 2016 at 11:18 AM, Michael Armbrust wrote: > Hmm, that is unfortunate. Mayb
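
(A minimal sketch of the failure mode Ryan describes, assuming a stock Spark 2.0 shell; the exact error text may differ slightly:)

import spark.implicits._
// Compiles: Option is a Product, so the implicit product encoder applies,
// but the contained type Set[String] is never checked at compile time...
val ds = Seq(Option(Set("a", "b"))).toDS()
// ...so encoder derivation fails at runtime with something like:
// "No Encoder found for scala.collection.immutable.Set[String]"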

Re: getting encoder implicits to be more accurate

2016-10-26 Thread Koert Kuipers
Yup, it doesn't really solve the underlying issue. We fixed it internally by having our own typeclass that produces encoders and that does check the contents of the products, but we did this by simply supporting Tuple1 - Tuple22 and Option explicitly, and not supporting Product, since we don't have

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-26 Thread Koert Kuipers
That sounds good to me. On Wed, Oct 26, 2016 at 2:26 PM, Reynold Xin wrote: > We can do the following concrete proposal: > > 1. Plan to remove support for Java 7 / Scala 2.10 in Spark 2.2.0 (Mar/Apr > 2017). > > 2. In Spark 2.1.0 release, aggressively and explicitly announce the > deprecation of

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-26 Thread Michael Armbrust
+1 On Wed, Oct 26, 2016 at 11:26 AM, Reynold Xin wrote: > We can do the following concrete proposal: > > 1. Plan to remove support for Java 7 / Scala 2.10 in Spark 2.2.0 (Mar/Apr > 2017). > > 2. In Spark 2.1.0 release, aggressively and explicitly announce the > deprecation of Java 7 / Scala 2.10

Re: getting encoder implicits to be more accurate

2016-10-26 Thread Michael Armbrust
Sorry, I realize that set is only one example here, but I don't think that making the type of the implicit more narrow to include only ProductN or something eliminates the issue. Even with that change, we will fail to generate an encoder with the same error if you, for example, have a field of you

Re: getting encoder implicits to be more accurate

2016-10-26 Thread Koert Kuipers
Why would generating implicits for ProductN, where you also require the elements in the Product to have an expression encoder, not work? We do this, and then we have a generic fallback that produces a kryo encoder. For us the result is that, say, an implicit for Seq[(Int, Seq[(String, Int)])] wil
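
(A minimal sketch of such a typeclass, using the EncoderEvidence name that appears in the REPL session in the next message; a real version would also cover Tuple3 - Tuple22, Option, Seq, and so on:)

import org.apache.spark.sql.{Encoder, Encoders}
import scala.reflect.ClassTag

trait EncoderEvidence[T] extends Serializable { def encoder: Encoder[T] }

trait LowPriorityEncoderEvidence {
  // Generic fallback: any type without a specific instance gets a kryo encoder.
  implicit def kryoEvidence[T](implicit ct: ClassTag[T]): EncoderEvidence[T] =
    new EncoderEvidence[T] { def encoder = Encoders.kryo[T](ct) }
}

object EncoderEvidence extends LowPriorityEncoderEvidence {
  implicit val intEvidence: EncoderEvidence[Int] =
    new EncoderEvidence[Int] { def encoder = Encoders.scalaInt }
  implicit val stringEvidence: EncoderEvidence[String] =
    new EncoderEvidence[String] { def encoder = Encoders.STRING }
  // Tuple instances demand evidence for each element, so contents are checked:
  implicit def tuple2Evidence[A, B](
      implicit a: EncoderEvidence[A], b: EncoderEvidence[B]): EncoderEvidence[(A, B)] =
    new EncoderEvidence[(A, B)] { def encoder = Encoders.tuple(a.encoder, b.encoder) }
}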

Re: getting encoder implicits to be more accurate

2016-10-26 Thread Koert Kuipers
For example (the log line shows when it creates a kryo encoder):

scala> implicitly[EncoderEvidence[Option[Seq[String]]]].encoder
res5: org.apache.spark.sql.Encoder[Option[Seq[String]]] = class[value[0]: array]

scala> implicitly[EncoderEvidence[Option[Set[String]]]].encoder
dataframe.EncoderEvidence$:

Re: getting encoder implicits to be more accurate

2016-10-26 Thread Michael Armbrust
You use kryo encoder for the whole thing? Or just the subtree that we don't have specific encoders for? Also, I'm saying I like the idea of having a kryo fallback. I don't see the point of narrowing the definition of the implicit. On Wed, Oct 26, 2016 at 1:07 PM, Koert Kuipers wrote: > fo

Re: getting encoder implicits to be more accurate

2016-10-26 Thread Koert Kuipers
I use kryo for the whole thing currently. It would be better to use it for the subtree. On Wed, Oct 26, 2016 at 5:06 PM, Michael Armbrust wrote: > You use kryo encoder for the whole thing? Or just the subtree that we > don't have specific encoders for? > > Also, I'm saying I like the idea of hav
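
(What "kryo for the whole thing" looks like in stock Spark, a minimal sketch assuming a shell session and a hypothetical case class whose field type has no built-in encoder:)

import org.apache.spark.sql.Encoders

case class Inner(tags: Set[String])      // Set has no built-in encoder
case class Outer(id: Int, inner: Inner)
// Whole-object kryo: it works, but even `id` becomes opaque bytes
// rather than a real column; a subtree fallback would keep `id` queryable.
implicit val outerEncoder = Encoders.kryo[Outer]
val ds = spark.createDataset(Seq(Outer(1, Inner(Set("a")))))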

Re: getting encoder implicits to be more accurate

2016-10-26 Thread Koert Kuipers
If kryo could transparently be used for subtrees without narrowing the implicit, that would be great. On Wed, Oct 26, 2016 at 5:10 PM, Koert Kuipers wrote: > I use kryo for the whole thing currently. > > It would be better to use it for the subtree. > > On Wed, Oct 26, 2016 at 5:06 PM, Michael Armbr

Re: getting encoder implicits to be more accurate

2016-10-26 Thread Michael Armbrust
Awesome, this is a great idea. I opened SPARK-18122. On Wed, Oct 26, 2016 at 2:11 PM, Koert Kuipers wrote: > If kryo could transparently be used for subtrees without narrowing the > implicit, that would be great > > On Wed, Oct 26, 2016 at 5:10

Watermarking in Structured Streaming to drop late data

2016-10-26 Thread Tathagata Das
Hey all, We are planning to implement watermarking in Structured Streaming that would allow us to handle late, out-of-order data better. Specifically, when we are aggregating over windows on event-time, we can currently end up keeping an unbounded amount of data as state. We want to define watermarks on the even
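
(For readers of the archive: this proposal was tracked as SPARK-18124, linked in the reply below, and a withWatermark operator along these lines shipped in Spark 2.1. A minimal sketch of that shape, assuming a streaming DataFrame `events` with an eventTime column:)

import org.apache.spark.sql.functions.{window, col}

val counts = events
  .withWatermark("eventTime", "10 minutes")  // bound state: drop data >10 min late
  .groupBy(window(col("eventTime"), "5 minutes"))
  .count()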

Re: Watermarking in Structured Streaming to drop late data

2016-10-26 Thread Michael Armbrust
And the JIRA: https://issues.apache.org/jira/browse/SPARK-18124 On Wed, Oct 26, 2016 at 4:56 PM, Tathagata Das wrote: > Hey all, > > We are planning to implement watermarking in Structured Streaming that would > allow us to handle late, out-of-order data better. Specifically, when we are > aggregating ov

RE: Watermarking in Structured Streaming to drop late data

2016-10-26 Thread assaf.mendelson
Hi, Should comments come here or in the JIRA? Anyway, I am a little confused about the need to expose this as an API to begin with. Let's consider for a second the most basic behavior: we have some input stream and we want to aggregate a sum over a time window. This means that the window we should be lo
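
(The basic behavior described, as a minimal sketch with hypothetical column names; without watermarks, state for every window ever seen must be kept, since arbitrarily late data could still update it:)

import org.apache.spark.sql.functions.{window, sum, col}

val windowedSums = input
  .groupBy(window(col("eventTime"), "10 minutes"))
  .agg(sum(col("value")))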