SparkSession replace SQLContext

2016-07-05 Thread nihed mbarek
Hi, I just discovered that SparkSession will replace SQLContext in Spark 2.0. The JavaDoc is clear https://spark.apache.org/docs/2.0.0-preview/api/java/org/apache/spark/sql/SparkSession.html but there is no mention in the SQL programming guide https://spark.apache.org/docs/2.0.0-preview/sql-programming
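
For readers following along, a minimal sketch of the change under discussion, based on the 2.0.0-preview API linked above (the JSON file path is illustrative):

    import org.apache.spark.sql.SparkSession

    // Spark 2.0: SparkSession is the single entry point, subsuming
    // SQLContext (and HiveContext).
    val spark = SparkSession.builder()
      .appName("example")
      .getOrCreate()

    val df = spark.read.json("people.json")

    // The old entry point stays reachable for code that still needs it:
    val sqlContext = spark.sqlContext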

Re: SparkSession replace SQLContext

2016-07-05 Thread Romi Kuntsman
You can also claim that there's a whole section of "Migrating from 1.6 to 2.0" missing there: https://spark.apache.org/docs/2.0.0-preview/sql-programming-guide.html#migration-guide *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Tue, Jul 5, 2016 at 12:24 PM, nihed mbarek wrote: >

Why's ds.foreachPartition(println) not possible?

2016-07-05 Thread Jacek Laskowski
Hi, this is with a build of today's master. Why can't I call ds.foreachPartition(println)? Is using a type annotation the only way forward? I'd be so sad if that's the case. scala> ds.foreachPartition(println) :28: error: overloaded method value foreachPartition with alternatives: (func: org.apa
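
For context, these are the two alternatives the compiler is reporting, as declared on Dataset in the 2.0 API (signatures abbreviated here; treat this as a sketch of the conflict, not the full error):

    // Scala variant
    def foreachPartition(f: Iterator[T] => Unit): Unit

    // Java variant
    def foreachPartition(func: ForeachPartitionFunction[T]): Unit

    // println is itself overloaded (println() and println(x: Any)); with
    // two foreachPartition alternatives in scope there is no single
    // expected function type to drive eta-expansion, so the call fails.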

Re: Why's ds.foreachPartition(println) not possible?

2016-07-05 Thread Sean Owen
Do you not mean ds.foreachPartition(_.foreach(println)) or similar? On Tue, Jul 5, 2016 at 2:22 PM, Jacek Laskowski wrote: > Hi, > > It's with the master built today. Why can't I call > ds.foreachPartition(println)? Is using type annotation the only way to > go forward? I'd be so sad if that's th
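
Sean's suggestion compiles because the untyped lambda can only match the Scala overload; a minimal sketch, assuming a spark-shell session where spark is in scope:

    val ds = spark.range(10)
    ds.foreachPartition(_.foreach(println))  // prints each element, partition by partition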

Re: Why's ds.foreachPartition(println) not possible?

2016-07-05 Thread Jacek Laskowski
Sort of. Your example works, but could you do a mere ds.foreachPartition(println)? Why not? And why should I even see the Java version? scala> val ds = spark.range(10) ds: org.apache.spark.sql.Dataset[Long] = [id: bigint] scala> ds.foreachPartition(println) :26: error: overloaded method value foreac

Re: Why's ds.foreachPartition(println) not possible?

2016-07-05 Thread Sean Owen
A DStream is a sequence of RDDs, not of elements. I don't think I'd expect to express an operation on a DStream as if it were elements. On Tue, Jul 5, 2016 at 2:47 PM, Jacek Laskowski wrote: > Sort of. Your example works, but could you do a mere > ds.foreachPartition(println)? Why not? What shoul

Re: Why's ds.foreachPartition(println) not possible?

2016-07-05 Thread Jacek Laskowski
ds is a Dataset, and the problem is that println (or any other one-argument function) does not work here (and perhaps neither would other methods that have two variants, Java's and Scala's). Regards, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark

Re: Why's ds.foreachPartition(println) not possible?

2016-07-05 Thread Sean Owen
Right, I should have noticed that in your second mail. But foreach already does what you want, right? It would be identical here. Now, these two methods do conceptually different things on different arguments; I don't think I'd expect them to accept the same functions. On Tue, Jul 5, 2016 at 3:18 PM

Re: Why's ds.foreachPartition(println) not possible?

2016-07-05 Thread Jacek Laskowski
Well, there is a foreach for Java and another foreach for Scala. That much I can understand. But while gaining two language-specific APIs -- Scala and Java -- the Dataset API lost support for such simple calls without type annotations, so you have to be explicit about the variant (since I'm using Sca
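
The explicit-variant workaround Jacek refers to, as a minimal sketch (assuming a spark-shell session, so spark.implicits._ can be imported):

    import spark.implicits._
    val ints = Seq(1, 2, 3).toDS()  // Dataset[Int]

    // Annotating the parameter type selects the Scala overload explicitly:
    ints.foreachPartition((it: Iterator[Int]) => it.foreach(println))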

Re: SparkSession replace SQLContext

2016-07-05 Thread Michael Allman
These topics have been included in the documentation for recent builds of Spark 2.0. Michael > On Jul 5, 2016, at 3:49 AM, Romi Kuntsman wrote: > > You can also claim that there's a whole section of "Migrating from 1.6 to > 2.0" missing there: > https://spark.apache.org/docs/2.0.0-preview/sql

Re: Why's ds.foreachPartition(println) not possible?

2016-07-05 Thread Reynold Xin
This seems like a Scala compiler bug. On Tuesday, July 5, 2016, Jacek Laskowski wrote: > Well, there is foreach for Java and another foreach for Scala. That's > what I can understand. But while supporting two language-specific APIs > -- Scala and Java -- Dataset API lost support for such simple

Re: Call to new JObject sometimes returns an empty R environment

2016-07-05 Thread Shivaram Venkataraman
-sparkr-dev@googlegroups +dev@spark.apache.org [Please send SparkR development questions to the Spark user / dev mailing lists. Replies inline] > From: > Date: Tue, Jul 5, 2016 at 3:30 AM > Subject: Call to new JObject sometimes returns an empty R environment > To: SparkR Developers > > > > H

Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-07-05 Thread Reynold Xin
Please consider this vote canceled and I will work on another RC soon. On Tue, Jun 21, 2016 at 6:26 PM, Reynold Xin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.0.0. The vote is open until Friday, June 24, 2016 at 19:00 PDT and passes > if a majority of a

Re: Why's ds.foreachPartition(println) not possible?

2016-07-05 Thread Jacek Laskowski
Hi Reynold, Is this already reported and tracked somewhere? I'm quite sure people will be asking why Spark behaves this way. Where are such issues usually reported? Regards, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apac

Re: spark git commit: [SPARK-15204][SQL] improve nullability inference for Aggregator

2016-07-05 Thread Jacek Laskowski
On Mon, Jul 4, 2016 at 6:14 AM, wrote: > Repository: spark > Updated Branches: > refs/heads/master 88134e736 -> 8cdb81fa8 > > > [SPARK-15204][SQL] improve nullability inference for Aggregator > > ## What changes were proposed in this pull request? > > TypedAggregateExpression sets nullable base

Re: spark git commit: [SPARK-15204][SQL] improve nullability inference for Aggregator

2016-07-05 Thread Koert Kuipers
Oh, you mean instead of: assert(ds3.select(NameAgg.toColumn).schema.head.nullable === true) just do: assert(ds3.select(NameAgg.toColumn).schema.head.nullable)? I mostly used === true because I also had === false, and I liked the symmetry, but sure, this can be fixed if it's not the norm. On Tue, Jul 5,
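
For reference, the suggested forms side by side (NameAgg and ds3 are from the test under discussion; the two asserts express opposite expectations, for different cases in the actual test):

    // preferred over `... === true`:
    assert(ds3.select(NameAgg.toColumn).schema.head.nullable)

    // negated form, preferred over `... === false`:
    assert(!ds3.select(NameAgg.toColumn).schema.head.nullable)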

Re: spark git commit: [SPARK-15204][SQL] improve nullability inference for Aggregator

2016-07-05 Thread Reynold Xin
Jacek, This is definitely not necessary, but I wouldn't waste cycles "fixing" things like this when they have virtually zero impact. Perhaps next time we update this code we can "fix" it. Also can you comment on the pull request directly? On Tue, Jul 5, 2016 at 1:07 PM, Jacek Laskowski wrote:

Re: Why's ds.foreachPartition(println) not possible?

2016-07-05 Thread Cody Koeninger
I don't think that's a Scala compiler bug. println is a valid expression that returns Unit. Unit is not a single-argument function, and does not match any of the overloads of foreachPartition. You may be used to a conversion taking place when println is passed to a method expecting a function, but
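
A minimal illustration of the conversion Cody describes, in plain Scala outside Spark (assumed Scala 2.11 behavior; names are made up for the example):

    object EtaDemo {
      // With one expected function type, println eta-expands cleanly:
      def applyToAll(f: Int => Unit): Unit = (1 to 3).foreach(f)

      // With two overloads, no single expected type drives eta-expansion:
      def g(f: Int => Unit): Unit = f(1)
      def g(f: String => Unit): Unit = f("a")

      def main(args: Array[String]): Unit = {
        applyToAll(println)   // compiles: prints 1, 2, 3
        // g(println)         // does not compile: ambiguous overload
      }
    }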

Re: Why's ds.foreachPartition(println) not possible?

2016-07-05 Thread Reynold Xin
You can file it here: https://issues.scala-lang.org/secure/Dashboard.jspa Perhaps "bug" is not the right word; "limitation" may be better. println accepts a single argument of type Any and returns Unit, and it appears that Scala fails to infer the correct overloaded method in this case. def println() = C

Re: Why's ds.foreachPartition(println) not possible?

2016-07-05 Thread Shixiong(Ryan) Zhu
I asked this question in the Scala user group two years ago: https://groups.google.com/forum/#!topic/scala-user/W4f0d8xK1nk Take a look if you are interested. On Tue, Jul 5, 2016 at 1:31 PM, Reynold Xin wrote: > You can file it here: https://issues.scala-lang.org/secure/Dashboard.jspa > > Perhap

[VOTE] Release Apache Spark 2.0.0 (RC2)

2016-07-05 Thread Reynold Xin
Please vote on releasing the following candidate as Apache Spark version 2.0.0. The vote is open until Friday, July 8, 2016 at 23:00 PDT and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.0.0 [ ] -1 Do not release this package because ...

Re: Dataset and Aggregator API pain points

2016-07-05 Thread Reynold Xin
See https://issues.apache.org/jira/browse/SPARK-16390 On Sat, Jul 2, 2016 at 6:35 PM, Reynold Xin wrote: > Thanks, Koert, for the great email. They are all great points. > > We should probably create an umbrella JIRA for easier tracking. > > > On Saturday, July 2, 2016, Koert Kuipers wrote: > >