Re: Gitter chat room for Spark

2015-04-17 Thread Akhil Das
Freenode already has a fairly active channel, #Apache-spark; I think Josh idles there sometimes. Thanks Best Regards On Fri, Apr 17, 2015 at 3:33 AM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > Would we be interested in having a public chat room? > > Gitter offers

Re: [RESULT] [VOTE] Release Apache Spark 1.2.2

2015-04-17 Thread Sree V
Sorry, I couldn't catch up before the voting closed. If it still counts, mvn package fails (1). And didn't run test (2). So, -1. 1. mvn -Phadoop-2.4 -Pyarn -Phive -Phive-0.13.1 -Dhadoop.version=2.6.0 -DskipTests clean package 2. mvn -Phadoop-2.4 -Pyarn -Phive -Phive-0.13.1 -Dhadoop.version=2.6.0

Re: [RESULT] [VOTE] Release Apache Spark 1.2.2

2015-04-17 Thread Sean Owen
Sree, that doesn't show any error, so it doesn't help. I built with the same flags when I tested and it succeeded. On Fri, Apr 17, 2015 at 8:53 AM, Sree V wrote: > Sorry, I couldn't catch up before closing the voting.If it still counts, mvn > package fails (1). And didn't run test (2). So, -1.1

Re: Gitter chat room for Spark

2015-04-17 Thread Sean Owen
There are N chat options out there, and of course there's no need or way to stop people from using them. If 1 is blessed as 'best', it excludes others who prefer a different one. Tomorrow there will be a New Best Chat App. If a bunch are blessed, the conversation fractures. There's also a principl

Why does the HDFS parquet file generated by Spark SQL have different size with those on Tachyon?

2015-04-17 Thread zhangxiongfei
Hi, I ran some tests on Parquet files with the Spark SQL DataFrame API. I generated 36 gzip-compressed Parquet files with Spark SQL and stored them on Tachyon. The size of each file is about 222M. Then I read them with the code below. val tfs = sqlContext.parquetFile("tachyon://datanode8.bitauto.dmp:19998/apps

Addition of new Metrics for killed executors.

2015-04-17 Thread Archit Thakur
Hi, We are planning to add new metrics in Spark for the executors that got killed during execution. I was just curious why this info is not already present. Is there some reason for not adding it? Any ideas are welcome. Thanks and Regards, Archit Thakur.

Fwd: Addition of new Metrics for killed executors.

2015-04-17 Thread Archit Thakur
-- Forwarded message -- From: Archit Thakur Date: Fri, Apr 17, 2015 at 4:07 PM Subject: Addition of new Metrics for killed executors. To: u...@spark.incubator.apache.org, u...@spark.apache.org, d...@spark.incubator.apache.org Hi, We are planning to add new Metrics in Spark for t

[Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Olivier Girardot
Hi everyone, I had an issue trying to use Spark SQL from Java (8 or 7). I tried to reproduce it in a small test case close to the actual documentation, so sorry for the long mail, but this is "Ja

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Ted Yu
The image didn't go through. I think you were referring to: override def map[R: ClassTag](f: Row => R): RDD[R] = rdd.map(f) Cheers On Fri, Apr 17, 2015 at 6:07 AM, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > Hi everyone, > I had an issue trying to use Spark SQL from Java (8 o

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Olivier Girardot
Yes, thanks! On Fri, Apr 17, 2015 at 16:20, Ted Yu wrote: > The image didn't go through. > > I think you were referring to: > override def map[R: ClassTag](f: Row => R): RDD[R] = rdd.map(f) > > Cheers > > On Fri, Apr 17, 2015 at 6:07 AM, Olivier Girardot < > o.girar...@lateral-thoughts.com>

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Reynold Xin
I think in 1.3 and above, you'd need to do .sql(...).javaRDD().map(..) On Fri, Apr 17, 2015 at 9:22 AM, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > Yes thanks ! > > On Fri, Apr 17, 2015 at 16:20, Ted Yu wrote: > > > The image didn't go through. > > > > I think you were referr

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Olivier Girardot
OK, do you want me to open a pull request to fix the dedicated documentation? On Fri, Apr 17, 2015 at 18:14, Reynold Xin wrote: > I think in 1.3 and above, you'd need to do > > .sql(...).javaRDD().map(..) > > On Fri, Apr 17, 2015 at 9:22 AM, Olivier Girardot < > o.girar...@lateral-thoughts.co

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Reynold Xin
Please do! Thanks. On Fri, Apr 17, 2015 at 2:36 PM, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > Ok, do you want me to open a pull request to fix the dedicated > documentation ? > > On Fri, Apr 17, 2015 at 18:14, Reynold Xin wrote: > >> I think in 1.3 and above, you'd need to

BUG: 1.3.0 org.apache.spark.sql.Row Does not exist in Java API

2015-04-17 Thread Nipun Batra
Hi, the example given in the SQL documentation https://spark.apache.org/docs/latest/sql-programming-guide.html uses org.apache.spark.sql.Row, which does not exist in the Java API, or at least I was not able to find it. Build info: downloaded from the Spark website. Dependency org.apache.spark spark-sql_2.10 1

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Olivier Girardot
Is there any convention *not* to show Java 8 versions in the documentation? On Fri, Apr 17, 2015 at 21:39, Reynold Xin wrote: > Please do! Thanks. > > > On Fri, Apr 17, 2015 at 2:36 PM, Olivier Girardot < > o.girar...@lateral-thoughts.com> wrote: > >> Ok, do you want me to open a pull request

Re: Why does the HDFS parquet file generated by Spark SQL have different size with those on Tachyon?

2015-04-17 Thread Reynold Xin
It's because you did a repartition, which rearranges all the data. Parquet uses several compression techniques, such as dictionary encoding and run-length encoding, which result in a size difference when the data is ordered differently. On Fri, Apr 17, 2015 at 4:51 AM, zhangxiongfei
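
A minimal sketch of the ordering effect described in the reply above. It assumes a toy run-length encoder, not Parquet's actual encoder; the column contents and the `run_length_encode` helper are hypothetical and only illustrate why the same values can encode to different sizes once their order changes.

```python
import random


def run_length_encode(values):
    """Collapse consecutive equal values into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1      # extend the current run
        else:
            runs.append([v, 1])   # start a new run
    return runs


# Hypothetical column: 4 distinct values, 250 rows each, stored in order.
sorted_column = [v for v in "abcd" for _ in range(250)]

# The same rows after a shuffle, standing in for a repartition.
shuffled_column = sorted_column[:]
random.Random(0).shuffle(shuffled_column)

sorted_runs = run_length_encode(sorted_column)
shuffled_runs = run_length_encode(shuffled_column)

# The ordered column needs one run per distinct value; the shuffled one
# needs many more runs for exactly the same data.
print(len(sorted_runs), len(shuffled_runs))
```

Both columns hold identical values, so any size gap comes from ordering alone, which is consistent with files differing in size after a repartition.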

Re: dataframe can not find fields after loading from hive

2015-04-17 Thread Reynold Xin
This is strange. cc the dev list since it might be a bug. On Thu, Apr 16, 2015 at 3:18 PM, Cesar Flores wrote: > Never mind. I found the solution: > > val newDataFrame = hc.createDataFrame(hiveLoadedDataFrame.rdd, > hiveLoadedDataFrame.schema) > > which translate to convert the data frame to r

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Reynold Xin
No, there isn't a convention. Although if you want to show Java 8, you should also show Java 6/7 syntax, since there are still more 7 users than 8. On Fri, Apr 17, 2015 at 3:36 PM, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > Is there any convention *not* to show java 8 versions in

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Olivier Girardot
Another PR, I guess :) Here's the associated JIRA: https://issues.apache.org/jira/browse/SPARK-6988 On Fri, Apr 17, 2015 at 23:00, Reynold Xin wrote: > No there isn't a convention. Although if you want to show java 8, you > should also show java 6/7 syntax since there are still more 7 users than

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Olivier Girardot
And the PR: https://github.com/apache/spark/pull/5564 Thank you! Olivier. On Fri, Apr 17, 2015 at 23:00, Reynold Xin wrote: > No there isn't a convention. Although if you want to show java 8, you > should also show java 6/7 syntax since there are still more 7 users than 8. > > > On Fri, Apr

Re: BUG: 1.3.0 org.apache.spark.sql.Row Does not exist in Java API

2015-04-17 Thread Olivier Girardot
Hi Nipun, I'm sorry, but I don't understand exactly what your problem is. Regarding org.apache.spark.sql.Row, it does exist in the Spark SQL dependency. Is it a compilation problem? Are you trying to run a main method using the pom you've just described? Or are you trying to spark-submit the

Re: [RESULT] [VOTE] Release Apache Spark 1.2.2

2015-04-17 Thread Sree V
Hi Sean, this is from the build log. I made a master branch build earlier on this machine. Do you think it needs a cleanup of the .m2 folder, as you suggested once earlier? Giving it another try while you take a look at this. [INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile-first) @

Re: Spark streaming vs. spark usage

2015-04-17 Thread Nathan Kronenfeld
I finally got this compiling and working, I think, but since (as Reynold points out) it involves a little API refactoring, I was hoping to get some discussion about it going as soon as possible. I have the changes necessary to give RDD, DStream, and DataFrame some level of common interface, in htt

Announcing Spark 1.3.1 and 1.2.2

2015-04-17 Thread Patrick Wendell
Hi All, I'm happy to announce the Spark 1.3.1 and 1.2.2 maintenance releases. We recommend all users on the 1.3 and 1.2 Spark branches upgrade to these releases, which contain several important bug fixes. Download Spark 1.3.1 or 1.2.2: http://spark.apache.org/downloads.html Release notes: 1.3.1:

Re: [RESULT] [VOTE] Release Apache Spark 1.2.2

2015-04-17 Thread Sree V
Cleaned up ~/.m2 and ~/.zinc. Received the exact same error again. So, -1 from me. [INFO] [INFO] Building Spark Project External Flume 1.2.2 [INFO] [INFO]

Re: Spark development with IntelliJ

2015-04-17 Thread tanejagagan
It was a bit frustrating to get this working with IntelliJ IDEA 14.1. Clearing the Additional Compiler option did not fix the issue. I had to add the following to Settings->Build, Execution, Deployment-> S