Nulls getting converted to 0 with spark 2.0 SNAPSHOT

2016-03-07 Thread Franklyn D'souza
Just wanted to confirm that this is the expected behaviour. Basically I'm putting nulls into a non-nullable LongType column and doing a transformation operation on that column, and the result is a column with the nulls converted to 0. Here's an example: from pyspark.sql import types from pyspark.sql import …
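A pure-Python sketch (hypothetical names, not Spark code) of the mechanism that can produce this symptom: Spark's internal row format stores a fixed 8-byte slot for a long, writing 0 into the slot when the value is null and recording nullness in a separate null bit. An operation that trusts a non-nullable schema may skip the null bit and read the 0 placeholder.

```python
import struct

def write_long_cell(value):
    # Store a null bit plus an 8-byte slot; a null writes 0 into the slot.
    is_null = value is None
    slot = struct.pack("<q", 0 if is_null else value)
    return is_null, slot

def read_long_cell(cell, trust_non_nullable_schema=False):
    # Trusting a non-nullable schema means never consulting the null bit.
    is_null, slot = cell
    if is_null and not trust_non_nullable_schema:
        return None
    return struct.unpack("<q", slot)[0]

cells = [write_long_cell(v) for v in (1, None, 3)]
print([read_long_cell(c) for c in cells])  # [1, None, 3]
print([read_long_cell(c, trust_non_nullable_schema=True) for c in cells])  # [1, 0, 3]
```

This is only an analogy for how a null can surface as 0; the actual PySpark repro was in the truncated example above.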

Re: Nulls getting converted to 0 with spark 2.0 SNAPSHOT

2016-03-07 Thread Michael Armbrust
That looks like a bug to me. Open a JIRA? On Mon, Mar 7, 2016 at 11:30 AM, Franklyn D'souza <franklyn.dso...@shopify.com> wrote: > Just wanted to confirm that this is the expected behaviour. > > Basically I'm putting nulls into a non-nullable LongType column and doing > a transformation operation …

Re: HashedRelation Memory Pressure on Broadcast Joins

2016-03-07 Thread Davies Liu
The underlying buffer for UnsafeRow is reused in UnsafeProjection. On Thu, Mar 3, 2016 at 9:11 PM, Rishi Mishra wrote: > Hi Davies, > When you say "UnsafeRow could come from UnsafeProjection, so we should copy > the rows for safety," do you intend to say that the underlying state might > change …
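A pure-Python analogy (hypothetical names, not Spark code) for why rows coming out of UnsafeProjection must be copied before being stored, e.g. in a HashedRelation: the projection reuses one underlying buffer, so every yielded row is the same object, and its contents are overwritten on the next call.

```python
def unsafe_projection(values):
    row = [None]  # single reused buffer, like UnsafeRow's backing bytes
    for v in values:
        row[0] = v * 2
        yield row  # caller receives the same object every time

# Keeping references to the yielded rows: all of them alias the buffer,
# so after iteration they all show the last value written.
kept_references = [r for r in unsafe_projection([1, 2, 3])]
print(kept_references)  # [[6], [6], [6]]

# Copying each row before keeping it preserves every value.
kept_copies = [list(r) for r in unsafe_projection([1, 2, 3])]
print(kept_copies)  # [[2], [4], [6]]
```

The same reasoning applies to any consumer that holds on to rows past the next iteration step.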

Re: [VOTE] Release Apache Spark 1.6.1 (RC1)

2016-03-07 Thread Reynold Xin
+1 (binding) On Sun, Mar 6, 2016 at 12:08 PM, Egor Pahomov wrote: > +1 > > Spark ODBC server is fine, SQL is fine. > > 2016-03-03 12:09 GMT-08:00 Yin Yang : > >> Skipping docker tests, the rest are green: >> >> [INFO] Spark Project External Kafka ... SUCCESS >> [01:28 min] >

Re: Typo in community databricks cloud docs

2016-03-07 Thread Reynold Xin
Thanks - I've fixed it and it will go out next time we update. For future reference, you can email supp...@databricks.com directly for this. Again - thanks for reporting this. On Sat, Mar 5, 2016 at 4:23 PM, Eugene Morozov wrote: > Hi, I'm not sure where to put this, but I've found a typo on a …

Adding hive context gives error

2016-03-07 Thread Suniti Singh
Hi All, I am trying to create a Hive context in a Scala program as follows in Eclipse. Note -- I have added the Maven dependencies for spark-core, spark-hive, and spark-sql. import org.apache.spark.SparkConf import org.apache.spark.SparkContext import org.apache.spark.rdd.RDD.rddToPairRDDFunctions object D…

Dynamic allocation availability on standalone mode. Misleading doc.

2016-03-07 Thread Eugene Morozov
Hi, the feature looks like the one I'd like to use, but there are two different descriptions in the docs of whether it's available. I'm on a standalone deployment, and here: http://spark.apache.org/docs/latest/configuration.html it says the feature is available only on YARN, but here: …
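For reference, a minimal spark-defaults.conf sketch for enabling dynamic allocation on standalone as of Spark 1.6 (the min/max values are illustrative; standalone workers also need the external shuffle service, which spark.shuffle.service.enabled turns on):

```
spark.dynamicAllocation.enabled        true
spark.shuffle.service.enabled          true
spark.dynamicAllocation.minExecutors   1
spark.dynamicAllocation.maxExecutors   10
```

As the replies below confirm, the YARN-only wording in the configuration page was a documentation bug.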

Re: Dynamic allocation availability on standalone mode. Misleading doc.

2016-03-07 Thread Mark Hamstra
Yes, it works in standalone mode. On Mon, Mar 7, 2016 at 4:25 PM, Eugene Morozov wrote: > Hi, the feature looks like the one I'd like to use, but there are two > different descriptions in the docs of whether it's available. > > I'm on a standalone deployment mode and here: > http://spark.apache.

Re: Adding hive context gives error

2016-03-07 Thread Suniti Singh
Yeah, I realized it and changed its version to 1.6.0 as mentioned in http://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.10/1.6.0 I added the spark-sql dependency back to the pom.xml and the Scala code works just fine. On Mon, Mar 7, 2016 at 5:00 PM, Tristan Nixon wrote: > Hi …
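A sketch of matching Maven coordinates for this setup (assuming Scala 2.10 artifacts and Spark 1.6.0 -- the key point is keeping every Spark artifact on the same version):

```xml
<!-- Illustrative: all Spark modules must share one version -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.6.0</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.10</artifactId>
  <version>1.6.0</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive_2.10</artifactId>
  <version>1.6.0</version>
</dependency>
```

Mixing Spark module versions (e.g. a 1.5.x spark-sql with a 1.6.0 spark-core) is a common cause of the HiveContext errors described above.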

Re: Dynamic allocation availability on standalone mode. Misleading doc.

2016-03-07 Thread Saisai Shao
Yes, we need to fix the document. On Tue, Mar 8, 2016 at 9:07 AM, Mark Hamstra wrote: > Yes, it works in standalone mode. > > On Mon, Mar 7, 2016 at 4:25 PM, Eugene Morozov > wrote: > >> Hi, the feature looks like the one I'd like to use, but there are two >> different descriptions in the docs

Re: Dynamic allocation availability on standalone mode. Misleading doc.

2016-03-07 Thread Reynold Xin
The doc fix was merged in 1.6.1, so it will get updated automatically once we push the 1.6.1 docs. On Mon, Mar 7, 2016 at 5:40 PM, Saisai Shao wrote: > Yes, we need to fix the document. > > On Tue, Mar 8, 2016 at 9:07 AM, Mark Hamstra > wrote: > >> Yes, it works in standalone mode. >> >> …

Does anyone implement org.apache.spark.serializer.Serializer in their own code?

2016-03-07 Thread Josh Rosen
Does anyone implement Spark's serializer interface (org.apache.spark.serializer.Serializer) in your own third-party code? If so, please let me know because I'd like to change this interface from a DeveloperAPI to private[spark] in Spark 2.0 in order to do some cleanup and refactoring. I think that …

Re: Does anyone implement org.apache.spark.serializer.Serializer in their own code?

2016-03-07 Thread Koert Kuipers
We are not, but it seems reasonable to me that a user should have the ability to implement their own serializer. Can you refactor and break compatibility, but not make it private? On Mon, Mar 7, 2016 at 9:57 PM, Josh Rosen wrote: > Does anyone implement Spark's serializer interface > (org.apache.spark.…

Re: Does anyone implement org.apache.spark.serializer.Serializer in their own code?

2016-03-07 Thread Ted Yu
Josh: Would SerializerInstance and SerializationStream also become private[spark]? Thanks. On Mon, Mar 7, 2016 at 6:57 PM, Josh Rosen wrote: > Does anyone implement Spark's serializer interface > (org.apache.spark.serializer.Serializer) in your own third-party code? If > so, please let me …

ML ALS API

2016-03-07 Thread Maciej Szymkiewicz
Can I ask for clarification regarding ml.recommendation.ALS: - is the train method (https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala#L598) intended to be public? - is the Rating class (https://github.com/apache/spark/blob/master/mllib/src/ma…

Re: More Robust DataSource Parameters

2016-03-07 Thread Reynold Xin
Hi Hamel, Sorry for the slow reply. Do you mind writing down the thoughts in a document, with API sketches? I think the devil is in the details of the API for this one. If we can design an API that is type-safe, supports all languages, and is also stable, then it sounds like a great idea …

BUILD FAILURE due to...Unable to find configuration file at location dev/scalastyle-config.xml

2016-03-07 Thread Jacek Laskowski
Hi, Got the BUILD FAILURE. Anyone looking into it? ➜ spark git:(master) ✗ ./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.2 -Phive -Phive-thriftserver -DskipTests clean install ... [INFO] BUILD FAILURE …

Re: BUILD FAILURE due to...Unable to find configuration file at location dev/scalastyle-config.xml

2016-03-07 Thread Reynold Xin
+Sean, who was playing with this. On Mon, Mar 7, 2016 at 11:38 PM, Jacek Laskowski wrote: > Hi, > > Got the BUILD FAILURE. Anyone looking into it? > > ➜ spark git:(master) ✗ ./build/mvn -Pyarn -Phadoop-2.6 > -Dhadoop.version=2.7.2 -Phive -Phive-thriftserver -DskipTests clean > install > ...

Re: BUILD FAILURE due to...Unable to find configuration file at location dev/scalastyle-config.xml

2016-03-07 Thread Shixiong(Ryan) Zhu
There is a fix: https://github.com/apache/spark/pull/11567 On Mon, Mar 7, 2016 at 11:39 PM, Reynold Xin wrote: > +Sean, who was playing with this. > > On Mon, Mar 7, 2016 at 11:38 PM, Jacek Laskowski wrote: >> Hi, >> >> Got the BUILD FAILURE. Anyone looking into it? >> …

Re: BUILD FAILURE due to...Unable to find configuration file at location dev/scalastyle-config.xml

2016-03-07 Thread Jacek Laskowski
Hi, Nope, it's not. It's at https://github.com/apache/spark/commit/0eea12a3d956b54bbbd73d21b296868852a04494#diff-600376dffeb79835ede4a0b285078036L2249. I've got that and am testing... Regards, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark

Re: BUILD FAILURE due to...Unable to find configuration file at location dev/scalastyle-config.xml

2016-03-07 Thread Jacek Laskowski
Okay...it's building now properly... https://github.com/apache/spark/pull/11567 + git mv scalastyle-config.xml dev/ How to fix it in the repo? Should I send a pull request to...pull request #11567? Guide me or fix it yourself...somehow :-) Regards, Jacek Laskowski https://medium.com/@jaceklaskowski…

Re: BUILD FAILURE due to...Unable to find configuration file at location dev/scalastyle-config.xml

2016-03-07 Thread Dongjoon Hyun
Ur, may I include that, too? Dongjoon. On Mon, Mar 7, 2016 at 11:46 PM, Jacek Laskowski wrote: > Okey...it's building now > properly...https://github.com/apache/spark/pull/11567 + git mv > scalastyle-config.xml dev/ > > How to fix it in the repo? Should I send a pull request to...pull > request

Re: BUILD FAILURE due to...Unable to find configuration file at location dev/scalastyle-config.xml

2016-03-07 Thread Dongjoon Hyun
Or, I hope one of the committers commits both mine (11567) and that soon. Since it's related to the build setting files, the Jenkins test took over 2 hours. :( Dongjoon. On Mon, Mar 7, 2016 at 11:48 PM, Dongjoon Hyun wrote: > Ur, may I include that, too? > > Dongjoon. > > On Mon, Mar 7, 2016 at 11:46 PM, Jacek Laskowski …

Re: BUILD FAILURE due to...Unable to find configuration file at location dev/scalastyle-config.xml

2016-03-07 Thread Jacek Laskowski
Sure! Go ahead and...fix the build. Thanks. Regards, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Tue, Mar 8, 2016 at 8:48 AM, Dongjoon Hyun wrote: > Ur, may I include that, too? …

Re: BUILD FAILURE due to...Unable to find configuration file at location dev/scalastyle-config.xml

2016-03-07 Thread Jacek Laskowski
Hi, I can confirm that the two changes fixed the build. #happy again. ➜ spark git:(master) ✗ ./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.2 -Phive -Phive-thriftserver -DskipTests clean install ... [INFO] BUILD SUCCESS