[ https://issues.apache.org/jira/browse/SPARK-11553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000616#comment-15000616 ]
Bartlomiej Alberski edited comment on SPARK-11553 at 11/11/15 5:10 PM:
-----------------------------------------------------------------------
OK, I think I know what the problem is. It can be reproduced with Scala 2.11.6 and the DataFrame API.

If you use the DataFrame API from Scala and try to get an Int, Long, Boolean, etc. (a value that extends AnyVal), you will receive the "zero value" specific to the given type (0 for Long and Int, false for Boolean, etc.), while the API suggests that an NPE will be raised.

Example modified to illustrate the problem (from http://spark.apache.org/docs/latest/sql-programming-guide.html#dataframes):
{code}
val sc: SparkContext // An existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

val df = sqlContext.read.json("examples/src/main/resources/people.json")

// Displays the content of the DataFrame to stdout
df.show()

val res = df.map(x => x.getLong(x.fieldIndex("name"))).collect()
println(res.mkString(","))
{code}
The problem comes from the implementation of the getInt/getFloat/getBoolean/... methods:
{code}
getInt(i: Int): Int = getAs[Int](i)
getAs[T](i: Int): T = get(i).asInstanceOf[T]
{code}
null.asInstanceOf[Long] returns 0, because Long cannot be null (it extends AnyVal). Example invocations from the Scala REPL:
{code}
scala> null.asInstanceOf[Int]
res0: Int = 0

scala> null.asInstanceOf[Long]
res1: Long = 0

scala> null.asInstanceOf[Short]
res2: Short = 0

scala> null.asInstanceOf[Boolean]
res3: Boolean = false

scala> null.asInstanceOf[Double]
res4: Double = 0.0

scala> null.asInstanceOf[Float]
res5: Float = 0.0
{code}
I will be more than happy to prepare a PR solving this issue.

> row.getInt(i) if row[i]=null returns 0
> --------------------------------------
>
>                 Key: SPARK-11553
>                 URL: https://issues.apache.org/jira/browse/SPARK-11553
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Tofigh
>            Priority: Minor
>
> row.getInt|Float|Double in SPARK RDD return 0 if row[index] is null.
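A possible direction for the fix, sketched below under stated assumptions: make the primitive getters check for null explicitly before unboxing, instead of letting `null.asInstanceOf[Long]` silently produce 0. `SimpleRow` is a minimal stand-in for `org.apache.spark.sql.Row`, and `getLongOrThrow` is a hypothetical name, not Spark's actual API:

```scala
// SimpleRow and getLongOrThrow are hypothetical illustrations, not Spark API.
// A Row is modeled as a sequence of boxed values, any of which may be null.
class SimpleRow(values: Seq[Any]) {
  def isNullAt(i: Int): Boolean = values(i) == null

  // Current behavior: null silently unboxes to the type's zero value (0L).
  def getLong(i: Int): Long = values(i).asInstanceOf[Long]

  // Sketched fix: fail loudly on null instead of returning 0.
  def getLongOrThrow(i: Int): Long =
    if (isNullAt(i))
      throw new NullPointerException(s"Value at index $i is null")
    else
      values(i).asInstanceOf[Long]
}

val row = new SimpleRow(Seq(42L, null))
println(row.getLong(1)) // prints 0 -- the silent zero value described above
// row.getLongOrThrow(1) would throw NullPointerException
```

The same pattern would apply to getInt, getBoolean, getDouble, and the other AnyVal getters.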
> (Even according to the documentation they should throw a NullPointerException.)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)