Option only works when you are converting from case classes. Just put null into the Row when you want the value to be null.
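A minimal sketch of that approach in plain Scala (the Row construction itself is left as a comment, since it needs a Spark dependency on the classpath): unwrap each Option to a nullable boxed value with orNull before building the Row, instead of passing the Option itself.

```scala
// Unwrap Option[Int] to a nullable java.lang.Integer before putting it
// into the Row. Option#orNull requires a nullable element type, so box
// the primitive first with Int.box.
val col1: Option[Int] = None
val col2: Option[Int] = Some(42)

val v1: Integer = col1.map(Int.box).orNull // null when the Option is None
val v2: Integer = col2.map(Int.box).orNull // the boxed value otherwise

// With Spark SQL this would then be:
//   val myRow = org.apache.spark.sql.Row(v1, v2)
```

This keeps the schema's nullable = true columns working, because the Hive SerDe sees a plain null or java.lang.Integer rather than a scala.Some it cannot cast.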
On Tue, May 5, 2015 at 9:00 AM, Masf <masfwo...@gmail.com> wrote:

> Hi.
>
> I have a Spark application where I store the results into a table (with
> HiveContext). Some of these columns allow nulls. In Scala, these columns
> are represented as Option[Int] or Option[Double], depending on the data
> type.
>
> For example:
>
>     val hc = new HiveContext(sc)
>     var col1: Option[Integer] = None
>     ...
>
>     val myRow = org.apache.spark.sql.Row(col1, ...)
>
>     val mySchema = StructType(Array(StructField("Column1", IntegerType, true)))
>
>     val TableOutputSchemaRDD = hc.applySchema(myRow, mySchema)
>     hc.registerRDDAsTable(TableOutputSchemaRDD, "result_intermediate")
>     hc.sql("CREATE TABLE table_output STORED AS PARQUET AS SELECT * FROM result_intermediate")
>
> This produces:
>
>     java.lang.ClassCastException: scala.Some cannot be cast to java.lang.Integer
>         at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaIntObjectInspector.get(JavaIntObjectInspector.java:40)
>         at org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe.createPrimitive(ParquetHiveSerDe.java:247)
>         at org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe.createObject(ParquetHiveSerDe.java:301)
>         at org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe.createStruct(ParquetHiveSerDe.java:178)
>         at org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe.serialize(ParquetHiveSerDe.java:164)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:123)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:114)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:114)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:93)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:93)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>         at org.apache.spark.scheduler.Task.run(Task.scala:56)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
>
> Thanks!!!!!
>
> --
> Regards.
> Miguel Ángel