Spark 1.2
The SchemaRDD's schema has decimal columns created like:
val x1 = StructField("a", DecimalType(14, 4), true)
val x2 = StructField("b", DecimalType(14, 4), true)
Registering the SchemaRDD as a SQL temp table and running SQL queries on these
columns, including SUM and other aggregates, works fine, so the DecimalType in
the schema does not seem to be the issue.
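
For context, a minimal sketch of how such a SchemaRDD could be built and queried.
This is an illustrative reconstruction, not the original job: the SparkContext
setup, data values, and table name are assumptions, and it assumes the Spark 1.2
API where the schema types and Row are exposed under org.apache.spark.sql.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql._   // Spark 1.2: Row, StructType, StructField, DecimalType

val sc = new SparkContext(new SparkConf().setAppName("decimal-parquet-test"))
val sqlContext = new SQLContext(sc)

val schema = StructType(Seq(
  StructField("a", DecimalType(14, 4), true),
  StructField("b", DecimalType(14, 4), true)))

// The rows here carry scala.math.BigDecimal values, matching what the
// ClassCastException below reports as the runtime type.
val rowRdd = sc.parallelize(Seq(
  Row(BigDecimal("1234.5678"), BigDecimal("10.0000")),
  Row(BigDecimal("0.1000"), BigDecimal("20.5000"))))

val schemaRdd = sqlContext.applySchema(rowRdd, schema)
schemaRdd.registerTempTable("t")

// SQL queries against the decimal columns, including aggregates, work fine.
sqlContext.sql("SELECT SUM(a), SUM(b) FROM t").collect()
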
Calling saveAsParquetFile on the SchemaRDD, however, fails with the error below.
It is not clear why the DecimalType in the SchemaRDD is not picked up by the
Parquet write path, which instead sees the value as a scala.math.BigDecimal.
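
A minimal sketch of the failing call, assuming the schemaRdd built above
(the output path is illustrative):

// Writing the same SchemaRDD to Parquet throws the ClassCastException below.
schemaRdd.saveAsParquetFile("/tmp/decimal_test.parquet")
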
java.lang.ClassCastException: scala.math.BigDecimal cannot be cast to org.apache.spark.sql.catalyst.types.decimal.Decimal
    at org.apache.spark.sql.parquet.MutableRowWriteSupport.consumeType(ParquetTableSupport.scala:359)
    at org.apache.spark.sql.parquet.MutableRowWriteSupport.write(ParquetTableSupport.scala:328)
    at org.apache.spark.sql.parquet.MutableRowWriteSupport.write(ParquetTableSupport.scala:314)
    at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:120)
    at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:81)
    at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:37)
    at org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:308)
    at org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:325)
    at org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:325)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)