In Spark 1.2, I have a SchemaRDD whose schema contains decimal columns created like this:

    x1 = new StructField("a", DecimalType(14, 4), true)
    x2 = new StructField("b", DecimalType(14, 4), true)

Registering it as a SQL temp table and running SQL queries on these columns (including SUM etc.) works fine, so the DecimalType in the schema does not seem to be the issue. However, calling saveAsParquetFile on the RDD fails with the error below. I am not sure why the DecimalType in the SchemaRDD is not seen by the Parquet writer, which instead sees the values as scala.math.BigDecimal:

    java.lang.ClassCastException: scala.math.BigDecimal cannot be cast to org.apache.spark.sql.catalyst.types.decimal.Decimal
        at org.apache.spark.sql.parquet.MutableRowWriteSupport.consumeType(ParquetTableSupport.scala:359)
        at org.apache.spark.sql.parquet.MutableRowWriteSupport.write(ParquetTableSupport.scala:328)
        at org.apache.spark.sql.parquet.MutableRowWriteSupport.write(ParquetTableSupport.scala:314)
        at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:120)
        at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:81)
        at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:37)
        at org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:308)
        at org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:325)
        at org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:325)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:56)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
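For context, the cast in MutableRowWriteSupport suggests the Parquet write path in Spark 1.2 expects decimal values to already be stored as the internal catalyst type org.apache.spark.sql.catalyst.types.decimal.Decimal, not as scala.math.BigDecimal. If the rows were built manually and attached to the schema with applySchema, one possible workaround is to wrap each BigDecimal in Decimal before constructing the Row. This is an untested sketch, not a confirmed fix; sqlContext and rawRdd are hypothetical names standing in for your own context and input RDD:

```scala
import org.apache.spark.sql._
import org.apache.spark.sql.catalyst.types.decimal.Decimal

// Sketch (untested, Spark 1.2): the same two-column decimal schema as above.
val schema = StructType(Seq(
  StructField("a", DecimalType(14, 4), true),
  StructField("b", DecimalType(14, 4), true)))

// rawRdd: RDD[(scala.math.BigDecimal, scala.math.BigDecimal)] -- hypothetical input.
// Wrap each BigDecimal in catalyst's Decimal so MutableRowWriteSupport
// receives the internal type its cast expects instead of scala.math.BigDecimal.
val rowRdd = rawRdd.map { case (a, b) =>
  Row(Decimal(a), Decimal(b))
}

val schemaRdd = sqlContext.applySchema(rowRdd, schema)
schemaRdd.saveAsParquetFile("decimals.parquet")
```

If the SUM queries work because catalyst expression evaluation tolerates BigDecimal while the Parquet writer does not, this conversion should make both paths see a consistent type; whether that mismatch is a known 1.2 bug is worth checking in the Spark issue tracker.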