Spark 1.2
The SchemaRDD's schema has decimal columns created like:
val x1 = StructField("a", DecimalType(14, 4), true)
val x2 = StructField("b", DecimalType(14, 4), true)
Registering the SchemaRDD as a SQL temp table and running SQL queries on these
columns, including SUM and other aggregates, works fine, so the DecimalType in
the schema does not seem to be the issue.
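
For context, a minimal sketch of how such a SchemaRDD could be built and queried.
This is an illustrative reconstruction, not the original job: the SparkContext
setup, data values, and table name are assumptions, and it assumes the Spark 1.2
API where the schema types and Row are exposed under org.apache.spark.sql.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql._   // Spark 1.2: Row, StructType, StructField, DecimalType

val sc = new SparkContext(new SparkConf().setAppName("decimal-parquet-test"))
val sqlContext = new SQLContext(sc)

val schema = StructType(Seq(
  StructField("a", DecimalType(14, 4), true),
  StructField("b", DecimalType(14, 4), true)))

// The rows here carry scala.math.BigDecimal values, matching what the
// ClassCastException below reports as the runtime type.
val rowRdd = sc.parallelize(Seq(
  Row(BigDecimal("1234.5678"), BigDecimal("10.0000")),
  Row(BigDecimal("0.1000"), BigDecimal("20.5000"))))

val schemaRdd = sqlContext.applySchema(rowRdd, schema)
schemaRdd.registerTempTable("t")

// SQL queries against the decimal columns, including aggregates, work fine.
sqlContext.sql("SELECT SUM(a), SUM(b) FROM t").collect()
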
Calling saveAsParquetFile on the SchemaRDD, however, fails with the error below.
It is not clear why the DecimalType in the SchemaRDD is not picked up by the
Parquet write path, which instead sees the value as a scala.math.BigDecimal.
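
A minimal sketch of the failing call, assuming the schemaRdd built above
(the output path is illustrative):

// Writing the same SchemaRDD to Parquet throws the ClassCastException below.
schemaRdd.saveAsParquetFile("/tmp/decimal_test.parquet")
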
java.lang.ClassCastException: scala.math.BigDecimal cannot be cast to org.apache.spark.sql.catalyst.types.decimal.Decimal
    at org.apache.spark.sql.parquet.MutableRowWriteSupport.consumeType(ParquetTableSupport.scala:359)
    at org.apache.spark.sql.parquet.MutableRowWriteSupport.write(ParquetTableSupport.scala:328)
    at org.apache.spark.sql.parquet.MutableRowWriteSupport.write(ParquetTableSupport.scala:314)
    at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:120)
    at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:81)
    at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:37)
    at org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:308)
    at org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:325)
    at org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:325)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)