Avro/Parquet GenericFixed decimal is not read into Spark correctly

Justin Pihony Wed, 12 Apr 2017 20:12:44 -0700

All,

Before creating a JIRA for this I wanted to get a sense as to whether it
would be shot down or not:


Take the following code. Start spark-shell with Avro on the classpath:

spark-shell --packages org.apache.avro:avro:1.8.1

then run:

import org.apache.avro.{Conversions, LogicalTypes, Schema}
import java.math.BigDecimal

val dc = new Conversions.DecimalConversion()
val javaBD = BigDecimal.valueOf(643.85924958)

// Record with one nullable field: a 19-byte fixed carrying decimal(17, 8)
val schema = new Schema.Parser().parse(
  """{"type":"record","name":"Header","namespace":"org.apache.avro.file","fields":[
    |  {"name":"COLUMN","type":["null",{"type":"fixed","name":"COLUMN",
    |   "size":19,"precision":17,"scale":8,"logicalType":"decimal"}]}]}""".stripMargin)

// Unwrap the ["null", fixed] union to get the fixed schema
val schemaDec = schema.getField("COLUMN").schema()
val fieldSchema =
  if (schemaDec.getType() == Schema.Type.UNION) schemaDec.getTypes.get(1) else schemaDec

// Encode the BigDecimal as a GenericFixed
val converted = dc.toFixed(javaBD, fieldSchema,
  LogicalTypes.decimal(javaBD.precision, javaBD.scale))

// This is the call that fails
sqlContext.createDataFrame(List(("value", converted)))

and you'll get this error:

java.lang.UnsupportedOperationException: Schema for type
org.apache.avro.generic.GenericFixed is not supported

However, if you write out a Parquet file using AvroParquetWriter with the
GenericFixed value above (converted) and then read it back in via the
DataFrameReader, the decimal value that comes back is wrong (i.e. the
643... above is listed as -0.5...).
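
For reference, here is a minimal sketch of the value I would expect a
reader to return. It assumes the fixed holds the unscaled value as
big-endian two's complement, sign-extended to the fixed size, which is how
I understand the Avro decimal logical type. The helpers toFixedBytes and
fromFixedBytes are hypothetical names for illustration only (not Avro or
Spark APIs), and only java.math is used:

```scala
import java.math.{BigDecimal, BigInteger}

// Hypothetical helper: pad the unscaled value out to a fixed width,
// filling with 0xFF for negatives and 0x00 otherwise (sign extension).
def toFixedBytes(value: BigDecimal, size: Int): Array[Byte] = {
  val unscaled = value.unscaledValue().toByteArray() // big-endian two's complement
  require(unscaled.length <= size,
    s"value needs ${unscaled.length} bytes but the fixed size is $size")
  val fill: Byte = if (value.signum() < 0) 0xFF.toByte else 0x00.toByte
  Array.fill[Byte](size - unscaled.length)(fill) ++ unscaled
}

// Hypothetical helper: interpret all of the fixed bytes as the unscaled
// integer and apply the scale taken from the schema.
def fromFixedBytes(bytes: Array[Byte], scale: Int): BigDecimal =
  new BigDecimal(new BigInteger(bytes), scale)

val original = new BigDecimal("643.85924958")
val bytes = toFixedBytes(original, 19)          // size 19, as in the schema above
val roundTripped = fromFixedBytes(bytes, original.scale())
assert(roundTripped == original)
```

Under that assumption, decoding all 19 bytes with scale 8 gives back the
original value, so that is the result I would expect from the read.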

Even if this is not supported, is there any way to at least have the read
throw an UnsupportedOperationException, as happens when you try to create
the DataFrame directly (as opposed to reading it in from a file)?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Avro-Parquet-GenericFixed-decimal-is-not-read-into-Spark-correctly-tp28592.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
