Hi,

I came across strange behavior when dealing with PostgreSQL columns of type 
numeric[] using Spark 2.3.2 against PostgreSQL 10.4 and 9.6.9.
Consider the following table definition:

create table test1
(
   v  numeric[],
   d  numeric
);

insert into test1 values('{1111.222,2222.332}', 222.4555);

When reading the table into a DataFrame, I get the following schema:

root
 |-- v: array (nullable = true)
 |    |-- element: decimal(0,0) (containsNull = true)
 |-- d: decimal(38,18) (nullable = true)

Note that precision and scale are not specified for either column, yet for the 
array element both were set to 0, while for the plain numeric column the 
defaults (38,18) were applied.
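
For reference, the read looks roughly like this (URL and credentials below are 
placeholders, not the real ones):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("numeric-array-test")
  .getOrCreate()

// Plain JDBC read of the table above.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/testdb")
  .option("dbtable", "test1")
  .option("user", "test")
  .option("password", "test")
  .load()

df.printSchema()  // prints the schema above, with element: decimal(0,0)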

Later, when I actually try to read data from the DataFrame, I get the following error:

java.lang.IllegalArgumentException: requirement failed: Decimal precision 4 exceeds max precision 0
        at scala.Predef$.require(Predef.scala:224)
        at org.apache.spark.sql.types.Decimal.set(Decimal.scala:114)
        at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:453)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$16$$anonfun$apply$6$$anonfun$apply$7.apply(JdbcUtils.scala:474)
        ...

In this case I would expect the array elements to be typed as decimal(38,18) 
and the read to complete without error.
Should this be considered a bug? Is there a workaround other than changing the 
array column's type definition to include explicit precision and scale?
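
One thing I was thinking of trying (untested) is to push an explicit cast into 
a subquery through the dbtable option, along these lines:

// Untested sketch: cast the array elements to an explicit numeric(38,18)
// inside a subquery, so the JDBC metadata carries a concrete precision/scale.
// Uses the same SparkSession `spark` as above; URL and credentials are placeholders.
val dfCast = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/testdb")
  .option("dbtable", "(select v::numeric(38,18)[] as v, d from test1) as t")
  .option("user", "test")
  .option("password", "test")
  .load()

but I am not sure whether the driver actually reports precision and scale for 
the casted array column, so I would still prefer a proper fix.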

Best regards,
Alexey
