Re: CHAR implementation?

2017-09-15 Thread Dongjoon Hyun
Thank you, Ryan! Yes, right. If we turn off `spark.sql.hive.convertMetastoreParquet`, Spark pads the space. For ORC CHAR it's the same. ORC only handles truncation on write; the padding is handled on the Hive side in `HiveCharWritable` via `HiveBaseChar.java` on read. Spark's ORCFileFormat uses HiveCharWritable…
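
For example, with the tables from the original message below (a sketch; the padded result assumes the read now goes through the Hive serde):

spark-sql> SET spark.sql.hive.convertMetastoreParquet=false;
spark-sql> SELECT a, length(a) FROM t3;
-- read through the Hive serde, so the CHAR(3) value should come back padded: 'a  ' with length 3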

Re: CHAR implementation?

2017-09-15 Thread Ryan Blue
My guess is that this is because Parquet doesn't have a CHAR type, so the CHAR padding semantics should be applied to strings by Spark for Parquet. The reason, from Parquet's perspective, not to support CHAR is that we have no expectation that it is a portable type. Non-SQL writers aren't going to pad values with spaces, and…
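
For example, reading the file directly shows that the data carries only what the writer wrote (a sketch; the warehouse path is hypothetical and depends on your setup):

spark-sql> SELECT a, length(a) FROM parquet.`/user/hive/warehouse/t3`;
-- bypasses the metastore CHAR(3) type entirely: expect the unpadded 'a ' with length 2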

CHAR implementation?

2017-09-14 Thread Dongjoon Hyun
Hi, All. Currently, Spark shows different behavior when we use CHAR types.

spark-sql> CREATE TABLE t1(a CHAR(3));
spark-sql> CREATE TABLE t2(a CHAR(3)) STORED AS ORC;
spark-sql> CREATE TABLE t3(a CHAR(3)) STORED AS PARQUET;
spark-sql> INSERT INTO TABLE t1 SELECT 'a ';
spark-sql> INSERT INTO TABLE t2 …
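
A minimal way to observe the divergence (a sketch; the exact results depend on your Spark/Hive versions and the default `spark.sql.hive.convertMetastore*` settings):

spark-sql> SELECT a, length(a) FROM t1;   -- text serde: padded, 'a  ' / 3
spark-sql> SELECT a, length(a) FROM t2;   -- ORC: padded on read by the Hive side
spark-sql> SELECT a, length(a) FROM t3;   -- Parquet, converted by default: unpadded, 'a ' / 2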