Thank you, Ryan!
Yes, right. If we turn off `spark.sql.hive.convertMetastoreParquet`, Spark
pads with spaces.
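
For reference, a minimal session sketch of that toggle, assuming the
Parquet-backed CHAR(3) table `t3` from the example at the bottom of this
thread. With the conversion disabled, the read goes through the Hive SerDe,
which pads the value back out to the declared length:

spark-sql> SET spark.sql.hive.convertMetastoreParquet=false;
spark-sql> SELECT a, length(a) FROM t3;
a  3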
For ORC CHAR, it's the same. ORC only handles truncation on write; the
padding is handled on the Hive side in `HiveCharWritable` via
`HiveBaseChar.java` on read. Spark ORCFileFormat uses `HiveCharWritable`,
so ORC tables get the padding back at read time.
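
To make that write/read split concrete, a sketch against the ORC table `t2`
from the example below. I'm assuming the Hive write path silently truncates
the over-length literal here; the reader then pads the stored value back to
the declared length:

spark-sql> INSERT INTO TABLE t2 SELECT 'abcde';  -- truncated to 'abc' on write
spark-sql> SELECT a, length(a) FROM t2;
abc  3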
My guess is that this is because Parquet doesn't have a CHAR type, so the
padding would have to be applied to strings by Spark itself for Parquet.
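
If that guess is right, the CHAR contract can be emulated on the Parquet
table by hand; a hedged sketch using `rpad` (my own workaround, not anything
Spark applies automatically):

spark-sql> SELECT rpad(a, 3, ' ') AS a, length(rpad(a, 3, ' ')) FROM t3;
a  3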
The reason, from Parquet's perspective, not to support CHAR is that we have
no expectation that it is a portable type. Non-SQL writers aren't going to
pad values with spaces, and readers can't rely on the padding being there.
Hi, All.
Currently, Spark shows different behavior when we use CHAR types.
spark-sql> CREATE TABLE t1(a CHAR(3));
spark-sql> CREATE TABLE t2(a CHAR(3)) STORED AS ORC;
spark-sql> CREATE TABLE t3(a CHAR(3)) STORED AS PARQUET;
spark-sql> INSERT INTO TABLE t1 SELECT 'a ';
spark-sql> INSERT INTO TABLE t2 SELECT 'a ';
spark-sql> INSERT INTO TABLE t3 SELECT 'a ';
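
For completeness, this is the divergence being described (a sketch, assuming
default configurations): the plain Hive table and the ORC table pad the value
back to the declared length on read, while the Parquet table returns exactly
what was written:

spark-sql> SELECT a, length(a) FROM t1;
a  3
spark-sql> SELECT a, length(a) FROM t2;
a  3
spark-sql> SELECT a, length(a) FROM t3;
a  2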