Hi, 100% agree with Reynold.
Regards,
Gourav Sengupta

On Mon, Mar 16, 2020 at 3:31 AM Reynold Xin <r...@databricks.com> wrote:

> Are we sure "not padding" is "incorrect"?
>
> I don't know whether ANSI SQL actually requires padding, but plenty of
> databases don't actually pad.
>
> https://docs.snowflake.net/manuals/sql-reference/data-types-text.html
> "Snowflake currently deviates from common CHAR semantics in that strings
> shorter than the maximum length are not space-padded at the end."
>
> MySQL:
> https://stackoverflow.com/questions/53528645/why-char-dont-have-padding-in-mysql
>
> On Sun, Mar 15, 2020 at 7:02 PM, Dongjoon Hyun <dongjoon.h...@gmail.com>
> wrote:
>
>> Hi, Reynold.
>>
>> Please see the following for the context.
>>
>> https://issues.apache.org/jira/browse/SPARK-31136
>> "Revert SPARK-30098 Use default datasource as provider for CREATE TABLE
>> syntax"
>>
>> I raised the above issue according to the new rubric, and the banning was
>> the proposed alternative to reduce the potential issue.
>>
>> Please give us your opinion since it's still PR.
>>
>> Bests,
>> Dongjoon.
>>
>> On Sat, Mar 14, 2020 at 17:54 Reynold Xin <r...@databricks.com> wrote:
>>
>>> I don’t understand this change. Wouldn’t this “ban” confuse the hell out
>>> of both new and old users?
>>>
>>> For old users, their old code that was working for char(3) would now
>>> stop working.
>>>
>>> For new users, depending on whether the underlying metastore char(3) is
>>> either supported but different from ansi Sql (which is not that big of a
>>> deal if we explain it) or not supported.
>>>
>>> On Sat, Mar 14, 2020 at 3:51 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>> wrote:
>>>
>>>> Hi, All.
>>>>
>>>> Apache Spark has suffered from a known consistency issue in `CHAR`
>>>> type behavior across its usages and configurations. However, the evolution
>>>> direction has been gradually moving forward to be consistent inside Apache
>>>> Spark because we don't have `CHAR` officially. The following is the summary.
>>>>
>>>> With 1.6.x ~ 2.3.x, `STORED AS PARQUET` has the following different result.
>>>> (`spark.sql.hive.convertMetastoreParquet=false` provides a fallback to
>>>> Hive behavior.)
>>>>
>>>>     spark-sql> CREATE TABLE t1(a CHAR(3));
>>>>     spark-sql> CREATE TABLE t2(a CHAR(3)) STORED AS ORC;
>>>>     spark-sql> CREATE TABLE t3(a CHAR(3)) STORED AS PARQUET;
>>>>
>>>>     spark-sql> INSERT INTO TABLE t1 SELECT 'a ';
>>>>     spark-sql> INSERT INTO TABLE t2 SELECT 'a ';
>>>>     spark-sql> INSERT INTO TABLE t3 SELECT 'a ';
>>>>
>>>>     spark-sql> SELECT a, length(a) FROM t1;
>>>>     a 3
>>>>     spark-sql> SELECT a, length(a) FROM t2;
>>>>     a 3
>>>>     spark-sql> SELECT a, length(a) FROM t3;
>>>>     a 2
>>>>
>>>> Since 2.4.0, `STORED AS ORC` became consistent.
>>>> (`spark.sql.hive.convertMetastoreOrc=false` provides a fallback to Hive
>>>> behavior.)
>>>>
>>>>     spark-sql> SELECT a, length(a) FROM t1;
>>>>     a 3
>>>>     spark-sql> SELECT a, length(a) FROM t2;
>>>>     a 2
>>>>     spark-sql> SELECT a, length(a) FROM t3;
>>>>     a 2
>>>>
>>>> Since 3.0.0-preview2, `CREATE TABLE` (without `STORED AS` clause)
>>>> became consistent.
>>>> (`spark.sql.legacy.createHiveTableByDefault.enabled=true` provides a
>>>> fallback to Hive behavior.)
>>>>
>>>>     spark-sql> SELECT a, length(a) FROM t1;
>>>>     a 2
>>>>     spark-sql> SELECT a, length(a) FROM t2;
>>>>     a 2
>>>>     spark-sql> SELECT a, length(a) FROM t3;
>>>>     a 2
>>>>
>>>> In addition, in 3.0.0, SPARK-31147 aims to ban `CHAR/VARCHAR` type in
>>>> the following syntax to be safe.
>>>>
>>>>     CREATE TABLE t(a CHAR(3));
>>>>     https://github.com/apache/spark/pull/27902
>>>>
>>>> This email is sent out to inform you based on the new policy we voted.
>>>> The recommendation is to always use Apache Spark's native type `String`.
>>>>
>>>> Bests,
>>>> Dongjoon.
>>>>
>>>> References:
>>>> 1. "CHAR implementation?", 2017/09/15
>>>>    https://lists.apache.org/thread.html/96b004331d9762e356053b5c8c97e953e398e489d15e1b49e775702f%40%3Cdev.spark.apache.org%3E
>>>> 2. "FYI: SPARK-30098 Use default datasource as provider for CREATE
>>>>    TABLE syntax", 2019/12/06
>>>>    https://lists.apache.org/thread.html/493f88c10169680191791f9f6962fd16cd0ffa3b06726e92ed04cbe1%40%3Cdev.spark.apache.org%3E
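
For anyone skimming the thread: the `length(a)` results of 3 versus 2 quoted above come down to whether the stored value is space-padded to the declared width. Below is a minimal Python sketch of the two behaviours. It is not Spark or Hive code, and the function names `store_as_char` and `store_as_string` are hypothetical, chosen only for illustration. It also assumes that padded CHAR(n) storage ignores trailing spaces in the input before padding.

```python
def store_as_char(value: str, n: int) -> str:
    """Padded CHAR(n) semantics (illustrative): pad with spaces to width n."""
    # Assumption: trailing spaces in the input are not significant.
    trimmed = value.rstrip(" ")
    if len(trimmed) > n:
        raise ValueError(f"value does not fit in CHAR({n})")
    return trimmed.ljust(n)


def store_as_string(value: str) -> str:
    """String semantics (illustrative): keep the value exactly as inserted."""
    return value


inserted = "a "  # same literal as in the quoted INSERT statements

print(len(store_as_char(inserted, 3)))  # 3, the padded result
print(len(store_as_string(inserted)))   # 2, the unpadded result
```

With padded semantics every stored value has length 3 regardless of input, while string semantics report the length of whatever was inserted, which is the difference the thread is debating.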