Aman Sinha created DRILL-4694: --------------------------------- Summary: CTAS in JSON format produces extraneous NULL fields Key: DRILL-4694 URL: https://issues.apache.org/jira/browse/DRILL-4694 Project: Apache Drill Issue Type: Bug Components: Storage - JSON Affects Versions: 1.6.0 Reporter: Aman Sinha
Consider the following JSON file: {noformat} // file t2.json { "X" : { "key1" : "value1", "key2" : "value2" } } { "X" : { "key3" : "value3", "key4" : "value4" } } { "X" : { "key5" : "value5", "key6" : "value6" } } {noformat} Now create a table in Json format using CTAS: {noformat} 0: jdbc:drill:zk=local> alter session set `store.format` = 'json'; 0: jdbc:drill:zk=local> create table dfs.tmp.jt12 as select t.`X` from `t2.json` t; +-----------+----------------------------+ | Fragment | Number of records written | +-----------+----------------------------+ | 0_0 | 3 | +-----------+----------------------------+ {noformat} The output file has rows with union schema of all the fields in all the records. This creates extraneous Null fields in the output: {noformat} $ cat jt12/0_0_0.json { "X" : { "key1" : "value1", "key2" : "value2", "key3" : null, "key4" : null, "key5" : null, "key6" : null } } { "X" : { "key1" : null, "key2" : null, "key3" : "value3", "key4" : "value4", "key5" : null, "key6" : null } } { "X" : { "key1" : null, "key2" : null, "key3" : null, "key4" : null, "key5" : "value5", "key6" : "value6" } } {noformat} Note that if I change the output format to CSV or Parquet, there are no Null fields created in the output file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)