Hi Team,
I want to raise a topic about the standard for Parquet nested data types.
First, let me show a simple example.
Sample JSON file:
{"c1":[1,2,3]}
Using Spark to convert it to Parquet, the schema is:
c1: OPTIONAL F:1
.bag: REPEATED F:1
..array: OPTIONAL INT64 R:1 D:3
Using Drill to create the Parquet file, the schema is:
c1: REPEATED INT64 R:1 D:1
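To make the difference concrete, here is a minimal Python sketch (my own illustration, not code from either engine) of how the list [1, 2, 3] would be written as (repetition level, definition level, value) triples under each schema, assuming standard Dremel-style level encoding. The helper name `encode_list` is hypothetical; it only covers the non-null, fully-defined case.

```python
# Hypothetical sketch of Dremel-style level encoding for a non-null list
# of primitive values. Not actual Spark or Drill code.

def encode_list(values, max_def):
    """Encode a fully-defined list as (repetition, definition, value) triples.

    max_def -- maximum definition level of the column:
               3 for Spark's 3-level layout (optional > repeated > optional),
               1 for Drill's flat REPEATED INT64 field.
    """
    triples = []
    for i, v in enumerate(values):
        rep = 0 if i == 0 else 1  # repetition level 0 starts a new record
        triples.append((rep, max_def, v))
    return triples

# Spark layout: c1 OPTIONAL > bag REPEATED > array OPTIONAL INT64 (D:3)
print(encode_list([1, 2, 3], max_def=3))  # [(0, 3, 1), (1, 3, 2), (1, 3, 3)]

# Drill layout: c1 REPEATED INT64 (D:1)
print(encode_list([1, 2, 3], max_def=1))  # [(0, 1, 1), (1, 1, 2), (1, 1, 3)]
```

The repetition levels come out the same, but a reader that expects a maximum definition level of 1 cannot interpret levels written against a maximum of 3, which is one way to see why the two files are mutually unreadable.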
As a result, Drill cannot read the Parquet nested data types generated
by Spark, or even Hive (see DRILL-1999
<https://issues.apache.org/jira/browse/DRILL-1999>).
The Spark community's answer to this question about the standard for
Parquet nested data types is at:
https://www.mail-archive.com/[email protected]/msg35663.html
What is Drill's standpoint on this topic? Should we reach an agreement
on the standard for nested data types within the Parquet community?
Any comments are welcome.
Thanks,
Hao