Hi Team,

Say I have a test.json file:

    {"c1":[1,2,3]}

I can create a parquet file like:

    var df = sqlContext.load("/tmp/test.json", "json")
    var df_c = df.repartition(1)
    df_c.select("*").save("/tmp/testjson_spark", "parquet")
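For reference, here is a rough sketch of the two layouts in Parquet's message-type notation. The first is what Spark's legacy list encoding writes (the "bag"/"array" group names come from that encoding); the second is the flat repeated-primitive layout being asked for. This is an illustration of the question, not a verified dump:

```
// What Spark writes (3-level nested list, hence D:3):
message spark_schema {
  optional group c1 (LIST) {
    repeated group bag {
      optional int64 array;
    }
  }
}

// Desired layout (repeated primitive, R:1 D:1):
message schema {
  repeated int64 c1;
}
```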
The output parquet file's schema looks like:

    c1:      OPTIONAL F:1
    .bag:    REPEATED F:1
    ..array: OPTIONAL INT64 R:1 D:3

Is there any way to avoid the ".bag" group? Instead, can we create the parquet file with column type "REPEATED INT64"? The expected schema is:

    c1: REPEATED INT64 R:1 D:1

Thanks!

--
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)