I also tried writing the parquet with spark.sql.parquet.writeLegacyFormat=true; same error. I'm running out of ideas on this.
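For concreteness, the legacy-format attempt looked roughly like this (a minimal sketch, not the actual job; df stands in for the real DataFrame):

// Sketch of the writeLegacyFormat attempt; df is a placeholder DataFrame
spark.conf.set("spark.sql.parquet.writeLegacyFormat", "true")
df.write
  .mode("overwrite")
  .parquet("/mnt/Drill/parqJsDf_0626v3")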
Just to show it's valid parquet:

$ parquet-tools cat /mnt/Drill/parqJsDf_0626v3/part-00000-9b54445b-7723-4e19-a145-e719e30da73f-c000.snappy.parquet | head -n 5
id = tag:search.twitter.com,2005:792893798200160257
objectType = activity
actor:
.objectType = person
.id = id:twitter.com:63936789

On 6/29/20, 9:35 AM, "Updike, Clark" <[email protected]> wrote:

I keep getting an NPE whenever I try to read parquet files generated by Spark using the 1.18 nightly (June 9).

$ ls /mnt/Drill/parqJsDf_0625/dt\=2016-10-31/ | head -n 2
part-00000-blah.snappy.parquet
part-00001-blah.snappy.parquet

No matter how I query it:

apache drill> select * from dfs.`mnt_drill`.`parqJsDf_0625` where dir0='dt\=2016-10-31' limit 2;
apache drill> select * from dfs.`mnt_drill`.`parqJsDf_0625` limit 2;

I get an exception related to the partitioning:

Caused By (java.lang.NullPointerException) null
  org.apache.drill.exec.store.parquet.ParquetGroupScanStatistics.checkForPartitionColumn():186
  org.apache.drill.exec.store.parquet.ParquetGroupScanStatistics.collect():119
  org.apache.drill.exec.store.parquet.ParquetGroupScanStatistics.<init>():59
  org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.getParquetGroupScanStatistics():293
  org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.getTableMetadata():249
  org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.initializeMetadata():203
  org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.init():170
  org.apache.drill.exec.metastore.store.parquet.ParquetTableMetadataProviderImpl.<init>():95
  org.apache.drill.exec.metastore.store.parquet.ParquetTableMetadataProviderImpl.<init>():48
  org.apache.drill.exec.metastore.store.parquet.ParquetTableMetadataProviderImpl$Builder.build():415
  org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>():150
  org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>():120
  org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan():202
  org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan():79
  org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan():226
  org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan():209
  org.apache.drill.exec.planner.logical.DrillTable.getGroupScan():119
  org.apache.drill.exec.planner.common.DrillScanRelBase.<init>():51
  org.apache.drill.exec.planner.logical.DrillScanRel.<init>():76
  org.apache.drill.exec.planner.logical.DrillScanRel.<init>():65
  org.apache.drill.exec.planner.logical.DrillScanRel.<init>():58
  org.apache.drill.exec.planner.logical.DrillScanRule.onMatch():38
  org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():208
  org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():633
  org.apache.calcite.tools.Programs$RuleSetProgram.run():327
  org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():405
  org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():351
  org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRawDrel():245
  org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel():308
  org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():173
  org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():283
  org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan():163
  org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan():128
  org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():93
  org.apache.drill.exec.work.foreman.Foreman.runSQL():593

The files are valid parquet: I can use parquet-tools on them just fine, and I can read the same files back in using Spark. I have tested with and without partitioning when writing from Spark, and with and without snappy compression. Always the same NPE. Any insight appreciated...
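For anyone trying to reproduce, the Spark writes described above were along these lines (a sketch only; df is a placeholder, and the non-partitioned output path is hypothetical, not the actual job):

// Partitioned write; produces the dt=2016-10-31 style directories.
// Compression was toggled between "snappy" and "none".
df.write
  .partitionBy("dt")
  .option("compression", "snappy")
  .parquet("/mnt/Drill/parqJsDf_0625")

// Non-partitioned variant (hypothetical path); Drill throws the same NPE
df.write
  .option("compression", "snappy")
  .parquet("/mnt/Drill/parqJsDf_0625_flat")

// Reading the same files back with Spark works fine
spark.read.parquet("/mnt/Drill/parqJsDf_0625").show(2)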
