Paul Rogers created DRILL-5075:
----------------------------------

Summary: Tests complain about Parquet metadata parse errors in Drill-created files
Key: DRILL-5075
URL: https://issues.apache.org/jira/browse/DRILL-5075
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.8.0
Reporter: Paul Rogers
Priority: Minor

The test {{TestParquetWriter.testAllScalarTypes}} appears to create a Parquet file, then read it back using the "new" Parquet reader. However, the test logs the exception shown below, though the test still succeeds.

Note that the exception does _not_ occur if we run the single test function by itself. It occurs only when the test runs as part of the entire test class, which suggests an interaction between tests.

When the test runs stand-alone, a different behavior occurs. Only after the test completes and the Drillbit shuts down does Parquet log a series of "ColumnChunkPageWriteStore: written" messages, followed by:

{code}
WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by is null or empty! See PARQUET-251 and PARQUET-297
{code}

Are we leaving a file open that is flushed only on shutdown?

Full error when the test runs in the entire suite:

{code}
PM WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr
org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr using format: (.+) version ((.*) )?\(build ?(.*)\)
	at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
	at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:66)
	at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:264)
	at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:568)
	at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:545)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:455)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:412)
	at org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v3(Metadata.java:381)
	at org.apache.drill.exec.store.parquet.Metadata.access$0(Metadata.java:379)
	at org.apache.drill.exec.store.parquet.Metadata$MetadataGatherer.runInner(Metadata.java:316)
	at org.apache.drill.exec.store.parquet.Metadata$MetadataGatherer.runInner(Metadata.java:1)
	at org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:56)
	at org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:122)
	at org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v3(Metadata.java:278)
	at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:257)
	at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:242)
	at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:118)
	at org.apache.drill.exec.store.parquet.ParquetGroupScan.init(ParquetGroupScan.java:733)
	at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:230)
	at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:190)
	at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:169)
	at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:1)
	at org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan(FileSystemPlugin.java:145)
	at org.apache.drill.exec.store.AbstractStoragePlugin.getPhysicalScan(AbstractStoragePlugin.java:103)
	at org.apache.drill.exec.planner.logical.DrillTable.getGroupScan(DrillTable.java:85)
	at org.apache.drill.exec.planner.logical.DrillPushProjIntoScan.onMatch(DrillPushProjIntoScan.java:65)
	at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
	at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808)
	at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303)
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:404)
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:343)
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:240)
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:290)
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:168)
	at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:123)
	at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:97)
	at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1008)
	at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:264)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
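
For anyone trying to reproduce the parse failure outside Drill, here is a minimal sketch of why the {{created_by}} value in the warning is rejected. The pattern is copied from the "using format:" text in the exception message. The class name {{CreatedByFormatCheck}}, the helper {{matchesFormat}}, and the well-formed sample string are hypothetical and for illustration only. That the parser requires a full-string match is an assumption, not something the log confirms.

```java
import java.util.regex.Pattern;

/** Sketch: check created_by strings against the format quoted in the exception message. */
final class CreatedByFormatCheck {
    // Pattern copied verbatim from the "using format:" text in the exception message.
    static final Pattern FORMAT =
        Pattern.compile("(.+) version ((.*) )?\\(build ?(.*)\\)");

    // Assumption: the whole string must match the format, hence matches() rather than find().
    static boolean matchesFormat(String createdBy) {
        return FORMAT.matcher(createdBy).matches();
    }

    public static void main(String[] args) {
        // The value reported in the warning has no " version " or "(build ...)" part.
        System.out.println(matchesFormat("parquet-mr")); // prints false

        // Hypothetical well-formed value, for comparison only.
        System.out.println(matchesFormat("parquet-mr version 1.8.1 (build abc123)")); // prints true
    }
}
```

If this reading is right, the file footer carries only the bare application name in {{created_by}}, so the statistics are ignored by design rather than because the file itself is corrupt.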