Paul Rogers created DRILL-5075:
----------------------------------

             Summary: Tests complain about Parquet metadata parse errors in Drill-created files
                 Key: DRILL-5075
                 URL: https://issues.apache.org/jira/browse/DRILL-5075
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.8.0
            Reporter: Paul Rogers
            Priority: Minor


The test {{TestParquetWriter.testAllScalarTypes}} seems to create a Parquet 
file, then read it using the "new" Parquet reader. However, the test logs the 
warning and exception shown in full below, even though the test still succeeds.

Note that the exception does _not_ occur if we run the single test function by 
itself. It only occurs when run as part of the entire test class, suggesting an 
interaction between tests.

When run stand-alone, a different behavior occurs: only once the test completes 
and the Drillbit shuts down does Parquet log a series of 
"ColumnChunkPageWriteStore: written" messages, followed by:

{code}
WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by is null or empty! See PARQUET-251 and PARQUET-297
{code}

Are we leaving a file open that is getting flushed only on shut-down?

Full error when the test runs in the entire suite:

{code}
PM WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr
org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr using format: (.+) version ((.*) )?\(build ?(.*)\)
        at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
        at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:66)
        at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:264)
        at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:568)
        at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:545)
        at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:455)
        at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:412)
        at org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v3(Metadata.java:381)
        at org.apache.drill.exec.store.parquet.Metadata.access$0(Metadata.java:379)
        at org.apache.drill.exec.store.parquet.Metadata$MetadataGatherer.runInner(Metadata.java:316)
        at org.apache.drill.exec.store.parquet.Metadata$MetadataGatherer.runInner(Metadata.java:1)
        at org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:56)
        at org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:122)
        at org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v3(Metadata.java:278)
        at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:257)
        at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:242)
        at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:118)
        at org.apache.drill.exec.store.parquet.ParquetGroupScan.init(ParquetGroupScan.java:733)
        at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:230)
        at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:190)
        at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:169)
        at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:1)
        at org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan(FileSystemPlugin.java:145)
        at org.apache.drill.exec.store.AbstractStoragePlugin.getPhysicalScan(AbstractStoragePlugin.java:103)
        at org.apache.drill.exec.planner.logical.DrillTable.getGroupScan(DrillTable.java:85)
        at org.apache.drill.exec.planner.logical.DrillPushProjIntoScan.onMatch(DrillPushProjIntoScan.java:65)
        at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
        at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808)
        at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303)
        at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:404)
        at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:343)
        at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:240)
        at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:290)
        at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:168)
        at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:123)
        at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:97)
        at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1008)
        at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:264)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}
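
For reference, the parse failure itself can likely be reproduced outside Drill. The sketch below copies the format pattern verbatim from the exception message and checks it against the bare {{parquet-mr}} value from the warning. The class name {{CreatedByParseDemo}}, the helper {{parses}}, and the sample "well-formed" string are hypothetical and for illustration only. Whether {{VersionParser}} applies the pattern with a full-string match is an assumption, not something confirmed from the Parquet source.

```java
import java.util.regex.Pattern;

public class CreatedByParseDemo {
    // Pattern copied verbatim from the VersionParseException message above.
    static final Pattern FORMAT =
        Pattern.compile("(.+) version ((.*) )?\\(build ?(.*)\\)");

    // Assumption: the parser requires the whole created_by string to match.
    static boolean parses(String createdBy) {
        return FORMAT.matcher(createdBy).matches();
    }

    public static void main(String[] args) {
        // Value reported in the warning: no " version " or "(build ...)" part.
        System.out.println(parses("parquet-mr"));

        // Hypothetical well-formed value, made up for this example.
        System.out.println(parses("parquet-mr version 1.8.1 (build abc123)"));
    }
}
```

If this assumption holds, a created_by value of just {{parquet-mr}} can never satisfy the format, which would point at how the file footer was written rather than at the reader.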



