John Humphreys created DRILL-6194:
-------------------------------------

             Summary: Allow un-caching of parquet metadata or stop queries from failing when metadata is old.
                 Key: DRILL-6194
                 URL: https://issues.apache.org/jira/browse/DRILL-6194
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Parquet
    Affects Versions: 1.10.0
            Reporter: John Humphreys
Let's say you have files stored in the standard hierarchical way and the data is held in parquet:

* year/
** month/
*** day/
**** filev2.parquet

If you cache the metadata under year/ (or one of the other levels) and then replace filev2.parquet with filev3.parquet, queries will fail with errors about filev2.parquet not being present. I'm specifically seeing this when using maxdir() and dir0/1/2 for year/month/day, but I suspect it's a general issue.

Queries using cached metadata should not fail if the metadata is outdated; they should simply choose not to use it. Alternatively, there should be an uncache operation for the metadata so people can decide to stop using it. It's not always efficient to run a metadata refresh before every single query you do, and it's difficult to run one from every program that touches HDFS files immediately after it touches them.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
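For reference, a sketch of the workflow the report describes, using Drill's REFRESH TABLE METADATA statement and the maxdir() function; the `dfs` workspace name and `/data/year` path here are hypothetical placeholders for the layout above:

```sql
-- Hypothetical path: rebuild the metadata cache after files under it change
REFRESH TABLE METADATA dfs.`/data/year`;

-- Query the most recent partition via maxdir(); per this report, if the
-- cached metadata still references the replaced filev2.parquet, the query
-- fails instead of ignoring the stale cache
SELECT *
FROM dfs.`/data/year`
WHERE dir0 = MAXDIR('dfs', '/data/year');
```

Running the refresh before every query works around the failure, but as noted above that is not always practical for programs that modify HDFS files outside of Drill.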