John Humphreys created DRILL-6194:
-------------------------------------

             Summary: Allow un-caching of parquet metadata or stop queries from failing when metadata is old.
                 Key: DRILL-6194
                 URL: https://issues.apache.org/jira/browse/DRILL-6194
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Parquet
    Affects Versions: 1.10.0
            Reporter: John Humphreys


Let's say you have files stored in the standard year/month/day hierarchy, with the data held in Parquet:
 * year/
 ** month/
 *** day/
 **** filev2.parquet

If you cache the metadata under year/ (or one of the other levels) and then replace filev2.parquet with filev3.parquet, queries will start failing with errors about filev2.parquet no longer being present.
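
A rough reproduction sketch in Drill SQL, assuming a hypothetical dfs workspace and a table root of /data/events laid out as above (the paths and HDFS commands are illustrative, not taken from an actual cluster):

{code:sql}
-- Cache the Parquet metadata for the hierarchy (or any sub-directory, e.g. a single year):
REFRESH TABLE METADATA dfs.`/data/events`;

-- Replace a leaf file outside of Drill, for example:
--   hdfs dfs -rm  /data/events/2018/02/21/filev2.parquet
--   hdfs dfs -put filev3.parquet /data/events/2018/02/21/

-- Without another REFRESH, a query that reads the cached metadata fails with an
-- error about filev2.parquet no longer being present:
SELECT COUNT(*) FROM dfs.`/data/events`;
{code}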

I'm specifically seeing this when using maxdir() and dir0/dir1/dir2 for year/month/day, but I suspect it's a general issue.
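
For reference, the failing queries look roughly like this (same illustrative paths as above); dir0/dir1/dir2 map to year/month/day, and maxdir() selects the newest top-level directory:

{code:sql}
SELECT *
FROM dfs.`/data/events`
WHERE dir0 = MAXDIR('dfs', '/data/events')  -- latest year directory
  AND dir1 = '02'                           -- month
  AND dir2 = '21';                          -- day
{code}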

Queries that use cached metadata should not fail when the metadata is outdated; they should simply fall back to not using it. Otherwise, there should be an un-cache operation for the metadata so people can decide to stop using it.

It's not always efficient to run a metadata refresh before every single query, and it's difficult to run one from every program that touches the HDFS files immediately after it does so.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
