Impala version: impalad version 2.12.0
OS: CentOS 6.10
Table size: 88 TB
Partitions: 7K
Type: Parquet, file size compacted to 256 MB

We ingest data into the table's partitions every minute and run REFRESH on the table to load the new data. A separate compaction process runs every hour and merges the smaller files into larger ones.

The setup worked fine for months, but recently we have been running into a strange issue: a few nodes behave inconsistently. Seemingly at random, some nodes end up with inconsistent metadata, i.e. even though the REFRESH command ran successfully, those nodes still don't have the correct files and keep referring to older files for the affected partitions.

We tried INVALIDATE METADATA (followed by 'describe table' to fix the metadata), but it didn't help. Even re-running REFRESH doesn't help all the time.

We need some help/pointers to figure out the issue.

* Is there a way to check whether Impala nodes have stale metadata?
* How can we fix the metadata on an individual node? Is there a command for that?
* Has anyone faced a similar issue? Can you share your experience and the fix?

Sunil Parmar
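
For the first question, here is a minimal sketch of one possible per-node check, not a confirmed fix. It assumes that `SHOW FILES IN ... PARTITION (...)` run against a given impalad reflects that node's cached metadata, that the `impyla` client is installed, and that each impalad accepts unauthenticated connections on the default HiveServer2 port 21050. The function names, hostnames, table name, and partition spec are hypothetical. The listing reported by most nodes is used only as a reference; it is not necessarily the correct one and should still be compared against the actual HDFS directory.

```python
from collections import Counter


def fetch_file_paths(host, table, partition_spec, port=21050):
    """Return the set of file paths one impalad reports for a partition.

    Assumption: connecting directly to an impalad shows that node's view.
    """
    # Imported lazily so the comparison helper below works without impyla.
    from impala.dbapi import connect

    conn = connect(host=host, port=port)
    try:
        cur = conn.cursor()
        cur.execute(
            "SHOW FILES IN {} PARTITION ({})".format(table, partition_spec)
        )
        # The first column of SHOW FILES output is the file path.
        return {row[0] for row in cur.fetchall()}
    finally:
        conn.close()


def find_divergent_nodes(files_by_node):
    """Compare per-node file listings.

    Takes {host: set_of_paths} and returns {host: (missing, extra)} for every
    host whose listing differs from the most common listing.
    """
    if not files_by_node:
        return {}
    counts = Counter(frozenset(paths) for paths in files_by_node.values())
    majority = counts.most_common(1)[0][0]
    divergent = {}
    for host, paths in files_by_node.items():
        paths = frozenset(paths)
        if paths != majority:
            divergent[host] = (
                sorted(majority - paths),  # files this node does not know about
                sorted(paths - majority),  # files only this node still lists
            )
    return divergent
```

A run might collect listings with something like `{h: fetch_file_paths(h, "db.events", "dt='2018-10-01'") for h in hosts}` (placeholder names) and pass the result to `find_divergent_nodes`; any host in the returned dictionary would be a candidate for a targeted REFRESH.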
