In my case the partitions contain a lot of small files, and every 2 hours more than 20,000 files are added. Today, running recover partitions on each table takes more than 3 minutes, so I'm trying to work out whether using refresh on specific partitions can reduce this running time.
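To make the per-partition option concrete, here is a minimal sketch of how the statements could be generated for each 2-hour run. It assumes the year, month and day partition columns are integers; the table name `analytics.events` and the helper names are hypothetical, and submitting the statements to Impala is left out.

```python
from datetime import datetime, timedelta


def partitions_touched(now, lookback_hours):
    """Return the (year, month, day) partitions that overlap the look-back window."""
    start = now - timedelta(hours=lookback_hours)
    days = []
    current = start.date()
    while current <= now.date():
        days.append((current.year, current.month, current.day))
        current += timedelta(days=1)
    return days


def refresh_statements(table, now, lookback_hours=2):
    """Build one per-partition REFRESH statement for each day in the window.

    Assumes integer partition columns named year, month and day.
    """
    return [
        f"REFRESH {table} PARTITION (year={y}, month={m}, day={d})"
        for (y, m, d) in partitions_touched(now, lookback_hours)
    ]


def recover_statement(table):
    """Build the statement that picks up partition directories created outside Impala."""
    return f"ALTER TABLE {table} RECOVER PARTITIONS"


if __name__ == "__main__":
    # Hypothetical table name, used only for illustration.
    table = "analytics.events"
    for stmt in refresh_statements(table, datetime.now()):
        print(stmt)
    print(recover_statement(table))
```

A run shortly after midnight covers two day partitions, so the look-back window yields two REFRESH statements; any other run yields one. Whether this is actually faster than recover partitions for 20,000 new files is something I would still need to measure.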
On Wed, Feb 6, 2019 at 10:51 AM Jeszy <jes...@gmail.com> wrote:
> Recover will recognize the newly added files in a newly added
> partition. It doesn't touch already existing partitions.
> How much you gain by using recover depends on the number of files and
> partitions; in the vast majority of cases I've seen, it's not worth the
> added complexity of having to use two commands instead of one.
> Per-partition refresh is usually good enough.
>
> On Wed, 6 Feb 2019 at 09:39, Fawze Abujaber <fawz...@gmail.com> wrote:
> >
> > Thanks Jeszy for your quick response,
> >
> > Does this mean it is best to run alter table recover partitions once a
> > day, and to run refresh for all the other runs on the same day?
> >
> > Do both provide the same result? According to the documentation,
> > recover will recognize the newly added files under the partition.
> >
> >
> > On Wed, Feb 6, 2019 at 10:35 AM Jeszy <jes...@gmail.com> wrote:
> >>
> >> Hey Fawze,
> >>
> >> RECOVER PARTITIONS is cheaper to execute, but it works only once for
> >> each new partition. If you keep adding files to existing partitions,
> >> per-partition REFRESH is the best bet.
> >>
> >> HTH
> >>
> >> On Wed, 6 Feb 2019 at 09:27, Fawze Abujaber <fawz...@gmail.com> wrote:
> >> >
> >> > Hi Community,
> >> >
> >> > I'm continually working to improve our Impala usage and resource
> >> > consumption, and I would like to decide which to use, the alter
> >> > table recover partitions statement or the refresh statement, in
> >> > terms of running time and resources, especially since refresh can
> >> > be run on specific partitions. I have a Spark job that adds files
> >> > to HDFS, partitioned by year, month and day.
> >> >
> >> > To automatically detect new partition directories added through Hive
> >> > or HDFS operations:
> >> >
> >> > In CDH 5.5 / Impala 2.3 and higher, the RECOVER PARTITIONS clause
> >> > scans a partitioned table to detect if any new partition directories
> >> > were added outside of Impala, such as by Hive ALTER TABLE statements
> >> > or by hdfs dfs or hadoop fs commands. The RECOVER PARTITIONS clause
> >> > automatically recognizes any data files present in these new
> >> > directories, the same as the REFRESH statement does.
> >> >
> >> >
> >> > --
> >> > Take Care
> >> > Fawze Abujaber
> >
> >
> >
> > --
> > Take Care
> > Fawze Abujaber
>

--
Take Care
Fawze Abujaber