In my case the partition has a lot of small files, and more than 20,000
files are added every 2 hours. Today, running RECOVER PARTITIONS on each
table takes more than 3 minutes, so I'm wondering whether a per-partition
REFRESH can reduce this running time.
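
For reference, here is a minimal sketch of the per-partition approach
discussed in this thread. It only builds the statement strings and does not
connect to Impala. The table name `events`, the helper names, and the
assumption that year, month and day are integer partition columns are
hypothetical, not taken from the actual job.

```python
from datetime import date, datetime, timedelta
from typing import List


def partitions_between(start: datetime, end: datetime) -> List[date]:
    """Return every calendar day touched by the window [start, end]."""
    days = []
    current = start.date()
    while current <= end.date():
        days.append(current)
        current += timedelta(days=1)
    return days


def refresh_statements(table: str, start: datetime, end: datetime) -> List[str]:
    """Build one per-partition REFRESH statement for each day in the window.

    Assumes the table is partitioned by integer columns year, month, day.
    """
    return [
        f"REFRESH {table} PARTITION (year={d.year}, month={d.month}, day={d.day})"
        for d in partitions_between(start, end)
    ]


def recover_statement(table: str) -> str:
    """Build the statement that picks up newly created partition directories."""
    return f"ALTER TABLE {table} RECOVER PARTITIONS"


if __name__ == "__main__":
    # Hypothetical 2-hour load window that crosses midnight.
    window_start = datetime(2019, 2, 5, 23, 0)
    window_end = datetime(2019, 2, 6, 1, 0)
    print(recover_statement("events"))
    for stmt in refresh_statements("events", window_start, window_end):
        print(stmt)
```

Issuing RECOVER PARTITIONS for new day directories and a per-partition
REFRESH for days that keep receiving files follows the split described in
the replies below. Whether it is actually faster for batches of 20,000
files would need to be measured.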

On Wed, Feb 6, 2019 at 10:51 AM Jeszy <jes...@gmail.com> wrote:

> Recover will recognize the newly added files in a newly added
> partition. It doesn't touch already existing partitions.
> How much you gain by using recover depends on the number of files and
> partitions. In the vast majority of cases I've seen, it's not worth the
> added complexity of having to use two commands instead of one.
> Per-partition refresh is usually good enough.
>
> On Wed, 6 Feb 2019 at 09:39, Fawze Abujaber <fawz...@gmail.com> wrote:
> >
> > Thanks Jeszy for your quick response,
> >
> > Does this mean the best approach is to run ALTER TABLE RECOVER
> > PARTITIONS once a day, and to run REFRESH for all other updates during
> > that day?
> >
> > Do both provide the same result? According to the documentation,
> > RECOVER PARTITIONS will recognize the newly added files under the
> > partition.
> >
> >
> > On Wed, Feb 6, 2019 at 10:35 AM Jeszy <jes...@gmail.com> wrote:
> >>
> >> Hey Fawze,
> >>
> >> RECOVER PARTITIONS is cheaper to execute, but it works only once for
> >> each new partition. If you keep adding files to existing partitions,
> >> per-partition REFRESH is the best bet.
> >>
> >> HTH
> >>
> >> On Wed, 6 Feb 2019 at 09:27, Fawze Abujaber <fawz...@gmail.com> wrote:
> >> >
> >> > Hi Community,
> >> >
> >> > I'm continually working to improve our Impala usage and resource
> >> > consumption, and I'd like to decide between ALTER TABLE RECOVER
> >> > PARTITIONS and the REFRESH statement in terms of running time and
> >> > resources, especially since REFRESH can be run on specific
> >> > partitions. I have a Spark job that adds files to HDFS, partitioned
> >> > by year, month, and day.
> >> >
> >> > To automatically detect new partition directories added through Hive
> or HDFS operations:
> >> >
> >> > In CDH 5.5 / Impala 2.3 and higher, the RECOVER PARTITIONS clause
> scans a partitioned table to detect if any new partition directories were
> added outside of Impala, such as by Hive ALTER TABLE statements or by hdfs
> dfs or hadoop fs commands. The RECOVER PARTITIONS clause automatically
> recognizes any data files present in these new directories, the same as the
> REFRESH statement does.
> >> >
> >> >
> >> > --
> >> > Take Care
> >> > Fawze Abujaber
> >
> >
> >
> > --
> > Take Care
> > Fawze Abujaber
>


-- 
Take Care
Fawze Abujaber
