Hi Manish!

Why do you need a metadata backup? Can't you just store all the table CREATE
statements in an init file? If you care about partitions that have been
created dynamically, you can restore them from the data with ALTER TABLE ...
RECOVER PARTITIONS (if you are using Amazon EMR) or the analogous repair
command in a stock Hive distribution (MSCK REPAIR TABLE, I believe).
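From a JDBC client it would look roughly like this (a rough sketch only; the
driver class, connection URL, and the table name page_views are placeholders
that depend on your Hive version and setup):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class RecoverPartitions {
      public static void main(String[] args) throws Exception {
        // Classic Hive JDBC driver; adjust the class name and URL for your distro/version.
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
            "jdbc:hive://localhost:10000/default", "", "");
        Statement stmt = conn.createStatement();

        // On Amazon EMR's Hive:
        // stmt.execute("ALTER TABLE page_views RECOVER PARTITIONS");

        // On stock Hive, the equivalent metastore check/repair command:
        stmt.execute("MSCK REPAIR TABLE page_views");

        stmt.close();
        conn.close();
      }
    }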

Ruslan


On Mon, Dec 10, 2012 at 12:48 PM, Manish Malhotra <
manish.hadoop.w...@gmail.com> wrote:

> Sending again, as I got no response.
>
> Can somebody from the Hive dev group please review my approach and reply?
>
> Cheers,
> Manish
>
>
> On Thu, Dec 6, 2012 at 11:17 PM, Manish Malhotra <
> manish.hadoop.w...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm designing and building a backup-and-restore tool for Hive data, for
>> disaster recovery scenarios.
>>
>> I'm trying to understand the locking behavior of Hive, which currently
>> supports ZooKeeper for locking.
>>
>> My thought process is like this (early design):
>>
>> 1. Backing up the metadata of Hive.
>> 2. Backing up the data of Hive tables to S3, HDFS, or NFS.
>> 3. Restoring table(s):
>>     a. Only data
>>     b. Schema and data
>>
>> So, to achieve the 1st task, this is the flow I'm thinking of:
>>
>> a. Check whether there is any exclusive lock on the table whose
>> metadata needs to be backed up.
>>          if YES: don't do anything; wait and retry for a configured number
>> of attempts / frequency.
>>          if NO: get the metadata of the table and generate the DDL
>> statements for Hive, including table / partitions etc.
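>>
>> For the metadata part, I'm thinking of reading the table and partition
>> definitions straight from the metastore, roughly like this (a sketch only;
>> the database/table names are examples, and turning the Thrift objects into
>> DDL text is left out):
>>
>>     import java.util.List;
>>     import org.apache.hadoop.hive.conf.HiveConf;
>>     import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
>>     import org.apache.hadoop.hive.metastore.api.Partition;
>>     import org.apache.hadoop.hive.metastore.api.Table;
>>
>>     public class DumpTableMetadata {
>>       public static void main(String[] args) throws Exception {
>>         // Picks up hive-site.xml (metastore URIs etc.) from the classpath.
>>         HiveConf conf = new HiveConf();
>>         HiveMetaStoreClient msc = new HiveMetaStoreClient(conf);
>>
>>         // Fetch table and partition definitions; from these the backup tool
>>         // can regenerate CREATE TABLE / ADD PARTITION statements.
>>         Table tbl = msc.getTable("default", "page_views");
>>         List<Partition> parts = msc.listPartitions("default", "page_views", (short) -1);
>>         System.out.println(tbl);   // Thrift toString(), good enough for a first cut
>>         System.out.println(parts.size() + " partitions");
>>
>>         msc.close();
>>       }
>>     }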
>>
>> For the 2nd task:
>>
>> a. Check whether the table has any exclusive lock.
>>         if NOT: take a shared lock and start the copy; once done, release
>> the shared lock.
>>         if YES: wait and retry.
>>
>> For the 3rd task, restoring (a rough sketch covering both cases follows
>> below):
>>
>> a. Only data: check if there is any lock on the table.
>>                      if NO: take the exclusive lock, insert the data
>> into the table, release the lock.
>>                      if YES: wait and retry.
>>
>> b. Schema and data:
>>
>>                 Check if there is any lock on the table/partition.
>>                       if NO: drop and recreate the table/partitions.
>>                       if YES: wait and retry.
>>                  Once the schema is created:
>>                       take the exclusive lock, insert the data, release the lock.
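>>
>> The restore sketch I have in mind (the table schema, partition, and paths
>> are made-up examples; the real DDL would come from the metadata backup, and
>> I still need to verify how an explicit LOCK TABLE interacts with the
>> implicit locks Hive may take for the LOAD itself):
>>
>>     import java.sql.Connection;
>>     import java.sql.DriverManager;
>>     import java.sql.Statement;
>>
>>     public class RestoreTable {
>>       public static void main(String[] args) throws Exception {
>>         Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
>>         Connection conn = DriverManager.getConnection(
>>             "jdbc:hive://localhost:10000/default", "", "");
>>         Statement stmt = conn.createStatement();
>>
>>         // (b) Schema and data: recreate the table from the DDL captured during backup.
>>         stmt.execute("DROP TABLE IF EXISTS page_views");
>>         stmt.execute("CREATE TABLE page_views (user_id STRING, url STRING) "
>>             + "PARTITIONED BY (dt STRING)");
>>
>>         // (a) Data only: take the exclusive lock, load the backed-up files, release it.
>>         stmt.execute("LOCK TABLE page_views EXCLUSIVE");
>>         try {
>>           stmt.execute("LOAD DATA INPATH '/backup/hive/page_views/dt=2012-12-01' "
>>               + "INTO TABLE page_views PARTITION (dt='2012-12-01')");
>>         } finally {
>>           stmt.execute("UNLOCK TABLE page_views");
>>           stmt.close();
>>           conn.close();
>>         }
>>       }
>>     }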
>>
>>
>> Now I'm going to run this kind of job from my scheduler / WF engine.
>> I need input on the following questions:
>>
>> a. Does the overall approach look good?
>> b. How can I take and release the different locks explicitly using the
>> Hive API?
>> ref: https://cwiki.apache.org/confluence/display/Hive/Locking
>>
>> If I understood correctly, as per this page Hive still doesn't support
>> explicit locking at the API level.
>> Is there any plan or patch to get this done?
>>
>> I saw some classes like *ZooKeeperHiveLock* etc., but I need to dig further
>> to see if I can use these classes for the locking features.
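>>
>> So far the closest I can get from a client is the statement level that the
>> wiki page documents (LOCK TABLE / UNLOCK TABLE / SHOW LOCKS), e.g. for the
>> "is there an exclusive lock?" check (a sketch only; it assumes
>> hive.support.concurrency=true and a ZooKeeper quorum are configured, the
>> table name is an example, and the SHOW LOCKS output columns vary by version,
>> hence scanning all columns):
>>
>>     import java.sql.Connection;
>>     import java.sql.DriverManager;
>>     import java.sql.ResultSet;
>>     import java.sql.ResultSetMetaData;
>>     import java.sql.Statement;
>>
>>     public class CheckLocks {
>>       public static void main(String[] args) throws Exception {
>>         Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
>>         Connection conn = DriverManager.getConnection(
>>             "jdbc:hive://localhost:10000/default", "", "");
>>         Statement stmt = conn.createStatement();
>>
>>         // List locks currently held on the table; an EXCLUSIVE entry means
>>         // the backup job should back off and retry later.
>>         ResultSet rs = stmt.executeQuery("SHOW LOCKS page_views");
>>         ResultSetMetaData md = rs.getMetaData();
>>         boolean exclusiveLockHeld = false;
>>         while (rs.next()) {
>>           for (int i = 1; i <= md.getColumnCount(); i++) {
>>             String v = rs.getString(i);
>>             if (v != null && v.contains("EXCLUSIVE")) {
>>               exclusiveLockHeld = true;
>>             }
>>           }
>>         }
>>         System.out.println("exclusive lock held: " + exclusiveLockHeld);
>>
>>         rs.close();
>>         stmt.close();
>>         conn.close();
>>       }
>>     }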
>>
>> Thanks for your time and effort.
>>
>> Regards,
>> Manish
>>
>>
>>
>
