Hi Eugene,

HMS depends on DataNucleus for the identity value generation for the HMS
tables. It is generated by DataNucleus when an object is made persistent.
DataNucleus value generator will generate values uniquely across different
JVMs. As Zoltan said, DataNucleus tracks with the SEQUENCE_TABLE for each
model class id allocation. We don't have id generation code directly in
metastore code. In recent times, to add the dynamic partitions using direct
sql to db, there is a method getDataStoreId(Class<?> modelClass) in
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DirectSqlInsertPart.java.
It fetches the next available id to use from DataNucleus directly. It is
used for classes using datastore identity type.
Not sure about how you are going to replicate into the target cluster. You
might have to explore DataNucleus value generation further in case this is
what you are looking for.

Regards,
Venu

On Tue, Oct 24, 2023 at 6:00 PM Zoltán Rátkai <zrat...@cloudera.com> wrote:

> Hi Eugene,
>
> the TBL_ID in TBLS table is handled by Datanucleus, so AUTO_INCREMENT
> won't help, since the TBL_ID is not defined as AUTO_INCREMENT.
>
> Datanucleus uses SEQUENCE_TABLE to store the actual value for primary
> keys. In this table this two rows is what you need to modify:
>
> org.apache.hadoop.hive.metastore.model.MDatabase
> org.apache.hadoop.hive.metastore.model.MTable
>
> e.g:
> update SEQUENCE_TABLE set NEXT_VAL = 10000  where
> SEQUENCE_NAME='org.apache.hadoop.hive.metastore.model.MTable';
> and do it for org.apache.hadoop.hive.metastore.model.Database as well.
>
> After that if you create a table the TBL_ID will be used from this value.
> Datanucleus uses caching (default 10) so maybe the next tables will still
> use the old value. Try to create 10 simple table like this:
>
> create table test1 (i int);
> ...
> create table test10 (i int);
> and then drop them and check the TBL_ID.
>
> *Before doing this I recommend to create a backup from the Metastore DB!!*
>
> Also check this:
>
> https://community.cloudera.com/t5/Support-Questions/How-to-migrate-Hive-Table-From-one-cluster-to-another/m-p/235145
>
> Regards,
>
> Zoltan Ratkai
>
> On Sun, Oct 22, 2023 at 5:39 PM Eugene Miretsky <eug...@badal.io> wrote:
>
>> Hey!
>>
>> Looking for a way to control the ids (DB_ID and TABLE_ID) of newly
>> created  databases and tables.
>>
>> We have a somewhat complicated use case where we replicate the metastore
>> (and data) from a source Hive cluster to a target cluster. However new
>> tables can be added on both source and target. We need a way to avoid
>> unique Id collision. One way would be to make sure all databases/tables
>> created in the target Hive start from a higher Id.
>>
>> We have tried to set AUTO_INCREAMENT='10000' on a metastore MySQL db, but
>> it doesn't work. This makes us think the Id is generated by the Metastore
>> code itself, but we cannot find the right place in the code, or if it is
>> possible to control the logic.
>>
>> Any advice would be appreciated.
>>
>> Cheers,
>>
>

Reply via email to