Hello,

For a partitioned table (with an int-type partition column) defined as follows:

 CREATE EXTERNAL TABLE `poc.titi`(
   `n` string)
 PARTITIONED BY (
   `id` int)
 ROW FORMAT SERDE
   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
 STORED AS INPUTFORMAT
   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
 OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
 LOCATION
   'hdfs://hdp/warehouse/tablespace/external/hive/poc.db/titi'
 TBLPROPERTIES (
   'TRANSLATED_TO_EXTERNAL'='TRUE',
   'bucketing_version'='2',
   'discover.partitions'='true',
   'external.table.purge'='TRUE'
 )

Note that the table has both 'external.table.purge'='TRUE' and
'discover.partitions'='true' set.

I insert some data into it:

 insert into poc.titi values ('j','1'),('k','4');

Then I delete only one partition, directly on HDFS:

 hdfs dfs -rm -r hdfs://hdp/warehouse/tablespace/external/hive/poc.db/titi/id=4

Then I wait about 5 to 10 minutes for the Hive metastore to run its
automatic partition synchronization
(metastore.partition.management.task.frequency).
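
To pin down when the partitions disappear, the metastore's view can be
checked once before the HDFS delete and again after the sync interval.
This is only a minimal sketch using the table above, not output I have
captured:

```sql
-- Partitions currently registered in the metastore for the table.
-- Run once before the HDFS delete and once after the sync has run.
SHOW PARTITIONS poc.titi;

-- Row count per partition, to see whether id=1 is still readable.
SELECT id, COUNT(*) AS row_count
FROM poc.titi
GROUP BY id;
```

If id=1 is missing from the second run as well, the sync has dropped more
than the partition that was deleted by hand.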

Then I run a select on the table and, to my surprise, there is no data left:
+---------+----------+
| titi.n  | titi.id  |
+---------+----------+
+---------+----------+

The data is gone from both Hive and HDFS, as if the metastore sync had
destroyed all of it.

I have only seen the problem when the table has the property
'external.table.purge'='TRUE' and the partition column is of type int.
With a partition column of type string there is no problem: only the
partition I deleted is removed.
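
For comparison, a control table with a string partition can be set up as
in the sketch below. The table name poc.titi_str is hypothetical, the
storage clause is shortened to STORED AS ORC, and the location is left to
the warehouse default, so this is an assumption-laden illustration rather
than the exact table I used:

```sql
-- Hypothetical control table: same columns and properties as poc.titi,
-- but the partition column is a string instead of an int.
CREATE EXTERNAL TABLE poc.titi_str (
  n string)
PARTITIONED BY (
  id string)
STORED AS ORC
TBLPROPERTIES (
  'discover.partitions'='true',
  'external.table.purge'='TRUE'
);

INSERT INTO poc.titi_str VALUES ('j','1'),('k','4');

-- After removing the id=4 directory of this table on HDFS and waiting
-- for the metastore sync, list what is still registered.
SHOW PARTITIONS poc.titi_str;
```

With the string partition, only id=4 disappears after the sync, which is
the behaviour I expected for the int case too.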

I'm using Hive 3.1.0.

Do you have any idea what could be causing this?

Best regards,

Jeremy
