Hi
We are using HDP 2.6.3 with the Atlas version that ships with that
release. We have a problem with Atlas lagging further and further behind the
messages in the ATLAS_HOOK Kafka topic. I can understand why, as we ingest a
large number of tables into the cluster every day. We create roughly 165,000
entries in the ATLAS_HOOK topic per day, primarily from Sqoop imports and
Hive create/drop table operations. The problem is that Atlas only processes
around 35,000-40,000 entries per day, so the backlog keeps building up.
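A quick back-of-the-envelope sketch of how fast that backlog grows. It assumes
constant daily rates, an empty topic at day 0, and ignores Kafka topic
retention (which would eventually delete unconsumed messages); these are
assumptions, not measurements.

```python
# Back-of-the-envelope backlog estimate for the ATLAS_HOOK topic.
# Assumptions (not measurements): constant daily rates, an empty topic at
# day 0, and no messages removed by Kafka topic retention.

PRODUCED_PER_DAY = 165_000                    # entries written by Sqoop/Hive hooks
CONSUMED_LOW, CONSUMED_HIGH = 35_000, 40_000  # entries Atlas processes per day


def backlog_after(days: int, produced: int, consumed: int) -> int:
    """Unprocessed entries left in the topic after `days` days."""
    # If Atlas keeps up (consumed >= produced), nothing accumulates.
    return max(0, (produced - consumed) * days)


if __name__ == "__main__":
    low = backlog_after(1, PRODUCED_PER_DAY, CONSUMED_HIGH)
    high = backlog_after(1, PRODUCED_PER_DAY, CONSUMED_LOW)
    print(f"backlog grows by {low:,}-{high:,} entries per day")
    print(
        f"after 30 days: {backlog_after(30, PRODUCED_PER_DAY, CONSUMED_HIGH):,}-"
        f"{backlog_after(30, PRODUCED_PER_DAY, CONSUMED_LOW):,} entries"
    )
```

With these figures the backlog grows by 125,000-130,000 entries per day, or
3,750,000-3,900,000 entries after 30 days.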
Many of the tables we import are quite wide, so messages in the Kafka topic
are commonly 600-800 KB each.
I have verified that I can consume the messages in the topic with a normal
Kafka client, so the problem is not with Kafka. I have also cleared the two
HBase tables and emptied the Kafka topic to start over from the beginning,
but the problem remains.
I would like some help with what performance tuning I can do to make sure
that Atlas can consume at least 200,000 entries from the ATLAS_HOOK topic per
day (we are planning to add a lot more data sources over the next couple of
months). What options do I have to make this happen?
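For reference, a rough sketch of what that target means as a sustained rate.
It assumes entries arrive evenly over 24 hours (real ingestion is probably
bursty, so peak rates would be higher) and treats 1 KB as 1,000 bytes; the
function names are just illustrative.

```python
# Throughput the ATLAS_HOOK consumer would need for the planned volume.
# Assumptions: entries arrive evenly over 24 hours (real ingestion is likely
# bursty, so peaks will be higher) and 1 KB = 1,000 bytes.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(entries_per_day: float) -> float:
    """Average entries per second for a given daily volume."""
    return entries_per_day / SECONDS_PER_DAY


def speedup_needed(target_per_day: float, current_per_day: float) -> float:
    """How many times faster Atlas must consume to hit the target."""
    return target_per_day / current_per_day


def payload_mb_per_second(entries_per_day: float, msg_kb: float) -> float:
    """Average payload rate in MB/s for messages of `msg_kb` KB each."""
    return per_second(entries_per_day) * msg_kb / 1000


if __name__ == "__main__":
    target = 200_000
    print(f"target rate: {per_second(target):.2f} entries/s")
    print(f"current rate: {per_second(35_000):.2f}-{per_second(40_000):.2f} entries/s")
    print(f"speed-up needed: {speedup_needed(target, 40_000):.1f}x-"
          f"{speedup_needed(target, 35_000):.1f}x")
    print(f"payload: {payload_mb_per_second(target, 600):.1f}-"
          f"{payload_mb_per_second(target, 800):.1f} MB/s")
```

This works out to about 2.31 entries/s (roughly 1.4-1.9 MB/s of payload) versus
the current 0.41-0.46 entries/s, i.e. a 5.0x-5.7x speed-up.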
//Berry