Berry, 35,000-40,000 messages/day seems too low. For reference: Atlas in my local VM (8 GB RAM, 2 GB for the Atlas server) processes more than 10,000 messages/hour (roughly 240,000/day), and those messages include tables with about 1,000 columns. A production environment should see considerably higher throughput.
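
To quantify how far behind Atlas is, you can check the consumer lag on the ATLAS_HOOK topic. A minimal sketch, assuming HDP's usual Kafka install path and broker port, and the default Atlas consumer group id 'atlas' (check atlas-application.properties if your group id differs):

  # Show current offset, log-end offset and lag per partition for the Atlas consumer group
  /usr/hdp/current/kafka-broker/bin/kafka-consumer-groups.sh \
    --bootstrap-server <broker-host>:6667 \
    --describe --group atlas

The LAG column tells you how many ATLAS_HOOK messages are still waiting to be processed.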
The fix in ATLAS-2169 helps improve the performance of notification processing, especially delete notifications. You might want to try this patch in your deployment.

Hope this helps.

Madhan

From: Österlund Berry <berry.osterlund@ç>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, February 5, 2018 at 11:24 PM
To: "[email protected]" <[email protected]>
Subject: Falling behind the Kafka workload

Hi,

We are using HDP 2.6.3 with the Atlas version that ships with that release. We have a problem with lagging and falling behind the messages in the ATLAS_HOOK Kafka topic. That is understandable, as we ingest a large number of tables into the cluster every day: we create roughly 165,000 entries in the ATLAS_HOOK topic per day, primarily from Sqoop and create/drop tables in Hive. The problem is that Atlas only processes around 35,000-40,000 entries per day, so the backlog keeps building up. Many of the tables we import are quite wide, so it is pretty common for the messages in the Kafka topic to be 600-800 KB each.

I have verified that I can consume the messages in the topic from a normal Kafka client, so it is not a problem with Kafka. I have also cleared the two HBase tables and cleared the Kafka topic to start over from the beginning, but the problem remains.

I would like some help with the performance tuning needed to make sure Atlas can consume at least 200,000 entries from the ATLAS_HOOK topic per day (we are planning to add many more data sources over the next couple of months). What options do I have to make this happen?

//Berry
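
For reference, the kind of consumption check described above can be done with the stock console consumer. A minimal sketch, with a placeholder broker host and HDP's typical Kafka port, reading a few messages from the beginning of the topic:

  # Pull a handful of messages from ATLAS_HOOK to confirm they are readable
  /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh \
    --bootstrap-server <broker-host>:6667 \
    --topic ATLAS_HOOK \
    --from-beginning \
    --max-messages 10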
