[ https://issues.apache.org/jira/browse/ATLAS-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sarath Subramanian updated ATLAS-2751: -------------------------------------- Affects Version/s: (was: 1.1.0) 0.8.1 1.0.0-alpha 0.8.2 1.0.0 Fix Version/s: 0.8.3 > Atlas is not consuming messages from ATLAS_HOOK topic after recovering from > zookeeper connection timeout. > --------------------------------------------------------------------------------------------------------- > > Key: ATLAS-2751 > URL: https://issues.apache.org/jira/browse/ATLAS-2751 > Project: Atlas > Issue Type: Bug > Components: atlas-core > Affects Versions: 0.8.1, 1.0.0-alpha, 0.8.2, 1.0.0 > Reporter: Sarath Subramanian > Assignee: Sarath Subramanian > Priority: Major > Fix For: 0.8.3, 1.1.0 > > > Atlas is not consuming messages from ATLAS_HOOK topic after recovering from > zookeeper connection timeout. > * Below snapshot showing, atlas going into passive state with notification > consumer thread shutdown > {noformat} > 2018-05-21 18:45:42,156 INFO - > [org.apache.ranger.audit.queue.AuditBatchQueue1:] ~ Audit Status Log: > name=atlas.async.multi_dest.batch, > finalDestination=atlas.async.multi_dest.batch.solr, interval=01:00.002 > minutes, events=4, succcessCount=2, totalEvents=2960, totalSuccessCount=112 > (BaseAuditHandler:312) > 2018-05-21 20:33:32,604 INFO - > [main-SendThread(ctr-e138-1518143905142-323545-01-000005.hwx.site:2181):] ~ > Client session timed out, have not heard from server in 13339ms for sessionid > 0x1637d8f847f002c, closing socket connection and attempting reconnect > (ClientCnxn:1140) > 2018-05-21 20:33:32,705 INFO - [main-EventThread:] ~ State change: SUSPENDED > (ConnectionStateManager:228) > 2018-05-21 20:33:32,708 WARN - [Curator-ConnectionStateManager-0:] ~ Server > instance with server id id1 is removed as leader > (ActiveInstanceElectorService:191) > 2018-05-21 20:33:32,708 WARN - [Curator-ConnectionStateManager-0:] ~ Instance > becoming passive from ACTIVE (ServiceState:92) > 2018-05-21 20:33:32,709 INFO - [Curator-ConnectionStateManager-0:] ~ Reacting > to passive state: shutting down Kafka consumers. > (NotificationHookConsumer:227) > 2018-05-21 20:33:32,709 INFO - [Curator-ConnectionStateManager-0:] ~ ==> > stopConsumerThreads() (NotificationHookConsumer:193) > 2018-05-21 20:33:32,709 INFO - [Curator-ConnectionStateManager-0:] ~ ==> > HookConsumer shutdown() (NotificationHookConsumer:610) > 2018-05-21 20:33:32,710 INFO - [Curator-ConnectionStateManager-0:] ~ > [atlas-hook-consumer-thread]: Shutting down > (NotificationHookConsumer$HookConsumer:72) > 2018-05-21 20:33:32,710 INFO - [NotificationHookConsumer thread-0:] ~ closing > NotificationConsumer (NotificationHookConsumer:343) > 2018-05-21 20:33:34,550 INFO - > [main-SendThread(ctr-e138-1518143905142-323545-01-000005.hwx.site:2181):] ~ > Client will use GSSAPI as SASL mechanism. (ZooKeeperSaslClient:285) > 2018-05-21 20:33:34,551 INFO - > [main-SendThread(ctr-e138-1518143905142-323545-01-000005.hwx.site:2181):] ~ > Opening socket connection to server > ctr-e138-1518143905142-323545-01-000005.hwx.site/172.27.15.200:2181. Will > attempt to SASL-authenticate using Login Context section 'Client' > (ClientCnxn:1019) > 2018-05-21 20:33:47,715 ERROR - [Curator-Framework-0:] ~ Connection timed out > for connection string (ctr-e138-1518143905142-323545-01-000005.hwx.site:2181) > and timeout (15000) / elapsed (15006) (ConnectionState:200) > org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = > ConnectionLoss > at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197) > at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) > at > org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:116) > at > org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:835) > at > org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809) > at > org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64) > at > org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {noformat} > > > * Below snapshot showing, atlas recovering from zookeeper timeout and > assuming to active instance with consumer thread initialized to read kafka > events. > {noformat} > 2018-05-21 20:34:13,358 INFO - [Curator-Framework-0-EventThread:] ~ State > change: RECONNECTED (ConnectionStateManager:228) > 2018-05-21 20:34:13,366 WARN - [Curator-Framework-0-EventThread:] ~ Server > instance with server id id1 is elected as leader > (ActiveInstanceElectorService:145) > 2018-05-21 20:34:13,366 WARN - [Curator-Framework-0-EventThread:] ~ Instance > becoming active from PASSIVE (ServiceState:76) > 2018-05-21 20:34:13,366 INFO - [Curator-Framework-0-EventThread:] ~ Reacting > to active: Creating HBase table for Audit if required. > (HBaseBasedAuditRepository:571) > 2018-05-21 20:34:13,366 INFO - [Curator-Framework-0-EventThread:] ~ Checking > if table ATLAS_ENTITY_AUDIT_EVENTS exists (HBaseBasedAuditRepository:513) > 2018-05-21 20:34:13,379 INFO - [Curator-Framework-0-EventThread:] ~ Table > ATLAS_ENTITY_AUDIT_EVENTS exists (HBaseBasedAuditRepository:525) > 2018-05-21 20:34:13,379 INFO - [Curator-Framework-0-EventThread:] ~ Reacting > to active: initializing index (GraphBackedSearchIndexer:132) > 2018-05-21 20:34:13,380 INFO - [Curator-Framework-0-EventThread:] ~ Creating > indexes for graph. (GraphBackedSearchIndexer:243) > 2018-05-21 20:34:13,429 INFO - [Curator-Framework-0-EventThread:] ~ Index > creation for global keys complete. (GraphBackedSearchIndexer:293) > 2018-05-21 20:34:13,430 INFO - [Curator-Framework-0-EventThread:] ~ ==> > AtlasTypeDefStoreInitializer.instanceIsActive() > (AtlasTypeDefStoreInitializer:331) > 2018-05-21 20:34:13,430 INFO - [Curator-Framework-0-EventThread:] ~ ==> > AtlasTypeDefGraphStoreV1.init() (AtlasTypeDefGraphStoreV1:106) > 2018-05-21 20:34:13,430 INFO - [Curator-Framework-0-EventThread:] ~ ==> > AtlasTypeDefGraphStore.init() (AtlasTypeDefGraphStore:81) > 2018-05-21 20:34:58,686 WARN - [Curator-Framework-0-EventThread:] ~ No > RelationshipDef defined between table_type_mALvg and db_type_mALvg on > attribute: table_type_mALvg.db (AtlasEntityType:145) > 2018-05-21 20:34:58,686 WARN - [Curator-Framework-0-EventThread:] ~ No > RelationshipDef defined between table_type_mALvg and column_type_mALvg on > attribute: table_type_mALvg.column (AtlasEntityType:145) > 2018-05-21 20:34:58,686 WARN - [Curator-Framework-0-EventThread:] ~ No > RelationshipDef defined between composite_entity_dRRgt and > simple_entity_dRRgt on attribute: composite_entity_dRRgt.simple_entity > (AtlasEntityType:145) > 2018-05-21 20:34:58,687 WARN - [Curator-Framework-0-EventThread:] ~ No > RelationshipDef defined between composite_entity_zbqFl and > simple_entity_zbqFl on attribute: composite_entity_zbqFl.simple_entity > (AtlasEntityType:145) > 2018-05-21 20:34:58,688 WARN - [Curator-Framework-0-EventThread:] ~ No > RelationshipDef defined between __AtlasUserProfile and __AtlasUserSavedSearch > on attribute: __AtlasUserProfile.savedSearches (AtlasEntityType:145) > 2018-05-21 20:34:58,688 WARN - [Curator-Framework-0-EventThread:] ~ No > RelationshipDef defined between composite_entity_PEhru and > simple_entity_PEhru on attribute: composite_entity_PEhru.simple_entity > (AtlasEntityType:145) > 2018-05-21 20:34:58,688 WARN - [Curator-Framework-0-EventThread:] ~ No > RelationshipDef defined between table_type_UUDrx and db_type_UUDrx on > attribute: table_type_UUDrx.db (AtlasEntityType:145) > 2018-05-21 20:34:58,688 WARN - [Curator-Framework-0-EventThread:] ~ No > RelationshipDef defined between table_type_UUDrx and column_type_UUDrx on > attribute: table_type_UUDrx.column (AtlasEntityType:145) > 2018-05-21 20:34:58,689 WARN - [Curator-Framework-0-EventThread:] ~ No > RelationshipDef defined between composite_entity_qhCgR and > simple_entity_qhCgR on attribute: composite_entity_qhCgR.simple_entity > (AtlasEntityType:145) > 2018-05-21 20:34:58,689 INFO - [Curator-Framework-0-EventThread:] ~ <== > AtlasTypeDefGraphStore.init() (AtlasTypeDefGraphStore:105) > 2018-05-21 20:34:58,689 INFO - [Curator-Framework-0-EventThread:] ~ <== > AtlasTypeDefGraphStoreV1.init() (AtlasTypeDefGraphStoreV1:110) > 2018-05-21 20:34:58,692 INFO - [Curator-Framework-0-EventThread:] ~ ==> > AtlasTypeDefStoreInitializer.loadBootstrapTypeDefs() > (AtlasTypeDefStoreInitializer:116) > 2018-05-21 20:34:58,692 INFO - [Curator-Framework-0-EventThread:] ~ ==> > AtlasTypeDefStoreInitializer(/usr/hdp/3.0.0.0-1365/atlas/models/0000-Area0) > (AtlasTypeDefStoreInitializer:153) > 2018-05-21 20:34:58,694 INFO - [Curator-Framework-0-EventThread:] ~ No new > type in file > /usr/hdp/3.0.0.0-1365/atlas/models/0000-Area0/0010-base_model.json > (AtlasTypeDefStoreInitializer:185) > 2018-05-21 20:34:58,697 INFO - [Curator-Framework-0-EventThread:] ~ No new > type in file > /usr/hdp/3.0.0.0-1365/atlas/models/0000-Area0/0011-glossary_model.json > (AtlasTypeDefStoreInitializer:185) > 2018-05-21 20:34:58,697 INFO - [Curator-Framework-0-EventThread:] ~ Type > patches directory /usr/hdp/3.0.0.0-1365/atlas/models/0000-Area0/patches does > not exist or not readable or has no patches (AtlasTypeDefStoreInitializer:392) > 2018-05-21 20:34:58,697 INFO - [Curator-Framework-0-EventThread:] ~ <== > AtlasTypeDefStoreInitializer(/usr/hdp/3.0.0.0-1365/atlas/models/0000-Area0) > (AtlasTypeDefStoreInitializer:196) > 2018-05-21 20:34:58,697 INFO - [Curator-Framework-0-EventThread:] ~ ==> > AtlasTypeDefStoreInitializer(/usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop) > (AtlasTypeDefStoreInitializer:153) > 2018-05-21 20:34:58,698 INFO - [Curator-Framework-0-EventThread:] ~ No new > type in file > /usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop/1020-fs_model.json > (AtlasTypeDefStoreInitializer:185) > 2018-05-21 20:34:58,700 INFO - [Curator-Framework-0-EventThread:] ~ No new > type in file > /usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop/1030-hive_model.json > (AtlasTypeDefStoreInitializer:185) > 2018-05-21 20:34:58,700 INFO - [Curator-Framework-0-EventThread:] ~ No new > type in file > /usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop/1040-sqoop_model.json > (AtlasTypeDefStoreInitializer:185) > 2018-05-21 20:34:58,701 INFO - [Curator-Framework-0-EventThread:] ~ No new > type in file > /usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop/1050-falcon_model.json > (AtlasTypeDefStoreInitializer:185) > 2018-05-21 20:34:58,702 INFO - [Curator-Framework-0-EventThread:] ~ No new > type in file > /usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop/1060-hbase_model.json > (AtlasTypeDefStoreInitializer:185) > 2018-05-21 20:34:58,703 INFO - [Curator-Framework-0-EventThread:] ~ No new > type in file > /usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop/1070-kafka_model.json > (AtlasTypeDefStoreInitializer:185) > 2018-05-21 20:34:58,704 INFO - [Curator-Framework-0-EventThread:] ~ No new > type in file > /usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop/1080-storm_model.json > (AtlasTypeDefStoreInitializer:185) > . > . > . > . > . > 2018-05-21 20:34:58,708 INFO - [Curator-Framework-0-EventThread:] ~ patch > skipped: typeName=hbase_table; applyToVersion=1.2; updateToVersion=1.3 > (AtlasTypeDefStoreInitializer:618) > 2018-05-21 20:34:58,708 INFO - [Curator-Framework-0-EventThread:] ~ patch > skipped: typeName=hbase_column_family; applyToVersion=1.1; > updateToVersion=1.2 (AtlasTypeDefStoreInitializer:618) > 2018-05-21 20:34:58,708 INFO - [Curator-Framework-0-EventThread:] ~ Applying > patches in file > /usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop/patches/006-kafka_topic_add_attribute.json > (AtlasTypeDefStoreInitializer:416) > 2018-05-21 20:34:58,708 INFO - [Curator-Framework-0-EventThread:] ~ patch > skipped: typeName=kafka_topic; applyToVersion=1.0; updateToVersion=1.1 > (AtlasTypeDefStoreInitializer:618) > 2018-05-21 20:34:58,708 INFO - [Curator-Framework-0-EventThread:] ~ <== > AtlasTypeDefStoreInitializer(/usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop) > (AtlasTypeDefStoreInitializer:196) > 2018-05-21 20:34:58,708 INFO - [Curator-Framework-0-EventThread:] ~ ==> > AtlasTypeDefStoreInitializer(/usr/hdp/3.0.0.0-1365/atlas/models) > (AtlasTypeDefStoreInitializer:153) > 2018-05-21 20:34:58,708 INFO - [Curator-Framework-0-EventThread:] ~ Type > patches directory /usr/hdp/3.0.0.0-1365/atlas/models/patches does not exist > or not readable or has no patches (AtlasTypeDefStoreInitializer:392) > 2018-05-21 20:34:58,708 INFO - [Curator-Framework-0-EventThread:] ~ <== > AtlasTypeDefStoreInitializer(/usr/hdp/3.0.0.0-1365/atlas/models) > (AtlasTypeDefStoreInitializer:196) > 2018-05-21 20:34:58,709 INFO - [Curator-Framework-0-EventThread:] ~ <== > AtlasTypeDefStoreInitializer.loadBootstrapTypeDefs() > (AtlasTypeDefStoreInitializer:145) > 2018-05-21 20:34:58,709 INFO - [Curator-Framework-0-EventThread:] ~ <== > AtlasTypeDefStoreInitializer.instanceIsActive() > (AtlasTypeDefStoreInitializer:341) > 2018-05-21 20:34:58,709 INFO - [Curator-Framework-0-EventThread:] ~ Reacting > to active state: initializing Kafka consumers (NotificationHookConsumer:214) > 2018-05-21 20:34:58,709 INFO - [Curator-Framework-0-EventThread:] ~ ==> > KafkaNotification.createConsumers(notificationType=HOOK, numConsumers=1, > autoCommitEnabled=false) (KafkaNotification:149) > 2018-05-21 20:34:58,709 INFO - [Curator-Framework-0-EventThread:] ~ <== > KafkaNotification.createConsumers(notificationType=HOOK, numConsumers=1, > autoCommitEnabled=false) (KafkaNotification:156) > 2018-05-21 20:34:58,710 INFO - [NotificationHookConsumer thread-0:] ~ > [atlas-hook-consumer-thread]: Starting > (NotificationHookConsumer$HookConsumer:72) > 2018-05-21 20:34:58,710 INFO - [NotificationHookConsumer thread-0:] ~ ==> > HookConsumer doWork() (NotificationHookConsumer:313) > 2018-05-21 20:34:58,710 INFO - [NotificationHookConsumer thread-0:] ~ Atlas > Server is not ready. Waiting for 1000 milliseconds to retry... > (NotificationHookConsumer:588) > 2018-05-21 20:34:58,713 WARN - [Curator-Framework-0-EventThread:] ~ Instance > is active from BECOMING_ACTIVE (ServiceState:87) > 2018-05-21 20:34:59,710 INFO - [NotificationHookConsumer thread-0:] ~ Atlas > Server is ready, can start reading Kafka events. > (NotificationHookConsumer:603) > {noformat} > * Below snapshot confirms that, hive hook messages ingested into ATLAS_HOOk > topic but not consumed by Atlas > {noformat} > /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --zookeeper > ctr-e138-1518143905142-323545-01-000005.hwx.site:2181,ctr-e138-1518143905142-323545-01-000006.hwx.site:2181,ctr-e138-1518143905142-323545-01-000007.hwx.site:2181 > --topic ATLAS_HOOK --security-protocol PLAINTEXTSASL > Using the ConsoleConsumer with old consumer is deprecated and will be removed > in a future major release. Consider using the new consumer by passing > [bootstrap-server] instead of [zookeeper]. > {"version":\{"version":"1.0.0","versionParts":[1]},"msgCompressionKind":"NONE","msgSplitIdx":1,"msgSplitCount":1,"msgSourceIP":"172.27.27.0","msgCreatedBy":"hive","msgCreationTime":1527012364380,"message":\{"type":"ENTITY_CREATE_V2","user":"hrt_qa","entities":{"referredEntities":{},"entities":[\{"typeName":"hive_db","attributes":{"owner":"hrt_qa","ownerType":"USER","qualifiedName":"db10@cl1","clusterName":"cl1","name":"db10","description":null,"location":"hdfs://ns1/apps/hive/warehouse/db10.db","parameters":{}},"guid":"-4487807398104856","version":0}]}}} > ^CProcessed a total of 1 messages > {noformat} > Live cluster: https://172.27.17.64:8443/#/main/services/ATLAS/summary > Active atlas: https://ctr-e138-1518143905142-323545-01-000003.hwx.site:21443/ > NOTE: Currently, above atlas service is in buggy state, restarting atlas will > bring atlas to normal mode. Please do not restart atlas service. -- This message was sent by Atlassian JIRA (v7.6.3#76005)