Three things to check:
1. Could you show your "/etc/zookeeper/conf/server-jaas.conf"?
2. Could you run getAcl /hbase in zkCli.sh and share the output?
3. BTW, what was your login principal when you ran "add_peer" in the hbase
shell?
________________________________
From: Saad Mufti <[email protected]>
Sent: 23 May 2018 01:48:17
To: [email protected]
Subject: HBase Replication Between Two Secure Clusters With Different Kerberos KDC's
Hi,
Here is my scenario: I have two secure/authenticated EMR-based HBase
clusters, each with its own dedicated KDC (using EMR's support for this,
which means we get Kerberos by just turning on a config flag).
Now we want to get replication going between them. For other application
reasons, we want both clusters to have the same Kerberos realm, let's say
APP.COM, so Kerberos principals look like [email protected].
I looked around the web and found the instructions at
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_hadoop-high-availability/content/hbase-cluster-replication-among-kerberos-secured-clusters.html
so I tried to follow these directions. Of course, the instructions are for
replication between clusters with different realms, so I adapted them by
adding only one principal, "krbtgt/[email protected]", and giving it an
arbitrary password. I also followed the rest of the directions, passing a
rule property to ZooKeeper and setting the requisite Hadoop property in
core-site.xml.
After all this, when I set up replication from cluster1 to cluster2 using
add_peer, I see error messages of the following form in the region server
logs on cluster1:
> 2018-05-22 17:27:45,763 INFO [main-SendThread(xxx.net:2181)] zookeeper.ClientCnxn: Opening socket connection to server ip-10-194-247-88.aolp-ds-dev.us-east-1.ec2.aolcloud.net/xxx.yyy.zzz:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
> 2018-05-22 17:27:45,764 INFO [main-SendThread(xxx.net:2181)] zookeeper.ClientCnxn: Socket connection established to ip-10-194-247-88.aolp-ds-dev.us-east-1.ec2.aolcloud.net/xxx.yyy.zzz:2181, initiating session
> 2018-05-22 17:27:45,777 INFO [main-SendThread(xxx.net:2181)] zookeeper.ClientCnxn: Session establishment complete on server xxx.net/xxx.yyy.zzz:2181, sessionid = 0x16388599b300215, negotiated timeout = 40000
> 2018-05-22 17:27:45,779 ERROR [main-SendThread(xxx.net:2181)] client.ZooKeeperSaslClient: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.
> 2018-05-22 17:27:45,779 ERROR [main-SendThread(xxx.net:2181)] zookeeper.ClientCnxn: SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.
> 2018-05-22 17:28:12,574 WARN [main-EventThread] zookeeper.ZKUtil: hconnection-0x4dcc1d33-0x16388599b300215, quorum=xyz.net:2181, baseZNode=/hbase Unable to set watcher on znode (/hbase/hbaseid)
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1102)
>     at
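As I understand the error, "Server not found in Kerberos database" means the KDC that was asked has no entry for the service principal being requested. A rough sketch of how that name is probably formed, assuming the ZooKeeper client's default service name "zookeeper" (the zookeeper.sasl.client.username default) and the realm APP.COM; the helper name is made up for illustration:

```python
def expected_zk_service_principal(server, realm, service="zookeeper"):
    """Build the service principal a SASL client would likely request.

    `server` is the 'host/ip:port' string from the ClientCnxn log line.
    Assumption: the service name is the ZooKeeper client default.
    """
    # Keep only the hostname: drop the '/ip' part, then any ':port' suffix.
    host = server.split("/", 1)[0].split(":", 1)[0]
    return f"{service}/{host}@{realm}"


log_server = "ip-10-194-247-88.aolp-ds-dev.us-east-1.ec2.aolcloud.net/xxx.yyy.zzz:2181"
principal = expected_zk_service_principal(log_server, "APP.COM")
print(principal)
```

If a principal of that form exists only in the other cluster's KDC, a lookup against the local KDC could plausibly fail in this way, but that is a guess on my part.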
The Zookeeper start command looks like the following:
> /usr/lib/jvm/java-openjdk/bin/java
> -Dzookeeper.log.dir=/var/log/zookeeper
> -Dzookeeper.root.logger=INFO,ROLLINGFILE
> -cp /usr/lib/zookeeper/bin/../build/classes:/usr/lib/zookeeper/bin/../build/lib/*.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/lib/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/lib/zookeeper/bin/../lib/netty-3.10.5.Final.jar:/usr/lib/zookeeper/bin/../lib/log4j-1.2.16.jar:/usr/lib/zookeeper/bin/../lib/jline-2.11.jar:/usr/lib/zookeeper/bin/../zookeeper-3.4.10.jar:/usr/lib/zookeeper/bin/../src/java/lib/*.jar:/etc/zookeeper/conf::/etc/zookeeper/conf:/usr/lib/zookeeper/*:/usr/lib/zookeeper/lib/*
> -Djava.security.auth.login.config=/etc/zookeeper/conf/server-jaas.conf
> -Dzookeeper.security.auth_to_local=RULE:[2:\$1@\$0](.*@\QAPP.COM\E$)s/@\APP.COM\E$//DEFAULT
> -zookeeper.log.threshold=INFO
> -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.local.only=false
> org.apache.zookeeper.server.quorum.QuorumPeerMain
> /etc/zookeeper/conf/zoo.cfg
The property in core-site looks like the following:
> <property>
>   <name>hadoop.security.auth_to_local</name>
>   <value>RULE:[2:\$1@\$0](.*@\\QPGS.dev\\E$)s/@\\QPGS.dev\\E$//DEFAULT</value>
> </property>
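To make the intent of these rules concrete, here is a small sketch of what a rule of the form RULE:[2:$1@$0](.*@REALM$)s/@REALM$// should do to a two-component principal. It is a simplified emulation, not Hadoop's or ZooKeeper's actual code; the function name and the host host1.example.net are made up for illustration, and re.escape stands in for the \Q...\E quoting, which Python's re module does not support:

```python
import re


def apply_rule(principal, realm):
    """Emulate RULE:[2:$1@$0](.*@REALM$)s/@REALM$// for one principal.

    Returns the short name, or None when the rule does not apply.
    """
    name, _, principal_realm = principal.partition("@")
    components = name.split("/")
    if len(components) != 2:
        # The [2:...] part only selects two-component principals.
        return None
    # The $1@$0 template: first component, then the realm.
    formatted = f"{components[0]}@{principal_realm}"
    quoted = re.escape(realm)
    # The (...) filter must match the whole formatted string.
    if not re.fullmatch(f".*@{quoted}", formatted):
        return None
    # The s/@REALM$// substitution strips the realm suffix.
    return re.sub(f"@{quoted}$", "", formatted)


print(apply_rule("hbase/host1.example.net@APP.COM", "APP.COM"))
print(apply_rule("hbase/host1.example.net@OTHER.COM", "APP.COM"))
```

Under these assumptions the first call yields the short name hbase, and the second yields None because the realm does not match the filter.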
At this point it is not clear to me how to get the added Kerberos principal
"krbtgt/[email protected]" (present in both clusters' KDCs) to be used for
authentication so that replication starts flowing.
Any help and/or pointers would be appreciated.
Thanks.
-----
Saad