I did something like:

    AccumuloInputFormat.setZooKeeperInstance(job,
        ClientConfiguration.loadDefault().withZkHosts(zk).withInstance(name).withSasl(true));
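For the archives, the fuller setup looks roughly like this (a sketch, not verbatim from my job; zk, name, principal, and token are placeholders for our deployment):

    import org.apache.accumulo.core.client.ClientConfiguration;
    import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
    import org.apache.hadoop.mapreduce.Job;

    Job job = Job.getInstance(hadoopConf);
    // Build the client config explicitly instead of relying on
    // ~/.accumulo/config being present on every node.
    ClientConfiguration clientConf = ClientConfiguration.loadDefault()
        .withZkHosts(zk)      // comma-separated zookeeper hosts
        .withInstance(name)   // Accumulo instance name
        .withSasl(true);      // the setting that was getting lost before
    AccumuloInputFormat.setZooKeeperInstance(job, clientConf);
    // Connector info still has to be set; a DelegationToken is what
    // actually travels with the job.
    AccumuloInputFormat.setConnectorInfo(job, principal, token);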
So this explicitly instructs the client to turn on SASL, and it gets me past where I was stuck. Now I seem to have a different problem :-( I'll look into it and report back later. Thanks Josh!

-Simon

On Fri, Jun 12, 2015 at 1:47 PM, Josh Elser <[email protected]> wrote:
> Generally, it's not a good idea to assume that you can always locate the
> correct ClientConfiguration from the local filesystem.
>
> For example, YARN nodes might not have Accumulo installed, or might even
> point to the wrong Accumulo instance. The Job's Configuration serves as the
> de-facto place for _all_ information that your Job needs to perform its
> work.
>
> Can you try calling AccumuloInputFormat.setZooKeeperInstance(Job,
> ClientConfiguration) instead?
>
> Great work tracking this down, Simon!
>
>
> Xu (Simon) Chen wrote:
>>
>> Ah, I found the problem...
>>
>> In the Hadoop chain of events, this is eventually called, because
>> clientConfigString is not null (it contains instance.name and
>> instance.zookeeper.host):
>>
>> https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/mapreduce/lib/impl/ConfiguratorBase.java#L382
>>
>> The deserialize function unfortunately doesn't load the default
>> options, so the sasl setting from ~/.accumulo/config is left out:
>>
>> https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/ClientConfiguration.java#L235
>>
>> Would it be reasonable for deserialize to load the default settings?
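>>
>> For example (a quick untested sketch of what I mean; hasSasl() is the
>> boolean accessor on ClientConfiguration):
>>
>>     // What the job deserializes: only the serialized keys, no defaults.
>>     ClientConfiguration fromJob = ClientConfiguration.deserialize(clientConfigString);
>>     System.out.println(fromJob.hasSasl());                           // false
>>     // What a local client builds, with ~/.accumulo/config merged in.
>>     System.out.println(ClientConfiguration.loadDefault().hasSasl()); // true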
>>
>> -Simon
>>
>>
>> On Fri, Jun 12, 2015 at 11:34 AM, Josh Elser <[email protected]> wrote:
>>>
>>> Just be careful with the mapreduce classes. I wouldn't be surprised if we
>>> try to avoid any locally installed client.conf in MapReduce (using only
>>> the ClientConfiguration stored inside the Job).
>>>
>>> Will wait to hear back from you :)
>>>
>>> Xu (Simon) Chen wrote:
>>>>
>>>> Emm.. I have ~/.accumulo/config with "instance.rpc.sasl.enabled=true".
>>>> That property is indeed populated into the ClientConfiguration the first
>>>> time - that's why I said the token worked initially.
>>>>
>>>> Apparently that property is not set in the Hadoop portion, as I found by
>>>> adding some debug messages to the ZooKeeperInstance class. I think that's
>>>> likely the issue.
>>>>
>>>> So the zookeeper instance is created in the following sequence:
>>>>
>>>> https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/mapred/AbstractInputFormat.java#L341
>>>>
>>>> https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/mapreduce/lib/impl/InputConfigurator.java#L671
>>>>
>>>> https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/mapreduce/lib/impl/ConfiguratorBase.java#L361
>>>>
>>>> The getClientConfiguration function eventually calls
>>>> getDefaultSearchPath(), so my ~/.accumulo/config should be searched. I
>>>> think we are close to the root cause... Will update when I find out more.
>>>>
>>>> Thanks!
>>>> -Simon
>>>>
>>>> On Thu, Jun 11, 2015 at 11:28 PM, Josh Elser <[email protected]> wrote:
>>>>>
>>>>> Are you sure that the spark tasks have the proper ClientConfiguration?
>>>>> They need to have instance.rpc.sasl.enabled. I believe you should be
>>>>> able to set this via the AccumuloInputFormat.
>>>>>
>>>>> You can turn up logging with org.apache.accumulo.core.client=TRACE
>>>>> and/or set the system property -Dsun.security.krb5.debug=true to get
>>>>> some more information as to why the authentication is failing.
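>>>>>
>>>>> For example, something like this on the client (from memory, so
>>>>> double-check the exact names for your setup):
>>>>>
>>>>>     # in the log4j.properties your driver picks up
>>>>>     log4j.logger.org.apache.accumulo.core.client=TRACE
>>>>>
>>>>>     # krb5 internals; spark-shell can pass the flag through, e.g.
>>>>>     spark-shell --driver-java-options "-Dsun.security.krb5.debug=true"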
>>>>>
>>>>> Xu (Simon) Chen wrote:
>>>>>>
>>>>>> Josh,
>>>>>>
>>>>>> I am using this function:
>>>>>>
>>>>>> https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/mapred/AbstractInputFormat.java#L106
>>>>>>
>>>>>> If I pass in a KerberosToken, it gets stuck at line 111; if I pass in a
>>>>>> delegation token, the setConnectorInfo function finishes fine.
>>>>>>
>>>>>> But when I do something like queryRDD.count, spark eventually calls
>>>>>> HadoopRDD.getPartitions, which calls the following and gets stuck in
>>>>>> the last authenticate() function:
>>>>>>
>>>>>> https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/mapred/AbstractInputFormat.java#L621
>>>>>>
>>>>>> https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/mapred/AbstractInputFormat.java#L348
>>>>>>
>>>>>> https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/ZooKeeperInstance.java#L248
>>>>>>
>>>>>> https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/impl/ConnectorImpl.java#L70
>>>>>>
>>>>>> Which is essentially the same place where it would be stuck with a
>>>>>> KerberosToken.
>>>>>>
>>>>>> -Simon
>>>>>>
>>>>>> On Thu, Jun 11, 2015 at 9:41 PM, Josh Elser <[email protected]> wrote:
>>>>>>>
>>>>>>> What are the Accumulo methods that you are calling and what is the
>>>>>>> error you are seeing?
>>>>>>>
>>>>>>> A KerberosToken cannot be used in a MapReduce job, which is why a
>>>>>>> DelegationToken is automatically retrieved. You should still be able
>>>>>>> to provide your own DelegationToken -- if that doesn't work, that's
>>>>>>> a bug.
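>>>>>>>
>>>>>>> Roughly, off the top of my head (untested; getDelegationToken lives
>>>>>>> on SecurityOperations in 1.7):
>>>>>>>
>>>>>>>     // Trade your Kerberos login for a DelegationToken, then hand
>>>>>>>     // that to the input format instead of the KerberosToken itself.
>>>>>>>     Connector conn = instance.getConnector(principal, new KerberosToken());
>>>>>>>     DelegationToken dt = conn.securityOperations()
>>>>>>>         .getDelegationToken(new DelegationTokenConfig());
>>>>>>>     AccumuloInputFormat.setConnectorInfo(job, principal, dt);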
>>>>>>>
>>>>>>> Xu (Simon) Chen wrote:
>>>>>>>>
>>>>>>>> I actually added a flag such that I can pass in either a
>>>>>>>> KerberosToken or a DelegationTokenImpl to accumulo.
>>>>>>>>
>>>>>>>> Actually, when a KerberosToken is passed in, accumulo converts it to
>>>>>>>> a DelegationToken - the conversion is where I am having trouble. I
>>>>>>>> tried passing in a delegation token directly to bypass the
>>>>>>>> conversion, but a similar problem happens: I am stuck at
>>>>>>>> authenticate on the client side, and the server side produces the
>>>>>>>> same output...
>>>>>>>>
>>>>>>>> On Thursday, June 11, 2015, Josh Elser <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Keep in mind that the authentication paths for DelegationToken
>>>>>>>>> (mapreduce) and KerberosToken are completely different.
>>>>>>>>>
>>>>>>>>> Since most mapreduce jobs have multiple mappers (or reducers), I
>>>>>>>>> expect we would have run into the case that the same
>>>>>>>>> DelegationToken was used multiple times. It would still be good to
>>>>>>>>> narrow down the scope of the problem.
>>>>>>>>>
>>>>>>>>> Xu (Simon) Chen wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks Josh...
>>>>>>>>>>
>>>>>>>>>> I tested this in the scala REPL, and called
>>>>>>>>>> DataStoreFinder.getDataStore() multiple times; each time it seems
>>>>>>>>>> to be reusing the same KerberosToken object, and it works fine
>>>>>>>>>> each time.
>>>>>>>>>>
>>>>>>>>>> So my problem only happens when the token is used in accumulo's
>>>>>>>>>> mapred package. Weird..
>>>>>>>>>>
>>>>>>>>>> -Simon
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 11, 2015 at 5:29 PM, Josh Elser <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Simon,
>>>>>>>>>>>
>>>>>>>>>>> Can you reproduce this in plain-jane Java code? I don't know
>>>>>>>>>>> enough about spark/scala, much less what Geomesa is actually
>>>>>>>>>>> doing, to know what the issue is.
>>>>>>>>>>>
>>>>>>>>>>> Also, which token are you referring to: a KerberosToken or a
>>>>>>>>>>> DelegationToken? Either of them should be usable as many times as
>>>>>>>>>>> you'd like (given the underlying credentials are still available
>>>>>>>>>>> for the KT, or the DT hasn't yet expired).
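>>>>>>>>>>>
>>>>>>>>>>> Even something this small would be a useful data point (a sketch;
>>>>>>>>>>> use your own instance name, zookeepers, and principal):
>>>>>>>>>>>
>>>>>>>>>>>     ZooKeeperInstance inst = new ZooKeeperInstance(
>>>>>>>>>>>         ClientConfiguration.loadDefault()
>>>>>>>>>>>             .withInstance(name).withZkHosts(zk).withSasl(true));
>>>>>>>>>>>     KerberosToken token = new KerberosToken(); // current Kerberos login
>>>>>>>>>>>     inst.getConnector(principal, token);       // authenticate once
>>>>>>>>>>>     inst.getConnector(principal, token);       // ...and again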
>>>>>>>>>>>
>>>>>>>>>>> Xu (Simon) Chen wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Folks,
>>>>>>>>>>>>
>>>>>>>>>>>> I am working on geomesa+accumulo+spark integration. For some
>>>>>>>>>>>> reason, I found that the same token cannot be used to
>>>>>>>>>>>> authenticate twice.
>>>>>>>>>>>>
>>>>>>>>>>>> The workflow is that geomesa tries to create a hadoop rdd,
>>>>>>>>>>>> during which it tries to create an AccumuloDataStore:
>>>>>>>>>>>>
>>>>>>>>>>>> https://github.com/locationtech/geomesa/blob/master/geomesa-compute/src/main/scala/org/locationtech/geomesa/compute/spark/GeoMesaSpark.scala#L81
>>>>>>>>>>>>
>>>>>>>>>>>> During this process, a ZooKeeperInstance is created:
>>>>>>>>>>>>
>>>>>>>>>>>> https://github.com/locationtech/geomesa/blob/rc7_a1.7_h2.5/geomesa-core/src/main/scala/org/locationtech/geomesa/core/data/AccumuloDataStoreFactory.scala#L177
>>>>>>>>>>>>
>>>>>>>>>>>> I modified geomesa so that it uses kerberos to authenticate
>>>>>>>>>>>> here. This step works fine.
>>>>>>>>>>>>
>>>>>>>>>>>> But next, geomesa calls ConfiguratorBase.setConnectorInfo:
>>>>>>>>>>>>
>>>>>>>>>>>> https://github.com/locationtech/geomesa/blob/rc7_a1.7_h2.5/geomesa-compute/src/main/scala/org/locationtech/geomesa/compute/spark/GeoMesaSpark.scala#L69
>>>>>>>>>>>>
>>>>>>>>>>>> This uses the same token and the same zookeeper URI, but for
>>>>>>>>>>>> some reason it gets stuck in spark-shell, and the following is
>>>>>>>>>>>> output on the tserver side:
>>>>>>>>>>>>
>>>>>>>>>>>> 2015-06-06 18:58:19,616 [server.TThreadPoolServer] ERROR: Error occurred during processing of message.
>>>>>>>>>>>> java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
>>>>>>>>>>>>     at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
>>>>>>>>>>>>     at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:51)
>>>>>>>>>>>>     at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:48)
>>>>>>>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>>>>     at javax.security.auth.Subject.doAs(Subject.java:356)
>>>>>>>>>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1622)
>>>>>>>>>>>>     at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory.getTransport(UGIAssumingTransportFactory.java:48)
>>>>>>>>>>>>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:208)
>>>>>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>>>>>>     at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>>>>>>>>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>>>>>>>> Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
>>>>>>>>>>>>     at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
>>>>>>>>>>>>     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>>>>>>>>>>>>     at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
>>>>>>>>>>>>     at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
>>>>>>>>>>>>     at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
>>>>>>>>>>>>     at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
>>>>>>>>>>>>     at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
>>>>>>>>>>>>     ... 11 more
>>>>>>>>>>>> Caused by: java.net.SocketTimeoutException: Read timed out
>>>>>>>>>>>>     at java.net.SocketInputStream.socketRead0(Native Method)
>>>>>>>>>>>>     at java.net.SocketInputStream.read(SocketInputStream.java:152)
>>>>>>>>>>>>     at java.net.SocketInputStream.read(SocketInputStream.java:122)
>>>>>>>>>>>>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
>>>>>>>>>>>>     at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>>>>>>>>>>>>     at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>>>>>>>>>>>>     ... 17 more
>>>>>>>>>>>>
>>>>>>>>>>>> Any idea why?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>> -Simon
