On 5/28/17 12:13 PM, James Srinivasan wrote:
[snip]
I can't call AccumuloInputFormat.setConnectorInfo again since it has
already been called, and I presume adding the serialised token to the
Configuration would be insecure?
Yeah, the Configuration can't protect sensitive information. MapReduce/YARN
has special handling to make sure those tokens serialized in the Job's
credentials are only readable by you (the job submitter).
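For illustration, the secure path goes through the Job's Credentials rather than the Configuration, roughly like this (an untested sketch; the token alias is just a placeholder):

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.security.token.Token;

    public class CredentialsSketch {
      // Secrets belong in the Job's Credentials, not the Configuration:
      // YARN serializes the credentials such that only the submitting user
      // can read them, whereas the Configuration is broadly visible.
      public static void attachToken(Job job, Token<?> delegationToken) {
        job.getCredentials().addToken(
            new Text("accumulo.delegation.token"), delegationToken);
      }
    }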
The thing I don't entirely follow is how you've gotten into this situation
to begin with. Adding the delegation token to the Job's credentials should
be done by Accumulo's MR code on your behalf (just as it obtains the
delegation token, it automatically adds it to the job for you).
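Concretely, the intended usage is a single call with a KerberosToken, something like this (an untested sketch; the instance, ZooKeeper host, principal, and table names are made up):

    import org.apache.accumulo.core.client.ClientConfiguration;
    import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
    import org.apache.accumulo.core.client.security.tokens.KerberosToken;
    import org.apache.hadoop.mapreduce.Job;

    public class InputFormatSketch {
      public static void configure() throws Exception {
        Job job = Job.getInstance();
        AccumuloInputFormat.setZooKeeperInstance(job,
            ClientConfiguration.loadDefault()
                .withInstance("myinstance")
                .withZkHosts("zk1:2181")
                .withSasl(true));
        // With a KerberosToken, this single call obtains a delegation token
        // on your behalf and adds it to the Job's credentials.
        AccumuloInputFormat.setConnectorInfo(
            job, "user@EXAMPLE.COM", new KerberosToken());
        AccumuloInputFormat.setInputTableName(job, "mytable");
      }
    }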
Any chance you can provide an end-to-end example? I am also pretty
Spark-ignorant -- so maybe I just don't understand what is possible and what
isn't.
Hmm, after further investigation concentrating on just MapReduce (and
not Spark), it seems the GeoMesaAccumuloInputFormat class might need
more significant work than the s/PasswordToken/KerberosToken
substitution I got away with previously. For example, sending an
Accumulo password in the Hadoop conf probably isn't ideal either.
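My guess at the right shape is to authenticate with Kerberos once on the client, fetch a delegation token, and configure the job with that instead of a password -- perhaps something like this (completely untested; the principal is made up):

    import org.apache.accumulo.core.client.ClientConfiguration;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.admin.DelegationTokenConfig;
    import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
    import org.apache.accumulo.core.client.security.tokens.DelegationToken;
    import org.apache.accumulo.core.client.security.tokens.KerberosToken;
    import org.apache.hadoop.mapreduce.Job;

    public class DelegationTokenGuess {
      // Completely untested guess at the client-side flow.
      public static void configure(Job job, ClientConfiguration clientConf)
          throws Exception {
        Connector conn = new ZooKeeperInstance(clientConf)
            .getConnector("user@EXAMPLE.COM", new KerberosToken());
        DelegationToken delegationToken = conn.securityOperations()
            .getDelegationToken(new DelegationTokenConfig());
        AccumuloInputFormat.setConnectorInfo(
            job, "user@EXAMPLE.COM", delegationToken);
      }
    }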
Fortunately I found this:
https://github.com/apache/hive/blob/master/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/HiveAccumuloTableInputFormat.java
Is it a good example of Accumulo + MapReduce that I can copy?
Thanks,
James
That one is definitely overkill. There's a bit of reflection in there
to work around older versions of Accumulo. However, it should be an
example of something that does work with Kerberos authentication.
Also, take note that Hive uses the InputFormat regardless of the
execution engine (local, MapReduce, Tez, etc.). There are some comments
to that effect in the code. You can likely simplify those methods/blocks
as well :)
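Once you strip out the reflection and the pre-1.7 fallbacks, I believe the Kerberos-relevant core of that handler boils down to roughly the following (a sketch, not a drop-in: the principal, keytab path, instance, ZooKeeper host, and table names are all placeholders):

    import org.apache.accumulo.core.client.ClientConfiguration;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.admin.DelegationTokenConfig;
    import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
    import org.apache.accumulo.core.client.security.tokens.DelegationToken;
    import org.apache.accumulo.core.client.security.tokens.KerberosToken;
    import org.apache.accumulo.core.security.Authorizations;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosMapReduceSetup {
      public static void configure(Job job) throws Exception {
        // Log in from a keytab (or rely on an existing kinit'd ticket cache).
        UserGroupInformation.loginUserFromKeytab(
            "user@EXAMPLE.COM", "/path/to/user.keytab");

        ClientConfiguration clientConf = ClientConfiguration.loadDefault()
            .withInstance("myinstance")
            .withZkHosts("zk1:2181")
            .withSasl(true);

        // Authenticate once with Kerberos, then trade the ticket for a
        // delegation token, since the MR tasks won't have Kerberos creds.
        Connector conn = new ZooKeeperInstance(clientConf)
            .getConnector("user@EXAMPLE.COM", new KerberosToken());
        DelegationToken token = conn.securityOperations()
            .getDelegationToken(new DelegationTokenConfig());

        AccumuloInputFormat.setZooKeeperInstance(job, clientConf);
        AccumuloInputFormat.setConnectorInfo(job, "user@EXAMPLE.COM", token);
        AccumuloInputFormat.setInputTableName(job, "mytable");
        AccumuloInputFormat.setScanAuthorizations(job, new Authorizations());
      }
    }

(Plus whatever ranges, columns, and iterators you'd normally configure, of course.)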