On 5/28/17 12:13 PM, James Srinivasan wrote:
[snip]
I can't call AccumuloInputFormat.setConnectorInfo again since it has
already been called, and I presume adding the serialised token to the
Configuration would be insecure?
Yeah, the Configuration can't protect sensitive information. MapReduce/YARN
has special handling to make sure those tokens serialized in the Job's
credentials are only readable by you (the job submitter).
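For illustration, the secure path goes through the Job's Credentials rather than the Configuration, roughly like this (an untested sketch; the token alias is just a placeholder):

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.security.token.Token;

    public class CredentialsSketch {
      // Secrets belong in the Job's Credentials, not the Configuration:
      // YARN serializes the credentials such that only the submitting user
      // can read them, whereas the Configuration is broadly visible.
      public static void attachToken(Job job, Token<?> delegationToken) {
        job.getCredentials().addToken(
            new Text("accumulo.delegation.token"), delegationToken);
      }
    }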
The thing I don't entirely follow is how you've gotten into this situation
to begin with. Adding the delegation token to the Job's credentials should
be done by Accumulo's MR code on your behalf (just as it obtains the
delegation token, it automatically adds it to the job for you).
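Concretely, the intended usage is a single call with a KerberosToken, something like this (an untested sketch; the instance, ZooKeeper host, principal, and table names are made up):

    import org.apache.accumulo.core.client.ClientConfiguration;
    import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
    import org.apache.accumulo.core.client.security.tokens.KerberosToken;
    import org.apache.hadoop.mapreduce.Job;

    public class InputFormatSketch {
      public static void configure() throws Exception {
        Job job = Job.getInstance();
        AccumuloInputFormat.setZooKeeperInstance(job,
            ClientConfiguration.loadDefault()
                .withInstance("myinstance")
                .withZkHosts("zk1:2181")
                .withSasl(true));
        // With a KerberosToken, this single call obtains a delegation token
        // on your behalf and adds it to the Job's credentials.
        AccumuloInputFormat.setConnectorInfo(
            job, "user@EXAMPLE.COM", new KerberosToken());
        AccumuloInputFormat.setInputTableName(job, "mytable");
      }
    }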
Any chance you can provide an end-to-end example? I am also pretty
Spark-ignorant -- so maybe I just don't understand what is possible and what
isn't.
Hmm, after further investigation concentrating on just MapReduce (and
not Spark), it seems the GeoMesaAccumuloInputFormat class might need
more significant work than the s/PasswordToken/KerberosToken
substitution I got away with previously. For example, sending an
Accumulo password in the Hadoop conf probably isn't ideal either.
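My guess at the right shape is to authenticate with Kerberos once on the client, fetch a delegation token, and configure the job with that instead of a password -- perhaps something like this (completely untested; the principal is made up):

    import org.apache.accumulo.core.client.ClientConfiguration;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.admin.DelegationTokenConfig;
    import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
    import org.apache.accumulo.core.client.security.tokens.DelegationToken;
    import org.apache.accumulo.core.client.security.tokens.KerberosToken;
    import org.apache.hadoop.mapreduce.Job;

    public class DelegationTokenGuess {
      // Completely untested guess at the client-side flow.
      public static void configure(Job job, ClientConfiguration clientConf)
          throws Exception {
        Connector conn = new ZooKeeperInstance(clientConf)
            .getConnector("user@EXAMPLE.COM", new KerberosToken());
        DelegationToken delegationToken = conn.securityOperations()
            .getDelegationToken(new DelegationTokenConfig());
        AccumuloInputFormat.setConnectorInfo(
            job, "user@EXAMPLE.COM", delegationToken);
      }
    }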
Fortunately I found this:
https://github.com/apache/hive/blob/master/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/HiveAccumuloTableInputFormat.java
Is it a good example of Accumulo + MapReduce that I can copy?
Thanks,
James
That one is definitely overkill. There's a bit of reflection in there
to work around older versions of Accumulo. However, it should be an
example of something that does work with Kerberos authentication.
Also, take note that Hive uses the InputFormat regardless of the
execution engine (local, MapReduce, Tez, etc.). There are some comments
to that effect in the code. You can likely simplify those methods/blocks
as well :)
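Once you strip out the reflection and the pre-1.7 fallbacks, I believe the Kerberos-relevant core of that handler boils down to roughly the following (a sketch, not a drop-in: the principal, keytab path, instance, ZooKeeper host, and table names are all placeholders):

    import org.apache.accumulo.core.client.ClientConfiguration;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.admin.DelegationTokenConfig;
    import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
    import org.apache.accumulo.core.client.security.tokens.DelegationToken;
    import org.apache.accumulo.core.client.security.tokens.KerberosToken;
    import org.apache.accumulo.core.security.Authorizations;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosMapReduceSetup {
      public static void configure(Job job) throws Exception {
        // Log in from a keytab (or rely on an existing kinit'd ticket cache).
        UserGroupInformation.loginUserFromKeytab(
            "user@EXAMPLE.COM", "/path/to/user.keytab");

        ClientConfiguration clientConf = ClientConfiguration.loadDefault()
            .withInstance("myinstance")
            .withZkHosts("zk1:2181")
            .withSasl(true);

        // Authenticate once with Kerberos, then trade the ticket for a
        // delegation token, since the MR tasks won't have Kerberos creds.
        Connector conn = new ZooKeeperInstance(clientConf)
            .getConnector("user@EXAMPLE.COM", new KerberosToken());
        DelegationToken token = conn.securityOperations()
            .getDelegationToken(new DelegationTokenConfig());

        AccumuloInputFormat.setZooKeeperInstance(job, clientConf);
        AccumuloInputFormat.setConnectorInfo(job, "user@EXAMPLE.COM", token);
        AccumuloInputFormat.setInputTableName(job, "mytable");
        AccumuloInputFormat.setScanAuthorizations(job, new Authorizations());
      }
    }

(Plus whatever ranges, columns, and iterators you'd normally configure, of course.)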