If you're using Hortonworks' HDP, you would probably benefit from https://github.com/hortonworks/accumulo

There is likely a git tag for the exact version that you're running. The line numbers should match there.

To be clear, if your services (e.g. TabletServers) aren't failing after 10hrs, you're not running into ACCUMULO-4069. Given my (limited) understanding, your problem is purely client-side. It's possible that the client-side RPC implementation isn't correctly handling the ticket re-login, but I know there is code in there specifically to handle the re-login case.

The next step would be to get some debug logging from your application around UserGroupInformation or the JDK itself, or to spin up a trivial example with a short relogin window to reproduce the problem.
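
For the JDK side, one option is the JDK's own Kerberos debug switches, sun.security.krb5.debug and sun.security.jgss.debug. Below is a minimal sketch, assuming you can only change application code; passing the same settings as -D flags on the JVM command line is equivalent and avoids any ordering concerns. KerberosDebug is a hypothetical helper name, not part of Hadoop or Accumulo. For the Hadoop side, raising the org.apache.hadoop.security.UserGroupInformation logger to DEBUG in your logging configuration should show UGI's login and relogin decisions.

```java
/**
 * Hypothetical helper that switches on the JDK's built-in Kerberos and JGSS
 * debug output. The JDK typically reads these properties once, when its
 * Kerberos classes are first loaded, so call enable() before the first
 * Hadoop/Accumulo login; setting them afterwards may have no effect.
 */
final class KerberosDebug {
    static final String[] PROPERTIES = {
        "sun.security.krb5.debug", // KDC requests, keytab and ticket-cache reads
        "sun.security.jgss.debug"  // GSS-API security context setup
    };

    private KerberosDebug() {}

    static void enable() {
        for (String property : PROPERTIES) {
            System.setProperty(property, "true");
        }
    }
}
```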

On 7/12/17 3:48 PM, James Srinivasan wrote:
Yup, I'm going to spin up a vanilla 1.7.0 (maybe newer) install too, to
see if it behaves any differently. There is at least one patch
included in their distro that isn't listed in the official documentation,
plus the distro makes it rather difficult to match line numbers in logs
to the source code.

Thanks,

James

On 12 July 2017 at 20:37, Sean Busbey <[email protected]> wrote:
Hi James!

It sounds like you may need to chase things down with your vendor,
since the precise combination of patches they include will make it
hard for the community to look into things.

On Wed, Jul 12, 2017 at 11:01 AM, James Srinivasan
<[email protected]> wrote:
Hi,

So I've started a thread that makes the periodic
checkTGTAndReloginFromKeytab call, and it seems to be running, but the
connection still fails with GSS errors after precisely 10 hours.

While I am running 1.7.0, it seems the vendor included the
ACCUMULO-4069 patch, and immediately after the exception is thrown I
see a log entry "Performing ticket-cache-based Kerberos re-login".
However, it should be using a keytab. I've turned up the logging to
11 and will leave it running overnight...

James

On 11 July 2017 at 16:17, Josh Elser <[email protected]> wrote:
Nope, you've got it exactly right! That's the code I would've pointed you at
to copy :)

If/when you do get to long-running MR jobs, see the
"general.delegation.token.*" configuration properties in this table[1]. I
think the docs state that a delegation token is valid for 7 days, but
it's been a long time since writing/testing that code.

- Josh

[1]
https://accumulo.apache.org/1.8/accumulo_user_manual.html#_server_configuration_2

On 7/11/17 1:25 AM, James Srinivasan wrote:

Thanks both. I can't (easily) upgrade beyond 1.7.0, but have raised a
support case with our Hadoop distribution vendor.

I'm not (yet) worried about expiration with MapReduce - for now I'll
try to keep such jobs to under 24h! Outside MR, it sounds like I just
need to periodically call
UserGroupInformation.checkTGTAndReloginFromKeytab, as in

https://github.com/apache/accumulo/blob/master/server/base/src/main/java/org/apache/accumulo/server/security/SecurityUtil.java#L121

Or is the TGT associated with an Accumulo KerberosToken separate?

Thanks,

James

On 11 July 2017 at 02:59, Josh Elser <[email protected]> wrote:

No, you are (likely) not running into ACCUMULO-4069. What you've
described sounds like your client's ticket expired. Accumulo does not
spawn any ticket renewal on behalf of clients.

Hadoop's UGI code will automatically spawn a renewal thread when you
log in using a ticket cache. This does not happen automatically when
you use a keytab (I have no explanation as to why this is). This is
the most likely cause of your error and something you need to correct
in your application (spawn a thread to renew your application's
ticket).
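
The renewal thread described above can be sketched with the JDK alone. This is a minimal sketch, assuming the relogin step is passed in as a lambda; in a real client it would wrap UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab() (the call mentioned elsewhere in this thread), which is left out so the example compiles without Hadoop on the classpath. TicketRenewer and ReloginAction are hypothetical names, not part of Accumulo or Hadoop.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper: runs a Kerberos relogin action on a daemon thread at a
 * fixed delay. In a Hadoop client the action would typically be
 *   () -> UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab()
 * which is omitted here so the sketch compiles with the JDK alone.
 */
final class TicketRenewer implements AutoCloseable {

    /** One relogin attempt; may throw, e.g. if the KDC is unreachable. */
    interface ReloginAction {
        void relogin() throws Exception;
    }

    private final ScheduledExecutorService scheduler;
    private final AtomicInteger failures = new AtomicInteger();

    TicketRenewer(ReloginAction action, long period, TimeUnit unit) {
        ThreadFactory daemonThreads = runnable -> {
            Thread thread = new Thread(runnable, "kerberos-relogin");
            thread.setDaemon(true); // do not keep the JVM alive just for renewal
            return thread;
        };
        scheduler = Executors.newSingleThreadScheduledExecutor(daemonThreads);
        // First attempt runs immediately, then `period` after each attempt ends.
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                action.relogin();
            } catch (Exception e) {
                // An exception escaping the task would cancel all future runs,
                // so record it and keep renewing.
                failures.incrementAndGet();
                System.err.println("Kerberos relogin failed: " + e);
            }
        }, 0, period, unit);
    }

    /** Number of relogin attempts that have thrown so far. */
    int failureCount() {
        return failures.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Wiring it to Hadoop might look like new TicketRenewer(() -> UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab(), 1, TimeUnit.MINUTES), created after UserGroupInformation.loginUserFromKeytab(principal, keytabPath). As far as I know, checkTGTAndReloginFromKeytab does nothing unless the TGT is close to expiring, so a short period should be cheap; the one-minute value is only an assumption. Catching every exception inside the task matters because ScheduledExecutorService silently stops rescheduling a task once it throws.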

If you are using MapReduce, you have yet another layer of indirection
with DelegationTokens, but that's probably not what you're seeing (as
DelegationTokens don't actually have a Kerberos TGT).

On Mon, Jul 10, 2017 at 5:42 PM, Christopher <[email protected]> wrote:

It certainly sounds like the same issue. I'd recommend upgrading to
1.7.3 (currently the latest 1.7 version) to pick up the fixes for all
the bugs we've found in that release line.

On Mon, Jul 10, 2017 at 5:50 AM James Srinivasan
<[email protected]> wrote:


I'm using Accumulo 1.7.0 and finding that after some period of time
(>8 hours, <3 days - happened over the weekend) my ingest fails with
errors regarding "Failed to find any Kerberos tgt". My guess is that
the ticket from the keytab has expired, and needs to be renewed - from
memory, I had seen a Kerberos tgt renewer thread running in my client,
so assumed it happened automagically. Is that the case? Perhaps I am
hitting this bug? https://issues.apache.org/jira/browse/ACCUMULO-4069

Thanks,

James



--
busbey
