Hi Christopher,
On 2019-07-02 at 17:49, [ext] Osipov, Michael wrote:
[...]
During your ~1min stall, Tomcat is still waiting for data, right? When
the connection fails, Tomcat drops its error message at the same time,
right? Can you post a stack trace of what the Tomcat thread is doing
at that time? I assume it's blocked on a read of some kind.
I need to check this with jstack. I'll get back to you as soon as possible.
So I checked this and managed to take the dump at the very moment the
request stalled. To my disappointment, the offending thread was not
blocked, nor was it waiting in read() on the native socket.
What I did notice is this:
"http-apr-127.0.1.2-8081-exec-3" #33 daemon prio=5 os_prio=15 tid=0x0000000a68036800 nid=0x188be runnable [0x00007fffdd1cc000]
   java.lang.Thread.State: RUNNABLE
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	- locked <0x0000000965edc140> (a java.net.SocksSocketImpl)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at java.net.Socket.connect(Socket.java:538)
	at java.net.Socket.<init>(Socket.java:434)
	at java.net.Socket.<init>(Socket.java:211)
	at com.sun.jndi.ldap.Connection.createSocket(Connection.java:375)
	at com.sun.jndi.ldap.Connection.<init>(Connection.java:215)
	at com.sun.jndi.ldap.LdapClient.<init>(LdapClient.java:137)
	at com.sun.jndi.ldap.LdapClient.getInstance(LdapClient.java:1609)
	at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2749)
	at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:319)
	at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:199)
	at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:217)
	at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:195)
	at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:217)
	at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:156)
	at com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:86)
	at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:684)
	at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313)
	at javax.naming.InitialContext.init(InitialContext.java:244)
	at javax.naming.InitialContext.<init>(InitialContext.java:216)
	at javax.naming.directory.InitialDirContext.<init>(InitialDirContext.java:101)
	at net.sf.michaelo.dirctxsrc.DirContextSource$GSSInitialDirContext.<init>(DirContextSource.java:115)
	at net.sf.michaelo.dirctxsrc.DirContextSource$1.run(DirContextSource.java:606)
	at net.sf.michaelo.dirctxsrc.DirContextSource$1.run(DirContextSource.java:583)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at net.sf.michaelo.dirctxsrc.DirContextSource.getGssApiDirContext(DirContextSource.java:583)
	at net.sf.michaelo.dirctxsrc.DirContextSource.getDirContext(DirContextSource.java:692)
	at net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.open(ActiveDirectoryRealm.java:321)
	at net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.getPrincipal(ActiveDirectoryRealm.java:268)
	at net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.authenticate(ActiveDirectoryRealm.java:255)
	at net.sf.michaelo.tomcat.authenticator.SpnegoAuthenticator.doAuthenticate(SpnegoAuthenticator.java:166)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:575)
	at org.apache.catalina.valves.rewrite.RewriteValve.invoke(RewriteValve.java:556)
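Where jstack is not at hand, roughly the same check can be made from inside the JVM. This is a minimal sketch, assuming JDK 8, whose native connect frame `java.net.PlainSocketImpl.socketConnect` is the top frame in the dump above (newer JDKs use a different socket implementation, so the frame name would differ). The class name is hypothetical, the `http-apr-` prefix is taken from the thread name in the dump, and the code only sees threads of the JVM it runs in, so it would have to be wired into the Tomcat instance itself (for example via a diagnostic servlet).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class StalledConnectFinder {

    /** True if the stack contains the JDK 8 native connect frame seen in the jstack output. */
    static boolean isInSocketConnect(StackTraceElement[] frames) {
        for (StackTraceElement frame : frames) {
            if ("java.net.PlainSocketImpl".equals(frame.getClassName())
                    && "socketConnect".equals(frame.getMethodName())) {
                return true;
            }
        }
        return false;
    }

    /** Names of live threads (matching the given name prefix) that are currently inside a socket connect. */
    static List<String> findStalledThreads(String namePrefix) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            if (thread.getName().startsWith(namePrefix) && isInSocketConnect(entry.getValue())) {
                names.add(thread.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // "http-apr-" matches the Tomcat connector thread names in the dump above.
        String prefix = args.length > 0 ? args[0] : "http-apr-";
        for (String name : findStalledThreads(prefix)) {
            System.out.println("Stalled in connect: " + name);
        }
    }
}
```

Polling this every few seconds during a stall gives a similar picture to repeated jstack runs, without attaching an external tool.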
We query Active Directory via LDAP with the user's Kerberos principal.
As you can see, the thread is waiting for a socket connection to be
established. No DCs are hardcoded; they are all retrieved via DNS SRV
lookups for our AD site. The problem is that two of the four DCs at our
site do not respond properly to services like DNS, Kerberos, and LDAP
(completely out of my department's control).
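For illustration, a minimal sketch of a DNS SRV lookup like the one described, using only the JDK's JNDI DNS provider. The site name `MySite`, the domain `example.com` and the class name are hypothetical; the `_ldap._tcp.<site>._sites.dc._msdcs.<domain>` name follows the usual AD DC locator convention; and sorting by priority, then by descending weight, is a simplification of RFC 2782's weighted random selection. It is not necessarily what DirContextSource or the JDK do internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Hashtable;
import java.util.List;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class SiteDcLookup {

    /** One SRV target: "priority weight port host". */
    static final class SrvTarget {
        final int priority;
        final int weight;
        final int port;
        final String host;

        SrvTarget(int priority, int weight, int port, String host) {
            this.priority = priority;
            this.weight = weight;
            this.port = port;
            this.host = host;
        }

        @Override
        public String toString() {
            return host + ":" + port;
        }
    }

    /** Parses an SRV value as returned by the JNDI DNS provider, dropping the trailing dot. */
    static SrvTarget parse(String record) {
        String[] parts = record.trim().split("\\s+");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an SRV record: " + record);
        }
        String host = parts[3].endsWith(".") ? parts[3].substring(0, parts[3].length() - 1) : parts[3];
        return new SrvTarget(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]),
                Integer.parseInt(parts[2]), host);
    }

    /** Lowest priority first; within a priority, higher weight first (simplified RFC 2782). */
    static List<SrvTarget> order(List<String> records) {
        List<SrvTarget> targets = new ArrayList<>();
        for (String record : records) {
            targets.add(parse(record));
        }
        targets.sort(Comparator.comparingInt((SrvTarget t) -> t.priority).thenComparingInt(t -> -t.weight));
        return targets;
    }

    /** Queries the system-configured DNS servers for the SRV records of a name. */
    static List<String> lookupSrv(String name) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
        DirContext dns = new InitialDirContext(env);
        try {
            Attributes attrs = dns.getAttributes(name, new String[] {"SRV"});
            Attribute srv = attrs.get("SRV");
            List<String> values = new ArrayList<>();
            if (srv != null) {
                NamingEnumeration<?> all = srv.getAll();
                while (all.hasMore()) {
                    values.add(all.next().toString());
                }
            }
            return values;
        } finally {
            dns.close();
        }
    }

    public static void main(String[] args) throws NamingException {
        // Hypothetical site and domain; pass the real SRV name as the first argument.
        String name = args.length > 0 ? args[0] : "_ldap._tcp.MySite._sites.dc._msdcs.example.com";
        for (SrvTarget target : order(lookupSrv(name))) {
            System.out.println(target);
        }
    }
}
```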
I wrote a quick standalone reproducer to test those faulty DCs on ports
389 and 3268, and it confirmed my suspicion: they block the calling
thread for more than a minute (the OS connect timeout).
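A reproducer of this kind might look like the following minimal sketch; it is not the one actually used, and the host name and class name are hypothetical. The frames above show the LDAP provider going through the Socket(String, int) constructor, which connects without an application-level timeout, so the OS connect timeout applies; a timeout of 0 passed to Socket.connect(SocketAddress, int) reproduces that, while a positive value shows the effect of a bounded timeout.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectProbe {

    /** Outcome of one connect attempt. */
    static final class Result {
        final boolean connected;
        final long millis;
        final String error; // null on success

        Result(boolean connected, long millis, String error) {
            this.connected = connected;
            this.millis = millis;
            this.error = error;
        }

        @Override
        public String toString() {
            return (connected ? "connected" : "failed (" + error + ")") + " after " + millis + " ms";
        }
    }

    /** Connects and immediately closes; timeoutMs == 0 means "wait for the OS connect timeout". */
    static Result probe(String host, int port, int timeoutMs) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return new Result(true, elapsedMillis(start), null);
        } catch (IOException e) {
            return new Result(false, elapsedMillis(start), e.getClass().getSimpleName() + ": " + e.getMessage());
        }
    }

    private static long elapsedMillis(long startNanos) {
        return (System.nanoTime() - startNanos) / 1_000_000;
    }

    public static void main(String[] args) {
        // Hypothetical DC host; pass the real one (and optionally a timeout in ms) as arguments.
        String host = args.length > 0 ? args[0] : "dc01.example.com";
        int timeoutMs = args.length > 1 ? Integer.parseInt(args[1]) : 0;
        for (int port : new int[] {389, 3268}) { // LDAP and Global Catalog
            System.out.println(host + ":" + port + " -> " + probe(host, port, timeoutMs));
        }
    }
}
```

Running it once with 0 and once with 1000 against the same DC makes the difference between the OS timeout and a bounded connect timeout directly visible.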
Our countermeasures were to reduce the default connect timeout for
InitialDirContext to 1000 ms and to query another local AD site, one
that does not serve our subnet.
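With plain JNDI, the timeout countermeasure can be sketched as below, assuming the JDK's LDAP provider. `com.sun.jndi.ldap.connect.timeout` and `com.sun.jndi.ldap.read.timeout` are that provider's environment properties (milliseconds, given as strings); the host names, the class name and the 5000 ms read timeout are hypothetical, Kerberos/GSSAPI settings are left out, and how DirContextSource passes such properties through is not shown here.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class LdapTimeouts {

    // Environment properties understood by the JDK's LDAP service provider.
    static final String CONNECT_TIMEOUT = "com.sun.jndi.ldap.connect.timeout";
    static final String READ_TIMEOUT = "com.sun.jndi.ldap.read.timeout";

    /** Builds a JNDI environment with bounded connect and read timeouts (milliseconds). */
    static Hashtable<String, Object> environment(String providerUrls, int connectTimeoutMs, int readTimeoutMs) {
        Hashtable<String, Object> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        // Space-separated list: the provider tries each URL in turn until one connects.
        env.put(Context.PROVIDER_URL, providerUrls);
        // The provider reads both values as Strings, not Integers.
        env.put(CONNECT_TIMEOUT, Integer.toString(connectTimeoutMs));
        env.put(READ_TIMEOUT, Integer.toString(readTimeoutMs));
        return env;
    }

    /** Opens a context without authentication settings (anonymous bind). */
    static DirContext open(String providerUrls, int connectTimeoutMs, int readTimeoutMs) throws NamingException {
        return new InitialDirContext(environment(providerUrls, connectTimeoutMs, readTimeoutMs));
    }

    public static void main(String[] args) throws NamingException {
        // Hypothetical DC URLs; pass real ones as the first argument.
        String urls = args.length > 0 ? args[0] : "ldap://dc01.example.com:389 ldap://dc02.example.com:389";
        DirContext ctx = open(urls, 1000, 5000);
        try {
            System.out.println("Connected, name in namespace: '" + ctx.getNameInNamespace() + "'");
        } finally {
            ctx.close();
        }
    }
}
```

Since the URLs are tried one after another, a request that happens to hit both faulty DCs first may still wait roughly twice the connect timeout before a healthy DC answers.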
So, thank you very much for giving me the right pointer to start with!
One question remains, though: how do I properly size the ProxyTimeout
parameter? Should it match the duration of the longest possible request?
Regards,
Michael