Michael,

On 7/8/19 03:58, Osipov, Michael wrote:
> Christopher,
> 
> Am 2019-07-05 um 19:07 schrieb Christopher Schultz:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA256
>>
>> Michael,
>>
>> On 7/5/19 11:00, Osipov, Michael wrote:
>>> Hi Christopher,
>>>
>>> Am 2019-07-02 um 17:49 schrieb [ext] Osipov, Michael:
>>>>
>>>> [...]
>>>>> During your ~1min stall, Tomcat is still waiting for data,
>>>>> right? When the connection fails, Tomcat drops its error
>>>>> message at the same time, right? Can you post a stack trace of
>>>>> what the Tomcat thread is doing at that time? I assume it's
>>>>> blocked on a read of some kind.
>>>>
>>>> I need to check this with jstack. I'll get back to you as soon
>>>> as possible.
>>>
>>> So I checked this and was able to get the dump at the very moment
>>> the request stalled. To my disappointment the offending thread was
>>> neither holding a lock nor waiting in read() on the native socket.
>>>
>>> I have noticed this:
>>>> "http-apr-127.0.1.2-8081-exec-3" #33 daemon prio=5 os_prio=15 tid=0x0000000a68036800 nid=0x188be runnable [0x00007fffdd1cc000]
>>>>    java.lang.Thread.State: RUNNABLE
>>>>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>>>>     - locked <0x0000000965edc140> (a java.net.SocksSocketImpl)
>>>>     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>>>>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>>>>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>>     at java.net.Socket.connect(Socket.java:589)
>>>>     at java.net.Socket.connect(Socket.java:538)
>>>>     at java.net.Socket.<init>(Socket.java:434)
>>>>     at java.net.Socket.<init>(Socket.java:211)
>>>>     at com.sun.jndi.ldap.Connection.createSocket(Connection.java:375)
>>>>     at com.sun.jndi.ldap.Connection.<init>(Connection.java:215)
>>>>     at com.sun.jndi.ldap.LdapClient.<init>(LdapClient.java:137)
>>>>     at com.sun.jndi.ldap.LdapClient.getInstance(LdapClient.java:1609)
>>>>     at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2749)
>>>>     at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:319)
>>>>     at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:199)
>>>>     at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:217)
>>>>     at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:195)
>>>>     at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:217)
>>>>     at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:156)
>>>>     at com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:86)
>>>>     at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:684)
>>>>     at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313)
>>>>     at javax.naming.InitialContext.init(InitialContext.java:244)
>>>>     at javax.naming.InitialContext.<init>(InitialContext.java:216)
>>>>     at javax.naming.directory.InitialDirContext.<init>(InitialDirContext.java:101)
>>>>     at net.sf.michaelo.dirctxsrc.DirContextSource$GSSInitialDirContext.<init>(DirContextSource.java:115)
>>>>     at net.sf.michaelo.dirctxsrc.DirContextSource$1.run(DirContextSource.java:606)
>>>>     at net.sf.michaelo.dirctxsrc.DirContextSource$1.run(DirContextSource.java:583)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>     at net.sf.michaelo.dirctxsrc.DirContextSource.getGssApiDirContext(DirContextSource.java:583)
>>>>     at net.sf.michaelo.dirctxsrc.DirContextSource.getDirContext(DirContextSource.java:692)
>>>>     at net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.open(ActiveDirectoryRealm.java:321)
>>>>     at net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.getPrincipal(ActiveDirectoryRealm.java:268)
>>>>     at net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.authenticate(ActiveDirectoryRealm.java:255)
>>>>     at net.sf.michaelo.tomcat.authenticator.SpnegoAuthenticator.doAuthenticate(SpnegoAuthenticator.java:166)
>>>>     at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:575)
>>>>     at org.apache.catalina.valves.rewrite.RewriteValve.invoke(RewriteValve.java:556)
>>>>
>>>
>>> We query the Active Directory via LDAP with the user's Kerberos
>>> principal. As you can see, the thread is waiting for a socket to
>>> connect. No DCs are hardcoded; they are all retrieved via DNS SRV
>>> lookups for our AD site. The point here is that we have major
>>> trouble with two of four DCs at our site not responding properly
>>> for services like DNS, Kerberos, and LDAP. (Completely out of my
>>> department's control.) I have made a quick standalone reproducer to
>>> try those faulty DCs on port 389/3268 and I had my confirmation.
>>> They do block the thread for more than a minute (OS connect
>>> timeout).
>>>
>>> Our countermeasures were to reduce the default connect timeout
>>> for InitialDirContext down to 1000 ms and to query another local AD
>>> site which is not serving our subnet.
>>>
>>> So, thank you very much for giving me the right pointer to start!
>>
>> Strange that everything seems to work well when you connect directly
>> to Tomcat. Can you confirm that you *never* have any issues connecting
>> directly to Tomcat? Or did you just get lucky a few times?
> 
> The issue did not show up via Tomcat directly because Tomcat does not
> drop the request (timeout); the client simply waits for it. Our previous
> services never used HTTPd as a reverse proxy. I started to use it to gain
> some experience and prepare for potential balancing requirements.
> 
>>> One question arises though: How do I properly size the
>>> ProxyTimeout parameter? The longest possible request?
>>
>> I think that's really up to you. If it's too low, you'll probably end
>> up with many hung LDAP queries with no client waiting on them, right?
>> If it's too high, you'll make users wait and they might just stop and
>> try again, which leads to the same result.
>>
>> What if you add a timeout to your LDAP queries instead?
> 
> The queries do not hang. The connect does. As soon as the connection is
> established, it is pretty fast. I have now set
> "com.sun.jndi.ldap.connect.timeout=1000" and verified it to work. It
> will quickly fail over to the next available DC from SRV RRs.

It's good that (a) it can be configured and (b) it actually works. Is
that LDAP connect timeout set as a system property? Can it be set on a
per-connection basis using e.g. a connection URL parameter? I'm mostly
asking for my own edification, and possibly to help anyone with a
similar problem in the future.
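
For the archives, here is a minimal sketch of what I understand the
per-context form to look like: the JDK's JNDI LDAP provider reads
"com.sun.jndi.ldap.connect.timeout" (a string of milliseconds) from the
environment Hashtable handed to InitialDirContext, so it applies to that
context only. The class name, method names, and the
ldap://dc.example.com:389 URL below are made up for illustration; this
is not Michael's actual DirContextSource code, and I am not aware of a
URL-parameter equivalent.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

// Hypothetical helper, for illustration only.
public class LdapConnectTimeoutSketch {

    // Build a JNDI environment that carries its own connect timeout.
    static Hashtable<String, String> buildEnv(String providerUrl, int connectTimeoutMillis) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, providerUrl);
        // The provider expects the value as a string of milliseconds.
        env.put("com.sun.jndi.ldap.connect.timeout", Integer.toString(connectTimeoutMillis));
        return env;
    }

    // Opens a context; the connect attempt should give up after the timeout
    // instead of waiting for the OS-level connect timeout.
    static DirContext open(String providerUrl, int connectTimeoutMillis) throws NamingException {
        return new InitialDirContext(buildEnv(providerUrl, connectTimeoutMillis));
    }

    public static void main(String[] args) {
        // No network access here: only show the environment that would be used.
        Hashtable<String, String> env = buildEnv("ldap://dc.example.com:389", 1000);
        System.out.println("connect timeout = " + env.get("com.sun.jndi.ldap.connect.timeout"));
    }
}
```

Calling open() against an unreachable DC should then fail with a
NamingException after roughly the configured time, at which point the
caller can move on to the next host from the SRV records.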

Good luck with your unreliable network :)

-chris

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
