-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Michael,

On 7/8/19 15:36, Osipov, Michael wrote:
> Christopher,
>
> On 2019-07-08 at 19:55, Christopher Schultz wrote:
>> Michael,
>>
>> On 7/8/19 03:58, Osipov, Michael wrote:
>>> Christopher,
>>>
>>> On 2019-07-05 at 19:07, Christopher Schultz wrote:
>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>> Hash: SHA256
>>>>
>>>> Michael,
>>>>
>>>> On 7/5/19 11:00, Osipov, Michael wrote:
>>>>> Hi Christopher,
>>>>>
>>>>> On 2019-07-02 at 17:49, [ext] Osipov, Michael wrote:
>>>>>>
>>>>>> [...]
>>>>>>> During your ~1min stall, Tomcat is still waiting for data,
>>>>>>> right? When the connection fails, Tomcat drops its error
>>>>>>> message at the same time, right? Can you post a stack trace
>>>>>>> of what the Tomcat thread is doing at that time? I assume
>>>>>>> it's blocked on a read of some kind.
>>>>>>
>>>>>> I need to check this with jstack. I'll get back to you as
>>>>>> soon as possible.
>>>>>
>>>>> So I checked this and was able to get the dump right at the
>>>>> moment the request stalled. To my disappointment, the offending
>>>>> thread was neither blocked on a lock nor waiting in read() on
>>>>> the native socket.
>>>>>
>>>>> I have noticed this:
>>>>>> "http-apr-127.0.1.2-8081-exec-3" #33 daemon prio=5 os_prio=15 tid=0x0000000a68036800 nid=0x188be runnable [0x00007fffdd1cc000]
>>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>>>     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>>>>>>     - locked <0x0000000965edc140> (a java.net.SocksSocketImpl)
>>>>>>     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>>>>>>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>>>>>>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>>>>     at java.net.Socket.connect(Socket.java:589)
>>>>>>     at java.net.Socket.connect(Socket.java:538)
>>>>>>     at java.net.Socket.<init>(Socket.java:434)
>>>>>>     at java.net.Socket.<init>(Socket.java:211)
>>>>>>     at com.sun.jndi.ldap.Connection.createSocket(Connection.java:375)
>>>>>>     at com.sun.jndi.ldap.Connection.<init>(Connection.java:215)
>>>>>>     at com.sun.jndi.ldap.LdapClient.<init>(LdapClient.java:137)
>>>>>>     at com.sun.jndi.ldap.LdapClient.getInstance(LdapClient.java:1609)
>>>>>>     at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2749)
>>>>>>     at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:319)
>>>>>>     at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:199)
>>>>>>     at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:217)
>>>>>>     at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:195)
>>>>>>     at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:217)
>>>>>>     at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:156)
>>>>>>     at com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:86)
>>>>>>     at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:684)
>>>>>>     at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313)
>>>>>>     at javax.naming.InitialContext.init(InitialContext.java:244)
>>>>>>     at javax.naming.InitialContext.<init>(InitialContext.java:216)
>>>>>>     at javax.naming.directory.InitialDirContext.<init>(InitialDirContext.java:101)
>>>>>>     at net.sf.michaelo.dirctxsrc.DirContextSource$GSSInitialDirContext.<init>(DirContextSource.java:115)
>>>>>>     at net.sf.michaelo.dirctxsrc.DirContextSource$1.run(DirContextSource.java:606)
>>>>>>     at net.sf.michaelo.dirctxsrc.DirContextSource$1.run(DirContextSource.java:583)
>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>     at net.sf.michaelo.dirctxsrc.DirContextSource.getGssApiDirContext(DirContextSource.java:583)
>>>>>>     at net.sf.michaelo.dirctxsrc.DirContextSource.getDirContext(DirContextSource.java:692)
>>>>>>     at net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.open(ActiveDirectoryRealm.java:321)
>>>>>>     at net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.getPrincipal(ActiveDirectoryRealm.java:268)
>>>>>>     at net.sf.michaelo.tomcat.realm.ActiveDirectoryRealm.authenticate(ActiveDirectoryRealm.java:255)
>>>>>>     at net.sf.michaelo.tomcat.authenticator.SpnegoAuthenticator.doAuthenticate(SpnegoAuthenticator.java:166)
>>>>>>     at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:575)
>>>>>>     at org.apache.catalina.valves.rewrite.RewriteValve.invoke(RewriteValve.java:556)
>>>>>
>>>>> We query the Active Directory via LDAP with the user's Kerberos
>>>>> principal. As you can see, the thread is waiting for a socket
>>>>> to connect. No DCs are hardcoded; they are all retrieved via
>>>>> DNS SRV lookups for our AD site.
>>>>> The point here is that we have major trouble with two of the
>>>>> four DCs at our site not properly responding to services like
>>>>> DNS, Kerberos, and LDAP. (Completely out of my department's
>>>>> control.) I have made a quick standalone reproducer to try
>>>>> those faulty DCs on port 389/3268 and I had my confirmation.
>>>>> They do block the thread for more than a minute (OS connect
>>>>> timeout).
>>>>>
>>>>> Our countermeasures were to reduce the default connect timeout
>>>>> for InitialDirContext down to 1000 ms and to query another
>>>>> local AD site which is not serving our subnet.
>>>>>
>>>>> So, thank you very much for giving me the right pointer to
>>>>> start!
>>>>
>>>> Strange that everything seems to work well when you connect
>>>> directly to Tomcat. Can you confirm that you *never* have any
>>>> issues connecting directly to Tomcat? Or did you just get lucky
>>>> a few times?
>>>
>>> The issue did not show up via Tomcat directly because Tomcat does
>>> not drop the request (timeout); the client simply waits for it.
>>> Our previous services never used HTTPd as a reverse proxy. I
>>> started to use it to gain some experience and prepare for
>>> potential balancing requirements.
>>>
>>>>> One question arises though: How do I properly size the
>>>>> ProxyTimeout parameter? The longest possible request?
>>>>
>>>> I think that's really up to you. If it's too low, you'll
>>>> probably end up with many hung LDAP queries with no client
>>>> waiting on them, right? If it's too high, you'll make users wait
>>>> and they might just stop and try again, which comes to the same
>>>> conclusion.
>>>>
>>>> What if you add a timeout to your LDAP queries instead?
>>>
>>> The queries do not hang. The connect does. As soon as the
>>> connection is established, it is pretty fast. I have now set
>>> "com.sun.jndi.ldap.connect.timeout=1000" and verified it to work.
>>> It will quickly fail over to the next available DC from the SRV
>>> RRs.
>>
>> That's good that (a) it can be configured and (b) it actually
>> works. Is that LDAP connect timeout set as a system property? Can
>> it be set on a per-connection basis using e.g. a connection URL
>> parameter? I'm mostly asking for my own edification, and possibly
>> to help anyone with a similar problem in the future.
>
> That's actually straightforward:
>
>> <GlobalNamingResources>
>>   <Resource name="gc/ad001.siemens.net"
>>     type="net.sf.michaelo.dirctxsrc.DirContextSource"
>>     factory="net.sf.michaelo.dirctxsrc.DirContextSourceFactory"
>>     urls="ldap://ad001.siemens.net:3268"
>>     auth="gssapi"
>>     loginEntryName="tomcat-initiate"
>>     referral="ignore"
>>     additionalProperties="com.siemens.dymowerk.activedirectory.site=S-DEBLN-01;com.sun.jndi.ldap.connect.timeout=1000"
>>   />
>> </GlobalNamingResources>

Oh, interesting -- it's not a part of the URL. That's unfortunate that
it has to be set at the <Resource> level and it can't be set on a
per-server basis. Oh, well. At least it's not a system property
affecting the whole JVM!
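
For the archives, here is a minimal sketch of what setting that
property per context looks like in plain JNDI, without
DirContextSource. It only relies on the standard javax.naming API and
the "com.sun.jndi.ldap.connect.timeout" environment property mentioned
above (value in milliseconds). The class and method names are made up
for illustration, the host name in the usage text is a placeholder,
and the GSSAPI/Kerberos setup from the real deployment is left out
entirely, so treat it as an illustration rather than what
DirContextSource does internally.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

// Hypothetical helper class; not part of Tomcat or DirContextSource.
class LdapConnectTimeoutSketch {

    // Environment property read by the JDK's LDAP provider (milliseconds).
    static final String CONNECT_TIMEOUT = "com.sun.jndi.ldap.connect.timeout";

    // Builds a per-context environment, so the timeout applies only to
    // contexts created from it and not to the whole JVM.
    static Hashtable<String, String> buildEnvironment(String providerUrl,
                                                      int connectTimeoutMillis) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        // PROVIDER_URL may hold several space-separated LDAP URLs.
        env.put(Context.PROVIDER_URL, providerUrl);
        env.put(CONNECT_TIMEOUT, Integer.toString(connectTimeoutMillis));
        return env;
    }

    // Opens a context; a server that does not accept the TCP connection
    // within the timeout results in a NamingException instead of a long stall.
    static DirContext open(String providerUrl, int connectTimeoutMillis)
            throws NamingException {
        return new InitialDirContext(buildEnvironment(providerUrl, connectTimeoutMillis));
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: LdapConnectTimeoutSketch ldap://dc.example.invalid:3268");
            return;
        }
        long start = System.nanoTime();
        try {
            DirContext ctx = open(args[0], 1000);
            System.out.println("connected");
            ctx.close();
        } catch (NamingException e) {
            System.out.println("failed: " + e);
        }
        System.out.println("elapsed ms: " + (System.nanoTime() - start) / 1_000_000);
    }
}
```

Running it against one of the unresponsive DCs should fail after
roughly the configured timeout rather than after the OS connect
timeout, which is the behaviour Michael describes.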

- -chris
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAl0l7OoACgkQHPApP6U8
pFgdVBAAjNHmo5PmXnfYuiFaMVeg+fDfAs978+FDZD5Vbn/P5V/NsP933A8nYc60
wDC4kfNsSNzm1CmyvhnptyMpbQokMuDdds3mxa6yHrTgmtlzTvivF4yG8rgNjBvp
zAJxmuQVpuL6bwxeuRV2GDTOrW04RqQpTR8Aagi53ExjRfqigisIbIMjK8EIyZVk
+732bgax6xta6SMTjb8xSXxV/NwtfmyUpsYfBTXlDkdmxdixA9JHXiVRq+ZPGX88
itZYCjA265LYd94gg0b6+sIBH6t27oUHnJ+eMPii+9a3zJK7BKghBEG00pT4T0bf
glFToNi2+jHroFuWomBBJoQvCdoePLwdxYGo/mKAdF+WN+2NbjPl6knGmaHVhxm0
lnTDsJIpi1F8EelvELpGVKlgkRAeuf2pAgTRptPCDVEBjlf3eDJyAQDLikI+CRvh
c9pnDS+4Uj/UeUFloMoQMa5i3OuFt4PY7Kw6qrQDy+UUU4UCFRsdDSzUtWU3DHrn
7FR9Ba8Dygz/z5ErKK+6ezvtcdrKjWN90MPJsx+JuxdRBp9xOg88lyugJfw3kl3h
di3TLj6dcKkMDGlqnkg52TmB92mJFCuni7yuwM25aTwH+p1caawusUzNpCgjJsgy
/rYGrJrN6idMkj3zGKZzfxdyAFjR2l/ly4ClcmR5DMybkMN2qcU=
=EAb4
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org