I've not experienced this issue myself. It's an interesting one, and Gregg's response is also intriguing.
I know it's not that helpful to you, but I'll see what I can do about including something about this on the River site or wiki.

Chris, if you feel this is an issue that River can/should solve, then please create a Jira for it; otherwise it'll get lost in the mists of time.

On Tue, Mar 15, 2011 at 4:42 PM, Christopher Dolan <[email protected]> wrote:

> Understood, increasing that value to something large would make me just
> suffer that timeout once per remote machine per reboot. Is this the
> solution most River users have employed, or have most of you simply
> never had to deal with this problem? In my case, I may connect to
> hundreds of remote machines via an app that wants a short startup time,
> so this solution concerns me.
>
> Chris
>
> -----Original Message-----
> From: Gregg Wonderly [mailto:[email protected]]
> Sent: Sunday, March 13, 2011 9:08 AM
> To: [email protected]
> Subject: Re: reverse DNS timeouts and SocketPermission
>
> Changing the DNS failure TTL is the most useful way to deal with this.
> 10 seconds is the default, and a failing DNS query will take longer
> than that. So every use of the name will result in a new attempt to
> look up the same thing on the same failing server.
>
> Gregg
>
> Sent from my iPhone
>
> On Mar 10, 2011, at 3:06 PM, "Christopher Dolan"
> <[email protected]> wrote:
>
>> The java.net.SocketPermission class uses forward and reverse DNS lookups
>> to ensure that we're allowed to talk to particular remote machines.
>> These lookups are used to canonicalize a remote host's name to ensure
>> that variations in that name don't lead to false negatives.
>>
>> However, many people have found
>> (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4975882) that if
>> there are configuration errors in a DNS system, the reverse DNS failures
>> cause very significant latency (e.g. I've seen 10-12 seconds). This
>> latency has widely varying effects on a djinn. In many cases, it just
>> causes LookupCache slowdowns, which can be mitigated by delayed
>> deserialization techniques discussed previously on the dev@ mailing
>> list. But in some cases, I've seen it cause Reggie to hang up for a
>> while (I still don't understand where in Reggie the problem occurs,
>> maybe EventListeners?).
>>
>> Obviously, the real solution is to properly configure DNS. But I would
>> like to know how other people have addressed this issue in their
>> deployments.
>>
>> * Do you ensure the RMI codebase URLs all use canonical hostnames, or
>>   IP addresses?
>> * Do you ensure that the TcpServerEndpoint has a consistent (perhaps
>>   hard-coded) name?
>> * Do you have monitoring or logging code to proactively detect DNS
>>   configuration errors?
>> * Do you fiddle the Java security property
>>   "networkaddress.cache.negative.ttl"?
>> * Do you use host files?
>> * Do you use a non-Sun JVM?
>> * Do you use wildcards or IP addresses in your security policy file?
>> * Do you completely disable the socket check in your security policy
>>   file? (yikes!)
>> * Have you simply never seen this problem? (lucky you!)
>>
>> Thanks,
>> Chris
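
For the archives, here is a minimal sketch of the negative-TTL change Gregg describes, plus a crude timer for spotting slow reverse lookups. The class name, method names and the 3600-second value are illustrative assumptions, not anything River ships. As far as I know the JVM reads this property when its address cache is first initialised, so it should be set before the first lookup (or in the JVM's java.security file instead).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;

// Hypothetical helper, not part of River or the JDK.
final class DnsNegativeCacheConfig {
    static final String NEGATIVE_TTL_PROPERTY = "networkaddress.cache.negative.ttl";

    // Sets how long (in seconds) failed lookups are cached and returns the
    // previous setting, which may be null. 0 means never cache failures,
    // -1 means cache them forever.
    static String setNegativeTtlSeconds(int seconds) {
        String previous = Security.getProperty(NEGATIVE_TTL_PROPERTY);
        Security.setProperty(NEGATIVE_TTL_PROPERTY, Integer.toString(seconds));
        return previous;
    }

    // Times one reverse lookup of an IP literal, in milliseconds.
    // getByName() on a literal only parses it; getCanonicalHostName()
    // triggers the reverse lookup and falls back to the textual IP
    // address if the lookup fails.
    static long timeReverseLookupMillis(String ipLiteral) throws UnknownHostException {
        InetAddress address = InetAddress.getByName(ipLiteral);
        long start = System.nanoTime();
        address.getCanonicalHostName();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Assumed value: one hour. Must run before any other lookup.
        String previous = setNegativeTtlSeconds(3600);
        System.out.println("previous negative TTL: " + previous);
        System.out.println("reverse lookup of 127.0.0.1 took "
                + timeReverseLookupMillis("127.0.0.1") + " ms");
    }
}
```

This only shortens the repeated stalls for a host whose lookup has already failed once in this JVM; the first failure per host still costs the full DNS timeout, which is exactly Chris's concern for short-startup clients.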
