Re: reverse DNS timeouts and SocketPermission

Tom Hobbs Tue, 15 Mar 2011 10:02:35 -0700

I've not experienced this issue myself.  It's an interesting one, and
Gregg's response is also intriguing.


I know it's not that helpful to you, but I'll see what I can do about
including something about this on the River site or wiki.

Chris, if you feel this is an issue that River can/should solve, then
please create a Jira issue for it; otherwise it'll get lost in the mists
of time.
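
For anyone finding this thread later, here is a rough sketch of the
negative-TTL change Gregg describes below. The property name
"networkaddress.cache.negative.ttl" comes from Chris's original mail; the
class name, the method name and the 3600 second value are only
illustrative assumptions, not a recommendation. As far as I know the JVM
reads this policy when its address cache is first initialised, so the
call should happen early in startup, before any lookups.

```java
import java.security.Security;

public class NegativeDnsCacheTtl {

    /** Name of the JVM security property discussed in this thread. */
    static final String PROPERTY = "networkaddress.cache.negative.ttl";

    /**
     * Sets how long (in seconds) the JVM caches failed name lookups.
     * Call this before the first lookup is made.
     */
    static void configure(int seconds) {
        Security.setProperty(PROPERTY, Integer.toString(seconds));
    }

    public static void main(String[] args) {
        // 3600 is a made-up value for illustration only.
        configure(3600);
        System.out.println(PROPERTY + " = " + Security.getProperty(PROPERTY));
    }
}
```

The same property can also be set in the JRE's java.security file instead
of in code, which avoids any ordering concerns at startup.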

On Tue, Mar 15, 2011 at 4:42 PM, Christopher Dolan
<[email protected]> wrote:
> Understood. Increasing that value to something large would mean I
> suffer that timeout only once per remote machine per reboot. Is this
> the solution most River users have employed, or have most of you simply
> never had to deal with this problem? In my case, I may connect to
> hundreds of remote machines via an app that wants a short startup time,
> so this solution concerns me.
>
> Chris
>
> -----Original Message-----
> From: Gregg Wonderly [mailto:[email protected]]
> Sent: Sunday, March 13, 2011 9:08 AM
> To: [email protected]
> Subject: Re: reverse DNS timeouts and SocketPermission
>
> Changing the DNS failure TTL is the most useful way to deal with this.
> The default is 10 seconds, and a failing DNS query will take longer than
> that, so every use of the name results in a new attempt to look up the
> same thing on the same failing server.
>
> Gregg
>
> Sent from my iPhone
>
> On Mar 10, 2011, at 3:06 PM, "Christopher Dolan"
> <[email protected]> wrote:
>
>> The java.net.SocketPermission class uses forward and reverse DNS lookups
>> to ensure that we're allowed to talk to particular remote machines.
>> These lookups are used to canonicalize a remote host's name to ensure
>> that variations in that name don't lead to false negatives.
>>
>> However, many people have found
>> (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4975882) that if
>> there are configuration errors in a DNS system, the reverse DNS failures
>> cause very significant latency (e.g. I've seen 10-12 seconds). This
>> latency has widely varying effects on a djinn. In many cases, it just
>> causes LookupCache slowdowns which can be mitigated by delayed
>> deserialization techniques discussed previously on the dev@ mailing
>> list. But in some cases, I've seen it cause Reggie to hang for a
>> while (I still don't understand where in Reggie the problem occurs,
>> maybe EventListeners?).
>>
>> Obviously, the real solution is to properly configure DNS. But I would
>> like to know how other people have addressed this issue in their
>> deployments.
>>
>> * Do you ensure the RMI codebase URLs all use canonical hostnames, or
>> IP addresses?
>> * Do you ensure that the TcpServerEndpoint has a consistent (perhaps
>> hard-coded) name?
>> * Do you have monitoring or logging code to proactively detect DNS
>> configuration errors?
>> * Do you fiddle the Java security property
>> "networkaddress.cache.negative.ttl"?
>> * Do you use hosts files?
>> * Do you use a non-Sun JVM?
>> * Do you use wildcards or IP addresses in your security policy file?
>> * Do you completely disable the socket check in your security policy
>> file? (yikes!)
>> * Have you simply never seen this problem?  (lucky you!)
>>
>> Thanks,
>> Chris
>>
>
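
On the "monitoring or logging code" question in Chris's list, a minimal
sketch of one way to spot slow reverse lookups is below. It simply times
InetAddress.getCanonicalHostName(), which performs a reverse lookup and
falls back to the textual IP address if that fails. The class name, the
method names and the 2000 ms threshold are hypothetical and chosen only
for illustration; this is not something River provides.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.TimeUnit;

public class ReverseDnsCheck {

    /** Hypothetical threshold above which a lookup is reported as slow. */
    static final long SLOW_MILLIS = 2000L;

    /** Returns the elapsed milliseconds for one reverse lookup of addr. */
    static long timeReverseLookup(InetAddress addr) {
        long start = System.nanoTime();
        // Triggers the reverse lookup; the result is not needed here.
        addr.getCanonicalHostName();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** True if the measured time exceeds the illustrative threshold. */
    static boolean isSlow(long elapsedMillis) {
        return elapsedMillis > SLOW_MILLIS;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Pass literal IP addresses so that getByName does no forward lookup.
        for (String ip : args) {
            InetAddress addr = InetAddress.getByName(ip);
            long elapsed = timeReverseLookup(addr);
            String verdict = isSlow(elapsed) ? "SLOW" : "ok";
            System.out.println(ip + ": " + elapsed + " ms (" + verdict + ")");
        }
    }
}
```

Running it against the addresses of the machines in a djinn before
deployment should show which hosts have broken reverse DNS entries,
assuming the check runs from a machine using the same resolver.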
