I think we finally found the solution to the problem described on this
thread, and apparently it was not James' fault.
Because of a custom mailet that was creating filesystem copies of
certain types of email messages, our james/temp directory had about
500,000 files in it. For a particular (good) reason, these files were
being created and forgotten about.
As soon as we emptied this directory out, our problem stopped.
As a reminder, the problem was this: after startup and several minutes
of normal operation, James would hang for an ever-increasing period of
time -- at last count, about 80 minutes. In hindsight, it all makes
sense now; here is my theory:
The period of normal operation was the time it took to reach this line
(in our custom mailet) for the first time:
    email_file = File.createTempFile( "james-email-", ".tmp" );
With an ever-increasing number of files in the temp directory,
apparently this line was what was taking 10, 30, 60, then 80 minutes
to complete the FIRST time it ran, with subsequent executions happening
without delay. This is only a theory: I do not actually know whether
(or why) File.createTempFile(...) was the hanging culprit. It would
need some time to find a unique name for the file, and that time would
grow with the number of existing files. Still, 80 minutes is an awfully
long time, even for 500,000 files.
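One way to test this theory would be to time the first few calls directly against the crowded directory. The following is only a rough sketch: the class name TempFileTiming and the helper timeCreate are made up for illustration, and it assumes the directory to test is passed explicitly (the mailet's two-argument call uses whatever java.io.tmpdir points to). If the theory holds, the first call should be far slower than the rest.

```java
import java.io.File;
import java.io.IOException;

public class TempFileTiming {

    /**
     * Creates one temp file in dir (same prefix/suffix as the mailet),
     * removes it again, and returns the time the creation took in nanoseconds.
     */
    static long timeCreate(File dir) throws IOException {
        long start = System.nanoTime();
        File f = File.createTempFile("james-email-", ".tmp", dir);
        long elapsed = System.nanoTime() - start;
        if (!f.delete()) {
            f.deleteOnExit(); // fall back so the probe leaves nothing behind
        }
        return elapsed;
    }

    public static void main(String[] args) throws IOException {
        // Directory to probe: first argument, else the JVM default temp dir.
        File dir = new File(args.length > 0 ? args[0]
                                            : System.getProperty("java.io.tmpdir"));
        for (int i = 1; i <= 5; i++) {
            System.out.printf("call %d: %.3f ms%n", i, timeCreate(dir) / 1e6);
        }
    }
}
```

Run it once against the full directory and once against an empty one, each in a fresh JVM, since the slowdown was only seen on the first call after a restart.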
In any case, we no longer suffer from the hanging on restart, and
emptying the temp directory is the only thing we've changed.
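To keep the directory from filling up again, something along these lines could be run periodically. It is a sketch, not what we actually deployed: the class TempDirCleanup, the method purgeOlderThan, and the one-day age limit are all assumptions chosen for illustration, and it only touches files carrying the mailet's "james-email-" prefix.

```java
import java.io.File;

public class TempDirCleanup {

    /**
     * Deletes regular files in dir whose name starts with prefix and whose
     * last-modified time is more than maxAgeMillis in the past.
     * Returns the number of files actually deleted.
     */
    static int purgeOlderThan(File dir, String prefix, long maxAgeMillis) {
        long cutoff = System.currentTimeMillis() - maxAgeMillis;
        File[] files = dir.listFiles();
        if (files == null) {
            return 0; // dir is not a directory, or it could not be read
        }
        int deleted = 0;
        for (File f : files) {
            if (f.isFile()
                    && f.getName().startsWith(prefix)
                    && f.lastModified() < cutoff
                    && f.delete()) {
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("usage: TempDirCleanup <temp-directory>");
            return;
        }
        long oneDay = 24L * 60 * 60 * 1000; // hypothetical retention period
        int n = purgeOlderThan(new File(args[0]), "james-email-", oneDay);
        System.out.println("deleted " + n + " stale temp files");
    }
}
```

Note that listFiles() loads the whole directory listing into memory, so on a directory that already holds hundreds of thousands of files the first cleanup pass may itself be slow.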
Nathan
Nathan Cheng wrote:
Thanks for the url. That's going to take some time to go through.
For the time being we blocked all non-US IPs, since all of the "attacker"
IPs seemed to be outside the US.
James still didn't wake up.
So we restarted James. James began working immediately.
YAY! We rejoiced. But it was short-lived.
We got exactly 10 minutes and 5 seconds of happiness. Then, with netstat
still showing very normal results, we went back to the old 40-minute
wait "restart" pattern, in which the connections log starts scrolling at
10 MB per 3 minutes and the smtpserver log shows this for every "Watchdog
default Worker" (and then hangs):
--------------------------------
25/04/06 12:37:38 DEBUG smtpserver: Watchdog default Worker #30 has time
to sleep 300000
...
25/04/06 12:42:38 DEBUG smtpserver: Watchdog default Worker #30 has time
to sleep -75
25/04/06 12:42:38 ERROR smtpserver: SMTP Connection has idled out.
25/04/06 12:42:38 DEBUG smtpserver: Watchdog default Worker #30 is
exiting run().
--------------------------------
From this point it's about 30 minutes of hang time before we'll get a
whole string of errors in the smtp log, and then James'll start back up
again.
What does it mean to sleep -75 milliseconds?
Nathan
Stefano Bagnara wrote:
If you use Linux read this:
http://www.linuxsecurity.com/content/view/121960/49/
Otherwise you should look for a firewall with similar features that
allow you to automatically block IPs that are part of a DDoS attack.
Btw, unfortunately DDoS attacks are hard to block.
Stefano
Nathan Cheng wrote:
We have blocked over 20 IP addresses so far, all of them non-US
(all our legit customers are in the US right now and would have
almost no occasion to communicate outside the US), and as soon as we
block one, another pops up.
James is reacting in just about the same manner as it does when we
try to restart. So it is true: a restart and a DDoS look very similar
to us. Last night we had a restart, and this morning we're having a
DDoS and they look the same.
How are we supposed to deal with this? We don't have fancy hardware,
so what's the software solution?
Thanks,
Nathan
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]