I think we finally found the solution to the problem described on this thread, and apparently it was not James' fault.

Because of a custom mailet that was creating filesystem copies of certain types of email messages, our james/temp directory had about 500,000 files in it. For a particular (good) reason, these files were being created and forgotten about.

As soon as we emptied this directory out, our problem stopped.
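For anyone who has to keep such files around for a while, a periodic sweep is one way to stop the directory from growing without bound. The following is only a minimal sketch, not what we actually ran: the class name TempDirSweeper, the glob pattern, and the age threshold are all hypothetical, and it relies on java.nio.file, which requires Java 7 or later. It walks the directory with a DirectoryStream so the whole listing is never held in memory at once.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;

public class TempDirSweeper {

    /**
     * Deletes regular files in dir that match glob and were last modified
     * before (now - maxAge). Returns the number of files deleted.
     */
    static int sweep(Path dir, String glob, Duration maxAge) throws IOException {
        Instant cutoff = Instant.now().minus(maxAge);
        int deleted = 0;
        // DirectoryStream iterates lazily, which matters with ~500,000 entries.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                boolean old = Files.getLastModifiedTime(p).toInstant().isBefore(cutoff);
                if (Files.isRegularFile(p) && old && Files.deleteIfExists(p)) {
                    deleted++;
                }
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: pass the temp directory as the first argument.
        Path dir = Paths.get(args[0]);
        int n = sweep(dir, "james-email-*.tmp", Duration.ofDays(1));
        System.out.println("Deleted " + n + " files from " + dir);
    }
}
```

The one-day threshold is arbitrary; pick whatever retention period the reason for keeping the files allows.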

As a reminder, the problem was that after James started up and ran normally for several minutes, it would hang for an ever-increasing period of time -- at last count, about 80 minutes. In hindsight, it all makes sense now; here is my theory:

The period of normal operation was the time it took to reach this line (in our custom mailet) for the first time:

email_file = File.createTempFile( "james-email-", ".tmp" );

With an ever-increasing number of files in the temp directory, this line was apparently taking 10, 30, 60, then 80 minutes to complete the FIRST time it ran, with subsequent executions happening without delay. This is only my theory--I do not actually know if (or why) File.createTempFile(...) was the hanging culprit. But it would need some time to figure out a unique name for the file, and that time would increase as the number of existing files increased. Still, 80 minutes is an awfully long time, even for 500,000 files.
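One way to test this theory, rather than take it on faith, would be to time the call directly against a directory of a known size. This is a rough sketch under assumptions: the class name TempFileTiming is made up, and the target directory is passed on the command line (falling back to java.io.tmpdir). It uses the three-argument form of File.createTempFile so the directory can be chosen explicitly, and it deletes each file it creates so the measurement does not add to the clutter.

```java
import java.io.File;
import java.io.IOException;

public class TempFileTiming {

    /** Creates one temp file in dir and returns the elapsed time in nanoseconds. */
    static long timeCreate(File dir) throws IOException {
        long start = System.nanoTime();
        File f = File.createTempFile("james-email-", ".tmp", dir);
        long elapsed = System.nanoTime() - start;
        f.delete(); // clean up so repeated runs do not grow the directory
        return elapsed;
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(args.length > 0
                ? args[0]
                : System.getProperty("java.io.tmpdir"));
        // Time the first call separately from later ones, since only the
        // first call appeared to be slow in our case.
        for (int i = 1; i <= 3; i++) {
            System.out.printf("call %d: %.3f ms%n", i, timeCreate(dir) / 1e6);
        }
    }
}
```

Running this in a fresh JVM against both the crowded directory and an empty one would show whether the first call really is the slow one, and whether the delay scales with the file count.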

In any case, we no longer suffer from the hanging on restart, and emptying the temp directory is the only thing we've changed.

Nathan




Nathan Cheng wrote:

Thanks for the URL. That's going to take some time to go through.

For the time being we blocked all non-US IPs, since all of the "attacker" IPs seemed to be outside the US.

James still didn't wake up.

So we restarted James. James began working immediately.

YAY! We rejoiced. But it was short-lived.

We got exactly 10 minutes and 5 seconds of happiness. Then, with netstat still showing very normal results, we went back to the old 40-minute-wait "restart" pattern: the connections log starts scrolling at about 10 MB every 3 minutes, and the smtpserver log shows this for every "Watchdog default Worker" (and then James hangs):

--------------------------------
25/04/06 12:37:38 DEBUG smtpserver: Watchdog default Worker #30 has time to sleep 300000
...
25/04/06 12:42:38 DEBUG smtpserver: Watchdog default Worker #30 has time to sleep -75
25/04/06 12:42:38 ERROR smtpserver: SMTP Connection has idled out.
25/04/06 12:42:38 DEBUG smtpserver: Watchdog default Worker #30 is exiting run().
--------------------------------

From this point it's about 30 minutes of hang time before we'll get a whole string of errors in the smtp log, and then James'll start back up again.

What does it mean to sleep -75 milliseconds?
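One plausible reading, which I have not confirmed against the James source: the watchdog computes how much of its timeout is left, and a negative result simply means the deadline has already passed. The two log lines are exactly five minutes apart, so a thread that slept 300000 ms and woke up 75 ms late would compute -75. The sketch below only illustrates that arithmetic; the class and method names are hypothetical and are not James's actual Watchdog code.

```java
public class WatchdogSketch {

    /**
     * Milliseconds left before the idle timeout expires.
     * Zero or a negative value means the timeout has already expired.
     */
    static long timeToSleep(long timeoutMs, long lastResetMs, long nowMs) {
        return timeoutMs - (nowMs - lastResetMs);
    }

    public static void main(String[] args) {
        long timeout = 300_000; // 5 minutes, matching the first log line
        long lastReset = 0;     // time of the last activity on the connection

        // Checked immediately after the reset: the full timeout remains.
        System.out.println(timeToSleep(timeout, lastReset, 0));       // prints 300000

        // Checked 75 ms after the deadline: the result is negative,
        // so the connection is treated as idled out.
        System.out.println(timeToSleep(timeout, lastReset, 300_075)); // prints -75
    }
}
```

Under that reading the negative number is not a bug in itself; it just says the connection saw no activity for the whole five minutes.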

Nathan

Stefano Bagnara wrote:

If you use Linux read this:
http://www.linuxsecurity.com/content/view/121960/49/

Otherwise you should look for a firewall with similar features that allow you to automatically block IPs that are part of a DDoS attack.

Btw, unfortunately DDoS attacks are hard to block.

Stefano

Nathan Cheng wrote:

We have blocked over 20 IP addresses so far. They are all non-US IPs (all our legit customers are in the US right now and would have almost no occasion to communicate outside the US), and as soon as we block one, another pops up.

James is reacting in just about the same manner as it does when we try to restart. So it is true: a restart and a DDoS look very similar to us. Last night we had a restart, and this morning we're having a DDoS and they look the same.

How are we supposed to deal with this? We don't have fancy hardware, so what's the software solution?

Thanks,

Nathan




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



