[ 
http://issues.apache.org/jira/browse/JAMES-603?page=comments#action_12432345 ] 
            
Stefano Bagnara commented on JAMES-603:
---------------------------------------

The worst-case scenario: everything is stuck and no new mail is accepted.

I already described how the outgoing spool can end up locked, with every 
outgoing thread waiting to obtain the lock.
Now I have experienced something worse, and I think I understand why:
I have 10 spool threads and 10 SMTP workers.
I have 9 emails in the spool to be remotely delivered.
9 of the 10 spool threads lock the 9 emails from the spool and start waiting to 
lock the outgoing repository.
The 10th spool thread starts an infinite loop in the accept of the main spool: 
it finds 9 mails but cannot lock them because they are being processed by 
the other threads, so it holds the lock on the main spool indefinitely. (This 
happens because loadPendingMessages takes more than 1 second, maybe because 
the server is already stressing the db with the outgoing threads looping in 
the accept.)
The first 10 incoming SMTP connections get stuck trying to acquire the lock on 
the main spool to store their messages, and you are under DoS.

I clearly remember user reports on the mailing list in past months/years 
describing a similar scenario, and maybe we have finally found the problem.

So this bug also affects the main spool, although more rarely: mails in the 
main spool are always acceptable unless they are locked, so it happens only 
when all the available messages are locked and the accept query takes more 
than 1 second. But it does happen: I saw it, and I have the thread dump if 
anyone wants to look at it.
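
To make the main-spool case concrete, here is a minimal sketch of an accept 
loop that gives up the spool monitor when every pending message is locked, so 
that SMTP workers can still store new mail. This is not the James 
SpoolRepository code: SketchSpool and its store/lock/unlock/accept methods are 
hypothetical names for an in-memory stand-in, and it assumes the real accept 
could park in wait() rather than rescanning.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical in-memory spool; NOT the James SpoolRepository API. */
class SketchSpool {
    private final Set<String> pending = new LinkedHashSet<>();
    private final Set<String> locked = new HashSet<>();

    /** Called by SMTP workers; holds the spool monitor only briefly. */
    public synchronized void store(String key) {
        pending.add(key);
        notifyAll(); // wake any thread parked in accept()
    }

    /** Marks a stored message as being processed; false if already locked. */
    public synchronized boolean lock(String key) {
        return pending.contains(key) && locked.add(key);
    }

    /** Releases a message lock and wakes threads parked in accept(). */
    public synchronized void unlock(String key) {
        locked.remove(key);
        notifyAll();
    }

    /**
     * Locks and returns the first unlocked message, or returns null once
     * timeoutMillis has elapsed with every pending message still locked.
     */
    public synchronized String accept(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            for (String key : pending) {
                if (locked.add(key)) {
                    return key;
                }
            }
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return null;
            }
            try {
                // wait() releases the monitor, so store() is never starved
                // while this thread has nothing it can lock.
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
    }
}
```

The point of the sketch is only the wait(remaining) call: a thread that finds 
nothing it can lock stops holding the monitor until a store or unlock wakes it.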

> Outgoing spooling stuck over old mails when more than 1000 old mails are 
> present in outgoing.
> ---------------------------------------------------------------------------------------------
>
>                 Key: JAMES-603
>                 URL: http://issues.apache.org/jira/browse/JAMES-603
>             Project: James
>          Issue Type: Bug
>          Components: SpoolManager & Processors, Remote Delivery
>    Affects Versions: 2.3.0rc2
>            Reporter: Stefano Bagnara
>            Priority: Blocker
>             Fix For: 2.3.0
>
>
> scenario:
> remote delivery has 6 hours for the third delaytime
> insert into the outgoing spool 1000 messages with a last_updated 5 hours ago 
> and error_message 3
> start james
> send a new message
> the first remote delivery thread is stuck in the main accept method because 
> getNextPendingMessage ALWAYS returns a new pending message but none of them is 
> ready to be processed. The bad news is that after it finishes the 1000 messages 
> from pendingMessages it simply restarts loadPendingMessages and tries them 
> again, without waiting.
> So 100% CPU is used until we are able to spool the 1000 "old" messages, and 
> only then does James return to normal.
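
For the outgoing-spool scenario quoted above, one way to avoid the 100% CPU 
rescan is to compute how long it will be before any message becomes 
deliverable and wait that long before loading the pending messages again. The 
sketch below shows only that calculation. RetryBackoffSketch, 
millisUntilNextReady and the idleMillis parameter are hypothetical and not 
part of James; each "next attempt" time is assumed to be last_updated plus the 
configured delay.

```java
/** Hypothetical helper; NOT part of the James code base. */
class RetryBackoffSketch {
    /**
     * Returns how many milliseconds the accept loop could wait before any
     * message is ready: 0 if at least one is ready now, idleMillis if the
     * spool is empty, otherwise the gap to the earliest next-attempt time.
     */
    static long millisUntilNextReady(long nowMillis, long idleMillis,
                                     long... nextAttemptMillis) {
        if (nextAttemptMillis.length == 0) {
            return idleMillis;
        }
        long earliest = Long.MAX_VALUE;
        for (long t : nextAttemptMillis) {
            earliest = Math.min(earliest, t);
        }
        return Math.max(0L, earliest - nowMillis);
    }
}
```

With the numbers from the scenario (a 6-hour delay and last_updated 5 hours 
ago), every message has a next attempt one hour in the future, so the loop 
would wait about an hour instead of rescanning the 1000 messages.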


        
