[ 
http://issues.apache.org/jira/browse/JAMES-387?page=comments#action_12358307 ] 

Bernd Fondermann commented on JAMES-387:
----------------------------------------

I looked at the Mailet code and found that in buildCorpus(), the instance 
variable "corpus" is filled with all ham and spam tokens, which appear to be 
Maps of (String, Integer) pairs. Afterwards, the map is iterated and all values 
are replaced by Doubles, but while this is running (and taking longer as the 
corpus grows) a fair number of values could still be Integer-typed.
If another thread reaches line 591 while this conversion is still in progress, 
the error could very well occur, because "corpus" is read there.
Are new mails fed in a separate thread?
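
To make the suspected failure mode concrete, here is a minimal sketch that 
reproduces the same exception without any threads. It is not the James code: 
the class and method names are hypothetical, the count for "font" is taken from 
the report quoted below, and the probability for "http" is made up. It only 
shows what a reader would hit if it saw the map halfway through the conversion.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a corpus map caught halfway through conversion.
class MixedCorpusDemo {

    /** Mimics a read that assumes every value is already a Double. */
    static boolean castFails(Map<String, Object> corpus, String token) {
        try {
            Double probability = (Double) corpus.get(token);
            return false;
        } catch (ClassCastException e) {
            // Same exception type as in the reported stack trace.
            return true;
        }
    }

    /** One value already replaced by a Double, one still an Integer count. */
    static Map<String, Object> halfConvertedCorpus() {
        Map<String, Object> corpus = new HashMap<>();
        corpus.put("http", Double.valueOf(0.99));   // already converted (made-up value)
        corpus.put("font", Integer.valueOf(31803)); // not converted yet
        return corpus;
    }

    public static void main(String[] args) {
        Map<String, Object> corpus = halfConvertedCorpus();
        System.out.println("http fails: " + castFails(corpus, "http"));
        System.out.println("font fails: " + castFails(corpus, "font"));
    }
}
```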

The class cast in line 591 could be changed to "Number", as a very simple 
solution. It might also be appropriate to refactor buildCorpus() to work on a 
local map and assign it to "corpus" only once all values have been replaced by 
Doubles.
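
Both ideas could look roughly like the sketch below. Everything in it is an 
assumption for illustration: the class, its fields and method names are 
hypothetical, and the spam / (spam + ham) ratio is only a placeholder, not the 
formula BayesianAnalyzer actually uses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the real BayesianAnalyzer.
class CorpusSketch {

    // Readers only ever see a fully built map through this reference.
    private volatile Map<String, Double> corpus = new HashMap<>();

    /** Idea 1: read through Number, so Integer and Double are both accepted. */
    static double asDouble(Object value) {
        return ((Number) value).doubleValue();
    }

    /** Idea 2: fill a local map, then publish it with a single assignment. */
    void buildCorpus(Map<String, Integer> ham, Map<String, Integer> spam) {
        Map<String, Double> local = new HashMap<>();
        for (Map.Entry<String, Integer> entry : spam.entrySet()) {
            int spamCount = entry.getValue();
            int hamCount = ham.getOrDefault(entry.getKey(), 0);
            int total = spamCount + hamCount;
            // Placeholder ratio; 0.5 when there is no evidence either way.
            local.put(entry.getKey(), total == 0 ? 0.5 : (double) spamCount / total);
        }
        for (String token : ham.keySet()) {
            // Tokens seen only in ham get the lowest placeholder score.
            local.putIfAbsent(token, 0.0);
        }
        corpus = local; // the half-built map is never visible to readers
    }

    /** Returns the stored value, or a neutral 0.5 for unknown tokens. */
    double probabilityOf(String token) {
        Double probability = corpus.get(token);
        return probability == null ? 0.5 : probability;
    }
}
```

Note that the cast to Number only avoids the exception: a reader could still 
pick up a raw count where a probability is expected. Building the map locally 
avoids that as well, since other threads keep using the previous complete map 
until the new one is assigned.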

Hope this analysis makes some sense and I did not completely misread this whole 
case... :-)



> Exception in BayesianAnalysis
> -----------------------------
>
>          Key: JAMES-387
>          URL: http://issues.apache.org/jira/browse/JAMES-387
>      Project: James
>         Type: Bug
>   Components: Matchers/Mailets (bundled)
>     Versions: 3.0
>  Environment: James from svn-trunk 2005-08-01.
> MySQL 4.0
>     Reporter: Stefano Bagnara
>     Assignee: Vincenzo Gianferrari Pini
>     Priority: Minor

>
> Got this exception for every incoming mail:
> 02/08/05 00:39:25 INFO  James.Mailet: BayesianAnalysis: Exception: java.lang.Integer
> java.lang.ClassCastException: java.lang.Integer
>         at org.apache.james.util.BayesianAnalyzer.getTokenProbabilityStrengths(BayesianAnalyzer.java:591)
>         at org.apache.james.util.BayesianAnalyzer.computeSpamProbability(BayesianAnalyzer.java:340)
>         at org.apache.james.transport.mailets.BayesianAnalysis.service(BayesianAnalysis.java:289)
>         at org.apache.james.transport.LinearProcessor.service(LinearProcessor.java:407)
>         at org.apache.james.transport.JamesSpoolManager.process(JamesSpoolManager.java:460)
>         at org.apache.james.transport.JamesSpoolManager.run(JamesSpoolManager.java:369)
>         at java.lang.Thread.run(Unknown Source)
> If I clean my spam/ham db the exceptions disappear, but they start again when 
> the spam/ham db becomes large.
> My bayesiananalysis_spam table contains 200000 rows.
> The following are the spam tokens with the highest "occurrences".
> +---------------------------+-------------+
> | token                     | occurrences |
> +---------------------------+-------------+
> | 3D                        |       82151 |
> | a                         |       59953 |
> | the                       |       45295 |
> | FONT                      |       42771 |
> | Content-Type              |       39058 |
> | to                        |       36626 |
> | com                       |       32902 |
> | http                      |       32886 |
> | of                        |       32504 |
> | font                      |       31803 |
> | and                       |       31577 |
> | Content-Transfer-Encoding |       31576 |
> | p                         |       29746 |
> | text                      |       29482 |
> | in                        |       29418 |
> | it                        |       28498 |
> | br                        |       28037 |
> | DIV                       |       27431 |


