[ http://issues.apache.org/jira/browse/JAMES-387?page=comments#action_12358507 ]

Vincenzo Gianferrari Pini commented on JAMES-387:
-------------------------------------------------

Bernd is right: buildCorpus() runs in a synchronized block so that it does not 
interfere with new mails being fed (in a separate thread), but I forgot to handle 
synchronization between buildCorpus() and getTokenProbabilityStrengths().
I will refactor buildCorpus() to avoid this dirty double use of corpus.
Moreover, corpus, hamTokenCounts and spamTokenCounts do not seem to be cleared when 
loading/building an updated corpus from the database.
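
A minimal sketch of one way such a refactoring could look. It is an illustration only, not the actual BayesianAnalyzer code. It assumes that the "double use" means corpus temporarily holds Integer counts before it holds Double probabilities, which would explain the ClassCastException seen by a concurrent reader. The class name CorpusSketch, the method probabilityOf() and the probability formula are hypothetical placeholders, and the generics syntax assumes a current JDK. The new corpus is built in a local map and published with a single assignment, so a reader never sees a half-built map or entries left over from a previous load.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and formula are assumptions, not James code.
final class CorpusSketch {
    // Readers only ever see a fully built, read-only token -> probability map.
    private volatile Map<String, Double> corpus = Collections.emptyMap();

    // Builds the new corpus in a local map, then publishes it in one assignment.
    // Synchronized so that two rebuilds cannot interleave.
    synchronized void buildCorpus(Map<String, Integer> hamCounts,
                                  Map<String, Integer> spamCounts) {
        Map<String, Double> fresh = new HashMap<>();
        Set<String> tokens = new HashSet<>(hamCounts.keySet());
        tokens.addAll(spamCounts.keySet());
        for (String token : tokens) {
            int ham = hamCounts.getOrDefault(token, 0);
            int spam = spamCounts.getOrDefault(token, 0);
            if (ham + spam == 0) {
                continue; // no evidence for this token
            }
            // Placeholder formula (assumption): plain spam ratio.
            fresh.put(token, (double) spam / (ham + spam));
        }
        // Starting from an empty map each time also drops stale tokens.
        corpus = Collections.unmodifiableMap(fresh);
    }

    // Reads one snapshot reference; a concurrent rebuild cannot change it.
    double probabilityOf(String token, double defaultProbability) {
        Map<String, Double> snapshot = corpus;
        return snapshot.getOrDefault(token, defaultProbability);
    }
}
```

With this shape the reader needs no lock. The cost is that a mail analysed during a rebuild is scored against the previous corpus.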

> Exception in BayesianAnalysis
> -----------------------------
>
>          Key: JAMES-387
>          URL: http://issues.apache.org/jira/browse/JAMES-387
>      Project: James
>         Type: Bug
>   Components: Matchers/Mailets (bundled)
>     Versions: 3.0
>  Environment: James from svn-trunk 2005-08-01.
> MySQL 4.0
>     Reporter: Stefano Bagnara
>     Assignee: Vincenzo Gianferrari Pini
>     Priority: Minor

>
> Got this exception for every incoming mail:
> 02/08/05 00:39:25 INFO  James.Mailet: BayesianAnalysis: Exception: java.lang.Integer
> java.lang.ClassCastException: java.lang.Integer
>         at org.apache.james.util.BayesianAnalyzer.getTokenProbabilityStrengths(BayesianAnalyzer.java:591)
>         at org.apache.james.util.BayesianAnalyzer.computeSpamProbability(BayesianAnalyzer.java:340)
>         at org.apache.james.transport.mailets.BayesianAnalysis.service(BayesianAnalysis.java:289)
>         at org.apache.james.transport.LinearProcessor.service(LinearProcessor.java:407)
>         at org.apache.james.transport.JamesSpoolManager.process(JamesSpoolManager.java:460)
>         at org.apache.james.transport.JamesSpoolManager.run(JamesSpoolManager.java:369)
>         at java.lang.Thread.run(Unknown Source)
> If I clean my spam/ham db the exceptions disappear, but they start again when 
> the spam/ham db becomes large.
> My bayesiananalysis_spam contains 200000 rows.
> The following are the spam tokens with the highest "occurrences".
> +---------------------------+-------------+
> | token                     | occurrences |
> +---------------------------+-------------+
> | 3D                        |       82151 |
> | a                         |       59953 |
> | the                       |       45295 |
> | FONT                      |       42771 |
> | Content-Type              |       39058 |
> | to                        |       36626 |
> | com                       |       32902 |
> | http                      |       32886 |
> | of                        |       32504 |
> | font                      |       31803 |
> | and                       |       31577 |
> | Content-Transfer-Encoding |       31576 |
> | p                         |       29746 |
> | text                      |       29482 |
> | in                        |       29418 |
> | it                        |       28498 |
> | br                        |       28037 |
> | DIV                       |       27431 |
