Update of /cvsroot/spambayes/spambayes/testtools
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv14471/testtools

Modified Files:
        incremental.HOWTO.txt 
Log Message:
Minor updates.

Index: incremental.HOWTO.txt
===================================================================
RCS file: /cvsroot/spambayes/spambayes/testtools/incremental.HOWTO.txt,v
retrieving revision 1.5
retrieving revision 1.6
diff -C2 -d -r1.5 -r1.6
*** incremental.HOWTO.txt       28 Dec 2003 01:12:11 -0000      1.5
--- incremental.HOWTO.txt       7 Apr 2005 01:34:41 -0000       1.6
***************
*** 1,9 ****
- Yes, this is a lame attempt at explaining what I've built,
- in the vain hope that someone will read it and improve it.
- I'm writing this with only about 4 hours sleep, so my
- coherency may not be particularly high.
- 
- 
- 
  There are a few steps to doing incremental training tests:
  
--- 1,2 ----
***************
*** 12,24 ****
     sequence and group them.  The corpora need to be in the
     good old familiar Data/{Ham,Spam}/{reservoir,Set*} tree.
!    For my purposes, I wrote the es2hs.py tool to grab stuff
!    out of my real MH mail archive folders; other people may
!    want some other method of getting the corpora into the
!    tree.
  
  2. Sort and group the corpora.  When testing, messages will
     be processed in sorted order.  The messages should all
     have unique names with a group number and an id number
!    separated by a dash (eg. 0123-004556).  I wrote
     sort+group.py for this.  sort+group.py sorts the messages
     into chronological order (by topmost Received header) and
--- 5,18 ----
     sequence and group them.  The corpora need to be in the
     good old familiar Data/{Ham,Spam}/{reservoir,Set*} tree.
!    For my (Alex) purposes, I wrote the es2hs.py tool to grab
!    stuff out of my real MH mail archive folders; other people
!    may want some other method of getting the corpora into the
!    tree.  If you're using Outlook, then the
!    Outlook2000/export.py script is what you are after.
  
  2. Sort and group the corpora.  When testing, messages will
     be processed in sorted order.  The messages should all
     have unique names with a group number and an id number
!    separated by a dash (eg. 0123-004556).  I (Alex) wrote
     sort+group.py for this.  sort+group.py sorts the messages
     into chronological order (by topmost Received header) and
***************
*** 30,35 ****
     the oldest msg found.
  
!    Note that this script will run through *all* the files in
!    the Data directory, not just those in Data/Ham and Data/Spam.
  
  3. Distribute the corpora into multiple sets so you can do
--- 24,32 ----
     the oldest msg found.
  
!    With 1.0.x, note that this script will run through *all* the
!    files in the Data directory, not just those in Data/Ham and
!    Data/Spam.  With 1.1, only those specified in the
!    ham_directories and spam_directories will be used, unless
!    the -a option is used.
  
  3. Distribute the corpora into multiple sets so you can do
***************
*** 64,71 ****
     to do this, outputting datasets for plotmtv.  plotmtv is
     a really neat data visualization tool.  Use it.  Love it.
!    Gods, I need more sleep.
  
  See dotest.sh for a sample of automating steps 4 & 5.
- 
- Please, somebody rewrite this file.
- 
--- 61,65 ----
     to do this, outputting datasets for plotmtv.  plotmtv is
     a really neat data visualization tool.  Use it.  Love it.
!    XXX tools for Excel.
  
  See dotest.sh for a sample of automating steps 4 & 5.
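
For readers who do not have the testtools checkout handy, the sort-and-group idea in step 2 can be sketched roughly as follows. This is not the actual sort+group.py. The function names are made up for illustration, and the one-day group width is an assumption, because the hunks above do not show how groups are sized. The sketch only shows ordering by the topmost Received header and building names of the form 0123-004556.

```python
import email
from datetime import timezone
from email.utils import parsedate_to_datetime

# Assumption: one group per 24 hours, counted from the oldest message.
SECONDS_PER_GROUP = 24 * 60 * 60


def received_time(raw_message):
    """Return a POSIX timestamp from the topmost Received header, or None."""
    msg = email.message_from_string(raw_message)
    received = msg.get("Received")  # get() returns the first (topmost) header
    if received is None or ";" not in received:
        return None
    # The date stamp follows the last ';' in a Received header.
    date_part = received.rsplit(";", 1)[1].strip()
    try:
        when = parsedate_to_datetime(date_part)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # A "-0000" zone parses as a naive datetime; treat it as UTC.
        when = when.replace(tzinfo=timezone.utc)
    return when.timestamp()


def sorted_names(raw_messages):
    """Return (name, raw_message) pairs in chronological order.

    Names look like "0123-004556": group number, dash, id number.
    Messages without a usable Received date are skipped.
    """
    stamped = []
    for raw in raw_messages:
        when = received_time(raw)
        if when is not None:
            stamped.append((when, raw))
    stamped.sort(key=lambda pair: pair[0])
    if not stamped:
        return []
    oldest = stamped[0][0]
    named = []
    for msg_id, (when, raw) in enumerate(stamped):
        group = int((when - oldest) // SECONDS_PER_GROUP)
        named.append(("%04d-%06d" % (group, msg_id), raw))
    return named
```

Feeding the raw text of each message in a corpus to sorted_names() yields the new names in processing order. A real run would also need to rename the files on disk, which this sketch leaves out.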

_______________________________________________
Spambayes-checkins mailing list
[email protected]
http://mail.python.org/mailman/listinfo/spambayes-checkins
