['yahoo.de']
> 1-and if the sample is like this:
> sunBuy is shinigViagrawww.xyx.com/dfdf.html
> ?
>
> 2-how many tokens will be there?
Marshall:~/spambayes tameyer$ python -c "from spambayes import tokenizer;print list(tokenizer.tokenize('sunBuy is shinigViagrawww.xyx.com/dfdf.html'))"
['content-type:text/plain', 'from:none', 'to:none', 'cc:none',
'sender:none', 'reply-to:none', 'x-mailer:none',
'message-id:invalid', 'sunbuy', 'skip:s 30']
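The first eight tokens ('content-type:text/plain' through
'message-id:invalid') are all header tokens, so I presume the answer
you are looking for is "two": 'sunbuy' and 'skip:s 30'. Note that 'is'
produces no token at all, and the long run
'shinigViagrawww.xyx.com/dfdf.html' shows up only as the single
summary token 'skip:s 30'.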
It is just as Tim & Tim said: basically it's split-on-whitespace, and
reading tokenizer.py is what you should do to learn more (and ask
questions if parts don't make sense).
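
As a rough illustration of the split-on-whitespace idea, here is a minimal
sketch that reproduces the two body tokens from the output above. It is not
the code in tokenizer.py: the function name and the length thresholds (3 and
12) are assumptions chosen to match this sample, and it ignores headers, URLs,
e-mail addresses and HTML entirely.

```python
def sketch_body_tokens(text, min_len=3, max_len=12):
    """Rough imitation of the body-word handling seen in the sample output.

    Hypothetical helper, not part of SpamBayes. min_len and max_len are
    assumed values picked so that the sample input gives the sample output.
    """
    tokens = []
    for word in text.lower().split():  # lowercase, then split on whitespace
        n = len(word)
        if n < min_len:
            continue  # very short words such as 'is' yield nothing
        if n <= max_len:
            tokens.append(word)  # ordinary words are kept as-is
        else:
            # Over-long runs collapse to one summary token: the first
            # character plus the length rounded down to a multiple of ten.
            tokens.append("skip:%c %d" % (word[0], n // 10 * 10))
    return tokens


print(sketch_body_tokens('sunBuy is shinigViagrawww.xyx.com/dfdf.html'))
```

The 33-character run is summarised by its first letter and length bucket, which
is why the obfuscated "Viagra" and the URL never appear as tokens of their own
in this sample.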
=Tony.Meyer
--
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.
_______________________________________________
spambayes at python.org
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html