Thanks to Matt, Kent and Danny for helping me with my regex question. I will try your suggestions this morning.

In response to Danny's question about tokenising first, there are reasons why I don't want to do this. The initial problem was that filenames in my test data were being tokenised as separate words. E.g. DataMarchAccounts.txt would be tokenised as two words, neither of which is a real word that can be found in an English dictionary. (Often, filenames are not proper words, which is why I needed to delete the whole string - and by 'string' I mean any consecutive run of non-whitespace characters.) I don't want to subsequently analyse any 'non-words', only real words that will then be automatically checked against a lexicon.
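
A minimal sketch of the pre-filtering idea described above: drop every whitespace-delimited chunk that looks like a filename before any tokenising happens. The pattern and the name strip_filenames are assumptions made for illustration, not the RE discussed in this thread. Here "looks like a filename" is guessed to mean "ends in a dot followed by a short extension that starts with a letter", which may be too loose or too strict for real data.

```python
import re

# Assumed heuristic: any run of non-whitespace characters ending in
# ".ext", where ext is 1-4 alphanumerics beginning with a letter.
FILENAME_RE = re.compile(r"\S+\.[A-Za-z][A-Za-z0-9]{0,3}")

def strip_filenames(text):
    """Return text with filename-like chunks removed.

    The text is split on whitespace, so each chunk is a consecutive
    run of non-whitespace characters; a chunk is dropped only if the
    whole chunk matches the pattern.
    """
    kept = [chunk for chunk in text.split()
            if not FILENAME_RE.fullmatch(chunk)]
    # Note: joining with single spaces discards the original line breaks.
    return " ".join(kept)

print(strip_filenames("Please open DataMarchAccounts.txt before the meeting"))
```

Because the whole chunk must match, a number such as 3.14 is kept, but a filename followed directly by punctuation (for example "notes.txt,") would slip through, so the pattern would need tweaking against the actual test data.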

Well - my code is all done now, apart from the tweaking of this one RE. BTW - I am new to Python and had never done any programming before that, so you may see some more questions from me in the future...

Cheers again,
Debbie
--
***************************************************
Debbie Elliott
Computer Vision and Language Research Group,
School of Computing,
University of Leeds,
Leeds LS2 9JT
United Kingdom.
Tel: 0113 3437288
Email: [EMAIL PROTECTED]
***************************************************
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
