I wonder if anyone can help me with an RE. I also wonder if there is an RE mailing list anywhere - I haven't managed to find one.

I'm trying to use this regular expression to delete particular strings from a file before tokenising it.

I want to delete all strings that contain a full stop (period) that is not at the beginning or end of a word and is not followed by a closing bracket. That is, I want to delete file names (e.g. fileX.doc) and websites (when www/http is not given), but not file extensions (e.g. "this is in .jpg format"). I also don't want to delete the last word of a sentence just because it precedes a full stop, or a word whose full stop is followed by a closing bracket.

fullstopRe = re.compile (r'\S+\.[^)}]]+')

I've also tried fullstopRe = re.compile (r'\S+[.][^)}]]+')


I understand this to mean: one or more non-whitespace characters, then a full stop (escaped with a backslash, or put in a character class, to make it literal), then one or more characters that are not any kind of closing bracket.
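A quick way to see how Python's `re` module actually reads the first pattern is to run it on a couple of strings. Inside a character class, the first unescaped `]` closes the class, so `[^)}]]+` is parsed as one character that is not `)` or `}`, followed by one or more literal `]` characters. The sample strings below are made up for illustration.

```python
import re

# Original attempt. The class [^)}] ends at the first ']', so the trailing
# ']+' demands one or more literal ']' characters after the full stop.
original = re.compile(r'\S+\.[^)}]]+')

# No ']' anywhere in the text, so nothing can match.
print(original.findall("see bbc.co.uk for details"))  # []

# Matches only because a literal ']' follows the character after the stop.
print(original.findall("odd.x] text"))  # ['odd.x]']
```

The same reading applies to the `[.]` variant, since only the part after the full stop is affected.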


If I forget about the bracket exceptions, the following works:
fullstopRe = re.compile (r'\S+[.]\S+')

But the patterns above do not delete, e.g., bbc.co.uk.
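One possible fix, sketched under the assumption that the strings are deleted with `re.sub`: escape the `]` inside the class (putting it first, as in `[^])}]`, also works) and exclude whitespace, so the match stops at the end of the token. `strip_dotted` and the sample sentences are hypothetical. Note that trailing punctuation (e.g. the comma in `fileX.doc,`) and a leading opening bracket (e.g. in `(fileX.doc)`) would be deleted along with the token.

```python
import re

# Escaped ']' keeps all three closing brackets inside the class; '\s' stops
# the match at the end of the token instead of running on into the next word.
fullstopRe = re.compile(r'\S+\.[^\s)}\]]+')

def strip_dotted(text):
    """Hypothetical helper: delete tokens with an internal full stop."""
    return fullstopRe.sub('', text)

print(repr(strip_dotted("Visit bbc.co.uk today.")))   # 'Visit  today.'
print(repr(strip_dotted("Open fileX.doc now.")))      # 'Open  now.'
print(repr(strip_dotted("This is in .jpg format.")))  # unchanged
print(repr(strip_dotted("It ends here.) next")))      # unchanged
```

The file extension survives because `\S+` needs at least one character before the full stop, and sentence-final words survive because at least one non-bracket character must follow it.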

Can anyone enlighten me?
Thanks
Debbie


--
Debbie Elliott
Computer Vision and Language Research Group,
School of Computing, University of Leeds, Leeds LS2 9JT, United Kingdom.
Tel: 0113 3437288
Email: [EMAIL PROTECTED]

Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
