[Tutor] regular expression question

D Elliott Thu, 07 Apr 2005 05:03:47 -0700

I wonder if anyone can help me with an RE. I also wonder if there is an RE mailing list anywhere - I haven't managed to find one.

I'm trying to use this regular expression to delete particular strings from a file before tokenising it.

I want to delete every string containing a full stop (period) that is neither at the beginning nor at the end of a word, and that is not followed by a closing bracket. That means deleting file names (e.g. fileX.doc) and websites (when www/http is not given), but not bare file extensions (e.g. "this is in .jpg format"). I also don't want to delete the last word of a sentence just because it precedes a full stop, or a word whose full stop is followed by a closing bracket.

fullstopRe = re.compile (r'\S+\.[^)}]]+')

I've also tried fullstopRe = re.compile (r'\S+[.][^)}]]+')

I understand this to mean: any non-whitespace character one or more times, a full stop (I'm using the backslash, or putting it in a character class, to make it literal), then any character except a closing bracket of any kind, one or more times.
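
A quick check suggests the pattern may be parsed differently from that reading. In Python's re syntax the first unescaped ']' closes a character class, so '[^)}]]+' appears to mean "one character that is not ')' or '}', followed by one or more literal ']'". The sample strings below are made up for illustration; this is only a sketch of the behaviour, not a full diagnosis.

```python
import re

# Pattern exactly as posted. The first unescaped ']' ends the character
# class, so this reads as:  \S+   \.   [^)}]   ]+
posted = re.compile(r'\S+\.[^)}]]+')

# No literal ']' anywhere in the text, so the trailing ']+' cannot match.
no_match = posted.search('bbc.co.uk')

# A made-up string that does end in ']' after one more character.
with_bracket = posted.search('file.x]')

print(no_match)              # None
print(with_bracket.group())  # file.x]
```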

If I forget about the bracket exceptions, the following works:
fullstopRe = re.compile (r'\S+[.]\S+')

But the patterns above do not delete, e.g., bbc.co.uk
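
One possible adjustment, offered as a sketch rather than a definitive fix: escape the ']' so it stays inside the negated class, and also exclude whitespace there so the match stops at the end of the word. The name fullstop_re and the sample sentences are assumptions for illustration; they are not from the original scripts.

```python
import re

# Hypothetical adjusted pattern (an assumption, not the posted one):
#   \]  keeps ']' inside the negated class instead of closing it
#   \s  stops the match at whitespace, so following words are not swallowed
fullstop_re = re.compile(r'\S+\.[^\s)}\]]+')

# Made-up sample sentences covering the cases described in the post.
samples = [
    'see bbc.co.uk today',
    'open fileX.doc now',
    'this is in .jpg format',
    'the end.',
    '(see above.)',
]

cleaned = [fullstop_re.sub('', s) for s in samples]
for before, after in zip(samples, cleaned):
    print(repr(before), '->', repr(after))
```

With these samples, bbc.co.uk and fileX.doc are removed, while ".jpg", "end." and "above.)" are left alone. Deleting a word leaves a double space behind, which should not matter if the text is tokenised on whitespace afterwards.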

Can anyone enlighten me?
Thanks
Debbie


--
***************************************************
Debbie Elliott
Computer Vision and Language Research Group,
School of Computing,
University of Leeds,
Leeds LS2 9JT
United Kingdom.
Tel: 0113 3437288
Email: [EMAIL PROTECTED]
***************************************************