Re: [Tutor] regex problem with colon

2009-08-07 Thread Kent Johnson
On Thu, Aug 6, 2009 at 11:37 PM, Tim Johnson wrote:
> Yes. You nailed it Kent.
> I had grokked the logic but not entirely the syntax.
> I'm looking thru docs now. Just curious .. is there a flag that
> enables adding a ':' to the internal list of "word" characters?

No. The list is affected by

Re: [Tutor] regex problem with colon

2009-08-06 Thread Tim Johnson
* Kent Johnson [090806 18:31]:
> On Thu, Aug 6, 2009 at 8:47 PM, Tim Johnson wrote:
> > using python 2.5.
> > I'm having a problem with including a colon as part of a substring
> > bounded by whitespace or beginning of line.
> > Here's an example:
> > p = re.compile(r'\bcc:\b',re.IGNORECASE)
> >>>

Re: [Tutor] regex problem with colon

2009-08-06 Thread Kent Johnson
On Thu, Aug 6, 2009 at 8:47 PM, Tim Johnson wrote:
> using python 2.5.
> I'm having a problem with including a colon as part of a substring
> bounded by whitespace or beginning of line.
> Here's an example:
> p = re.compile(r'\bcc:\b',re.IGNORECASE)
> res = p.findall('malicious cc: here CC: ther

[Tutor] regex problem with colon

2009-08-06 Thread Tim Johnson
using python 2.5.
I'm having a problem with including a colon as part of a substring bounded by whitespace or beginning of line.
Here's an example:
p = re.compile(r'\bcc:\b',re.IGNORECASE)
>>> res = p.findall('malicious cc: here CC: there')
>>> res
[]
## Darn! I'd hope that the 'cc:' and 'CC:' subs
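The empty result in the question can be reproduced, and one possible workaround sketched, as follows. The lookaround pattern is an assumption about what "bounded by whitespace or beginning of line" should mean; it is not taken from the replies in this thread, and the names `strict` and `bounded` are illustrative only.

```python
import re

text = 'malicious cc: here CC: there'

# The pattern from the question. \b only matches between a word character
# and a non-word character. ':' and the space after it are both non-word
# characters, so the trailing \b can never match here.
strict = re.compile(r'\bcc:\b', re.IGNORECASE)
print(strict.findall(text))   # []

# Assumed alternative: require that 'cc:' is not preceded and not followed
# by a non-whitespace character. This also matches at the start or end of
# the string, because the lookarounds succeed when there is no character.
bounded = re.compile(r'(?<!\S)cc:(?!\S)', re.IGNORECASE)
print(bounded.findall(text))  # ['cc:', 'CC:']
```

With this sketch, `'acc: x'` and `'cc:y'` produce no match, since in both cases a non-whitespace character touches the `cc:` substring.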

Re: [Tutor] regex problem

2005-01-05 Thread Alan Gauld
> > Using regex to remove HTML is usually the wrong approach unless
> >
> Thanks. This is one of those projects I've had in mind for a long
> time, decided it was a good way to learn some python.

It's a good way to write increasingly complex regex! Basically because HTML is recursive in nature

Re: [Tutor] regex problem

2005-01-05 Thread Michael Powe
On Wed, Jan 05, 2005 at 06:33:32AM -0500, Kent Johnson wrote:
> If you search comp.lang.python for 'convert html text', the top four
> results all have solutions for this problem including a reference to this
> cookbook recipe:
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52297
>
>

Re: [Tutor] regex problem

2005-01-05 Thread Michael Powe
On Wed, Jan 05, 2005 at 07:37:58AM -, Alan Gauld wrote:
> > This function removes HTML formatting codes from a text email
> Using regex to remove HTML is usually the wrong approach unless
> you can guarantee the format of the HTML in advance. The
> HTMLParser is usually better and simpler.

Re: [Tutor] regex problem

2005-01-05 Thread Michael Powe
On Tue, Jan 04, 2005 at 09:15:46PM -0800, Danny Yoo wrote:
>
>
> On Tue, 4 Jan 2005, Michael Powe wrote:
>
> > def parseFile(inFile) :
> >     import re
> >     bSpace = re.compile("^ ")
> >     multiSpace = re.compile(r"\s\s+")
> >     nbsp = re.compile(r"&nbsp;")
> >     HTMLRegEx =
> >
> > re

Re: [Tutor] regex problem

2005-01-05 Thread Kent Johnson
If you search comp.lang.python for 'convert html text', the top four results all have solutions for this problem including a reference to this cookbook recipe: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52297 comp.lang.python can be found here: http://groups-beta.google.com/group/com

Re: [Tutor] regex problem

2005-01-04 Thread Alan Gauld
> This function removes HTML formatting codes from a text email

Using regex to remove HTML is usually the wrong approach unless you can guarantee the format of the HTML in advance. The HTMLParser is usually better and simpler. I think there's an example in the module doc of converting HTML to pl
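As a rough illustration of the parser-based approach recommended here, the sketch below uses the standard library's `HTMLParser` to keep text and drop markup. It assumes Python 3, where the class lives in `html.parser` (in the Python 2 releases current when this thread was written, the module was named `HTMLParser`). The `TextExtractor` class and `html_to_text` function are hypothetical names invented for this example, not code from the thread.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping the bodies of <style> and <script>."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &nbsp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <STYLE> and <style> look the same.
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join the pieces, then collapse every run of whitespace
        # (including non-breaking spaces) into a single space.
        return " ".join("".join(self._chunks).split())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.text()


print(html_to_text("<p>Hello&nbsp;<b>world</b></p>"))  # Hello world
```

Comments are ignored because `handle_comment` is not overridden. The sketch does not insert spaces for block-level tags such as `<br>` or `<p>`, so adjacent text nodes separated only by tags run together; a fuller version would need to handle that.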

Re: [Tutor] regex problem

2005-01-04 Thread Danny Yoo
On Tue, 4 Jan 2005, Michael Powe wrote:

> def parseFile(inFile) :
>     import re
>     bSpace = re.compile("^ ")
>     multiSpace = re.compile(r"\s\s+")
>     nbsp = re.compile(r"&nbsp;")
>     HTMLRegEx =
>
> re.compile(r"(&lt;|<)/?((!--.*--)|(STYLE.*STYLE)|(P|BR|b|STRONG))/?(&gt;|>)
> ",re.I)
>
>

Re: [Tutor] regex problem

2005-01-04 Thread Rich Krauter
Michael Powe wrote: Hello, I'm having erratic results with a regex. I'm hoping someone can pinpoint the problem. This function removes HTML formatting codes from a text email that is poorly exported -- it is supposed to be a text version of an HTML mailing, but it's basically just a text version o

Re: [Tutor] regex problem

2005-01-04 Thread Liam Clarke
Hi Michael,

Is a non regex way any help? I can think of a way that uses string methods -

space=" "
stringStuff="Stuff with multiple spaces"
indexN = 0
ranges=[]
while 1:
    try:
        indexN=stringStuff.index(space, indexN)
        if indexN+1 == space:
            indexT = indexN
            while
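For the narrower task of squeezing runs of spaces without a regex, a shorter string-method route exists. This is a minimal sketch and not a completion of the truncated loop above; `collapse_spaces` is a hypothetical helper name.

```python
def collapse_spaces(text):
    """Replace every run of whitespace with a single space.

    str.split() with no arguments splits on any whitespace run and drops
    empty pieces, so joining the result with one space collapses the runs.
    """
    return " ".join(text.split())


print(collapse_spaces("Stuff   with  multiple    spaces"))
# Stuff with multiple spaces
```

Note that this also strips leading and trailing whitespace and treats tabs and newlines as spaces, which may or may not be wanted when cleaning an exported email.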

[Tutor] regex problem

2005-01-04 Thread Michael Powe
Hello, I'm having erratic results with a regex. I'm hoping someone can pinpoint the problem. This function removes HTML formatting codes from a text email that is poorly exported -- it is supposed to be a text version of an HTML mailing, but it's basically just a text version of the HTML page.