On Thu, Aug 6, 2009 at 11:37 PM, Tim Johnson wrote:
> Yes. You nailed it Kent.
> I had grokked the logic but not entirely the syntax.
> I'm looking thru docs now. Just curious .. is there a flag that
> enables adding a ':' to the internal list of "word" characters?
No. The list is affected by
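As far as Python's `re` module goes, `\w` (and therefore `\b`) only covers letters, digits and the underscore. The exact letters and digits depend on the locale and Unicode settings, but a colon is never a word character. A common workaround is to spell the boundary out with lookbehind/lookahead over your own character class. This is a sketch for Python 3; `WORD_CHARS` and `find_token` are hypothetical names, not part of `re`.

```python
import re

# ':' is not a word character, so \w (and hence \b) never treats it as one.
print(re.match(r"\w", ":"))  # None

# Hypothetical: a custom "word character" class that also includes ':'.
WORD_CHARS = r"[\w:]"


def find_token(token, text):
    """Find token (case-insensitively) where it is not touching any WORD_CHARS character.

    The lookbehind and lookahead stand in for the word boundary that
    would apply if ':' counted as a word character.
    """
    pattern = r"(?<!%s)%s(?!%s)" % (WORD_CHARS, re.escape(token), WORD_CHARS)
    return re.findall(pattern, text, re.IGNORECASE)


print(find_token("cc:", "malicious cc: here CC: there"))  # ['cc:', 'CC:']
```

Because the lookarounds are zero-width, only the token itself ends up in the results, just as with `\b`.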
* Kent Johnson [090806 18:31]:
> On Thu, Aug 6, 2009 at 8:47 PM, Tim Johnson wrote:
> > using python 2.5.
> > I'm having a problem with including a colon as part of a substring
> > bounded by whitespace or beginning of line.
> > Here's an example:
> > p = re.compile(r'\bcc:\b',re.IGNORECASE)
> >>>
On Thu, Aug 6, 2009 at 8:47 PM, Tim Johnson wrote:
> using python 2.5.
> I'm having a problem with including a colon as part of a substring
> bounded by whitespace or beginning of line.
> Here's an example:
> p = re.compile(r'\bcc:\b',re.IGNORECASE)
res = p.findall('malicious cc: here CC: ther
using python 2.5.
I'm having a problem with including a colon as part of a substring
bounded by whitespace or beginning of line.
Here's an example:
p = re.compile(r'\bcc:\b',re.IGNORECASE)
>>> res = p.findall('malicious cc: here CC: there')
>>> res
[]
## Darn! I'd hope that the 'cc:' and 'CC:' subs
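The empty result comes from the trailing `\b`. A word boundary needs a word character on one side and a non-word character on the other, but here both the ':' and the following space are non-word characters. One way to express "bounded by whitespace or beginning of line" directly is a negative lookbehind/lookahead on `\S`. This is a Python 3 sketch and not necessarily the fix Kent suggested in the thread.

```python
import re

text = "malicious cc: here CC: there"

# The original pattern: the trailing \b needs a word/non-word transition,
# but ':' and the space after it are both non-word characters.
broken = re.compile(r"\bcc:\b", re.IGNORECASE)
print(broken.findall(text))  # []

# (?<!\S): preceded by whitespace or the start of the string.
# (?!\S):  followed by whitespace or the end of the string.
fixed = re.compile(r"(?<!\S)cc:(?!\S)", re.IGNORECASE)
print(fixed.findall(text))  # ['cc:', 'CC:']

# '\n' counts as whitespace, so line starts work without re.MULTILINE.
print(fixed.findall("cc: at line start\nCC: too, but not xcc: or cc:x"))  # ['cc:', 'CC:']
```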
> > Using regex to remove HTML is usually the wrong approach unless
>
> Thanks. This is one of those projects I've had in mind for a long
> time, decided it was a good way to learn some python.
It's a good way to write increasingly complex regex! Basically
because HTML is recursive in nature
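To illustrate the nesting problem: Python's `re` has no way to count how deeply elements are nested, so a pattern for a tag pair either stops at the first closing tag or runs on to the last one. The markup strings below are made-up examples (Python 3).

```python
import re

nested = "<b>outer <b>inner</b> still outer</b>"
siblings = "<b>one</b> and <b>two</b>"

# Non-greedy: stops at the FIRST '</b>', which closes the inner element,
# so the match ends in the middle of the outer one.
nongreedy = re.findall(r"<b>.*?</b>", nested)
print(nongreedy)  # ['<b>outer <b>inner</b>']

# Greedy: runs to the LAST '</b>', so two sibling elements are
# swallowed as one match together with the text between them.
greedy = re.findall(r"<b>.*</b>", siblings)
print(greedy)  # ['<b>one</b> and <b>two</b>']
```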
On Wed, Jan 05, 2005 at 06:33:32AM -0500, Kent Johnson wrote:
> If you search comp.lang.python for 'convert html text', the top four
> results all have solutions for this problem including a reference to this
> cookbook recipe:
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52297
>
>
On Wed, Jan 05, 2005 at 07:37:58AM -, Alan Gauld wrote:
> > This function removes HTML formatting codes from a text email
> Using regex to remove HTML is usually the wrong approach unless
> you can guarantee the format of the HTML in advance. The
> HTMLParser is usually better and simpler.
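Here is a minimal sketch of the HTMLParser approach, written for Python 3 (`html.parser`; in Python 2 the module is `HTMLParser`). `TextExtractor` and `html_to_text` are hypothetical names. Which tags are skipped or turned into line breaks is an assumption to adjust for the actual mail.

```python
from html.parser import HTMLParser  # Python 2: from HTMLParser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text of a document, skipping <script>/<style> contents."""

    SKIP = ("script", "style")   # assumption: tags whose contents are dropped
    BREAKS = ("p", "br", "div")  # assumption: tags that start a new line

    def __init__(self):
        # Python 3.5+: convert_charrefs=True by default, so entities such as
        # &nbsp; arrive in handle_data already decoded (as '\xa0').
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Tag names are passed in lower case.
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BREAKS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(markup):
    """Return the visible text of markup, with non-breaking spaces as spaces."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts).replace("\xa0", " ").strip()


print(html_to_text("<p>Hello&nbsp;<b>world</b></p><style>p{color:red}</style>"))
# Hello world
```

The parser handles nesting, entities and tag case itself. The regex version has to anticipate each of those in the pattern.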
On Tue, Jan 04, 2005 at 09:15:46PM -0800, Danny Yoo wrote:
>
>
> On Tue, 4 Jan 2005, Michael Powe wrote:
>
> > def parseFile(inFile) :
> >     import re
> >     bSpace = re.compile("^ ")
> >     multiSpace = re.compile(r"\s\s+")
> >     nbsp = re.compile(r"&nbsp;")
> >     HTMLRegEx =
> >
> > re
If you search comp.lang.python for 'convert html text', the top four results all have solutions for
this problem including a reference to this cookbook recipe:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52297
comp.lang.python can be found here:
http://groups-beta.google.com/group/com
> This function removes HTML formatting codes from a text email
Using regex to remove HTML is usually the wrong approach unless
you can guarantee the format of the HTML in advance. The
HTMLParser is usually better and simpler. I think there's an example
in the module doc of converting HTML to pl
On Tue, 4 Jan 2005, Michael Powe wrote:
> def parseFile(inFile) :
>     import re
>     bSpace = re.compile("^ ")
>     multiSpace = re.compile(r"\s\s+")
>     nbsp = re.compile(r"&nbsp;")
>     HTMLRegEx = re.compile(r"(<|&lt;)/?((!--.*--)|(STYLE.*STYLE)|(P|BR|b|STRONG))/?(>|&gt;)",re.I)
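One plausible source of erratic results in `HTMLRegEx` is an assumption on my part, since this snippet does not show the thread's diagnosis. `(!--.*--)` and `(STYLE.*STYLE)` use greedy `.*`, and without `re.DOTALL` the `.` never matches a newline. The comment-only patterns below are simplified stand-ins for `HTMLRegEx` (Python 3).

```python
import re

text = "keep<!-- one -->this<!-- two -->text"

# Greedy .* runs from the first '<!--' to the LAST '-->', so the
# ordinary text between the two comments is deleted as well.
greedy = re.compile(r"<!--.*-->").sub("", text)
print(greedy)  # keeptext

# Non-greedy .*? stops at the first '-->' after each '<!--'.
lazy = re.compile(r"<!--.*?-->").sub("", text)
print(lazy)  # keepthistext

# Without re.DOTALL, '.' does not match '\n', so a comment that spans
# two lines is not matched at all and survives untouched.
multiline = "a<!-- line 1\nline 2 -->b"
no_dotall = re.compile(r"<!--.*?-->").sub("", multiline)
with_dotall = re.compile(r"<!--.*?-->", re.DOTALL).sub("", multiline)
print(repr(no_dotall))    # 'a<!-- line 1\nline 2 -->b'
print(repr(with_dotall))  # 'ab'
```

Applied to `HTMLRegEx`, that would suggest `(!--.*?--)` and `(STYLE.*?STYLE)` compiled with `re.I | re.DOTALL`. Any nesting inside the stripped blocks would still defeat it.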
Michael Powe wrote:
Hello,
I'm having erratic results with a regex. I'm hoping someone can
pinpoint the problem.
This function removes HTML formatting codes from a text email that is
poorly exported -- it is supposed to be a text version of an HTML
mailing, but it's basically just a text version o
Hi Michael,
Is a non regex way any help? I can think of a way that uses string methods -
space=" "
stringStuff="Stuff with multiple spaces"
indexN = 0
ranges=[]
while 1:
    try:
        indexN=stringStuff.index(space, indexN)
        if stringStuff[indexN+1] == space:
            indexT = indexN
            while
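In the same non-regex spirit, here is a sketch that collapses runs of spaces using only `str` methods, without tracking index ranges as the loop above does. `collapse_spaces` is a hypothetical helper. The `split()`/`join()` variant at the end is shorter, but it also folds tabs and newlines into single spaces (Python 3).

```python
def collapse_spaces(text):
    """Collapse every run of two or more spaces into one space.

    Uses only str methods; tabs and newlines are left as they are.
    """
    while "  " in text:
        # Replace each pair of spaces with one; repeat until no pair is left.
        text = text.replace("  ", " ")
    return text


print(collapse_spaces("Stuff  with   multiple    spaces"))
# Stuff with multiple spaces

# If all whitespace may be collapsed, split()/join() does it in one step.
print(" ".join("Stuff  with\tmultiple \n spaces".split()))
# Stuff with multiple spaces
```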
Hello,
I'm having erratic results with a regex. I'm hoping someone can
pinpoint the problem.
This function removes HTML formatting codes from a text email that is
poorly exported -- it is supposed to be a text version of an HTML
mailing, but it's basically just a text version of the HTML page.
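Two details of the `bSpace` and `multiSpace` patterns in `parseFile` matter if the function is handed a whole exported message rather than one line at a time. Which case applies is an assumption, because the function body is cut off in these snippets. First, `^` only matches at the start of the string unless `re.MULTILINE` is given. Second, `\s` matches newlines, so `\s\s+` merges lines and paragraphs (Python 3 sketch):

```python
import re

text = " first line\n second line\n\n third line"

# Without re.MULTILINE, '^' matches only at the very start of the string,
# so only the first line loses its leading space.
only_first = re.compile(r"^ ").sub("", text)
print(repr(only_first))   # 'first line\n second line\n\n third line'

# With re.MULTILINE, '^' also matches right after every '\n'.
every_line = re.compile(r"^ ", re.MULTILINE).sub("", text)
print(repr(every_line))   # 'first line\nsecond line\n\nthird line'

# \s includes '\n', so r"\s\s+" also joins lines and removes blank lines.
merged = re.compile(r"\s\s+").sub(" ", text)
print(repr(merged))       # ' first line second line third line'
```

If line breaks should survive, a class such as `[ \t]{2,}` collapses only spaces and tabs.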