Re: [Tutor] MemoryError

Kent Johnson Thu, 09 Dec 2004 16:41:11 -0800

Liam,

Here's a nifty re trick for you. The sub() method can take a function as the replacement parameter. Instead of replacing each match with a fixed string, sub() calls the function with the match object, and whatever string the function returns is substituted for the match. So you can simplify your code a bit, something like this:

def replaceTag(item):   # item is a match object
    # Same logic as the body of your loop
    # Will try and stick to string methods for this, but I'll see.
    text = gettextFunc(item.group())
    if not text:
        # Give the href a text value, so some lucky human can change it
        text = "Default"
    # The simpler the better, and so far re has been the simplest
    url = geturlFunc(item.group())
    if not url:
        # This will delete the applet, as there are applets acting as placeholders
        href = ""
    else:
        href = '<a href="%s">%s</a>' % (url, text)

    # Now return href
    return href

Now your loop and the replacements that follow it collapse into a single line:

codeSt = reObj.sub(replaceTag, codeSt)
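
For what it's worth, here is a self-contained sketch of the same idea that you can run as-is. The applet markup, the two small regexes standing in for gettextFunc and geturlFunc, and the names APPLET_RE and replace_tag are all made up for illustration; they are assumptions, not your actual script.

```python
import re

# Hypothetical pattern: one whole <applet ...>...</applet> element.
APPLET_RE = re.compile(r"<applet\b[^>]*>.*?</applet>", re.IGNORECASE | re.DOTALL)

# Hypothetical stand-ins for geturlFunc / gettextFunc: they read the
# "url" and "text" <param> values out of the matched applet markup.
URL_RE = re.compile(r'name="url"\s+value="([^"]*)"', re.IGNORECASE)
TEXT_RE = re.compile(r'name="text"\s+value="([^"]*)"', re.IGNORECASE)


def replace_tag(match):
    """Called by sub() once per match; returns the replacement string."""
    tag = match.group()
    url = URL_RE.search(tag)
    if url is None:
        return ""  # placeholder applet: delete it
    text = TEXT_RE.search(tag)
    label = text.group(1) if text else "Default"
    return '<a href="%s">%s</a>' % (url.group(1), label)


html = (
    '<p>intro</p>'
    '<applet code="Nav.class"><param name="url" value="a.html">'
    '<param name="text" value="Page A"></applet>'
    '<applet code="Spacer.class"></applet>'
    '<applet code="Nav.class"><param name="url" value="b.html"></applet>'
)

# One call replaces every match; no span bookkeeping is needed.
result = APPLET_RE.sub(replace_tag, html)
print(result)
```

Because sub() builds the new string itself, there is no need to collect spans, reverse the list, or worry about offsets shifting as replacements are made.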

:-)

Kent

Liam Clarke wrote:

Hi all,

Yeah, I should've written this in functions from the get go, but I
thought it would be a simple script. :/

I'll come back to that script when I've had some sleep, my son was
recently born and it's amazing how dramatically lack of sleep affects
my acuity. But, I want to figure out what's going wrong.

That said, the re path is bearing fruit. I love the method finditer(),
 as I can reduce my overly complicated string methods from my original
code to

x=open("toolkit.txt",'r')
s=x.read()
x.close()
appList=[]

regExIter=reObj.finditer(s) #Here's a re obj I compiled earlier.

for item in regExIter:
    # Will try and stick to string methods for this, but I'll see.
    text=gettextFunc(item.group())
    if not text:
        # Will give a text value for the href, so some lucky human can change it
        text="Default"
    # The simpler the better, and so far re has been the simplest
    url=geturlFunc(item.group())
    if not url:
        # This will delete the applet, as there are applets acting as placeholders
        href = ""
    else:
        href='<a href="%s">%s</a>' % (url, text)

    appList.append((item.span(), href))

appList.reverse()

for ((start, end), href) in appList:
    # Splice by position; replace() would change every identical substring
    codeSt=codeSt[:start] + href + codeSt[end:]


Of course, that's just a rough draft, but it seems a whole lot
simpler to me. S'pose code needs a modicum of planning.

Oh, and I d/led BeautifulSoup, but I couldn't work it right, so I
tried re, and it suits my needs.

Thanks for all the help.

Regards,

Liam Clarke
On Thu, 09 Dec 2004 11:53:46 -0800, Jeff Shannon <[EMAIL PROTECTED]> wrote:

Liam Clarke wrote:

So, I'm going to throw caution to the wind, and try an re approach. It
can't be any more unwieldy and ugly than what I've got going at the
moment.


If you're going to try a new approach, I'd strongly suggest using a
proper html/xml parser instead of re's.  You'll almost certainly have
an easier time using a tool that's designed for your specific problem
domain than you will trying to force a more general tool to work.
Since you're specifically trying to find (and replace) certain html
tags and attributes, and that's exactly what html parsers *do*, well,
the conclusion seems obvious (to me at least). ;)

There are lots of html parsing tools available in Python (though I've
never needed one myself). I've heard lots of good things about
BeautifulSoup...

Jeff Shannon
Technician/Programmer
Credit International

_______________________________________________
Tutor maillist  -  [EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/tutor
