Liam Clarke wrote:
> Hi Kent,
> 
> I apologise for the not overly helpful initial post.
> 
> I had six possible uris to deal with -
> 
> /thread/28742/
> /thread/28742/?s=1291819247219837219837129
> /thread/28742/5/
> /thread/28742/5/?s=1291819247219837219837129
> /thread/28742/?goto=lastpost
> /thread/28742/?s=1291819247219837219837129&goto=lastpost
> 
> The only ones I wanted to match were the first two.
> 
> My initial pattern /thread/[0-9]*?/(\?s\=.*)?(?!lastpost)$
> 
> matched the first two and the last in redemo.py (which I've got
> stashed as a py2exe bundle, should I ever find myself sans Python but
> having to use regexes).
> 
> I managed to sort it by using
> 
> /thread
> /[0-9]*?/
> (\?s\=\w*)?$
> 
> The s rules out the fifth possibility (?goto=lastpost), and the \w
> precludes the & in the last uri.

This seems like a good solution to me. The patterns you want to accept
and reject are pretty similar, so your regex has to be very specific to
discriminate them.
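
As a quick check, here is a minimal sketch that runs your working pattern
against the six URIs from your message. The names PATTERN, URIS and
matched are just mine for illustration, and I am assuming re.VERBOSE so
the pattern can stay split over three lines the way you wrote it.

```python
import re

# Liam's working pattern; re.VERBOSE ignores the line breaks and
# indentation inside the pattern.
PATTERN = re.compile(r"""
    /thread
    /[0-9]*?/
    (\?s\=\w*)?$
""", re.VERBOSE)

# The six URIs from the original message.
URIS = [
    "/thread/28742/",
    "/thread/28742/?s=1291819247219837219837129",
    "/thread/28742/5/",
    "/thread/28742/5/?s=1291819247219837219837129",
    "/thread/28742/?goto=lastpost",
    "/thread/28742/?s=1291819247219837219837129&goto=lastpost",
]

matched = [uri for uri in URIS if PATTERN.search(uri)]

for uri in URIS:
    print("match   " if uri in matched else "no match", uri)
```

Only the first two URIs should be reported as matches.
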
> 
> But, circumventing the problem irks me no end, as I haven't fixed what
> I was doing wrong, which means I'll probably do it again, and avoiding
> problems instead of resolving them feels too much like programming for
> the Win32 api to me.
> (Where removing a service from the service database doesn't actually
> remove the service from the service database until you open and close
> a handle to the service database a second time...)
> 
> So yes, any advice on how to use negative lookaheads would be great. I
> get the feeling it was the .* before it.

I think you may misunderstand how * works. It will match as much as
possible, but it will backtrack and match less if that makes the whole
match work.

For example, the regex ab*d will match abbd with b* matching bb. If I
change the regex to ab*(?!d) then it will still match abbd, but the b*
will just match one b and the d doesn't participate in the match.
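
You can see this in the interpreter. This is only a small sketch; I have
added a capturing group around b* (not part of the regexes above) so that
what b* ended up matching is visible.

```python
import re

text = "abbd"

# Without the lookahead, b* keeps both b's and d matches the final d.
plain = re.search(r"a(b*)d", text)
print(plain.group(), plain.group(1))

# With the lookahead, b* first takes both b's, (?!d) fails because the
# next character is d, so b* backtracks and gives one b back.
lookahead = re.search(r"a(b*)(?!d)", text)
print(lookahead.group(), lookahead.group(1))
```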

So b* doesn't mean "match all the b's no matter what"; it means "match
as many b's as you can and still have the rest of the match succeed".
In the case of ab*(?!d) this means: match an 'a', then a sequence of
'b', then check that the next character is not a 'd'. By shortening the
match for b*, the 'not d' check can succeed against the last 'b', which
is looked at but not consumed.
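
The same thing explains your first pattern. The .* runs to the end of
the string, swallowing "lastpost" on the way, and a (?!lastpost) checked
at the very end has nothing left to reject. Here is a hedged sketch of
that, plus one possible rearrangement that puts the lookahead before the
.* so it sees the whole query string. The "revised" pattern is my own
suggestion, not something from your post, and I have only checked it
against these six URIs.

```python
import re

URIS = [
    "/thread/28742/",
    "/thread/28742/?s=1291819247219837219837129",
    "/thread/28742/5/",
    "/thread/28742/5/?s=1291819247219837219837129",
    "/thread/28742/?goto=lastpost",
    "/thread/28742/?s=1291819247219837219837129&goto=lastpost",
]

# The first attempt: the lookahead is tested after .* has consumed
# everything, so at end-of-string it always succeeds.
original = re.compile(r"/thread/[0-9]*?/(\?s\=.*)?(?!lastpost)$")

# Hypothetical alternative: test the lookahead before any of the query
# string is consumed, and let it scan ahead with .* for "lastpost".
revised = re.compile(r"/thread/[0-9]*?/(?!.*lastpost)(\?s\=.*)?$")

original_hits = [uri for uri in URIS if original.search(uri)]
revised_hits = [uri for uri in URIS if revised.search(uri)]

print("original:", original_hits)
print("revised: ", revised_hits)
```

The original pattern should accept the first two URIs and the last one,
as you saw in redemo.py; the revised one should accept only the first
two.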

> 
> As for my problem with BeautifulSoup, I'm not sure what was happening
> there. It was happening in interactive console only, and I can't
> replicate it today, which suggests to me that I've engaged email
> before brain again.
> 
> I do like BeautifulSoup, however. Although people keep telling me
> about some XPath programme that's better, apparently, I like
> BeautifulSoup; it works.

Which XPath program is that? I haven't found one I really like.

Kent

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor