Re: [Tutor] again... regular expression

Kent Johnson Mon, 21 Nov 2005 10:19:48 -0800

lmac wrote:
> Ok. I made an error. The links on the HTML site start with good.php,
> so there was no way my old pattern could ever find a link.
> 
> re_site = re.compile(r"good\.php.+'")
> for a in file:
>     z = re_site.search(a)
>     if z is not None:
>         print(z.group(0))
> 
> 
> This gives me every match starting with "good.php", but it does not stop
> at the first ' because more tags and text ending with ' follow on the
> same line. So how can I tell a regex to stop at the first ' found after
> good.php?


Use a non-greedy match. Normally + matches the longest possible string; if 
you put a ? after it, it matches the shortest string that still lets the 
rest of the pattern match. So r"good\.php.+?'" will match just up to the 
first '.
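
A minimal sketch of the difference, assuming a made-up sample line (the 
markup around the link is invented for illustration and is not taken from 
the actual site). It also shows a negated character class, [^']+, as a 
common alternative that stops at the first ' without relying on laziness:

```python
import re

# Hypothetical input line: one good.php link followed by more text that
# also contains ' characters.
line = "<a href='good.php?test=anything&seite=22'>next</a> <b class='x'>text</b>"

greedy = re.compile(r"good\.php.+'")      # .+ runs to the LAST ' on the line
lazy = re.compile(r"good\.php.+?'")       # .+? stops at the FIRST ' it can
negated = re.compile(r"good\.php[^']+'")  # [^']+ can never cross a '

greedy_match = greedy.search(line).group(0)
lazy_match = lazy.search(line).group(0)
negated_match = negated.search(line).group(0)

print(greedy_match)   # runs on past the link into the following tags
print(lazy_match)     # ends at the first ' after good.php
print(negated_match)  # same text as the lazy match for this input
```

For this input the lazy and negated-class patterns return the same text; 
the greedy one keeps going until the last ' on the line.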

Kent

> 
> Thank you.
> 
> 
> 
>>Hello.
>>I want to parse a website for links of this type:
>>
>>http://www.example.com/good.php?test=anything&egal=total&nochmal=nummer&so=site&seite=22";>
>>
>>---------------------------------------------------------------------
>>re_site = re.compile(r'http://\w+.\w+.\w+./good.php?.+";>')
>>for a in file:
>>    z = re_site.search(a)
>>    if z is not None:
>>        print(z.group(0))
>>
>>---------------------------------------------------------------------
>>
>>I still don't understand regular expressions. I tried some other
>>expressions but couldn't get them to work.
>>
>>The end of the link is ">. So it should not be a problem to extract the
>>link, but it is.
>>
>>Thank you for the help.
>>
>>mac
>>
> 
> 
> _______________________________________________
> Tutor maillist  -  [email protected]
> http://mail.python.org/mailman/listinfo/tutor
> 
> 

-- 
http://www.kentsjohnson.com

