lmac wrote:
> OK, I made an error. The links in the HTML site start with good.php, so
> there was no way to ever find a link.
>
> re_site = re.compile(r"good\.php.+'")
> for a in file:
>     z = re_site.search(a)
>     if z != None:
>         print z.group(0)
>
>
> This gives me every match starting with "good.php", but the match does not
> end at the first ' after it: there are more tags and text later in the line
> that also end with ', and the match runs on to the last one. So how can I
> tell a regex to stop at the first ' found after good.php?
Use a non-greedy match. Normally + matches the longest possible string; if
you put a ? after it, it matches the shortest string that still lets the
whole pattern succeed. So r"good\.php.+?'" will match only up to the first '.

Kent

> Thank you.
>
>>Hello.
>>I want to parse a website for links of this type:
>>
>>http://www.example.com/good.php?test=anything&egal=total&nochmal=nummer&so=site&seite=22">
>>
>>---------------------------------------------------------------------
>>re_site = re.compile(r'http://\w+.\w+.\w+./good.php?.+">')
>>for a in file:
>>    z = re_site.search(a)
>>    if z != None:
>>        print z.group(0)
>>
>>---------------------------------------------------------------------
>>
>>I still don't understand regular expressions. I tried some other
>>expressions but couldn't get them to work.
>>
>>The end of the link is ">, so it should not be a problem to extract the
>>link, but it is.
>>
>>Thank you for the help.
>>
>>mac
>>
>
> _______________________________________________
> Tutor maillist - Tutor@python.org
> http://mail.python.org/mailman/listinfo/tutor
>

-- 
http://www.kentsjohnson.com
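A minimal sketch (Python 3, not from the thread) of the greedy vs. non-greedy difference described above. It also shows a negated character class, `[^']*`, a common alternative that simply cannot run past a quote. The sample HTML line, the `find_links` helper, and the `absolute` pattern are hypothetical illustrations. The `absolute` pattern tightens the first message's regex, where an unescaped `.` matches any character and `php?` makes the final `p` optional instead of matching a literal `?`; its host part `[\w.-]+` is a simplifying assumption, not a full URL grammar.

```python
import re

# Hypothetical HTML line: one good.php link followed by another quoted attribute.
line = "<a href='good.php?seite=22'>next</a> <img src='logo.png'>"

# Greedy: .+ runs to the end of the line, then backtracks only as far as the LAST '.
greedy = re.compile(r"good\.php.+'")
# Non-greedy: .+? expands one character at a time, so it stops at the FIRST '.
lazy = re.compile(r"good\.php.+?'")
# Alternative: [^']* can never consume a ', so it also stops at the first one.
negated = re.compile(r"good\.php[^']*'")

for name, pattern in [("greedy", greedy), ("lazy", lazy), ("negated", negated)]:
    match = pattern.search(line)
    if match is not None:
        print(name, match.group(0))


def find_links(lines):
    """Yield every good.php URL (without the closing quote) found in lines."""
    link = re.compile(r"(good\.php[^']*)'")  # group 1 excludes the quote
    for text in lines:
        for m in link.finditer(text):
            yield m.group(1)


# For absolute links ending in ">: escape the literal dots and the ?,
# and stop at the first double quote instead of using a greedy .+
absolute = re.compile(r'http://[\w.-]+/good\.php\?[^"]*')
sample = ('http://www.example.com/good.php?test=anything&egal=total'
          '&nochmal=nummer&so=site&seite=22">')

print(list(find_links([line])))
print(absolute.search(sample).group(0))
```

The greedy pattern returns everything up to the quote after `logo.png`, while both the non-greedy and negated-class patterns stop right after `seite=22'`. The negated class is often preferred because it states the boundary explicitly and avoids backtracking.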