Re: Need a spider library

2005-10-12 Thread Walter Dörwald
Laszlo Zsolt Nagy wrote:
> [...]
> For example this malformed link:
>
> http://samplesite.current_location/page.html','Samle link']
Your options AFAIK are:
* Beautiful Soup (http://www.crummy.com/software/BeautifulSoup/)
* Various implementations of tidy (uTidyLib, mxTidy)
* XIST (http://www.liv

Re: Need a spider library

2005-10-12 Thread Laszlo Zsolt Nagy
Fredrik Lundh wrote:
> Laszlo Zsolt Nagy wrote:
>> The question: is there a good library for Python for extracting links and
>> images out of (possibly malformed) HTML source code?
>
> http://www.crummy.com/software/BeautifulSoup/
Thanks a lot! This is just what I wanted. W

Re: Need a spider library

2005-10-12 Thread Fredrik Lundh
Laszlo Zsolt Nagy wrote:
> The question: is there a good library for Python for extracting links and
> images out of (possibly malformed) HTML source code?
http://www.crummy.com/software/BeautifulSoup/
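For readers who want to see the shape of the task, here is a minimal sketch of pulling links and images out of sloppy HTML. It does not use Beautiful Soup, the library recommended above; it substitutes the standard library's `html.parser`, which is also fairly tolerant of unclosed tags and unquoted attributes. The class name `LinkImageExtractor`, the helper `extract`, and the example.com URLs are assumptions made for illustration, not anything from the thread.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkImageExtractor(HTMLParser):
    """Hypothetical helper: collect <a href> and <img src> URLs."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url  # used to resolve relative URLs
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; value may be None.
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))


def extract(html, base_url=""):
    """Return (links, images) found in html, resolved against base_url."""
    parser = LinkImageExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links, parser.images


if __name__ == "__main__":
    # Deliberately malformed input: unclosed tags, unquoted attribute.
    page = '<p><a href="/page.html">Sample link<img src=pic.png>'
    links, images = extract(page, "http://example.com/dir/")
    print(links)
    print(images)
```

This only covers anchors and images in markup the parser can still tokenize. For badly broken pages, a dedicated library such as Beautiful Soup is likely the more robust choice.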

Need a spider library

2005-10-12 Thread Laszlo Zsolt Nagy
Hi All, I'm writing a spider program. I need to go to several URLs and extract information, including links, from the HTML source. I was using FancyURLopener and my own function that extracts the links from an HTML page. The problem is that I always need to change it. This is because some sit