Re: [UPHPU] Web site scraping

Alvaro Carrasco Thu, 25 Sep 2008 08:38:13 -0700

In my experience, the easiest way is to run the page's HTML through Tidy,
load it into a DOMDocument, and query it with XPath.
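A minimal sketch of that Tidy -> DOMDocument -> XPath pipeline follows. It assumes the tidy extension may or may not be installed: when `tidy_repair_string()` is missing, it falls back to DOMDocument's own forgiving HTML parser. The helper name `scrapeLinks`, the sample markup and the XPath query are all hypothetical. In a real scraper the HTML would come from something like `file_get_contents($url)`.

```php
<?php
// Hypothetical helper: clean up HTML (with Tidy if available), load it into
// a DOMDocument, and return text/href pairs for the nodes an XPath query matches.
function scrapeLinks(string $html, string $query): array
{
    if (function_exists('tidy_repair_string')) {
        // Tidy closes unclosed tags and fixes bad nesting.
        $repaired = tidy_repair_string($html, [], 'utf8');
        if ($repaired !== false) {
            $html = $repaired;
        }
    }

    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy markup instead of printing them;
    // the parser still builds a usable tree.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $results = [];
    foreach ($xpath->query($query) as $node) {
        $results[] = [
            'text' => trim($node->textContent),
            'href' => $node->getAttribute('href'),
        ];
    }
    return $results;
}

// Deliberately sloppy sample markup: unclosed <li> and <p> tags.
$html = '<html><body><ul id="news">'
      . '<li><a href="/a.html">First story</a>'
      . '<li><a href="/b.pdf">Second story</a>'
      . '</ul><p>Unclosed paragraph</body></html>';

$links = scrapeLinks($html, '//ul[@id="news"]//a');
print_r($links);
```

The query names the structure you care about (the links inside the `news` list), so it tends to keep working when unrelated parts of the page change.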


XPath patterns are SO much easier to read and write than regexes, and (if
you write them carefully) more resistant to changes in the site's markup.
You can also use regexes within XPath if you ever need to.
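In PHP, one way to do the regex part is `DOMXPath::registerPhpFunctions()`, which lets an XPath predicate call a PHP function such as `preg_match`. Below is a minimal sketch; the sample markup and the PDF-link pattern are made up for illustration. Note the explicit `= 1` in the predicate. A bare number inside an XPath predicate is treated as a position test, so without the comparison some matches would be silently dropped.

```php
<?php
// Sketch: call preg_match from inside an XPath predicate.
$html = '<html><body>'
      . '<a href="/report-2008.pdf">2008 report</a>'
      . '<a href="/index.html">Home</a>'
      . '<a href="/report-2007.PDF">2007 report</a>'
      . '</body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// PHP functions are exposed to XPath under this namespace URI.
$xpath->registerNamespace('php', 'http://php.net/xpath');
// Restrict XPath expressions to calling preg_match only.
$xpath->registerPhpFunctions('preg_match');

// Links whose href ends in ".pdf", case-insensitively. preg_match returns 1
// on a match; comparing with "= 1" keeps the predicate a boolean test.
$query = "//a[php:functionString('preg_match', '/\\.pdf$/i', @href) = 1]";

$pdfLinks = [];
foreach ($xpath->query($query) as $a) {
    $pdfLinks[] = $a->getAttribute('href');
}
print_r($pdfLinks);
```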

Alvaro

Nathan Lane wrote:
> I want to write what is, in effect, a website scraper in PHP, but it isn't
> obvious how best to do it. I've tried DOMDocument and I'm not sure whether
> that's the best option. I'd really like something that lets me use XPath
> to pull out the elements I want. Recently I wrote a similar program in C#
> that I call HttpAnalyzer. Could I just call that from PHP to get what I'm
> looking for? Any suggestions?
> 


_______________________________________________

UPHPU mailing list
[email protected]
http://uphpu.org/mailman/listinfo/uphpu
IRC: #uphpu on irc.freenode.net
