[xml] Parse webpage HTML

Julius Hamilton via xml Fri, 17 Sep 2021 03:15:21 -0700

Hey,

I would like to write a script that extracts the article text from webpage
HTML. The pages share a similar structure because they are all
documentation pages from the same website, the Microsoft Visual Basic for
Applications homepage.


I believe I should first inspect the HTML tree, i.e. the raw HTML returned
by wget, to figure out which nodes tend to have the text content I am
seeking. Should I do that in Firefox or Chrome, or is there a good
standalone tool for that?

Then, could I use this XML parsing library, or is there some other standard
one, for retrieving the text content at the nodes I have identified?
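
For the extraction step, this is roughly what I am picturing, again only a
sketch with Python's standard-library html.parser standing in for whatever
library is recommended. It assumes the article body sits inside one element
with a known id; the id "main" and the sample markup are hypothetical, not
taken from the actual pages.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
# Tags that should start a new line in the extracted text.
BREAKS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "tr", "br"}
# Tags whose contents are code or styling, not article text.
SKIP = {"script", "style"}


class TextExtractor(HTMLParser):
    """Collect the text found inside the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0   # 0 = outside the target element
        self.skip = 0    # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth and tag in BREAKS:
            self.chunks.append("\n")
        if tag in VOID:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1
        if self.depth and tag in SKIP:
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in VOID or not self.depth:
            return
        if tag in BREAKS:
            self.chunks.append("\n")
        if tag in SKIP and self.skip:
            self.skip -= 1
        self.depth -= 1

    def handle_data(self, data):
        if self.depth and not self.skip:
            self.chunks.append(data)


def extract_text(html, target_id):
    """Return the target element's text, one block per line."""
    parser = TextExtractor(target_id)
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if line)


# Hypothetical sample markup standing in for a downloaded page.
page = ('<html><body><nav>Menu</nav><main id="main"><h1>Range object</h1>'
        '<p>Represents a <b>cell</b>.<br>More text.</p>'
        '<script>var x = 1;</script></main>'
        '<footer>Feedback</footer></body></html>')
print(extract_text(page, "main"))
```

Navigation, footer and script text are dropped because they fall outside
the chosen element or inside a skipped tag.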

Thanks very much,
Julius

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
https://mail.gnome.org/mailman/listinfo/xml
