On Tue, January 14, 2014 12:45 pm, kfos...@tpg.com.au wrote:
> There are many HTML reader libraries out there. I have used one for Perl,
> for example. It enables you to look for some other tag to find your data
> (e.g. the CSS class name of that particular element) and rip the data by
> walking the HTML tree.
>
> Pick a language and let us know; I am sure you will get specific
> recommendations on HTML reader / parser libraries (e.g. HTML Agility Pack
> for C#).

Ken, thanks. I think you might have answered my next question already.

What I'm doing is roughly:

    wget -O page.html <url>; lynx -dump page.html > page.txt

Initially I was getting no text output: the page is HTML5 (or something like it), and neither links nor lynx knew how to process it, so the output was blank. After installing the latest dev build of lynx, I started getting data in the text output, but:

- some fields get trimmed at 30 columns; editing 'cols=30' in the HTML and re-dumping the text fixes those fields;
- some other fields get trimmed at 20 columns; I can't find any 'cols=' statements for them in the HTML, and I haven't found a way to output all the data from these fields.

Opening the HTML in lynx, these fields are also trimmed on screen, BUT they are scrollable <>.

So I guess the libraries you suggest would overcome such HTML5(?) issues? Even if it's somewhat outside of my abilities, I guess I could try Perl (if I can find some sample code).

Thanks again,
V

-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
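
For what it's worth, here is a minimal sketch of the parser-library approach, written in Python rather than Perl and using only the standard library's html.parser module. It rests on a guess that has not been checked against the actual page: that the fields trimmed at 30 columns are <textarea> elements (cols is a textarea attribute) and that the ones trimmed at 20 columns are <input> fields, which browsers display 20 characters wide by default. If that guess holds, the full data is still in the markup and only the rendering is cut short. The names FieldExtractor and extract_fields and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collect the full values of <input> and <textarea> form fields."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.fields = []        # (name, value) pairs in document order
        self._textarea = None   # name of the <textarea> being read, if any
        self._chunks = []       # text pieces seen inside that <textarea>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        if tag == "input":
            # An <input> with no type attribute is a text field.
            kind = (attrs.get("type") or "text").lower()
            if kind in ("text", "hidden"):
                self.fields.append((name, attrs.get("value") or ""))
        elif tag == "textarea":
            self._textarea = name
            self._chunks = []

    def handle_data(self, data):
        # Text may arrive in several pieces, so collect and join later.
        if self._textarea is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "textarea" and self._textarea is not None:
            self.fields.append((self._textarea, "".join(self._chunks)))
            self._textarea = None


def extract_fields(html_text):
    """Return a list of (name, value) pairs for the form fields found."""
    parser = FieldExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.fields


# Hypothetical markup standing in for the real page.
SAMPLE = (
    '<form>'
    '<input type="text" name="title" '
    'value="A value much longer than twenty columns wide">'
    '<textarea name="notes" cols="30">'
    'A note that runs well past thirty columns of text'
    '</textarea>'
    '</form>'
)

if __name__ == "__main__":
    for field_name, field_value in extract_fields(SAMPLE):
        print(field_name + ": " + field_value)
```

To try it on a saved page, read the file into a string and pass that to extract_fields() in place of SAMPLE. Because the values are taken from the markup itself, the displayed width (cols, or the default input size) does not affect what comes out.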