Hi,
> the page is reasonably constant (until any fixes?)
Yeah, that's always a danger...
> all the values I want are in a single 6000 char line, how do I break the
> 6000 char line into individual values, 'grep any_value file' gives me the
> whole 6000 chars ?
Several ways, probably the easiest
On Tue, January 14, 2014 1:31 pm, Jiří Baum wrote:
> Ah, skip the lynx step. Just work with the html directly.
> All the tools (sed, awk, grep) can work directly with html.
> To some extent it depends on how variable the original page is.
> Once you skip the lynx step, you might even find that
Hi,
> thanks. I think you might have answered my next question already:
> what I'm doing is like: wget url > html; lynx html > text
Ah, skip the lynx step. Just work with the html directly.
All the tools (sed, awk, grep) can work directly with html.
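
As a rough sketch of what that could look like for the 7-digit values
starting with '313': the sample markup below is made up (the real page is
not shown in this thread), extract_ids is just a name picked for
illustration, and the -o and -w options are assumed to be available, as
they are in GNU grep.

```sh
#!/bin/sh
# Hypothetical sample standing in for the fetched page; the real markup
# is not shown in the thread.
html='<tr><td class="id">3131234</td><td>foo 3139876</td><td>31312345</td><td>3130001</td></tr>'

# extract_ids is an illustrative helper name, not part of any tool.
# -o prints each match on its own line instead of the whole long line.
# -w only accepts matches that stand alone as a word, so an 8-digit
#    number such as 31312345 is skipped.
# -E enables extended regexes: 313 plus four more digits is seven digits.
extract_ids() {
    grep -owE '313[0-9]{4}'
}

printf '%s\n' "$html" | extract_ids
```

On a real page you would pipe the wget output straight into the grep (for
example: wget -qO- "$url" | grep -owE '313[0-9]{4}'), where $url is
whatever page the script already fetches. If the prefix ever changes from
'313', only the pattern needs adjusting.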
> even it's somewhat outside of my abilities
On Tue, January 14, 2014 12:45 pm, kfos...@tpg.com.au wrote:
> There are many html reader libraries out there. I have used one for perl
> for example. It enables you to look for some other tag to find your data
> (eg
> the css class name of that particular element) and rip the data by walking
>
There are many html reader libraries out there. I have used one for perl for
example. It enables you to look for some other tag to find your data (eg
the css class name of that particular element) and rip the data by walking
the html tree.
Pick a language and let us know I am sure you will
I have a shell script that gets a web page, after around half a dozen
sed/awk one-liners I end up with something like[1]:
I would like to extract all the 7 digit numeric values, currently starting
with '313', to use them further in the script
I'm hoping there is some better way ? (rather than what I'm doing,