On Mon, 12 Sep 2011 10:17:44 +0100
James Courtier-Dutton wrote:
> Hi.
>
> I have a large file that contains snips of http pages.
> Each line is like this:
> some junk.
>
> I want to extract the "some url" bits. I.e. Remove the href.
> You can probably do this quite easily in perl.
> Are th
Just lurking and I saw this. A simple technique might be to insert a
new line before each href then use grep and cut. e.g. open it in vim
and do:
:%s/href=/^Mhref=/gc
:%s/HREF=/^Mhref=/gc
(where ^M is ctrl+v followed by the return key)
Then
grep href filename.html|cut -d '"' -f 2
and option
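The same idea can probably be done without opening an editor. This is only a rough sketch: it assumes the hrefs are double-quoted and that your grep supports -o (GNU and BSD grep do, but it is not in POSIX). The function name extract_hrefs is made up for illustration.

```shell
#!/bin/sh
# extract_hrefs FILE
# Print the value of every double-quoted href attribute in FILE, one per line.
# Assumption: attributes look like href="..." (any letter case); single-quoted
# or unquoted values are not handled by this sketch.
extract_hrefs() {
    # -o prints only the matching part, -i also matches HREF=
    grep -oi 'href="[^"]*"' "$1" |
        # the URL is the second field when splitting on double quotes
        cut -d '"' -f 2
}
```

Calling extract_hrefs filename.html prints one URL per line, and several links on the same input line each get their own output line.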
> You can probably do this quite easily in perl.
You can.
> Are there any nice short programs to do this?
Something like this?
#! /usr/bin/perl
my $fname = $ARGV[0];
die "need a filename" unless defined ($fname);
open INFILE, "<$fname" or die "Can't open $fname for reading";
while (<INFILE>)
{
On 12 September 2011 10:54, James Courtier-Dutton
wrote:
>> lynx -dump --hiddenlinks=ignore foo.html
>>
>> Will dump it to stdout in plain text form with URLs removed.
>>
>
> Sorry, I was not very clear.
> I wish to keep the "some url" bits, and get rid of all the "some junk" bits.
> I.e. I wish t
On Mon, Sep 12 at 10:17, James Courtier-Dutton wrote:
> Hi.
>
> I have a large file that contains snips of http pages.
> Each line is like this:
> some junk.
>
> I want to extract the "some url" bits. I.e. Remove the href.
> You can probably do this quite easily in perl.
> Are there any nice
On 12 September 2011 10:37, Alan Pope wrote:
> On 12 September 2011 10:17, James Courtier-Dutton
> wrote:
>> I want to extract the "some url" bits. I.e. Remove the href.
>> You can probably do this quite easily in perl.
>> Are there any nice short programs to do this?
>> Is it easier to do in some o
On 12 September 2011 10:17, James Courtier-Dutton
wrote:
> I want to extract the "some url" bits. I.e. Remove the href.
> You can probably do this quite easily in perl.
> Are there any nice short programs to do this?
> Is it easier to do in some other language?
>
lynx -dump --hiddenlinks=ignore foo.
Hi,
I forgot to mention, my starting document is not a valid http document
so probably will not load into a web browser.
Will what you have said still work?
I need this to be run as a cron job, so use of a web browser is
probably not the best solution.
On 12 September 2011 10:21, Benjie Gillam wrote:
Or, alternatively, open it into a decent web browser and type this into the
JavaScript console:
var as = document.getElementsByTagName('a'); var hrefs=[]; for (var i = 0, l =
as.length; i [...]
> Hi.
>
> I have a large file that contains snips of http pages.
> Each line is like this:
> some junk...
Hi.
I have a large file that contains snips of http pages.
Each line is like this:
some junk.
I want to extract the "some url" bits. I.e. Remove the href.
You can probably do this quite easily in perl.
Are there any nice short programs to do this?
Is it easier to do in some other language?