Re: [Hampshire] extracting phrases from a file.

2011-09-14 Thread David Anderson
On Mon, 12 Sep 2011 10:17:44 +0100 James Courtier-Dutton wrote:
> Hi.
>
> I have a large file that contains snips of http pages.
> Each line is like this:
> some junk.
>
> I want to extract the "some url" bits. I.e. Remove the href.
> You can probably do this quite easily in perl.
> Are th

Re: [Hampshire] extracting phrases from a file.

2011-09-12 Thread Jeremy Hooks
Just lurking and I saw this. A simple technique might be to insert a new line before each href then use grep and cut, e.g. open it in vim and do:
    :%s/href=/^Mhref=/gc
    :%s/HREF=/^Mhref=/gc
(where ^M is ctrl+v followed by the return key). Then:
    grep href filename.html|cut -d '"' -f 2
and option
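The break-before-each-href-then-cut idea can also be sketched as a script, which avoids the interactive vim step. This is only a minimal sketch in Python, assuming every href value is wrapped in double quotes; `extract_hrefs` is a hypothetical helper name, not something from the thread.

```python
import re

def extract_hrefs(text):
    """Return the double-quoted value that follows each href= (any case)."""
    # Splitting at every "href=" plays the role of inserting a newline before it.
    chunks = re.split(r'href=', text, flags=re.IGNORECASE)
    urls = []
    for chunk in chunks[1:]:            # chunks[0] is the text before the first href
        fields = chunk.split('"')       # same idea as cut -d '"'
        # Require an opening quote right after href= and a closing quote later on.
        if len(fields) >= 3 and fields[0].strip() == '':
            urls.append(fields[1])      # the text between the first pair of quotes
    return urls
```

Unquoted or single-quoted href values are skipped by this sketch, just as they would be missed by cutting on the double-quote character.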

Re: [Hampshire] extracting phrases from a file.

2011-09-12 Thread Vic
> You can probably do this quite easily in perl.
You can.
> Are there any nice short programs to do this?
Something like this?
#! /usr/bin/perl
my $fname = $ARGV[0];
die "need a filename" unless defined ($fname);
open INFILE, "<$fname" or die "Can't open $fname for reading";
while (<INFILE>) {

Re: [Hampshire] extracting phrases from a file.

2011-09-12 Thread Alan Pope
On 12 September 2011 10:54, James Courtier-Dutton wrote:
>> lynx -dump --hiddenlinks=ignore foo.html
>>
>> Will dump it to stdout in plain text form with URLs removed.
>>
>
> Sorry, I was not very clear.
> I wish to keep the "some url" bits, and get rid of all the "some junk" bits.
> I.e. I wish t

Re: [Hampshire] extracting phrases from a file.

2011-09-12 Thread Bob Dunlop
On Mon, Sep 12 at 10:17, James Courtier-Dutton wrote:
> Hi.
>
> I have a large file that contains snips of http pages.
> Each line is like this:
> some junk.
>
> I want to extract the "some url" bits. I.e. Remove the href.
> You can probably do this quite easily in perl.
> Are there any nice

Re: [Hampshire] extracting phrases from a file.

2011-09-12 Thread James Courtier-Dutton
On 12 September 2011 10:37, Alan Pope wrote:
> On 12 September 2011 10:17, James Courtier-Dutton
> wrote:
>> I want to extract the "some url" bits. I.e. Remove the href.
>> You can probably do this quite easily in perl.
>> Are there any nice short programs to do this?
>> Is it easier to do in some o

Re: [Hampshire] extracting phrases from a file.

2011-09-12 Thread Alan Pope
On 12 September 2011 10:17, James Courtier-Dutton wrote:
> I want to extract the "some url" bits. I.e. Remove the href.
> You can probably do this quite easily in perl.
> Are there any nice short programs to do this?
> Is it easier to do in some other language?
>
lynx -dump --hiddenlinks=ignore foo.

Re: [Hampshire] extracting phrases from a file.

2011-09-12 Thread James Courtier-Dutton
Hi,
I forgot to mention, my starting document is not a valid http document so probably will not load into a web browser. Would what you have said still work?
I need this to be run as a cron job, so use of a web browser is probably not the best solution.
On 12 September 2011 10:21, Benjie Gillam
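For input that is only fragments of pages and has to be processed unattended, one browser-free option is a tolerant parser. The following is a sketch, assuming Python 3 and its standard `html.parser` module, which accepts fragments rather than requiring a complete document; `HrefCollector`, `collect_hrefs` and `print_hrefs` are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> start tag it is fed."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> matches too.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)

def collect_hrefs(text):
    """Return all <a href> values found in text, in document order."""
    parser = HrefCollector()
    parser.feed(text)
    parser.close()
    return parser.hrefs

def print_hrefs(path):
    """Print one URL per line for the file at path (suitable for a cron job)."""
    # errors="replace" keeps the run going if the file has odd byte sequences.
    with open(path, encoding="utf-8", errors="replace") as handle:
        for url in collect_hrefs(handle.read()):
            print(url)
```

Because nothing here needs a display or a browser session, a small script that calls `print_hrefs` on the input file can run from cron as it stands.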

Re: [Hampshire] extracting phrases from a file.

2011-09-12 Thread Benjie Gillam
Or, alternatively, open it into a decent web browser and type this into the JavaScript console:
var as = document.getElementsByTagName('a');
var hrefs=[];
for (var i = 0, l = as.length; i
> Hi.
>
> I have a large file that contains snips of http pages.
> Each line is like this:
> some junk...

[Hampshire] extracting phrases from a file.

2011-09-12 Thread James Courtier-Dutton
Hi.
I have a large file that contains snips of http pages.
Each line is like this:
some junk.
I want to extract the "some url" bits. I.e. Remove the href.
You can probably do this quite easily in perl.
Are there any nice short programs to do this?
Is it easier to do in some other language?