Re: [Boston.pm] Extract text from html preserving newlines

2007-05-02 Thread Tolkin, Steve
person without authorization from Fidelity Investments.
-----Original Message-----
From: Chris Devers [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 02, 2007 1:53 PM
To: Tolkin, Steve
Cc: Boston Perl Mongers
Subject: Re: [Boston.pm] Extract text from html preserving newlines
On Wed, 2 May 2007,

Re: [Boston.pm] Extract text from html preserving newlines

2007-05-02 Thread Tolkin, Steve
thing. Steve
-----Original Message-----
From: Jerrad Pierce [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 02, 2007 1:45 PM
To: Tolkin, Steve
Cc: Boston Perl Mongers
Subject: Re: [Boston.pm] Extract text from html preserving newlines
lynx -dump -- Free map of local environmental resources: h

Re: [Boston.pm] Extract text from html preserving newlines

2007-05-02 Thread Jerrad Pierce
NTLM is bad, 'm-k? -- Free map of local environmental resources: http://CambridgeMA.GreenMap.org -- MOTD on Boomtime, the 49th of Discord, in the YOLD 3173: It is useless for sheep to pass resolutions in favor of vegetarianism while wolves remain of a different opinion.

Re: [Boston.pm] Extract text from html preserving newlines

2007-05-02 Thread Chris Devers
On Wed, 2 May 2007, Tolkin, Steve wrote:
> Q1. Is there a way to automate IE or Mozilla Firefox to save 100's of
> files as text?
Probably, but might it be easier to automate using `lynx -dump` (or better still, `links -dump`)? If those produce output the way you want them, automating it shoul
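For what it's worth, the batch step suggested here could look roughly like the sketch below. It is only an illustration, not anything posted in the thread: the function name `html_to_text_all` and the `DUMP_CMD` variable are made up for this example, and it assumes `lynx` (or `links`) is on the PATH and that the `.html` files sit in a single directory.

```shell
#!/bin/sh
# Sketch: convert every *.html file in a directory to a .txt file next to it.
# DUMP_CMD is a hypothetical knob introduced for this example; it defaults to
# `lynx -dump` as suggested in the thread and may be set to `links -dump`.
DUMP_CMD="${DUMP_CMD:-lynx -dump}"

html_to_text_all() {
    dir="${1:-.}"
    for f in "$dir"/*.html; do
        # Skip the unexpanded pattern when the directory has no .html files.
        [ -e "$f" ] || continue
        out="${f%.html}.txt"
        # $DUMP_CMD is left unquoted on purpose so "lynx -dump" splits into words.
        $DUMP_CMD "$f" > "$out" || echo "failed: $f" >&2
    done
}
```

Calling `html_to_text_all /path/to/pages` would then write `page.txt` beside each `page.html`. Whether the newlines come out the way the Internet Explorer export does is something to check on a few sample files first.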

Re: [Boston.pm] Extract text from html preserving newlines

2007-05-02 Thread Jerrad Pierce
lynx -dump

[Boston.pm] Extract text from html preserving newlines

2007-05-02 Thread Tolkin, Steve
I want to extract the text from several hundred *.html files. Many html tags cause a newline to appear in the output, e.g. etc. In Internet Explorer if I do "Files Save As..." and change "Save as Type" to be "Text File (*.txt)" the output file preserves newlines (and other whitespace) in a reas