On Friday 05 October 2001 5:59 pm, Frazier, Joe Jr wrote:
> How do I transform HTML to text content? I KNOW I have seen a method to
> do this, but don't remember what module it is in.  I checked a few and
> did not seem to find it.  It would also be nice if it had the ability to
> maintain layout (i.e., if there is data in a table, create columns using
> sprintf or similar to maintain the look of columns in the output).

I've done it in the past with SGML::StripParser. It strips the tags fine;
the only thing it leaves intact is the DTD.

# $filepath (HTML input) and $outfile (text output) are assumed to be set already
use SGML::StripParser;

# open HTML file for tag stripping
open(HTML, "<", $filepath) or die "Cannot open $filepath: $!";
# open temp file to take tagstripped HTML
open(PLAIN, ">", $outfile) or die "Cannot open $outfile: $!";

# strip tags
my $rm_html = new SGML::StripParser;
$rm_html->set_outhandle(\*PLAIN); # output file
$rm_html->parse_data(\*HTML);     # html input file

close(HTML);
close(PLAIN);
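
For the column layout part of the question, I don't know of anything in
SGML::StripParser that does it for you. Below is only a rough sketch of the
sprintf idea, assuming you have already pulled the table cells out into rows
yourself (that extraction step is not shown). format_rows is a made-up helper
name, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: lay out rows of cell text as fixed-width columns.
# Each argument is an array ref holding the cell strings of one table row.
sub format_rows {
    my @rows = @_;

    # Find the widest cell in each column.
    my @width;
    for my $row (@rows) {
        for my $i (0 .. $#$row) {
            my $len = length $row->[$i];
            $width[$i] = $len if !defined $width[$i] || $len > $width[$i];
        }
    }

    # One sprintf format for every row: left-justified, two spaces apart.
    my $fmt = join('  ', map { "%-${_}s" } @width);

    return map {
        my @cells = @$_;
        push @cells, '' while @cells < @width;    # pad short rows
        my $line = sprintf($fmt, @cells);
        $line =~ s/\s+$//;                        # drop trailing padding
        $line;
    } @rows;
}

# Example rows (invented data, just to show the call).
print "$_\n" for format_rows(
    [ 'Name',   'Qty', 'Price' ],
    [ 'Widget', '10',  '2.50'  ],
    [ 'Gadget', '3',   '19.99' ],
);
```

Each column is padded to its widest cell, so the text lines up as long as it
is viewed in a fixed-width font.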