On Friday 05 October 2001 5:59 pm, Frazier, Joe Jr wrote:
> How do I transform HTML to text content? I KNOW I have seen a method to
> do this, but don't remember what module it is in. I checked a few and
> did not seem to find it. It would also be nice if it had the ability to
> maintain layout (i.e., if there is data in a table, create columns using
> sprintf or similar to maintain the look of columns in the output).

I've done it in the past with SGML::StripParser. It works, except that
it leaves the DTD intact in the output.

    use SGML::StripParser;

    # open HTML file for tag stripping
    open(HTML, $filepath) or die "Cannot open $filepath: $!";

    # open temp file to take tag-stripped HTML
    open(PLAIN, ">$outfile") or die "Cannot open $outfile: $!";

    # strip tags
    my $rm_html = new SGML::StripParser;
    $rm_html->set_outhandle(\*PLAIN);   # output file
    $rm_html->parse_data(\*HTML);       # HTML input file

    close(HTML);
    close(PLAIN);
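
The snippet above does not address the column layout asked about in the question. As a rough sketch of the sprintf idea only: assuming the cell text has already been pulled out of the table into rows (the @rows data and the format_rows helper are made up for illustration, and are not part of SGML::StripParser), the columns could be padded like this:

```perl
use strict;
use warnings;

# Hypothetical rows, standing in for cell text already extracted
# from an HTML table. Every row has the same number of cells.
my @rows = (
    [ 'Name',  'Qty', 'Price' ],
    [ 'Apple', '3',   '0.50'  ],
    [ 'Melon', '12',  '2.25'  ],
);

# Find the widest cell in each column.
my @width;
for my $row (@rows) {
    for my $i (0 .. $#$row) {
        my $len = length $row->[$i];
        $width[$i] = $len if !defined $width[$i] || $len > $width[$i];
    }
}

# Build one left-justified format string, one %-Ns field per column,
# separated by two spaces.
my $format = join('  ', map { "%-${_}s" } @width);

# Hypothetical helper: return one padded line per row.
sub format_rows {
    return map { sprintf($format, @$_) } @rows;
}

print "$_\n" for format_rows();
```

Left-justified fields suit text columns; for numeric columns, dropping the "-" flag would right-justify instead.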