RE: HTML::TokeParser and tags split between lines

2005-09-06 Thread Thomas, Mark - BLS CTR
> I can't pass in the entire file. Some of the files are 60 MB and larger. My machine freezes and crashes if I do that.

60 MB HTML files? How are those being created? Sounds like you may want to re-think the process. HTML::Parser can parse a document in chunks; you may be able to take advanta
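
A minimal sketch of the chunked parsing mentioned above, assuming the goal is only to collect the text content. The sample chunks and the variable names are made up for illustration; in practice the chunks would come from reading the large file a block at a time. Note that one tag is deliberately split across two chunks.

```perl
use strict;
use warnings;
use HTML::Parser;

# Accumulate text content only; no handler is registered for tags,
# so markup is silently skipped.
my $text = '';
my $parser = HTML::Parser->new(
    api_version => 3,
    text_h      => [ sub { $text .= shift }, 'dtext' ],
);

# Hypothetical chunks standing in for blocks read from a large file.
# The <b> tag is split between the first and second chunk.
my @chunks = ( "<p>Hello <", "b>wor", "ld</b></p>\n" );

$parser->parse($_) for @chunks;   # feed the document piece by piece
$parser->eof;                     # signal end of input, flush buffers

print $text;
```

Because the parser buffers an incomplete tag until the rest arrives, the whole file never has to be held in memory at once. With a real file, the loop would call `$parser->parse($buf)` on each block returned by `read`.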

Re: HTML::TokeParser and tags split between lines

2005-09-06 Thread Craig Cardimon
I am passing to the parser the results I get from using Text::Context::EitherSide. I can't pass in the entire file. Some of the files are 60 MB and larger. My machine freezes and crashes if I do that. I'll re-read the spec again and see what I come up with.

-- Craig

Thomas, Mark - BLS CTR wro

RE: HTML::TokeParser and tags split between lines

2005-09-06 Thread Thomas, Mark - BLS CTR
> I'm using HTML::TokeParser to remove HTML. This functions very well when tags are contained on one line.

I used to use TokeParser, although now I prefer parsing HTML with XPath. TokeParser is a stream parser, so there isn't any problem with newlines.

> What happens when you're reading a file li
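
To illustrate the point about newlines, here is a small sketch, assuming the document (or a self-contained piece of it) is handed to HTML::TokeParser as a single string rather than one line at a time. The sample HTML and variable names are invented for the example; the `<a>` tag spans two lines.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Sample input (hypothetical): the opening <a> tag is broken across lines.
my $html = "<p>See <a\n   href=\"http://example.com/\">this\nlink</a>.</p>";

my $tp = HTML::TokeParser->new( \$html )
    or die "Cannot create parser";

# Keep only text tokens ("T"); start, end, and other tokens are dropped.
my $plain = '';
while ( my $token = $tp->get_token ) {
    $plain .= $token->[1] if $token->[0] eq 'T';
}

print "$plain\n";
```

The tag is recognized as one token regardless of the line break inside it. Trouble arises only if each line is parsed as a separate document, since a tag that starts on one line is then never completed.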