Re: HTML Parsing problems...

2003-09-22 Thread Michael Giles
Yeah, I was using HTMLParser for a few days until I tried to parse a 400K document and it spun at 100% CPU for a very long time. It is tolerant of bad HTML, but does not appear to scale. TagSoup processed the same document in a second or less at <25% CPU.
-Mike
At 02:42 PM 9/22/2003 +0200, y

Re: HTML Parsing problems...

2003-09-22 Thread Andrzej Bialecki
Michael Giles wrote: Erik, Probably a good idea to swap something else in, although Neko introduces a dependency on Xerces. I didn't play with Neko because I am currently using a different XML parser and didn't want to deal with the conflicts (and also find dependencies on specific parsers ann

Re: HTML Parsing problems...

2003-09-20 Thread Michael Giles
Erik, Probably a good idea to swap something else in, although Neko introduces a dependency on Xerces. I didn't play with Neko because I am currently using a different XML parser and didn't want to deal with the conflicts (and also find dependencies on specific parsers annoying). However, yes

Re: HTML Parsing problems...

2003-09-19 Thread Erik Hatcher
I'm going to swap in the Neko HTML parser for the demo refactorings I'm doing. I would be all for replacing the demo HTML parser with this. If you look at the Ant task in the sandbox, you'll see that I used JTidy for it and it works well, but I've heard that Neko is faster and better so I'

Re: HTML Parsing problems...

2003-09-19 Thread Michael Giles
Tatu,
Thanks for the reply. See below for comments.
> just ignore everything inside of

Re: HTML Parsing problems...

2003-09-18 Thread Peter Becker
Tatu Saloranta wrote: On Thursday 18 September 2003 14:50, Michael Giles wrote: I know, I know, the HTML Parser in the demo is just that (i.e. a demo), but I also know that it is updated from time to time and performs much better than the other ones that I have tested. Frustratingly, the very

Re: HTML Parsing problems...

2003-09-18 Thread Tatu Saloranta
On Thursday 18 September 2003 14:50, Michael Giles wrote:
> I know, I know, the HTML Parser in the demo is just that (i.e. a demo), but
> I also know that it is updated from time to time and performs much better
> than the other ones that I have tested. Frustratingly, the very first page
> I tried

HTML Parsing problems...

2003-09-18 Thread Michael Giles
I know, I know, the HTML Parser in the demo is just that (i.e. a demo), but I also know that it is updated from time to time and performs much better than the other ones that I have tested. Frustratingly, the very first page I tried to parse failed (