Re: Extending HTML Parser to create subpage index documents

2009-10-20 Thread malcolm smith
Thank you very much for the helpful reply, I'm back on track. On Tue, Oct 20, 2009 at 2:01 AM, Andrzej Bialecki wrote: > malcolm smith wrote: > >> I am looking to create a parser for a groupware product that would read >> pages message board type web site. (Think phpBB). But rather than >> cr

Re: Extending HTML Parser to create subpage index documents

2009-10-19 Thread Andrzej Bialecki
malcolm smith wrote: I am looking to create a parser for a groupware product that would read pages from a message-board-type web site. (Think phpBB). But rather than creating a single Content item which is parsed and indexed to a single Lucene document, I am planning to have the parser create a master

Extending HTML Parser to create subpage index documents

2009-10-19 Thread malcolm smith
I am looking to create a parser for a groupware product that would read pages from a message-board-type web site. (Think phpBB). But rather than creating a single Content item which is parsed and indexed to a single Lucene document, I am planning to have the parser create a master document (for the orig

Re: html parser

2006-03-30 Thread Rajesh Munavalli
Oops... actually I meant to ask about an XHTML parser. Is it safe to use the HTML parser to parse XHTML? On 3/30/06, Andrzej Bialecki <[EMAIL PROTECTED]> wrote: > > Rajesh Munavalli wrote: > > Does anyone know where I can get the source code for html parser which > is in >

Re: html parser

2006-03-30 Thread Andrzej Bialecki
Rajesh Munavalli wrote: Does anyone know where I can get the source code for html parser which is in the plugins directory? Which one? parse-html uses two parsers: one is called CyberNeko, the other is called TagSoup. You can find their home pages and their sources easily through Google

html parser

2006-03-30 Thread Rajesh Munavalli
Does anyone know where I can get the source code for html parser which is in the plugins directory?

Re: [Nutch-general] html parser + relative urls

2005-07-27 Thread Raymond Creel
> Has any one experience a problem with the way the > > standard html parser plugin handles relative urls? > > > > There is a site where the home page is something > like > > > > http://www.x.com/x.cgi > > > > and when browsing a link wi

Re: [Nutch-general] html parser + relative urls

2005-07-27 Thread ogjunk-nutch
I think Nutch is behaving correctly. Maybe that page has a BASE URL (view source, look at the HEAD elements) that throws off one or the other. Otis --- Raymond Creel <[EMAIL PROTECTED]> wrote: > Has any one experience a problem with the way the > standard html parser plugin hand
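
To illustrate the BASE URL point above, here is a minimal sketch in Python (not Nutch's own code) of how a base element in the HEAD changes the URL that a relative href resolves against. The class name BaseFinder, the helper effective_base, and the sample HTML are hypothetical and exist only for this illustration; the page URL and href are the ones from the original question.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BaseFinder(HTMLParser):
    """Hypothetical helper: remembers the href of the first <base> element."""

    def __init__(self):
        super().__init__()
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so "base" matches <BASE> too.
        if tag == "base" and self.base_href is None:
            self.base_href = dict(attrs).get("href")


def effective_base(page_url, html):
    """Return the URL that relative links on the page resolve against."""
    finder = BaseFinder()
    finder.feed(html)
    if finder.base_href:
        # A base href may itself be relative to the page URL.
        return urljoin(page_url, finder.base_href)
    return page_url


page_url = "http://www.x.com/x.cgi"
# Assumed sample markup; the real page's HEAD is not shown in the thread.
html = '<html><head><base href="http://www.x.com/"></head><body></body></html>'

base = effective_base(page_url, html)
resolved = urljoin(base, "?paramname=paramvalue")
print(base)
print(resolved)
```

With a base element like the assumed one, the query-only link resolves against the site root rather than against x.cgi, which is one way a crawler and a casual reading of the page URL could disagree.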

html parser + relative urls

2005-07-27 Thread Raymond Creel
Has anyone experienced a problem with the way the standard html parser plugin handles relative urls? There is a site where the home page is something like http://www.x.com/x.cgi and when browsing a link with its href set to '?paramname=paramvalue', a browser will naturally t
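
As a point of comparison for the question above, here is a small sketch of how a query-only relative reference resolves under RFC 3986 rules, using Python's standard urllib.parse.urljoin rather than the Nutch plugin itself. It only shows what a standards-following resolver produces for the URL and href quoted in the message; it makes no claim about what the html parser plugin does internally.

```python
from urllib.parse import urljoin

# Values taken from the message; no <base> element is assumed here.
page_url = "http://www.x.com/x.cgi"
href = "?paramname=paramvalue"

# A query-only reference keeps the base path and replaces only the query.
resolved = urljoin(page_url, href)
print(resolved)  # http://www.x.com/x.cgi?paramname=paramvalue
```

A resolver that instead drops the last path segment would yield http://www.x.com/?paramname=paramvalue, so comparing the two results is a quick way to tell which behavior a given parser follows.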