Re: parse-html plugin

Markus Jelsma Tue, 01 Feb 2011 09:28:02 -0800

Yes, the parser's internals are not easy to follow. Try adding log lines to see
what happens at each step, and use the ParserChecker to test your changes.
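
A minimal sketch of such log lines. It assumes the goal is to check whether the DOM still holds the same links when outlink extraction runs as it did before the custom text extraction. It uses `java.util.logging` and the JDK's XML parser as stand-ins for Nutch's own logger and HTML parser; `parse` and `countLinks` are hypothetical helpers, not Nutch APIs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.logging.Logger;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class LinkLogDemo {
    // Stand-in for the LOG field HtmlParser already has.
    private static final Logger LOG = Logger.getLogger(LinkLogDemo.class.getName());

    // Hypothetical helper: parses well-formed markup into a DOM
    // (Nutch builds its DOM with an HTML parser instead).
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse markup", e);
        }
    }

    // Hypothetical helper: counts <a href="..."> elements anywhere under a node.
    // Walks children recursively, so it works for Document, DocumentFragment and Element.
    static int countLinks(Node node) {
        int n = 0;
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "a".equalsIgnoreCase(node.getNodeName())
                && ((Element) node).hasAttribute("href")) {
            n++;
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            n += countLinks(c);
        }
        return n;
    }

    public static void main(String[] args) {
        Document root = parse("<html><body>"
            + "<div><a href='/menu'>Menu</a></div>"
            + "<col_centre><p>Body text</p><a href='/article'>More</a></col_centre>"
            + "</body></html>");

        // Log the link count before the custom text extraction...
        LOG.info("links before text extraction: " + countLinks(root));
        // ... the custom extractIndexableContent(root) call would run here ...
        // ... and again just before outlinks are collected from root.
        LOG.info("links before outlink extraction: " + countLinks(root));
    }
}
```

If the two counts differ once the real call is in place, the custom step is changing the tree that outlink extraction reads.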


On Tuesday 01 February 2011 15:25:20 a a wrote:
> hi,
> 
> Is my question so difficult?
> Doesn't anyone have an idea?
> 
> thx
> 
> 
> mehdi
> 
> > From: [email protected]
> > To: [email protected]
> > Subject: RE: parse-html plugin
> > Date: Mon, 31 Jan 2011 16:05:22 +0000
> > 
> > 
> > Hi All,
> > 
> > Any idea?
> > 
> > 
> > 
> > mehdi
> > 
> > > From: [email protected]
> > > To: [email protected]
> > > Subject: parse-html plugin
> > > Date: Thu, 27 Jan 2011 18:58:36 +0000
> > > 
> > > 
> > > Hi,
> > > In the HtmlParser class I changed the 'text' variable to index only part
> > > of my HTML page, and since then I have lost a lot of outlinks!
> > > 
> > > ...
> > > 
> > >     utils.getText(sb, extractIndexableContent(root));  // added on 26-01-2011 to extract only text inside <col_centre>
> > >     // utils.getText(sb, root);  // extract text --- disabled on 26-01-2011
> > >     text = sb.toString();
> > > 
> > > ...
> > > 
> > > I believed that outlinks are not obtained from the 'text' variable?! In
> > > the same class we can see how outlinks are extracted:
> > > 
> > > 
> > >     ArrayList<Outlink> l = new ArrayList<Outlink>();   // extract outlinks
> > >     URL baseTag = utils.getBase(root);
> > >     if (LOG.isTraceEnabled()) { LOG.trace("Getting links..."); }
> > >     utils.getOutlinks(baseTag != null ? baseTag : base, l, root);
> > >     outlinks = l.toArray(new Outlink[l.size()]);
> > > 
> > > Can you please tell me what I did wrong?
> > > 
> > > 
> > > mehdi
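
In the snippet quoted above, `getOutlinks` walks `root`, not `text`, so changing how `text` is built should not by itself drop links. One possible cause, and it is only an assumption because the thread never shows `extractIndexableContent`: the custom method removes or detaches nodes from `root`, so the later outlink pass sees a smaller tree. The sketch below contrasts a destructive and a non-destructive extraction on a toy DOM. The JDK XML parser stands in for Nutch's HTML parser, `col_centre` is treated as an element name as written in the question, and every method here is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class IndexableContentDemo {

    // Hypothetical helper: parses well-formed markup into a DOM.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse markup", e);
        }
    }

    // Depth-first search for the first element with the given name; does not modify the tree.
    static Node findElement(Node node, String name) {
        if (node.getNodeType() == Node.ELEMENT_NODE && name.equalsIgnoreCase(node.getNodeName())) {
            return node;
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            Node found = findElement(c, name);
            if (found != null) return found;
        }
        return null;
    }

    // Counts <a href="..."> elements anywhere under a node.
    static int countLinks(Node node) {
        int n = 0;
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "a".equalsIgnoreCase(node.getNodeName())
                && ((Element) node).hasAttribute("href")) {
            n++;
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            n += countLinks(c);
        }
        return n;
    }

    // Hypothetical DESTRUCTIVE variant: detaches the target from the tree, so anything
    // that walks root afterwards (such as outlink extraction) no longer sees its links.
    static Node extractByDetaching(Node root, String name) {
        Node target = findElement(root, name);
        if (target != null && target.getParentNode() != null) {
            target.getParentNode().removeChild(target);
        }
        return target != null ? target : root;
    }

    // Hypothetical NON-DESTRUCTIVE variant: returns a deep copy of the target and leaves
    // root untouched; falls back to the whole root when the element is missing.
    static Node extractByCopying(Node root, String name) {
        Node target = findElement(root, name);
        return target != null ? target.cloneNode(true) : root;
    }

    public static void main(String[] args) {
        String html = "<html><body>"
            + "<div><a href='/menu'>Menu</a></div>"
            + "<col_centre><p>Body text</p><a href='/article'>More</a></col_centre>"
            + "</body></html>";

        Document detached = parse(html);
        extractByDetaching(detached, "col_centre");
        System.out.println("links left after detaching: " + countLinks(detached)); // 1

        Document copied = parse(html);
        extractByCopying(copied, "col_centre");
        System.out.println("links left after copying: " + countLinks(copied));     // 2
    }
}
```

If the real `extractIndexableContent` behaves like the detaching variant, reading the target in place or working on a copy would leave `root` whole for `getOutlinks`.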

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
