Hi,

Is my question that difficult?
Does no one have an idea?

Thanks,


mehdi




> From: [email protected]
> To: [email protected]
> Subject: RE: parse-html plugin
> Date: Mon, 31 Jan 2011 16:05:22 +0000
> 
> 
> Hi All,
> 
> Any ideas?
> 
> 
> 
> mehdi
> 
> 
> 
> 
> > From: [email protected]
> > To: [email protected]
> > Subject: parse-html plugin
> > Date: Thu, 27 Jan 2011 18:58:36 +0000
> > 
> > 
> > Hi,
> > In the HtmlParser class I changed the 'text' variable so that only part
> > of my HTML page is indexed, and since then I have lost a lot of outlinks!
> > 
> > ...
> >  utils.getText(sb, extractIndexableContent(root));  // added on 26-01-2011 to extract only text inside <col_centre>
> >  // utils.getText(sb, root);  // extract text --- disabled on 26-01-2011
> > 
> >       text = sb.toString();
> > ...
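One way to index only the `<col_centre>` area without touching the parse tree is to gather the text read-only from that subtree. Below is a minimal sketch, assuming a well-formed XHTML page parsed with the JDK's DOM parser (Nutch's HtmlParser uses its own lenient HTML parser instead). `parse`, `findElement`, `appendText` and `indexableText` are hypothetical helpers, not Nutch's `DOMContentUtils` API, and the real `extractIndexableContent` from the message is not shown in the thread.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class IndexableTextSketch {

    // Hypothetical helper: parses well-formed XHTML with the JDK DOM parser.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("markup is not well-formed", e);
        }
    }

    // Depth-first search for the first element named tagName (case-insensitive).
    static Node findElement(Node node, String tagName) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && node.getNodeName().equalsIgnoreCase(tagName)) {
            return node;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            Node found = findElement(child, tagName);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    // Appends every text node below 'node' to sb. Read-only: nothing is removed
    // from the tree, so a later outlink pass over the same root is unaffected.
    static void appendText(StringBuilder sb, Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            sb.append(node.getNodeValue());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            appendText(sb, child);
        }
    }

    // Text of the first <tagName> element, or of the whole page if it is missing.
    static String indexableText(Node root, String tagName) {
        StringBuilder sb = new StringBuilder();
        Node content = findElement(root, tagName);
        appendText(sb, content != null ? content : root);
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body>"
                + "<div>menu <a href=\"/other\">other</a></div>"
                + "<col_centre>main text</col_centre>"
                + "</body></html>");
        System.out.println(indexableText(doc, "col_centre")); // main text
    }
}
```

Because the helper only reads from `root`, any link walk that runs later over the same tree still sees the links outside `<col_centre>`.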
> > 
> > I believed that outlinks are not obtained from the 'text' variable. In the
> > same class we can see how outlinks are extracted:
> > 
> > 
> > ArrayList<Outlink> l = new ArrayList<Outlink>();   // extract outlinks
> > URL baseTag = utils.getBase(root);
> > if (LOG.isTraceEnabled()) { LOG.trace("Getting links..."); }
> > utils.getOutlinks(baseTag != null ? baseTag : base, l, root);
> > outlinks = l.toArray(new Outlink[l.size()]);
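The outlink snippet above walks the DOM tree `root`, not the `text` string, so changing `text` by itself should not drop outlinks. One plausible cause, assuming `extractIndexableContent(root)` modifies `root` (for example by removing nodes outside `<col_centre>`), is that the later outlink pass no longer sees those links. The sketch below illustrates this with the JDK DOM API; `OutlinkSketch`, `parse` and `links` are hypothetical, and `links` is a simplified stand-in for `utils.getOutlinks` that only looks at `<a href>` and does no URL resolution.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class OutlinkSketch {

    // Hypothetical helper: parses well-formed XHTML with the JDK DOM parser.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("markup is not well-formed", e);
        }
    }

    // Collects the href of every <a> element reachable from 'node', in document order.
    static void collectLinks(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && node.getNodeName().equalsIgnoreCase("a")) {
            String href = ((Element) node).getAttribute("href");
            if (!href.isEmpty()) {
                out.add(href);
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectLinks(child, out);
        }
    }

    static List<String> links(Node root) {
        List<String> out = new ArrayList<>();
        collectLinks(root, out);
        return out;
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body>"
                + "<div id=\"menu\"><a href=\"/a\">a</a><a href=\"/b\">b</a></div>"
                + "<col_centre><a href=\"/c\">c</a></col_centre>"
                + "</body></html>");
        System.out.println(links(doc)); // [/a, /b, /c]

        // Simulates an extraction step that prunes everything outside <col_centre>
        // from the shared tree: the links in the menu disappear from the walk.
        Node menu = doc.getElementsByTagName("div").item(0);
        menu.getParentNode().removeChild(menu);
        System.out.println(links(doc)); // [/c]
    }
}
```

If the real helper behaves like this, extracting text from a copy (`root.cloneNode(true)`) or running the outlink step before the text step would keep the links.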
> > 
> > 
> > 
> > Can you please tell me what I did wrong?
> > 
> > 
> > mehdi
> > 
> > 
