Hi All,

Any ideas?



mehdi




> From: [email protected]
> To: [email protected]
> Subject: parse-html plugin
> Date: Thu, 27 Jan 2011 18:58:36 +0000
> 
> 
> Hi,
> In the HtmlParser class I changed how the 'text' variable is built so that only
> part of my HTML page is indexed, and since then I have lost a lot of outlinks!
> 
> ...
>       // added on 26-01-2011: extract only the text inside <col_centre>
>       utils.getText(sb, extractIndexableContent(root));
>       // utils.getText(sb, root);   // original text extraction, disabled on 26-01-2011
> 
>       text = sb.toString();
> ...
> 
> I believed that outlinks are not derived from the 'text' variable. In the
> same class we can see how the outlinks are extracted:
> 
> 
>       ArrayList<Outlink> l = new ArrayList<Outlink>();   // extract outlinks
>       URL baseTag = utils.getBase(root);
>       if (LOG.isTraceEnabled()) { LOG.trace("Getting links..."); }
>       utils.getOutlinks(baseTag != null ? baseTag : base, l, root);
>       outlinks = l.toArray(new Outlink[l.size()]);
> 
> 
> 
> Can you please tell me what I did wrong?
> 
> 
> mehdi
> 
> 
>                                         
                                          
