Hi,
In the HtmlParser class I changed the 'text' variable so that only part of my
HTML page is indexed, and since then I have lost a lot of outlinks!

...
      utils.getText(sb, extractIndexableContent(root));  // added on 26-01-2011 to extract only text inside <col_centre>
      // utils.getText(sb, root);                          // extract text --- disabled on 26-01-2011

      text = sb.toString();
...

I believed that outlinks are not obtained from the 'text' variable. In the
same class we can see how the outlinks are extracted:


      ArrayList<Outlink> l = new ArrayList<Outlink>();   // extract outlinks
      URL baseTag = utils.getBase(root);
      if (LOG.isTraceEnabled()) { LOG.trace("Getting links..."); }
      utils.getOutlinks(baseTag!=null?baseTag:base, l, root);
      outlinks = l.toArray(new Outlink[l.size()]);
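As the snippet above shows, outlinks are collected by walking root, not by reading 'text'. So changing what goes into 'text' should not by itself remove outlinks. One plausible cause (an assumption, since extractIndexableContent is not shown) is that the helper edits the same DOM tree that getOutlinks walks afterwards, e.g. by detaching nodes outside <col_centre>. The sketch below is not Nutch code: it uses the JDK's built-in org.w3c.dom parser instead of Nutch's HTML parser, a made-up sample page, and hypothetical helpers (collectText, collectLinks, findSubtree, pruneToSubtree) to contrast a read-only lookup with one that mutates the tree before the outlink walk.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class SubtreeTextDemo {

  // Made-up sample page: one link outside <col_centre>, one inside it.
  static final String PAGE = "<html><body>"
      + "<div><a href=\"/menu\">Menu</a></div>"
      + "<col_centre>Main text <a href=\"/article\">more</a></col_centre>"
      + "</body></html>";

  // Read-only walk: appends every text node under `node` to `sb`.
  static void collectText(StringBuilder sb, Node node) {
    if (node.getNodeType() == Node.TEXT_NODE) {
      sb.append(node.getNodeValue());
    }
    for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
      collectText(sb, c);
    }
  }

  // Read-only walk: collects the href of every <a> element under `node`.
  static void collectLinks(List<String> links, Node node) {
    if (node.getNodeType() == Node.ELEMENT_NODE && "a".equalsIgnoreCase(node.getNodeName())) {
      String href = ((Element) node).getAttribute("href");
      if (!href.isEmpty()) links.add(href);
    }
    for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
      collectLinks(links, c);
    }
  }

  // Hypothetical non-destructive lookup: first <tag> element, else the root element.
  static Node findSubtree(Document doc, String tag) {
    NodeList hits = doc.getElementsByTagName(tag);
    return hits.getLength() > 0 ? hits.item(0) : doc.getDocumentElement();
  }

  // Hypothetical destructive variant: detaches every sibling of the target,
  // i.e. it edits the same DOM tree that the outlink walk reads afterwards.
  static Node pruneToSubtree(Document doc, String tag) {
    Node target = findSubtree(doc, tag);
    Node parent = target.getParentNode();
    if (parent != null && parent != doc) {
      Node c = parent.getFirstChild();
      while (c != null) {
        Node next = c.getNextSibling();   // remember before removing
        if (c != target) parent.removeChild(c);
        c = next;
      }
    }
    return target;
  }

  // Text step on <col_centre> first, then outlink step on the whole document.
  static List<String> linksAfterTextStep(boolean destructive, StringBuilder text) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(PAGE.getBytes(StandardCharsets.UTF_8)));
      Node subtree = destructive ? pruneToSubtree(doc, "col_centre")
                                 : findSubtree(doc, "col_centre");
      collectText(text, subtree);                      // text: centre block only
      List<String> links = new ArrayList<>();
      collectLinks(links, doc.getDocumentElement());   // outlinks: whole tree
      return links;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    StringBuilder text = new StringBuilder();
    List<String> safe = linksAfterTextStep(false, text);
    System.out.println("read-only: text=" + text + " links=" + safe);
    List<String> pruned = linksAfterTextStep(true, new StringBuilder());
    System.out.println("pruning:   links=" + pruned);
  }
}
```

With the read-only lookup, the text contains only the <col_centre> block while both links are still found; with the pruning variant, the link outside <col_centre> disappears. If extractIndexableContent behaves like the second case, a fix under this assumption would be to read the subtree without removing or moving nodes, or to run it on a copy (e.g. root.cloneNode(true)).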



Can you please tell me what I did wrong?


mehdi

