Hi,
In the HtmlParser class I changed the 'text' variable so that only part of
my HTML page is indexed, and since then I have lost a lot of outlinks!
...
// added on 26-01-2011 to extract only text inside <col_centre>
utils.getText(sb, extractIndexableContent(root));
// disabled on 26-01-2011: extract text
// utils.getText(sb, root);
text = sb.toString();
...
I believed that outlinks are not obtained from the text variable?! In the
same class we can see how outlinks are extracted:
ArrayList<Outlink> l = new ArrayList<Outlink>(); // extract outlinks
URL baseTag = utils.getBase(root);
if (LOG.isTraceEnabled()) { LOG.trace("Getting links..."); }
utils.getOutlinks(baseTag!=null?baseTag:base, l, root);
outlinks = l.toArray(new Outlink[l.size()]);
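
One possible cause, stated as an assumption because the body of extractIndexableContent is not shown above: if that helper removes nodes from root instead of only reading it, then getOutlinks, which is called on the same root, would see fewer anchors when it runs afterwards. The sketch below is not Nutch code. It uses only the JDK DOM API, and the class name, the sample page and extractDestructively are hypothetical stand-ins, to show the effect and how working on a copy made with cloneNode(true) avoids it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration only; none of these names exist in Nutch.
final class OutlinkDomSketch {

    // Tiny well-formed page: two links in a menu, one inside <col_centre>.
    static final String SAMPLE =
        "<html><body>"
        + "<div><a href='/a'>A</a><a href='/b'>B</a></div>"
        + "<col_centre><a href='/c'>C</a> main text</col_centre>"
        + "</body></html>";

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("sample page failed to parse", e);
        }
    }

    // Counts the <a> elements still reachable from root.
    static int countLinks(Element root) {
        return root.getElementsByTagName("a").getLength();
    }

    // Assumed behaviour of a destructive extractor: every child of <body>
    // except <col_centre> is removed from the tree it was given.
    static Node extractDestructively(Element root) {
        Node body = root.getElementsByTagName("body").item(0);
        Node keep = null;
        Node child = body.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // read before removing
            if ("col_centre".equals(child.getNodeName())) {
                keep = child;
            } else {
                body.removeChild(child);
            }
            child = next;
        }
        return keep;
    }

    // Extract on the live tree, then count links: the menu links are gone.
    static int linksAfterDestructiveExtract() {
        Element root = parseRoot(SAMPLE);
        extractDestructively(root);
        return countLinks(root);
    }

    // Extract on a deep copy, then count links on the untouched original.
    static int linksAfterExtractOnCopy() {
        Element root = parseRoot(SAMPLE);
        extractDestructively((Element) root.cloneNode(true));
        return countLinks(root);
    }

    public static void main(String[] args) {
        System.out.println(linksAfterDestructiveExtract()); // 1
        System.out.println(linksAfterExtractOnCopy());      // 3
    }
}
```

If this is what happens here, either passing root.cloneNode(true) to the extraction helper or collecting the outlinks before the text extraction should keep all the links. This is only a guess until the helper's code is checked.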
Can you please tell me what I did wrong?
mehdi