Hi all, any ideas?

mehdi

> From: [email protected]
> To: [email protected]
> Subject: parse-html plugin
> Date: Thu, 27 Jan 2011 18:58:36 +0000
>
> Hi,
>
> In the class HtmlParser I changed the 'text' variable to index only a part of
> my HTML page, and since then I have lost a lot of outlinks!
>
> ...
> utils.getText(sb, extractIndexableContent(root)); // added on 26-01-2011 to extract only text inside <col_centre>
> // utils.getText(sb, root); // extract text --- disabled on 26-01-2011
>
> text = sb.toString();
> ...
>
> I believed that outlinks are not obtained from the text variable. In the
> same class we can see how outlinks are extracted:
>
> ArrayList<Outlink> l = new ArrayList<Outlink>(); // extract outlinks
> URL baseTag = utils.getBase(root);
> if (LOG.isTraceEnabled()) { LOG.trace("Getting links..."); }
> utils.getOutlinks(baseTag!=null?baseTag:base, l, root);
> outlinks = l.toArray(new Outlink[l.size()]);
>
> Can you please tell me what I did wrong?
>
> mehdi
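One thing that may be worth checking: the quoted code extracts outlinks from the same `root` that was passed to extractIndexableContent() a few lines earlier. The body of extractIndexableContent() is not shown, so this is only an assumption, but if that method removes or moves nodes while selecting the <col_centre> part, the later getOutlinks() call would only see the links that are left in the tree. The sketch below uses the plain JDK DOM API rather than Nutch, and the names OutlinkLossDemo, findSubtree, pruneToSubtree and countLinks are made up for illustration. It compares a read-only lookup with a destructive one.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class OutlinkLossDemo {

    // Tiny well-formed stand-in page: one link inside <col_centre>, two outside.
    static final String PAGE =
        "<html><body>"
        + "<a href='/menu'>menu</a>"
        + "<col_centre><a href='/story'>story</a></col_centre>"
        + "<a href='/footer'>footer</a>"
        + "</body></html>";

    // Parse the string into a DOM tree (checked exceptions wrapped for brevity).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Stand-in for link extraction: count <a> elements at or below a node.
    static int countLinks(Node node) {
        int n = (node.getNodeType() == Node.ELEMENT_NODE
                 && "a".equals(node.getNodeName())) ? 1 : 0;
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            n += countLinks(c);
        }
        return n;
    }

    // Read-only variant: returns the <col_centre> element, tree left untouched.
    static Node findSubtree(Document doc) {
        return doc.getElementsByTagName("col_centre").item(0);
    }

    // Destructive variant (hypothetical): removes every sibling of <col_centre>
    // from the shared tree before returning it.
    static Node pruneToSubtree(Document doc) {
        Node keep = findSubtree(doc);
        Node parent = keep.getParentNode();
        Node c = parent.getFirstChild();
        while (c != null) {
            Node next = c.getNextSibling(); // remember next before removing
            if (c != keep) {
                parent.removeChild(c);
            }
            c = next;
        }
        return keep;
    }

    public static void main(String[] args) {
        Document untouched = parse(PAGE);
        findSubtree(untouched);
        System.out.println(countLinks(untouched)); // prints 3

        Document pruned = parse(PAGE);
        pruneToSubtree(pruned);
        System.out.println(countLinks(pruned)); // prints 1
    }
}
```

If something like the destructive variant is happening, two possible workarounds are to collect the outlinks before calling extractIndexableContent(), or to have that method work on a copy (for example via cloneNode(true) on the selected element) instead of the shared tree.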

