Yes, understanding the parser's internals is not easy. Try adding log lines to see what it does at each step. You can use the ParserChecker to test your changes.
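
One thing worth logging is the number of links under root before and after your extractIndexableContent call. Your mail does not show that method, so this is only a guess: if it detaches or prunes nodes from the tree it is given, then utils.getOutlinks will later walk a smaller tree and find fewer links. Below is a minimal sketch of that effect using only the JDK DOM classes, not Nutch's own utilities. The class OutlinkDebugSketch, its methods and the sample markup are all hypothetical names made up for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class OutlinkDebugSketch {
    private static final Logger LOG = Logger.getLogger("OutlinkDebugSketch");

    // Hypothetical page: one link in the navigation, two inside <col_centre>.
    public static final String SAMPLE =
        "<html><body>"
        + "<div><a href=\"/home\">Home</a></div>"
        + "<col_centre><a href=\"/a\">A</a><a href=\"/b\">B</a></col_centre>"
        + "</body></html>";

    /** Parses a small well-formed document into a DOM tree. */
    public static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample markup", e);
        }
    }

    /** Collects the href of every <a> element at or below the given node. */
    public static List<String> collectHrefs(Node node) {
        List<String> hrefs = new ArrayList<>();
        walk(node, hrefs);
        return hrefs;
    }

    private static void walk(Node node, List<String> hrefs) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "a".equalsIgnoreCase(node.getNodeName())) {
            Element anchor = (Element) node;
            if (anchor.hasAttribute("href")) {
                hrefs.add(anchor.getAttribute("href"));
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), hrefs);
        }
    }

    /** Assumed "destructive" extractor: removes the subtree from the document. */
    public static Node detachFirst(Document doc, String tag) {
        Node target = doc.getElementsByTagName(tag).item(0);
        target.getParentNode().removeChild(target);
        return target;
    }

    /** Non-destructive alternative: returns a deep copy, the document is untouched. */
    public static Node copyFirst(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).cloneNode(true);
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        Node root = doc.getDocumentElement();
        LOG.info("links under root before extraction: " + collectHrefs(root).size());

        Node copy = copyFirst(doc, "col_centre");
        LOG.info("links in copied subtree: " + collectHrefs(copy).size());
        LOG.info("links under root after copying: " + collectHrefs(root).size());

        detachFirst(doc, "col_centre");
        LOG.info("links under root after detaching: " + collectHrefs(root).size());
    }
}
```

If the count under root drops right after the text extraction in your HtmlParser, the extractor is changing the tree, and working on a copy of the subtree (or collecting the outlinks before extracting the text) should bring the links back.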

On Tuesday 01 February 2011 15:25:20 a a wrote:
> hi,
>
> is my question so difficult ?
> no one have an idea ?
>
> thx
>
> mehdi
>
> > From: [email protected]
> > To: [email protected]
> > Subject: RE: parse-html plugin
> > Date: Mon, 31 Jan 2011 16:05:22 +0000
> >
> > Hi All,
> >
> > any idea ?
> >
> > mehdi
> >
> > > From: [email protected]
> > > To: [email protected]
> > > Subject: parse-html plugin
> > > Date: Thu, 27 Jan 2011 18:58:36 +0000
> > >
> > > hi,
> > > In the class HtmlParser I changed the 'text' variable to index only a
> > > part of my html page, and since i did lost lot off outlinks !
> > >
> > > ...
> > >
> > > utils.getText(sb,extractIndexableContent(root)); //added on 26-01-2011 to extract only text inside <col_centre>
> > >
> > > // utils.getText(sb, root); // extract text --- disabled on 26-01-2011-
> > >
> > > text = sb.toString();
> > >
> > > ...
> > >
> > > i beleived that outlinks are not obtained from the text variable ?! in
> > > the same class we could see how outlinks are extracted !
> > >
> > > ArrayList<Outlink> l = new ArrayList<Outlink>(); // extract outlinks
> > >
> > > URL baseTag = utils.getBase(root);
> > > if (LOG.isTraceEnabled()) { LOG.trace("Getting links..."); }
> > > utils.getOutlinks(baseTag!=null?baseTag:base, l, root);
> > > outlinks = l.toArray(new Outlink[l.size()]);
> > >
> > > can you plz tell me what i did wrong.
> > >
> > > mehdi

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

