Re: DIH: Enhance XPathRecordReader to deal with //body(FLATTEN=true) and //body/h1

2011-04-11 Thread Lance Norskog
over all”-Field without nested xpaths?
>> > (schema.xml will not help, because we lose the original token order)
>> > 2. Does anyone try to improve XPathRecordReader to deal with nested xpaths?
>> > 3. Does anyone else need this feature?
>> >
>> > Best regards
>> >   Karsten
>
> > http://lucene.472066.n3.nabble.com/DIH-Enhance-XPathRecordReader-to-deal-with-body-FLATTEN-true-and-body-h1-td2799005.html
>
--
Lance Norskog
goks...@gmail.com

Re: DIH: Enhance XPathRecordReader to deal with //body(FLATTEN=true) and //body/h1

2011-04-11 Thread karsten-solr
> > (schema.xml will not help, because we lose the original token order)
> > 2. Does anyone try to improve XPathRecordReader to deal with nested xpaths?
> > 3. Does anyone else need this feature?
> >
> > Best regards
> >   Karsten
>
> > http://lucene.472066.n3.nabble.com/DIH-Enhance-XPathRecordReader-to-deal-with-body-FLATTEN-true-and-body-h1-td2799005.html

Re: DIH: Enhance XPathRecordReader to deal with //body(FLATTEN=true) and //body/h1

2011-04-11 Thread karsten-solr
> > 2. Does anyone try to improve XPathRecordReader to deal with nested xpaths?
> > 3. Does anyone else need this feature?
> >
> > Best regards
> >   Karsten
>
> > http://lucene.472066.n3.nabble.com/DIH-Enhance-XPathRecordReader-to-deal-with-body-FLATTEN-true-and-body-h1-td2799005.html

Re: DIH: Enhance XPathRecordReader to deal with //body(FLATTEN=true) and //body/h1

2011-04-10 Thread Lance Norskog
There is an option somewhere to use the full XML DOM implementation for xpaths. The purpose of the XPathEP is to be as simple and dumb as possible and to handle the most common cases: RSS feeds and other open standards. Search for "xsl(optional)" at http://wiki.apache.org/solr/DataImportHandler#Configuration_

DIH: Enhance XPathRecordReader to deal with //body(FLATTEN=true) and //body/h1

2011-04-09 Thread karsten-solr
Hi Folks, has anyone improved the DIH XPathRecordReader to deal with nested xpaths? E.g. a data-config.xml with one field for //body (flatten=true) and one for //body/h1, where the XML stream contains /html/body/h1..., will only fill the field “alltext”; the field “title” will be empty. This is a known issue from 2009: https://issues.apache.org/jira/browse/SOLR-
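
For readers who want to see the nested-xpath case from the subject line in miniature, here is a rough sketch. It is not Solr's XPathRecordReader and not Java; it is a small Python stand-in built on the standard-library `xml.sax` parser. The class and function names (`NestedPathHandler`, `read_record`) are made up for illustration, and the two paths are hard-coded rather than read from a data-config.xml. The idea it shows: in a streaming reader, each text event can be offered to every matching field, so a flattened //body field and a nested //body/h1 field are both filled, and the flattened text keeps its original order.

```python
import xml.sax


class NestedPathHandler(xml.sax.ContentHandler):
    """Collects text for two overlapping paths in a single streaming pass."""

    def __init__(self):
        super().__init__()
        self.stack = []    # names of the currently open elements
        self.alltext = []  # every text chunk below <body> (the flattened field)
        self.title = []    # text chunks directly inside body/h1

    def startElement(self, name, attrs):
        self.stack.append(name)

    def endElement(self, name):
        self.stack.pop()

    def characters(self, content):
        # //body with flatten: any text seen while a <body> is open.
        if "body" in self.stack:
            self.alltext.append(content)
        # //body/h1: text whose parent is an <h1> directly below <body>.
        if self.stack[-2:] == ["body", "h1"]:
            self.title.append(content)


def read_record(xml_text):
    """Parse one document and return both fields as plain strings."""
    handler = NestedPathHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return {
        "alltext": "".join(handler.alltext),
        "title": "".join(handler.title),
    }


record = read_record(
    "<html><body><h1>Title</h1><p>Some <b>bold</b> text</p></body></html>"
)
print(record)
```

The point of the sketch is only that the same character event feeds both collectors; whether the real reader can be changed along these lines is exactly the open question in this thread.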