There is an option somewhere to use a full XML DOM implementation for evaluating XPaths. The purpose of XPathEntityProcessor is to be as simple and dumb as possible while still handling the most common cases: RSS feeds and other open standards.
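For the nested-xpath case, one possible workaround is the optional `xsl` attribute on the entity, which pre-processes the XML before XPathEntityProcessor reads it. A rough, untested sketch, assuming a hypothetical stylesheet `xslt/flatten.xsl` that rewrites each page into a flat `<doc>` with sibling `<title>` and `<alltext>` elements (the file paths, entity name, and element names are made up for illustration):

```xml
<dataConfig>
  <dataSource type="FileDataSource"/>
  <document>
    <!-- xsl: hypothetical stylesheet, applied before the xpaths are evaluated.
         It is assumed to emit <doc><title/><alltext/></doc>, so the two
         fields no longer sit on nested paths. -->
    <entity name="page"
            processor="XPathEntityProcessor"
            url="/data/pages/example.html"
            xsl="xslt/flatten.xsl"
            forEach="/doc">
      <field column="title"   xpath="/doc/title"/>
      <field column="alltext" xpath="/doc/alltext"/>
    </entity>
  </document>
</dataConfig>
```

Since the stylesheet does the flattening, `flatten="true"` is not needed on the `alltext` field here, and the original text order is whatever the stylesheet writes out.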
Search for xsl(optional):
http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1

On Sat, Apr 9, 2011 at 5:32 AM, <karsten-s...@gmx.de> wrote:
> Hi Folks,
>
> does anyone improve DIH XPathRecordReader to deal with nested xpaths?
> e.g.
> data-config.xml with
>  <entity .. processor="XPathEntityProcessor" ..
>  <field column="title" xpath="//body/h1"/>
>  <field column="alltext" xpath="//body" flatten="true"/>
> and the XML stream contains
>  /html/body/h1...
> will only fill field "alltext" but field "title" will be empty.
>
> This is a known issue from 2009
> https://issues.apache.org/jira/browse/SOLR-1437#commentauthor_12756469_verbose
>
> So three questions:
> 1. How to fill a "search over all"-Field without nested xpaths?
>    (schema.xml <copyField source="*" dest="alltext"/> will not help, because
>    we lose the original token order)
> 2. Does anyone try to improve XPathRecordReader to deal with nested xpaths?
> 3. Does anyone else need this feature?
>
>
> Best regards
>   Karsten
>

--
Lance Norskog
goks...@gmail.com