The XPath parser in the DIH is a limited implementation that supports
only a subset of XPath. The unit test is the only enumeration (that I
can find) of what it handles:

http://svn.apache.org/repos/asf/lucene/solr/trunk/contrib/dataimporthandler/src/test/java/org/apache/solr/handler/dataimport/TestXPathRecordReader.java

In fact, //BODY is not supported, and it should throw an exception, or
at least log some kind of error message. Perhaps there is one in the logs?
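
For illustration, here is a rough, untested sketch of the same config using
a full path instead of //BODY. It assumes BODY is a direct child of MESSAGE,
that the nested entity should read the parent row's column as
dataField="doc.TEXT" (parent entity name plus column name, rather than the
SQL alias "d"), and that a JDBC dataSource for the outer query is defined
elsewhere. The inner entity name "body" is made up for the example.

```xml
<dataConfig>
  <!-- Assumption: the JDBC dataSource for the outer query is defined here as well. -->
  <dataSource name="f" type="FieldReaderDataSource" />
  <document>
    <entity name="doc" transformer="ClobTransformer"
            query="SELECT d.EFFECTIVE_DT, d.ARCHIVE_ID, d.XML.getClobVal() AS TEXT FROM DOC d">
      <field column="EFFECTIVE_DT" name="effectiveDate" />
      <field column="ARCHIVE_ID" name="id" />
      <!-- Convert the CLOB to a string so the nested entity can read it. -->
      <field column="TEXT" clob="true" />
      <!-- Assumption: dataField is resolved as parentEntityName.COLUMN. -->
      <entity name="body" dataSource="f" processor="XPathEntityProcessor"
              dataField="doc.TEXT" forEach="/MESSAGE">
        <!-- Full path from the root instead of the unsupported //BODY. -->
        <field column="body" xpath="/MESSAGE/BODY" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

If BODY sits deeper in the document, the xpath would need the complete path
from the root element.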


On Wed, Mar 17, 2010 at 2:45 PM, Neil Chaudhuri
<nchaudh...@potomacfusion.com> wrote:
> Incidentally, I tried adding this:
>
> <datasource name="f" type="FieldReaderDataSource" />
> <document>
>        <entity dataSource="f" processor="XPathEntityProcessor" 
> dataField="d.text" forEach="/MESSAGE">
>                  <field column="body" xpath="//BODY"/>
>        </entity>
> </document>
>
> But this didn't seem to change anything.
>
> Any insight is appreciated.
>
> Thanks.
>
>
>
> From: Neil Chaudhuri
> Sent: Wednesday, March 17, 2010 3:24 PM
> To: solr-user@lucene.apache.org
> Subject: XPath Processing Applied to Clob
>
> I am using the DataImportHandler to index 3 fields in a table: an id, a date, 
> and the text of a document. This is an Oracle database, and the document is 
> an XML document stored as Oracle's xmltype data type. Since this is nothing 
> more than a fancy CLOB, I am using the ClobTransformer to extract the actual 
> XML. However, I don't want to index/store all the XML but instead just the 
> XML within a set of tags. The XPath itself is trivial, but it seems like the 
> XPathEntityProcessor only works for XML file content rather than the output 
> of a Transformer.
>
> Here is what I currently have that fails:
>
>
> <document>
>
>        <entity name="doc" query="SELECT d.EFFECTIVE_DT, d.ARCHIVE_ID, 
> d.XML.getClobVal() AS TEXT FROM DOC d" transformer="ClobTransformer">
>
>            <field column="EFFECTIVE_DT" name="effectiveDate" />
>
>            <field column="ARCHIVE_ID" name="id" />
>
>            <field column="TEXT" name="text" clob="true">
>            <entity name="text" processor="XPathEntityProcessor" 
> forEach="/MESSAGE" url="${doc.text}">
>                <field column="body" xpath="//BODY"/>
>
>            </entity>
>
>        </entity>
>
> </document>
>
>
> Is there an easy way to do this without writing my own custom transformer?
>
> Thanks.
>



-- 
Lance Norskog
goks...@gmail.com
