Hi Venn,

I think that when the BODY element is processed by the XPath
expression /document/category/BODY, the text content of the P elements
inside it is not retrieved. The expression only returns text that is a
direct child of the BODY element, and in your document BODY contains
only P elements and whitespace. I do not know which XPath function(s),
if any, the DataImportHandler currently supports for returning the text
content of a node together with all of its child nodes.

Maybe the expression /document/category/BODY/* will work.
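
Applied to your dataConfig, that would mean changing only the
textContent field, roughly as sketched below. This is untested, and I
am assuming here that the XPathEntityProcessor accepts a trailing
wildcard, which it may not, since it only implements a subset of XPath.

```xml
<!-- Untested sketch: only the xpath of the textContent field changes. -->
<!-- Assumption: XPathEntityProcessor accepts the /* wildcard step. -->
<field column="textContent" xpath="/document/category/BODY/*" />
```

If this does return something, I would expect one value per P element,
so the textContent field in your schema would probably need
multiValued="true".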

Cheers,

Martijn

2009/8/19 venn hardy <venn.ha...@hotmail.com>:
>
> Hello,
>
> I have just started trying out SOLR to index some XML documents that I
> receive. I am using SOLR 1.3 and its HttpDataSource in conjunction with
> the XPathEntityProcessor.
>
> I am finding the data import really useful so far, but I am having a few
> problems when I try to import HTML contained within one of the XML tags,
> <BODY>. The data import just seems to silently ignore the textContent
> field, but it imports everything else.
>
> When I do a query through the SOLR admin interface, only the id and
> author fields are displayed.
>
> Any ideas what I am doing wrong?
>
>
>
> Thanks
>
>
>
> This is what my dataConfig looks like:
> <dataConfig>
>   <dataSource type="HttpDataSource" />
>   <document>
>     <entity name="archive" pk="id"
>             url="http://localhost:9080/data/20090817070752.xml"
>             processor="XPathEntityProcessor" forEach="/document/category"
>             transformer="DateFormatTransformer" stream="true"
>             dataSource="dataSource">
>       <field column="id" xpath="/document/category/reference" />
>       <field column="textContent" xpath="/document/category/BODY" />
>       <field column="author" xpath="/document/category/author" />
>     </entity>
>   </document>
> </dataConfig>
>
>
>
> This is how I have specified my schema
> <fields>
>   <field name="id" type="string" indexed="true" stored="true" required="true" />
>   <field name="author" type="string" indexed="true" stored="true"/>
>   <field name="textContent" type="text" indexed="true" stored="true" />
> </fields>
>
> <uniqueKey>id</uniqueKey>
> <defaultSearchField>id</defaultSearchField>
>
>
>
> And this is what my XML document looks like:
>
> <document>
>   <category>
>     <reference>123456</reference>
>     <author>Authori name</author>
>     <BODY>
>       <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
>       Morbi lorem elit, lacinia ac blandit ac, tristique et ante.
>       Phasellus varius varius felis ut vestibulum</P>
>       <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
>       Morbi lorem elit, lacinia ac blandit ac, tristique et ante.
>       Phasellus varius varius felis ut vestibulum</P>
>       <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
>       Morbi lorem elit, lacinia ac blandit ac, tristique et ante.
>       Phasellus varius varius felis ut vestibulum</P>
>     </BODY>
>   </category>
> </document>