Hi Venn,

I think what is happening is that when the BODY element is processed by the XPath expression (/document/category/BODY), it does not retrieve the text content of the P elements inside the BODY element. The expression only retrieves text that is a direct child of the BODY element. I do not know which XPath function(s) the DataImportHandler currently supports for returning the text content of a node and all of its child nodes.
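To illustrate the distinction I mean (text directly under a node versus text in nested elements), here is a small Python sketch. It uses Python's standard xml.etree.ElementTree, not Solr or the DataImportHandler, so it only shows general XML behaviour and says nothing definite about how the XPathEntityProcessor is implemented. The helper names direct_text and descendant_text are my own, and the shortened paragraph text is a placeholder for your sample document.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the sample document; paragraph text is a placeholder.
SAMPLE = """<document>
  <category>
    <reference>123456</reference>
    <author>Authori name</author>
    <BODY>
      <P>First paragraph.</P>
      <P>Second paragraph.</P>
    </BODY>
  </category>
</document>"""


def direct_text(element):
    """Return only the text that is an immediate child of element."""
    parts = [element.text or ""]
    # Text that follows a child element (its tail) still belongs to the parent.
    parts.extend(child.tail or "" for child in element)
    return "".join(parts).strip()


def descendant_text(element):
    """Return the text of element and of everything nested inside it."""
    return " ".join(t.strip() for t in element.itertext() if t.strip())


root = ET.fromstring(SAMPLE)
body = root.find("./category/BODY")

print(repr(direct_text(body)))      # only whitespace sits directly under BODY
print(repr(descendant_text(body)))  # includes the text of the nested P elements
```

With this sample the first call yields an empty string, because BODY itself holds nothing but whitespace, while the second yields the text of both P elements. That would be consistent with textContent coming back empty if the importer only looks at direct text.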
Maybe the expression /document/category/BODY/* will work.

Cheers,

Martijn

2009/8/19 venn hardy <venn.ha...@hotmail.com>:
>
> Hello,
>
> I have just started trying out SOLR to index some XML documents that I
> receive. I am using the SOLR 1.3 and its HttpDataSource in conjunction
> with the XPathEntityProcessor.
>
> I am finding the data import really useful so far, but I am having a few
> problems when I try and import HTML contained within one of the XML tags
> <BODY>. The data import just seems to ignore the textContent silently but
> it imports everything else.
>
> When I do a query through the SOLR admin interface, only the id and author
> fields are displayed.
>
> Any ideas what I am doing wrong?
>
> Thanks
>
> This is what my dataConfig looks like:
>
> <dataConfig>
>   <dataSource type="HttpDataSource" />
>   <document>
>     <entity name="archive" pk="id"
>             url="http://localhost:9080/data/20090817070752.xml"
>             processor="XPathEntityProcessor" forEach="/document/category"
>             transformer="DateFormatTransformer" stream="true" dataSource="dataSource">
>       <field column="id" xpath="/document/category/reference" />
>       <field column="textContent" xpath="/document/category/BODY" />
>       <field column="author" xpath="/document/category/author" />
>     </entity>
>   </document>
> </dataConfig>
>
> This is how I have specified my schema:
>
> <fields>
>   <field name="id" type="string" indexed="true" stored="true" required="true" />
>   <field name="author" type="string" indexed="true" stored="true"/>
>   <field name="textContent" type="text" indexed="true" stored="true" />
> </fields>
>
> <uniqueKey>id</uniqueKey>
> <defaultSearchField>id</defaultSearchField>
>
> And this is what my XML document looks like:
>
> <document>
>   <category>
>     <reference>123456</reference>
>     <author>Authori name</author>
>     <BODY>
>       <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi lorem elit,
>       lacinia ac blandit ac, tristique et ante. Phasellus varius varius felis ut
>       vestibulum</P>
>       <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi lorem elit,
>       lacinia ac blandit ac, tristique et ante. Phasellus varius varius felis ut
>       vestibulum</P>
>       <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi lorem elit,
>       lacinia ac blandit ac, tristique et ante. Phasellus varius varius felis ut
>       vestibulum</P>
>     </BODY>
>   </category>
> </document>