Hi Fergus,
XPathEntityProcessor can read multivalued fields easily,
e.g.:
<dataConfig>
<dataSource type="FileDataSource" encoding="UTF-8" />
<document>
<entity name ="f" processor="FileListEntityProcessor"
baseDir="***"
fileName=".*xml"
rootEntity="false"
dataSource="null" >
<entity
name="record"
processor="XPathEntityProcessor"
forEach="/record"
url="${f.fileAbsolutePath}">
<field column="ID" xpath="/record/coreInfo/@id"
commonField="true"/>
<field column="address_street"
xpath="/record/address/@street" />
<field column="address_state"
xpath="/record/address/@state" />
<field column="address_type"
xpath="/record/address/@type" />
</entity>
</entity>
</document>
</dataConfig>
In this case address_street, address_state and address_type will each be
returned as a separate list while parsing. If you wish to put them into
multiple fields, you can write a transformer that iterates through the lists
and puts the values into separate fields. If there are 3 <address> tags, then
you get a List<String> for each field, where the length of each
list == 3. If an item is missing, a null is added in its place.
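A minimal sketch of such a transformer, written here as a DIH ScriptTransformer (a JavaScript function embedded in the data config) rather than a Java class. It assumes your DIH build includes ScriptTransformer and the JVM provides a JavaScript engine (Java 6+), that the three fields are multiValued so they arrive as lists that line up by position, and that schema.xml has dynamic fields for the hypothetical per-type columns `*_address_street` and `*_address_state`. The function names are made up for illustration.

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <!-- Hypothetical ScriptTransformer: walks the parallel lists produced by
       XPathEntityProcessor and copies each street/state into a per-type
       column such as home_address_street (names are illustrative only) -->
  <script><![CDATA[
    // Append a value to a multi-valued column, creating the list on first use
    function addValue(row, key, value) {
      var list = row.get(key);
      if (list == null) {
        list = new java.util.ArrayList();
        row.put(key, list);
      }
      list.add(value);
    }
    function splitAddresses(row) {
      var streets = row.get('address_street');
      var states  = row.get('address_state');
      var types   = row.get('address_type');
      if (types == null) return row;
      for (var i = 0; i < types.size(); i++) {
        var t = types.get(i);
        if (t == null) continue;              // missing items come back as null
        var prefix = String(t).toLowerCase(); // "Office" -> "office"
        if (streets != null && i < streets.size() && streets.get(i) != null)
          addValue(row, prefix + '_address_street', streets.get(i));
        if (states != null && i < states.size() && states.get(i) != null)
          addValue(row, prefix + '_address_state', states.get(i));
      }
      return row;
    }
  ]]></script>
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="***"
            fileName=".*xml"
            rootEntity="false"
            dataSource="null">
      <entity name="record"
              processor="XPathEntityProcessor"
              transformer="script:splitAddresses"
              forEach="/record"
              url="${f.fileAbsolutePath}">
        <!-- id sits on <coreInfo> in the sample data further down this thread -->
        <field column="ID" xpath="/record/coreInfo/@id" />
        <field column="address_street" multiValued="true" xpath="/record/address/@street" />
        <field column="address_state"  multiValued="true" xpath="/record/address/@state" />
        <field column="address_type"   multiValued="true" xpath="/record/address/@type" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

Note that two "Other" addresses would both land in other_address_street and other_address_state, so pairing within one type is still positional only.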
Ensure that the fields are marked multiValued="true" in
schema.xml; otherwise it does not return a List<String>. If there is
no corresponding field in schema.xml, you can explicitly mark it
in dataconfig.xml instead,
e.g.: <field column="address_state" multiValued="true"
xpath="/record/address/@state" />
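On the schema.xml side, the declarations might look like this sketch. The `string` field type is an assumption (the example schema shipped with Solr defines one), so substitute whatever type suits your data; the dynamic field patterns are only needed if a transformer generates per-type columns, and their names are hypothetical.

```xml
<!-- schema.xml (sketch): multiValued so XPathEntityProcessor returns List<String> -->
<field name="address_street" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="address_state"  type="string" indexed="true" stored="true" multiValued="true"/>
<field name="address_type"   type="string" indexed="true" stored="true" multiValued="true"/>
<!-- Hypothetical catch-alls for per-type columns such as home_address_street -->
<dynamicField name="*_address_street" type="string" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_address_state"  type="string" indexed="true" stored="true" multiValued="true"/>
```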
I saw the syntax '/record/address//@state' in your config. '//' is not supported;
you will have to give the full path explicitly.
--Noble
On Sat, Jan 24, 2009 at 2:57 PM, Noble Paul നോബിള് नोब्ळ्
<[email protected]> wrote:
> Nesting an XPathEntityProcessor inside another XPathEntityProcessor
> is possible only if a field in the XML is a filename/URL.
> What is the purpose of nesting like this?
> Is it because you have multiple addresses? The possible solutions are
> discussed elsewhere in this thread.
>
> On Sat, Jan 24, 2009 at 2:41 PM, Fergus McMenemie <[email protected]> wrote:
>> Hello,
>>
>> I am also a newbie and wanted to do almost exactly the same thing.
>> I was planning on doing the equivalent of:
>>
>> <dataConfig>
>> <dataSource type="FileDataSource" encoding="UTF-8" />
>> <document>
>> <entity name ="f" processor="FileListEntityProcessor"
>> baseDir="***"
>> fileName=".*xml"
>> rootEntity="false"
>> dataSource="null" >
>> <entity
>> name="record"
>> processor="XPathEntityProcessor"
>> stream="false"
>> rootEntity="false" ***changed***
>> forEach="/record"
>> url="${f.fileAbsolutePath}">
>> <field column="ID" xpath="/record/@id" commonField="true"/>
>> ***change**
>> <!-- Address -->
>> <entity
>> name="record_adr"
>> processor="XPathEntityProcessor"
>> stream="false"
>> forEach="/record/address"
>> url="${f.fileAbsolutePath}">
>> <field column="address_street" xpath="/record/address/@street" />
>> <field column="address_state"
>> xpath="/record/address//@state" />
>> <field column="address_type" xpath="/record/address//@type" />
>> </entity>
>> </entity>
>> </entity>
>> </document>
>> </dataConfig>
>>
>> ID is no longer unique within Solr. There would be multiple "documents"
>> with a given ID, one for each address. You can then search on ID and get
>> the three addresses, and you can also search on an address more sensibly.
>>
>> I have not been able to try this yet as other issues are still to be
>> dealt with.
>>
>> Comments?????
>>
>>>Hi
>>>I may be completely off on this, being new to SOLR, but I am not sure
>>>how to index related groups of fields in a document and preserve
>>>their 'grouping'. I would appreciate any help on this. A detailed
>>>description of the problem is below.
>>>
>>>I am trying to index an entity that can have multiple occurrences in
>>>the same document - e.g. Address. The address could be Shipping,
>>>Home, Office etc. Each address element has multiple values in it
>>>like street, state etc. Thus each address element is a group with
>>>the state and street in one address element being related to each other.
>>>
>>>It looks like this in my source xml
>>>
>>><record>
>>> <coreInfo id="123" .../>
>>> <address street="XYZ1" state="CA" ... type="home" />
>>> <address street="XYZ2" state="CA" ... type="Office"/>
>>> <address street="XYZ3" state="CA" ... type="Other"/>
>>></record>
>>>
>>>I have setup my DIH to treat these as entities as below
>>>
>>><dataConfig>
>>> <dataSource type="FileDataSource" encoding="UTF-8" />
>>> <document>
>>> <entity name ="f" processor="FileListEntityProcessor"
>>> baseDir="***"
>>> fileName=".*xml"
>>> rootEntity="false"
>>> dataSource="null" >
>>> <entity
>>> name="record"
>>> processor="XPathEntityProcessor"
>>> stream="false"
>>> forEach="/record"
>>> url="${f.fileAbsolutePath}">
>>> <field column="ID" xpath="/record/@id" />
>>>
>>> <!-- Address -->
>>> <entity
>>> name="record_adr"
>>> processor="XPathEntityProcessor"
>>> stream="false"
>>> forEach="/record/address"
>>> url="${f.fileAbsolutePath}">
>>> <field column="address_street" xpath="/record/address/@street" />
>>> <field column="address_state"
>>> xpath="/record/address//@state" />
>>> <field column="address_type" xpath="/record/address//@type" />
>>> </entity>
>>> </entity>
>>> </entity>
>>> </document>
>>></dataConfig>
>>>
>>>
>>>The problem is as follows. DIH seems to treat these as entities, but
>>>SOLR seems to flatten them out into fields of a single document on
>>>indexing (losing the entity part).
>>>
>>>So when I search for an ID, in the response all the street fields
>>>are bunched together, followed by all the state fields, the type fields, etc.
>>>Thus I can't tell which street address corresponds to which
>>>address type in the response.
>>>
>>>What seems harder is this: say I need to query on street = "XYZ1" and
>>>type = "Office". This should NOT return the document, since the street for
>>>the office address is "XYZ2" and not "XYZ1". However, when I query for
>>>address_street:"XYZ1" and address_type:"Office" I get this document back.
>>>
>>>The problem seems to be that while DIH allows 'entities' within a
>>>document, the SOLR schema does not preserve them; it 'flattens' all
>>>of them out into the indexed fields of the document.
>>>
>>>I could work around the problem by creating SOLR fields like
>>>"home_address_street" and "office_address_street" and doing some xpath
>>>mapping. However, I don't want to do that, as we can have multiple
>>>'other' addresses. Also, I have other grouped fields whose type is not as
>>>easily distinguished as it is for address.
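For reference, the xpath-mapping workaround described above might look like the fields below, placed inside the "record" entity. This is a sketch under assumptions: it relies on XPathEntityProcessor accepting attribute-equality predicates of the form [@type='home'] (check the documentation for your DIH release), the per-type column names are hypothetical and need matching schema.xml fields, and the predicate values must match the data exactly, including case.

```xml
<!-- Hypothetical per-type columns; each predicate selects one address type -->
<field column="home_address_street"   xpath="/record/address[@type='home']/@street" />
<field column="home_address_state"    xpath="/record/address[@type='home']/@state" />
<field column="office_address_street" xpath="/record/address[@type='Office']/@street" />
<field column="office_address_state"  xpath="/record/address[@type='Office']/@state" />
```

As noted above, this does not scale to several "Other" addresses, since all of them would fall into the same multi-valued column.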
>>>
>>>As I mentioned, being new to SOLR I might have completely goofed on the
>>>way to set it up; I would much appreciate any direction on it. I am using
>>>SOLR 1.3.
>>>
>>>Regards,
>>>Guna
>>
>> --
>>
>> ===============================================================
>> Fergus McMenemie Email:[email protected]
>> Techmore Ltd Phone:(UK) 07721 376021
>>
>> Unix/Mac/Intranets Analyst Programmer
>> ===============================================================
>>
>
>
>
> --
> --Noble Paul
>
--
--Noble Paul