Hi
I am new to SOLR, so I may be completely off on this, but I am not sure
how to index related groups of fields in a document and preserve
their 'grouping'. I would appreciate any help on this. A detailed
description of the problem is below.
I am trying to index an entity that can have multiple occurrences in
the same document - e.g. Address. The address could be Shipping,
Home, Office etc. Each address element has multiple values in it
like street, state etc. Thus each address element is a group with
the state and street in one address element being related to each other.
It looks like this in my source XML:
<record>
  <coreInfo id="123" , .../>
  <address street="XYZ1" state="CA" ... type="home" />
  <address street="XYZ2" state="CA" ... type="Office" />
  <address street="XYZ3" state="CA" ... type="Other" />
</record>
I have set up my DIH to treat these as entities, as below:
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <!-- Outer entity: lists the XML files to import -->
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="***"
            fileName=".*xml"
            rootEntity="false"
            dataSource="null">
      <!-- One SOLR document per /record element -->
      <entity name="record"
              processor="XPathEntityProcessor"
              stream="false"
              forEach="/record"
              url="${f.fileAbsolutePath}">
        <field column="ID" xpath="/record/coreInfo/@id" />
        <!-- Address -->
        <entity name="record_adr"
                processor="XPathEntityProcessor"
                stream="false"
                forEach="/record/address"
                url="${f.fileAbsolutePath}">
          <field column="address_street" xpath="/record/address/@street" />
          <field column="address_state" xpath="/record/address/@state" />
          <field column="address_type" xpath="/record/address/@type" />
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>
The problem is as follows. DIH seems to treat these as entities, but
SOLR seems to flatten them out on indexing into fields of a single
document (losing the entity part).
So when I search for an ID, all the street fields in the response are
bunched together, followed by all the state fields, then the type
fields, and so on. Thus I can't tell which street corresponds to which
address type in the response.
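To illustrate, the response for the sample record comes back shaped
roughly like the sketch below. I have written this by hand from the
sample record above rather than copying real output, and the element
types (for example "str" for ID) are assumptions that depend on the
schema.

```xml
<!-- Hand-written sketch of the flattened result, not actual SOLR output. -->
<doc>
  <str name="ID">123</str>
  <arr name="address_street">
    <str>XYZ1</str>
    <str>XYZ2</str>
    <str>XYZ3</str>
  </arr>
  <arr name="address_state">
    <str>CA</str>
    <str>CA</str>
    <str>CA</str>
  </arr>
  <arr name="address_type">
    <str>home</str>
    <str>Office</str>
    <str>Other</str>
  </arr>
</doc>
```

Nothing in this document ties "XYZ2" to "Office"; at best the three
lists line up by position.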
What seems harder is this: say I need to query on street = "XYZ1" and
type = "Office". This should NOT return the document, since the street
for the office address is "XYZ2" and not "XYZ1". However, when I query for
address_street:"XYZ1" and address_type:"Office" I get this document back.
The problem seems to be that while DIH allows 'entities' within a
document, the SOLR schema does not preserve them: it 'flattens' all
of them out into indexed fields of the document.
I could work around the problem by creating SOLR fields like
"home_address_street" and "office_address_street" and doing some XPath
mapping. However, I don't want to do that, as we can have multiple
'other' addresses. Also, I have other grouped fields whose type is not
as easily distinguished as it is for addresses.
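To make that workaround concrete, the schema.xml entries would look
something like the sketch below. Only the two "..._street" names come
from my description above; the "..._state" names, the "string" type and
the indexed/stored settings are placeholders I am assuming for
illustration.

```xml
<!-- schema.xml fragment: one fixed field per address type (workaround sketch). -->
<!-- The "string" type and indexed/stored settings are assumptions. -->
<field name="home_address_street"   type="string" indexed="true" stored="true" />
<field name="home_address_state"    type="string" indexed="true" stored="true" />
<field name="office_address_street" type="string" indexed="true" stored="true" />
<field name="office_address_state"  type="string" indexed="true" stored="true" />
<!-- No fixed slot exists for a second or third 'other' address. -->
```

This only works when the set of address types is fixed and each type
occurs at most once per record, which is not the case for me.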
As I mentioned, being new to SOLR, I might have completely goofed on
the way to set this up, so I would much appreciate any direction on it.
I am using SOLR 1.3.
Regards,
Guna