Hi
I may be completely off on this, being new to SOLR, but I am not sure how to index related groups of fields in a document and preserve their 'grouping'. I would appreciate any help on this. A detailed description of the problem is below.

I am trying to index an entity that can have multiple occurrences in the same document - e.g. Address. The address could be Shipping, Home, Office etc. Each address element has multiple values in it like street, state etc. Thus each address element is a group with the state and street in one address element being related to each other.

It looks like this in my source xml

<record>
   <coreInfo id="123" , .../>
   <address street="XYZ1" state="CA" ...type="home" />
   <address street="XYZ2" state="CA" ... type="Office"/>
   <address street="XYZ3" state="CA" ....type="Other"/>
</record>

I have set up my DIH to treat these as entities, as below

<dataConfig>
   <dataSource type="FileDataSource" encoding="UTF-8" />
   <document>
     <entity name ="f" processor="FileListEntityProcessor"
             baseDir="***"
             fileName=".*xml"
             rootEntity="false"
             dataSource="null" >
        <entity
           name="record"
           processor="XPathEntityProcessor"
           stream="false"
           forEach="/record"
           url="${f.fileAbsolutePath}">
                <field column="ID" xpath="/record/coreInfo/@id" />

                <!-- Address  -->
                 <entity
                     name="record_adr"
                     processor="XPathEntityProcessor"
                     stream="false"
                     forEach="/record/address"
                     url="${f.fileAbsolutePath}">
                         <field column="address_street" xpath="/record/address/@street" />
                         <field column="address_state" xpath="/record/address/@state" />
                         <field column="address_type" xpath="/record/address/@type" />
                </entity>
           </entity>
     </entity>
   </document>
</dataConfig>


The problem is as follows. DIH seems to treat these as entities but solr seems to flatten them out on indexing to fields in a document (losing the entity part).

So when I search for an ID, all the street fields in the response are bunched together, followed by all the state fields, then the type fields, etc. Thus I can't tell which street address corresponds to which address type in the response.

What seems harder is this: say I need to query on street = "XYZ1" and type = "Office". This should NOT return a document, since the street for the office address is "XYZ2" and not "XYZ1". However, when I query for address_street:"XYZ1" and address_type:"Office" I get back this document.
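To make the behaviour concrete, here is a toy Python model of what I think is happening. It is not Solr code: the flatten, matches and matches_grouped helpers are made up for illustration, and I am assuming each address attribute ends up as one more value in a multi-valued field on the document.

```python
# Toy model of the flattening, NOT Solr code. All helper names are hypothetical.
addresses = [
    {"street": "XYZ1", "state": "CA", "type": "home"},
    {"street": "XYZ2", "state": "CA", "type": "Office"},
    {"street": "XYZ3", "state": "CA", "type": "Other"},
]


def flatten(record_id, addresses):
    """Collapse the address entities into parallel multi-valued fields."""
    doc = {
        "ID": record_id,
        "address_street": [],
        "address_state": [],
        "address_type": [],
    }
    for address in addresses:
        doc["address_street"].append(address["street"])
        doc["address_state"].append(address["state"])
        doc["address_type"].append(address["type"])
    return doc


def matches(doc, **criteria):
    """Each criterion is checked against its field independently."""
    return all(value in doc[field] for field, value in criteria.items())


def matches_grouped(addresses, street, address_type):
    """What I actually want: both values must come from the same address."""
    return any(
        a["street"] == street and a["type"] == address_type for a in addresses
    )


doc = flatten("123", addresses)
flat_hit = matches(doc, address_street="XYZ1", address_type="Office")
grouped_hit = matches_grouped(addresses, "XYZ1", "Office")
print(flat_hit)     # True: the unwanted match on the flattened document
print(grouped_hit)  # False: the result I expected
```

The flattened check succeeds because "XYZ1" and "Office" each appear somewhere in their own field, even though they belong to different addresses.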

The problem seems to be that while DIH allows 'entities' within a document the SOLR schema does not preserve them - it 'flattens' all of them out as indices for the document.

I could work around the problem by creating SOLR fields like "home_address_street" and "office_address_street" and doing some xpath mapping. However, I don't want to do that, as we can have multiple 'other' addresses. Also, I have other grouped fields whose type is not as easily distinguished as it is for address.
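A rough Python sketch of why the per-type field names do not fully help (again a toy model, not Solr code; per_type_fields is a made-up helper, and the second 'Other' address with street "XYZ4" and state "NY" is a hypothetical value I added to show the collision):

```python
# Toy model of the per-type field workaround, NOT Solr code.
addresses = [
    {"street": "XYZ1", "state": "CA", "type": "home"},
    {"street": "XYZ2", "state": "CA", "type": "Office"},
    {"street": "XYZ3", "state": "CA", "type": "Other"},
    {"street": "XYZ4", "state": "NY", "type": "Other"},  # hypothetical extra address
]


def per_type_fields(addresses):
    """Build fields such as 'home_address_street' from the address type."""
    doc = {}
    for address in addresses:
        prefix = address["type"].lower() + "_address_"
        doc.setdefault(prefix + "street", []).append(address["street"])
        doc.setdefault(prefix + "state", []).append(address["state"])
    return doc


doc = per_type_fields(addresses)
print(doc["home_address_street"])   # ['XYZ1']
print(doc["other_address_street"])  # ['XYZ3', 'XYZ4']
print(doc["other_address_state"])   # ['CA', 'NY']
```

With a single address per type the pairing is unambiguous, but as soon as a type repeats, the street and state values for that type are parallel lists again and the original problem comes back.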

As I mentioned, being new to SOLR, I might have completely goofed on the way to set this up, so I would much appreciate any direction on it. I am using SOLR 1.3.

Regards,
Guna

