Hi Fergus,
XPathEntityProcessor can read multivalued fields easily.

e.g.:
<dataConfig>
   <dataSource type="FileDataSource" encoding="UTF-8" />
   <document>
     <entity name ="f" processor="FileListEntityProcessor"
             baseDir="***"
             fileName=".*xml"
             rootEntity="false"
             dataSource="null" >
        <entity
          name="record"
          processor="XPathEntityProcessor"
          forEach="/record"
          url="${f.fileAbsolutePath}">
                <field column="ID" xpath="/record/@id" commonField="true"/>   ***changed***
                <field column="address_street" xpath="/record/address/@street" />
                <field column="address_state" xpath="/record/address/@state" />
                <field column="address_type" xpath="/record/address/@type" />

           </entity>
     </entity>
   </document>
</dataConfig>


In this case address_street, address_state and address_type are each
returned as a separate list while parsing. If you wish to put them into
multiple fields, you can write a transformer that iterates through the
lists and puts the values into separate fields. If there are 3 <address>
tags, you get a List<String> for each field, and each list has length 3.
If an item is missing, it is added as a null.
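
A minimal sketch of such a transformer follows. It is not taken from the
DIH sources: the class name AddressFieldSplitter and the generated field
names (address_1_street, address_2_street, ...) are made up for
illustration. As far as I recall, DIH also accepts a plain class that has a
transformRow(Map<String, Object> row) method, which keeps the example free
of Solr imports; if that does not hold in your version, extend
org.apache.solr.handler.dataimport.Transformer and move the same loop into
its transformRow method.

```java
import java.util.List;
import java.util.Map;

// Hypothetical transformer sketch: copies the parallel address lists into
// one numbered field per <address> element, so street, state and type of
// the same address share an index.
class AddressFieldSplitter {

    // Suffixes of the list-valued columns defined in the data config above.
    private static final String[] PARTS = {"street", "state", "type"};

    public Object transformRow(Map<String, Object> row) {
        for (String part : PARTS) {
            Object value = row.get("address_" + part);
            if (!(value instanceof List)) {
                continue; // column absent or not multivalued: nothing to split
            }
            List<?> values = (List<?>) value;
            for (int i = 0; i < values.size(); i++) {
                Object item = values.get(i);
                // Missing attributes arrive as null entries; skip them.
                if (item != null) {
                    // Assumed naming scheme: address_<position>_<part>, 1-based.
                    row.put("address_" + (i + 1) + "_" + part, item);
                }
            }
        }
        return row;
    }
}
```

To try it, the class would be named in the entity's transformer attribute
(for example transformer="com.example.AddressFieldSplitter", where the
package is hypothetical), and schema.xml would need fields, or a
dynamicField pattern, matching the generated names.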

Ensure that the fields are marked as multiValued="true" in
schema.xml; otherwise a List<String> is not returned. If there is
no corresponding mapping in schema.xml, you can set it explicitly here
in the dataconfig.xml, e.g.:
<field column="address_state" multiValued="true" xpath="/record/address/@state" />


I saw the syntax '/record/address//@state'. '//' is not supported;
you will have to give the full path explicitly.
--Noble



On Sat, Jan 24, 2009 at 2:57 PM, Noble Paul നോബിള്‍  नोब्ळ्
<noble.p...@gmail.com> wrote:
> Nesting an XPathEntityProcessor inside another XPathEntityProcessor
> is possible only if a field in the XML is a filename/URL.
> What is the purpose of nesting like this?
> Is it because you have multiple addresses? The possible solutions are
> discussed elsewhere in this thread.
>
> On Sat, Jan 24, 2009 at 2:41 PM, Fergus McMenemie <fer...@twig.me.uk> wrote:
>> Hello,
>>
>> I am also a newbie and was wanting to do almost the exact same thing.
>> I was planning on doing the equivalent of:-
>>
>> <dataConfig>
>>    <dataSource type="FileDataSource" encoding="UTF-8" />
>>    <document>
>>      <entity name ="f" processor="FileListEntityProcessor"
>>              baseDir="***"
>>              fileName=".*xml"
>>              rootEntity="false"
>>              dataSource="null" >
>>         <entity
>>           name="record"
>>           processor="XPathEntityProcessor"
>>           stream="false"
>>           rootEntity="false"            ***changed***
>>           forEach="/record"
>>           url="${f.fileAbsolutePath}">
>>                 <field column="ID" xpath="/record/@id" commonField="true"/>   ***changed***
>>                 <!-- Address  -->
>>                  <entity
>>                     name="record_adr"
>>                     processor="XPathEntityProcessor"
>>                     stream="false"
>>                     forEach="/record/address"
>>                     url="${f.fileAbsolutePath}">
>>                          <field column="address_street"  xpath="/record/address/@street" />
>>                          <field column="address_state"   xpath="/record/address//@state" />
>>                          <field column="address_type"    xpath="/record/address//@type" />
>>                </entity>
>>            </entity>
>>      </entity>
>>    </document>
>> </dataConfig>
>>
>> ID is no longer unique within Solr; there would be multiple "documents"
>> with a given ID, one for each address. You can then search on ID and get
>> the three addresses; you can also search on an address more sensibly.
>>
>> I have not been able to try this yet as other issues are still to be
>> dealt with.
>>
>> Comments?????
>>
>>>Hi
>>>I may be completely off on this being new to SOLR but I am not sure
>>>how to index related groups of fields in a document and preserve
>>>their 'grouping'.   I  would appreciate any help on this.    Detailed
>>>description of the problem below.
>>>
>>>I am trying to index an entity that can have multiple occurrences in
>>>the same document - e.g. Address.  The address could be Shipping,
>>>Home, Office etc.   Each address element has multiple values in it
>>>like street, state etc.    Thus each address element is a group with
>>>the state and street in one address element being related to each other.
>>>
>>>It looks like this in my source xml
>>>
>>><record>
>>>    <coreInfo id="123" , .../>
>>>    <address street="XYZ1" State="CA" ...type="home" />
>>>    <address street="XYZ2" state="CA" ... type="Office"/>
>>>    <address street="XYZ3" state="CA" ....type="Other"/>
>>></record>
>>>
>>>I have setup my DIH to treat these as entities as below
>>>
>>><dataConfig>
>>>    <dataSource type="FileDataSource" encoding="UTF-8" />
>>>    <document>
>>>      <entity name ="f" processor="FileListEntityProcessor"
>>>              baseDir="***"
>>>              fileName=".*xml"
>>>              rootEntity="false"
>>>              dataSource="null" >
>>>         <entity
>>>            name="record"
>>>          processor="XPathEntityProcessor"
>>>          stream="false"
>>>          forEach="/record"
>>>            url="${f.fileAbsolutePath}">
>>>                 <field column="ID" xpath="/record/@id" />
>>>
>>>                 <!-- Address  -->
>>>                  <entity
>>>                      name="record_adr"
>>>                    processor="XPathEntityProcessor"
>>>                    stream="false"
>>>                    forEach="/record/address"
>>>                            url="${f.fileAbsolutePath}">
>>>                          <field column="address_street"  xpath="/record/address/@street" />
>>>                          <field column="address_state"   xpath="/record/address//@state" />
>>>                          <field column="address_type"    xpath="/record/address//@type" />
>>>               </entity>
>>>            </entity>
>>>      </entity>
>>>    </document>
>>></dataConfig>
>>>
>>>
>>>The problem is as follows.  DIH seems to treat these as entities but
>>>solr seems to flatten them out on indexing to fields in a document
>>>(losing the entity part).
>>>
>>>So when I search for an ID, in the response all the street fields
>>>are bunched together, followed by all the state fields, type fields, etc.
>>>Thus I can't associate which street address corresponds to which
>>>address type in the response.
>>>
>>>What seems harder is this - say I need to query on 'Street' = XYZ1 and
>>>type="Office".  This should NOT return a document since the street for
>>>the office address is "XYZ2" and not "XYZ1".  However when I query for
>>>address_street:"XYZ1" and address_type:"Office" I get back this document.
>>>
>>>The problem seems to be that while DIH allows 'entities' within a
>>>document  the SOLR schema does not preserve them - it 'flattens' all
>>>of them out as indices for the document.
>>>
>>>I could work around the problem by creating SOLR fields like
>>>"home_address_street" and "office_address_street" and do some xpath
>>>mapping.  However I don't want to do it as we can have multiple
>>>'other' addresses.  Also I have other fields whose type is not easily
>>>distinguished like address.
>>>
>>>As I mentioned being new to SOLR I might have completely goofed on a
>>>way to set it up - much appreciate any direction on it. I am using
>>>SOLR 1.3
>>>
>>>Regards,
>>>Guna
>>
>> --
>>
>> ===============================================================
>> Fergus McMenemie               Email:fer...@twig.me.uk
>> Techmore Ltd                   Phone:(UK) 07721 376021
>>
>> Unix/Mac/Intranets             Analyst Programmer
>> ===============================================================
>>
>
>
>
> --
> --Noble Paul
>



-- 
--Noble Paul
