Hi Fergus,

XPathEntityProcessor can read multivalued fields easily, e.g.:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="***"
            fileName=".*xml"
            rootEntity="false"
            dataSource="null">
      <entity name="record"
              processor="XPathEntityProcessor"
              forEach="/record"
              url="${f.fileAbsolutePath}">
        <field column="ID" xpath="/record/@id" commonField="true"/>
        <!-- ***change** -->
        <field column="address_street" xpath="/record/address/@street" />
        <field column="address_state" xpath="/record/address/@state" />
        <field column="address_type" xpath="/record/address/@type" />
      </entity>
    </entity>
  </document>
</dataConfig>

In this case address_street, address_state and address_type will each be returned as a separate list while parsing. If you wish to put them into multiple fields, you can write a transformer that iterates through the lists and puts the values into separate fields.

If there are 3 <address> tags, you get a List<String> for each field, and each list has length 3. If an item is missing, a null is added in its place.

Ensure that the fields are marked multiValued="true" in schema.xml; otherwise you do not get a List<String> back. If there is no corresponding mapping in schema.xml, you can set it explicitly in dataconfig.xml, e.g.:

<field column="address_state" multiValued="true" xpath="/record/address/@state" />

I saw the syntax '/record/address//@state' in your config. '//' is not supported; you will have to give the full path explicitly.

--Noble

On Sat, Jan 24, 2009 at 2:57 PM, Noble Paul നോബിള് नोब्ळ् <noble.p...@gmail.com> wrote:
> Nesting an XPathEntityProcessor inside another XPathEntityProcessor
> is possible only if a field in the XML is a filename/URL.
> What is the purpose of nesting like this?
> Is it because you have multiple addresses? The possible solutions are
> discussed elsewhere in this thread.
>
> On Sat, Jan 24, 2009 at 2:41 PM, Fergus McMenemie <fer...@twig.me.uk> wrote:
>> Hello,
>>
>> I am also a newbie and was wanting to do almost the exact same thing.
>> I was planning on doing the equivalent of:
>>
>> <dataConfig>
>>   <dataSource type="FileDataSource" encoding="UTF-8" />
>>   <document>
>>     <entity name="f" processor="FileListEntityProcessor"
>>             baseDir="***"
>>             fileName=".*xml"
>>             rootEntity="false"
>>             dataSource="null">
>>       <entity
>>           name="record"
>>           processor="XPathEntityProcessor"
>>           stream="false"
>>           rootEntity="false"     ***changed***
>>           forEach="/record"
>>           url="${f.fileAbsolutePath}">
>>         <field column="ID" xpath="/record/@id" commonField="true"/>
>>         ***change**
>>         <!-- Address -->
>>         <entity
>>             name="record_adr"
>>             processor="XPathEntityProcessor"
>>             stream="false"
>>             forEach="/record/address"
>>             url="${f.fileAbsolutePath}">
>>           <field column="address_street" xpath="/record/address/@street" />
>>           <field column="address_state" xpath="/record/address//@state" />
>>           <field column="address_type" xpath="/record/address//@type" />
>>         </entity>
>>       </entity>
>>     </entity>
>>   </document>
>> </dataConfig>
>>
>> ID is no longer unique within Solr; there would be multiple "documents"
>> with a given ID, one for each address. You can then search on ID and get
>> the three addresses, and you can also search on an address more sensibly.
>>
>> I have not been able to try this yet, as other issues are still to be
>> dealt with.
>>
>> Comments?
>>
>>>Hi,
>>>I may be completely off on this, being new to SOLR, but I am not sure
>>>how to index related groups of fields in a document and preserve
>>>their 'grouping'. I would appreciate any help on this. A detailed
>>>description of the problem is below.
>>>
>>>I am trying to index an entity that can have multiple occurrences in
>>>the same document, e.g. Address. The address could be Shipping,
>>>Home, Office etc. Each address element has multiple values in it,
>>>like street, state etc. Thus each address element is a group, with
>>>the state and street in one address element being related to each other.
>>>
>>>It looks like this in my source XML:
>>>
>>><record>
>>>  <coreInfo id="123" .../>
>>>  <address street="XYZ1" state="CA" ... type="home" />
>>>  <address street="XYZ2" state="CA" ... type="Office"/>
>>>  <address street="XYZ3" state="CA" ... type="Other"/>
>>></record>
>>>
>>>I have set up my DIH to treat these as entities, as below:
>>>
>>><dataConfig>
>>>  <dataSource type="FileDataSource" encoding="UTF-8" />
>>>  <document>
>>>    <entity name="f" processor="FileListEntityProcessor"
>>>            baseDir="***"
>>>            fileName=".*xml"
>>>            rootEntity="false"
>>>            dataSource="null">
>>>      <entity
>>>          name="record"
>>>          processor="XPathEntityProcessor"
>>>          stream="false"
>>>          forEach="/record"
>>>          url="${f.fileAbsolutePath}">
>>>        <field column="ID" xpath="/record/@id" />
>>>
>>>        <!-- Address -->
>>>        <entity
>>>            name="record_adr"
>>>            processor="XPathEntityProcessor"
>>>            stream="false"
>>>            forEach="/record/address"
>>>            url="${f.fileAbsolutePath}">
>>>          <field column="address_street" xpath="/record/address/@street" />
>>>          <field column="address_state" xpath="/record/address//@state" />
>>>          <field column="address_type" xpath="/record/address//@type" />
>>>        </entity>
>>>      </entity>
>>>    </entity>
>>>  </document>
>>></dataConfig>
>>>
>>>
>>>The problem is as follows. DIH seems to treat these as entities, but
>>>Solr seems to flatten them out into fields of a single document on
>>>indexing (losing the entity part).
>>>
>>>So when I search for an ID, all the street fields in the response
>>>are bunched together, followed by all the state fields, the type fields, etc.
>>>Thus I can't associate which street address corresponds to which
>>>address type in the response.
>>>
>>>What seems harder is this: say I need to query on 'Street' = XYZ1 and
>>>type="Office". This should NOT return a document, since the street for
>>>the office address is "XYZ2" and not "XYZ1". However, when I query for
>>>address_street:"XYZ1" and address_type:"Office" I get back this document.
>>>
>>>The problem seems to be that while DIH allows 'entities' within a
>>>document, the SOLR schema does not preserve them; it 'flattens' all
>>>of them out as indices for the document.
>>>
>>>I could work around the problem by creating SOLR fields like
>>>"home_address_street" and "office_address_street" and doing some XPath
>>>mapping. However, I don't want to do that, as we can have multiple
>>>'other' addresses. Also, I have other fields whose type is not as easily
>>>distinguished as it is for addresses.
>>>
>>>As I mentioned, being new to SOLR I might have completely goofed on the
>>>way to set it up; I'd much appreciate any direction on it. I am using
>>>SOLR 1.3.
>>>
>>>Regards,
>>>Guna
>>
>> --
>>
>> ===============================================================
>> Fergus McMenemie               Email:fer...@twig.me.uk
>> Techmore Ltd                   Phone:(UK) 07721 376021
>>
>> Unix/Mac/Intranets             Analyst Programmer
>> ===============================================================
>>
>
>
>
> --
> --Noble Paul
>

--
--Noble Paul
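A minimal sketch of the transformer suggested at the top of this reply. It assumes the parallel-list behaviour described there: one list entry per <address>, with null for a missing attribute, so index i in each list refers to the same address. As far as I know, DIH accepts either a subclass of org.apache.solr.handler.dataimport.Transformer or a plain class exposing public Object transformRow(Map<String, Object> row). The plain form is used here so the sketch compiles without Solr on the classpath. The class name AddressSplitTransformer and the <type>_address_street / <type>_address_state output columns are hypothetical, chosen to match the home_address_street naming from the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical DIH transformer sketch (not part of Solr). It relies on DIH
// calling a plain class with a public transformRow(Map) method; no Solr
// classes are imported, so the logic can be compiled and tested on its own.
class AddressSplitTransformer {

    public Object transformRow(Map<String, Object> row) {
        List<?> streets = asList(row.get("address_street"));
        List<?> states = asList(row.get("address_state"));
        List<?> types = asList(row.get("address_type"));

        // Assumption: the lists are parallel (one slot per <address>, null when
        // an attribute is missing), so index i describes one address.
        int n = Math.max(streets.size(), Math.max(states.size(), types.size()));
        for (int i = 0; i < n; i++) {
            Object type = get(types, i);
            String prefix = (type == null)
                    ? "unknown"
                    : type.toString().toLowerCase(Locale.ROOT);
            // e.g. type "Office" -> columns office_address_street / office_address_state
            add(row, prefix + "_address_street", get(streets, i));
            add(row, prefix + "_address_state", get(states, i));
        }
        return row; // the original address_* columns are left in place
    }

    // A column with a single value may arrive as a plain object, not a List.
    private static List<?> asList(Object value) {
        if (value == null) {
            return new ArrayList<Object>();
        }
        if (value instanceof List) {
            return (List<?>) value;
        }
        List<Object> single = new ArrayList<Object>();
        single.add(value);
        return single;
    }

    // Safe positional lookup: a shorter list is treated as null at that index.
    private static Object get(List<?> list, int i) {
        return i < list.size() ? list.get(i) : null;
    }

    // Append a value to a multi-valued output column; null values are skipped.
    @SuppressWarnings("unchecked")
    private static void add(Map<String, Object> row, String column, Object value) {
        if (value == null) {
            return;
        }
        Object existing = row.get(column);
        List<Object> values;
        if (existing instanceof List) {
            values = (List<Object>) existing;
        } else {
            values = new ArrayList<Object>();
            if (existing != null) {
                values.add(existing);
            }
            row.put(column, values);
        }
        values.add(value);
    }
}
```

To use it, the record entity would reference the class, e.g. transformer="AddressSplitTransformer" (fully qualified if it lives in a package), and schema.xml would need matching multiValued fields for the generated columns, possibly via a dynamic field such as *_address_street. A query like office_address_street:"XYZ1" would then only match when the office address has that street. Several 'other' addresses still share other_address_street, so their grouping is not preserved by this approach.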