Sounds like you may have an invalid XML fragment: several elements next to each other without a common parent.
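One way to check this outside Solr is a small Python sketch like the one below. It uses only the standard library; the sample strings and the `has_single_root` helper are made up for illustration, and this is not what DIH runs internally. It only shows that a parser rejects a fragment with two top-level elements but accepts the same content under one parent.

```python
import xml.etree.ElementTree as ET


def has_single_root(xml_text: str) -> bool:
    """Return True if xml_text parses as a well-formed document with one root."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Sibling top-level elements (among other problems) end up here.
        return False


# Hypothetical transformation output: two top-level elements, no common parent.
fragment = "<Subject>032-001946363</Subject><Author>040-001959713</Author>"

# The same content wrapped in a single parent element.
wrapped = "<Person>" + fragment + "</Person>"

print(has_single_root(fragment))
print(has_single_root(wrapped))
```

If the output of your XSL fails a check like this, the stylesheet probably needs to emit one wrapping element around everything it produces.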
I am starting to think you may be better off with a hacky regular
expression, at least to get you through your first iteration. Or a
custom-coded pre-processor that does some basic search and replace in a
loop, so that the type value ends up somewhere on a more direct path for
the XPath processor.

Regards,
   Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/

On 23 October 2015 at 07:20, Routley, Alan <alan.rout...@bl.uk> wrote:
> Thanks Alex for getting back to me.
>
> As per your suggestion I've gone down the XSL route. I've created a
> transformation that works fine in various test tools, but Solr is throwing
> errors such as:
>
> Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
> org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed
> for xml, url:null rows processed:0 Processing Document # 1
>
> .....
>
> Caused by: java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException:
> Illegal to have multiple roots (start tag in epilog?).
>
> Looks like I need to dig a bit deeper.
>
> Regards,
> Alan.
>
> -----Original Message-----
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: 23 October 2015 12:00
> To: solr-user
> Subject: Re: Select sibling data via XPathEntityProcessor
>
> If you are stuck with DIH, it looks like you can specify an xsl attribute
> on the XPathEntityProcessor and it will be used as a pre-processor.
>
> I would probably use it to convert the outer NamedAuthority tag into a
> corresponding Author or Subject tag. Looks easiest.
>
> If you are not sure how to generate good XSL, have a look at something like
> http://xmlstar.sourceforge.net/overview.php - it is a sort of command-line
> processor but can also emit XSL to show you what it should look like. I
> wrote about this tool many, many moons ago at:
> http://www.freesoftwaremagazine.com/articles/xml_starlet
>
> Regards,
>    Alex.
>
> On 23 October 2015 at 02:25, Routley, Alan <alan.rout...@bl.uk> wrote:
>> Hi Alex
>>
>> Thanks for the reply.
>>
>> I think I'm stuck with using the DIH, as I'm initially using the
>> SqlEntityProcessor to extract records from SQL Server, indexing some of
>> the standard relational fields before handing the XML piece over to the
>> XPathEntityProcessor.
>> I'll look into adding an XSLT processor into the mix, but I have not used
>> one before, so if you could possibly point me at an example that could get
>> me started, that would be a great help.
>>
>> Thanks
>>
>> Alan.
>>
>> -----Original Message-----
>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
>> Sent: 22 October 2015 15:43
>> To: solr-user
>> Subject: Re: Select sibling data via XPathEntityProcessor
>>
>> I don't think DIH supports siblings. Have you thought of using an XSLT
>> processor before sending the XML to Solr? Or using it instead of DIH
>> during the update (not a well-known part of Solr):
>> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-UsingXSLTtoTransformXMLIndexUpdates
>>
>> With XSLT, you could just convert your format directly into the Solr XML
>> Update format and not bother with field mapping.
>>
>> Regards,
>>    Alex.
>>
>> On 22 October 2015 at 10:17, Routley, Alan <alan.rout...@bl.uk> wrote:
>>> Hi,
>>>
>>> Given an XML structure:
>>>
>>> <Person>
>>>   <Relationships>
>>>     <NamedAuthority> <Type>Subject</Type> <Id>032-001946363</Id> </NamedAuthority>
>>>     <NamedAuthority> <Type>Subject</Type> <Id>037-001946370</Id> </NamedAuthority>
>>>     <NamedAuthority> <Type>Author</Type> <Id>040-001959713</Id> </NamedAuthority>
>>>     <NamedAuthority> <Type>Author</Type> <Id>040-001959829</Id> </NamedAuthority>
>>>     <NamedAuthority> <Type>Subject</Type> <Id>032-001961797</Id> </NamedAuthority>
>>>     <NamedAuthority> <Type>Author</Type> <Id>040-001961798</Id> </NamedAuthority>
>>>   </Relationships>
>>> </Person>
>>>
>>> I'm trying to use the XPathEntityProcessor to put all the Subject Ids
>>> into one multiValued field and the Author Ids into another.
>>>
>>> I was hoping I could use fields with the following, but the XPath does
>>> not seem to be supported.
>>>
>>> <field column="SubjectRelationships"
>>>        xpath="/Person/Relationships/NamedAuthority/Type[.='Subject']/following-sibling::Id" />
>>> <field column="AuthorRelationships"
>>>        xpath="/Person/Relationships/NamedAuthority/Type[.='Author']/following-sibling::Id" />
>>>
>>> Could anyone suggest a way for me to achieve this?
>>>
>>> Many Thanks.
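
For the original question at the bottom of this thread, the pre-processing idea (turning each outer NamedAuthority tag into a Subject or Author tag) can be sketched roughly as follows. This is only an illustration in Python with the standard library, not something DIH provides. `lift_type_into_tag` and `SAMPLE` are made-up names, and the sketch assumes every Type value is also a valid XML element name.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the structure from the original question.
SAMPLE = """<Person>
  <Relationships>
    <NamedAuthority><Type>Subject</Type><Id>032-001946363</Id></NamedAuthority>
    <NamedAuthority><Type>Author</Type><Id>040-001959713</Id></NamedAuthority>
    <NamedAuthority><Type>Subject</Type><Id>037-001946370</Id></NamedAuthority>
  </Relationships>
</Person>"""


def lift_type_into_tag(xml_text: str) -> str:
    """Rename each NamedAuthority element to the text of its Type child."""
    root = ET.fromstring(xml_text)
    # Snapshot the matches first, since the tree is modified inside the loop.
    for authority in list(root.iter("NamedAuthority")):
        type_el = authority.find("Type")
        if type_el is None or not (type_el.text or "").strip():
            continue  # leave entries without a usable Type untouched
        # Assumption: the Type text is a valid element name (Subject, Author).
        authority.tag = type_el.text.strip()
        authority.remove(type_el)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(lift_type_into_tag(SAMPLE))
```

After a rewrite like this the Ids sit at /Person/Relationships/Subject/Id and /Person/Relationships/Author/Id, so the field definitions no longer need a sibling axis. The result still has a single Person root, which avoids the multiple-roots error mentioned earlier in the thread.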