Sounds like you may have an invalid XML fragment: several elements next
to each other without a common parent.
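
To illustrate what "multiple roots" means, here is a minimal sketch in Python (standard library only). It shows a parser rejecting sibling top-level elements and accepting the same text once it is wrapped in a single parent. The sample fragment and the wrapper name `Root` are assumptions made up for illustration; they are not taken from the actual XSL output.

```python
import xml.etree.ElementTree as ET

# Hypothetical transform output: two top-level elements, no common parent.
fragment = (
    "<Subject><Id>032-001946363</Id></Subject>"
    "<Author><Id>040-001959713</Id></Author>"
)

def parses(text):
    """Return True if text is a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

def wrap_in_root(text, root="Root"):
    """Give sibling elements a single common parent (the name is arbitrary)."""
    return "<{0}>{1}</{0}>".format(root, text)

print(parses(fragment))                # the bare fragment is rejected
print(parses(wrap_in_root(fragment)))  # the wrapped fragment is accepted
```

If the stylesheet emits its result elements without an enclosing element, adding one in the template that matches the document root would be the XSL equivalent of `wrap_in_root`.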

I am starting to think that perhaps you are better off doing a hacky
regular expression, at least to get you through your first iteration.

Or a custom-coded pre-processor that does some basic search and
replace in a loop to move that Type value onto a more direct path
for the XPath processor.
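
A rough sketch of such a pre-processor, in Python with the standard library only. It assumes every NamedAuthority block looks like the sample in the original question (Type before Id, no attributes, no nesting); the function name `promote_type` and the output element layout are made up for illustration, and a regular expression like this will break on anything less regular.

```python
import re
import xml.etree.ElementTree as ET

# Matches one <NamedAuthority> block and captures its Type and Id text.
# Assumes Type always precedes Id and neither element carries attributes.
NAMED_AUTHORITY = re.compile(
    r"<NamedAuthority>\s*<Type>\s*(\w+)\s*</Type>"
    r"\s*<Id>\s*([^<]*?)\s*</Id>\s*</NamedAuthority>"
)

def promote_type(xml_text):
    """Rename each NamedAuthority element after its Type value."""
    return NAMED_AUTHORITY.sub(r"<\1><Id>\2</Id></\1>", xml_text)

sample = """<Person><Relationships>
<NamedAuthority> <Type>Subject</Type> <Id>032-001946363</Id> </NamedAuthority>
<NamedAuthority> <Type>Author</Type> <Id>040-001959713</Id> </NamedAuthority>
</Relationships></Person>"""

rewritten = promote_type(sample)

# Sanity check: the result is still well-formed and the Ids are now
# reachable with plain child paths, no sibling axis needed.
root = ET.fromstring(rewritten)
subject_ids = [e.text for e in root.findall("./Relationships/Subject/Id")]
author_ids = [e.text for e in root.findall("./Relationships/Author/Id")]
print(subject_ids, author_ids)
```

After a rewrite like this, the field definitions should only need plain forward paths such as /Person/Relationships/Subject/Id and /Person/Relationships/Author/Id.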

Regards,
   Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 23 October 2015 at 07:20, Routley, Alan <alan.rout...@bl.uk> wrote:
> Thanks Alex for getting back to me.
>
> As per your suggestion I've gone down the XSL route. I've created a
> transformation that works fine in various test tools, but Solr is throwing
> errors such as:
>
> Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: 
> org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed 
> for xml, url:null rows processed:0 Processing Document # 1
>
> .....
>
> Caused by: java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException: 
> Illegal to have multiple roots (start tag in epilog?).
>
> Looks like I need to dig a bit deeper.
>
> Regards,
> Alan.
>
>
> -----Original Message-----
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: 23 October 2015 12:00
> To: solr-user
> Subject: Re: Select sibling data via XPathEntityProcessor
>
> If you are stuck with DIH, it looks like you can specify an xsl attribute on
> the XPathEntityProcessor, and it will be used as a pre-processor.
>
> I would probably use it to convert the outer NamedAuthority tag into a
> corresponding Author or Subject tag. That looks easiest.
>
> If you are not sure how to generate good XSL, have a look at something like
> http://xmlstar.sourceforge.net/overview.php - it is a sort of command-line
> processor, but it can also emit XSL to show you what it should look like. I
> wrote about this tool many moons ago at:
> http://www.freesoftwaremagazine.com/articles/xml_starlet
>
> Regards,
>    Alex.
>
>
> On 23 October 2015 at 02:25, Routley, Alan <alan.rout...@bl.uk> wrote:
>> Hi Alex
>>
>> Thanks for the reply.
>>
>> I think I'm stuck with using the DIH, as I'm initially using the
>> SqlEntityProcessor to extract records from SQL Server, indexing some of the
>> standard relational fields before handing the XML piece over to the
>> XPathEntityProcessor.
>> I'll look into adding an XSLT processor into the mix, but I have not used
>> one before, so if you could point me at an example to get me started, that
>> would be a great help.
>>
>> Thanks
>>
>> Alan.
>>
>> -----Original Message-----
>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
>> Sent: 22 October 2015 15:43
>> To: solr-user
>> Subject: Re: Select sibling data via XPathEntityProcessor
>>
>> I don't think DIH supports siblings. Have you thought of using an XSLT
>> processor before sending the XML to Solr? Or using it instead of DIH during
>> the update (not a well-known part of Solr):
>> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-UsingXSLTtoTransformXMLIndexUpdates
>>
>> With XSLT, you could just convert your format directly into the Solr XML
>> Update format and not bother with field mapping.
>>
>> Regards,
>>    Alex.
>>
>>
>> On 22 October 2015 at 10:17, Routley, Alan <alan.rout...@bl.uk> wrote:
>>> Hi,
>>>
>>> Given an xml structure:
>>>
>>> <Person>
>>>   <Relationships>
>>>     <NamedAuthority> <Type>Subject</Type> <Id>032-001946363</Id> </NamedAuthority>
>>>     <NamedAuthority> <Type>Subject</Type> <Id>037-001946370</Id> </NamedAuthority>
>>>     <NamedAuthority> <Type>Author</Type> <Id>040-001959713</Id> </NamedAuthority>
>>>     <NamedAuthority> <Type>Author</Type> <Id>040-001959829</Id> </NamedAuthority>
>>>     <NamedAuthority> <Type>Subject</Type> <Id>032-001961797</Id> </NamedAuthority>
>>>     <NamedAuthority> <Type>Author</Type> <Id>040-001961798</Id> </NamedAuthority>
>>>   </Relationships>
>>> </Person>
>>>
>>> I’m trying to use the XPathEntityProcessor to put all the Subject Ids into
>>> one multiValued field and the Author Ids into another.
>>>
>>> I was hoping I could use fields like the following, but the XPath does not
>>> seem to be supported.
>>>
>>> <field column="SubjectRelationships"
>>>        xpath="/Person/Relationships/NamedAuthority/Type[.='Subject']/following-sibling::Id" />
>>> <field column="AuthorRelationships"
>>>        xpath="/Person/Relationships/NamedAuthority/Type[.='Author']/following-sibling::Id" />
>>>
>>> Could anyone suggest a way for me to achieve this?
>>>
>>> Many Thanks.
