There was another error, which I think is an indexing error.
The listprice field below is a pdouble field, and the update process didn't
ignore the error when it was sent bad data.

Response: {
  "responseHeader":{
    "status":400,
    "QTime":133551},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","java.lang.NumberFormatException"],
    "msg":"ERROR: [doc=978194537913] Error adding field 'listprice'='106Chapter' msg=For input string: \"106Chapter\"",
    "code":400}}
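
In the meantime, a client-side check could catch values like '106Chapter' before they are sent to Solr. This is only a rough sketch in Python: the helper name `looks_like_double` is made up for illustration, and the pattern accepts plain decimal numbers only, which I believe is stricter than what a pdouble field itself would accept.

```python
import re

# Plain decimal numbers with an optional sign, fraction and exponent.
# Assumption: this is intentionally narrower than what Solr would parse.
_PLAIN_DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def looks_like_double(value: str) -> bool:
    """Return True if value is a plain decimal number such as '106' or '106.50'."""
    return _PLAIN_DECIMAL.fullmatch(value.strip()) is not None
```

A record whose listprice fails this check could be logged and left out of the batch instead of being posted.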


________________________________
From: Shawn Heisey <apa...@elyograg.org>
Sent: Tuesday, June 9, 2020 3:19 PM
To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
Subject: Re: Fw: TolerantUpdateProcessorFactory not functioning

On 6/9/2020 12:44 AM, Hup Chen wrote:
> Thanks for your reply. This is one of the examples where it fails. POSTing
> with charset=utf-8 or another charset didn't help with the CTRL-CHAR "^" error
> found in the title field. I hope Solr can simply skip this record and go
> ahead and index the rest of the data.
>
> <add>
> <doc>
>   <field name="id">9780373773244</field>
>   <field name="isbn13">9780373773244</field>
> <field name="title">Missing: Innocent By Association^Zachary's Law (Hqn 
> Romance) </field>
>   <field name="author">Lisa_Jackson </field>
> </doc>
> </add>
>
> curl \
> "http://localhost:7070/solr/searchinfo/update?update.chain=tolerant-chain&maxErrors=100" \
>  -H 'Content-Type: text/xml; charset=utf-8' -d @data
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>
> <lst name="responseHeader">
>    <arr name="errors"/>
>    <int name="maxErrors">100</int>
>    <int name="status">400</int>
>    <int name="QTime">0</int>
> </lst>
> <lst name="error">
>    <lst name="metadata">
>      <str name="error-class">org.apache.solr.common.SolrException</str>
>      <str 
> name="root-error-class">com.ctc.wstx.exc.WstxUnexpectedCharException</str>
>    </lst>
>    <str name="msg">Illegal character ((CTRL-CHAR, code 26))
>   at [row,col {unknown-source}]: [1,225]</str>
>    <int name="code">400</int>
> </lst>
> </response>

I tried your example XML as it is shown in your original message, saved
to a file named "foo.xml", and didn't have any trouble.  I wasn't even
using the tolerant update processor.   I just fired up the techproducts
example on a solr-8.3.0 download I already had, added a field named
"isbn13" (string type) so the schema was compatible, and tried the
following command:

curl "http://localhost:8983/solr/techproducts/update" \
  -H 'Content-Type: text/xml; charset=utf-8' -d @foo.xml

I then tried it again with the ^Z (which is two characters) replaced by
an actual Ctrl-Z character.  When I did that, I got exactly the same
error you did.

A Ctrl-Z character (ASCII code 26) is *NOT* a valid character in XML,
which is why you're getting the error.

The tolerant update processor can't ignore errors in the actual format
of the input ... it only ignores errors during *indexing*.  This error
occurred during the input parsing, not during indexing, so the update
processor could not ignore it.
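
One possible workaround is to strip the characters that XML 1.0 does not allow from the data before posting it. The following is a minimal sketch in Python, assuming the input is UTF-8 text; the function name `strip_invalid_xml_chars` and the command-line usage are made up for illustration and are not part of Solr.

```python
import re
import sys

# Anything outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Remove characters (such as Ctrl-Z, code 26) that XML 1.0 forbids."""
    return _INVALID_XML_CHARS.sub("", text)


if __name__ == "__main__":
    # Hypothetical usage: python clean_xml.py data data.clean
    src, dst = sys.argv[1], sys.argv[2]
    with open(src, encoding="utf-8", errors="replace") as fin:
        cleaned = strip_invalid_xml_chars(fin.read())
    with open(dst, "w", encoding="utf-8") as fout:
        fout.write(cleaned)
```

Note that this silently drops the offending characters, so the indexed title will differ slightly from the source data. If you would rather skip such records entirely, that has to happen on the client as well, before the XML reaches Solr.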

Thanks,
Shawn
