[ 
https://issues.apache.org/jira/browse/CONNECTORS-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera resolved CONNECTORS-1681.
-----------------------------------------
    Fix Version/s: ManifoldCF 2.21
       Resolution: Fixed

r1895299

> TikaServiceRmeta: recordActivity can cause Database exception
> -------------------------------------------------------------
>
>                 Key: CONNECTORS-1681
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1681
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Tika service connector
>    Affects Versions: ManifoldCF 2.20
>            Reporter: Julien Massiera
>            Assignee: Julien Massiera
>            Priority: Major
>             Fix For: ManifoldCF 2.21
>
>
> Some files containing non-ASCII characters can cause Tika to throw an 
> exception describing the parsing problem. 
> The TikaServiceRmeta connector creates an activity record for every Tika 
> exception and includes the exception description, which in those cases 
> contains the offending characters. As a result, MCF hits an SQL exception 
> when it tries to insert the activity record into Postgres:
> {code:java}
> ERROR 2021-11-24T13:37:00,121 (Worker thread '41') - 
> MCF|MCF-agent|apache.manifoldcf.crawlerthreads|Worker thread aborting and 
> restarting due to database connection reset: Database exception: SQLException 
> doing query (22021): ERROR: invalid byte sequence for encoding "UTF8": 0x00
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database 
> exception: SQLException doing query (22021): ERROR: invalid byte sequence for 
> encoding "UTF8": 0x00 {code}
> To avoid this, we need to remove any non-ASCII chars from the exception 
> description before recording the activity.
>  
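
The cleanup described in the issue could look roughly like the sketch below. It is not the code committed in r1895299: the class name ActivityDescriptionSanitizer and the method toSafeAscii are hypothetical, chosen only for illustration. Note that the byte in the log, 0x00, is the NUL character, which PostgreSQL text columns reject, so this sketch drops ASCII control characters as well as non-ASCII ones.

```java
// Hypothetical helper, not part of ManifoldCF: strips characters that could
// break the insert of an activity description into the database.
final class ActivityDescriptionSanitizer {

    private ActivityDescriptionSanitizer() {
        // static utility, no instances
    }

    /**
     * Keeps printable ASCII (0x20 to 0x7E) plus tab, line feed and carriage
     * return. Everything else is dropped, including NUL (0x00) and all
     * non-ASCII characters.
     */
    static String toSafeAscii(String description) {
        if (description == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(description.length());
        for (int i = 0; i < description.length(); i++) {
            char c = description.charAt(i);
            boolean printableAscii = c >= 0x20 && c <= 0x7E;
            boolean allowedWhitespace = c == '\t' || c == '\n' || c == '\r';
            if (printableAscii || allowedWhitespace) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

A connector would call such a helper on the exception message just before building the activity record, so the original exception stays untouched for logging.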



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
