Build failed in Jenkins: ManifoldCF » ManifoldCF-mvn #31

2021-11-24 Thread Apache Jenkins Server
See 


Changes:

[Julien Massiera] Fix CONNECTORS-1681


--
[...truncated 2.42 MB...]
[ERROR]   protected static void writeField(ModifiableSolrParams out, String fieldName, List fieldValues)
[ERROR] ^
[ERROR] :907: warning: no @param for out
[ERROR]   protected static void writeField(ModifiableSolrParams out, String fieldName, String fieldValue)
[ERROR] ^
[ERROR] :907: warning: no @param for fieldName
[ERROR]   protected static void writeField(ModifiableSolrParams out, String fieldName, String fieldValue)
[ERROR] ^
[ERROR] :907: warning: no @param for fieldValue
[ERROR]   protected static void writeField(ModifiableSolrParams out, String fieldName, String fieldValue)
[ERROR] ^
[ERROR] :913: warning: no @param for out
[ERROR]   protected void writeACLs(ModifiableSolrParams out, String aclType, String[] acl, String[] denyAcl)
[ERROR]  ^
[ERROR] :913: warning: no @param for aclType
[ERROR]   protected void writeACLs(ModifiableSolrParams out, String aclType, String[] acl, String[] denyAcl)
[ERROR]  ^
[ERROR] :913: warning: no @param for acl
[ERROR]   protected void writeACLs(ModifiableSolrParams out, String aclType, String[] acl, String[] denyAcl)
[ERROR]  ^
[ERROR] :913: warning: no @param for denyAcl
[ERROR]   protected void writeACLs(ModifiableSolrParams out, String aclType, String[] acl, String[] denyAcl)
[ERROR]  ^
[ERROR] :930: warning: no @param for inputDoc
[ERROR]   protected void writeACLsInSolrDoc( SolrInputDocument inputDoc, String aclType, String[] acl, String[] denyAcl )
[ERROR]  ^
[ERROR] :930: warning: no @param for aclType
[ERROR]   protected void writeACLsInSolrDoc( SolrInputDocument inputDoc, String aclType, String[] acl, String[] denyAcl )
[ERROR]  ^
[ERROR] :930: warning: no @param for acl
[ERROR]   protected void writeACLsInSolrDoc( SolrInputDocument inputDoc, String aclType, String[] acl, String[] denyAcl )
[ERROR]  ^
[ERROR] :930: warning: no @param for denyAcl
[ERROR]   protected void writeACLsInSolrDoc( SolrInputDocument inputDoc, String aclType, String[] acl, String[] denyAcl )
[ERROR]  ^
[ERROR] :1759: warning: no @param for inputField
[ERROR]   protected static String makeSafeLuceneField(String inputField)
[ERROR]   ^
[ERROR] :1759: warning: no @return
[ERROR]   protected static String makeSafeLuceneField(String inputField)
[ERROR]   ^
[ERROR]
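The warnings above are javadoc doclint complaints about missing `@param` and `@return` tags, which this build log prints at `[ERROR]` level. A minimal sketch of a doc comment that satisfies those checks, reusing the `makeSafeLuceneField` signature from the log. The class name `SafeFieldExample` and the method body are hypothetical placeholders for illustration, not the connector's actual code.

```java
public class SafeFieldExample {

  /**
   * Convert an arbitrary field name into one that is safe to use as a Lucene field name.
   * Hypothetical behaviour for illustration only: every character that is not a letter,
   * digit or underscore is replaced with an underscore.
   *
   * @param inputField the raw field name
   * @return the sanitized field name
   */
  protected static String makeSafeLuceneField(String inputField) {
    StringBuilder sb = new StringBuilder(inputField.length());
    for (int i = 0; i < inputField.length(); i++) {
      char c = inputField.charAt(i);
      // Keep letters, digits and underscores; replace anything else
      sb.append(Character.isLetterOrDigit(c) || c == '_' ? c : '_');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(makeSafeLuceneField("dc:title")); // prints dc_title
  }
}
```

Each documented parameter needs its own `@param` line, and any non-void method needs an `@return` line, or doclint reports the corresponding warning seen above.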

Build failed in Jenkins: ManifoldCF » ManifoldCF-Artifacts-Ant-JDK11 #28

2021-11-24 Thread Apache Jenkins Server
See 


Changes:


--
Started by an SCM change
Running as SYSTEM
[EnvInject] - Loading node environment variables.
Building remotely on H28 (ubuntu) in workspace 

Checking out a fresh workspace because there's no workspace at 

Cleaning local Directory .
Checking out https://svn.apache.org/repos/asf/manifoldcf/trunk at revision '2021-11-25T01:34:10.017 +'
ERROR: Failed to check out https://svn.apache.org/repos/asf/manifoldcf/trunk
org.tmatesoft.svn.core.SVNException: svn: E175002: timed out waiting for server
svn: E175002: OPTIONS request failed on '/repos/asf/manifoldcf/trunk'
at org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:112)
at org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:96)
at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:765)
at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:352)
at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:340)
at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.performHttpRequest(DAVConnection.java:910)
at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.exchangeCapabilities(DAVConnection.java:702)
at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.open(DAVConnection.java:113)
at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.openConnection(DAVRepository.java:1047)
at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.getLatestRevision(DAVRepository.java:169)
at org.tmatesoft.svn.core.internal.wc2.ng.SvnNgRepositoryAccess.getRevisionNumber(SvnNgRepositoryAccess.java:119)
at org.tmatesoft.svn.core.internal.wc2.SvnRepositoryAccess.getLocations(SvnRepositoryAccess.java:180)
at org.tmatesoft.svn.core.internal.wc2.ng.SvnNgRepositoryAccess.createRepositoryFor(SvnNgRepositoryAccess.java:43)
at org.tmatesoft.svn.core.internal.wc2.ng.SvnNgAbstractUpdate.checkout(SvnNgAbstractUpdate.java:831)
at org.tmatesoft.svn.core.internal.wc2.ng.SvnNgCheckout.run(SvnNgCheckout.java:26)
at org.tmatesoft.svn.core.internal.wc2.ng.SvnNgCheckout.run(SvnNgCheckout.java:11)
at org.tmatesoft.svn.core.internal.wc2.ng.SvnNgOperationRunner.run(SvnNgOperationRunner.java:20)
at org.tmatesoft.svn.core.internal.wc2.SvnOperationRunner.run(SvnOperationRunner.java:21)
at org.tmatesoft.svn.core.wc2.SvnOperationFactory.run(SvnOperationFactory.java:1239)
at org.tmatesoft.svn.core.wc2.SvnOperation.run(SvnOperation.java:294)
at hudson.scm.subversion.CheckoutUpdater$SubversionUpdateTask.perform(CheckoutUpdater.java:130)
at hudson.scm.subversion.WorkspaceUpdater$UpdateTask.delegateTo(WorkspaceUpdater.java:168)
at hudson.scm.subversion.WorkspaceUpdater$UpdateTask.delegateTo(WorkspaceUpdater.java:176)
at hudson.scm.subversion.UpdateUpdater$TaskImpl.perform(UpdateUpdater.java:132)
at hudson.scm.subversion.WorkspaceUpdater$UpdateTask.delegateTo(WorkspaceUpdater.java:168)
at hudson.scm.SubversionSCM$CheckOutTask.perform(SubversionSCM.java:1064)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:1040)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:1013)
at hudson.FilePath$FileCallableWrapper.call(FilePath.java:3317)
at hudson.remoting.UserRequest.perform(UserRequest.java:211)
at hudson.remoting.UserRequest.perform(UserRequest.java:54)
at hudson.remoting.Request$2.run(Request.java:376)
at hudson.remoting.InterceptingExecutorService.lambda$wrap$0(InterceptingExecutorService.java:78)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:476)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:218)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:394)
at java.net.Socket.connect(Socket.java:606)
at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:287)
at 

Build failed in Jenkins: ManifoldCF » ManifoldCF-mvn-1x #32

2021-11-24 Thread Apache Jenkins Server
See 


Changes:


--
Started by an SCM change
Running as SYSTEM
[EnvInject] - Loading node environment variables.
Building remotely on H29 (ubuntu) in workspace 

Checking out a fresh workspace because there's no workspace at 

Cleaning local Directory .
Checking out https://svn.apache.org/repos/asf/manifoldcf/branches/dev_1x at revision '2021-11-25T01:11:10.890 +'
ERROR: Failed to check out https://svn.apache.org/repos/asf/manifoldcf/branches/dev_1x
org.tmatesoft.svn.core.SVNException: svn: E175002: timed out waiting for server
svn: E175002: OPTIONS request failed on '/repos/asf/manifoldcf/branches/dev_1x'
at org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:112)
at org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:96)
at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:765)
at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:352)
at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:340)
at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.performHttpRequest(DAVConnection.java:910)
at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.exchangeCapabilities(DAVConnection.java:702)
at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.open(DAVConnection.java:113)
at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.openConnection(DAVRepository.java:1047)
at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.getLatestRevision(DAVRepository.java:169)
at org.tmatesoft.svn.core.internal.wc2.ng.SvnNgRepositoryAccess.getRevisionNumber(SvnNgRepositoryAccess.java:119)
at org.tmatesoft.svn.core.internal.wc2.SvnRepositoryAccess.getLocations(SvnRepositoryAccess.java:180)
at org.tmatesoft.svn.core.internal.wc2.ng.SvnNgRepositoryAccess.createRepositoryFor(SvnNgRepositoryAccess.java:43)
at org.tmatesoft.svn.core.internal.wc2.ng.SvnNgAbstractUpdate.checkout(SvnNgAbstractUpdate.java:831)
at org.tmatesoft.svn.core.internal.wc2.ng.SvnNgCheckout.run(SvnNgCheckout.java:26)
at org.tmatesoft.svn.core.internal.wc2.ng.SvnNgCheckout.run(SvnNgCheckout.java:11)
at org.tmatesoft.svn.core.internal.wc2.ng.SvnNgOperationRunner.run(SvnNgOperationRunner.java:20)
at org.tmatesoft.svn.core.internal.wc2.SvnOperationRunner.run(SvnOperationRunner.java:21)
at org.tmatesoft.svn.core.wc2.SvnOperationFactory.run(SvnOperationFactory.java:1239)
at org.tmatesoft.svn.core.wc2.SvnOperation.run(SvnOperation.java:294)
at hudson.scm.subversion.CheckoutUpdater$SubversionUpdateTask.perform(CheckoutUpdater.java:130)
at hudson.scm.subversion.WorkspaceUpdater$UpdateTask.delegateTo(WorkspaceUpdater.java:168)
at hudson.scm.subversion.WorkspaceUpdater$UpdateTask.delegateTo(WorkspaceUpdater.java:176)
at hudson.scm.subversion.UpdateUpdater$TaskImpl.perform(UpdateUpdater.java:132)
at hudson.scm.subversion.WorkspaceUpdater$UpdateTask.delegateTo(WorkspaceUpdater.java:168)
at hudson.scm.SubversionSCM$CheckOutTask.perform(SubversionSCM.java:1064)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:1040)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:1013)
at hudson.FilePath$FileCallableWrapper.call(FilePath.java:3317)
at hudson.remoting.UserRequest.perform(UserRequest.java:211)
at hudson.remoting.UserRequest.perform(UserRequest.java:54)
at hudson.remoting.Request$2.run(Request.java:376)
at hudson.remoting.InterceptingExecutorService.lambda$wrap$0(InterceptingExecutorService.java:78)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:476)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:218)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:394)
at java.net.Socket.connect(Socket.java:606)
at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:287)
at 

[jira] [Updated] (CONNECTORS-1681) TikaServiceRmeta: recordActivity can cause Database exception

2021-11-24 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera updated CONNECTORS-1681:

Description: 
Some files containing non-UTF-8 characters can cause Tika to throw an
exception describing the parsing problem.
Because the TikaServiceRmeta connector creates an activity record for every
Tika exception, containing the exception's description (which, in those cases,
includes the non-UTF-8 character), this causes an SQL exception when MCF tries
to insert the activity record into the database:
{code:java}
ERROR 2021-11-24T13:37:00,121 (Worker thread '41') - MCF|MCF-agent|apache.manifoldcf.crawlerthreads|Worker thread aborting and restarting due to database connection reset: Database exception: SQLException doing query (22021): ERROR: invalid byte sequence for encoding "UTF8": 0x00
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: SQLException doing query (22021): ERROR: invalid byte sequence for encoding "UTF8": 0x00 {code}
To avoid this, we need to remove those problematic characters from the
exception description before recording the activity.

 

  was:
Some files containing non-ASCII characters can cause Tika to throw an
exception describing the parsing problem.
Because the TikaServiceRmeta connector creates an activity record for every
Tika exception, containing the exception's description (which, in those cases,
includes the non-ASCII character), this causes an SQL exception when MCF tries
to insert the activity record into Postgres:
{code:java}
ERROR 2021-11-24T13:37:00,121 (Worker thread '41') - MCF|MCF-agent|apache.manifoldcf.crawlerthreads|Worker thread aborting and restarting due to database connection reset: Database exception: SQLException doing query (22021): ERROR: invalid byte sequence for encoding "UTF8": 0x00
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: SQLException doing query (22021): ERROR: invalid byte sequence for encoding "UTF8": 0x00 {code}
To avoid this, we need to remove any non-ASCII characters from the exception
description before recording the activity.

 


> TikaServiceRmeta: recordActivity can cause Database exception
> -
>
> Key: CONNECTORS-1681
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1681
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Tika service connector
>Affects Versions: ManifoldCF 2.20
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF 2.21
>
>
> Some files containing non-UTF-8 characters can cause Tika to throw an
> exception describing the parsing problem.
> Because the TikaServiceRmeta connector creates an activity record for every
> Tika exception, containing the exception's description (which, in those
> cases, includes the non-UTF-8 character), this causes an SQL exception when
> MCF tries to insert the activity record into the database:
> {code:java}
> ERROR 2021-11-24T13:37:00,121 (Worker thread '41') - MCF|MCF-agent|apache.manifoldcf.crawlerthreads|Worker thread aborting and restarting due to database connection reset: Database exception: SQLException doing query (22021): ERROR: invalid byte sequence for encoding "UTF8": 0x00
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: SQLException doing query (22021): ERROR: invalid byte sequence for encoding "UTF8": 0x00 {code}
> To avoid this, we need to remove those problematic characters from the
> exception description before recording the activity.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (CONNECTORS-1681) TikaServiceRmeta: recordActivity can cause Database exception

2021-11-24 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17448766#comment-17448766
 ] 

Julien Massiera commented on CONNECTORS-1681:
-

Indeed [~kwri...@metacarta.com], it is the description of my issue that is
wrong. I decided to remove non-ASCII characters, not just non-UTF-8 ones,
because the error description that the TikaServiceRmeta connector writes as an
activity record is only there to be readable and to give a general idea of
what went wrong during the Tika processing phase. So I wanted to be sure that
the activity record contains only "standard" characters; even if we lose some
of them, the exact exception is still available in the log file. Are you OK
with that?
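Julien's approach of keeping only "standard" characters could look roughly like the sketch below. It assumes "standard" means printable ASCII (0x20 to 0x7E) plus tab, CR and LF; the thread does not state the exact character set the committed fix keeps, and `AsciiOnly` / `toPrintableAscii` are hypothetical names.

```java
public final class AsciiOnly {

  private AsciiOnly() {}

  /** Keep printable ASCII (0x20-0x7E), tab, CR and LF; drop every other character. */
  public static String toPrintableAscii(String text) {
    if (text == null) {
      return null;
    }
    StringBuilder sb = new StringBuilder(text.length());
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      // NUL and all non-ASCII characters fall outside this range and are dropped
      if ((c >= 0x20 && c <= 0x7E) || c == '\t' || c == '\n' || c == '\r') {
        sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(toPrintableAscii("Caf\u00e9 \0parse error")); // prints "Caf parse error"
  }
}
```

The example output shows the trade-off Julien accepts: the accented character is lost along with the NUL, which is tolerable only because the full exception remains in the log file.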



[jira] [Commented] (CONNECTORS-1681) TikaServiceRmeta: recordActivity can cause Database exception

2021-11-24 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17448749#comment-17448749
 ] 

Karl Wright commented on CONNECTORS-1681:
-

[~julienFL], the database record just needs to not include any non-UTF8 
strings.  You do not need to limit it to just ASCII.  If you read the 
description, you will note that the error message says as much: it says you 
don't have a valid UTF-8 sequence, and since the input is a Java string, it 
must contain codepoints that cannot be represented as UTF-8.
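Karl's suggestion is to keep non-ASCII text and drop only what cannot be stored. In a Java `String`, the only `char` values with no UTF-8 encoding are unpaired surrogates. The 0x00 in the log is itself a valid one-byte UTF-8 sequence, but PostgreSQL does not allow NUL in text values, so a sketch along these lines drops both and keeps everything else. `Utf8Storable` and `sanitize` are hypothetical names, not ManifoldCF API.

```java
public final class Utf8Storable {

  private Utf8Storable() {}

  /**
   * Keep every character that has a UTF-8 encoding and that PostgreSQL can store:
   * drop unpaired surrogates (no UTF-8 form) and NUL (rejected in PostgreSQL text values).
   */
  public static String sanitize(String text) {
    if (text == null) {
      return null;
    }
    StringBuilder sb = new StringBuilder(text.length());
    int i = 0;
    while (i < text.length()) {
      int cp = text.codePointAt(i);
      int width = Character.charCount(cp);
      // codePointAt returns a lone surrogate unchanged when it is not part of a valid pair
      boolean loneSurrogate = width == 1 && Character.isSurrogate((char) cp);
      if (cp != 0 && !loneSurrogate) {
        sb.appendCodePoint(cp);
      }
      i += width;
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String input = "R\u00e9sum\u00e9 \uD83D\uDE00 bad:\uD800 nul:\0end";
    System.out.println(sanitize(input));
  }
}
```

Unlike the ASCII-only filter, accented letters and emoji (valid surrogate pairs) survive; only the lone `\uD800` and the NUL are removed.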




[jira] [Resolved] (CONNECTORS-1681) TikaServiceRmeta: recordActivity can cause Database exception

2021-11-24 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera resolved CONNECTORS-1681.
-
Fix Version/s: ManifoldCF 2.21
   Resolution: Fixed

r1895299



[jira] [Created] (CONNECTORS-1681) TikaServiceRmeta: recordActivity can cause Database exception

2021-11-24 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1681:
---

 Summary: TikaServiceRmeta: recordActivity can cause Database exception
 Key: CONNECTORS-1681
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1681
 Project: ManifoldCF
  Issue Type: Bug
  Components: Tika service connector
Affects Versions: ManifoldCF 2.20
Reporter: Julien Massiera
Assignee: Julien Massiera


Some files containing non-ASCII characters can cause Tika to throw an
exception describing the parsing problem.
Because the TikaServiceRmeta connector creates an activity record for every
Tika exception, containing the exception's description (which, in those cases,
includes the non-ASCII character), this causes an SQL exception when MCF tries
to insert the activity record into Postgres:
{code:java}
ERROR 2021-11-24T13:37:00,121 (Worker thread '41') - MCF|MCF-agent|apache.manifoldcf.crawlerthreads|Worker thread aborting and restarting due to database connection reset: Database exception: SQLException doing query (22021): ERROR: invalid byte sequence for encoding "UTF8": 0x00
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: SQLException doing query (22021): ERROR: invalid byte sequence for encoding "UTF8": 0x00 {code}
To avoid this, we need to remove any non-ASCII characters from the exception
description before recording the activity.
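The logged error points at one specific byte, 0x00, i.e. a NUL character (U+0000) embedded in the exception text, which PostgreSQL refuses to store in a text value. A minimal sketch of the narrowest possible fix, stripping only NUL before the text is recorded. `ActivityTextSanitizer` and `stripNul` are hypothetical names, and this is not necessarily what the committed fix does.

```java
public final class ActivityTextSanitizer {

  private ActivityTextSanitizer() {}

  /**
   * Remove every NUL character (U+0000) from the given text.
   * Returns null unchanged so optional messages can be passed straight through.
   */
  public static String stripNul(String text) {
    if (text == null) {
      return null;
    }
    // "\0" is the octal escape for the NUL character
    return text.replace("\0", "");
  }

  public static void main(String[] args) {
    System.out.println(stripNul("abc\0def")); // prints abcdef
  }
}
```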

 


