[jira] [Commented] (CONNECTORS-1740) Solr 9 output connector

2023-10-09 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773278#comment-17773278
 ] 

Julien Massiera commented on CONNECTORS-1740:
-

Hello,
Thank you for your email. Mr Massiera left his position on 14 April 2023, and 
this email address will soon be deactivated. For any questions about France Labs 
or ongoing projects, please contact cedric.ulmer att francelabs.com


> Solr 9 output connector
> ---
>
> Key: CONNECTORS-1740
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1740
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Affects Versions: ManifoldCF 2.23
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Attachments: CONNECTORS-1740.patch
>
>
> The current Solr output connector is not compatible with Solr 9.x
> We need to update the connector with SolrJ 9, check whether the custom code 
> (multipart POST requests, basic/preemptive auth) is still required and, if it 
> is, port it!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CONNECTORS-1740) Solr 9 output connector

2023-10-09 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773268#comment-17773268
 ] 

Julien Massiera commented on CONNECTORS-1740:
-

Hello,
Thank you for your email. Mr Massiera left his position on 14 April 2023, and 
this email address will soon be deactivated. For any questions about France Labs 
or ongoing projects, please contact cedric.ulmer att francelabs.com


> Solr 9 output connector
> ---
>
> Key: CONNECTORS-1740
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1740
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Affects Versions: ManifoldCF 2.23
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Attachments: CONNECTORS-1740.patch
>
>
> The current Solr output connector is not compatible with Solr 9.x
> We need to update the connector with SolrJ 9, check whether the custom code 
> (multipart POST requests, basic/preemptive auth) is still required and, if it 
> is, port it!





[jira] (CONNECTORS-1740) Solr 9 output connector

2023-07-19 Thread Julien Massiera (Jira)


[ https://issues.apache.org/jira/browse/CONNECTORS-1740 ]


Julien Massiera deleted comment on CONNECTORS-1740:
-

was (Author: julienfl):
Hello,
Thank you for your email. Mr Massiera left his position on 14 April 2023, and 
this email address will soon be deactivated. For any questions about France Labs 
or ongoing projects, please contact cedric.ulmer att francelabs.com


> Solr 9 output connector
> ---
>
> Key: CONNECTORS-1740
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1740
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Affects Versions: ManifoldCF 2.23
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Attachments: CONNECTORS-1740.patch
>
>
> The current Solr output connector is not compatible with Solr 9.x
> We need to update the connector with SolrJ 9, check whether the custom code 
> (multipart POST requests, basic/preemptive auth) is still required and, if it 
> is, port it!





[jira] (CONNECTORS-1740) Solr 9 output connector

2023-07-19 Thread Julien Massiera (Jira)


[ https://issues.apache.org/jira/browse/CONNECTORS-1740 ]


Julien Massiera deleted comment on CONNECTORS-1740:
-

was (Author: julienfl):
Hello,
Thank you for your email. Mr Massiera left his position on 14 April 2023, and 
this email address will soon be deactivated. For any questions about France Labs 
or ongoing projects, please contact cedric.ulmer att francelabs.com


> Solr 9 output connector
> ---
>
> Key: CONNECTORS-1740
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1740
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Affects Versions: ManifoldCF 2.23
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Attachments: CONNECTORS-1740.patch
>
>
> The current Solr output connector is not compatible with Solr 9.x
> We need to update the connector with SolrJ 9, check whether the custom code 
> (multipart POST requests, basic/preemptive auth) is still required and, if it 
> is, port it!





[jira] (CONNECTORS-1740) Solr 9 output connector

2023-07-19 Thread Julien Massiera (Jira)


[ https://issues.apache.org/jira/browse/CONNECTORS-1740 ]


Julien Massiera deleted comment on CONNECTORS-1740:
-

was (Author: julienfl):
Hello,
Thank you for your email. Mr Massiera left his position on 14 April 2023, and 
this email address will soon be deactivated. For any questions about France Labs 
or ongoing projects, please contact cedric.ulmer att francelabs.com


> Solr 9 output connector
> ---
>
> Key: CONNECTORS-1740
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1740
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Affects Versions: ManifoldCF 2.23
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Attachments: CONNECTORS-1740.patch
>
>
> The current Solr output connector is not compatible with Solr 9.x
> We need to update the connector with SolrJ 9, check whether the custom code 
> (multipart POST requests, basic/preemptive auth) is still required and, if it 
> is, port it!





[jira] [Commented] (CONNECTORS-1740) Solr 9 output connector

2023-07-19 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17744458#comment-17744458
 ] 

Julien Massiera commented on CONNECTORS-1740:
-

Hello,
Thank you for your email. Mr Massiera left his position on 14 April 2023, and 
this email address will soon be deactivated. For any questions about France Labs 
or ongoing projects, please contact cedric.ulmer att francelabs.com


> Solr 9 output connector
> ---
>
> Key: CONNECTORS-1740
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1740
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Affects Versions: ManifoldCF 2.23
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Attachments: CONNECTORS-1740.patch
>
>
> The current Solr output connector is not compatible with Solr 9.x
> We need to update the connector with SolrJ 9, check whether the custom code 
> (multipart POST requests, basic/preemptive auth) is still required and, if it 
> is, port it!





[jira] [Commented] (CONNECTORS-1740) Solr 9 output connector

2023-06-04 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17729127#comment-17729127
 ] 

Julien Massiera commented on CONNECTORS-1740:
-

Hello,
Thank you for your email. Mr Massiera left his position on 14 April 2023, and 
this email address will soon be deactivated. For any questions about France Labs 
or ongoing projects, please contact cedric.ulmer att francelabs.com


> Solr 9 output connector
> ---
>
> Key: CONNECTORS-1740
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1740
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Affects Versions: ManifoldCF 2.23
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Attachments: CONNECTORS-1740.patch
>
>
> The current Solr output connector is not compatible with Solr 9.x
> We need to update the connector with SolrJ 9, check whether the custom code 
> (multipart POST requests, basic/preemptive auth) is still required and, if it 
> is, port it!





[jira] [Commented] (CONNECTORS-1740) Solr 9 output connector

2023-06-04 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17729125#comment-17729125
 ] 

Julien Massiera commented on CONNECTORS-1740:
-

Hello,
Thank you for your email. Mr Massiera left his position on 14 April 2023, and 
this email address will soon be deactivated. For any questions about France Labs 
or ongoing projects, please contact cedric.ulmer att francelabs.com


> Solr 9 output connector
> ---
>
> Key: CONNECTORS-1740
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1740
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Affects Versions: ManifoldCF 2.23
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Attachments: CONNECTORS-1740.patch
>
>
> The current Solr output connector is not compatible with Solr 9.x
> We need to update the connector with SolrJ 9, check whether the custom code 
> (multipart POST requests, basic/preemptive auth) is still required and, if it 
> is, port it!





[jira] [Commented] (CONNECTORS-1740) Solr 9 output connector

2023-04-14 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712355#comment-17712355
 ] 

Julien Massiera commented on CONNECTORS-1740:
-

No, I did not try with an older ZooKeeper version; I used the recommended one 
specified for Solr 9. We can indeed test this!

> Solr 9 output connector
> ---
>
> Key: CONNECTORS-1740
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1740
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Affects Versions: ManifoldCF 2.23
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
>
> The current Solr output connector is not compatible with Solr 9.x
> We need to update the connector with SolrJ 9, check whether the custom code 
> (multipart POST requests, basic/preemptive auth) is still required and, if it 
> is, port it!





[jira] [Commented] (CONNECTORS-1740) Solr 9 output connector

2023-04-12 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17711425#comment-17711425
 ] 

Julien Massiera commented on CONNECTORS-1740:
-

As of r1909097 on the branch, the Solr output connector has been fixed and 
multipart is now applied to all POST requests.

The unit tests of the framework core (ZooKeeper tests) and of the Solr connector 
are still broken. I would really appreciate help fixing the tests!

> Solr 9 output connector
> ---
>
> Key: CONNECTORS-1740
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1740
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Affects Versions: ManifoldCF 2.23
>Reporter: Julien Massiera
>Priority: Major
>
> The current Solr output connector is not compatible with Solr 9.x
> We need to update the connector with SolrJ 9, check whether the custom code 
> (multipart POST requests, basic/preemptive auth) is still required and, if it 
> is, port it!





[jira] [Resolved] (CONNECTORS-1742) Handle CSV in JDBC connector

2022-12-08 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera resolved CONNECTORS-1742.
-
Fix Version/s: ManifoldCF next
   Resolution: Fixed

r1905864

> Handle CSV in JDBC connector
> 
>
> Key: CONNECTORS-1742
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1742
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: JDBC connector
>Affects Versions: ManifoldCF 2.23
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF next
>
>
> A JDBC CSV driver exists [https://github.com/jprante/jdbc-driver-csv] and could 
> be useful for crawling large CSV files. We should make it possible to use it in 
> the JDBC connector.





[jira] [Created] (CONNECTORS-1742) Handle CSV in JDBC connector

2022-12-08 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1742:
---

 Summary: Handle CSV in JDBC connector
 Key: CONNECTORS-1742
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1742
 Project: ManifoldCF
  Issue Type: Improvement
  Components: JDBC connector
Affects Versions: ManifoldCF 2.23
Reporter: Julien Massiera
Assignee: Julien Massiera


A JDBC CSV driver exists [https://github.com/jprante/jdbc-driver-csv] and could 
be useful for crawling large CSV files. We should make it possible to use it in 
the JDBC connector.





[jira] [Commented] (CONNECTORS-1740) Solr 9 output connector

2022-12-06 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643980#comment-17643980
 ] 

Julien Massiera commented on CONNECTORS-1740:
-

I created a CONNECTORS-1740 branch and updated SolrJ to version 9 and ZooKeeper 
to 3.8.0.

Some problems have come up: 
 * The framework core tests are broken because of ZooKeeper.
 * The Solr connector tests are broken.
 * The updated Solr output connector works with Solr 9 and older versions, but I 
did not port the basic/preemptive auth or the multipart POST request custom 
code. After some tests, it appears that they are required, because some 
documents with a lot of metadata trigger errors during the ingest phase.

Unfortunately, I currently don't have more time to spend on these issues, and I 
would appreciate any help solving them!
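A minimal sketch of what a "multipart POST request" means here, assuming the JDK's java.net.http client as a stand-in for the connector's actual HTTP layer: each metadata field travels as its own form-data part in the request body rather than as a URL-encoded parameter, which presumably avoids the request-size problems that metadata-heavy documents run into. The class name, the "file" part name, the "literal.id" field and the Solr URL are hypothetical; this is not the code from the branch.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: encodes metadata fields plus document content as multipart/form-data. */
final class MultipartSketch {
  private static final String CRLF = "\r\n";

  /** One form-data part per metadata field, then one file part carrying the raw content. */
  static byte[] buildBody(String boundary, Map<String, String> fields,
                          String fileName, String contentType, byte[] content) {
    StringBuilder head = new StringBuilder();
    for (Map.Entry<String, String> field : fields.entrySet()) {
      head.append("--").append(boundary).append(CRLF)
          .append("Content-Disposition: form-data; name=\"").append(field.getKey()).append('"').append(CRLF)
          .append(CRLF)
          .append(field.getValue()).append(CRLF);
    }
    head.append("--").append(boundary).append(CRLF)
        .append("Content-Disposition: form-data; name=\"file\"; filename=\"")
        .append(fileName).append('"').append(CRLF)
        .append("Content-Type: ").append(contentType).append(CRLF)
        .append(CRLF);

    byte[] headBytes = head.toString().getBytes(StandardCharsets.UTF_8);
    byte[] tailBytes = (CRLF + "--" + boundary + "--" + CRLF).getBytes(StandardCharsets.UTF_8);

    // Concatenate the header parts, the binary content and the closing boundary without re-encoding the content.
    byte[] body = new byte[headBytes.length + content.length + tailBytes.length];
    System.arraycopy(headBytes, 0, body, 0, headBytes.length);
    System.arraycopy(content, 0, body, headBytes.length, content.length);
    System.arraycopy(tailBytes, 0, body, headBytes.length + content.length, tailBytes.length);
    return body;
  }

  /** Wraps the body in a POST request with the matching Content-Type; nothing is sent here. */
  static HttpRequest buildRequest(URI uri, String boundary, byte[] body) {
    return HttpRequest.newBuilder(uri)
        .header("Content-Type", "multipart/form-data; boundary=" + boundary)
        .POST(HttpRequest.BodyPublishers.ofByteArray(body))
        .build();
  }

  /** Example usage with a hypothetical core URL and field name. */
  static HttpRequest example() {
    Map<String, String> fields = new LinkedHashMap<>();
    fields.put("literal.id", "doc-1");
    byte[] content = "hello".getBytes(StandardCharsets.UTF_8);
    String boundary = "mcf-boundary-1";
    byte[] body = buildBody(boundary, fields, "doc-1.txt", "text/plain", content);
    return buildRequest(URI.create("http://localhost:8983/solr/mycore/update/extract"), boundary, body);
  }
}
```

The request could then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).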

> Solr 9 output connector
> ---
>
> Key: CONNECTORS-1740
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1740
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Affects Versions: ManifoldCF 2.23
>Reporter: Julien Massiera
>Priority: Major
>
> The current Solr output connector is not compatible with Solr 9.x
> We need to update the connector with SolrJ 9, check whether the custom code 
> (multipart POST requests, basic/preemptive auth) is still required and, if it 
> is, port it!





[jira] [Created] (CONNECTORS-1740) Solr 9 output connector

2022-12-02 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1740:
---

 Summary: Solr 9 output connector
 Key: CONNECTORS-1740
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1740
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Lucene/SOLR connector
Affects Versions: ManifoldCF 2.23
Reporter: Julien Massiera


The current Solr output connector is not compatible with Solr 9.x

We need to update the connector with SolrJ 9, check whether the custom code 
(multipart POST requests, basic/preemptive auth) is still required and, if it 
is, port it!
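Preemptive basic authentication means sending the Authorization header with the very first request instead of waiting for the server's 401 challenge. A JDK-only sketch, assuming java.net.http as a stand-in for whatever HTTP client SolrJ 9 uses; SolrJ may already provide built-in basic-auth support, which is worth checking before porting any custom code. The class name, credentials and URL are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper for preemptive HTTP basic authentication (RFC 7617). */
final class PreemptiveBasicAuth {
  /** "Basic " followed by base64(user ":" password), with the credentials encoded as UTF-8. */
  static String headerValue(String user, String password) {
    byte[] credentials = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
    return "Basic " + Base64.getEncoder().encodeToString(credentials);
  }

  /** Sets the Authorization header up front, so no 401 round trip is needed. */
  static HttpRequest withCredentials(HttpRequest.Builder builder, String user, String password) {
    return builder.header("Authorization", headerValue(user, password)).build();
  }

  /** Example: a GET against a hypothetical Solr URL carrying the credentials on the first attempt. */
  static HttpRequest example() {
    HttpRequest.Builder builder =
        HttpRequest.newBuilder(URI.create("http://localhost:8983/solr/admin/info/system")).GET();
    return withCredentials(builder, "solr", "secret");
  }
}
```

The header is set by hand because an Authenticator attached to java.net.http's HttpClient only answers a challenge, which is the non-preemptive behaviour.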





[jira] [Updated] (CONNECTORS-1736) LDAP Mapper: attribute condition

2022-10-05 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera updated CONNECTORS-1736:

Component/s: LDAP Mapper
 (was: LDAP authority)

> LDAP Mapper: attribute condition
> 
>
> Key: CONNECTORS-1736
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1736
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: LDAP Mapper
>Affects Versions: ManifoldCF 2.23
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF 2.24
>
>
> Sometimes, the user mapping may depend on a specific attribute value. It 
> would be good to provide a way to configure a mapping condition, based on an 
> LDAP attribute matching a regexp, that will determine the final mapping to 
> perform. 





[jira] [Resolved] (CONNECTORS-1736) LDAP Mapper: attribute condition

2022-10-03 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera resolved CONNECTORS-1736.
-
Fix Version/s: ManifoldCF 2.24
   Resolution: Fixed

r1904380

> LDAP Mapper: attribute condition
> 
>
> Key: CONNECTORS-1736
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1736
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: LDAP authority
>Affects Versions: ManifoldCF 2.23
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF 2.24
>
>
> Sometimes, the user mapping may depend on a specific attribute value. It 
> would be good to provide a way to configure a mapping condition, based on an 
> LDAP attribute matching a regexp, that will determine the final mapping to 
> perform. 





[jira] [Created] (CONNECTORS-1736) LDAP Mapper: attribute condition

2022-10-03 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1736:
---

 Summary: LDAP Mapper: attribute condition
 Key: CONNECTORS-1736
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1736
 Project: ManifoldCF
  Issue Type: Improvement
  Components: LDAP authority
Affects Versions: ManifoldCF 2.23
Reporter: Julien Massiera
Assignee: Julien Massiera


Sometimes, the user mapping may depend on a specific attribute value. It would 
be good to provide a way to configure a mapping condition, based on an LDAP 
attribute matching a regexp, that will determine the final mapping to perform. 
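One way such a condition could work, as a hypothetical sketch (the class, its fields and the example attribute names are illustrative, not the LDAP Mapper's real configuration): if any value of a chosen LDAP attribute fully matches the regexp, the user is mapped using one attribute, otherwise using another.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical attribute-based mapping condition; not the LDAP Mapper's real configuration model. */
final class ConditionalMapping {
  private final String conditionAttribute;
  private final Pattern condition;
  private final String attributeIfMatch;
  private final String attributeOtherwise;

  ConditionalMapping(String conditionAttribute, String regex,
                     String attributeIfMatch, String attributeOtherwise) {
    this.conditionAttribute = conditionAttribute;
    this.condition = Pattern.compile(regex);
    this.attributeIfMatch = attributeIfMatch;
    this.attributeOtherwise = attributeOtherwise;
  }

  /**
   * Chooses the target attribute depending on whether any value of the condition attribute
   * fully matches the regexp, then returns that attribute's first value (null if absent).
   * Attribute names are compared case-sensitively here, unlike in LDAP itself.
   */
  String map(Map<String, List<String>> ldapAttributes) {
    boolean matched = ldapAttributes.getOrDefault(conditionAttribute, List.of()).stream()
        .anyMatch(value -> condition.matcher(value).matches());
    String target = matched ? attributeIfMatch : attributeOtherwise;
    List<String> values = ldapAttributes.getOrDefault(target, List.of());
    return values.isEmpty() ? null : values.get(0);
  }
}
```

For example, users whose employeeType matches ext.* could be mapped to their mail address and everyone else to their sAMAccountName.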





[jira] [Updated] (CONNECTORS-1734) Add space and user details in error logs of Confluence authority

2022-09-26 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera updated CONNECTORS-1734:

Fix Version/s: ManifoldCF 2.24
   (was: ManifoldCF next)

> Add space and user details in error logs of Confluence authority
> 
>
> Key: CONNECTORS-1734
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1734
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Confluence connector
>Affects Versions: ManifoldCF 2.22
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF 2.24
>
>
> Currently, when an error occurs in the Confluence authority connector while 
> retrieving user permissions, we generate an error log that does not specify 
> the user and the space concerned. It would be better to include them in the 
> log for debugging purposes.





[jira] [Updated] (CONNECTORS-1733) TikaServiceRmeta does not properly handle unknown tika exceptions

2022-09-26 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera updated CONNECTORS-1733:

Fix Version/s: ManifoldCF 2.24
   (was: ManifoldCF next)

> TikaServiceRmeta does not properly handle unknown tika exceptions
> -
>
> Key: CONNECTORS-1733
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1733
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Tika service connector
>Affects Versions: ManifoldCF 2.22
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF 2.24
>
>
> With the introduction of new exception formats in Tika 2.0, the 
> TikaServiceRmeta connector does not correctly handle some of them, resulting 
> in metadata and content extraction issues for some files





[jira] [Resolved] (CONNECTORS-1735) TikaServiceRmeta does not properly handle embedded resources

2022-09-26 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera resolved CONNECTORS-1735.
-
Resolution: Fixed

r1904280

> TikaServiceRmeta does not properly handle embedded resources
> 
>
> Key: CONNECTORS-1735
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1735
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Tika service connector
>Affects Versions: ManifoldCF 2.23
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF 2.24
>
>
> Currently, when a file processed by Tika contains embedded resources, the 
> TikaServiceRmeta connector simply ignores them. 
> The connector should at least add the extracted content of embedded resources 
> to the main document content if the "Extract archives content" option is 
> enabled.





[jira] [Created] (CONNECTORS-1735) TikaServiceRmeta does not properly handle embedded resources

2022-09-26 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1735:
---

 Summary: TikaServiceRmeta does not properly handle embedded 
resources
 Key: CONNECTORS-1735
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1735
 Project: ManifoldCF
  Issue Type: Bug
  Components: Tika service connector
Affects Versions: ManifoldCF 2.23
Reporter: Julien Massiera
Assignee: Julien Massiera
 Fix For: ManifoldCF 2.24


Currently, when a file processed by Tika contains embedded resources, the 
TikaServiceRmeta connector simply ignores them. 

The connector should at least add the extracted content of embedded resources 
to the main document content if the "Extract archives content" option is 
enabled.
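A sketch of the proposed behaviour, assuming the /rmeta JSON response has already been parsed into a list of metadata maps, with the container document first and one entry per embedded resource, and assuming the extracted text sits under the X-TIKA:content key. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: merges embedded-resource text from a parsed /rmeta response into the main content. */
final class EmbeddedContentMerger {
  // Key under which Tika stores extracted text in each /rmeta metadata object (verify against the Tika version used).
  static final String CONTENT_KEY = "X-TIKA:content";

  /**
   * @param rmeta parsed /rmeta array: element 0 is the container document, the rest are embedded resources
   * @param extractArchivesContent mirrors the "Extract archives content" option
   */
  static String mergedContent(List<Map<String, Object>> rmeta, boolean extractArchivesContent) {
    if (rmeta.isEmpty()) {
      return "";
    }
    StringBuilder merged = new StringBuilder(textOf(rmeta.get(0)));
    if (extractArchivesContent) {
      // Append each non-blank embedded text on its own line, keeping the container's text first.
      for (Map<String, Object> embedded : rmeta.subList(1, rmeta.size())) {
        String text = textOf(embedded);
        if (!text.isBlank()) {
          merged.append('\n').append(text);
        }
      }
    }
    return merged.toString();
  }

  private static String textOf(Map<String, Object> metadata) {
    Object value = metadata.get(CONTENT_KEY);
    return value == null ? "" : value.toString();
  }
}
```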





[jira] [Resolved] (CONNECTORS-1734) Add space and user details in error logs of Confluence authority

2022-09-26 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera resolved CONNECTORS-1734.
-
Fix Version/s: ManifoldCF next
   Resolution: Fixed

r1904267

> Add space and user details in error logs of Confluence authority
> 
>
> Key: CONNECTORS-1734
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1734
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Confluence connector
>Affects Versions: ManifoldCF 2.22
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF next
>
>
> Currently, when an error occurs in the Confluence authority connector while 
> retrieving user permissions, we generate an error log that does not specify 
> the user and the space concerned. It would be better to include them in the 
> log for debugging purposes.





[jira] [Created] (CONNECTORS-1734) Add space and user details in error logs of Confluence authority

2022-09-26 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1734:
---

 Summary: Add space and user details in error logs of Confluence 
authority
 Key: CONNECTORS-1734
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1734
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Confluence connector
Affects Versions: ManifoldCF 2.22
Reporter: Julien Massiera
Assignee: Julien Massiera


Currently, when an error occurs in the Confluence authority connector while 
retrieving user permissions, we generate an error log that does not specify the 
user and the space concerned. It would be better to include them in the log for 
debugging purposes.
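A trivial sketch of the intended log line, with java.util.logging standing in for the connector's real logger; the class, the method names and the message wording are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper; java.util.logging stands in for the connector's real logger. */
final class PermissionErrorLogging {
  private static final Logger LOG = Logger.getLogger(PermissionErrorLogging.class.getName());

  /** Message naming both the user and the space whose permissions could not be retrieved. */
  static String message(String user, String spaceKey, String reason) {
    return String.format("Error retrieving permissions for user '%s' on space '%s': %s",
        user, spaceKey, reason);
  }

  /** Logs the error together with the stack trace of the underlying exception. */
  static void logPermissionError(String user, String spaceKey, Exception cause) {
    LOG.log(Level.SEVERE, message(user, spaceKey, cause.getMessage()), cause);
  }
}
```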





[jira] [Resolved] (CONNECTORS-1733) TikaServiceRmeta does not properly handle unknown tika exceptions

2022-09-26 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera resolved CONNECTORS-1733.
-
Fix Version/s: ManifoldCF next
   Resolution: Fixed

r1904264

> TikaServiceRmeta does not properly handle unknown tika exceptions
> -
>
> Key: CONNECTORS-1733
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1733
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Tika service connector
>Affects Versions: ManifoldCF 2.22
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF next
>
>
> With the introduction of new exception formats in Tika 2.0, the 
> TikaServiceRmeta connector does not correctly handle some of them, resulting 
> in metadata and content extraction issues for some files





[jira] [Created] (CONNECTORS-1733) TikaServiceRmeta does not properly handle unknown tika exceptions

2022-09-26 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1733:
---

 Summary: TikaServiceRmeta does not properly handle unknown tika 
exceptions
 Key: CONNECTORS-1733
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1733
 Project: ManifoldCF
  Issue Type: Bug
  Components: Tika service connector
Affects Versions: ManifoldCF 2.22
Reporter: Julien Massiera
Assignee: Julien Massiera


With the introduction of new exception formats in Tika 2.0, the 
TikaServiceRmeta connector does not correctly handle some of them, resulting in 
metadata and content extraction issues for some files
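A sketch of a more tolerant check, assuming Tika 2.x reports parse problems in each /rmeta metadata object under keys starting with X-TIKA:EXCEPTION: (this prefix should be verified against the Tika version in use). Matching on the prefix rather than on a fixed list of key names means that an exception format the connector has never seen is still detected. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: collects every exception entry from one /rmeta metadata object by key prefix. */
final class TikaExceptionScan {
  // Assumed prefix of Tika's exception metadata keys; the suffixes are treated as open-ended.
  static final String EXCEPTION_PREFIX = "X-TIKA:EXCEPTION:";

  /** Returns exception entries keyed by their suffix (e.g. "container_exception"), in encounter order. */
  static Map<String, String> exceptions(Map<String, Object> metadata) {
    Map<String, String> found = new LinkedHashMap<>();
    for (Map.Entry<String, Object> entry : metadata.entrySet()) {
      if (entry.getKey().startsWith(EXCEPTION_PREFIX)) {
        found.put(entry.getKey().substring(EXCEPTION_PREFIX.length()), String.valueOf(entry.getValue()));
      }
    }
    return found;
  }
}
```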





[jira] [Resolved] (CONNECTORS-1732) Github mirror out of sync

2022-09-21 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera resolved CONNECTORS-1732.
-
Resolution: Invalid

Wrong place for this issue.

> Github mirror out of sync
> -
>
> Key: CONNECTORS-1732
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1732
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Build
>Affects Versions: ManifoldCF 2.22
>Reporter: Julien Massiera
>Priority: Major
>
> The GitHub mirror seems to have been out of sync with the SVN repo since June 2022.





[jira] [Created] (CONNECTORS-1732) Github mirror out of sync

2022-09-21 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1732:
---

 Summary: Github mirror out of sync
 Key: CONNECTORS-1732
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1732
 Project: ManifoldCF
  Issue Type: Bug
  Components: Build
Affects Versions: ManifoldCF 2.22
Reporter: Julien Massiera


The GitHub mirror seems to have been out of sync with the SVN repo since June 2022.





[jira] [Resolved] (CONNECTORS-1721) Confluence v6 does not distinguish 404 errors

2022-07-19 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera resolved CONNECTORS-1721.
-
Fix Version/s: ManifoldCF 2.23
   Resolution: Fixed

r1902854

> Confluence v6 does not distinguish 404 errors
> -
>
> Key: CONNECTORS-1721
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1721
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Confluence connector
>Affects Versions: ManifoldCF 2.22
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF 2.23
>
>
> The ConfluenceV6 connector does not distinguish 404 errors from other errors. 
> This is problematic for the authority, because a 404 error corresponds to a 
> "user not found" response rather than a "dead authority".
> The connector must handle 404 errors correctly.





[jira] [Created] (CONNECTORS-1721) Confluence v6 does not distinguish 404 errors

2022-07-19 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1721:
---

 Summary: Confluence v6 does not distinguish 404 errors
 Key: CONNECTORS-1721
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1721
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Confluence connector
Affects Versions: ManifoldCF 2.22
Reporter: Julien Massiera
Assignee: Julien Massiera


The ConfluenceV6 connector does not distinguish 404 errors from other errors. 
This is problematic for the authority, because a 404 error corresponds to a 
"user not found" response rather than a "dead authority".

The connector must handle 404 errors correctly.
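A sketch of the intended distinction. The enum is a hypothetical stand-in for the framework's authorization response types, and treating every other non-2xx status as "authority unavailable" is an assumption of this sketch.

```java
/** Hypothetical sketch: classify HTTP status codes from the Confluence REST API for the authority connector. */
final class ConfluenceStatusClassifier {
  enum Outcome { OK, USER_NOT_FOUND, AUTHORITY_UNAVAILABLE }

  static Outcome classify(int statusCode) {
    if (statusCode >= 200 && statusCode < 300) {
      return Outcome.OK;
    }
    // A 404 means the server answered but does not know the user:
    // report "user not found" instead of treating the whole authority as dead.
    if (statusCode == 404) {
      return Outcome.USER_NOT_FOUND;
    }
    // Any other status is treated as the authority being unavailable (assumption of this sketch).
    return Outcome.AUTHORITY_UNAVAILABLE;
  }
}
```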





[jira] [Resolved] (CONNECTORS-1719) Handle MariaDB in JDBC connector

2022-07-07 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera resolved CONNECTORS-1719.
-
Fix Version/s: ManifoldCF 2.23
   Resolution: Fixed

r1902537

> Handle MariaDB in JDBC connector
> 
>
> Key: CONNECTORS-1719
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1719
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: JDBC connector
>Affects Versions: ManifoldCF 2.22
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF 2.23
>
>
> Currently the JDBC connector does not officially handle MariaDB databases.
> It may work with MariaDB databases up to v2.x using the MySQL type, because 
> before v3 MariaDB was compatible with the MySQL JDBC driver and provider 
> name, but this is no longer the case.
>  





[jira] [Created] (CONNECTORS-1719) Handle MariaDB in JDBC connector

2022-07-07 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1719:
---

 Summary: Handle MariaDB in JDBC connector
 Key: CONNECTORS-1719
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1719
 Project: ManifoldCF
  Issue Type: Improvement
  Components: JDBC connector
Affects Versions: ManifoldCF 2.22
Reporter: Julien Massiera
Assignee: Julien Massiera


Currently the JDBC connector does not officially handle MariaDB databases.

It may work with MariaDB databases up to v2.x using the MySQL type, because 
before v3 MariaDB was compatible with the MySQL JDBC driver and provider name, 
but this is no longer the case.
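A sketch of handling MariaDB as its own provider rather than through the MySQL type, assuming the standard driver classes (org.mariadb.jdbc.Driver for MariaDB Connector/J, com.mysql.cj.jdbc.Driver for MySQL Connector/J 8) and the jdbc:mariadb:// and jdbc:mysql:// URL schemes. The class and enum are hypothetical, not the JDBC connector's code.

```java
/** Hypothetical sketch: MariaDB as its own provider instead of going through the MySQL type. */
final class JdbcProviders {
  enum DbType { MYSQL, MARIADB }

  /** Driver class to load for each type. */
  static String driverClass(DbType type) {
    switch (type) {
      case MYSQL:   return "com.mysql.cj.jdbc.Driver";
      case MARIADB: return "org.mariadb.jdbc.Driver";
      default:      throw new IllegalArgumentException("Unsupported type: " + type);
    }
  }

  /** Each driver gets its own URL scheme; MariaDB is never addressed through jdbc:mysql: here. */
  static String url(DbType type, String host, int port, String database) {
    String scheme;
    switch (type) {
      case MYSQL:   scheme = "jdbc:mysql://";   break;
      case MARIADB: scheme = "jdbc:mariadb://"; break;
      default:      throw new IllegalArgumentException("Unsupported type: " + type);
    }
    return scheme + host + ":" + port + "/" + database;
  }
}
```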

 





[jira] [Commented] (CONNECTORS-1667) New Tika Service Connector

2022-06-08 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17551454#comment-17551454
 ] 

Julien Massiera commented on CONNECTORS-1667:
-

Hi [~cguzel], no, the Tika service connector does not correctly handle Tika 
Server 2.x; as you noted, the metadata keys are the cause. You should consider 
using the tika-service-rmeta-connector instead, which offers better performance 
and stability, and has been updated to be compatible with the latest version of 
Tika Server (see CONNECTORS-1703).

By the way, that is the only version of the Tika service connector I am 
currently maintaining because, as you said, the maintenance cost is very 
limited, and an external Tika is more reliable than an embedded one.
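For readers unfamiliar with the rmeta endpoint, the sketch below builds the kind of request such a connector sends, using only `java.net.http` (Java 11+). The class name is hypothetical, and the base URL assumes Tika Server's default port 9998; `/rmeta/text` asks for plain-text content, and the JSON response is a list with one metadata object per (embedded) document, the extracted content being stored under `X-TIKA:content`.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical sketch: builds a request for Tika Server's /rmeta endpoint. */
final class TikaRmetaRequestSketch {
    private TikaRmetaRequestSketch() {}

    /** The raw document bytes are sent as the PUT body; JSON is requested back. */
    static HttpRequest buildRequest(String tikaBaseUrl, byte[] document) {
        return HttpRequest.newBuilder(URI.create(tikaBaseUrl + "/rmeta/text"))
            .header("Accept", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
            .build();
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, and the returned JSON array parsed with any JSON library.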

 

> New Tika Service Connector
> --
>
> Key: CONNECTORS-1667
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1667
> Project: ManifoldCF
>  Issue Type: New Feature
>  Components: Tika service connector
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF 2.20
>
>
> The current Tika Service Connector uses the '/unpack/all' endpoint of a 
> Tika Server. This endpoint is not optimal when we only need to extract a 
> document's metadata and content. We should develop a new connector based on 
> the 'rmeta' endpoint, which is better suited to our needs.





[jira] [Created] (CONNECTORS-1712) Broken Velocity UI

2022-05-06 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1712:
---

 Summary: Broken Velocity UI
 Key: CONNECTORS-1712
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1712
 Project: ManifoldCF
  Issue Type: Bug
  Components: API
Affects Versions: ManifoldCF 2.22
Reporter: Julien Massiera


In the mcf-crawler-ui, we cannot enter edit mode for any connector because of a 
problem with Velocity.

The following error appears in the logs:

 
{code:java}
java.lang.NoSuchMethodError: 'void org.apache.velocity.app.VelocityEngine.setExtendedProperties(org.apache.commons.collections.ExtendedProperties)'
    at org.apache.manifoldcf.core.i18n.Messages.createVelocityEngine(Messages.java:62) ~[mcf-core.jar:?]
    at org.apache.manifoldcf.ui.i18n.Messages.outputResourceWithVelocity(Messages.java:132) ~[mcf-ui-core.jar:?]
    at com.francelabs.datafari.connectors.share.Messages.outputResourceWithVelocity(Messages.java:111) ~[?:?]
    at com.francelabs.datafari.connectors.share.SharedDriveConnector.outputSpecificationHeader(SharedDriveConnector.java:2829) ~[?:?]
    at org.apache.jsp.editjob_jsp._jspService(editjob_jsp.java:977) ~[mcf-crawler-ui.jar:?]
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) ~[jasper.jar:9.0.56]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:764) ~[servlet-api.jar:4.0.FR]
    at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:466) ~[jasper.jar:9.0.56]
    at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:379) ~[jasper.jar:9.0.56]
    at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:327) ~[jasper.jar:9.0.56]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:764) ~[servlet-api.jar:4.0.FR]
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:227) ~[catalina.jar:9.0.56]
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162) ~[catalina.jar:9.0.56]
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53) ~[tomcat-websocket.jar:9.0.56]
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189) ~[catalina.jar:9.0.56]
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162) ~[catalina.jar:9.0.56]
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:197) [catalina.jar:9.0.56]
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:97) [catalina.jar:9.0.56]
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:540) [catalina.jar:9.0.56]
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:135) [catalina.jar:9.0.56]
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92) [catalina.jar:9.0.56]
    at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:687) [catalina.jar:9.0.56]
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:78) [catalina.jar:9.0.56]
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:357) [catalina.jar:9.0.56]
    at org.apache.coyote.ajp.AjpProcessor.service(AjpProcessor.java:433) [tomcat-coyote.jar:9.0.56]
    at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65) [tomcat-coyote.jar:9.0.56]
    at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:895) [tomcat-coyote.jar:9.0.56]
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1732) [tomcat-coyote.jar:9.0.56]
    at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) [tomcat-coyote.jar:9.0.56]
    at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191) [tomcat-util.jar:9.0.56]
    at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659) [tomcat-util.jar:9.0.56]
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-util.jar:9.0.56]
    at java.lang.Thread.run(Thread.java:829) [?:?]
{code}
 

 

After some investigation, it seems to be related to the upgrade of the Velocity 
library from velocity-1.7 to velocity-engine-core-2.3.
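Part of such an upgrade is renaming configuration keys. The deprecation warnings in the CONNECTORS-1707 build log further down this archive show two of the Velocity 2.x renames: 'class.resource.loader.class' becomes 'resource.loader.class.class', and 'resource.loader' becomes 'resource.loaders'. The stdlib-only sketch below migrates a 1.x-style `Properties` object to the 2.x names, assuming the general pattern `<name>.resource.loader.<prop>` to `resource.loader.<name>.<prop>`; the result could then be handed to Velocity 2.x's `VelocityEngine.init(Properties)` instead of the removed `setExtendedProperties`. The class name is hypothetical.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that renames Velocity 1.x resource-loader keys to their 2.x form. */
final class VelocityKeyMigration {
    // Matches 1.x keys such as "class.resource.loader.class" or "file.resource.loader.path".
    private static final Pattern LEGACY_LOADER_KEY =
        Pattern.compile("^([^.]+)\\.resource\\.loader\\.(.+)$");

    private VelocityKeyMigration() {}

    static String migrateKey(String key) {
        if (key.equals("resource.loader")) {
            return "resource.loaders";
        }
        Matcher m = LEGACY_LOADER_KEY.matcher(key);
        if (m.matches()) {
            return "resource.loader." + m.group(1) + "." + m.group(2);
        }
        return key; // keys without a known rename (including 2.x keys) are kept unchanged
    }

    static Properties migrate(Properties legacy) {
        Properties migrated = new Properties();
        for (String key : legacy.stringPropertyNames()) {
            migrated.setProperty(migrateKey(key), legacy.getProperty(key));
        }
        return migrated;
    }
}
```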

The first problem is that the old Velocity libraries and their dependencies are 
still present in the MCF build, for both the MCF Agent AND the mcf-crawler-ui.
The libraries concerned are:

- commons-collections-3.2.2.jar
- commons-lang-2.6.jar
- velocity-1.7.jar
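A build or packaging step could guard against such leftovers. The following is a minimal sketch, with a hypothetical class name, that reports which of the three jars listed above are present among a set of file names (for example, the names returned by `Files.list` on a lib directory).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical build check: reports leftover jars that should have been removed. */
final class LeftoverJarCheck {
    // The libraries listed in this issue as remnants of the velocity-1.7 setup.
    static final Set<String> OBSOLETE_JARS = Set.of(
        "commons-collections-3.2.2.jar",
        "commons-lang-2.6.jar",
        "velocity-1.7.jar");

    private LeftoverJarCheck() {}

    /** Returns the obsolete jar names found among the given file names, sorted. */
    static List<String> findLeftovers(Collection<String> fileNames) {
        Set<String> found = new TreeSet<>();
        for (String name : fileNames) {
            if (OBSOLETE_JARS.contains(name)) {
                found.add(name);
            }
        }
        return new ArrayList<>(found);
    }
}
```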

 

The second problem is that, with the new Velocity version, the way properties 
are set on the engine has changed. So the code in the 
org.apache.manifoldcf.core.i18n.Messages.createVelocityEngine method 

[jira] [Commented] (CONNECTORS-1707) LiveLink Connector Ant build broken

2022-04-27 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528777#comment-17528777
 ] 

Julien Massiera commented on CONNECTORS-1707:
-

Thanks [~kwri...@metacarta.com], indeed I had forgotten to clean, but I needed 
to perform a global "ant clean" because "ant clean-core-deps" alone was not 
enough.

Now I can build everything, but I still have a failing test on cmisoutput; I 
will send a mail to the dev mailing list about it.

> LiveLink Connector Ant build broken
> ---
>
> Key: CONNECTORS-1707
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1707
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: LiveLink connector
>Reporter: Piergiorgio Lucidi
>Priority: Major
>
> Building the LiveLink connector with Ant returns an error, whereas with Maven 
> everything compiles correctly.
> The cause is related to the WSDL generation: the Ant process fails but 
> appears to report success, even though the following error occurs when 
> executing ant classcreate-wsdls:
>  
> {code:java}
> WSDLToJava Error: org.apache.cxf.bus.extension.ExtensionException: Could not load extension class org.apache.cxf.common.util.ASMHelperImpl.
> {code}
>  
> Below is the entire output of the Ant build:
>  
> {code:java}
> pjlucidi@MBP-Pj csws $ant
> Buildfile: /Users/pjlucidi/workspaces/manifoldcf/manifoldcf-trunk/connectors/csws/build.xml
> calculate-condition:
> precompile-warn:
> precompile-check:
> has-RMI-check:
> compile-interface:
> jar-interface:
> has-stubs-check:
> has-proprietary-materials-check:
> build-stubs-check:
> compile-stubs:
> compile-implementation:
> setup-rmic:
> rmic-build-all:
> compile-rmic:
> jar-rmistub:
> lib-rmi:
> classcreate-wsdls:
> classcreate-wsdl-cxf:
>      [java] SLF4J: Class path contains multiple SLF4J bindings.
>      [java] SLF4J: Found binding in [jar:file:/Users/pjlucidi/workspaces/manifoldcf/manifoldcf-trunk/dist/lib/slf4j-simple-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>      [java] SLF4J: Found binding in [jar:file:/Users/pjlucidi/workspaces/manifoldcf/manifoldcf-trunk/dist/lib/slf4j-simple-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>      [java] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>      [java] SLF4J: Actual binding is of type [org.slf4j.impl.SimpleLoggerFactory]
>      [java] WARNING: An illegal reflective access operation has occurred
>      [java] WARNING: Illegal reflective access by com.sun.xml.bind.v2.runtime.reflect.opt.Injector (file:/Users/pjlucidi/workspaces/manifoldcf/manifoldcf-trunk/dist/connector-common-lib/jaxb-impl-2.3.0.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int)
>      [java] WARNING: Please consider reporting this to the maintainers of com.sun.xml.bind.v2.runtime.reflect.opt.Injector
>      [java] WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
>      [java] WARNING: All illegal access operations will be denied in a future release
>      [java] [main] INFO org.apache.cxf.tools.wsdlto.core.PluginLoader - Replaced default databinding source
>      [java] [main] INFO org.apache.cxf.tools.wsdlto.core.PluginLoader - Replaced default databinding domsource
>      [java] [main] INFO org.apache.cxf.tools.wsdlto.core.PluginLoader - Replaced default databinding staxsource
>      [java] [main] INFO org.apache.cxf.tools.wsdlto.core.PluginLoader - Replaced default databinding saxsource
>      [java] [main] INFO org.apache.cxf.tools.wsdlto.core.PluginLoader - Replaced default databinding jaxb
>      [java] [main] INFO org.apache.cxf.tools.wsdlto.core.PluginLoader - Replaced default frontend jaxws
>      [java] [main] INFO org.apache.cxf.tools.wsdlto.core.PluginLoader - Replaced default frontend jaxws21
>      [java] [main] INFO org.apache.cxf.tools.wsdlto.core.PluginLoader - Replaced default frontend cxf
>      [java] [main] WARN org.apache.velocity.deprecation - configuration key 'class.resource.loader.class' has been deprecated in favor of 'resource.loader.class.class'
>      [java] [main] WARN org.apache.velocity.deprecation - configuration key 'resource.loader' has been deprecated in favor of 'resource.loaders'
>      [java]
>      [java] WSDLToJava Error: org.apache.cxf.bus.extension.ExtensionException: Could not load extension class org.apache.cxf.common.util.ASMHelperImpl.
>      [java]
>      [java] Java Result: 1
> classcreate-wsdl-cxf:
>      [java] SLF4J: Class path contains multiple SLF4J bindings.
>      [java] SLF4J: Found binding in [jar:file:/Users/pjlucidi/workspaces/manifoldcf/manifoldcf-trunk/dist/lib/slf4j-simple-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>      [java] SLF4J: Found binding in 
> 

[jira] [Commented] (CONNECTORS-1707) LiveLink Connector Ant build broken

2022-04-27 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528686#comment-17528686
 ] 

Julien Massiera commented on CONNECTORS-1707:
-

[~kwri...@metacarta.com], I see that WSDLToJava includes the CXF jars from the 
connector-common-lib directory of the dist folder. I checked it, and in my case 
it contains two versions of each CXF lib: 3.3.1 and 3.5.0. This is certainly 
causing trouble.
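To spot this kind of clash quickly, here is a naive sketch (hypothetical class name) that groups jar file names by artifact and reports the artifacts present in more than one version. It assumes the usual `<artifact>-<version>.jar` naming; names without a version are ignored, and unusual names may be split incorrectly. The same check would also flag the two slf4j-simple bindings (1.7.25 and 1.7.36) visible in the build log.

```java
import java.util.Collection;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical check: finds artifacts present in more than one version in a lib folder. */
final class DuplicateJarVersions {
    // Splits "cxf-core-3.5.0.jar" into artifact "cxf-core" and version "3.5.0".
    private static final Pattern JAR_NAME = Pattern.compile("^(.+?)-(\\d[\\w.-]*)\\.jar$");

    private DuplicateJarVersions() {}

    /** Maps each artifact name to its versions, keeping only artifacts with 2+ versions. */
    static Map<String, TreeSet<String>> find(Collection<String> fileNames) {
        Map<String, TreeSet<String>> versions = new TreeMap<>();
        for (String name : fileNames) {
            Matcher m = JAR_NAME.matcher(name);
            if (m.matches()) {
                versions.computeIfAbsent(m.group(1), k -> new TreeSet<>()).add(m.group(2));
            }
        }
        versions.values().removeIf(v -> v.size() < 2);
        return versions;
    }
}
```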

> LiveLink Connector Ant build broken
> ---
>
> Key: CONNECTORS-1707
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1707
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: LiveLink connector
>Reporter: Piergiorgio Lucidi
>Priority: Major
>

[jira] [Resolved] (CONNECTORS-1704) Confluence v6: rename project name

2022-04-14 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera resolved CONNECTORS-1704.
-
Resolution: Fixed

r1899863

> Confluence v6: rename project name
> --
>
> Key: CONNECTORS-1704
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1704
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Confluence connector
>Affects Versions: ManifoldCF 2.21
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF next
>
>
> The final jar name of the confluence v6 connector contains a space because 
> the project name in the build.xml file is "confluence v6". 
> Having spaces in filenames is bad practice, so it would be better to 
> rename the project to avoid that.





[jira] [Created] (CONNECTORS-1704) Confluence v6: rename project name

2022-04-14 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1704:
---

 Summary: Confluence v6: rename project name
 Key: CONNECTORS-1704
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1704
 Project: ManifoldCF
  Issue Type: Task
  Components: Confluence connector
Affects Versions: ManifoldCF 2.21
Reporter: Julien Massiera
Assignee: Julien Massiera
 Fix For: ManifoldCF next


The final jar name of the confluence v6 connector contains a space because the 
project name in the build.xml file is "confluence v6". 

Having spaces in filenames is bad practice, so it would be better to rename the 
project to avoid that.
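A minimal sketch of the naming rule the rename aims for: whitespace in a project name becomes a hyphen before the name is used in a file name. The `mcf-<name>-connector.jar` pattern and the class name are assumptions for illustration; the actual fix is simply renaming the project in build.xml.

```java
import java.util.Locale;

/** Hypothetical helper: turns a project name into a space-free jar file name. */
final class JarNameSketch {
    private JarNameSketch() {}

    static String safeJarName(String projectName) {
        String token = projectName.trim()
            .toLowerCase(Locale.ROOT)
            .replaceAll("\\s+", "-"); // "confluence v6" -> "confluence-v6"
        // The "mcf-...-connector.jar" pattern is an assumption, not taken from build.xml.
        return "mcf-" + token + "-connector.jar";
    }
}
```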





[jira] [Resolved] (CONNECTORS-1703) TikaServiceRmeta: update to handle 2.4.0 changes

2022-04-13 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera resolved CONNECTORS-1703.
-
Resolution: Fixed

r1899815

> TikaServiceRmeta: update to handle 2.4.0 changes
> 
>
> Key: CONNECTORS-1703
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1703
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Tika service connector
>Affects Versions: ManifoldCF 2.21
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF next
>
>
> Tika 2.4.0 introduces a new warning to indicate that metadata has been 
> truncated. The connector can be updated to take this warning into account and 
> report it in the document processing description.





[jira] [Created] (CONNECTORS-1703) TikaServiceRmeta: update to handle 2.4.0 changes

2022-04-13 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1703:
---

 Summary: TikaServiceRmeta: update to handle 2.4.0 changes
 Key: CONNECTORS-1703
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1703
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Tika service connector
Affects Versions: ManifoldCF 2.21
Reporter: Julien Massiera
Assignee: Julien Massiera
 Fix For: ManifoldCF next


Tika 2.4.0 introduces a new warning to indicate that metadata has been 
truncated. The connector can be updated to take this warning into account and 
report it in the document processing description.
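A minimal sketch of the check, assuming the rmeta metadata object has already been parsed into a map of string lists. The key name `X-TIKA:WARN:truncated_metadata` is our assumption about what Tika 2.4.0 emits and should be verified against the Tika version in use; the class name and the description text are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: detects Tika's truncation warning in one rmeta metadata object. */
final class TikaTruncationCheck {
    // Assumed key name; verify it against TikaCoreProperties of the Tika version in use.
    static final String TRUNCATED_METADATA_KEY = "X-TIKA:WARN:truncated_metadata";

    private TikaTruncationCheck() {}

    /** Returns a note to add to the processing description if metadata was truncated. */
    static Optional<String> truncationNote(Map<String, List<String>> metadata) {
        List<String> values = metadata.get(TRUNCATED_METADATA_KEY);
        if (values != null && values.contains("true")) {
            return Optional.of("Tika truncated the metadata of this document");
        }
        return Optional.empty();
    }
}
```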





[jira] [Updated] (CONNECTORS-1701) Add date info on OOM Error in WorkerThread

2022-03-16 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera updated CONNECTORS-1701:

Component/s: Framework crawler agent
 (was: Framework agents process)

> Add date info on OOM Error in WorkerThread
> --
>
> Key: CONNECTORS-1701
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1701
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Framework crawler agent
>Affects Versions: ManifoldCF 2.21
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF next
>
>
> When an OOM Error occurs, the timestamp/date of the error can be very useful 
> for investigations. Since it is currently missing from the output, it is 
> worth adding.





[jira] [Resolved] (CONNECTORS-1701) Add date info on OOM Error in WorkerThread

2022-03-16 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera resolved CONNECTORS-1701.
-
Fix Version/s: ManifoldCF next
   Resolution: Fixed

r1898966

> Add date info on OOM Error in WorkerThread
> --
>
> Key: CONNECTORS-1701
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1701
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Framework agents process
>Affects Versions: ManifoldCF 2.21
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF next
>
>
> When an OOM Error occurs, the timestamp/date of the error can be very useful 
> for investigations. Since it is currently missing from the output, it is 
> worth adding.





[jira] [Created] (CONNECTORS-1701) Add date info on OOM Error in WorkerThread

2022-03-16 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1701:
---

 Summary: Add date info on OOM Error in WorkerThread
 Key: CONNECTORS-1701
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1701
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Framework agents process
Affects Versions: ManifoldCF 2.21
Reporter: Julien Massiera
Assignee: Julien Massiera


When an OOM Error occurs, the timestamp/date of the error can be very useful 
for investigations. Since it is currently missing from the output, it is 
worth adding.
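A minimal sketch of the idea, with hypothetical names rather than the actual WorkerThread code: the time is captured as soon as the error is caught and printed first, in ISO-8601 UTC form, before the error is rethrown.

```java
import java.time.Instant;

/** Hypothetical sketch: prefixes an OutOfMemoryError report with the time it was caught. */
final class OomReportSketch {
    private OomReportSketch() {}

    /** Instant.toString() yields ISO-8601 UTC, e.g. 2022-03-16T10:15:30Z. */
    static String describe(String threadName, OutOfMemoryError error, Instant when) {
        return when + " - " + threadName + " - OutOfMemoryError: " + error.getMessage();
    }

    /** Runs one unit of work; the Runnable is a placeholder for the worker's real task. */
    static void runGuarded(String threadName, Runnable work) {
        try {
            work.run();
        } catch (OutOfMemoryError e) {
            // Record when the error happened, then let it propagate as before.
            System.err.println(describe(threadName, e, Instant.now()));
            throw e;
        }
    }
}
```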





[jira] [Resolved] (CONNECTORS-1700) TikaServiceRmeta: Add options to filter out metadata based on size

2022-03-15 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera resolved CONNECTORS-1700.
-
Fix Version/s: ManifoldCF next
   Resolution: Fixed

r1898949

> TikaServiceRmeta: Add options to filter out metadata based on size
> --
>
> Key: CONNECTORS-1700
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1700
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Tika service connector
>Affects Versions: ManifoldCF 2.21
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF next
>
>
> Some files may contain abnormally large metadata (several MB, either in 
> individual metadata values or in the total amount of metadata), which can be 
> problematic for the memory consumption of the connector. 
> To avoid this, we can provide job configuration options for the 
> TikaServiceRmetaConnector to set limits on both individual metadata values and 
> the total amount of metadata, and exclude metadata that exceed those limits.





[jira] [Created] (CONNECTORS-1700) TikaServiceRmeta: Add options to filter out metadata based on size

2022-03-15 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1700:
---

 Summary: TikaServiceRmeta: Add options to filter out metadata 
based on size
 Key: CONNECTORS-1700
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1700
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Tika service connector
Affects Versions: ManifoldCF 2.21
Reporter: Julien Massiera
Assignee: Julien Massiera


Some files may contain abnormally large metadata (several MB, either in 
individual metadata values or in the total amount of metadata), which can be 
problematic for the memory consumption of the connector. 

To avoid this, we can provide job configuration options for the 
TikaServiceRmetaConnector to set limits on both individual metadata values and 
the total amount of metadata, and exclude metadata that exceed those limits.
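The two limits could be applied as in the following sketch (hypothetical class and parameter names). As a simplification, sizes are measured in Java chars rather than bytes; a value is skipped if it alone exceeds the per-value limit, or if adding it would push the running total past the global limit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of size-based metadata filtering (limits measured in chars). */
final class MetadataSizeFilter {
    private MetadataSizeFilter() {}

    static Map<String, List<String>> filter(Map<String, List<String>> metadata,
                                            int maxValueLength, long maxTotalLength) {
        Map<String, List<String>> kept = new LinkedHashMap<>();
        long total = 0;
        for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
            List<String> keptValues = new ArrayList<>();
            for (String value : entry.getValue()) {
                if (value.length() > maxValueLength) {
                    continue; // per-value limit exceeded
                }
                if (total + value.length() > maxTotalLength) {
                    continue; // global limit would be exceeded
                }
                total += value.length();
                keptValues.add(value);
            }
            if (!keptValues.isEmpty()) {
                kept.put(entry.getKey(), keptValues); // drop keys left without values
            }
        }
        return kept;
    }
}
```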





[jira] [Commented] (CONNECTORS-1665) WebConnector: Add activity records for excluded URLs

2022-01-24 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17480957#comment-17480957
 ] 

Julien Massiera commented on CONNECTORS-1665:
-

r1897405

> WebConnector: Add activity records for excluded URLs 
> -
>
> Key: CONNECTORS-1665
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1665
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Web connector
>Affects Versions: ManifoldCF 2.18
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Trivial
> Fix For: ManifoldCF 2.19
>
> Attachments: patch-CONNECTORS-1665
>
>
> It would be useful to add activity records in the WebConnector to keep 
> track of excluded URLs, i.e. those that match an exclude filter.
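A minimal sketch of the idea: each URL is tested against the exclude regexes, and a match produces an activity record before the URL is dropped. `ActivityLog` is a hypothetical stand-in for the framework's activity-recording facility; the activity type `fetch` and result code `EXCLUDED` are illustrative values, not ManifoldCF constants, and matching with `Matcher.find()` is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: record an activity whenever a URL matches an exclude filter. */
final class ExcludedUrlRecorder {
    /** Stand-in for the framework's activity-recording facility (an assumption). */
    interface ActivityLog {
        void record(String activityType, String url, String resultCode, String description);
    }

    private final List<Pattern> excludes = new ArrayList<>();
    private final ActivityLog log;

    ExcludedUrlRecorder(List<String> excludeRegexes, ActivityLog log) {
        for (String regex : excludeRegexes) {
            excludes.add(Pattern.compile(regex));
        }
        this.log = log;
    }

    /** Returns true if the URL may be crawled; otherwise records why it was excluded. */
    boolean accept(String url) {
        for (Pattern p : excludes) {
            if (p.matcher(url).find()) {
                log.record("fetch", url, "EXCLUDED", "Matches exclude filter: " + p.pattern());
                return false;
            }
        }
        return true;
    }
}
```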





[jira] [Commented] (CONNECTORS-1665) WebConnector: Add activity records for excluded URLs

2022-01-21 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17480170#comment-17480170
 ] 

Julien Massiera commented on CONNECTORS-1665:
-

The patch has never been reviewed. [~kwri...@metacarta.com], can you please take 
a look and tell me whether it can be integrated into the trunk?

> WebConnector: Add activity records for excluded URLs 
> -
>
> Key: CONNECTORS-1665
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1665
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Web connector
>Affects Versions: ManifoldCF 2.18
>Reporter: Julien Massiera
>Priority: Trivial
> Fix For: ManifoldCF 2.19
>
> Attachments: patch-CONNECTORS-1665
>
>
> It would be useful to add activity records in the WebConnector to keep 
> track of excluded URLs, i.e. those that match an exclude filter.





[jira] [Commented] (CONNECTORS-1692) LDAP Mapper Connector

2022-01-11 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17472956#comment-17472956
 ] 

Julien Massiera commented on CONNECTORS-1692:
-

r1896917 of branch CONNECTORS-1692

> LDAP Mapper Connector
> -
>
> Key: CONNECTORS-1692
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1692
> Project: ManifoldCF
>  Issue Type: New Feature
>  Components: LDAP authority
>Affects Versions: ManifoldCF 2.21
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
>
> Sometimes one needs to be able to map an LDAP user ID to a specific attribute, 
> so it would be good to develop an LDAP Mapper for this purpose.





[jira] [Created] (CONNECTORS-1692) LDAP Mapper Connector

2022-01-11 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1692:
---

 Summary: LDAP Mapper Connector
 Key: CONNECTORS-1692
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1692
 Project: ManifoldCF
  Issue Type: New Feature
  Components: LDAP authority
Affects Versions: ManifoldCF 2.21
Reporter: Julien Massiera
Assignee: Julien Massiera


Sometimes one needs to be able to map an LDAP user ID to a specific attribute, 
so it would be good to develop an LDAP Mapper for this purpose.
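One way such a mapper could look up the target attribute is sketched below with standard JNDI types. The class name and the `uid` login attribute used in the example are assumptions; the filter value is escaped following RFC 4515 so that a user ID containing `*` or parentheses cannot alter the search.

```java
import javax.naming.directory.SearchControls;

/** Hypothetical sketch of the LDAP lookup an LDAP mapper could perform (JNDI types). */
final class LdapMapperSketch {
    private LdapMapperSketch() {}

    /** Escapes a value for use in an LDAP search filter, per RFC 4515. */
    static String escapeFilterValue(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\5c"); break;
                case '*': sb.append("\\2a"); break;
                case '(': sb.append("\\28"); break;
                case ')': sb.append("\\29"); break;
                case '\0': sb.append("\\00"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Filter matching the user by a login attribute such as "uid" (an assumption). */
    static String userFilter(String loginAttribute, String userId) {
        return "(" + loginAttribute + "=" + escapeFilterValue(userId) + ")";
    }

    /** Search controls that only return the attribute the user ID is mapped to. */
    static SearchControls controlsFor(String targetAttribute) {
        SearchControls controls = new SearchControls();
        controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
        controls.setReturningAttributes(new String[] {targetAttribute});
        return controls;
    }
}
```

The filter and controls would then be passed to `DirContext.search(baseDn, filter, controls)`, and the mapped value read from the returned attribute.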





[jira] [Resolved] (CONNECTORS-1667) New Tika Service Connector

2022-01-11 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera resolved CONNECTORS-1667.
-
Fix Version/s: ManifoldCF 2.20
   Resolution: Fixed

> New Tika Service Connector
> --
>
> Key: CONNECTORS-1667
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1667
> Project: ManifoldCF
>  Issue Type: New Feature
>  Components: Tika service connector
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF 2.20
>
>
> The current Tika Service Connector uses the '/unpack/all' endpoint of a 
> Tika Server. This endpoint is not optimal when we only need to extract a 
> document's metadata and content. We should develop a new connector based on 
> the 'rmeta' endpoint, which is better suited to our needs.





[jira] [Commented] (CONNECTORS-1686) Solr Ingester: issues with CursorMark

2021-12-15 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17460228#comment-17460228
 ] 

Julien Massiera commented on CONNECTORS-1686:
-

r1896007

> Solr Ingester: issues with CursorMark
> -
>
> Key: CONNECTORS-1686
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1686
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Lucene/SOLR connector
>Affects Versions: ManifoldCF 2.20
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF 2.21
>
>
> The Solr Ingester connector may have some issues with the 
> response.getNextCursorMark() method when processing request responses. 
> Sometimes the response contains errors and/or is malformed, and this 
> method then raises an exception that is currently not handled.
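A sketch of a paging loop that handles this case explicitly. `Page` is a hypothetical stand-in for the SolrJ response, not the real `QueryResponse` type; the loop starts from the `*` cursor and stops when the returned cursor mark equals the one sent, which is how Solr signals the end of the results. Wrapping the failure in an `IllegalStateException` with context is one possible policy, not necessarily the one adopted in the fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of a cursorMark paging loop that tolerates malformed responses. */
final class CursorPagingSketch {
    /** Minimal stand-in for a query response (not the SolrJ type). */
    interface Page {
        List<String> ids();
        String nextCursorMark(); // may throw on an erroneous or malformed response
    }

    private CursorPagingSketch() {}

    /** Collects ids page by page, starting from "*", until the cursor stops moving. */
    static List<String> fetchAll(Function<String, Page> query) {
        List<String> ids = new ArrayList<>();
        String cursor = "*";
        while (true) {
            Page page = query.apply(cursor);
            ids.addAll(page.ids());
            String next;
            try {
                next = page.nextCursorMark();
            } catch (RuntimeException e) {
                // Surface a clear, contextual error instead of an unhandled exception.
                throw new IllegalStateException("Could not read next cursor mark after " + cursor, e);
            }
            if (next == null || next.equals(cursor)) {
                return ids; // an unchanged cursor mark means there are no more results
            }
            cursor = next;
        }
    }
}
```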





[jira] [Resolved] (CONNECTORS-1686) Solr Ingester: issues with CursorMark

2021-12-15 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera resolved CONNECTORS-1686.
-
Resolution: Fixed

> Solr Ingester: issues with CursorMark
> -
>
> Key: CONNECTORS-1686
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1686
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Lucene/SOLR connector
>Affects Versions: ManifoldCF 2.20
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF 2.21
>
>
> The Solr Ingester connector may have some issues with the 
> response.getNextCursorMark() method when processing request responses. 
> Sometimes the response contains errors and/or is malformed, and this 
> method then raises an exception that is currently not handled.





[jira] [Resolved] (CONNECTORS-1688) Solr Ingester: ensure single valued date field

2021-12-14 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera resolved CONNECTORS-1688.
-
Fix Version/s: ManifoldCF 2.21
   Resolution: Fixed

r1895960

> Solr Ingester: ensure single valued date field
> --
>
> Key: CONNECTORS-1688
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1688
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Affects Versions: ManifoldCF 2.20
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF 2.21
>
>
> In the Solr Ingester connector, the date field is supposed to be single-valued, 
> but the code does not check that this is the case, which can cause problems 
> for the output. 
>  





[jira] [Created] (CONNECTORS-1688) Solr Ingester: ensure single valued date field

2021-12-14 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1688:
---

 Summary: Solr Ingester: ensure single valued date field
 Key: CONNECTORS-1688
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1688
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Lucene/SOLR connector
Affects Versions: ManifoldCF 2.20
Reporter: Julien Massiera
Assignee: Julien Massiera


In the Solr Ingester connector, the date field is supposed to be single-valued, 
but the code never checks that this is the case, which can cause problems 
for the output. 
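A minimal sketch of the missing check follows, assuming the field values arrive as a list of ISO-8601 strings. The class, method, and field names are hypothetical, and this is not the code committed for the fix. It rejects multi-valued input; keeping only the first value would be an equally plausible policy, and the ticket does not say which one was chosen.

```java
import java.time.Instant;
import java.util.List;

final class SingleValuedDate {
    private SingleValuedDate() {}

    /**
     * Returns the single date value of a field, or null when the field is absent.
     * Multi-valued input is rejected instead of silently sending several dates
     * to a field that the output expects to be single-valued.
     */
    static Instant requireSingleDate(String fieldName, List<String> values) {
        if (values == null || values.isEmpty()) {
            return null;
        }
        if (values.size() > 1) {
            throw new IllegalArgumentException(
                "Field '" + fieldName + "' must be single-valued but has " + values.size() + " values");
        }
        // Assumption: dates are ISO-8601 instants such as 2021-12-14T10:15:30Z
        return Instant.parse(values.get(0));
    }
}
```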

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (CONNECTORS-1687) Solr Ingester: support more field types in field mappings

2021-12-14 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera resolved CONNECTORS-1687.
-
Fix Version/s: ManifoldCF 2.21
   Resolution: Fixed

r1895959

> Solr Ingester: support more field types in field mappings
> -
>
> Key: CONNECTORS-1687
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1687
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Affects Versions: ManifoldCF 2.20
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF 2.21
>
>
> Currently the Solr Ingester connector can only handle string-type fields when 
> configuring the "Field mappings" parameter of a job configuration.
> It would be good to also support at least the long, int, and date types.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (CONNECTORS-1687) Solr Ingester: support more field types in field mappings

2021-12-14 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1687:
---

 Summary: Solr Ingester: support more field types in field mappings
 Key: CONNECTORS-1687
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1687
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Lucene/SOLR connector
Affects Versions: ManifoldCF 2.20
Reporter: Julien Massiera
Assignee: Julien Massiera


Currently the Solr Ingester connector can only handle string-type fields when 
configuring the "Field mappings" parameter of a job configuration.

It would be good to also support at least the long, int, and date types.
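One way to support typed mappings is to convert each mapped value according to a declared type. The sketch below is hypothetical: the `MappedType` enum, the method name, and the assumption that dates are ISO-8601 instants are illustrative only and are not taken from the connector's configuration.

```java
import java.time.Instant;

final class FieldTypeConverter {
    private FieldTypeConverter() {}

    /** Hypothetical type names for a field mapping entry; real configuration values may differ. */
    enum MappedType { STRING, INT, LONG, DATE }

    /** Converts a raw string value from the source into the Java type matching the mapping. */
    static Object convert(MappedType type, String raw) {
        switch (type) {
            case INT:
                return Integer.parseInt(raw.trim());
            case LONG:
                return Long.parseLong(raw.trim());
            case DATE:
                // Assumption: ISO-8601 instants; other formats would need a DateTimeFormatter
                return Instant.parse(raw.trim());
            case STRING:
            default:
                return raw;
        }
    }
}
```

Parsing failures surface as `NumberFormatException` or `DateTimeParseException`, which a caller could treat as a per-document problem rather than a job failure.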



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (CONNECTORS-1686) Solr Ingester: issues with CursorMark

2021-12-14 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera resolved CONNECTORS-1686.
-
Fix Version/s: ManifoldCF 2.21
   Resolution: Fixed

r1895958

> Solr Ingester: issues with CursorMark
> -
>
> Key: CONNECTORS-1686
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1686
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Lucene/SOLR connector
>Affects Versions: ManifoldCF 2.20
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF 2.21
>
>
> The Solr Ingester connector may run into issues with the 
> response.getNextCursorMark() method when processing request responses. 
> Sometimes the response contains errors and/or is malformed, and this 
> method then raises an exception that is currently not handled.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (CONNECTORS-1686) Solr Ingester: issues with CursorMark

2021-12-14 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1686:
---

 Summary: Solr Ingester: issues with CursorMark
 Key: CONNECTORS-1686
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1686
 Project: ManifoldCF
  Issue Type: Bug
  Components: Lucene/SOLR connector
Affects Versions: ManifoldCF 2.20
Reporter: Julien Massiera
Assignee: Julien Massiera


The Solr Ingester connector may run into issues with the 
response.getNextCursorMark() method when processing request responses. Sometimes 
the response contains errors and/or is malformed, and this method then 
raises an exception that is currently not handled.
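The missing exception handling can be sketched as a small guard around the cursor-mark call. This is a hypothetical helper, not necessarily what was committed: the `Supplier` stands in for SolrJ's `QueryResponse.getNextCursorMark()`, and any unchecked exception it throws is assumed to mean the response carried errors or was malformed. An unchanged mark is treated as the end of results, which is how Solr signals that a cursor is exhausted.

```java
import java.util.Optional;
import java.util.function.Supplier;

final class CursorMarkGuard {
    private CursorMarkGuard() {}

    /**
     * Returns the next cursor mark, or Optional.empty() when paging should stop:
     * either the response could not provide a mark (error or malformed payload),
     * or the mark did not change, meaning all results have been read.
     */
    static Optional<String> nextCursorMark(Supplier<String> markGetter, String currentMark) {
        final String next;
        try {
            // In SolrJ this would wrap a call such as response.getNextCursorMark()
            next = markGetter.get();
        } catch (RuntimeException e) {
            // Stop paging instead of letting the exception escape unhandled
            return Optional.empty();
        }
        if (next == null || next.equals(currentMark)) {
            return Optional.empty();
        }
        return Optional.of(next);
    }
}
```

Returning `Optional.empty()` in both cases keeps the paging loop simple; a connector that must distinguish a failed page from a finished crawl would return a richer result or record the error before stopping.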



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (CONNECTORS-1681) TikaServiceRmeta: recordActivity can cause Database exception

2021-11-24 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera updated CONNECTORS-1681:

Description: 
Some files containing non-UTF-8 characters can cause Tika to throw an 
exception describing the parsing problem. 
Because the TikaServiceRmeta connector creates an activity record for every Tika 
exception, including its description (which, in those cases, contains the 
non-UTF-8 character), MCF raises an SQL exception when it tries to insert the 
activity record into the database:
{code:java}
ERROR 2021-11-24T13:37:00,121 (Worker thread '41') - 
MCF|MCF-agent|apache.manifoldcf.crawlerthreads|Worker thread aborting and 
restarting due to database connection reset: Database exception: SQLException 
doing query (22021): ERROR: invalid byte sequence for encoding "UTF8": 0x00
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: 
SQLException doing query (22021): ERROR: invalid byte sequence for encoding 
"UTF8": 0x00 {code}
To avoid this, we need to remove those problematic characters from the exception 
description before recording the activity.

 

  was:
Some files containing non-ASCII characters can cause Tika to throw an 
exception describing the parsing problem. 
Because the TikaServiceRmeta connector creates an activity record for every Tika 
exception, including its description (which, in those cases, contains the 
non-ASCII character), MCF raises an SQL exception when it tries to insert the 
activity record into Postgres:
{code:java}
ERROR 2021-11-24T13:37:00,121 (Worker thread '41') - 
MCF|MCF-agent|apache.manifoldcf.crawlerthreads|Worker thread aborting and 
restarting due to database connection reset: Database exception: SQLException 
doing query (22021): ERROR: invalid byte sequence for encoding "UTF8": 0x00
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: 
SQLException doing query (22021): ERROR: invalid byte sequence for encoding 
"UTF8": 0x00 {code}
To avoid this, we need to remove any non-ASCII characters from the exception 
description before recording the activity.

 


> TikaServiceRmeta: recordActivity can cause Database exception
> -
>
> Key: CONNECTORS-1681
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1681
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Tika service connector
>Affects Versions: ManifoldCF 2.20
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF 2.21
>
>
> Some files containing non-UTF-8 characters can cause Tika to throw an 
> exception describing the parsing problem. 
> Because the TikaServiceRmeta connector creates an activity record for every 
> Tika exception, including its description (which, in those cases, contains 
> the non-UTF-8 character), MCF raises an SQL exception when it tries to insert 
> the activity record into the database:
> {code:java}
> ERROR 2021-11-24T13:37:00,121 (Worker thread '41') - 
> MCF|MCF-agent|apache.manifoldcf.crawlerthreads|Worker thread aborting and 
> restarting due to database connection reset: Database exception: SQLException 
> doing query (22021): ERROR: invalid byte sequence for encoding "UTF8": 0x00
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database 
> exception: SQLException doing query (22021): ERROR: invalid byte sequence for 
> encoding "UTF8": 0x00 {code}
> To avoid this, we need to remove those problematic characters from the 
> exception description before recording the activity.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (CONNECTORS-1681) TikaServiceRmeta: recordActivity can cause Database exception

2021-11-24 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17448766#comment-17448766
 ] 

Julien Massiera commented on CONNECTORS-1681:
-

Indeed [~kwri...@metacarta.com], it is the description of my issue that is 
wrong. I decided to remove non-ASCII characters, and not just non-UTF-8 ones, 
because the error description that the TikaServiceRmeta connector writes as an 
activity record is only there to be readable and to give a general idea of what 
went wrong during the Tika processing phase. So I wanted to be sure that the 
activity record only contains "standard" characters, even if we lose some of 
them; the accurate exception is still available in the log file. Are you OK 
with that?
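The approach described in this comment (keep only "standard" characters in the activity record, since the full exception remains in the log) could look like the sketch below. The class and method names are hypothetical. Note that the 0x00 byte from the PostgreSQL error is itself an ASCII control character, so this sketch keeps only printable ASCII rather than merely dropping characters above 0x7F.

```java
final class ActivityTextSanitizer {
    private ActivityTextSanitizer() {}

    /**
     * Keeps only printable ASCII (0x20-0x7E). Runs of tabs and line breaks become a
     * single space; every other character (non-ASCII letters, NUL and other control
     * characters) is dropped.
     */
    static String sanitize(String description) {
        if (description == null) {
            return null;
        }
        return description
            .replaceAll("[\\t\\r\\n]+", " ")
            .replaceAll("[^\\x20-\\x7E]", "");
    }
}
```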

> TikaServiceRmeta: recordActivity can cause Database exception
> -
>
> Key: CONNECTORS-1681
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1681
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Tika service connector
>Affects Versions: ManifoldCF 2.20
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF 2.21
>
>
> Some files containing non-ASCII characters can cause Tika to throw an 
> exception describing the parsing problem. 
> Because the TikaServiceRmeta connector creates an activity record for every 
> Tika exception, including its description (which, in those cases, contains 
> the non-ASCII character), MCF raises an SQL exception when it tries to insert 
> the activity record into Postgres:
> {code:java}
> ERROR 2021-11-24T13:37:00,121 (Worker thread '41') - 
> MCF|MCF-agent|apache.manifoldcf.crawlerthreads|Worker thread aborting and 
> restarting due to database connection reset: Database exception: SQLException 
> doing query (22021): ERROR: invalid byte sequence for encoding "UTF8": 0x00
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database 
> exception: SQLException doing query (22021): ERROR: invalid byte sequence for 
> encoding "UTF8": 0x00 {code}
> To avoid this, we need to remove any non-ASCII characters from the exception 
> description before recording the activity.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (CONNECTORS-1681) TikaServiceRmeta: recordActivity can cause Database exception

2021-11-24 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera resolved CONNECTORS-1681.
-
Fix Version/s: ManifoldCF 2.21
   Resolution: Fixed

r1895299

> TikaServiceRmeta: recordActivity can cause Database exception
> -
>
> Key: CONNECTORS-1681
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1681
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Tika service connector
>Affects Versions: ManifoldCF 2.20
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF 2.21
>
>
> Some files containing non-ASCII characters can cause Tika to throw an 
> exception describing the parsing problem. 
> Because the TikaServiceRmeta connector creates an activity record for every 
> Tika exception, including its description (which, in those cases, contains 
> the non-ASCII character), MCF raises an SQL exception when it tries to insert 
> the activity record into Postgres:
> {code:java}
> ERROR 2021-11-24T13:37:00,121 (Worker thread '41') - 
> MCF|MCF-agent|apache.manifoldcf.crawlerthreads|Worker thread aborting and 
> restarting due to database connection reset: Database exception: SQLException 
> doing query (22021): ERROR: invalid byte sequence for encoding "UTF8": 0x00
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database 
> exception: SQLException doing query (22021): ERROR: invalid byte sequence for 
> encoding "UTF8": 0x00 {code}
> To avoid this, we need to remove any non-ASCII characters from the exception 
> description before recording the activity.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (CONNECTORS-1681) TikaServiceRmeta: recordActivity can cause Database exception

2021-11-24 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1681:
---

 Summary: TikaServiceRmeta: recordActivity can cause Database 
exception
 Key: CONNECTORS-1681
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1681
 Project: ManifoldCF
  Issue Type: Bug
  Components: Tika service connector
Affects Versions: ManifoldCF 2.20
Reporter: Julien Massiera
Assignee: Julien Massiera


Some files containing non-ASCII characters can cause Tika to throw an 
exception describing the parsing problem. 
Because the TikaServiceRmeta connector creates an activity record for every Tika 
exception, including its description (which, in those cases, contains the 
non-ASCII character), MCF raises an SQL exception when it tries to insert the 
activity record into Postgres:
{code:java}
ERROR 2021-11-24T13:37:00,121 (Worker thread '41') - 
MCF|MCF-agent|apache.manifoldcf.crawlerthreads|Worker thread aborting and 
restarting due to database connection reset: Database exception: SQLException 
doing query (22021): ERROR: invalid byte sequence for encoding "UTF8": 0x00
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: 
SQLException doing query (22021): ERROR: invalid byte sequence for encoding 
"UTF8": 0x00 {code}
To avoid this, we need to remove any non-ASCII characters from the exception 
description before recording the activity.

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (CONNECTORS-1679) HTML Extractor: output has escaped entities

2021-11-19 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446405#comment-17446405
 ] 

Julien Massiera commented on CONNECTORS-1679:
-

r1895172

> HTML Extractor: output has escaped entities
> ---
>
> Key: CONNECTORS-1679
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1679
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: HTML extractor
>Affects Versions: ManifoldCF 2.20
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF 2.21
>
> Attachments: patch-CONNECTORS-1679.txt
>
>
> The HTML extractor generates its output with escaped entities (e.g. '&' 
> becomes '&amp;'), which is not the desired behavior: this connector should 
> extract the text from the HTML as is.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (CONNECTORS-1679) HTML Extractor: output has escaped entities

2021-11-18 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446104#comment-17446104
 ] 

Julien Massiera commented on CONNECTORS-1679:
-

Patch submitted 

> HTML Extractor: output has escaped entities
> ---
>
> Key: CONNECTORS-1679
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1679
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: HTML extractor
>Affects Versions: ManifoldCF 2.20
>Reporter: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF 2.21
>
> Attachments: patch-CONNECTORS-1679.txt
>
>
> The HTML extractor generates its output with escaped entities (e.g. '&' 
> becomes '&amp;'), which is not the desired behavior: this connector should 
> extract the text from the HTML as is.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (CONNECTORS-1679) HTML Extractor: output has escaped entities

2021-11-18 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera updated CONNECTORS-1679:

Description: The HTML extractor generates its output with escaped entities 
(e.g. '&' becomes '&amp;'), which is not the desired behavior: this connector 
should extract the text from the HTML as is.  (was: The output of the 
HTML extractor is generated with escaped entities (eg '&' becomes ''), 
which is not the wanted behavior as we want this connector to extract text from 
HTML as it is)

> HTML Extractor: output has escaped entities
> ---
>
> Key: CONNECTORS-1679
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1679
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: HTML extractor
>Affects Versions: ManifoldCF 2.20
>Reporter: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF 2.21
>
> Attachments: patch-CONNECTORS-1679.txt
>
>
> The HTML extractor generates its output with escaped entities (e.g. '&' 
> becomes '&amp;'), which is not the desired behavior: this connector should 
> extract the text from the HTML as is.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (CONNECTORS-1679) HTML Extractor: output has escaped entities

2021-11-18 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1679:
---

 Summary: HTML Extractor: output has escaped entities
 Key: CONNECTORS-1679
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1679
 Project: ManifoldCF
  Issue Type: Bug
  Components: HTML extractor
Affects Versions: ManifoldCF 2.20
Reporter: Julien Massiera


The HTML extractor generates its output with escaped entities (e.g. '&' 
becomes '&amp;'), which is not the desired behavior: this connector should 
extract the text from the HTML as is.
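For illustration only, the sketch below reverts the five predefined XML entities (plus the common numeric apostrophe `&#39;`) in already-escaped text. It is a hypothetical post-processing step, not the attached patch. Making the extractor avoid escaping the text in the first place is the cleaner fix, and full HTML named-entity support would need a real entity table.

```java
final class EntityUnescaper {
    private EntityUnescaper() {}

    /**
     * Reverts the predefined XML entities (and &#39;) to plain characters.
     * "&amp;" is handled last so that "&amp;lt;" becomes "&lt;" and not "<".
     */
    static String unescapeBasicEntities(String text) {
        return text
            .replace("&lt;", "<")
            .replace("&gt;", ">")
            .replace("&quot;", "\"")
            .replace("&apos;", "'")
            .replace("&#39;", "'")
            .replace("&amp;", "&");
    }
}
```

The ordering matters: replacing `&amp;` first would turn a literal `&amp;lt;` in the source into `<`, unescaping twice.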



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (CONNECTORS-1678) Confluence v6 - Configuration of retry interval and retry numbers on exceptions

2021-10-22 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera resolved CONNECTORS-1678.
-
Resolution: Fixed

r1894485

> Confluence v6 - Configuration of retry interval and retry numbers on 
> exceptions
> ---
>
> Key: CONNECTORS-1678
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1678
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Confluence connector
>Affects Versions: ManifoldCF 2.20
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
>
> Currently the retry interval (in ms) and the number of retries when 
> exceptions occur are hardcoded, which can be inappropriate depending on how 
> well the Confluence server performs.
> These values should be configurable in the connector's configuration.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CONNECTORS-1678) Confluence v6 - Configuration of retry interval and retry numbers on exceptions

2021-10-22 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1678:
---

 Summary: Confluence v6 - Configuration of retry interval and retry 
numbers on exceptions
 Key: CONNECTORS-1678
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1678
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Confluence connector
Affects Versions: ManifoldCF 2.20
Reporter: Julien Massiera
Assignee: Julien Massiera


Currently the retry interval (in ms) and the number of retries when exceptions 
occur are hardcoded, which can be inappropriate depending on how well the 
Confluence server performs.

These values should be configurable in the connector's configuration.
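A configurable retry loop can take both values as parameters instead of hardcoding them. This is a minimal sketch with a fixed interval; `RetryPolicy`, `maxRetries`, and `retryIntervalMs` are hypothetical names, not the connector's actual configuration keys.

```java
import java.util.concurrent.Callable;

final class RetryPolicy {
    private final int maxRetries;
    private final long retryIntervalMs;

    RetryPolicy(int maxRetries, long retryIntervalMs) {
        if (maxRetries < 0 || retryIntervalMs < 0) {
            throw new IllegalArgumentException("maxRetries and retryIntervalMs must be >= 0");
        }
        this.maxRetries = maxRetries;
        this.retryIntervalMs = retryIntervalMs;
    }

    /** Runs the action, retrying up to maxRetries times with a fixed pause between attempts. */
    <T> T call(Callable<T> action) {
        int retriesDone = 0;
        while (true) {
            try {
                return action.call();
            } catch (Exception e) {
                if (retriesDone >= maxRetries) {
                    throw new IllegalStateException("Giving up after " + retriesDone + " retries", e);
                }
                retriesDone++;
                try {
                    Thread.sleep(retryIntervalMs);
                } catch (InterruptedException ie) {
                    // Preserve the interrupt status for the caller
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting to retry", ie);
                }
            }
        }
    }
}
```

With `maxRetries = 3` the action runs at most four times. A slow Confluence instance would typically call for a longer interval rather than more retries.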



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CONNECTORS-1677) Confluence v6 does not crawl empty pages and their children

2021-10-22 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432959#comment-17432959
 ] 

Julien Massiera commented on CONNECTORS-1677:
-

r1894475

> Confluence v6 does not crawl empty pages and their children
> ---
>
> Key: CONNECTORS-1677
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1677
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Confluence connector
>Affects Versions: ManifoldCF 2.20
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
>
> The Confluence v6 connector does not crawl empty pages and, as a result, does 
> not crawl their children either. Originally, this was said to be the only way 
> to detect deleted pages, but that is no longer the case, and the connector 
> model is set to "full" anyway, so skipping empty pages makes no sense.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CONNECTORS-1677) Confluence v6 does not crawl empty pages and their children

2021-10-22 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1677:
---

 Summary: Confluence v6 does not crawl empty pages and their 
children
 Key: CONNECTORS-1677
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1677
 Project: ManifoldCF
  Issue Type: Bug
  Components: Confluence connector
Affects Versions: ManifoldCF 2.20
Reporter: Julien Massiera
Assignee: Julien Massiera


The Confluence v6 connector does not crawl empty pages and, as a result, does not 
crawl their children either. Originally, this was said to be the only way to 
detect deleted pages, but that is no longer the case, and the connector model is 
set to "full" anyway, so skipping empty pages makes no sense.
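The intended behavior can be sketched as a tree traversal that never skips a page because its body is empty. `PageTreeCrawl` and its `Page` type are hypothetical stand-ins for the connector's own page model.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class PageTreeCrawl {
    private PageTreeCrawl() {}

    /** Minimal stand-in for a Confluence page: an id, a (possibly empty) body and child pages. */
    static final class Page {
        final String id;
        final String body;
        final List<Page> children;

        Page(String id, String body, List<Page> children) {
            this.id = id;
            this.body = body;
            this.children = children;
        }
    }

    /** Breadth-first traversal that visits every page, including pages whose body is empty. */
    static List<String> crawlOrder(Page root) {
        List<String> visited = new ArrayList<>();
        Deque<Page> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Page page = queue.poll();
            // No "body is empty" shortcut: skipping the page would also hide its children
            visited.add(page.id);
            queue.addAll(page.children);
        }
        return visited;
    }
}
```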



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CONNECTORS-1675) Unable to delete Mapping Connections via JSON API

2021-10-19 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1675:
---

 Summary: Unable to delete Mapping Connections via JSON API
 Key: CONNECTORS-1675
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1675
 Project: ManifoldCF
  Issue Type: Bug
  Components: API
Affects Versions: ManifoldCF 2.20
Reporter: Julien Massiera


A DELETE request to the JSON API endpoint 
mappingconnections/__ does not seem to work. 
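To reproduce the report, a DELETE request can be built with the JDK's `java.net.http` client. The sketch assumes the API service is reachable at `http://localhost:8345/mcf-api-service/json` (adjust for your deployment), and the connection name used in the usage example is a placeholder.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class MappingConnectionDelete {
    private MappingConnectionDelete() {}

    /**
     * Builds the DELETE request for one mapping connection.
     * The name is percent-encoded so that spaces or slashes cannot break the path.
     */
    static HttpRequest buildDeleteRequest(String apiBaseUrl, String connectionName) {
        String encodedName = URLEncoder.encode(connectionName, StandardCharsets.UTF_8)
            .replace("+", "%20"); // URLEncoder targets form data; paths need %20 for spaces
        URI uri = URI.create(apiBaseUrl + "/mappingconnections/" + encodedName);
        return HttpRequest.newBuilder(uri).DELETE().build();
    }
}
```

Sending it with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` and checking both the status code and whether the connection still appears in a subsequent GET would show whether the deletion actually happens.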



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CONNECTORS-1671) Solr output connector behavior on some exceptions

2021-09-08 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411817#comment-17411817
 ] 

Julien Massiera commented on CONNECTORS-1671:
-

So, [~kwri...@metacarta.com], any news about integrating this patch? Let me 
know if my explanations are unclear; I'm at your disposal.

> Solr output connector behavior on some exceptions
> -
>
> Key: CONNECTORS-1671
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1671
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Affects Versions: ManifoldCF 2.19
>Reporter: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF next
>
> Attachments: patch-CONNECTORS-1671.txt
>
>
> In the "handleIOException" method of the "HttpPoster" class, the unknown 
> case triggers a job failure even though the exception can only concern the 
> document/action itself, not a potential "Solr down" problem (all "Solr down" 
> issues are handled upstream).
> The same applies to the "handleSolrServerException" method.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CONNECTORS-1671) Solr output connector behavior on some exceptions

2021-07-31 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390958#comment-17390958
 ] 

Julien Massiera commented on CONNECTORS-1671:
-

[~kwri...@metacarta.com], it is a runtime exception that occurs on the Solr side, 
not on the connector side, and it is the cause (child) of a SolrServerException! 
Furthermore, I did not create the handleRuntimeException method; it was already 
there. I am just using it in another place where this exception can occur, for 
the same reason described in the method's Javadoc:
{code:java}
/** Handle a SolrServerException.
 * These exceptions seem to be catch-all exceptions having to do with misconfiguration or
 * underlying IO exceptions, or request parsing exceptions.
 * If this method doesn't throw an exception, it means that the exception should be interpreted
 * as meaning that the document or action is illegal and should not be repeated.
 */
{code}
I have just updated the Javadoc to add "or request parsing exceptions" because 
this is what happens. 
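The distinction argued for in this ticket (only connectivity problems mean "Solr is down"; other failures concern the document or action) can be sketched as a cause-chain check. `SolrErrorTriage` and its `Outcome` values are hypothetical and do not correspond to ManifoldCF's actual handling code, which this thread does not show.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;

final class SolrErrorTriage {
    private SolrErrorTriage() {}

    /** Hypothetical outcomes; they are not ManifoldCF constants. */
    enum Outcome { RETRY_LATER, REJECT_DOCUMENT }

    /**
     * Walks the cause chain (bounded, in case of a cyclic chain). Only connectivity
     * problems are treated as "Solr is unavailable"; anything else, such as a
     * RuntimeException raised while Solr parsed the request, is assumed to concern
     * the document or action itself, so it is rejected rather than failing the job.
     */
    static Outcome classify(Throwable error) {
        int depth = 0;
        for (Throwable t = error; t != null && depth < 32; t = t.getCause(), depth++) {
            if (t instanceof ConnectException || t instanceof SocketTimeoutException) {
                return Outcome.RETRY_LATER;
            }
        }
        return Outcome.REJECT_DOCUMENT;
    }
}
```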

> Solr output connector behavior on some exceptions
> -
>
> Key: CONNECTORS-1671
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1671
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Affects Versions: ManifoldCF 2.19
>Reporter: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF next
>
> Attachments: patch-CONNECTORS-1671.txt
>
>
> In the "handleIOException" method of the "HttpPoster" class, the unknown 
> case triggers a job failure even though the exception can only concern the 
> document/action itself, not a potential "Solr down" problem (all "Solr down" 
> issues are handled upstream).
> The same applies to the "handleSolrServerException" method.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CONNECTORS-1671) Solr output connector behavior on some exceptions

2021-07-30 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390553#comment-17390553
 ] 

Julien Massiera commented on CONNECTORS-1671:
-

[~kwri...@metacarta.com], is the patch OK for you?

> Solr output connector behavior on some exceptions
> -
>
> Key: CONNECTORS-1671
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1671
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Affects Versions: ManifoldCF 2.19
>Reporter: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF next
>
> Attachments: patch-CONNECTORS-1671.txt
>
>
> In the "handleIOException" method of the "HttpPoster" class, the unknown 
> case triggers a job failure even though the exception can only concern the 
> document/action itself, not a potential "Solr down" problem (all "Solr down" 
> issues are handled upstream).
> The same applies to the "handleSolrServerException" method.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (CONNECTORS-1661) Admin UI does not handle UTF8 passwords

2021-07-15 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera updated CONNECTORS-1661:

Comment: was deleted

(was: Hello,
I am currently away and will be back on Monday, February 22, 2021. For any 
question, please write to the following address: cedric [dot] Ulmer [att] 
francelabs [dot] com

Best regards,
Julien Massiera
+

Hi,
I will be out of office through Sunday, February 21 (inclusive). For any 
question, please contact cedric [dot] Ulmer [att] francelabs [dot] com
)

> Admin UI does not handle UTF8 passwords
> ---
>
> Key: CONNECTORS-1661
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1661
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: API
>Affects Versions: ManifoldCF 2.17
>Reporter: Julien Massiera
>Assignee: Kishore Kumar
>Priority: Critical
> Fix For: ManifoldCF 2.19
>
> Attachments: patch-CONNECTORS-1661.txt
>
>
> Using non-alphanumeric UTF-8 characters in the admin user's password does 
> not work when the password is obfuscated and set through the 
> org.apache.manifoldcf.login.password.obfuscated parameter of the 
> properties.xml file.
> Alphanumeric characters work fine.
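The ticket does not state the root cause. One common source of this symptom, offered here only as an assumption, is converting between a `String` and bytes with different charsets on the two sides of an encode/decode step. The sketch uses Base64 as a hypothetical stand-in for the real obfuscation (which is not shown in this thread) and shows why ASCII passwords survive such a mismatch while non-ASCII ones do not.

```java
import java.nio.charset.Charset;
import java.util.Base64;

final class PasswordCharsetRoundTrip {
    private PasswordCharsetRoundTrip() {}

    /**
     * Encodes the password to bytes with one charset, Base64-wraps it (standing in
     * for obfuscation), then decodes the bytes with a possibly different charset.
     */
    static String roundTrip(String password, Charset encodeWith, Charset decodeWith) {
        String stored = Base64.getEncoder().encodeToString(password.getBytes(encodeWith));
        byte[] raw = Base64.getDecoder().decode(stored);
        return new String(raw, decodeWith);
    }
}
```

Passing the same explicit charset (for example `StandardCharsets.UTF_8`) on both sides, rather than relying on the platform default, removes this class of bug.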



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (CONNECTORS-1667) New Tika Service Connector

2021-05-04 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17339141#comment-17339141
 ] 

Julien Massiera edited comment on CONNECTORS-1667 at 5/4/21, 4:50 PM:
--

R1889497 on branch CONNECTORS-1667


was (Author: julienfl):
R1889497 on branch CONNECTORS-1667

> New Tika Service Connector
> --
>
> Key: CONNECTORS-1667
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1667
> Project: ManifoldCF
>  Issue Type: New Feature
>  Components: Tika service connector
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
>
> The current Tika Service Connector uses the '/unpack/all' endpoint of a 
> Tika Server. This endpoint is not optimal when we only need to extract a 
> document's metadata and content. We should develop a new connector based on 
> the 'rmeta' endpoint, which is better suited to our needs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CONNECTORS-1667) New Tika Service Connector

2021-05-04 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17339141#comment-17339141
 ] 

Julien Massiera commented on CONNECTORS-1667:
-

R1889497 on branch CONNECTORS-1667

> New Tika Service Connector
> --
>
> Key: CONNECTORS-1667
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1667
> Project: ManifoldCF
>  Issue Type: New Feature
>  Components: Tika service connector
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
>
> The current Tika Service Connector uses the '/unpack/all' endpoint of a 
> Tika Server. This endpoint is not optimal when we only need to extract a 
> document's metadata and content. We should develop a new connector based on 
> the 'rmeta' endpoint, which is better suited to our needs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CONNECTORS-1667) New Tika Service Connector

2021-05-04 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1667:
---

 Summary: New Tika Service Connector
 Key: CONNECTORS-1667
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1667
 Project: ManifoldCF
  Issue Type: New Feature
  Components: Tika service connector
Reporter: Julien Massiera
Assignee: Julien Massiera


The current Tika Service Connector uses the '/unpack/all' endpoint of a 
Tika Server. This endpoint is not optimal when we only need to extract a 
document's metadata and content. We should develop a new connector based on 
the 'rmeta' endpoint, which is better suited to our needs.
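For reference, a request to the 'rmeta' endpoint can be sketched with the JDK HTTP client. The sketch assumes a Tika Server on its default port 9998 and is not the connector's implementation; it only shows the request shape (a `PUT` of the raw document bytes) that an rmeta-based connector would send.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class TikaRmetaRequest {
    private TikaRmetaRequest() {}

    /**
     * Builds a PUT request that sends the raw document to Tika Server's /rmeta endpoint.
     * The response is a JSON array with one metadata map per (embedded) document;
     * the extracted content is carried in each map under the "X-TIKA:content" key.
     */
    static HttpRequest build(String tikaBaseUrl, byte[] document) {
        return HttpRequest.newBuilder(URI.create(tikaBaseUrl + "/rmeta"))
            .header("Accept", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
            .build();
    }
}
```

Because a single call returns metadata and content together, the connector avoids unpacking embedded resources it does not need, which is the motivation stated in the ticket.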



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CONNECTORS-1665) WebConnector: Add activity records for excluded URLs

2021-03-10 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17298820#comment-17298820
 ] 

Julien Massiera commented on CONNECTORS-1665:
-

Patch available

> WebConnector: Add activity records for excluded URLs 
> -
>
> Key: CONNECTORS-1665
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1665
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Web connector
>Affects Versions: ManifoldCF 2.18
>Reporter: Julien Massiera
>Priority: Trivial
> Fix For: ManifoldCF 2.19
>
> Attachments: patch-CONNECTORS-1665
>
>
> It would be useful to add activity records in the WebConnector to keep track 
> of URLs that are excluded because they match an exclude filter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CONNECTORS-1665) WebConnector: Add activity records for excluded URLs

2021-03-10 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1665:
---

 Summary: WebConnector: Add activity records for excluded URLs 
 Key: CONNECTORS-1665
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1665
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Web connector
Affects Versions: ManifoldCF 2.18
Reporter: Julien Massiera


It would be useful to add activity records in the WebConnector to keep track 
of URLs that are excluded because they match an exclude filter.
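A sketch of the requested behavior: when a URL matches an exclude filter, record why it was skipped instead of dropping it silently. The activity list, the message format, and the use of `Matcher.find()` are assumptions for illustration; the Web connector's own activity-recording API and matching semantics may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class ExcludeFilterWithActivity {
    private ExcludeFilterWithActivity() {}

    /**
     * Returns the URLs that pass the exclude filters. Every excluded URL leaves a
     * human-readable entry in the activity list, naming the filter it matched.
     */
    static List<String> applyExcludes(List<String> urls, List<Pattern> excludes, List<String> activity) {
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            Pattern matched = null;
            for (Pattern p : excludes) {
                if (p.matcher(url).find()) {
                    matched = p;
                    break;
                }
            }
            if (matched == null) {
                kept.add(url);
            } else {
                activity.add("EXCLUDED " + url + " (matched exclude filter '" + matched.pattern() + "')");
            }
        }
        return kept;
    }
}
```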



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (CONNECTORS-1656) HTML extractor produces invalid XML

2021-02-24 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera updated CONNECTORS-1656:

Comment: was deleted

(was: Hello,
I am currently away and will be back on Monday, February 22, 2021. If you have
any questions, please write to: cedric [dot] Ulmer [att] francelabs [dot] com

Best regards,
Julien Massiera
+

Hi,
I will be out of the office until Sunday, February 21 (inclusive). For any
questions, please contact cedric [dot] Ulmer [att] francelabs [dot] com
)

> HTML extractor produces invalid XML
> ---
>
> Key: CONNECTORS-1656
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1656
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: HTML extractor
>Affects Versions: ManifoldCF 2.17
>Reporter: Julien Massiera
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.19
>
> Attachments: patch-CONNECTORS-1656
>
>
> The HTML extractor connector produces a valid HTML document (when the 'Strip
> HTML' option is disabled) but not valid XML (some tags, such as img, have no
> closing tag), which is problematic in some cases. For example, when Tika is
> used downstream, it processes the document as XML and most of the time a
> parse exception is raised.



[jira] [Commented] (CONNECTORS-1656) HTML extractor produces invalid XML

2021-02-23 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17289516#comment-17289516
 ] 

Julien Massiera commented on CONNECTORS-1656:
-

Hello,
I am currently away and will be back on Monday, February 22, 2021. If you have
any questions, please write to: cedric [dot] Ulmer [att] francelabs [dot] com

Best regards,
Julien Massiera
+

Hi,
I will be out of the office until Sunday, February 21 (inclusive). For any
questions, please contact cedric [dot] Ulmer [att] francelabs [dot] com


> HTML extractor produces invalid XML
> ---
>
> Key: CONNECTORS-1656
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1656
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: HTML extractor
>Affects Versions: ManifoldCF 2.17
>Reporter: Julien Massiera
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.19
>
> Attachments: patch-CONNECTORS-1656
>
>
> The HTML extractor connector produces a valid HTML document (when the 'Strip
> HTML' option is disabled) but not valid XML (some tags, such as img, have no
> closing tag), which is problematic in some cases. For example, when Tika is
> used downstream, it processes the document as XML and most of the time a
> parse exception is raised.



[jira] [Commented] (CONNECTORS-1661) Admin UI does not handle UTF8 passwords

2021-02-23 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17289559#comment-17289559
 ] 

Julien Massiera commented on CONNECTORS-1661:
-

Hello,
I am currently away and will be back on Monday, February 22, 2021. If you have
any questions, please write to: cedric [dot] Ulmer [att] francelabs [dot] com

Best regards,
Julien Massiera
+

Hi,
I will be out of the office until Sunday, February 21 (inclusive). For any
questions, please contact cedric [dot] Ulmer [att] francelabs [dot] com


> Admin UI does not handle UTF8 passwords
> ---
>
> Key: CONNECTORS-1661
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1661
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: API
>Affects Versions: ManifoldCF 2.17
>Reporter: Julien Massiera
>Assignee: Kishore Kumar
>Priority: Critical
> Fix For: ManifoldCF 2.19
>
> Attachments: patch-CONNECTORS-1661.txt
>
>
> Setting non-alphanumeric UTF-8 characters in the admin user's password does
> not work when the password is obfuscated and set through the
> org.apache.manifoldcf.login.password.obfuscated parameter of the
> properties.xml file.
> Alphanumeric characters work fine.



[jira] [Commented] (CONNECTORS-1656) HTML extractor produces invalid XML

2021-02-12 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283832#comment-17283832
 ] 

Julien Massiera commented on CONNECTORS-1656:
-

[~kwri...@metacarta.com], is the patch OK?

> HTML extractor produces invalid XML
> ---
>
> Key: CONNECTORS-1656
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1656
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: HTML extractor
>Affects Versions: ManifoldCF 2.17
>Reporter: Julien Massiera
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF next
>
> Attachments: patch-CONNECTORS-1656
>
>
> The HTML extractor connector produces a valid HTML document (when the 'Strip
> HTML' option is disabled) but not valid XML (some tags, such as img, have no
> closing tag), which is problematic in some cases. For example, when Tika is
> used downstream, it processes the document as XML and most of the time a
> parse exception is raised.



[jira] [Commented] (CONNECTORS-1661) Admin UI does not handle UTF8 passwords

2021-02-12 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283831#comment-17283831
 ] 

Julien Massiera commented on CONNECTORS-1661:
-

[~kwri...@metacarta.com] and [~kishorekumar], is the patch OK for you?

> Admin UI does not handle UTF8 passwords
> ---
>
> Key: CONNECTORS-1661
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1661
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: API
>Affects Versions: ManifoldCF 2.17
>Reporter: Julien Massiera
>Assignee: Kishore Kumar
>Priority: Critical
> Fix For: ManifoldCF 2.19
>
> Attachments: patch-CONNECTORS-1661.txt
>
>
> Setting non-alphanumeric UTF-8 characters in the admin user's password does
> not work when the password is obfuscated and set through the
> org.apache.manifoldcf.login.password.obfuscated parameter of the
> properties.xml file.
> Alphanumeric characters work fine.



[jira] [Created] (CONNECTORS-1662) JIRA connector - NullPointerException after getCharSet method

2021-01-29 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1662:
---

 Summary: JIRA connector - NullPointerException after getCharSet 
method
 Key: CONNECTORS-1662
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1662
 Project: ManifoldCF
  Issue Type: Bug
  Components: JIRA connector
Affects Versions: ManifoldCF 2.17
Reporter: Julien Massiera


Sometimes the following exception is triggered on some documents during a crawl:


{code:java}
Error tossed: charset
java.lang.NullPointerException: charset
    at java.io.InputStreamReader.(InputStreamReader.java:115) ~[?:?]
    at org.apache.manifoldcf.crawler.connectors.jira.JiraSession.convertToString(JiraSession.java:183) ~[?:?]
    at org.apache.manifoldcf.crawler.connectors.jira.JiraSession.getRest(JiraSession.java:237) ~[?:?]
    at org.apache.manifoldcf.crawler.connectors.jira.JiraSession.getIssue(JiraSession.java:317) ~[?:?]
    at org.apache.manifoldcf.crawler.connectors.jira.JiraRepositoryConnector$GetIssueThread.run(JiraRepositoryConnector.java:1409) ~[?:?]
{code}
After investigation, it appears that the getCharSet method of the JiraSession
class may return a null charset, either when the underlying charset value is
null (there is no null check) or when an UnsupportedCharsetException is thrown.
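
A minimal Java sketch of a defensive charset lookup follows. It assumes that the charset name comes from the HTTP response and that UTF-8 is an acceptable fallback; both are assumptions, since the actual getCharSet code is not shown here. A null, blank, unsupported, or malformed name resolves to UTF-8 instead of reaching the InputStreamReader constructor as null. The convertToString helper is a hypothetical stand-in that only mirrors the method name in the stack trace.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

final class SafeCharset {

  /** Resolves a charset name and never returns null; UTF-8 is the assumed fallback. */
  static Charset resolve(String charsetName) {
    if (charsetName == null || charsetName.isBlank()) {
      return StandardCharsets.UTF_8;
    }
    try {
      return Charset.forName(charsetName.trim());
    } catch (UnsupportedCharsetException | IllegalCharsetNameException e) {
      return StandardCharsets.UTF_8;
    }
  }

  /** Hypothetical helper: reads a whole stream using the resolved charset. */
  static String convertToString(InputStream in, String charsetName) throws IOException {
    StringBuilder sb = new StringBuilder();
    try (Reader reader = new InputStreamReader(in, resolve(charsetName))) {
      char[] buf = new char[4096];
      int n;
      while ((n = reader.read(buf)) != -1) {
        sb.append(buf, 0, n);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    byte[] body = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
    // A null charset name no longer triggers a NullPointerException.
    System.out.println(convertToString(new ByteArrayInputStream(body), null));
    System.out.println(resolve("not-a-real-charset"));
  }
}
```

Whether silently falling back is preferable to logging a warning is a design choice; the point is that the reader never receives a null charset.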



[jira] [Commented] (CONNECTORS-1661) Admin UI does not handle UTF8 passwords

2021-01-26 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17272131#comment-17272131
 ] 

Julien Massiera commented on CONNECTORS-1661:
-

Hi [~kishorekumar], have you made progress on this issue? Or do you still need
additional information?

> Admin UI does not handle UTF8 passwords
> ---
>
> Key: CONNECTORS-1661
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1661
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: API
>Affects Versions: ManifoldCF 2.17
>Reporter: Julien Massiera
>Assignee: Kishore Kumar
>Priority: Critical
>
> Setting non-alphanumeric UTF-8 characters in the admin user's password does
> not work when the password is obfuscated and set through the
> org.apache.manifoldcf.login.password.obfuscated parameter of the
> properties.xml file.
> Alphanumeric characters work fine.



[jira] [Created] (CONNECTORS-1661) Admin UI does not handle UTF8 passwords

2021-01-04 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1661:
---

 Summary: Admin UI does not handle UTF8 passwords
 Key: CONNECTORS-1661
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1661
 Project: ManifoldCF
  Issue Type: Bug
  Components: API
Affects Versions: ManifoldCF 2.17
Reporter: Julien Massiera


Setting non-alphanumeric UTF-8 characters in the admin user's password does not
work when the password is obfuscated and set through the
org.apache.manifoldcf.login.password.obfuscated parameter of the properties.xml
file.

Alphanumeric characters work fine.
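
One plausible way non-ASCII passwords get corrupted is when text is turned into bytes one char at a time instead of through an explicit charset. The Java sketch below contrasts the two approaches. The hex-based "obfuscation" is hypothetical and is not ManifoldCF's actual algorithm, so it illustrates the class of bug rather than diagnosing this one. The euro sign (U+20AC) does not survive the per-char version because only its low 8 bits are kept.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical hex "obfuscation" used only to illustrate the encoding issue;
// it is NOT ManifoldCF's real algorithm.
final class PasswordEncodingSketch {

  private static final char[] HEX = "0123456789abcdef".toCharArray();

  /** Broken variant: keeps only the low 8 bits of each char. */
  static String obfuscateCharsAsBytes(String password) {
    byte[] bytes = new byte[password.length()];
    for (int i = 0; i < password.length(); i++) {
      bytes[i] = (byte) password.charAt(i); // truncates chars above U+00FF
    }
    return toHex(bytes);
  }

  /** Decoder matching the broken variant: one byte back to one char. */
  static String deobfuscateBytesAsChars(String hex) {
    StringBuilder sb = new StringBuilder();
    for (byte b : fromHex(hex)) {
      sb.append((char) (b & 0xFF));
    }
    return sb.toString();
  }

  /** Fixed variant: encode explicitly as UTF-8, whatever the platform default is. */
  static String obfuscateUtf8(String password) {
    return toHex(password.getBytes(StandardCharsets.UTF_8));
  }

  static String deobfuscateUtf8(String hex) {
    return new String(fromHex(hex), StandardCharsets.UTF_8);
  }

  private static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      sb.append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
    }
    return sb.toString();
  }

  private static byte[] fromHex(String hex) {
    byte[] out = new byte[hex.length() / 2];
    for (int i = 0; i < out.length; i++) {
      out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
    }
    return out;
  }

  public static void main(String[] args) {
    String password = "p\u00e4ss\u20acword"; // contains a-umlaut and the euro sign
    System.out.println(password.equals(deobfuscateBytesAsChars(obfuscateCharsAsBytes(password)))); // false
    System.out.println(password.equals(deobfuscateUtf8(obfuscateUtf8(password))));                 // true
  }
}
```

Note that the a-umlaut survives the broken variant (it fits in one byte), which is why such bugs can go unnoticed until a character above U+00FF is used.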



[jira] [Commented] (CONNECTORS-1657) Web connector - Handle sitemap instruction in robot.txt

2020-10-22 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17219079#comment-17219079
 ] 

Julien Massiera commented on CONNECTORS-1657:
-

Yes, a warning in the log, but an ERROR in the simple history. We should at
least change the return code of the activity, don't you agree?

> Web connector - Handle sitemap instruction in robot.txt
> ---
>
> Key: CONNECTORS-1657
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1657
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Web connector
>Affects Versions: ManifoldCF 2.17
>Reporter: Julien Massiera
>Priority: Major
>
> Currently the web connector does not understand a robots.txt file that
> points to a sitemap. For example, for the site [https://www.persee.fr], the
> following error can be found in the simple history:
> Unknown robots.txt line: 'Sitemap: https://www.persee.fr/sitemap.xml'
>  



[jira] [Created] (CONNECTORS-1657) Web connector - Handle sitemap instruction in robot.txt

2020-10-22 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1657:
---

 Summary: Web connector - Handle sitemap instruction in robot.txt
 Key: CONNECTORS-1657
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1657
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Web connector
Affects Versions: ManifoldCF 2.17
Reporter: Julien Massiera


Currently the web connector does not understand a robots.txt file that points
to a sitemap. For example, for the site [https://www.persee.fr], the following
error can be found in the simple history:

Unknown robots.txt line: 'Sitemap: https://www.persee.fr/sitemap.xml'
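
A minimal Java sketch of recognizing the Sitemap directive instead of logging it as an unknown line follows. The class and method names are hypothetical. Matching the field name case-insensitively, ignoring '#' comments, and taking the rest of the line as the URL follow common robots.txt practice (RFC 9309 lists sitemaps as an example of other records that a crawler may interpret).

```java
import java.util.ArrayList;
import java.util.List;

final class RobotsSitemapSketch {

  /**
   * Collects the URLs of all "Sitemap:" lines in a robots.txt body.
   * Field names are matched case-insensitively; text after '#' is ignored.
   */
  static List<String> extractSitemaps(String robotsTxt) {
    List<String> sitemaps = new ArrayList<>();
    for (String rawLine : robotsTxt.split("\r?\n|\r")) {
      String line = rawLine;
      int hash = line.indexOf('#');
      if (hash >= 0) {
        line = line.substring(0, hash); // drop trailing comment
      }
      int colon = line.indexOf(':'); // first colon separates field from value
      if (colon < 0) {
        continue;
      }
      String field = line.substring(0, colon).trim();
      String value = line.substring(colon + 1).trim(); // the URL keeps its own colons
      if (field.equalsIgnoreCase("sitemap") && !value.isEmpty()) {
        sitemaps.add(value);
      }
    }
    return sitemaps;
  }

  public static void main(String[] args) {
    String robots = "User-agent: *\nDisallow: /admin/\nSitemap: https://www.persee.fr/sitemap.xml\n";
    System.out.println(extractSitemaps(robots)); // [https://www.persee.fr/sitemap.xml]
  }
}
```

The collected URLs could then be queued as seeds or simply ignored without an error; which of the two the connector should do is left open in the issue.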

 



[jira] [Commented] (CONNECTORS-1656) HTML extractor produces invalid XML

2020-10-21 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218358#comment-17218358
 ] 

Julien Massiera commented on CONNECTORS-1656:
-

Hi [~kwri...@metacarta.com],

The document produced identifies itself as XHTML. But even if it were HTML, the
default HTML parser of Tika uses SAX to parse documents.
Here is the configuration of the Tika HTML parser (default configuration):

HtmlParser

Class: org.apache.tika.parser.html.HtmlParser

Mime Types:

text/html
application/vnd.wap.xhtml+xml
application/x-asp
application/xhtml+xml

So, since it handles both HTML and XHTML, the processed files have to be valid
XML anyway.

> HTML extractor produces invalid XML
> ---
>
> Key: CONNECTORS-1656
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1656
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: HTML extractor
>Affects Versions: ManifoldCF 2.17
>Reporter: Julien Massiera
>Assignee: Karl Wright
>Priority: Major
>
> The HTML extractor connector produces a valid HTML document (when the 'Strip
> HTML' option is disabled) but not valid XML (some tags, such as img, have no
> closing tag), which is problematic in some cases. For example, when Tika is
> used downstream, it processes the document as XML and most of the time a
> parse exception is raised.



[jira] [Created] (CONNECTORS-1656) HTML extractor produces invalid XML

2020-10-20 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1656:
---

 Summary: HTML extractor produces invalid XML
 Key: CONNECTORS-1656
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1656
 Project: ManifoldCF
  Issue Type: Bug
  Components: HTML extractor
Affects Versions: ManifoldCF 2.17
Reporter: Julien Massiera


The HTML extractor connector produces a valid HTML document (when the 'Strip
HTML' option is disabled) but not valid XML (some tags, such as img, have no
closing tag), which is problematic in some cases. For example, when Tika is used
downstream, it processes the document as XML and most of the time a parse
exception is raised.
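
To show why the unclosed tags matter, the Java sketch below feeds two fragments to the JDK's built-in SAX parser (javax.xml.parsers): one with an HTML-style <img> tag and one with the XHTML self-closing form <img/>. Only the second is well-formed XML. This exercises a strict XML parser only; it does not reproduce Tika's own HTML parsing pipeline.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class WellFormedCheck {

  /** Returns true if the input parses as well-formed XML with a strict SAX parser. */
  static boolean isWellFormedXml(String document) {
    try {
      SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
      parser.parse(new ByteArrayInputStream(document.getBytes(StandardCharsets.UTF_8)),
          new DefaultHandler()); // DefaultHandler rethrows fatal errors
      return true;
    } catch (SAXException e) {
      return false; // not well-formed, e.g. an element that is never closed
    } catch (Exception e) {
      throw new IllegalStateException(e); // parser configuration or I/O problem
    }
  }

  public static void main(String[] args) {
    String html = "<html><body><img src=\"a.png\"><p>text</p></body></html>";
    String xhtml = "<html><body><img src=\"a.png\"/><p>text</p></body></html>";
    System.out.println(isWellFormedXml(html));  // false
    System.out.println(isWellFormedXml(xhtml)); // true
  }
}
```

In the first fragment the parser reaches </body> while img is still open, which is exactly the kind of error a downstream XML consumer raises.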



[jira] [Commented] (CONNECTORS-1655) Web connector - UnsupportedEncodingException utf-8

2020-10-16 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215375#comment-17215375
 ] 

Julien Massiera commented on CONNECTORS-1655:
-

Thanks for the fix!

> Web connector - UnsupportedEncodingException utf-8
> --
>
> Key: CONNECTORS-1655
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1655
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.17
>Reporter: Julien Massiera
>Assignee: Karl Wright
>Priority: Critical
> Fix For: ManifoldCF 2.18
>
>
> When crawling some sites (for instance [http://www.antibes-juanlespins.com/]),
> the job manages to index some documents, but then stops with the following
> error:
> Error: IO error: utf-8; filename=rseventspro_rss20_56.xml
> Here is one of the MCF stack traces:
> Exception tossed: IO error: utf-8; filename=rseventspro_rss20_56.xml
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: IO error: utf-8; filename=rseventspro_rss20_56.xml
> at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleXML(WebcrawlerConnector.java:4203) ~[?:?]
> at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.extractLinks(WebcrawlerConnector.java:3855) ~[?:?]
> at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:746) ~[?:?]
> at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
> Caused by: java.io.UnsupportedEncodingException: utf-8; filename=rseventspro_rss20_56.xml
> at sun.nio.cs.StreamDecoder.forInputStreamReader(StreamDecoder.java:71) ~[?:1.8.0_212]
> at java.io.InputStreamReader.(InputStreamReader.java:100) ~[?:1.8.0_212]
> at org.apache.manifoldcf.connectorcommon.fuzzyml.DecodingByteReceiver.dealWithBytes(DecodingByteReceiver.java:47) ~[?:?]
> at org.apache.manifoldcf.connectorcommon.fuzzyml.BOMEncodingDetector.dealWithRemainder(BOMEncodingDetector.java:250) ~[?:?]
> at org.apache.manifoldcf.connectorcommon.fuzzyml.SingleByteReceiver.dealWithBytes(SingleByteReceiver.java:52) ~[?:?]
> at org.apache.manifoldcf.connectorcommon.fuzzyml.Parser.parseWithCharsetDetection(Parser.java:74) ~[?:?]
> at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleXML(WebcrawlerConnector.java:4174) ~[?:?]
> ... 3 more



[jira] [Commented] (CONNECTORS-1655) Web connector - UnsupportedEncodingException utf-8

2020-10-16 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215238#comment-17215238
 ] 

Julien Massiera commented on CONNECTORS-1655:
-

Hi [~kwri...@metacarta.com], I am using the official OpenJDK 11 package
installed from the Debian repository:
openjdk version "11.0.8" 2020-07-14
OpenJDK Runtime Environment 18.9 (build 11.0.8+10)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.8+10, mixed mode)

> Web connector - UnsupportedEncodingException utf-8
> --
>
> Key: CONNECTORS-1655
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1655
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.17
>Reporter: Julien Massiera
>Priority: Critical
>
> When crawling some sites (for instance [http://www.antibes-juanlespins.com/]),
> the job manages to index some documents, but then stops with the following
> error:
> Error: IO error: utf-8; filename=rseventspro_rss20_56.xml
> Here is one of the MCF stack traces:
> Exception tossed: IO error: utf-8; filename=rseventspro_rss20_56.xml
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: IO error: utf-8; filename=rseventspro_rss20_56.xml
> at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleXML(WebcrawlerConnector.java:4203) ~[?:?]
> at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.extractLinks(WebcrawlerConnector.java:3855) ~[?:?]
> at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:746) ~[?:?]
> at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
> Caused by: java.io.UnsupportedEncodingException: utf-8; filename=rseventspro_rss20_56.xml
> at sun.nio.cs.StreamDecoder.forInputStreamReader(StreamDecoder.java:71) ~[?:1.8.0_212]
> at java.io.InputStreamReader.(InputStreamReader.java:100) ~[?:1.8.0_212]
> at org.apache.manifoldcf.connectorcommon.fuzzyml.DecodingByteReceiver.dealWithBytes(DecodingByteReceiver.java:47) ~[?:?]
> at org.apache.manifoldcf.connectorcommon.fuzzyml.BOMEncodingDetector.dealWithRemainder(BOMEncodingDetector.java:250) ~[?:?]
> at org.apache.manifoldcf.connectorcommon.fuzzyml.SingleByteReceiver.dealWithBytes(SingleByteReceiver.java:52) ~[?:?]
> at org.apache.manifoldcf.connectorcommon.fuzzyml.Parser.parseWithCharsetDetection(Parser.java:74) ~[?:?]
> at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleXML(WebcrawlerConnector.java:4174) ~[?:?]
> ... 3 more



[jira] [Created] (CONNECTORS-1655) Web connector - UnsupportedEncodingException utf-8

2020-10-15 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1655:
---

 Summary: Web connector - UnsupportedEncodingException utf-8
 Key: CONNECTORS-1655
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1655
 Project: ManifoldCF
  Issue Type: Bug
  Components: Web connector
Affects Versions: ManifoldCF 2.17
Reporter: Julien Massiera


When crawling some sites (for instance [http://www.antibes-juanlespins.com/]),
the job manages to index some documents, but then stops with the following
error:
Error: IO error: utf-8; filename=rseventspro_rss20_56.xml

Here is one of the MCF stack traces:
Exception tossed: IO error: utf-8; filename=rseventspro_rss20_56.xml
org.apache.manifoldcf.core.interfaces.ManifoldCFException: IO error: utf-8; filename=rseventspro_rss20_56.xml
at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleXML(WebcrawlerConnector.java:4203) ~[?:?]
at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.extractLinks(WebcrawlerConnector.java:3855) ~[?:?]
at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:746) ~[?:?]
at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
Caused by: java.io.UnsupportedEncodingException: utf-8; filename=rseventspro_rss20_56.xml
at sun.nio.cs.StreamDecoder.forInputStreamReader(StreamDecoder.java:71) ~[?:1.8.0_212]
at java.io.InputStreamReader.(InputStreamReader.java:100) ~[?:1.8.0_212]
at org.apache.manifoldcf.connectorcommon.fuzzyml.DecodingByteReceiver.dealWithBytes(DecodingByteReceiver.java:47) ~[?:?]
at org.apache.manifoldcf.connectorcommon.fuzzyml.BOMEncodingDetector.dealWithRemainder(BOMEncodingDetector.java:250) ~[?:?]
at org.apache.manifoldcf.connectorcommon.fuzzyml.SingleByteReceiver.dealWithBytes(SingleByteReceiver.java:52) ~[?:?]
at org.apache.manifoldcf.connectorcommon.fuzzyml.Parser.parseWithCharsetDetection(Parser.java:74) ~[?:?]
at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleXML(WebcrawlerConnector.java:4174) ~[?:?]
... 3 more
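
The exception message reports the charset name as "utf-8; filename=rseventspro_rss20_56.xml", which suggests that the parameters following charset= in the Content-Type header were not split off before the name was passed to InputStreamReader. That is an inference from the message, not a confirmed diagnosis. A minimal Java sketch of parameter-aware extraction follows; the class name and the example header are hypothetical, and quoted values containing ';' are not handled.

```java
import java.util.Locale;
import java.util.Optional;

final class ContentTypeCharset {

  /**
   * Extracts the charset parameter from a Content-Type header value, e.g.
   * "text/xml; charset=utf-8; filename=feed.xml" yields "utf-8".
   * Simplification: a ';' inside a quoted parameter value is not supported.
   */
  static Optional<String> charsetOf(String contentType) {
    if (contentType == null) {
      return Optional.empty();
    }
    String[] parts = contentType.split(";");
    // parts[0] is the media type; the remaining parts are name=value parameters.
    for (int i = 1; i < parts.length; i++) {
      String param = parts[i].trim();
      int eq = param.indexOf('=');
      if (eq < 0) {
        continue;
      }
      String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
      String value = param.substring(eq + 1).trim();
      if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
        value = value.substring(1, value.length() - 1); // strip surrounding quotes
      }
      if (name.equals("charset") && !value.isEmpty()) {
        return Optional.of(value);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Hypothetical header shape that would produce the charset string seen above.
    System.out.println(charsetOf("text/xml; charset=utf-8; filename=rseventspro_rss20_56.xml")); // Optional[utf-8]
    System.out.println(charsetOf("text/html")); // Optional.empty
  }
}
```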



[jira] [Commented] (CONNECTORS-1105) Add maven delivery targets to poms

2020-06-24 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143855#comment-17143855
 ] 

Julien Massiera commented on CONNECTORS-1105:
-

[~kwri...@metacarta.com], [~schuch] any news about this ticket? I am really
interested in having at least the MCF jars pushed to the Maven Central repository.

> Add maven delivery targets to poms
> --
>
> Key: CONNECTORS-1105
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1105
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: ManifoldCF 1.8
>Reporter: Karl Wright
>Assignee: Markus Schuch
>Priority: Major
> Fix For: ManifoldCF next
>
>
> We've been asked by some developers to deliver MCF jars and wars to the Maven
> Central repository. This ticket represents that work.



[jira] [Resolved] (CONNECTORS-1637) New Confluence connector

2020-06-11 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera resolved CONNECTORS-1637.
-
Fix Version/s: ManifoldCF 2.16
   Resolution: Fixed

> New Confluence connector
> 
>
> Key: CONNECTORS-1637
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1637
> Project: ManifoldCF
>  Issue Type: New Feature
>  Components: Confluence connector
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF 2.16
>
>
> We need to address 3 main issues with the current Confluence connector:
> - it does not implement security correctly
> - it has performance problems when handling a huge dataset
> - the version string it generates for documents is not sufficient to detect
> all changes
> To resolve some of these issues, the connector has to use the new Confluence
> API, which is available starting with v6. For that reason, we need to release
> a new connector.



[jira] [Resolved] (CONNECTORS-1645) Identical login regex rules bug

2020-06-11 Thread Julien Massiera (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Massiera resolved CONNECTORS-1645.
-
Fix Version/s: ManifoldCF 2.17
   Resolution: Fixed

> Identical login regex rules bug
> ---
>
> Key: CONNECTORS-1645
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1645
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.12
>Reporter: Julien Massiera
>Assignee: Karl Wright
>Priority: Critical
> Fix For: ManifoldCF 2.17
>
>
> If a login sequence involves the same URL for different login types (e.g. form
> and redirect), you cannot configure the same regex for each of them; otherwise
> they override each other and only the last configured one is taken into
> account by the login sequence.
> Currently, the only workaround is to write a different regex for each login
> type, each of which matches the same URL.



[jira] [Commented] (CONNECTORS-1645) Identical login regex rules bug

2020-06-10 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130915#comment-17130915
 ] 

Julien Massiera commented on CONNECTORS-1645:
-

[~kwri...@metacarta.com] the "findNextOne" method in the LoginParameterIterator
class has a problem: it always returns the same currentOne, so the "hasNext"
method always returns "true". This results in an endless loop, and I suppose it
explains why the unit tests of this connector never end.
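
The symptom described above is typical of a look-ahead iterator whose cursor never moves. A minimal Java sketch of the correct pattern follows, using a hypothetical regex-filtered iterator rather than the actual LoginParameterIterator code: findNextOne always advances past the element it examines, and next() clears currentOne after handing it out, so hasNext() eventually returns false.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.regex.Pattern;

/** Hypothetical iterator over parameter names matching a regex (not the real ManifoldCF class). */
final class MatchingParameterIterator implements Iterator<String> {

  private final List<String> names;
  private final Pattern pattern;
  private int position = 0;          // index of the next element to examine
  private String currentOne = null;  // look-ahead; null means "nothing pending"

  MatchingParameterIterator(List<String> names, Pattern pattern) {
    this.names = names;
    this.pattern = pattern;
  }

  /** Advances the cursor until a match is found or the list is exhausted. */
  private void findNextOne() {
    if (currentOne != null) {
      return; // a look-ahead is already pending
    }
    while (position < names.size()) {
      String candidate = names.get(position++); // always move forward
      if (pattern.matcher(candidate).matches()) {
        currentOne = candidate;
        return;
      }
    }
  }

  @Override
  public boolean hasNext() {
    findNextOne();
    return currentOne != null;
  }

  @Override
  public String next() {
    findNextOne();
    if (currentOne == null) {
      throw new NoSuchElementException();
    }
    String result = currentOne;
    currentOne = null; // consume it, so the next call looks further ahead
    return result;
  }

  public static void main(String[] args) {
    Iterator<String> it = new MatchingParameterIterator(
        List.of("user", "pass", "userid", "token"), Pattern.compile("user.*"));
    while (it.hasNext()) {
      System.out.println(it.next()); // prints "user" then "userid"; the loop terminates
    }
  }
}
```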

> Identical login regex rules bug
> ---
>
> Key: CONNECTORS-1645
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1645
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.12
>Reporter: Julien Massiera
>Assignee: Karl Wright
>Priority: Critical
>
> If a login sequence involves the same URL for different login types (e.g. form
> and redirect), you cannot configure the same regex for each of them; otherwise
> they override each other and only the last configured one is taken into
> account by the login sequence.
> Currently, the only workaround is to write a different regex for each login
> type, each of which matches the same URL.



[jira] [Created] (CONNECTORS-1645) Identical login regex rules bug

2020-06-02 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1645:
---

 Summary: Identical login regex rules bug
 Key: CONNECTORS-1645
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1645
 Project: ManifoldCF
  Issue Type: Bug
  Components: Web connector
Affects Versions: ManifoldCF 2.12
Reporter: Julien Massiera


If a login sequence involves the same URL for different login types (e.g. form
and redirect), you cannot configure the same regex for each of them; otherwise
they override each other and only the last configured one is taken into account
by the login sequence.

Currently, the only workaround is to write a different regex for each login type,
each of which matches the same URL.
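
The override is what one would expect if the login rules were stored in a map keyed by the URL regex alone. A minimal Java sketch follows, assuming such a map (the actual storage in the web connector is not shown in this issue): grouping rules by regex and then by login type lets the same URL pattern carry both a form rule and a redirect rule. The class, the enum, and the example URL are hypothetical.

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of login rules; not the web connector's actual data structure.
final class LoginRulesSketch {

  enum LoginType { FORM, REDIRECT }

  /** Rules grouped by URL regex, then by login type: one rule per (regex, type) pair. */
  private final Map<String, Map<LoginType, String>> rules = new LinkedHashMap<>();

  void addRule(String urlRegex, LoginType type, String ruleData) {
    rules.computeIfAbsent(urlRegex, regex -> new EnumMap<>(LoginType.class)).put(type, ruleData);
  }

  String ruleFor(String urlRegex, LoginType type) {
    Map<LoginType, String> perType = rules.get(urlRegex);
    return perType == null ? null : perType.get(type);
  }

  int ruleCount() {
    int count = 0;
    for (Map<LoginType, String> perType : rules.values()) {
      count += perType.size();
    }
    return count;
  }

  public static void main(String[] args) {
    String regex = "^https://intranet\\.example\\.com/login"; // hypothetical URL pattern

    // Keyed by regex only: the second put replaces the first rule.
    Map<String, String> byRegexOnly = new LinkedHashMap<>();
    byRegexOnly.put(regex, "form rule");
    byRegexOnly.put(regex, "redirect rule");
    System.out.println(byRegexOnly.size()); // 1

    // Keyed by regex and login type: both rules are kept.
    LoginRulesSketch sketch = new LoginRulesSketch();
    sketch.addRule(regex, LoginType.FORM, "form rule");
    sketch.addRule(regex, LoginType.REDIRECT, "redirect rule");
    System.out.println(sketch.ruleCount()); // 2
  }
}
```

Re-adding a rule for an existing (regex, type) pair still replaces it, which seems like the intended behaviour for editing a rule.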



[jira] [Commented] (CONNECTORS-1637) New Confluence connector

2020-03-06 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053598#comment-17053598
 ] 

Julien Massiera commented on CONNECTORS-1637:
-

[~kwri...@metacarta.com] I fixed the Ant build. The branch is done!

> New Confluence connector
> 
>
> Key: CONNECTORS-1637
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1637
> Project: ManifoldCF
>  Issue Type: New Feature
>  Components: Confluence connector
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
>
> We need to address 3 main issues with the current Confluence connector:
> - it does not implement security correctly
> - it has performance problems when handling a huge dataset
> - the version string it generates for documents is not sufficient to detect
> all changes
> To resolve some of these issues, the connector has to use the new Confluence
> API, which is available starting with v6. For that reason, we need to release
> a new connector.


