[jira] [Commented] (CONNECTORS-1667) New Tika Service Connector

2022-06-08 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17551454#comment-17551454
 ] 

Julien Massiera commented on CONNECTORS-1667:
-

Hi [~cguzel], no, indeed the Tika service connector does not correctly handle 
Tika Server 2.x, because of the metadata keys. You should consider using the 
tika-service-rmeta-connector instead, which is better in terms of performance 
and stability and has been updated to be compatible with the latest version of 
Tika Server (see CONNECTORS-1703).

By the way, that is the only version of the Tika service connector I currently 
maintain because, as you said, the maintenance cost is very limited, and an 
external Tika is more reliable than an embedded one.
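
To illustrate the metadata-key issue mentioned above, here is a minimal sketch 
of a key-normalization step, assuming the incompatibility is that Tika Server 
2.x reports some metadata under different key names than 1.x. The mapping 
table and the function name are hypothetical and for illustration only; this 
is not the connector's actual code, and each key pair should be checked 
against the Tika 2.0.0 migration guide.

```python
# Hypothetical mapping from legacy (1.x-style) keys to namespaced (2.x-style)
# keys. ASSUMPTION: these pairs are illustrative and must be verified against
# the Tika 2.0.0 migration guide before use.
LEGACY_TO_NAMESPACED = {
    "title": "dc:title",
    "Author": "dc:creator",
    "X-Parsed-By": "X-TIKA:Parsed-By",
}


def normalize_metadata(metadata, key_map=LEGACY_TO_NAMESPACED):
    """Return a copy of metadata with legacy keys renamed via key_map."""
    # Keys that need no translation are copied first so that they take
    # precedence over a translated legacy key with the same target name.
    normalized = {k: v for k, v in metadata.items() if k not in key_map}
    for legacy_key, new_key in key_map.items():
        if legacy_key in metadata:
            # Only fill in the new key if it is not already present.
            normalized.setdefault(new_key, metadata[legacy_key])
    return normalized
```

With a step like this, downstream code could rely on a single set of key 
names regardless of which server version produced the response.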

 

> New Tika Service Connector
> --
>
> Key: CONNECTORS-1667
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1667
> Project: ManifoldCF
> Issue Type: New Feature
> Components: Tika service connector
> Reporter: Julien Massiera
> Assignee: Julien Massiera
> Priority: Major
> Fix For: ManifoldCF 2.20
>
>
> The current Tika Service Connector uses the '/unpack/all' endpoint of a 
> Tika Server. This endpoint is not optimal when only a document's metadata 
> and content need to be extracted. We should develop a new connector based 
> on the 'rmeta' endpoint, which is better suited to our needs.
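
As a rough illustration of the 'rmeta' approach described in the issue, the 
sketch below builds a PUT request for a Tika Server's /rmeta/text endpoint and 
splits the JSON response into metadata and text. The function names and the 
localhost:9998 URL in the usage note are assumptions made for this example, 
not the connector's implementation. The response is assumed to be a JSON list 
with the container document first and its text under the "X-TIKA:content" key.

```python
import json
import urllib.request


def build_rmeta_request(server_url: str, document: bytes) -> urllib.request.Request:
    """Build (but do not send) a PUT request for the /rmeta/text endpoint."""
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/rmeta/text",
        data=document,
        method="PUT",
        headers={"Accept": "application/json"},
    )


def split_rmeta_response(body: str):
    """Split an rmeta JSON response into (metadata, content).

    Only the first entry (the container document) is used; any further
    entries, which describe embedded documents, are ignored in this sketch.
    """
    entries = json.loads(body)
    if not entries:
        return {}, ""
    container = dict(entries[0])
    # The extracted text travels in the same object as the metadata.
    content = container.pop("X-TIKA:content", "") or ""
    return container, content.strip()
```

Against a running server, the request could be sent with 
urllib.request.urlopen(build_rmeta_request("http://localhost:9998", data)) and 
the decoded response body passed to split_rmeta_response. Metadata and content 
arrive in a single JSON response, so there is no archive to unpack as with 
'/unpack/all'.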



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (CONNECTORS-1667) New Tika Service Connector

2022-06-04 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17547939#comment-17547939
 ] 

Karl Wright commented on CONNECTORS-1667:
-

[~cguzel], this ticket is about an EXTERNAL service where Tika runs as a 
separate stand-alone process, and the connector communicates with it.  I don't 
think it makes any difference from a service standpoint whether you run Tika 
1.x or 2.x as that service. The protocol is likely the same, although I 
haven't researched it.

What you seem to be thinking is that the internal Tika connector should move 
to Tika 2.0.  This is a major, major deal because most of the connector 
dependencies we have to update are due to Tika.  I looked at it and found we'd 
need 4-5 weeks of work from a dedicated individual to do the port.  Are you 
volunteering?  If so I can advise you.  Otherwise we will be staying current 
with Tika 1.x releases for now, and that is all.




[jira] [Commented] (CONNECTORS-1667) New Tika Service Connector

2022-06-03 Thread Cihad Guzel (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17547201#comment-17547201
 ] 

Cihad Guzel commented on CONNECTORS-1667:
-

It looks like some changes are needed; see the Tika 2.0.0 migration guide: 
https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0



[jira] [Commented] (CONNECTORS-1667) New Tika Service Connector

2022-06-03 Thread Cihad Guzel (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17547200#comment-17547200
 ] 

Cihad Guzel commented on CONNECTORS-1667:
-

Hi Julien. 

Does this service support Tika 2.x?



[jira] [Commented] (CONNECTORS-1667) New Tika Service Connector

2021-05-04 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17339141#comment-17339141
 ] 

Julien Massiera commented on CONNECTORS-1667:
-

R1889497 on branch CONNECTORS-1667




--
This message was sent by Atlassian Jira
(v8.3.4#803005)