[jira] [Updated] (CONNECTORS-1699) Upgrade to Tika 2.x

2022-06-04 Thread Cihad Guzel (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cihad Guzel updated CONNECTORS-1699:

Description: 
Tika has released 2.x. We can support the new version instead of 1.x. There 
is a migration document here: 
[https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0]

The changes are listed here: 
[https://tika.apache.org/2.4.0/|https://tika.apache.org/2.3.0/]

  was:
Tika has a new version, 2.x. We can support the new version instead of 1.x. 
There is a migration document here: 
[https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0]

Tika has released 2.3.0. The changes are listed here: 
https://tika.apache.org/2.3.0/


> Upgrade to Tika 2.x
> ---
>
> Key: CONNECTORS-1699
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1699
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Tika extractor
>Affects Versions: ManifoldCF 2.21
>Reporter: Cihad Guzel
>Priority: Major
> Fix For: ManifoldCF next
>
>
> Tika has released 2.x. We can support the new version instead of 1.x. 
> There is a migration document here: 
> [https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0]
> The changes are listed here: 
> [https://tika.apache.org/2.4.0/|https://tika.apache.org/2.3.0/]
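One concrete part of such a migration, as we read the migration guide linked above, is that Tika 2.x stops emitting several legacy metadata aliases that 1.x produced alongside the Dublin Core names, so an existing field mapping keyed on, say, `Author` may silently stop matching. The Java sketch below rewrites the keys of such a mapping. `LegacyKeyMapper` is a hypothetical helper (not ManifoldCF or Tika code), and its four-entry table is an illustrative subset that should be checked against the migration guide before relying on it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of ManifoldCF or Tika: translates a few
// Tika 1.x legacy metadata names into the keys Tika 2.x emits instead.
// Illustrative subset only; verify against the Tika 2.0 migration guide.
final class LegacyKeyMapper {
    private static final Map<String, String> LEGACY_TO_TIKA2 = Map.of(
        "Author", "dc:creator",
        "title", "dc:title",
        "Creation-Date", "dcterms:created",
        "Last-Modified", "dcterms:modified");

    private LegacyKeyMapper() {}

    // Returns the Tika 2.x key for a legacy name, or the name unchanged
    // when it is not in the table (it may already be a 2.x key).
    static String toTika2Key(String key) {
        return LEGACY_TO_TIKA2.getOrDefault(key, key);
    }

    // Rewrites every key of a field mapping (metadata name -> target field),
    // preserving insertion order. If two legacy names collapse to the same
    // 2.x key, the later entry wins.
    static Map<String, String> migrate(Map<String, String> fieldMap) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fieldMap.entrySet()) {
            out.put(toTika2Key(e.getKey()), e.getValue());
        }
        return out;
    }
}
```

Keys that are not in the table pass through unchanged, so a mapping that has already been migrated can be fed through again safely.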



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (CONNECTORS-1667) New Tika Service Connector

2022-06-04 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17547939#comment-17547939
 ] 

Karl Wright commented on CONNECTORS-1667:
-

[~cguzel], this ticket is about an EXTERNAL service where Tika runs as a 
separate stand-alone process, and the connector communicates with it. I don't 
think there is any difference, from a service standpoint, between running Tika 
1.x or 2.x as that service; the protocol is likely the same, although I 
haven't researched it.

What you seem to be thinking is that the internal Tika connector should move to 
Tika 2.0. This is a major, major deal, because most of the connector 
dependencies we have to update are due to Tika. I looked at it and found we'd 
need 4-5 weeks of a dedicated individual to do the port. Are you volunteering? 
If so, I can advise you. Otherwise, we will be staying current with Tika 1.x 
releases for now, and that is all.
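Since the comment notes that the external service's protocol is probably the same whether the server runs Tika 1.x or 2.x, a connector may at least want to know which major version it is talking to (for example, to log it while that question is researched). A minimal sketch, assuming Tika Server's `GET /version` endpoint, which answers with a plain-text banner such as `Apache Tika 2.4.0`, and the default port 9998; `TikaServerVersion` is a hypothetical helper, not part of the connector.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: asks a running Tika Server which release it is.
// Assumes GET /version returns text such as "Apache Tika 2.4.0".
final class TikaServerVersion {
    private TikaServerVersion() {}

    // Builds the request without sending it (e.g. baseUrl "http://localhost:9998").
    static HttpRequest versionRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/version"))
            .header("Accept", "text/plain")
            .GET()
            .build();
    }

    // Extracts the major version from a banner like "Apache Tika 2.4.0":
    // takes the last space-separated token, then the part before the first dot.
    static int majorVersion(String banner) {
        String trimmed = banner.trim();
        String version = trimmed.substring(trimmed.lastIndexOf(' ') + 1);
        int dot = version.indexOf('.');
        return Integer.parseInt(dot < 0 ? version : version.substring(0, dot));
    }

    // Sends the request; this part needs a reachable server.
    static int fetchMajorVersion(HttpClient client, String baseUrl)
            throws IOException, InterruptedException {
        HttpResponse<String> resp =
            client.send(versionRequest(baseUrl), HttpResponse.BodyHandlers.ofString());
        if (resp.statusCode() != 200) {
            throw new IOException("Unexpected status " + resp.statusCode());
        }
        return majorVersion(resp.body());
    }
}
```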


> New Tika Service Connector
> --
>
> Key: CONNECTORS-1667
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1667
> Project: ManifoldCF
>  Issue Type: New Feature
>  Components: Tika service connector
>Reporter: Julien Massiera
>Assignee: Julien Massiera
>Priority: Major
> Fix For: ManifoldCF 2.20
>
>
> The current Tika Service Connector uses the '/unpack/all' endpoint of a 
> Tika Server. This endpoint is not optimal when we only need to extract a 
> document's metadata and content. We should develop a new connector based on 
> the 'rmeta' endpoint, which is better suited to our needs.
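For reference, a single call to the `rmeta` endpoint can be sketched with the JDK's `java.net.http` client. This assumes a Tika Server at `http://localhost:9998` (its default port) and the `/rmeta/text` variant, which returns a JSON array with one metadata object per document (the container plus each embedded file), with the extracted plain text under the `X-TIKA:content` key. `RmetaClient` is a hypothetical name; this is a sketch, not the code of the ManifoldCF Tika service connector.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client sketch: PUTs one document to Tika Server's /rmeta/text
// endpoint and returns the raw JSON (one metadata object per document,
// extracted text under "X-TIKA:content"). JSON parsing is left to a library.
final class RmetaClient {
    private RmetaClient() {}

    // Builds the request; contentType may be null to let Tika detect the type.
    static HttpRequest rmetaRequest(String baseUrl, byte[] document, String contentType) {
        HttpRequest.Builder b = HttpRequest.newBuilder(URI.create(baseUrl + "/rmeta/text"))
            .header("Accept", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofByteArray(document));
        if (contentType != null) {
            b.header("Content-Type", contentType);
        }
        return b.build();
    }

    // Sends the request; this part needs a reachable server.
    static String extract(HttpClient client, String baseUrl, byte[] document, String contentType)
            throws IOException, InterruptedException {
        HttpResponse<String> resp = client.send(
            rmetaRequest(baseUrl, document, contentType),
            HttpResponse.BodyHandlers.ofString());
        if (resp.statusCode() != 200) {
            throw new IOException("Tika Server returned HTTP " + resp.statusCode());
        }
        return resp.body();
    }
}
```

Compared with `/unpack/all`, which returns an archive of the unpacked files, this yields metadata and text for every embedded document in one JSON response, which is what the ticket means by "better suited".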


