[jira] [Commented] (CONNECTORS-1667) New Tika Service Connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17551454#comment-17551454 ]

Julien Massiera commented on CONNECTORS-1667:
---------------------------------------------

Hi [~cguzel], no, the Tika service connector does not correctly handle Tika Server 2.x, because of the metadata keys.

You should consider using the tika-service-rmeta-connector instead, which is better in terms of performance and stability and has been updated to be compatible with the latest version of Tika Server (see CONNECTORS-1703).

By the way, that is currently the only version of the Tika service connector I maintain because, as you said, the maintenance cost is very limited, and having an external Tika instead of an embedded one is more reliable.

> New Tika Service Connector
> --------------------------
>
>                 Key: CONNECTORS-1667
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1667
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Tika service connector
>            Reporter: Julien Massiera
>            Assignee: Julien Massiera
>            Priority: Major
>             Fix For: ManifoldCF 2.20
>
>
> The current Tika Service Connector exploits the '/unpack/all' endpoint of a
> Tika Server. This endpoint is not optimal to only extract document's metadata
> and content. We should develop a new connector based on the 'rmeta' endpoint
> which is more suited for our needs.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
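The metadata-key difference mentioned in the comment above can be illustrated with a small sketch. It assumes a handful of legacy Tika 1.x key names and their Tika 2.x replacements (for example `title` becoming `dc:title`); the mapping table, the class name `TikaKeySketch`, and the `normalize` helper are hypothetical, are not taken from the connector's source, and should be checked against the Tika 2.0.0 migration guide before any real use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TikaKeySketch {

    // Illustrative and incomplete mapping (an assumption, not the connector's table):
    // legacy Tika 1.x key on the left, Tika 2.x key on the right.
    static final Map<String, String> LEGACY_TO_2X = Map.of(
            "title", "dc:title",
            "Author", "dc:creator",
            "Creation-Date", "dcterms:created",
            "Last-Modified", "dcterms:modified");

    // Returns a copy of the metadata in which known legacy keys are renamed.
    // Unknown keys pass through unchanged; if two input keys map to the same
    // output key, the first value encountered is kept.
    static Map<String, String> normalize(Map<String, String> metadata) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            String key = LEGACY_TO_2X.getOrDefault(e.getKey(), e.getKey());
            out.putIfAbsent(key, e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> legacy = new LinkedHashMap<>();
        legacy.put("title", "Quarterly report");
        legacy.put("Content-Type", "application/pdf");
        System.out.println(normalize(legacy));
    }
}
```

A consumer written against the 1.x names would look up `title` and find nothing in a 2.x response, which is the kind of mismatch a translation layer like this is meant to absorb.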
[jira] [Commented] (CONNECTORS-1667) New Tika Service Connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17547939#comment-17547939 ]

Karl Wright commented on CONNECTORS-1667:
-----------------------------------------

[~cguzel], this ticket is about an EXTERNAL service where Tika runs as a separate stand-alone process, and the connector communicates with it. I don't think there is any difference from a service standpoint whether you run Tika 1.x or 2.x as that service - the protocol is likely the same, although I haven't researched it.

What you seem to be thinking is that the internal Tika connector should go to Tika 2.0. This is a major, major deal because most of the connector dependencies we have to update are due to Tika. I looked at it and found we'd need 4-5 weeks of a dedicated individual to do the port. Are you volunteering? If so I can advise you. Otherwise we will be staying current with Tika 1.x releases for now, and that is all.
[jira] [Commented] (CONNECTORS-1667) New Tika Service Connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17547201#comment-17547201 ]

Cihad Guzel commented on CONNECTORS-1667:
-----------------------------------------

It looks like some changes are needed.

https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0
[jira] [Commented] (CONNECTORS-1667) New Tika Service Connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17547200#comment-17547200 ]

Cihad Guzel commented on CONNECTORS-1667:
-----------------------------------------

Hi Julien. Does this service support Tika 2.x?
[jira] [Commented] (CONNECTORS-1667) New Tika Service Connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17339141#comment-17339141 ]

Julien Massiera commented on CONNECTORS-1667:
---------------------------------------------

R1889497 on branch CONNECTORS-1667

> New Tika Service Connector
> --------------------------
>
>                 Key: CONNECTORS-1667
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1667
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Tika service connector
>            Reporter: Julien Massiera
>            Assignee: Julien Massiera
>            Priority: Major
>
> The current Tika Service Connector exploits the '/unpack/all' endpoint of a
> Tika Server. This endpoint is not optimal to only extract document's metadata
> and content. We should develop a new connector based on the 'rmeta' endpoint
> which is more suited for our needs.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
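For readers unfamiliar with the 'rmeta' endpoint named in the issue description, here is a minimal sketch of a client request, not the connector's actual code. It assumes a Tika Server reachable at a base URL such as `http://localhost:9998` (the server's usual default port) and uses the `/rmeta/text` variant, which returns a JSON array of metadata objects with the extracted text under the `X-TIKA:content` key. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RmetaRequestSketch {

    // Builds a PUT request that uploads the raw document bytes to /rmeta/text.
    // tikaBaseUrl is assumed to have no trailing slash.
    static HttpRequest buildRmetaRequest(String tikaBaseUrl, byte[] document) {
        return HttpRequest.newBuilder()
                .uri(URI.create(tikaBaseUrl + "/rmeta/text"))
                .header("Accept", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }

    // Sends the request and returns the JSON body as a string.
    // Requires a running Tika Server, so main() below does not call it.
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        HttpRequest request =
                buildRmetaRequest("http://localhost:9998", "hello".getBytes());
        System.out.println(request.method() + " " + request.uri());
    }
}
```

A single call of this kind yields both metadata and content, which is what the issue description asks of the endpoint.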