[
https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094735#comment-17094735
]
Tim Allison commented on TIKA-3094:
---
Thank you, [~bob]!
> Apache Tika fails to extract text for pptx
[
https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094638#comment-17094638
]
Abhishek Chauhan commented on TIKA-3094:
Glad! Thanks for sharing this, [~bob].
> Apache Tika
[
https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094614#comment-17094614
]
Bob Paulin commented on TIKA-3094:
--
Thanks, [~abchauha]. The build process adds OSGi-specific headers so
[
https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094594#comment-17094594
]
Abhishek Chauhan commented on TIKA-3094:
[~bob] Please find the .pptx file attached.
Just would
[
https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Abhishek Chauhan updated TIKA-3094:
---
Attachment: Sample PPT.pptx
> Apache Tika fails to extract text for pptx extension.
>
[
https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094517#comment-17094517
]
Bob Paulin commented on TIKA-3094:
--
If SparseBitSet is embedded in the tika-bundle that the library
[
https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094466#comment-17094466
]
Tim Allison commented on TIKA-3094:
---
[~bobpaulin], is this something we can fix within Tika or do we
[
https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094407#comment-17094407
]
Abhishek Chauhan commented on TIKA-3094:
[~tallison] We are calling Tika using the OSGi bundle.
Also, the
[
https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094385#comment-17094385
]
Tim Allison commented on TIKA-3094:
---
How are you calling Tika? Are you using the OSGi bundle or calling
Hi
I ran into the issue again with Tika/Java taking more CPU, up to 200+% CPU.
The scenario is that I have 3-4 long-running processes calling Tika server
(version 1.24), and occasionally 3-4 additional shorter processes (2-3 hours)
start up and call the Tika server.
The scenario is being run
[
https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Abhishek Chauhan updated TIKA-3094:
---
Description:
This is a regression from version 1.23 of Apache Tika. Text extraction for .pptx
Abhishek Chauhan created TIKA-3094:
--
Summary: Apache Tika fails to extract text for pptx extension.
Key: TIKA-3094
URL: https://issues.apache.org/jira/browse/TIKA-3094
Project: Tika
Issue