[jira] [Commented] (TIKA-3094) Apache Tika fails to extract text for pptx extension.

2020-04-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094735#comment-17094735 ] Tim Allison commented on TIKA-3094: --- Thank you, [~bob]! > Apache Tika fails to extract text for pptx

[jira] [Commented] (TIKA-3094) Apache Tika fails to extract text for pptx extension.

2020-04-28 Thread Abhishek Chauhan (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094638#comment-17094638 ] Abhishek Chauhan commented on TIKA-3094: Glad! Thanks for sharing this, [~bob].  > Apache Tika

[jira] [Commented] (TIKA-3094) Apache Tika fails to extract text for pptx extension.

2020-04-28 Thread Bob Paulin (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094614#comment-17094614 ] Bob Paulin commented on TIKA-3094: -- Thanks, [~abchauha]. The build process adds OSGi-specific headers so

[jira] [Commented] (TIKA-3094) Apache Tika fails to extract text for pptx extension.

2020-04-28 Thread Abhishek Chauhan (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094594#comment-17094594 ] Abhishek Chauhan commented on TIKA-3094: [~bob] Please find the .pptx file attached.  Just would

[jira] [Updated] (TIKA-3094) Apache Tika fails to extract text for pptx extension.

2020-04-28 Thread Abhishek Chauhan (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Chauhan updated TIKA-3094: --- Attachment: Sample PPT.pptx > Apache Tika fails to extract text for pptx extension. >

[jira] [Commented] (TIKA-3094) Apache Tika fails to extract text for pptx extension.

2020-04-28 Thread Bob Paulin (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094517#comment-17094517 ] Bob Paulin commented on TIKA-3094: -- If SparseBitSet is embedded in the tika-bundle that the library

[jira] [Commented] (TIKA-3094) Apache Tika fails to extract text for pptx extension.

2020-04-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094466#comment-17094466 ] Tim Allison commented on TIKA-3094: --- [~bobpaulin], is this something we can fix within Tika or do we

[jira] [Commented] (TIKA-3094) Apache Tika fails to extract text for pptx extension.

2020-04-28 Thread Abhishek Chauhan (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094407#comment-17094407 ] Abhishek Chauhan commented on TIKA-3094: [~tallison] We are calling it using the OSGi bundle. Also, the

[jira] [Commented] (TIKA-3094) Apache Tika fails to extract text for pptx extension.

2020-04-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094385#comment-17094385 ] Tim Allison commented on TIKA-3094: --- How are you calling Tika? Are you using the OSGi bundle or calling

Re: Issue with > 200% CPU after bulk usage

2020-04-28 Thread hans.meijer
Hi, I ran into the issue again with Tika/Java taking more CPU, up to 200+% CPU. The scenario is that I have 3-4 long-running processes calling Tika server (version 1.24), and occasionally 3-4 additional shorter processes (2-3 hours) start up and call the Tika server. The scenario is being run

[jira] [Updated] (TIKA-3094) Apache Tika fails to extract text for pptx extension.

2020-04-28 Thread Abhishek Chauhan (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Chauhan updated TIKA-3094: --- Description: This is a regression from version 1.23 of Apache Tika. Text extraction for .pptx

[jira] [Created] (TIKA-3094) Apache Tika fails to extract text for pptx extension.

2020-04-28 Thread Abhishek Chauhan (Jira)
Abhishek Chauhan created TIKA-3094: -- Summary: Apache Tika fails to extract text for pptx extension. Key: TIKA-3094 URL: https://issues.apache.org/jira/browse/TIKA-3094 Project: Tika Issue