[jira] [Commented] (TIKA-3097) Out of memory while parsing docx

2020-05-07 Thread suchendra (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102248#comment-17102248 ] suchendra commented on TIKA-3097: Adding one more file samplefile.txt same issue OOM (not

[jira] [Updated] (TIKA-3097) Out of memory while parsing docx

2020-05-07 Thread suchendra (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] suchendra updated TIKA-3097: Attachment: samplefile.txt > Out of memory while parsing docx

[jira] [Commented] (TIKA-3098) Detecting embedded image

2020-05-07 Thread suchendra (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102246#comment-17102246 ] suchendra commented on TIKA-3098: Thank you [~tallison], will look into it. > Detecting

[jira] [Commented] (TIKA-3094) Apache Tika fails to extract text for pptx extension.

2020-05-07 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102182#comment-17102182 ] Hudson commented on TIKA-3094: SUCCESS: Integrated in Jenkins build Tika-trunk #1813 (See [

[jira] [Comment Edited] (TIKA-3094) Apache Tika fails to extract text for pptx extension.

2020-05-07 Thread Bob Paulin (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102137#comment-17102137 ] Bob Paulin edited comment on TIKA-3094 at 5/8/20, 1:02 AM: Look

[jira] [Commented] (TIKA-3094) Apache Tika fails to extract text for pptx extension.

2020-05-07 Thread Bob Paulin (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102137#comment-17102137 ] Bob Paulin commented on TIKA-3094: Looks like the jaxb error is not so much an issue wit

[jira] [Commented] (TIKA-3098) Detecting embedded image

2020-05-07 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101828#comment-17101828 ] Tim Allison commented on TIKA-3098: This is a pretty good example of using the Recursiv

[jira] [Comment Edited] (TIKA-3098) Detecting embedded image

2020-05-07 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101828#comment-17101828 ] Tim Allison edited comment on TIKA-3098 at 5/7/20, 4:01 PM: Th

[jira] [Commented] (TIKA-3097) Out of memory while parsing docx

2020-05-07 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101826#comment-17101826 ] Tim Allison commented on TIKA-3097: Sorry, I commented too soon. After more than a cou

[jira] [Comment Edited] (TIKA-3098) Detecting embedded image

2020-05-07 Thread suchendra (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101792#comment-17101792 ] suchendra edited comment on TIKA-3098 at 5/7/20, 3:43 PM: Where

[jira] [Commented] (TIKA-3097) Out of memory while parsing docx

2020-05-07 Thread suchendra (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101794#comment-17101794 ] suchendra commented on TIKA-3097: I even tried opening it in Microsoft doc, that took almost

[jira] [Commented] (TIKA-3098) Detecting embedded image

2020-05-07 Thread suchendra (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101792#comment-17101792 ] suchendra commented on TIKA-3098: How do I achieve this in the code? > Detecting embed

[jira] [Commented] (TIKA-3097) Out of memory while parsing docx

2020-05-07 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101612#comment-17101612 ] Tim Allison commented on TIKA-3097: java -Xmx128m -jar ~/Downloads/tika-app-1.24.jar --

[jira] [Commented] (TIKA-3097) Out of memory while parsing docx

2020-05-07 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101608#comment-17101608 ] Tim Allison commented on TIKA-3097: LibreOffice doesn't like this file... :( > Out of

[jira] [Updated] (TIKA-3097) Out of memory while parsing docx

2020-05-07 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-3097: Attachment: Screenshot from 2020-05-07 08-14-25.png > Out of memory while parsing docx

[jira] [Comment Edited] (TIKA-3098) Detecting embedded image

2020-05-07 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101588#comment-17101588 ] Tim Allison edited comment on TIKA-3098 at 5/7/20, 11:55 AM:

[jira] [Commented] (TIKA-3098) Detecting embedded image

2020-05-07 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101588#comment-17101588 ] Tim Allison commented on TIKA-3098: There's a thumbnail under docProps. If you use /rm

[jira] [Commented] (TIKA-3097) Out of memory while parsing docx

2020-05-07 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101584#comment-17101584 ] Tim Allison commented on TIKA-3097: Uncompressed, you're looking at ~150MB for the file

[jira] [Closed] (TIKA-3096) detect image in any document

2020-05-07 Thread suchendra (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] suchendra closed TIKA-3096. Resolution: Invalid > detect image in any document > Key: TIK

[jira] [Created] (TIKA-3098) Detecting embedded image

2020-05-07 Thread suchendra (Jira)
suchendra created TIKA-3098: Summary: Detecting embedded image Key: TIKA-3098 URL: https://issues.apache.org/jira/browse/TIKA-3098 Project: Tika Issue Type: Bug Components: parser Af

[jira] [Created] (TIKA-3097) Out of memory while parsing docx

2020-05-07 Thread suchendra (Jira)
suchendra created TIKA-3097: Summary: Out of memory while parsing docx Key: TIKA-3097 URL: https://issues.apache.org/jira/browse/TIKA-3097 Project: Tika Issue Type: Bug Components: core,