[jira] [Closed] (TIKA-3202) Tika duplicates the ocr text

2020-09-22 Thread marek kapowicki (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] marek kapowicki closed TIKA-3202. Resolution: Works for Me > Tika duplicates the ocr text

[GitHub] [tika] PeterAlfredLee edited a comment on pull request #356: Attempt to read zips with STORED data descriptors

2020-09-22 Thread GitBox
PeterAlfredLee edited a comment on pull request #356: URL: https://github.com/apache/tika/pull/356#issuecomment-696618151 This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

[GitHub] [tika] PeterAlfredLee commented on pull request #356: Attempt to read zips with STORED data descriptors

2020-09-22 Thread GitBox
PeterAlfredLee commented on pull request #356: URL: https://github.com/apache/tika/pull/356#issuecomment-696618151

[jira] [Comment Edited] (TIKA-3196) PackageParser should attempt to parse entries from zip files with STORED entries with data descriptor

2020-09-22 Thread Peter Lee (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200448#comment-17200448 ] Peter Lee edited comment on TIKA-3196 at 9/23/20, 2:13 AM: --- Hi [

[jira] [Updated] (TIKA-3203) MP4Parser temporary files are not deleted from Tomcat temp folder

2020-09-22 Thread Isabelle Giguere (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabelle Giguere updated TIKA-3203: --- Description: In our application, Tika is used as part of a Tomcat webapp. Tomcat sets its te

[jira] [Commented] (TIKA-3196) PackageParser should attempt to parse entries from zip files with STORED entries with data descriptor

2020-09-22 Thread Peter Lee (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200448#comment-17200448 ] Peter Lee commented on TIKA-3196: - Hi [~tallison] I wrote a test here : [https://github.

[jira] [Commented] (TIKA-3202) Tika duplicates the ocr text

2020-09-22 Thread marek kapowicki (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200397#comment-17200397 ] marek kapowicki commented on TIKA-3202: --- ONLY_OCR and no_ocr work fine. But now I c

[jira] [Updated] (TIKA-3203) MP4Parser temporary files are not deleted from Tomcat temp folder

2020-09-22 Thread Isabelle Giguere (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabelle Giguere updated TIKA-3203: --- Description: In our application, Tika is used as part of a Tomcat webapp. Tomcat sets its te

[jira] [Created] (TIKA-3203) MP4Parser temporary files are not deleted from Tomcat temp folder

2020-09-22 Thread Isabelle Giguere (Jira)
Isabelle Giguere created TIKA-3203: -- Summary: MP4Parser temporary files are not deleted from Tomcat temp folder Key: TIKA-3203 URL: https://issues.apache.org/jira/browse/TIKA-3203 Project: Tika

[jira] [Commented] (TIKA-3202) Tika duplicates the ocr text

2020-09-22 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200395#comment-17200395 ] Tim Allison commented on TIKA-3202: --- If I understand correctly, that's how it is designe

[jira] [Created] (TIKA-3202) Tika duplicates the ocr text

2020-09-22 Thread marek kapowicki (Jira)
marek kapowicki created TIKA-3202: - Summary: Tika duplicates the ocr text Key: TIKA-3202 URL: https://issues.apache.org/jira/browse/TIKA-3202 Project: Tika Issue Type: Bug Affects Version

[GitHub] [tika] PeterAlfredLee edited a comment on pull request #356: Attempt to read zips with STORED data descriptors

2020-09-22 Thread GitBox
PeterAlfredLee edited a comment on pull request #356: URL: https://github.com/apache/tika/pull/356#issuecomment-696721537 I forged a zip archive in memory that uses STORED and Data Descriptor at the same time. This could be easily used as a test case. BTW this PR could not pass this
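
The test case mentioned in this comment is cut off in the archive, so it is not reproduced here. As a rough illustration only, the sketch below shows one way to build such an archive in memory: a single entry whose local header declares method 0 (STORED) while also setting general purpose flag bit 3, so the CRC and sizes appear only in the trailing data descriptor. The class name `StoredDataDescriptorZip` and the method `build` are hypothetical and are not taken from the pull request. A streaming reader cannot learn the entry length from such a local header, and the JDK's `java.util.zip.ZipInputStream` is expected to reject the entry.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper, not the test from the pull request.
class StoredDataDescriptorZip {
    static final short FLAG_DATA_DESCRIPTOR = 0x0008; // general purpose bit 3
    static final short METHOD_STORED = 0;
    static final short DOS_DATE_1980_01_01 = 0x21;

    // Builds a one-entry zip whose local header leaves CRC and sizes at zero
    // and defers them to a data descriptor written after the raw content.
    static byte[] build(String entryName, byte[] content) {
        byte[] name = entryName.getBytes(StandardCharsets.UTF_8);
        CRC32 crc32 = new CRC32();
        crc32.update(content);
        int crc = (int) crc32.getValue();

        int total = 30 + name.length + content.length + 16 + 46 + name.length + 22;
        ByteBuffer zip = ByteBuffer.allocate(total).order(ByteOrder.LITTLE_ENDIAN);

        // Local file header (30 bytes + name): sizes and CRC are zero on purpose.
        zip.putInt(0x04034b50)
           .putShort((short) 20)                       // version needed
           .putShort(FLAG_DATA_DESCRIPTOR)
           .putShort(METHOD_STORED)
           .putShort((short) 0).putShort(DOS_DATE_1980_01_01)
           .putInt(0).putInt(0).putInt(0)              // crc, compressed, uncompressed
           .putShort((short) name.length).putShort((short) 0)
           .put(name);

        // Entry data, stored without compression.
        zip.put(content);

        // Data descriptor (16 bytes, with the optional signature).
        zip.putInt(0x08074b50).putInt(crc)
           .putInt(content.length).putInt(content.length);

        // Central directory header (46 bytes + name) carries the real values.
        int cdOffset = zip.position();
        zip.putInt(0x02014b50)
           .putShort((short) 20).putShort((short) 20)  // made by, needed
           .putShort(FLAG_DATA_DESCRIPTOR)
           .putShort(METHOD_STORED)
           .putShort((short) 0).putShort(DOS_DATE_1980_01_01)
           .putInt(crc).putInt(content.length).putInt(content.length)
           .putShort((short) name.length)
           .putShort((short) 0).putShort((short) 0)    // extra, comment length
           .putShort((short) 0).putShort((short) 0)    // disk start, internal attrs
           .putInt(0)                                  // external attrs
           .putInt(0)                                  // local header offset
           .put(name);
        int cdSize = zip.position() - cdOffset;

        // End of central directory record (22 bytes).
        zip.putInt(0x06054b50)
           .putShort((short) 0).putShort((short) 0)
           .putShort((short) 1).putShort((short) 1)
           .putInt(cdSize).putInt(cdOffset)
           .putShort((short) 0);

        return zip.array();
    }
}
```

The resulting bytes can be wrapped in a `ByteArrayInputStream` and handed to whichever parser is under test; because the central directory holds the real sizes, a reader that has random access to the whole archive can still locate the entry.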

[GitHub] [tika] PeterAlfredLee commented on pull request #356: Attempt to read zips with STORED data descriptors

2020-09-22 Thread GitBox
PeterAlfredLee commented on pull request #356: URL: https://github.com/apache/tika/pull/356#issuecomment-696721537 I forged a zip archive in memory that uses STORED and Data Descriptor at the same time. This could be easily used as a test case : ``` @Test public void te

[jira] [Commented] (TIKA-3196) PackageParser should attempt to parse entries from zip files with STORED entries with data descriptor

2020-09-22 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200077#comment-17200077 ] Tim Allison commented on TIKA-3196: --- Attaching file from https://bz.apache.org/ooo/show_

[jira] [Updated] (TIKA-3196) PackageParser should attempt to parse entries from zip files with STORED entries with data descriptor

2020-09-22 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-3196: -- Attachment: OOO-107047-0.oxt-145.zip > PackageParser should attempt to parse entries from zip files with

[jira] [Commented] (TIKA-3196) PackageParser should attempt to parse entries from zip files with STORED entries with data descriptor

2020-09-22 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200071#comment-17200071 ] Tim Allison commented on TIKA-3196: --- If only we had some way of finding files that trigg

[GitHub] [tika] PeterAlfredLee edited a comment on pull request #356: Attempt to read zips with STORED data descriptors

2020-09-22 Thread GitBox
PeterAlfredLee edited a comment on pull request #356: URL: https://github.com/apache/tika/pull/356#issuecomment-696618151 > Do we have to reset the stream before reprocessing? +1. The stream should be `reset` or relocated to the beginning of the file. I think this is complicate

[GitHub] [tika] PeterAlfredLee commented on pull request #356: Attempt to read zips with STORED data descriptors

2020-09-22 Thread GitBox
PeterAlfredLee commented on pull request #356: URL: https://github.com/apache/tika/pull/356#issuecomment-696618151 > Do we have to reset the stream before reprocessing? +1. The stream should be `reset` or relocated to the beginning of the file. I think this is complicated here,
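
The point about resetting the stream before reprocessing can be illustrated with plain `java.io`. This is a minimal sketch, not the code from the pull request: the class `StreamResetSketch` and the method `readTwice` are hypothetical names, and it assumes the caller can give an upper bound (`readLimit`) on how many bytes the first pass will consume. Without a `mark` and `reset` (or reopening the source), a second pass would start wherever the first pass stopped.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating mark/reset before a second parse attempt.
class StreamResetSketch {
    // Reads the stream fully, rewinds to the marked position, and reads again.
    // readLimit must be at least the number of bytes consumed by the first pass.
    static String[] readTwice(InputStream raw, int readLimit) throws IOException {
        // Wrap streams that cannot rewind on their own.
        InputStream in = raw.markSupported() ? raw : new BufferedInputStream(raw);
        in.mark(readLimit);
        String first = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        in.reset(); // back to the marked position for the second attempt
        String second = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        return new String[] {first, second};
    }
}
```

For large inputs an in-memory mark can be expensive, so spooling to a temporary file and reopening it is a common alternative; which approach fits the parser in question is left open here.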