[jira] [Closed] (TIKA-3202) Tika duplicates the ocr text

2020-09-22 Thread marek kapowicki (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] marek kapowicki closed TIKA-3202. Resolution: Works for Me > Tika duplicates the ocr text

[GitHub] [tika] PeterAlfredLee edited a comment on pull request #356: Attempt to read zips with STORED data descriptors

2020-09-22 Thread GitBox
PeterAlfredLee edited a comment on pull request #356: URL: https://github.com/apache/tika/pull/356#issuecomment-696618151

[GitHub] [tika] PeterAlfredLee commented on pull request #356: Attempt to read zips with STORED data descriptors

2020-09-22 Thread GitBox
PeterAlfredLee commented on pull request #356: URL: https://github.com/apache/tika/pull/356#issuecomment-696618151

[jira] [Comment Edited] (TIKA-3196) PackageParser should attempt to parse entries from zip files with STORED entries with data descriptor

2020-09-22 Thread Peter Lee (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200448#comment-17200448 ] Peter Lee edited comment on TIKA-3196 at 9/23/20, 2:13 AM: --- Hi [~tallison] I

[jira] [Updated] (TIKA-3203) MP4Parser temporary files are not deleted from Tomcat temp folder

2020-09-22 Thread Isabelle Giguere (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabelle Giguere updated TIKA-3203: --- Description: In our application, Tika is used as part of a Tomcat webapp. Tomcat sets its

[jira] [Commented] (TIKA-3196) PackageParser should attempt to parse entries from zip files with STORED entries with data descriptor

2020-09-22 Thread Peter Lee (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200448#comment-17200448 ] Peter Lee commented on TIKA-3196: - Hi [~tallison] I wrote a test here :

[jira] [Commented] (TIKA-3202) Tika duplicates the ocr text

2020-09-22 Thread marek kapowicki (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200397#comment-17200397 ] marek kapowicki commented on TIKA-3202: --- ONLY_OCR and no_ocr works fine. But now I can see how

[jira] [Updated] (TIKA-3203) MP4Parser temporary files are not deleted from Tomcat temp folder

2020-09-22 Thread Isabelle Giguere (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabelle Giguere updated TIKA-3203: --- Description: In our application, Tika is used as part of a Tomcat webapp. Tomcat sets its

[jira] [Created] (TIKA-3203) MP4Parser temporary files are not deleted from Tomcat temp folder

2020-09-22 Thread Isabelle Giguere (Jira)
Isabelle Giguere created TIKA-3203: -- Summary: MP4Parser temporary files are not deleted from Tomcat temp folder Key: TIKA-3203 URL: https://issues.apache.org/jira/browse/TIKA-3203 Project: Tika

[jira] [Created] (TIKA-3202) Tika duplicates the ocr text

2020-09-22 Thread marek kapowicki (Jira)
marek kapowicki created TIKA-3202: - Summary: Tika duplicates the ocr text Key: TIKA-3202 URL: https://issues.apache.org/jira/browse/TIKA-3202 Project: Tika Issue Type: Bug Affects

[GitHub] [tika] PeterAlfredLee commented on pull request #356: Attempt to read zips with STORED data descriptors

2020-09-22 Thread GitBox
PeterAlfredLee commented on pull request #356: URL: https://github.com/apache/tika/pull/356#issuecomment-696721537 I forged a zip archive in memory that uses STORED and Data Descriptor at the same time. This could be easily used as a test case : ``` @Test public void
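
The snippet above is cut off before the test body, so the following is only a minimal sketch of the idea it describes, not the code from the pull request. It hand-writes a one-entry zip in memory whose entry uses the STORED method with general purpose bit 3 set, so the CRC and sizes come after the data in a data descriptor. The class name, method names, and sample entry are hypothetical. The `java.util.zip.ZipInputStream` in the JDKs I am aware of rejects such an entry with a `ZipException`, which is why `jdkStreamCanRead` returns a boolean and asserts nothing about the outcome.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipInputStream;

// Hypothetical helper: forges a zip with one STORED entry plus data descriptor.
class StoredDataDescriptorSketch {

    private static ByteBuffer le(int size) {
        // Every multi-byte field in the zip format is little-endian.
        return ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
    }

    static byte[] forgeZip(String name, byte[] data) {
        byte[] fileName = name.getBytes(StandardCharsets.UTF_8);
        CRC32 crc32 = new CRC32();
        crc32.update(data);
        int crc = (int) crc32.getValue();
        short flags = 0x0008; // bit 3: CRC and sizes follow the entry data

        // Local file header (30 bytes + name); CRC and sizes are left at zero.
        ByteBuffer local = le(30 + fileName.length);
        local.putInt(0x04034b50).putShort((short) 20).putShort(flags)
             .putShort((short) 0)                      // method 0 = STORED
             .putShort((short) 0).putShort((short) 0)  // DOS time and date
             .putInt(0).putInt(0).putInt(0)            // crc, compressed size, size
             .putShort((short) fileName.length).putShort((short) 0)
             .put(fileName);

        // Data descriptor, written with its optional signature.
        ByteBuffer descriptor = le(16);
        descriptor.putInt(0x08074b50).putInt(crc)
                  .putInt(data.length).putInt(data.length);

        // Central directory header (46 bytes + name) carries the real values.
        ByteBuffer central = le(46 + fileName.length);
        central.putInt(0x02014b50).putShort((short) 20).putShort((short) 20)
               .putShort(flags).putShort((short) 0)
               .putShort((short) 0).putShort((short) 0)  // DOS time and date
               .putInt(crc).putInt(data.length).putInt(data.length)
               .putShort((short) fileName.length)
               .putShort((short) 0).putShort((short) 0)  // extra and comment length
               .putShort((short) 0).putShort((short) 0)  // disk start, internal attrs
               .putInt(0)                                // external attrs
               .putInt(0)                                // offset of the local header
               .put(fileName);

        // End of central directory record (22 bytes).
        int centralOffset = local.capacity() + data.length + descriptor.capacity();
        ByteBuffer end = le(22);
        end.putInt(0x06054b50).putShort((short) 0).putShort((short) 0)
           .putShort((short) 1).putShort((short) 1)
           .putInt(central.capacity()).putInt(centralOffset)
           .putShort((short) 0);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(local.array(), 0, local.capacity());
        out.write(data, 0, data.length);
        out.write(descriptor.array(), 0, descriptor.capacity());
        out.write(central.array(), 0, central.capacity());
        out.write(end.array(), 0, end.capacity());
        return out.toByteArray();
    }

    // Reports whether the JDK streaming reader gets through the first entry.
    static boolean jdkStreamCanRead(byte[] zip) {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
            return in.getNextEntry() != null;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] zip = forgeZip("a.txt", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("forged " + zip.length + " bytes; JDK stream can read: "
                + jdkStreamCanRead(zip));
    }
}
```

Writing the bytes by hand is deliberate here: `java.util.zip.ZipOutputStream` requires the CRC and sizes of a STORED entry up front and writes them into the local header, so it does not produce this combination.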

[jira] [Updated] (TIKA-3196) PackageParser should attempt to parse entries from zip files with STORED entries with data descriptor

2020-09-22 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-3196: -- Attachment: OOO-107047-0.oxt-145.zip > PackageParser should attempt to parse entries from zip files

[jira] [Commented] (TIKA-3196) PackageParser should attempt to parse entries from zip files with STORED entries with data descriptor

2020-09-22 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200071#comment-17200071 ] Tim Allison commented on TIKA-3196: --- If only we had some way of finding files that trigger this

[GitHub] [tika] PeterAlfredLee edited a comment on pull request #356: Attempt to read zips with STORED data descriptors

2020-09-22 Thread GitBox
PeterAlfredLee edited a comment on pull request #356: URL: https://github.com/apache/tika/pull/356#issuecomment-696618151 > Do we have to reset the stream before reprocessing? +1. The stream should be `reset` or relocated to the beginning of the file. I think this is

[GitHub] [tika] PeterAlfredLee commented on pull request #356: Attempt to read zips with STORED data descriptors

2020-09-22 Thread GitBox
PeterAlfredLee commented on pull request #356: URL: https://github.com/apache/tika/pull/356#issuecomment-696618151 > Do we have to reset the stream before reprocessing? +1. The stream should be `reset` or relocated to the beginning of the file. I think this is complicated
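
The comment is truncated, so what the pull request actually does is not shown here. As a rough illustration of the general "reset before reprocessing" idea, the sketch below uses the standard `mark`/`reset` contract of `java.io.InputStream`, with a `BufferedInputStream` supplying mark support. The class and method names are hypothetical, and it assumes the first attempt consumes no more than the mark limit.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: rewind a stream after a first parsing attempt.
class ResetBeforeRetrySketch {

    // Consumes up to `peek` bytes (standing in for a failed first attempt),
    // rewinds, and returns the first byte seen by the second attempt.
    static int firstByteAfterRetry(InputStream raw, int peek) throws IOException {
        // BufferedInputStream adds mark/reset on top of any InputStream.
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(peek); // reset() stays valid until more than `peek` bytes are read

        for (int i = 0; i < peek; i++) {
            if (in.read() == -1) {
                break; // stream shorter than the peek window
            }
        }

        in.reset();       // back to the marked position
        return in.read(); // the retry starts from the beginning again
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = {1, 2, 3};
        System.out.println(firstByteAfterRetry(new ByteArrayInputStream(bytes), 2));
    }
}
```

The limitation is the mark limit: if the first attempt reads past it, `reset()` may fail, which is one reason rewinding an arbitrary stream can be harder than it looks.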