[
https://issues.apache.org/jira/browse/TIKA-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
marek kapowicki closed TIKA-3202.
-
Resolution: Works for Me
> Tika duplicates the ocr text
PeterAlfredLee edited a comment on pull request #356:
URL: https://github.com/apache/tika/pull/356#issuecomment-696618151
This is an automated message from the Apache Git Service.
To respond to the message, please log on to
PeterAlfredLee commented on pull request #356:
URL: https://github.com/apache/tika/pull/356#issuecomment-696618151
[
https://issues.apache.org/jira/browse/TIKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200448#comment-17200448
]
Peter Lee edited comment on TIKA-3196 at 9/23/20, 2:13 AM:
---
Hi [~tallison]
I
[
https://issues.apache.org/jira/browse/TIKA-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Isabelle Giguere updated TIKA-3203:
---
Description:
In our application, Tika is used as part of a Tomcat webapp. Tomcat sets its
[
https://issues.apache.org/jira/browse/TIKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200448#comment-17200448
]
Peter Lee commented on TIKA-3196:
-
Hi [~tallison]
I wrote a test here:
[
https://issues.apache.org/jira/browse/TIKA-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200397#comment-17200397
]
marek kapowicki commented on TIKA-3202:
---
ONLY_OCR and no_ocr work fine. But now I can see how
[
https://issues.apache.org/jira/browse/TIKA-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Isabelle Giguere updated TIKA-3203:
---
Description:
In our application, Tika is used as part of a Tomcat webapp. Tomcat sets its
Isabelle Giguere created TIKA-3203:
--
Summary: MP4Parser temporary files are not deleted from Tomcat temp folder
Key: TIKA-3203
URL: https://issues.apache.org/jira/browse/TIKA-3203
Project: Tika
marek kapowicki created TIKA-3202:
-
Summary: Tika duplicates the ocr text
Key: TIKA-3202
URL: https://issues.apache.org/jira/browse/TIKA-3202
Project: Tika
Issue Type: Bug
Affects
PeterAlfredLee commented on pull request #356:
URL: https://github.com/apache/tika/pull/356#issuecomment-696721537
I forged a zip archive in memory that uses STORED and a Data Descriptor at the
same time. This could easily be used as a test case:
```
@Test
public void
[
https://issues.apache.org/jira/browse/TIKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-3196:
--
Attachment: OOO-107047-0.oxt-145.zip
> PackageParser should attempt to parse entries from zip files
[
https://issues.apache.org/jira/browse/TIKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200071#comment-17200071
]
Tim Allison commented on TIKA-3196:
---
If only we had some way of finding files that trigger this
PeterAlfredLee edited a comment on pull request #356:
URL: https://github.com/apache/tika/pull/356#issuecomment-696618151
> Do we have to reset the stream before reprocessing?
+1. The stream should be `reset` or repositioned to the beginning of the
file.
I think this is
PeterAlfredLee commented on pull request #356:
URL: https://github.com/apache/tika/pull/356#issuecomment-696618151
> Do we have to reset the stream before reprocessing?
+1. The stream should be `reset` or repositioned to the beginning of the
file.
I think this is complicated
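The question quoted above (whether the stream has to be reset before reprocessing) can be illustrated with a minimal sketch using plain `java.io` mark/reset. This is not the code from pull request #356: the names `StreamRewindSketch` and `readTwice` are hypothetical, and the sketch assumes Java 11+ (for `readNBytes`) and that the bytes to be re-read fit within the mark limit.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of Tika.
final class StreamRewindSketch {

    // Reads up to `limit` bytes, rewinds to the starting position,
    // then reads the same bytes again. Returns both reads as strings.
    static String[] readTwice(InputStream raw, int limit) {
        // Streams without mark support are wrapped so that reset() works.
        InputStream in = raw.markSupported() ? raw : new BufferedInputStream(raw);
        try {
            in.mark(limit); // remember the current position
            String first = new String(in.readNBytes(limit), StandardCharsets.UTF_8);
            in.reset();     // go back to the marked position before reprocessing
            String second = new String(in.readNBytes(limit), StandardCharsets.UTF_8);
            return new String[] {first, second};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Without the `reset()` call the second read would start where the first one stopped and return nothing. The mark is only guaranteed to stay valid while no more than `limit` bytes are read after `mark(limit)`.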
15 matches