Hi Tilman,
> expected: but was: charset=[windows-1252]>
I think this problem is caused by the charset detection strategy depending on
the line separator (CRLF or LF) and on the git autocrlf config. I also ran into
this problem and solved it like this:
Set autocrlf to false with: git config --global core.autocrlf false
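To illustrate the line-separator point above, here is a minimal sketch. It is not Tika's detection code; the class `LineEndingDemo` and the helper `toCrlf` are hypothetical names made up for this example. It only shows that converting LF to CRLF (which is what git does on checkout when autocrlf is enabled) changes the bytes of a text file, so any test that inspects the raw bytes of a checked-out file may see different input on different machines.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class LineEndingDemo {
    // Hypothetical helper: mimics an LF -> CRLF conversion on checkout.
    // Existing CRLF pairs are normalized first so they are not doubled.
    static byte[] toCrlf(byte[] input) {
        String text = new String(input, StandardCharsets.ISO_8859_1);
        String converted = text.replace("\r\n", "\n").replace("\n", "\r\n");
        return converted.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] committed = "line one\nline two\n".getBytes(StandardCharsets.US_ASCII);
        byte[] checkedOut = toCrlf(committed);

        // The two arrays differ in length and content even though the
        // visible text is the same.
        System.out.println("LF bytes:   " + committed.length);
        System.out.println("CRLF bytes: " + checkedOut.length);
        System.out.println("identical:  " + Arrays.equals(committed, checkedOut));
    }
}
```

Whether this byte difference is what actually flips the detected charset in the failing test is the assumption made in the message above; the sketch only demonstrates that the input bytes change.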
[
https://issues.apache.org/jira/browse/TIKA-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180814#comment-17180814
]
Tim Allison commented on TIKA-3129:
---
Are we not capturing that in the {{status:}} element in the
[
https://issues.apache.org/jira/browse/TIKA-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180814#comment-17180814
]
Tim Allison edited comment on TIKA-3129 at 8/19/20, 8:43 PM:
-
W00t! Great to
[
https://issues.apache.org/jira/browse/TIKA-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180782#comment-17180782
]
Akash edited comment on TIKA-3154 at 8/19/20, 7:57 PM:
---
Tried with the config below.
[
https://issues.apache.org/jira/browse/TIKA-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180782#comment-17180782
]
Akash commented on TIKA-3154:
-
Tried with the config below. It did not help.
{code:java}
/
[
https://issues.apache.org/jira/browse/TIKA-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180782#comment-17180782
]
Akash edited comment on TIKA-3154 at 8/19/20, 7:56 PM:
---
Tried with the config below.
After many weeks I checked out the "main" branch, and get these build
errors:
Failures:
TestMimeTypes.testArchiveDetection:395->assertTypeByData:1275
expected:<[application/x-archive]> but was:<[text/plain]>
[
https://issues.apache.org/jira/browse/TIKA-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180698#comment-17180698
]
Nicholas DiPiazza edited comment on TIKA-3129 at 8/19/20, 5:14 PM:
---
With
[
https://issues.apache.org/jira/browse/TIKA-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180703#comment-17180703
]
Nicholas DiPiazza commented on TIKA-3173:
-
I'll play around with that idea and create a PR if it
[
https://issues.apache.org/jira/browse/TIKA-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180698#comment-17180698
]
Nicholas DiPiazza edited comment on TIKA-3129 at 8/19/20, 5:11 PM:
---
With
[
https://issues.apache.org/jira/browse/TIKA-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180698#comment-17180698
]
Nicholas DiPiazza commented on TIKA-3129:
-
With respect to "last parsed" offset, I put this
[
https://issues.apache.org/jira/browse/TIKA-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180698#comment-17180698
]
Nicholas DiPiazza edited comment on TIKA-3129 at 8/19/20, 5:10 PM:
---
With
[
https://issues.apache.org/jira/browse/TIKA-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180550#comment-17180550
]
Tim Allison commented on TIKA-3174:
---
Can you attach an example file?
> When Tika parses an OFD document, extra numbers appear in addition to the body text.
[
https://issues.apache.org/jira/browse/TIKA-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180548#comment-17180548
]
Tim Allison commented on TIKA-3154:
---
https://cwiki.apache.org/confluence/display/TIKA/MSOfficeParsers
[
https://issues.apache.org/jira/browse/TIKA-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180544#comment-17180544
]
Tim Allison commented on TIKA-3154:
---
Turns out it is (or should be):
[
https://issues.apache.org/jira/browse/TIKA-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180512#comment-17180512
]
Akash commented on TIKA-3154:
-
Can we make this value configurable?
> Exception while extracting msg files
>
天空 created TIKA-3174:
Summary: When Tika parses an OFD document, extra numbers appear in addition to the body text.
Key: TIKA-3174
URL: https://issues.apache.org/jira/browse/TIKA-3174
Project: Tika
Issue Type: Bug
Reporter: 天空
Hi Tim
It looks good. Perfect.
Do you plan to have tika-parsers reuse the new modules as its dependencies?
Cheers, Sergey
On Tue, Aug 18, 2020 at 3:41 PM Tim Allison wrote:
> If anyone has any time, please take a look here:
> https://github.com/apache/tika/tree/branch_2x/tika-parser-modules
[
https://issues.apache.org/jira/browse/TIKA-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180449#comment-17180449
]
Akash commented on TIKA-3172:
-
Thanks [~tallison] for the clarification.
> PDF Parser configuration enable
[
https://issues.apache.org/jira/browse/TIKA-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180425#comment-17180425
]
Tim Allison commented on TIKA-3173:
---
This is our OOM test:
I've disabled our old builds, and configured 4 on ci-builds.
main -- Java 8 and 11
branch_1z -- Java 8 and 11
Thank you, again, Tilman and Uwe!
On Fri, Aug 14, 2020 at 9:08 AM Tim Allison wrote:
> All,
> Tilman pointed out that on the builds list there was an announcement
> that Aug 15 is
[
https://issues.apache.org/jira/browse/TIKA-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180406#comment-17180406
]
Tim Allison commented on TIKA-3172:
---
[~akki1607], y, that's expected, because you're wrapping the
[
https://issues.apache.org/jira/browse/TIKA-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180375#comment-17180375
]
Hudson commented on TIKA-3172:
--
SUCCESS: Integrated in Jenkins build tika-main-jdk11 #1022 (See
[
https://issues.apache.org/jira/browse/TIKA-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180356#comment-17180356
]
Hudson commented on TIKA-3172:
--
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #3 (See
The Apache Jenkins build system has built tika-main-jdk14 (build #12)
Status: Failure
Check console output at https://builds.apache.org/job/tika-main-jdk14/12/ to
view the results.
[
https://issues.apache.org/jira/browse/TIKA-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180343#comment-17180343
]
Akash edited comment on TIKA-3172 at 8/19/20, 7:46 AM:
---
[~tallison]
If we use
[
https://issues.apache.org/jira/browse/TIKA-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180343#comment-17180343
]
Akash commented on TIKA-3172:
-
[~tallison]
If we use above mentioned tika config file to extract, the parser
[
https://issues.apache.org/jira/browse/TIKA-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180333#comment-17180333
]
Tim Allison commented on TIKA-3172:
---
Thank you, [~tilman]. I'm fixing this now.
> PDF Parser
29 matches