[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852876#comment-17852876
]
Tim Allison edited comment on TIKA-4243 at 6/6/24 5:39 PM:
---
I think our joint
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852876#comment-17852876
]
Tim Allison commented on TIKA-4243:
---
I think our joint recent PR on TIKA-4252 accomplishes the goals of
[
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852874#comment-17852874
]
Tim Allison commented on TIKA-4252:
---
K. I think we're at "good enough" here. [~ndipiazza], thank you and
[
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-4252.
---
Resolution: Fixed
> PipesClient#process - seems to lose the Fetch input metadata?
>
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852808#comment-17852808
]
Tim Allison commented on TIKA-4243:
---
Oh, and documentation, lots of documentation. :LOL:
> tika
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852804#comment-17852804
]
Tim Allison edited comment on TIKA-4243 at 6/6/24 2:11 PM:
---
Current status on
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852804#comment-17852804
]
Tim Allison commented on TIKA-4243:
---
Current status on TIKA-4243 -- works up through and including
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852098#comment-17852098
]
Tim Allison commented on TIKA-4243:
---
Let me know if there are any objections to heading in this
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852097#comment-17852097
]
Tim Allison commented on TIKA-4243:
---
K, I chatted briefly with [~ndipiazza] this morning. Unless there
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851727#comment-17851727
]
Tim Allison edited comment on TIKA-4243 at 6/3/24 5:10 PM:
---
I spent a bit of
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851727#comment-17851727
]
Tim Allison edited comment on TIKA-4243 at 6/3/24 5:02 PM:
---
I spent a bit of
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851727#comment-17851727
]
Tim Allison edited comment on TIKA-4243 at 6/3/24 4:45 PM:
---
I spent a bit of
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851727#comment-17851727
]
Tim Allison commented on TIKA-4243:
---
I spent a bit of time trying to serialize ParseContext, and I now
[
https://issues.apache.org/jira/browse/TIKA-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-4260.
---
Resolution: Duplicate
Turns out this is a duplicate. Onwards to TIKA-4243!
> Add parse context to
Tim Allison created TIKA-4266:
-
Summary: Improve multithreading and the xml parser pools in
XMLUtils
Key: TIKA-4266
URL: https://issues.apache.org/jira/browse/TIKA-4266
Project: Tika
Issue
[
https://issues.apache.org/jira/browse/TIKA-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-4221.
---
Fix Version/s: 3.0.0
2.9.3
Resolution: Fixed
Many thanks to [~ggregory] and
[
https://issues.apache.org/jira/browse/TIKA-4220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-4220.
---
Fix Version/s: 3.0.0
2.9.3
Resolution: Fixed
Many thanks to [~ggregory] and
[
https://issues.apache.org/jira/browse/TIKA-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850776#comment-17850776
]
Tim Allison commented on TIKA-4265:
---
It doesn't help at all if there's a modification in tika-core, even
[
https://issues.apache.org/jira/browse/TIKA-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850773#comment-17850773
]
Tim Allison commented on TIKA-4265:
---
I just pushed a demo to {{build-cache}}. This includes
Tim Allison created TIKA-4265:
-
Summary: Consider adding maven build cache extension
Key: TIKA-4265
URL: https://issues.apache.org/jira/browse/TIKA-4265
Project: Tika
Issue Type: Task
Tim Allison created TIKA-4261:
-
Summary: Add attachment type metadata filter
Key: TIKA-4261
URL: https://issues.apache.org/jira/browse/TIKA-4261
Project: Tika
Issue Type: Task
[
https://issues.apache.org/jira/browse/TIKA-4259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-4259.
---
Fix Version/s: 3.0.0
Resolution: Fixed
> Decouple xml parser stuff from ParseContext
>
[
https://issues.apache.org/jira/browse/TIKA-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849298#comment-17849298
]
Tim Allison commented on TIKA-4260:
---
That PR currently only works on tika-core. More needs to be done
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849288#comment-17849288
]
Tim Allison commented on TIKA-4243:
---
[~ndipiazza], I added parseContext to fetchers and emitters on the
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849103#comment-17849103
]
Tim Allison edited comment on TIKA-4243 at 5/24/24 1:00 PM:
Proposed basic
Tim Allison created TIKA-4260:
-
Summary: Add parse context to the fetcher interface in 3.x
Key: TIKA-4260
URL: https://issues.apache.org/jira/browse/TIKA-4260
Project: Tika
Issue Type: Task
Tim Allison created TIKA-4259:
-
Summary: Decouple xml parser stuff from ParseContext
Key: TIKA-4259
URL: https://issues.apache.org/jira/browse/TIKA-4259
Project: Tika
Issue Type: Task
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849114#comment-17849114
]
Tim Allison commented on TIKA-4243:
---
I'm going to start working on PRs that will be generally helpful
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849108#comment-17849108
]
Tim Allison commented on TIKA-4243:
---
The downsides we see:
a) if there's agreement to add
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849103#comment-17849103
]
Tim Allison commented on TIKA-4243:
---
Proposed basic roadmap:
Serialize ParseContext as is...
Allow for
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849101#comment-17849101
]
Tim Allison commented on TIKA-4243:
---
Fellow devs, in chatting with Nicholas, we're thinking that it
[
https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-4258.
---
Resolution: Fixed
Just pushed 2.9.2.1/*-latest
Thank you, all!
> Multi-arch support for docker
[
https://issues.apache.org/jira/browse/TIKA-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847980#comment-17847980
]
Tim Allison commented on TIKA-4255:
---
Thank you for opening this PR. Are you able to add a small unit
[
https://issues.apache.org/jira/browse/TIKA-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-4256.
---
Fix Version/s: 3.0.0
Resolution: Fixed
> Allow inlining of ocr'd text in container document
>
[
https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847950#comment-17847950
]
Tim Allison commented on TIKA-4258:
---
I'm sure I'll need to modify the PR when I actually go to run it,
[
https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847949#comment-17847949
]
Tim Allison commented on TIKA-4258:
---
Let's give it a day for fellow devs to weigh in. If there are no
[
https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847943#comment-17847943
]
Tim Allison commented on TIKA-4258:
---
And here's the full version:
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847931#comment-17847931
]
Tim Allison commented on TIKA-4243:
---
Separately, but related to this and also to TIKA-4252 -- should we
[
https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847883#comment-17847883
]
Tim Allison commented on TIKA-4258:
---
Helpful links from #infra:
[
https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847882#comment-17847882
]
Tim Allison commented on TIKA-4258:
---
If fellow devs with better knowledge of github actions and docker
Tim Allison created TIKA-4258:
-
Summary: Multi-arch support for docker images
Key: TIKA-4258
URL: https://issues.apache.org/jira/browse/TIKA-4258
Project: Tika
Issue Type: Task
[
https://issues.apache.org/jira/browse/TIKA-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4256:
--
Description:
For legacy tika, we're inlining all content from embedded files including ocr
content of
Tim Allison created TIKA-4256:
-
Summary: Allow inlining of ocr'd text in container document
Key: TIKA-4256
URL: https://issues.apache.org/jira/browse/TIKA-4256
Project: Tika
Issue Type: Task
[
https://issues.apache.org/jira/browse/TIKA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846697#comment-17846697
]
Tim Allison commented on TIKA-4137:
---
Y, done just now.
> Building current Tika main branch fails under
[
https://issues.apache.org/jira/browse/TIKA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4137:
--
Fix Version/s: 2.9.3
> Building current Tika main branch fails under Java 20/21
>
[
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845081#comment-17845081
]
Tim Allison commented on TIKA-4252:
---
fetchRequestMetadata, fetchResponseMetadata?
> PipesClient#process
[
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845072#comment-17845072
]
Tim Allison edited comment on TIKA-4252 at 5/9/24 5:14 PM:
---
fetcher.fetch(String
[
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845072#comment-17845072
]
Tim Allison commented on TIKA-4252:
---
fetcher.fetch(String key, Metadata writeMetadata, Metadata
[
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845068#comment-17845068
]
Tim Allison commented on TIKA-4252:
---
Should we add an optional Metadata object to the FetchKey? We could
[
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845062#comment-17845062
]
Tim Allison commented on TIKA-4252:
---
K, but you don't want that coming back and being populated in the
[
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845051#comment-17845051
]
Tim Allison commented on TIKA-4252:
---
Or, if you mean that metadata gathered from the fetcher isn't
[
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845048#comment-17845048
]
Tim Allison commented on TIKA-4252:
---
My initial thought for injecting user metadata was to pass through
[
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845047#comment-17845047
]
Tim Allison commented on TIKA-4252:
---
I opened this branch: https://github.com/apache/tika/tree/TIKA-4252
[
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison reopened TIKA-4252:
---
I pointed you to the wrong part of the code ... sorry. The design goal was to
overwrite the extracted
[
https://issues.apache.org/jira/browse/TIKA-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845022#comment-17845022
]
Tim Allison commented on TIKA-4253:
---
This is happening in the unit tests because there are multiple
Tim Allison created TIKA-4253:
-
Summary: Duplicate parsers loaded in AutoDetectParser in 3.x at
least in some unit tests
Key: TIKA-4253
URL: https://issues.apache.org/jira/browse/TIKA-4253
Project: Tika
[
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844998#comment-17844998
]
Tim Allison commented on TIKA-4252:
---
Good catch:
[
https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844976#comment-17844976
]
Tim Allison edited comment on TIKA-4250 at 5/9/24 12:59 PM:
libpst issue
[
https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844976#comment-17844976
]
Tim Allison commented on TIKA-4250:
---
libpff issue opened: https://github.com/libyal/libpff/issues/128
[
https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4251:
--
Description:
I was recently working a bit on incubator-stormcrawler, and I noticed that they
are using
[
https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4251:
--
Summary: [DISCUSS] move to cosium's git-code-format-maven-plugin with
google-java-format (was:
Tim Allison created TIKA-4251:
-
Summary: [DISCUSS] move to cosium's git-code-format-maven-plugin
Key: TIKA-4251
URL: https://issues.apache.org/jira/browse/TIKA-4251
Project: Tika
Issue Type:
[
https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843746#comment-17843746
]
Tim Allison edited comment on TIKA-4250 at 5/6/24 5:03 PM:
---
Wait, so, on
[
https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843798#comment-17843798
]
Tim Allison edited comment on TIKA-4250 at 5/6/24 5:02 PM:
---
So, I caught an
[
https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843798#comment-17843798
]
Tim Allison commented on TIKA-4250:
---
So, I caught an example of libpst not reading an attachment in our
[
https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4250:
--
Attachment: 8.eml
> Add a libpst-based parser
> -
>
> Key:
[
https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4250:
--
Attachment: 8.msg
> Add a libpst-based parser
> -
>
> Key:
[
https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843740#comment-17843740
]
Tim Allison edited comment on TIKA-4250 at 5/6/24 1:02 PM:
---
Wow. This is super
[
https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843428#comment-17843428
]
Tim Allison commented on TIKA-4250:
---
Given your experience, I think it would be valuable to add libpff
[
https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843361#comment-17843361
]
Tim Allison commented on TIKA-4250:
---
Hahahahaha. I figured you'd have input on this [~lfcnassif]!
Y,
[
https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843217#comment-17843217
]
Tim Allison commented on TIKA-4249:
---
> Crystal ball is murky on the timing of the next 2.x and 3.x
Tim Allison created TIKA-4250:
-
Summary: Add a libpst-based parser
Key: TIKA-4250
URL: https://issues.apache.org/jira/browse/TIKA-4250
Project: Tika
Issue Type: Task
Reporter: Tim
[
https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842745#comment-17842745
]
Tim Allison commented on TIKA-4249:
---
Version numbers for the fix are noted above: 2.9.3 and 3.0.0
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842605#comment-17842605
]
Tim Allison commented on TIKA-4243:
---
Do we put it in tika-serialization or a new module?
> tika
[
https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842604#comment-17842604
]
Tim Allison commented on TIKA-4249:
---
The example file shared was actually kind of weird. It looked like
[
https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4249:
--
Summary: EML file is treating it as text file in 2.9.2 version (was: EML
file is treating it as text
[
https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-4249.
---
Fix Version/s: 3.0.0
2.9.3
Resolution: Fixed
> EML file is treating it as
[
https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842405#comment-17842405
]
Tim Allison commented on TIKA-4249:
---
Files never cease to amaze!
Thank you. Onwards!
> EML file is
[
https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842402#comment-17842402
]
Tim Allison commented on TIKA-4249:
---
Modifying the first hit from {{offset="0"}} to {{offset="0:3"}}
[
https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842401#comment-17842401
]
Tim Allison commented on TIKA-4249:
---
I'm guessing you mean 2.9.0->2.9.2.
The challenge with this file
Tim Allison created TIKA-4248:
-
Summary: Improve PST handling of attachments
Key: TIKA-4248
URL: https://issues.apache.org/jira/browse/TIKA-4248
Project: Tika
Issue Type: Task
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841252#comment-17841252
]
Tim Allison commented on TIKA-4243:
---
https://json-schema.org/learn/getting-started-step-by-step
Yes,
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841242#comment-17841242
]
Tim Allison edited comment on TIKA-4243 at 4/26/24 1:32 PM:
I really, really
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841243#comment-17841243
]
Tim Allison commented on TIKA-4243:
---
Oh, sorry. Does this break anything? Can we add this as a new
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841242#comment-17841242
]
Tim Allison commented on TIKA-4243:
---
I really, really want to clean up our configuration, and moving to
[
https://issues.apache.org/jira/browse/TIKA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841221#comment-17841221
]
Tim Allison edited comment on TIKA-4245 at 4/26/24 1:23 PM:
Oops, sorry. I
[
https://issues.apache.org/jira/browse/TIKA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841221#comment-17841221
]
Tim Allison commented on TIKA-4245:
---
Oops, sorry. I didn't realize you sent your tika-config.xml. Y, one
[
https://issues.apache.org/jira/browse/TIKA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841220#comment-17841220
]
Tim Allison commented on TIKA-4245:
---
This is an ongoing area for improvement in Tika.
The algorithm is
[
https://issues.apache.org/jira/browse/TIKA-4244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-4244.
---
Fix Version/s: 3.0.0
2.9.3
Resolution: Fixed
Thank you [~boomxlucifer]!
>
[
https://issues.apache.org/jira/browse/TIKA-4244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840852#comment-17840852
]
Tim Allison commented on TIKA-4244:
---
Thank you [~boomxlucifer] for finding this and reporting it. The
[
https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839780#comment-17839780
]
Tim Allison commented on TIKA-4166:
---
Thank you!
> dependency updates for Tika 3.0
>
[
https://issues.apache.org/jira/browse/TIKA-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-4242.
---
Resolution: Fixed
> Tika depends on non-existing plexus-utils version
>
[
https://issues.apache.org/jira/browse/TIKA-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838260#comment-17838260
]
Tim Allison commented on TIKA-4242:
---
Looks like the reason we haven't found this problem is that we
[
https://issues.apache.org/jira/browse/TIKA-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837806#comment-17837806
]
Tim Allison commented on TIKA-4241:
---
They add a custom key in the trailer {{/AdditionalStreams}} whose
[
https://issues.apache.org/jira/browse/TIKA-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4241:
--
Attachment: testPDF_additionalStreams.pdf
> Consider handling LibreOffice's /AdditionalStreams "hybrid
Tim Allison created TIKA-4241:
-
Summary: Consider handling LibreOffice's /AdditionalStreams
"hybrid PDF" attachment embedding in PDFs
Key: TIKA-4241
URL: https://issues.apache.org/jira/browse/TIKA-4241
[
https://issues.apache.org/jira/browse/TIKA-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4241:
--
Description:
Some info here: