[jira] [Resolved] (TIKA-4285) Invalid Link for changelog CHANGES.txt files

2024-07-22 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4285. --- Resolution: Fixed Thank you [~tom_1st] and [~tilman]! Should be fixed now. > Invalid Link for

[jira] [Assigned] (TIKA-4285) Invalid Link for changelog CHANGES.txt files

2024-07-22 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-4285: - Assignee: Tim Allison > Invalid Link for changelog CHANGES.txt files >

[jira] [Commented] (TIKA-4281) Fix javadoc plugin configuration

2024-07-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866843#comment-17866843 ] Tim Allison commented on TIKA-4281: --- For some reason, now, it looks like {{javadocs}} works fine with

[jira] [Commented] (TIKA-4281) Fix javadoc plugin configuration

2024-07-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866742#comment-17866742 ] Tim Allison commented on TIKA-4281: --- Well, that didn't work: {{javadoc: error - No source files for

[jira] [Created] (TIKA-4281) Fix javadoc plugin configuration

2024-07-16 Thread Tim Allison (Jira)
Tim Allison created TIKA-4281: - Summary: Fix javadoc plugin configuration Key: TIKA-4281 URL: https://issues.apache.org/jira/browse/TIKA-4281 Project: Tika Issue Type: Task Reporter:

[jira] [Updated] (TIKA-4280) Tasks for the 3.0.0 release

2024-07-16 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4280: -- Description: I'm too lazy to open separate tickets. Please do so if desired. Some items: * Before

[jira] [Updated] (TIKA-4280) Tasks for the 3.0.0 release

2024-07-16 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4280: -- Description: I'm too lazy to open separate tickets. Please do so if desired. Some items: * Before

[jira] [Created] (TIKA-4280) Tasks for the 3.0.0 release

2024-07-15 Thread Tim Allison (Jira)
Tim Allison created TIKA-4280: - Summary: Tasks for the 3.0.0 release Key: TIKA-4280 URL: https://issues.apache.org/jira/browse/TIKA-4280 Project: Tika Issue Type: Task Reporter: Tim

[jira] [Commented] (TIKA-4278) TextAndCSVParser doesn't detect semicolon separated file

2024-07-15 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866075#comment-17866075 ] Tim Allison commented on TIKA-4278: --- Thank you, [~tilman], y, that's probably an oversight on my part in

[jira] [Created] (TIKA-4275) Make tika-grpc a top-level module

2024-07-09 Thread Tim Allison (Jira)
Tim Allison created TIKA-4275: - Summary: Make tika-grpc a top-level module Key: TIKA-4275 URL: https://issues.apache.org/jira/browse/TIKA-4275 Project: Tika Issue Type: Task

[jira] [Commented] (TIKA-4272) create tika docker image for tika-grpc

2024-06-26 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860241#comment-17860241 ] Tim Allison commented on TIKA-4272: --- Y, I concur, we should have a completely separate image. > create

[jira] [Commented] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860035#comment-17860035 ] Tim Allison commented on TIKA-4251: --- W00t! > [DISCUSS] move to cosium's git-code-format-maven-plugin

[jira] [Commented] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860020#comment-17860020 ] Tim Allison commented on TIKA-4251: --- Sounds great. My personal preference would be to move away from our

[jira] [Commented] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860007#comment-17860007 ] Tim Allison commented on TIKA-4251: --- > we eat the 1-time-format cost That's where the vulnerability is.

[jira] [Commented] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1785#comment-1785 ] Tim Allison commented on TIKA-4251: --- Makes sense. Tilman's observation is legit, and I don't see a way

[jira] [Comment Edited] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17859739#comment-17859739 ] Tim Allison edited comment on TIKA-4251 at 6/25/24 6:19 PM: Y. I agree. When I

[jira] [Commented] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-06-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17859739#comment-17859739 ] Tim Allison commented on TIKA-4251: --- Y. I agree. When I started with checkstyle, it modified nearly

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-06-07 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853241#comment-17853241 ] Tim Allison commented on TIKA-4243: --- This is what the json currently looks like. {code:json} {

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-06-07 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853240#comment-17853240 ] Tim Allison commented on TIKA-4243: --- I opened a PR with some cleanup, fixes and a new unit test that

[jira] [Resolved] (TIKA-4268) Use title for embedded resource path in embedded msg files

2024-06-07 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4268. --- Fix Version/s: 3.0.0 Resolution: Fixed > Use title for embedded resource path in embedded msg

[jira] [Commented] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-06-07 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853157#comment-17853157 ] Tim Allison commented on TIKA-4251: --- Unless there are any objections, I'll likely move forward with this

[jira] [Created] (TIKA-4268) Use title for embedded resource path in embedded msg files

2024-06-07 Thread Tim Allison (Jira)
Tim Allison created TIKA-4268: - Summary: Use title for embedded resource path in embedded msg files Key: TIKA-4268 URL: https://issues.apache.org/jira/browse/TIKA-4268 Project: Tika Issue Type:

[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

2024-06-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852876#comment-17852876 ] Tim Allison edited comment on TIKA-4243 at 6/6/24 5:39 PM: --- I think our joint

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-06-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852876#comment-17852876 ] Tim Allison commented on TIKA-4243: --- I think our joint recent PR on TIKA-4252 accomplishes the goals of

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-06-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852874#comment-17852874 ] Tim Allison commented on TIKA-4252: --- K. I think we're at "good enough" here. [~ndipiazza], thank you and

[jira] [Resolved] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-06-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4252. --- Resolution: Fixed > PipesClient#process - seems to lose the Fetch input metadata? >

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-06-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852808#comment-17852808 ] Tim Allison commented on TIKA-4243: --- Oh, and documentation, lots of documentation. :LOL: > tika

[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

2024-06-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852804#comment-17852804 ] Tim Allison edited comment on TIKA-4243 at 6/6/24 2:11 PM: --- Current status on

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-06-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852804#comment-17852804 ] Tim Allison commented on TIKA-4243: --- Current status on TIKA-4243 -- works up through and including

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-06-04 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852098#comment-17852098 ] Tim Allison commented on TIKA-4243: --- Let me know if there are any objections to heading in this

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-06-04 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852097#comment-17852097 ] Tim Allison commented on TIKA-4243: --- K, I chatted briefly with [~ndipiazza] this morning. Unless there

[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

2024-06-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851727#comment-17851727 ] Tim Allison edited comment on TIKA-4243 at 6/3/24 5:10 PM: --- I spent a bit of

[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

2024-06-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851727#comment-17851727 ] Tim Allison edited comment on TIKA-4243 at 6/3/24 5:02 PM: --- I spent a bit of

[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

2024-06-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851727#comment-17851727 ] Tim Allison edited comment on TIKA-4243 at 6/3/24 5:02 PM: --- I spent a bit of

[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

2024-06-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851727#comment-17851727 ] Tim Allison edited comment on TIKA-4243 at 6/3/24 4:45 PM: --- I spent a bit of

[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

2024-06-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851727#comment-17851727 ] Tim Allison edited comment on TIKA-4243 at 6/3/24 4:45 PM: --- I spent a bit of

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-06-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851727#comment-17851727 ] Tim Allison commented on TIKA-4243: --- I spent a bit of time trying to serialize ParseContext, and I now

[jira] [Resolved] (TIKA-4260) Add parse context to the fetcher interface in 3.x

2024-06-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4260. --- Resolution: Duplicate Turns out this is a duplicate. Onwards to TIKA-4243! > Add parse context to

[jira] [Created] (TIKA-4266) Improve multithreading and the xml parser pools in XMLUtils

2024-05-30 Thread Tim Allison (Jira)
Tim Allison created TIKA-4266: - Summary: Improve multithreading and the xml parser pools in XMLUtils Key: TIKA-4266 URL: https://issues.apache.org/jira/browse/TIKA-4266 Project: Tika Issue

[jira] [Resolved] (TIKA-4221) Regression in pack200 parsing in commons-compress

2024-05-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4221. --- Fix Version/s: 3.0.0 2.9.3 Resolution: Fixed Many thanks to [~ggregory] and

[jira] [Resolved] (TIKA-4220) Commons-compress too lenient on headless tar detection

2024-05-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4220. --- Fix Version/s: 3.0.0 2.9.3 Resolution: Fixed Many thanks to [~ggregory] and

[jira] [Commented] (TIKA-4265) Consider adding maven build cache extension

2024-05-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850776#comment-17850776 ] Tim Allison commented on TIKA-4265: --- It doesn't help at all if there's a modification in tika-core, even

[jira] [Commented] (TIKA-4265) Consider adding maven build cache extension

2024-05-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850773#comment-17850773 ] Tim Allison commented on TIKA-4265: --- I just pushed a demo to {{build-cache}}. This includes

[jira] [Created] (TIKA-4265) Consider adding maven build cache extension

2024-05-30 Thread Tim Allison (Jira)
Tim Allison created TIKA-4265: - Summary: Consider adding maven build cache extension Key: TIKA-4265 URL: https://issues.apache.org/jira/browse/TIKA-4265 Project: Tika Issue Type: Task

[jira] [Created] (TIKA-4261) Add attachment type metadata filter

2024-05-24 Thread Tim Allison (Jira)
Tim Allison created TIKA-4261: - Summary: Add attachment type metadata filter Key: TIKA-4261 URL: https://issues.apache.org/jira/browse/TIKA-4261 Project: Tika Issue Type: Task

[jira] [Resolved] (TIKA-4259) Decouple xml parser stuff from ParseContext

2024-05-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4259. --- Fix Version/s: 3.0.0 Resolution: Fixed > Decouple xml parser stuff from ParseContext >

[jira] [Commented] (TIKA-4260) Add parse context to the fetcher interface in 3.x

2024-05-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849298#comment-17849298 ] Tim Allison commented on TIKA-4260: --- That PR currently only works on tika-core. More needs to be done

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849288#comment-17849288 ] Tim Allison commented on TIKA-4243: --- [~ndipiazza], I added parseContext to fetchers and emitters on the

[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

2024-05-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849103#comment-17849103 ] Tim Allison edited comment on TIKA-4243 at 5/24/24 1:00 PM: Proposed basic

[jira] [Created] (TIKA-4260) Add parse context to the fetcher interface in 3.x

2024-05-23 Thread Tim Allison (Jira)
Tim Allison created TIKA-4260: - Summary: Add parse context to the fetcher interface in 3.x Key: TIKA-4260 URL: https://issues.apache.org/jira/browse/TIKA-4260 Project: Tika Issue Type: Task

[jira] [Created] (TIKA-4259) Decouple xml parser stuff from ParseContext

2024-05-23 Thread Tim Allison (Jira)
Tim Allison created TIKA-4259: - Summary: Decouple xml parser stuff from ParseContext Key: TIKA-4259 URL: https://issues.apache.org/jira/browse/TIKA-4259 Project: Tika Issue Type: Task

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849114#comment-17849114 ] Tim Allison commented on TIKA-4243: --- I'm going to start working on PRs that will be generally helpful

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849108#comment-17849108 ] Tim Allison commented on TIKA-4243: --- The downsides we see: a) if we there's agreement to add

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849103#comment-17849103 ] Tim Allison commented on TIKA-4243: --- Proposed basic roadmap: Serialize ParseContext as is... Allow for

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849101#comment-17849101 ] Tim Allison commented on TIKA-4243: --- Fellow devs, in chatting with Nicholas, we're thinking that it

[jira] [Resolved] (TIKA-4258) Multi-arch support for docker images

2024-05-21 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4258. --- Resolution: Fixed Just pushed 2.9.2.1/*-latest Thank you, all! > Multi-arch support for docker

[jira] [Commented] (TIKA-4255) TextAndCSVParser ignores Metadata.CONTENT_ENCODING

2024-05-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847980#comment-17847980 ] Tim Allison commented on TIKA-4255: --- Thank you for opening this PR. Are you able to add a small unit

[jira] [Resolved] (TIKA-4256) Allow inlining of ocr'd text in container document

2024-05-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4256. --- Fix Version/s: 3.0.0 Resolution: Fixed > Allow inlining of ocr'd text in container document >

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847950#comment-17847950 ] Tim Allison commented on TIKA-4258: --- I'm sure I'll need to modify the PR when I actually go to run it,

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847949#comment-17847949 ] Tim Allison commented on TIKA-4258: --- Let's give it a day for fellow devs to weigh in. If there are no

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847943#comment-17847943 ] Tim Allison commented on TIKA-4258: --- And here's the full version:

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847931#comment-17847931 ] Tim Allison commented on TIKA-4243: --- Separately, but related to this and also to TIKA-4252 -- should we

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847883#comment-17847883 ] Tim Allison commented on TIKA-4258: --- Helpful links from #infra:

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847882#comment-17847882 ] Tim Allison commented on TIKA-4258: --- If fellow devs with better knowledge of github actions and docker

[jira] [Created] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread Tim Allison (Jira)
Tim Allison created TIKA-4258: - Summary: Multi-arch support for docker images Key: TIKA-4258 URL: https://issues.apache.org/jira/browse/TIKA-4258 Project: Tika Issue Type: Task

[jira] [Updated] (TIKA-4256) Allow inlining of ocr'd text in container document

2024-05-16 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4256: -- Description: For legacy tika, we're inlining all content from embedded files including ocr content of

[jira] [Updated] (TIKA-4256) Allow inlining of ocr'd text in container document

2024-05-16 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4256: -- Description: For legacy tika, we're inlining all content from embedded files including ocr content of

[jira] [Created] (TIKA-4256) Allow inlining of ocr'd text in container document

2024-05-16 Thread Tim Allison (Jira)
Tim Allison created TIKA-4256: - Summary: Allow inlining of ocr'd text in container document Key: TIKA-4256 URL: https://issues.apache.org/jira/browse/TIKA-4256 Project: Tika Issue Type: Task

[jira] [Commented] (TIKA-4137) Building current Tika main branch fails under Java 20/21

2024-05-15 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846697#comment-17846697 ] Tim Allison commented on TIKA-4137: --- Y, done just now. > Building current Tika main branch fails under

[jira] [Updated] (TIKA-4137) Building current Tika main branch fails under Java 20/21

2024-05-15 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4137: -- Fix Version/s: 2.9.3 > Building current Tika main branch fails under Java 20/21 >

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845081#comment-17845081 ] Tim Allison commented on TIKA-4252: --- fetchRequestMetadata, fetchResponseMetadata? > PipesClient#process

[jira] [Comment Edited] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845072#comment-17845072 ] Tim Allison edited comment on TIKA-4252 at 5/9/24 5:14 PM: --- fetcher.fetch(String

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845072#comment-17845072 ] Tim Allison commented on TIKA-4252: --- fetcher.fetch(String key, Metadata writeMetadata, Metadata

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845068#comment-17845068 ] Tim Allison commented on TIKA-4252: --- Should we add an optional Metadata object to the FetchKey. We could

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845062#comment-17845062 ] Tim Allison commented on TIKA-4252: --- K, but you don't want that coming back and being populated in the

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845051#comment-17845051 ] Tim Allison commented on TIKA-4252: --- Or, if you mean that metadata gathered from the fetcher isn't

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845048#comment-17845048 ] Tim Allison commented on TIKA-4252: --- My initial thought for injecting user metadata was to pass through

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845047#comment-17845047 ] Tim Allison commented on TIKA-4252: --- I opened this branch: https://github.com/apache/tika/tree/TIKA-4252

[jira] [Reopened] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reopened TIKA-4252: --- I pointed you to the wrong part of the code ... sorry. The design goal was to overwrite the extracted

[jira] [Commented] (TIKA-4253) Duplicate parsers loaded in AutoDetectParser in 3.x at least in some unit tests

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845022#comment-17845022 ] Tim Allison commented on TIKA-4253: --- This is happening in the unit tests because there are multiple

[jira] [Created] (TIKA-4253) Duplicate parsers loaded in AutoDetectParser in 3.x at least in some unit tests

2024-05-09 Thread Tim Allison (Jira)
Tim Allison created TIKA-4253: - Summary: Duplicate parsers loaded in AutoDetectParser in 3.x at least in some unit tests Key: TIKA-4253 URL: https://issues.apache.org/jira/browse/TIKA-4253 Project: Tika

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844998#comment-17844998 ] Tim Allison commented on TIKA-4252: --- Good catch:

[jira] [Comment Edited] (TIKA-4250) Add a libpst-based parser

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844976#comment-17844976 ] Tim Allison edited comment on TIKA-4250 at 5/9/24 12:59 PM: libpst issue

[jira] [Commented] (TIKA-4250) Add a libpst-based parser

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844976#comment-17844976 ] Tim Allison commented on TIKA-4250: --- libpff issue opened: https://github.com/libyal/libpff/issues/128

[jira] [Updated] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-05-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4251: -- Description: I was recently working a bit on incubator-stormcrawler, and I noticed that they are using

[jira] [Updated] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-05-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4251: -- Summary: [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format (was:

[jira] [Created] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin

2024-05-06 Thread Tim Allison (Jira)
Tim Allison created TIKA-4251: - Summary: [DISCUSS] move to cosium's git-code-format-maven-plugin Key: TIKA-4251 URL: https://issues.apache.org/jira/browse/TIKA-4251 Project: Tika Issue Type:

[jira] [Comment Edited] (TIKA-4250) Add a libpst-based parser

2024-05-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843746#comment-17843746 ] Tim Allison edited comment on TIKA-4250 at 5/6/24 5:03 PM: --- Wait, so, on

[jira] [Comment Edited] (TIKA-4250) Add a libpst-based parser

2024-05-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843798#comment-17843798 ] Tim Allison edited comment on TIKA-4250 at 5/6/24 5:02 PM: --- So, I caught an

[jira] [Commented] (TIKA-4250) Add a libpst-based parser

2024-05-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843798#comment-17843798 ] Tim Allison commented on TIKA-4250: --- So, I caught an example of libpst not reading an attachment in our

[jira] [Updated] (TIKA-4250) Add a libpst-based parser

2024-05-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4250: -- Attachment: 8.eml > Add a libpst-based parser > - > > Key:

[jira] [Updated] (TIKA-4250) Add a libpst-based parser

2024-05-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4250: -- Attachment: 8.msg > Add a libpst-based parser > - > > Key:

[jira] [Comment Edited] (TIKA-4250) Add a libpst-based parser

2024-05-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843740#comment-17843740 ] Tim Allison edited comment on TIKA-4250 at 5/6/24 1:02 PM: --- Wow. This is super

[jira] [Commented] (TIKA-4250) Add a libpst-based parser

2024-05-04 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843428#comment-17843428 ] Tim Allison commented on TIKA-4250: --- Given your experience, I think it would be valuable to add libpff

[jira] [Commented] (TIKA-4250) Add a libpst-based parser

2024-05-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843361#comment-17843361 ] Tim Allison commented on TIKA-4250: --- Hahahahaha. I figured you'd have input on this [~lfcnassif]! Y,

[jira] [Commented] (TIKA-4249) EML file is treating it as text file in 2.9.2 version

2024-05-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843217#comment-17843217 ] Tim Allison commented on TIKA-4249: --- > Crystal ball is murky on the timing of the next 2.x and 3.x

[jira] [Created] (TIKA-4250) Add a libpst-based parser

2024-05-02 Thread Tim Allison (Jira)
Tim Allison created TIKA-4250: - Summary: Add a libpst-based parser Key: TIKA-4250 URL: https://issues.apache.org/jira/browse/TIKA-4250 Project: Tika Issue Type: Task Reporter: Tim

[jira] [Commented] (TIKA-4249) EML file is treating it as text file in 2.9.2 version

2024-05-01 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842745#comment-17842745 ] Tim Allison commented on TIKA-4249: --- Version numbers for the fix are noted above: 2.9.3 and 3.0.0

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-01 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842605#comment-17842605 ] Tim Allison commented on TIKA-4243: --- Do we put it in tika-serialization or a new module? > tika

[jira] [Commented] (TIKA-4249) EML file is treating it as text file in 3.9.2 version

2024-05-01 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842604#comment-17842604 ] Tim Allison commented on TIKA-4249: --- The example file shared was actually kind of weird. I looked like
