[jira] [Resolved] (TIKA-4285) Invalid Link for changelog CHANGES.txt files

2024-07-22 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4285. --- Resolution: Fixed Thank you [~tom_1st] and [~tilman]! Should be fixed now. > Invalid L

[jira] [Assigned] (TIKA-4285) Invalid Link for changelog CHANGES.txt files

2024-07-22 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-4285: - Assignee: Tim Allison > Invalid Link for changelog CHANGES.txt fi

[jira] [Commented] (TIKA-4281) Fix javadoc plugin configuration

2024-07-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866843#comment-17866843 ] Tim Allison commented on TIKA-4281: --- For some reason, now, it looks like {{javadocs}} works fine

[jira] [Commented] (TIKA-4281) Fix javadoc plugin configuration

2024-07-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866742#comment-17866742 ] Tim Allison commented on TIKA-4281: --- Well, that didn't work: {{javadoc: error - No source files

[ANNOUNCE] Apache Tika 3.0.0-BETA2 released

2024-07-16 Thread Tim Allison
0.0. -- Tim Allison, on behalf of the Apache Tika community

[jira] [Created] (TIKA-4281) Fix javadoc plugin configuration

2024-07-16 Thread Tim Allison (Jira)
Tim Allison created TIKA-4281: - Summary: Fix javadoc plugin configuration Key: TIKA-4281 URL: https://issues.apache.org/jira/browse/TIKA-4281 Project: Tika Issue Type: Task Reporter

[jira] [Updated] (TIKA-4280) Tasks for the 3.0.0 release

2024-07-16 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4280: -- Description: I'm too lazy to open separate tickets. Please do so if desired. Some items: * Before

Re: [RESULT][VOTE] Release Apache Tika 3.0.0-BETA2 Candidate #1

2024-07-15 Thread Tim Allison
I released the artifacts and built the docker images. I'll work on the site and announcement tomorrow. On Mon, Jul 15, 2024 at 1:50 PM Tim Allison wrote: > > The vote has passed with 3 PMC +1s, 2 non-binding +1s and no -1s. > > +1s (binding) > Tim Allison > Nicholas DiPiazza

[jira] [Created] (TIKA-4280) Tasks for the 3.0.0 release

2024-07-15 Thread Tim Allison (Jira)
Tim Allison created TIKA-4280: - Summary: Tasks for the 3.0.0 release Key: TIKA-4280 URL: https://issues.apache.org/jira/browse/TIKA-4280 Project: Tika Issue Type: Task Reporter: Tim

[RESULT][VOTE] Release Apache Tika 3.0.0-BETA2 Candidate #1

2024-07-15 Thread Tim Allison
The vote has passed with 3 PMC +1s, 2 non-binding +1s and no -1s. +1s (binding) Tim Allison Nicholas DiPiazza Tilman Hausherr +1s (non-binding) Kiran Bachu Gary Gregory I'll release the artifacts shortly and update the website. Thank you, all! Best, Tim On Fri, Jul 12, 2024 at 12:08

[jira] [Commented] (TIKA-4278) TextAndCSVParser doesn't detect semicolon separated file

2024-07-15 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866075#comment-17866075 ] Tim Allison commented on TIKA-4278: --- Thank you, [~tilman], y, that's probably an oversight on my part

Re: [VOTE] Release Apache Tika 3.0.0-BETA2 Candidate #1

2024-07-12 Thread Tim Allison
; dependencies > (I've added these so we support these other projects by testing them), > and decide about the ffmpeg issue and the hdf5 issue. > > Tilman > > On 12.07.2024 18:08, Tim Allison wrote: > > A candidate for the Tika 3.0.0-BETA2 release is available at:

[VOTE] Release Apache Tika 3.0.0-BETA2 Candidate #1

2024-07-12 Thread Tim Allison
A candidate for the Tika 3.0.0-BETA2 release is available at: https://dist.apache.org/repos/dist/dev/tika/3.0.0-BETA2 The release candidate is a zip archive of the sources in: https://github.com/apache/tika/tree/3.0.0-BETA2-rc1/ The SHA-512 checksum of the archive is

Re: maven-deploy pulling extraneous dependency's metadata?!

2024-07-10 Thread Tim Allison
ve the same stuff INSTALLed as well, see line 32! > Looking more... > > > On Wed, Jul 10, 2024 at 9:44 PM Tim Allison wrote: > > > Apache Maven 3.9.7 (8b094c9513efc1b9ce2d952b3b9c8eaedaf8cbf0) > > Maven home: /apache/apache-maven-3.9.7 > > Java version: 11.0.23, vendor

Re: maven-deploy pulling extraneous dependency's metadata?!

2024-07-10 Thread Tim Allison
Sorry, should have been dev@tika earlier, not private@tika. I rolled back the deploy-plugin to 3.1.1, which was successful for our last deployment of 2.9.2. That worked then. It does not work now with this new tika-grpc module. On Wed, Jul 10, 2024 at 3:42 PM Tim Allison wrote: > Apache Ma

Re: Release of Beta2?

2024-07-09 Thread Tim Allison
Let's aim for tomorrow after review of TIKA-4275? Any other fellow devs want to join? On Tue, Jul 9, 2024 at 4:46 PM Tim Allison wrote: > Doh. Sorry. Starting now... > > On Tue, Jul 9, 2024 at 12:47 PM Nicholas DiPiazza < > nicholas.dipia...@gmail.com> wrote: > >>

[jira] [Created] (TIKA-4275) Make tika-grpc a top-level module

2024-07-09 Thread Tim Allison (Jira)
Tim Allison created TIKA-4275: - Summary: Make tika-grpc a top-level module Key: TIKA-4275 URL: https://issues.apache.org/jira/browse/TIKA-4275 Project: Tika Issue Type: Task Reporter

Re: Release of Beta2?

2024-07-09 Thread Tim Allison
Doh. Sorry. Starting now... On Tue, Jul 9, 2024 at 12:47 PM Nicholas DiPiazza < nicholas.dipia...@gmail.com> wrote: > Hi all, > > Just seeing if we were planning to build Beta2 today? I'd like to tag along > and see how it's done if ya'll don't mind! > > -Nicholas >

3.0.0-BETA2 next week?

2024-07-03 Thread Tim Allison
All, I think it is time to go for a 3.0.0-BETA2. What do you think about cutting that release this Friday or maybe next week? Best, Tim

Re: how do i build a new beta version?

2024-06-27 Thread Tim Allison
cripts. how do i go > > about getting that created any idea? > > > > On Wed, Jun 26, 2024 at 2:41 PM Tim Allison > > wrote:If we > > > >> Like a 3.0.0-BETA2 release? > >> > >> On Wed, Jun 26, 2024 at 12:06 PM Nicholas DiPiazza

[jira] [Commented] (TIKA-4272) create tika docker image for tika-grpc

2024-06-26 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860241#comment-17860241 ] Tim Allison commented on TIKA-4272: --- Y, I concur, we should have a completely separate image. > cre

Re: how do i build a new beta version?

2024-06-26 Thread Tim Allison
Like a 3.0.0-BETA2 release? On Wed, Jun 26, 2024 at 12:06 PM Nicholas DiPiazza < nicholas.dipia...@gmail.com> wrote: > At some point I would like to build a 3.0.0 beta version. > > How can I go about this? > > -Nicholas >

[jira] [Commented] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860035#comment-17860035 ] Tim Allison commented on TIKA-4251: --- W00t! > [DISCUSS] move to cosium's git-code-format-maven-plu

[jira] [Commented] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860020#comment-17860020 ] Tim Allison commented on TIKA-4251: --- Sounds great. My personal preference would be to move away from our

[jira] [Commented] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860007#comment-17860007 ] Tim Allison commented on TIKA-4251: --- > we eat the 1-time-format cost That's where the vulnerabil

[jira] [Commented] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1785#comment-1785 ] Tim Allison commented on TIKA-4251: --- Makes sense. Tilman's observation is legit, and I don't see a way

[jira] [Comment Edited] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17859739#comment-17859739 ] Tim Allison edited comment on TIKA-4251 at 6/25/24 6:19 PM: Y. I agree. When I

[jira] [Commented] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-06-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17859739#comment-17859739 ] Tim Allison commented on TIKA-4251: --- Y. I agree. When I started with checkstyle, it modified nearly

Re: Automatically applying checkstyle fixes

2024-06-24 Thread Tim Allison
pia...@gmail.com> wrote: > I just started using it for a big project and it is awesome > > On Sat, Jun 22, 2024, 6:11 AM Tim Allison wrote: > > > https://issues.apache.org/jira/browse/TIKA-4251 > > > > Anything that works and doesn't allow wildcard imports I'

Re: Automatically applying checkstyle fixes

2024-06-22 Thread Tim Allison
https://issues.apache.org/jira/browse/TIKA-4251 Anything that works and doesn't allow wildcard imports I'm good with. Have you had luck with OpenRewrite? On Wed, Jun 19, 2024 at 12:55 PM Nicholas DiPiazza < nicholas.dipia...@gmail.com> wrote: > Hey Tim and Team: > > I remember someone stating

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-06-07 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853241#comment-17853241 ] Tim Allison commented on TIKA-4243: --- This is what the json currently looks like. {code:json

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-06-07 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853240#comment-17853240 ] Tim Allison commented on TIKA-4243: --- I opened a PR with some cleanup, fixes and a new unit test

[jira] [Resolved] (TIKA-4268) Use title for embedded resource path in embedded msg files

2024-06-07 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4268. --- Fix Version/s: 3.0.0 Resolution: Fixed > Use title for embedded resource path in embedded

[jira] [Commented] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-06-07 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853157#comment-17853157 ] Tim Allison commented on TIKA-4251: --- Unless there are any objections, I'll likely move forward

[jira] [Created] (TIKA-4268) Use title for embedded resource path in embedded msg files

2024-06-07 Thread Tim Allison (Jira)
Tim Allison created TIKA-4268: - Summary: Use title for embedded resource path in embedded msg files Key: TIKA-4268 URL: https://issues.apache.org/jira/browse/TIKA-4268 Project: Tika Issue Type

[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

2024-06-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852876#comment-17852876 ] Tim Allison edited comment on TIKA-4243 at 6/6/24 5:39 PM: --- I think our joint

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-06-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852876#comment-17852876 ] Tim Allison commented on TIKA-4243: --- I think our joint recent PR on TIKA-4252 accomplishes the goals

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-06-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852874#comment-17852874 ] Tim Allison commented on TIKA-4252: --- K. I think we're at "good enough" here. [~ndipiazza],

[jira] [Resolved] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-06-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4252. --- Resolution: Fixed > PipesClient#process - seems to lose the Fetch input metad

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-06-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852808#comment-17852808 ] Tim Allison commented on TIKA-4243: --- Oh, and documentation, lots of documentation. :LOL: > t

[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

2024-06-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852804#comment-17852804 ] Tim Allison edited comment on TIKA-4243 at 6/6/24 2:11 PM: --- Current status

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-06-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852804#comment-17852804 ] Tim Allison commented on TIKA-4243: --- Current status on TIKA-4243 -- works up through and including tika

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-06-04 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852098#comment-17852098 ] Tim Allison commented on TIKA-4243: --- Let me know if there are any objections to heading

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-06-04 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852097#comment-17852097 ] Tim Allison commented on TIKA-4243: --- K, I chatted briefly with [~ndipiazza] this morning. Unless

[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

2024-06-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851727#comment-17851727 ] Tim Allison edited comment on TIKA-4243 at 6/3/24 5:10 PM: --- I spent a bit

[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

2024-06-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851727#comment-17851727 ] Tim Allison edited comment on TIKA-4243 at 6/3/24 5:02 PM: --- I spent a bit

[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

2024-06-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851727#comment-17851727 ] Tim Allison edited comment on TIKA-4243 at 6/3/24 4:45 PM: --- I spent a bit

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-06-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851727#comment-17851727 ] Tim Allison commented on TIKA-4243: --- I spent a bit of time trying to serialize ParseContext, and I now

[jira] [Resolved] (TIKA-4260) Add parse context to the fetcher interface in 3.x

2024-06-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4260. --- Resolution: Duplicate Turns out this is a duplicate. Onwards to TIKA-4243! > Add parse cont

[jira] [Created] (TIKA-4266) Improve multithreading and the xml parser pools in XMLUtils

2024-05-30 Thread Tim Allison (Jira)
Tim Allison created TIKA-4266: - Summary: Improve multithreading and the xml parser pools in XMLUtils Key: TIKA-4266 URL: https://issues.apache.org/jira/browse/TIKA-4266 Project: Tika Issue Type

[jira] [Resolved] (TIKA-4221) Regression in pack200 parsing in commons-compress

2024-05-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4221. --- Fix Version/s: 3.0.0 2.9.3 Resolution: Fixed Many thanks to [~ggregory

[jira] [Resolved] (TIKA-4220) Commons-compress too lenient on headless tar detection

2024-05-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4220. --- Fix Version/s: 3.0.0 2.9.3 Resolution: Fixed Many thanks to [~ggregory

[jira] [Commented] (TIKA-4265) Consider adding maven build cache extension

2024-05-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850776#comment-17850776 ] Tim Allison commented on TIKA-4265: --- It doesn't help at all if there's a modification in tika-core, even

[jira] [Commented] (TIKA-4265) Consider adding maven build cache extension

2024-05-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850773#comment-17850773 ] Tim Allison commented on TIKA-4265: --- I just pushed a demo to {{build-cache}}. This includes

[jira] [Created] (TIKA-4265) Consider adding maven build cache extension

2024-05-30 Thread Tim Allison (Jira)
Tim Allison created TIKA-4265: - Summary: Consider adding maven build cache extension Key: TIKA-4265 URL: https://issues.apache.org/jira/browse/TIKA-4265 Project: Tika Issue Type: Task

[jira] [Created] (TIKA-4261) Add attachment type metadata filter

2024-05-24 Thread Tim Allison (Jira)
Tim Allison created TIKA-4261: - Summary: Add attachment type metadata filter Key: TIKA-4261 URL: https://issues.apache.org/jira/browse/TIKA-4261 Project: Tika Issue Type: Task

[jira] [Resolved] (TIKA-4259) Decouple xml parser stuff from ParseContext

2024-05-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4259. --- Fix Version/s: 3.0.0 Resolution: Fixed > Decouple xml parser stuff from ParseCont

[jira] [Commented] (TIKA-4260) Add parse context to the fetcher interface in 3.x

2024-05-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849298#comment-17849298 ] Tim Allison commented on TIKA-4260: --- That PR currently only works on tika-core. More needs to be done

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849288#comment-17849288 ] Tim Allison commented on TIKA-4243: --- [~ndipiazza], I added parseContext to fetchers and emitters

[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

2024-05-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849103#comment-17849103 ] Tim Allison edited comment on TIKA-4243 at 5/24/24 1:00 PM: Proposed basic

[jira] [Created] (TIKA-4260) Add parse context to the fetcher interface in 3.x

2024-05-23 Thread Tim Allison (Jira)
Tim Allison created TIKA-4260: - Summary: Add parse context to the fetcher interface in 3.x Key: TIKA-4260 URL: https://issues.apache.org/jira/browse/TIKA-4260 Project: Tika Issue Type: Task

[jira] [Created] (TIKA-4259) Decouple xml parser stuff from ParseContext

2024-05-23 Thread Tim Allison (Jira)
Tim Allison created TIKA-4259: - Summary: Decouple xml parser stuff from ParseContext Key: TIKA-4259 URL: https://issues.apache.org/jira/browse/TIKA-4259 Project: Tika Issue Type: Task

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849114#comment-17849114 ] Tim Allison commented on TIKA-4243: --- I'm going to start working on PRs that will be generally helpful

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849108#comment-17849108 ] Tim Allison commented on TIKA-4243: --- The downsides we see: a) if there's agreement to add jackson

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849103#comment-17849103 ] Tim Allison commented on TIKA-4243: --- Proposed basic roadmap: Serialize ParseContext as is... Allow

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849101#comment-17849101 ] Tim Allison commented on TIKA-4243: --- Fellow devs, in chatting with Nicholas, we're thinking

[jira] [Resolved] (TIKA-4258) Multi-arch support for docker images

2024-05-21 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4258. --- Resolution: Fixed Just pushed 2.9.2.1/*-latest Thank you, all! > Multi-arch support for doc

multi-arch support for tika-docker!

2024-05-21 Thread Tim Allison
All, Many thanks to the many community members who helped figure this out and get it out the door! As of tika-docker 2.9.2.1, we now have multi-arch support (and on noble!). Let us know if there are any surprises. Thank you, again! Cheers, Tim Ref:

[jira] [Commented] (TIKA-4255) TextAndCSVParser ignores Metadata.CONTENT_ENCODING

2024-05-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847980#comment-17847980 ] Tim Allison commented on TIKA-4255: --- Thank you for opening this PR. Are you able to add a small unit

[jira] [Resolved] (TIKA-4256) Allow inlining of ocr'd text in container document

2024-05-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4256. --- Fix Version/s: 3.0.0 Resolution: Fixed > Allow inlining of ocr'd text in container docum

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847950#comment-17847950 ] Tim Allison commented on TIKA-4258: --- I'm sure I'll need to modify the PR when I actually go to run

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847949#comment-17847949 ] Tim Allison commented on TIKA-4258: --- Let's give it a day for fellow devs to weigh

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847943#comment-17847943 ] Tim Allison commented on TIKA-4258: --- And here's the full version: https://hub.docker.com/layers/apache

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847931#comment-17847931 ] Tim Allison commented on TIKA-4243: --- Separately, but related to this and also to TIKA-4252 -- should we

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847883#comment-17847883 ] Tim Allison commented on TIKA-4258: --- Helpful links from #infra: https://infra.apache.org/docker-hub

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847882#comment-17847882 ] Tim Allison commented on TIKA-4258: --- If fellow devs with better knowledge of github actions and docker

[jira] [Created] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread Tim Allison (Jira)
Tim Allison created TIKA-4258: - Summary: Multi-arch support for docker images Key: TIKA-4258 URL: https://issues.apache.org/jira/browse/TIKA-4258 Project: Tika Issue Type: Task

[jira] [Updated] (TIKA-4256) Allow inlining of ocr'd text in container document

2024-05-16 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4256: -- Description: For legacy tika, we're inlining all content from embedded files including ocr content

[jira] [Updated] (TIKA-4256) Allow inlining of ocr'd text in container document

2024-05-16 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4256: -- Description: For legacy tika, we're inlining all content from embedded files including ocr content

[jira] [Created] (TIKA-4256) Allow inlining of ocr'd text in container document

2024-05-16 Thread Tim Allison (Jira)
Tim Allison created TIKA-4256: - Summary: Allow inlining of ocr'd text in container document Key: TIKA-4256 URL: https://issues.apache.org/jira/browse/TIKA-4256 Project: Tika Issue Type: Task

[jira] [Commented] (TIKA-4137) Building current Tika main branch fails under Java 20/21

2024-05-15 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846697#comment-17846697 ] Tim Allison commented on TIKA-4137: --- Y, done just now. > Building current Tika main branch fails un

[jira] [Updated] (TIKA-4137) Building current Tika main branch fails under Java 20/21

2024-05-15 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4137: -- Fix Version/s: 2.9.3 > Building current Tika main branch fails under Java 20

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845081#comment-17845081 ] Tim Allison commented on TIKA-4252: --- fetchRequestMetadata, fetchResponseMetadata? > PipesClient#proc

[jira] [Comment Edited] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845072#comment-17845072 ] Tim Allison edited comment on TIKA-4252 at 5/9/24 5:14 PM: --- fetcher.fetch(String

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845072#comment-17845072 ] Tim Allison commented on TIKA-4252: --- fetcher.fetch(String key, Metadata writeMetadata, Metadata

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845068#comment-17845068 ] Tim Allison commented on TIKA-4252: --- Should we add an optional Metadata object to the FetchKey. We could

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845062#comment-17845062 ] Tim Allison commented on TIKA-4252: --- K, but you don't want that coming back and being populated

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845051#comment-17845051 ] Tim Allison commented on TIKA-4252: --- Or, if you mean that metadata gathered from the fetcher isn't

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845048#comment-17845048 ] Tim Allison commented on TIKA-4252: --- My initial thought for injecting user metadata was to pass through

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845047#comment-17845047 ] Tim Allison commented on TIKA-4252: --- I opened this branch: https://github.com/apache/tika/tree/TIKA-4252

[jira] [Reopened] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reopened TIKA-4252: --- I pointed you to the wrong part of the code ... sorry. The design goal was to overwrite the extracted

[jira] [Commented] (TIKA-4253) Duplicate parsers loaded in AutoDetectParser in 3.x at least in some unit tests

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845022#comment-17845022 ] Tim Allison commented on TIKA-4253: --- This is happening in the unit tests because there are multiple

[jira] [Created] (TIKA-4253) Duplicate parsers loaded in AutoDetectParser in 3.x at least in some unit tests

2024-05-09 Thread Tim Allison (Jira)
Tim Allison created TIKA-4253: - Summary: Duplicate parsers loaded in AutoDetectParser in 3.x at least in some unit tests Key: TIKA-4253 URL: https://issues.apache.org/jira/browse/TIKA-4253 Project: Tika

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844998#comment-17844998 ] Tim Allison commented on TIKA-4252: --- Good catch: https://github.com/apache/tika/blob/main/tika-core/src

[jira] [Comment Edited] (TIKA-4250) Add a libpst-based parser

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844976#comment-17844976 ] Tim Allison edited comment on TIKA-4250 at 5/9/24 12:59 PM: libpst issue

[jira] [Commented] (TIKA-4250) Add a libpst-based parser

2024-05-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844976#comment-17844976 ] Tim Allison commented on TIKA-4250: --- libpff issue opened: https://github.com/libyal/libpff/issues/128

3.0.0-BETA2 release?

2024-05-07 Thread Tim Allison
All, I'd like to go for another 3.x beta release and then move fairly quickly to a 3.0.0 release. I was hoping that https://issues.apache.org/jira/browse/TIKA-4221 would be wrapped up soon. It hasn't been, but I can add the workaround we did in 2.x. What do you think? Any blockers?
