RE: Adding arguments to configure tika from the rest calls

2023-02-20 Thread Julien Massiera
entation: https://cwiki.apache.org/confluence/display/TIKA/Configuring+Parsers+At+Parse+Time+in+tika-server Please let me know if you have any questions or want write access to improve the documentation! On Wed, Feb 15, 2023 at 11:07 AM Julien Massiera wrote: > > Hi Tim, > > bounc

RE: Adding arguments to configure tika from the rest calls

2023-02-15 Thread Julien Massiera
Hi Tim, bouncing back on our mail thread, could you share more documentation on how to use the header to configure the PDFParser on the fly? Thanks, Julien -Original Message- From: Julien Massiera Sent: Friday, 3 February 2023 13:08 To: dev@tika.apache.org Subject: RE: A

RE: Adding arguments to configure tika from the rest calls

2023-02-03 Thread Julien Massiera
Hi Tim, The NER Parse config via headers, like the PDFParserConfig, sounds like an interesting approach, but I only discovered that feature thanks to your reply. I tried to find documentation about it; unfortunately, the only thing I found was a TBD note on that page https://cwiki.apache.or

[jira] [Created] (TIKA-3958) Add tika-parser-nlp-package to release artifacts

2023-01-18 Thread Julien Massiera (Jira)
Julien Massiera created TIKA-3958: - Summary: Add tika-parser-nlp-package to release artifacts Key: TIKA-3958 URL: https://issues.apache.org/jira/browse/TIKA-3958 Project: Tika Issue Type

NER parsers package for Tika Server

2023-01-10 Thread Julien Massiera
Hi Tim, First, I would like to wish you all the best for 2023! I am writing because I am using NER parsers with Tika Server, but to do so, I had to build the NER package myself from the Tika repository. Indeed, for Tika Server 2.x, I did not find any pre-made NER package to add to the cla

RE: Re: Next releases?

2022-09-05 Thread Julien Massiera
Hi Tim, +1 for new Tika releases on my side. Regards, Julien On 2022/08/29 18:24:40 Tim Allison wrote: > Are we in decent shape to start the release processes for 1.x and 2.x? > > Maybe start 1.x this week and 2.x next week? > > Any blockers? > > Best, > > Tim > > On Wed, Aug 1

RE: [DISCUSS] support for Java 8?

2022-03-25 Thread Julien Massiera
Hi Tim, on our side we have already dropped Java 8 and only support Java 11, so it would not be a problem for us. Cheers, Julien -Original Message- From: Tim Allison Sent: Friday, 25 March 2022 15:47 To: Subject: [DISCUSS] support for Java 8? All, I'm somewhat interested in moving

[jira] [Commented] (TIKA-3695) LimitingMetadataFilter

2022-03-23 Thread Julien Massiera (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17511180#comment-17511180 ] Julien Massiera commented on TIKA-3695: --- It is a good idea to set a hard limi

[jira] [Commented] (TIKA-3695) LimitingMetadataFilter

2022-03-22 Thread Julien Massiera (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510715#comment-17510715 ] Julien Massiera commented on TIKA-3695: --- Indeed, I get the following result fil

[jira] [Commented] (TIKA-3695) LimitingMetadataFilter

2022-03-22 Thread Julien Massiera (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510706#comment-17510706 ] Julien Massiera commented on TIKA-3695: --- Yes, I was about to test and it is

[jira] [Commented] (TIKA-3695) LimitingMetadataFilter

2022-03-19 Thread Julien Massiera (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509316#comment-17509316 ] Julien Massiera commented on TIKA-3695: --- Concerning the X-TIKA:EXCEPTION

[jira] [Commented] (TIKA-3695) LimitingMetadataFilter

2022-03-19 Thread Julien Massiera (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509315#comment-17509315 ] Julien Massiera commented on TIKA-3695: --- I think that the minimum is to count

[jira] [Commented] (TIKA-3695) LimitingMetadataFilter

2022-03-19 Thread Julien Massiera (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509249#comment-17509249 ] Julien Massiera commented on TIKA-3695: --- Thanks Tim, with the conf you provid

RE: Problems with pingTimeoutMillis, pingPulseMillis and javaHome parameters

2022-03-19 Thread Julien Massiera
1.x into config.xml in 2.x, I didn't imagine users would want more complexity. I can add them back if you want them. On Fri, Mar 18, 2022 at 2:17 PM Julien Massiera wrote: > > Hi, > > > > I am currently testing the current trunk Tika server-standard 2.3.1. > Everyt

[jira] [Commented] (TIKA-3695) LimitingMetadataFilter

2022-03-18 Thread Julien Massiera (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509041#comment-17509041 ] Julien Massiera commented on TIKA-3695: --- I am not sure I understood how it work

Problems with pingTimeoutMillis, pingPulseMillis and javaHome parameters

2022-03-18 Thread Julien Massiera
Hi, I am currently testing the current trunk Tika server-standard 2.3.1. Everything works fine with the config tika-config.xml except for three parameters: pingTimeoutMillis, pingPulseMillis and javaHome. Indeed, when I try to set them in the config file as specified in the documentation

[jira] [Commented] (TIKA-3695) LimitingMetadataFilter

2022-03-17 Thread Julien Massiera (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17508129#comment-17508129 ] Julien Massiera commented on TIKA-3695: --- I took a better look at the code

[jira] [Commented] (TIKA-3695) LimitingMetadataFilter

2022-03-16 Thread Julien Massiera (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17507662#comment-17507662 ] Julien Massiera commented on TIKA-3695: --- [~tallison] concerning that point &

[jira] [Created] (TIKA-3695) LimitingMetadataFilter

2022-03-08 Thread Julien Massiera (Jira)
Julien Massiera created TIKA-3695: - Summary: LimitingMetadataFilter Key: TIKA-3695 URL: https://issues.apache.org/jira/browse/TIKA-3695 Project: Tika Issue Type: New Feature

Tika server limit metadata

2022-03-07 Thread Julien Massiera
Hi Tim, We identified cases where PDF files may contain abnormally big metadata (several MB, be it for the metadata values, the metadata names, but also for the total amount of metadata). Some time ago, I proposed the creation of a "writeLimit" header in Tika Server (and you agreed to implemen

[jira] [Commented] (TIKA-3372) Fix writelimit in recursiveparserhandler

2021-05-03 Thread Julien Massiera (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338281#comment-17338281 ] Julien Massiera commented on TIKA-3372: --- [~tallison] the fix works! So for

[jira] [Comment Edited] (TIKA-3372) Fix writelimit in recursiveparserhandler

2021-04-30 Thread Julien Massiera (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17337445#comment-17337445 ] Julien Massiera edited comment on TIKA-3372 at 4/30/21, 3:1

[jira] [Commented] (TIKA-3372) Fix writelimit in recursiveparserhandler

2021-04-30 Thread Julien Massiera (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17337445#comment-17337445 ] Julien Massiera commented on TIKA-3372: --- [~tallison] so I tested on a 1.27 b

[jira] [Commented] (TIKA-3372) Fix writelimit in recursiveparserhandler

2021-04-27 Thread Julien Massiera (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17333168#comment-17333168 ] Julien Massiera commented on TIKA-3372: --- Concerning the behavior you desc

[jira] [Comment Edited] (TIKA-3372) Fix writelimit in recursiveparserhandler

2021-04-27 Thread Julien Massiera (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17333027#comment-17333027 ] Julien Massiera edited comment on TIKA-3372 at 4/27/21, 8:0

[jira] [Commented] (TIKA-3372) Fix writelimit in recursiveparserhandler

2021-04-27 Thread Julien Massiera (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17333027#comment-17333027 ] Julien Massiera commented on TIKA-3372: --- [~tallison] here is my use case: I

[jira] [Commented] (TIKA-3325) Add header to limit extracted content

2021-03-18 Thread Julien Massiera (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304306#comment-17304306 ] Julien Massiera commented on TIKA-3325: --- Indeed, I see no reason one would wan

[jira] [Commented] (TIKA-3325) Add header to limit extracted content

2021-03-16 Thread Julien Massiera (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302577#comment-17302577 ] Julien Massiera commented on TIKA-3325: --- [~tallison] yes please, it would be re

[jira] [Created] (TIKA-3325) Add header to limit extracted content

2021-03-16 Thread Julien Massiera (Jira)
Julien Massiera created TIKA-3325: - Summary: Add header to limit extracted content Key: TIKA-3325 URL: https://issues.apache.org/jira/browse/TIKA-3325 Project: Tika Issue Type: Improvement

[jira] [Created] (TIKA-2881) Obsolete MircrosoftTranslator implementation

2019-05-23 Thread Julien Massiera (JIRA)
Julien Massiera created TIKA-2881: - Summary: Obsolete MircrosoftTranslator implementation Key: TIKA-2881 URL: https://issues.apache.org/jira/browse/TIKA-2881 Project: Tika Issue Type: Bug

[jira] [Created] (TIKA-2753) ChildProcess does not use the JAVA_HOME

2018-10-12 Thread Julien Massiera (JIRA)
Julien Massiera created TIKA-2753: - Summary: ChildProcess does not use the JAVA_HOME Key: TIKA-2753 URL: https://issues.apache.org/jira/browse/TIKA-2753 Project: Tika Issue Type: Bug

[jira] [Created] (TIKA-2371) Check properties presence - PDFParser

2017-05-18 Thread Julien Massiera (JIRA)
Julien Massiera created TIKA-2371: - Summary: Check properties presence - PDFParser Key: TIKA-2371 URL: https://issues.apache.org/jira/browse/TIKA-2371 Project: Tika Issue Type: Improvement