[jira] [Commented] (TIKA-3351) Make list of parsers in metadata unique

2021-10-08 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426434#comment-17426434 ] Peter Kronenberg commented on TIKA-3351: [~tallison] ended up checking in a fix > Make list of

[jira] [Commented] (TIKA-3361) Improve intelligence of OCRStrategy=AUTO

2021-07-16 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17382256#comment-17382256 ] Peter Kronenberg commented on TIKA-3361: Finally got a chance to finish this Pull Request >

[jira] [Commented] (TIKA-3361) Improve intelligence of OCRStrategy=AUTO

2021-05-20 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17348945#comment-17348945 ] Peter Kronenberg commented on TIKA-3361: The code already explicitly checks for that.  But I'll

[jira] [Commented] (TIKA-3361) Improve intelligence of OCRStrategy=AUTO

2021-05-20 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17348844#comment-17348844 ] Peter Kronenberg commented on TIKA-3361: No problem, I can take care of it.  What kind of range

[jira] [Commented] (TIKA-3361) Improve intelligence of OCRStrategy=AUTO

2021-05-20 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17348689#comment-17348689 ] Peter Kronenberg commented on TIKA-3361: [~tallison] Still thinking? :) > Improve intelligence

[jira] [Commented] (TIKA-3361) Improve intelligence of OCRStrategy=AUTO

2021-05-11 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17342725#comment-17342725 ] Peter Kronenberg commented on TIKA-3361: Yes, I agree it would be great to have additional

[jira] [Commented] (TIKA-3361) Improve intelligence of OCRStrategy=AUTO

2021-05-07 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17341027#comment-17341027 ] Peter Kronenberg commented on TIKA-3361: So I'd like to try to restart this conversation. Here is

[jira] [Comment Edited] (TIKA-3361) Improve intelligence of OCRStrategy=AUTO

2021-04-20 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325904#comment-17325904 ] Peter Kronenberg edited comment on TIKA-3361 at 4/20/21, 3:25 PM: -- Yes,

[jira] [Commented] (TIKA-3361) Improve intelligence of OCRStrategy=AUTO

2021-04-20 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325904#comment-17325904 ] Peter Kronenberg commented on TIKA-3361: Yes, theoretically, you're correct.  You could argue that

[jira] [Created] (TIKA-3361) Improve intelligence of OCRStrategy=AUTO

2021-04-17 Thread Peter Kronenberg (Jira)
Peter Kronenberg created TIKA-3361: -- Summary: Improve intelligence of OCRStrategy=AUTO Key: TIKA-3361 URL: https://issues.apache.org/jira/browse/TIKA-3361 Project: Tika Issue Type:

[jira] [Commented] (TIKA-3351) Make list of parsers in metadata unique

2021-04-13 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17320286#comment-17320286 ] Peter Kronenberg commented on TIKA-3351: Sure, can't wait to see how you fixed it > Make list of

[jira] [Commented] (TIKA-3351) Make list of parsers in metadata unique

2021-04-12 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319574#comment-17319574 ] Peter Kronenberg commented on TIKA-3351: Not sure if this was the best way, but I had to create a

[jira] [Created] (TIKA-3351) Make list of parsers in metadata unique

2021-04-12 Thread Peter Kronenberg (Jira)
Peter Kronenberg created TIKA-3351: -- Summary: Make list of parsers in metadata unique Key: TIKA-3351 URL: https://issues.apache.org/jira/browse/TIKA-3351 Project: Tika Issue Type:

[jira] [Commented] (TIKA-3343) Remove Tika custom lang detection for 2.x

2021-03-31 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312405#comment-17312405 ] Peter Kronenberg commented on TIKA-3343: If I don't have Optimaize in my pom, then the

[jira] [Commented] (TIKA-3343) Remove Tika custom lang detection for 2.x

2021-03-31 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312366#comment-17312366 ] Peter Kronenberg commented on TIKA-3343: I might be mistaken, due to my confusion at Tika having

[jira] [Commented] (TIKA-3343) Remove Tika custom lang detection for 2.x

2021-03-30 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17311785#comment-17311785 ] Peter Kronenberg commented on TIKA-3343: I'm using this functionality. I don't care if it's

[jira] [Commented] (TIKA-3313) Improve performance and usability of RereadableInputStream

2021-03-15 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17301635#comment-17301635 ] Peter Kronenberg commented on TIKA-3313: Just created a pull request for this issue. Any

[jira] [Commented] (TIKA-3310) MP4 video detected as application/mp4

2021-03-11 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17299574#comment-17299574 ] Peter Kronenberg commented on TIKA-3310: [~nick] What's the delay in merging this? > MP4 video

[jira] [Commented] (TIKA-3310) MP4 video detected as application/mp4

2021-03-08 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17297730#comment-17297730 ] Peter Kronenberg commented on TIKA-3310: Can you approve the Pull Request?  I can backport it if

[jira] [Commented] (TIKA-3310) MP4 video detected as application/mp4

2021-03-08 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17297417#comment-17297417 ] Peter Kronenberg commented on TIKA-3310: [~nick] are you good with the latest version? > MP4

[jira] [Commented] (TIKA-3310) MP4 video detected as application/mp4

2021-03-05 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296246#comment-17296246 ] Peter Kronenberg commented on TIKA-3310: Ok, I've gone ahead and separated them. It searches for

[jira] [Commented] (TIKA-3310) MP4 video detected as application/mp4

2021-03-05 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296122#comment-17296122 ] Peter Kronenberg commented on TIKA-3310: oh yeah, you're right > MP4 video detected as

[jira] [Commented] (TIKA-3310) MP4 video detected as application/mp4

2021-03-05 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296085#comment-17296085 ] Peter Kronenberg commented on TIKA-3310: done > MP4 video detected as application/mp4 >

[jira] [Commented] (TIKA-3310) MP4 video detected as application/mp4

2021-03-05 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296080#comment-17296080 ] Peter Kronenberg commented on TIKA-3310: Ah, now I understand what you're saying. Ok, I will

[jira] [Commented] (TIKA-3310) MP4 video detected as application/mp4

2021-03-05 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296065#comment-17296065 ] Peter Kronenberg commented on TIKA-3310: [~nick] What are your current thoughts? Do you still

[jira] [Updated] (TIKA-3313) Improve performance and usability of RereadableInputStream

2021-03-04 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Kronenberg updated TIKA-3313: --- Description: I was challenged by the following comment in RereadableInputStream: {code:java}

[jira] [Created] (TIKA-3313) Improve performance and usability of RereadableInputStream

2021-03-04 Thread Peter Kronenberg (Jira)
Peter Kronenberg created TIKA-3313: -- Summary: Improve performance and usability of RereadableInputStream Key: TIKA-3313 URL: https://issues.apache.org/jira/browse/TIKA-3313 Project: Tika

[jira] [Comment Edited] (TIKA-3310) MP4 video detected as application/mp4

2021-03-04 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295360#comment-17295360 ] Peter Kronenberg edited comment on TIKA-3310 at 3/4/21, 3:43 PM: -  

[jira] [Commented] (TIKA-3310) MP4 video detected as application/mp4

2021-03-04 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295360#comment-17295360 ] Peter Kronenberg commented on TIKA-3310:   ||Major Brand matches||Compatible Brand

[jira] [Commented] (TIKA-3310) MP4 video detected as application/mp4

2021-03-04 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295359#comment-17295359 ] Peter Kronenberg commented on TIKA-3310: Well, if both a major and compatible brands match, that

[jira] [Commented] (TIKA-3310) MP4 video detected as application/mp4

2021-03-04 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295341#comment-17295341 ] Peter Kronenberg commented on TIKA-3310: I'm going to delete my fork on Github and re-create it.

[jira] [Commented] (TIKA-3310) MP4 video detected as application/mp4

2021-03-04 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295335#comment-17295335 ] Peter Kronenberg commented on TIKA-3310: Yeah, the InputStream stuff is my commit from yesterday,

[jira] [Commented] (TIKA-3310) MP4 video detected as application/mp4

2021-03-04 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295325#comment-17295325 ] Peter Kronenberg commented on TIKA-3310: I went ahead and created a pull request, on the hope that

[jira] [Created] (TIKA-3310) MP4 video detected as application/mp4

2021-03-02 Thread Peter Kronenberg (Jira)
Peter Kronenberg created TIKA-3310: -- Summary: MP4 video detected as application/mp4 Key: TIKA-3310 URL: https://issues.apache.org/jira/browse/TIKA-3310 Project: Tika Issue Type: Improvement

[jira] [Created] (TIKA-3309) Add additional constructors for RereadableInputStream

2021-03-02 Thread Peter Kronenberg (Jira)
Peter Kronenberg created TIKA-3309: -- Summary: Add additional constructors for RereadableInputStream Key: TIKA-3309 URL: https://issues.apache.org/jira/browse/TIKA-3309 Project: Tika Issue

[jira] [Commented] (TIKA-3255) Parsing MP3 file with record size > 100000 fails

2021-03-01 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293024#comment-17293024 ] Peter Kronenberg commented on TIKA-3255: Confirmed.  Thank you > Parsing MP3 file with record

[jira] [Commented] (TIKA-3255) Parsing MP3 file with record size > 100000 fails

2021-02-27 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17292241#comment-17292241 ] Peter Kronenberg commented on TIKA-3255: Is there a resolution on this problem? > Parsing MP3

[jira] [Commented] (TIKA-94) Speech recognition

2021-02-19 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-94?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17287077#comment-17287077 ] Peter Kronenberg commented on TIKA-94: -- Fyi, one of the reasons I couldn't use Amazon Transcribe or

[jira] [Commented] (TIKA-94) Speech recognition

2021-02-19 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-94?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17287069#comment-17287069 ] Peter Kronenberg commented on TIKA-94: -- The documentation leaves much to be desired. You can download

[jira] [Commented] (TIKA-3298) Add a "preloadLangs" parameter to TesseractOCRParser

2021-02-16 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17285432#comment-17285432 ] Peter Kronenberg commented on TIKA-3298: ok, better :).  Was still hoping there was a way to do

[jira] [Commented] (TIKA-94) Speech recognition

2021-02-12 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-94?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17284051#comment-17284051 ] Peter Kronenberg commented on TIKA-94: -- I've been doing my own research on speech recognition,

[jira] [Commented] (TIKA-3298) Add a "preloadLangs" parameter to TesseractOCRParser

2021-02-12 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17284023#comment-17284023 ] Peter Kronenberg commented on TIKA-3298: Admittedly not being as familiar with the code I didn't

[jira] [Commented] (TIKA-3298) Add a "preloadLangs" parameter to TesseractOCRParser

2021-02-12 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283978#comment-17283978 ] Peter Kronenberg commented on TIKA-3298: [~tallison] This code is assuming that

[jira] [Commented] (TIKA-3263) WriteLimitReachedException is not public

2021-02-12 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283860#comment-17283860 ] Peter Kronenberg commented on TIKA-3263: I might be willing to work on this if we can agree on a

[jira] [Commented] (TIKA-3298) Add a "preloadLangs" parameter to TesseractOCRParser

2021-02-11 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283163#comment-17283163 ] Peter Kronenberg commented on TIKA-3298: So will it be possible to externalize this?  A class I

[jira] [Comment Edited] (TIKA-3298) Add a "preloadLangs" parameter to TesseractOCRParser

2021-02-11 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283163#comment-17283163 ] Peter Kronenberg edited comment on TIKA-3298 at 2/11/21, 4:48 PM: -- So

[jira] [Commented] (TIKA-3298) Add a "preloadLangs" parameter to TesseractOCRParser

2021-02-11 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283133#comment-17283133 ] Peter Kronenberg commented on TIKA-3298: ah, maybe that was the problem.  Random quantum

[jira] [Commented] (TIKA-3298) Add a "preloadLangs" parameter to TesseractOCRParser

2021-02-11 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283038#comment-17283038 ] Peter Kronenberg commented on TIKA-3298: Hmm, seems to work now.  Not sure what happened  

[jira] [Updated] (TIKA-3298) Add a "preloadLangs" parameter to TesseractOCRParser

2021-02-11 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Kronenberg updated TIKA-3298: --- Attachment: image-2021-02-11-08-56-38-712.png > Add a "preloadLangs" parameter to

[jira] [Commented] (TIKA-3298) Add a "preloadLangs" parameter to TesseractOCRParser

2021-02-10 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282787#comment-17282787 ] Peter Kronenberg commented on TIKA-3298: For some reason, your code is leaving out the scripts.  I

[jira] [Updated] (TIKA-3298) Add a "preloadLangs" parameter to TesseractOCRParser

2021-02-10 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Kronenberg updated TIKA-3298: --- Attachment: image-2021-02-10-19-00-10-691.png > Add a "preloadLangs" parameter to

[jira] [Updated] (TIKA-3298) Add a "preloadLangs" parameter to TesseractOCRParser

2021-02-10 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Kronenberg updated TIKA-3298: --- Attachment: image-2021-02-10-18-59-47-793.png > Add a "preloadLangs" parameter to

[jira] [Commented] (TIKA-3298) Add a "preloadLangs" parameter to TesseractOCRParser

2021-02-10 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282780#comment-17282780 ] Peter Kronenberg commented on TIKA-3298: Great idea!   Can this be externalized so I can call

[jira] [Commented] (TIKA-3297) Simplify parser configuration in 2.x

2021-02-10 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282613#comment-17282613 ] Peter Kronenberg commented on TIKA-3297: I also would like to be able to use getTesseractPath(),

[jira] [Comment Edited] (TIKA-3297) Simplify parser configuration in 2.x

2021-02-10 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282543#comment-17282543 ] Peter Kronenberg edited comment on TIKA-3297 at 2/10/21, 4:15 PM: -- The

[jira] [Commented] (TIKA-3297) Simplify parser configuration in 2.x

2021-02-10 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282543#comment-17282543 ] Peter Kronenberg commented on TIKA-3297: The reason I want to explicitly set the path to Tika is

[jira] [Comment Edited] (TIKA-3297) Simplify parser configuration in 2.x

2021-02-10 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282503#comment-17282503 ] Peter Kronenberg edited comment on TIKA-3297 at 2/10/21, 3:21 PM: -- The

[jira] [Commented] (TIKA-3297) Simplify parser configuration in 2.x

2021-02-10 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282503#comment-17282503 ] Peter Kronenberg commented on TIKA-3297: The use case for setting the paths at runtime is that I

[jira] [Commented] (TIKA-3296) Allow tesseract/tessdata path to be specified by environment variables

2021-02-09 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282115#comment-17282115 ] Peter Kronenberg commented on TIKA-3296: Because I want to be able to package the jar to run in

[jira] [Commented] (TIKA-3296) Allow tesseract/tessdata path to be specified by environment variables

2021-02-08 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281441#comment-17281441 ] Peter Kronenberg commented on TIKA-3296: Well, if I can get the tika-config working properly then

[jira] [Comment Edited] (TIKA-3296) Allow tesseract/tessdata path to be specified by environment variables

2021-02-08 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281365#comment-17281365 ] Peter Kronenberg edited comment on TIKA-3296 at 2/8/21, 8:58 PM: - The

[jira] [Commented] (TIKA-3296) Allow tesseract/tessdata path to be specified by environment variables

2021-02-08 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281367#comment-17281367 ] Peter Kronenberg commented on TIKA-3296: Of course, the downside is having to maintain the

[jira] [Comment Edited] (TIKA-3296) Allow tesseract/tessdata path to be specified by environment variables

2021-02-08 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281365#comment-17281365 ] Peter Kronenberg edited comment on TIKA-3296 at 2/8/21, 8:50 PM: - The

[jira] [Commented] (TIKA-3296) Allow tesseract/tessdata path to be specified by environment variables

2021-02-08 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281365#comment-17281365 ] Peter Kronenberg commented on TIKA-3296: The problem I originally had with tika-config is there

[jira] [Comment Edited] (TIKA-3296) Allow tesseract/tessdata path to be specified by environment variables

2021-02-08 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281336#comment-17281336 ] Peter Kronenberg edited comment on TIKA-3296 at 2/8/21, 7:57 PM: - Would

[jira] [Commented] (TIKA-3296) Allow tesseract/tessdata path to be specified by environment variables

2021-02-08 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281336#comment-17281336 ] Peter Kronenberg commented on TIKA-3296: Would like to get some thoughts on this.  I originally

[jira] [Commented] (TIKA-3296) Allow tesseract/tessdata path to be specified by environment variables

2021-02-08 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281176#comment-17281176 ] Peter Kronenberg commented on TIKA-3296: Created pull request > Allow tesseract/tessdata path to

[jira] [Created] (TIKA-3296) Allow tesseract/tessdata path to be specified by environment variables

2021-02-08 Thread Peter Kronenberg (Jira)
Peter Kronenberg created TIKA-3296: -- Summary: Allow tesseract/tessdata path to be specified by environment variables Key: TIKA-3296 URL: https://issues.apache.org/jira/browse/TIKA-3296 Project: Tika

[jira] [Commented] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-02-04 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17279238#comment-17279238 ] Peter Kronenberg commented on TIKA-3286: Ok that's what I get for trying to prematurely optimize

[jira] [Commented] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-02-04 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17279134#comment-17279134 ] Peter Kronenberg commented on TIKA-3286: Ah, ok.  I had purposely made that static because I

[jira] [Commented] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-02-04 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17279044#comment-17279044 ] Peter Kronenberg commented on TIKA-3286: Well, the nature of the most recent change is that there

[jira] [Commented] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-02-04 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17279042#comment-17279042 ] Peter Kronenberg commented on TIKA-3286: I can't reproduce.  Which directories are you specifying

[jira] [Commented] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-02-04 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17279030#comment-17279030 ] Peter Kronenberg commented on TIKA-3286: Hmm, let me take a look.  Must be a regression error >

[jira] [Commented] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-02-04 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278933#comment-17278933 ] Peter Kronenberg commented on TIKA-3286: So does that mean you're more comfortable with what I

[jira] [Commented] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-02-04 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278827#comment-17278827 ] Peter Kronenberg commented on TIKA-3286: I've updated the PR > Tika does not issue an error when

[jira] [Comment Edited] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-02-03 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278380#comment-17278380 ] Peter Kronenberg edited comment on TIKA-3286 at 2/3/21, 10:04 PM: --  This

[jira] [Comment Edited] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-02-03 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278380#comment-17278380 ] Peter Kronenberg edited comment on TIKA-3286 at 2/3/21, 9:56 PM: -  This is

[jira] [Comment Edited] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-02-03 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278380#comment-17278380 ] Peter Kronenberg edited comment on TIKA-3286 at 2/3/21, 9:55 PM: -  This is

[jira] [Comment Edited] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-02-03 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278380#comment-17278380 ] Peter Kronenberg edited comment on TIKA-3286 at 2/3/21, 9:54 PM: -  This is

[jira] [Commented] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-02-03 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278380#comment-17278380 ] Peter Kronenberg commented on TIKA-3286: This is just looking for the same executable that we

[jira] [Commented] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-02-03 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278362#comment-17278362 ] Peter Kronenberg commented on TIKA-3286: Would you feel better if, instead of having the

[jira] [Commented] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-02-03 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278321#comment-17278321 ] Peter Kronenberg commented on TIKA-3286: Unfortunately, there is no default location for Windows,

[jira] [Commented] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-02-03 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278304#comment-17278304 ] Peter Kronenberg commented on TIKA-3286: I went ahead and submitted my PR.  Take a look and see

[jira] [Commented] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-02-03 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278296#comment-17278296 ] Peter Kronenberg commented on TIKA-3286: Well, it's a marginal improvement.  But I still feel that

[jira] [Comment Edited] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-02-03 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278027#comment-17278027 ] Peter Kronenberg edited comment on TIKA-3286 at 2/3/21, 2:31 PM: - It does,

[jira] [Comment Edited] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-02-03 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278027#comment-17278027 ] Peter Kronenberg edited comment on TIKA-3286 at 2/3/21, 2:30 PM: - It does,

[jira] [Commented] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-02-03 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278027#comment-17278027 ] Peter Kronenberg commented on TIKA-3286: It does, but as you can see in my screenshot, it issues 4

[jira] [Comment Edited] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-02-02 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276335#comment-17276335 ] Peter Kronenberg edited comment on TIKA-3286 at 2/2/21, 3:27 PM: -

[jira] [Commented] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-02-01 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276335#comment-17276335 ] Peter Kronenberg commented on TIKA-3286: [~tallison] Assuming you are not vehemently opposed to

[jira] [Comment Edited] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-01-29 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275265#comment-17275265 ] Peter Kronenberg edited comment on TIKA-3286 at 1/29/21, 7:05 PM: -- Well,

[jira] [Commented] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-01-29 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275265#comment-17275265 ] Peter Kronenberg commented on TIKA-3286: Well, it would be even earlier if we throw the exception

[jira] [Commented] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-01-29 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275230#comment-17275230 ] Peter Kronenberg commented on TIKA-3286: I figured out that the reason it appeared that Tika is

[jira] [Updated] (TIKA-3286) Tika does not issue an error when language file doesn't exist; not supporting script files

2021-01-28 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Kronenberg updated TIKA-3286: --- Summary: Tika does not issue an error when language file doesn't exist; not supporting script

[jira] [Updated] (TIKA-3286) Tika not issue an error when language file doesn't exist; not supporting script files

2021-01-28 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Kronenberg updated TIKA-3286: --- Description: Tika uses a regular expression to validate the language string, assuming it is

[jira] [Updated] (TIKA-3286) Tika not issue an error when language file doesn't exist; not supporting script files

2021-01-28 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Kronenberg updated TIKA-3286: --- Description: Tika uses a regular expression to validate the language string, assuming it is

[jira] [Updated] (TIKA-3286) Tika not issue an error when language file doesn't exist; not supporting script files

2021-01-28 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Kronenberg updated TIKA-3286: --- Description: Tika uses a regular expression to validate the language string, assuming it is

[jira] [Updated] (TIKA-3286) Tika not issue an error when language file doesn't exist; not supporting script files

2021-01-28 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Kronenberg updated TIKA-3286: --- Description: Tika uses a regular expression to validate the language string, assuming it is

[jira] [Updated] (TIKA-3286) Tika not issue an error when language file doesn't exist; not supporting script files

2021-01-28 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Kronenberg updated TIKA-3286: --- Attachment: nolang.png > Tika not issue an error when language file doesn't exist; not

[jira] [Updated] (TIKA-3286) Tika not issue an error when language file doesn't exist; not supporting script files

2021-01-28 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Kronenberg updated TIKA-3286: --- Description: Tika uses a regular expression to validate the language string, assuming it is

[jira] [Updated] (TIKA-3286) Tika not issue an error when language file doesn't exist; not supporting script files

2021-01-28 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Kronenberg updated TIKA-3286: --- Attachment: (was: image-2021-01-28-13-57-44-888.png) > Tika not issue an error when
