[jira] [Created] (TIKA-3497) Update README for installing Tika Server as a service for 2.0 release

2021-07-24 Thread David Eric Pugh (Jira)
David Eric Pugh created TIKA-3497: - Summary: Update README for installing Tika Server as a service for 2.0 release Key: TIKA-3497 URL: https://issues.apache.org/jira/browse/TIKA-3497 Project: Tika

[jira] [Commented] (TIKA-3495) parent-child in solr emitter doesn't seem to include parent id (_nest_parent_)

2021-07-23 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17386334#comment-17386334 ] David Eric Pugh commented on TIKA-3495: --- Looking at that json file you linked to, nest_parent is of

[jira] [Comment Edited] (TIKA-3495) parent-child in solr emitter doesn't seem to include parent id (_nest_parent_)

2021-07-23 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17386315#comment-17386315 ] David Eric Pugh edited comment on TIKA-3495 at 7/23/21, 3:44 PM: - This

[jira] [Commented] (TIKA-3495) parent-child in solr emitter doesn't seem to include parent id (_nest_parent_)

2021-07-23 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17386315#comment-17386315 ] David Eric Pugh commented on TIKA-3495: --- This area of Solr has been changing a bit.  According to

[jira] [Commented] (TIKA-1570) Seeking a stop method for better use with Apache Commons Daemon

2021-05-13 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343965#comment-17343965 ] David Eric Pugh commented on TIKA-1570: --- The associated pr seems reasonable, would be nice to have

[jira] [Commented] (TIKA-1570) Seeking a stop method for better use with Apache Commons Daemon

2021-05-13 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343963#comment-17343963 ] David Eric Pugh commented on TIKA-1570: --- I might suggest trying to go down the Docker on Windows

[jira] [Commented] (TIKA-1570) Seeking a stop method for better use with Apache Commons Daemon

2021-05-13 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343962#comment-17343962 ] David Eric Pugh commented on TIKA-1570: --- Unfortunately they are Linux only. However I have used

[jira] [Commented] (TIKA-3258) Run OCR on PDFs with 'auto' mode as default in Tika 2.0.0

2021-01-06 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259809#comment-17259809 ] David Eric Pugh commented on TIKA-3258: --- I'm thinking that this is a pointer towards two general

[jira] [Commented] (TIKA-3166) Actually maven-modularize the packages for 2.0

2020-08-20 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181262#comment-17181262 ] David Eric Pugh commented on TIKA-3166: --- I did a diff, and while I can't say that I read through it

[jira] [Commented] (TIKA-3093) Enable tika-server to forward parse results to another endpoint

2020-04-24 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091703#comment-17091703 ] David Eric Pugh commented on TIKA-3093: --- Out of curiosity, is this type of behavior, the "Let me

[jira] [Commented] (TIKA-2368) Clean up SentimentParser dependencies

2020-04-06 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076673#comment-17076673 ] David Eric Pugh commented on TIKA-2368: --- I'm actually not sure I touched {{SentimentParser}}, as

[jira] [Commented] (TIKA-2368) Clean up SentimentParser dependencies

2020-04-06 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076501#comment-17076501 ] David Eric Pugh commented on TIKA-2368: --- In [https://github.com/apache/tika/pull/316] I messed with

[jira] [Commented] (TIKA-3075) Add an HTTP parser

2020-03-19 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17062619#comment-17062619 ] David Eric Pugh commented on TIKA-3075: --- Not sure I understand what this issue is about? As in be

[jira] [Commented] (TIKA-3035) Tika-app --extract mode outputs to stderr instead of stdout

2020-02-25 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044533#comment-17044533 ] David Eric Pugh commented on TIKA-3035: --- Tried it with tika-app-1.23.jar and worked great. It

[jira] [Commented] (TIKA-3035) Tika-app --extract mode outputs to stderr instead of stdout

2020-02-25 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044406#comment-17044406 ] David Eric Pugh commented on TIKA-3035: --- Here is my command: java -cp tika-app-1.23-SNAPSHOT.jar

[jira] [Commented] (TIKA-3037) Tika Docs should highlight Tika-Server

2020-02-24 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043796#comment-17043796 ] David Eric Pugh commented on TIKA-3037: --- [~tallison] did you see the gettingstarted.apt patch file?

[jira] [Commented] (TIKA-3038) Miredot license key expired

2020-02-05 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030925#comment-17030925 ] David Eric Pugh commented on TIKA-3038: --- Also, the url for the plugin has changed from https to just

[jira] [Created] (TIKA-3038) Miredot license key expired

2020-02-05 Thread David Eric Pugh (Jira)
David Eric Pugh created TIKA-3038: - Summary: Miredot license key expired Key: TIKA-3038 URL: https://issues.apache.org/jira/browse/TIKA-3038 Project: Tika Issue Type: Task

[jira] [Commented] (TIKA-2253) Obtain new Miredot license key and upgrade plugin version in tika-server

2020-02-05 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030904#comment-17030904 ] David Eric Pugh commented on TIKA-2253: --- Hi all...The license has expired ;-) > Obtain new

[jira] [Commented] (TIKA-3037) Tika Docs should highlight Tika-Server

2020-02-05 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030900#comment-17030900 ] David Eric Pugh commented on TIKA-3037: --- Okay, I've attached a SVN DIFF patch file to the

[jira] [Updated] (TIKA-3037) Tika Docs should highlight Tika-Server

2020-02-05 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Eric Pugh updated TIKA-3037: -- Attachment: gettingstarted.apt.patch > Tika Docs should highlight Tika-Server >

[jira] [Commented] (TIKA-3037) Tika Docs should highlight Tika-Server

2020-02-05 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030862#comment-17030862 ] David Eric Pugh commented on TIKA-3037: --- Okay, in

[jira] [Commented] (TIKA-3037) Tika Docs should highlight Tika-Server

2020-02-05 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030806#comment-17030806 ] David Eric Pugh commented on TIKA-3037: --- I put some edits into the wiki at

[jira] [Commented] (TIKA-3037) Tika Docs should highlight Tika-Server

2020-02-05 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030766#comment-17030766 ] David Eric Pugh commented on TIKA-3037: --- Thanks [~nick] > Tika Docs should highlight Tika-Server >

[jira] [Commented] (TIKA-3037) Tika Docs should highlight Tika-Server

2020-02-05 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030760#comment-17030760 ] David Eric Pugh commented on TIKA-3037: --- Another comment, so the page

[jira] [Commented] (TIKA-3037) Tika Docs should highlight Tika-Server

2020-02-05 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030748#comment-17030748 ] David Eric Pugh commented on TIKA-3037: --- So... Where does the HTML for the website live? What is

[jira] [Created] (TIKA-3037) Tika Docs should highlight Tika-Server

2020-02-05 Thread David Eric Pugh (Jira)
David Eric Pugh created TIKA-3037: - Summary: Tika Docs should highlight Tika-Server Key: TIKA-3037 URL: https://issues.apache.org/jira/browse/TIKA-3037 Project: Tika Issue Type: Improvement

[jira] [Commented] (TIKA-3010) Tika needs service installation script

2019-12-12 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995174#comment-16995174 ] David Eric Pugh commented on TIKA-3010: --- Made more progress. Now, when you run the `package` goal

[jira] [Updated] (TIKA-3010) Tika needs service installation script

2019-12-12 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Eric Pugh updated TIKA-3010: -- Flags: Patch,Important (was: Important) > Tika needs service installation script >

[jira] [Created] (TIKA-3010) Tika needs service installation script

2019-12-12 Thread David Eric Pugh (Jira)
David Eric Pugh created TIKA-3010: - Summary: Tika needs service installation script Key: TIKA-3010 URL: https://issues.apache.org/jira/browse/TIKA-3010 Project: Tika Issue Type: Improvement

[jira] [Commented] (TIKA-2968) Display specific command for Tesseract if you are running in Verbose mode

2019-10-23 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957801#comment-16957801 ] David Eric Pugh commented on TIKA-2968: --- And on a related aspect, maybe, if we want the Verbose mode

[jira] [Commented] (TIKA-2968) Display specific command for Tesseract if you are running in Verbose mode

2019-10-23 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957799#comment-16957799 ] David Eric Pugh commented on TIKA-2968: --- Hey community, any chance of this being added for 1.23, or

[jira] [Created] (TIKA-2971) Link to download OpenNLP models needs to be http not https

2019-10-22 Thread David Eric Pugh (Jira)
David Eric Pugh created TIKA-2971: - Summary: Link to download OpenNLP models needs to be http not https Key: TIKA-2971 URL: https://issues.apache.org/jira/browse/TIKA-2971 Project: Tika

[jira] [Commented] (TIKA-2624) Rendering PDFs for OCR with Tesseract uses different DPI than claimed

2019-10-22 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957204#comment-16957204 ] David Eric Pugh commented on TIKA-2624: --- I am rereading this thread via JIRA versus the GitHub PR,

[jira] [Commented] (TIKA-2970) Configuring Tesseract for OCR of PDF via Tika Config is not working

2019-10-20 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955612#comment-16955612 ] David Eric Pugh commented on TIKA-2970: --- It's a work in progress, however here is a unit test:

[jira] [Created] (TIKA-2970) Configuring Tesseract for OCR of PDF via Tika Config is not working

2019-10-20 Thread David Eric Pugh (Jira)
David Eric Pugh created TIKA-2970: - Summary: Configuring Tesseract for OCR of PDF via Tika Config is not working Key: TIKA-2970 URL: https://issues.apache.org/jira/browse/TIKA-2970 Project: Tika

[jira] [Commented] (TIKA-2705) Allow configuration of TesseractOCRParser as we do for other parsers

2019-10-20 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955515#comment-16955515 ] David Eric Pugh commented on TIKA-2705: --- I know this is marked as resolved, but I'm definitely not

[jira] [Commented] (TIKA-2969) Unit test for TesseractOCRParserTest.java has confusing behavior when Tesseract not on path

2019-10-20 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955498#comment-16955498 ] David Eric Pugh commented on TIKA-2969: --- I noticed that when I run `mvn test` the output is: ```

[jira] [Created] (TIKA-2969) Unit test for TesseractOCRParserTest.java has confusing behavior when Tesseract not on path

2019-10-20 Thread David Eric Pugh (Jira)
David Eric Pugh created TIKA-2969: - Summary: Unit test for TesseractOCRParserTest.java has confusing behavior when Tesseract not on path Key: TIKA-2969 URL: https://issues.apache.org/jira/browse/TIKA-2969

[jira] [Created] (TIKA-2968) Display specific command for Tesseract if you are running in Verbose mode

2019-10-18 Thread David Eric Pugh (Jira)
David Eric Pugh created TIKA-2968: - Summary: Display specific command for Tesseract if you are running in Verbose mode Key: TIKA-2968 URL: https://issues.apache.org/jira/browse/TIKA-2968 Project:

[jira] [Commented] (TIKA-2931) Tika CLI shouldn't log with System.out.println

2019-08-29 Thread Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918723#comment-16918723 ] Eric Pugh commented on TIKA-2931: - Okay, I've made a PR that fixes this problem, with a test.

[jira] [Commented] (TIKA-2931) Tika CLI shouldn't log with System.out.println

2019-08-28 Thread Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918137#comment-16918137 ] Eric Pugh commented on TIKA-2931: - Looks like the TikaCLI test does rely on this behavior...

[jira] [Created] (TIKA-2931) Tika CLI shouldn't log with System.out.println

2019-08-28 Thread Eric Pugh (Jira)
Eric Pugh created TIKA-2931: --- Summary: Tika CLI shouldn't log with System.out.println Key: TIKA-2931 URL: https://issues.apache.org/jira/browse/TIKA-2931 Project: Tika Issue Type: Improvement

[jira] [Created] (TIKA-2106) "hocr" case on Linux fails, but works on OSX. Related to TIKA-2093

2016-09-30 Thread Eric Pugh (JIRA)
Eric Pugh created TIKA-2106: --- Summary: "hocr" case on Linux fails, but works on OSX. Related to TIKA-2093 Key: TIKA-2106 URL: https://issues.apache.org/jira/browse/TIKA-2106 Project: Tika Issue

[jira] [Comment Edited] (TIKA-2093) Add hOCR output type to the TesseractOCRParser

2016-09-29 Thread Eric Pugh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534613#comment-15534613 ] Eric Pugh edited comment on TIKA-2093 at 9/30/16 12:52 AM: --- BTW, just got to

[jira] [Commented] (TIKA-2093) Add hOCR output type to the TesseractOCRParser

2016-09-29 Thread Eric Pugh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534613#comment-15534613 ] Eric Pugh commented on TIKA-2093: - BTW, just got to updating my project with the latest 1.14-SNAPSHOT, and

[jira] [Commented] (TIKA-2093) Add hOCR output type to the TesseractOCRParser

2016-09-23 Thread Eric Pugh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516177#comment-15516177 ] Eric Pugh commented on TIKA-2093: - Thanks for this, and the addition of the HOCRPassthroughHandler, I'll

[jira] [Created] (TIKA-2093) Add hOCR output type to the TesseractOCRParser

2016-09-22 Thread Eric Pugh (JIRA)
Eric Pugh created TIKA-2093: --- Summary: Add hOCR output type to the TesseractOCRParser Key: TIKA-2093 URL: https://issues.apache.org/jira/browse/TIKA-2093 Project: Tika Issue Type: Improvement