[jira] [Updated] (TIKA-2291) REST API documentation is down

2017-03-07 Thread Mike Liu (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Liu updated TIKA-2291: Issue Type: Bug (was: New Feature) > REST API documentation is down

[jira] [Commented] (TIKA-2291) REST API documentation is down

2017-03-07 Thread Mike Liu (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900850#comment-15900850 ] Mike Liu commented on TIKA-2291: Please see links, https://tika.apache.org/1.14/miredot/

Re: Query Regarding Apache Tika Language Ditector

2017-03-07 Thread Mattmann, Chris A (3010)
Resending this to dev@tika.apache.org rather than dev-owner. ++ Chris Mattmann, Ph.D. Principal Data Scientist, Engineering Administrative Office (3010) Manager, NSF & Open Source Projects Formulat

[jira] [Updated] (TIKA-2290) PDFParser 'ocr' properties cannot be set via headers when using Tika JAXRS

2017-03-07 Thread Kevin Oberlag (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Oberlag updated TIKA-2290: Thanks for getting to it so quickly! > PDFParser 'ocr' properties cannot be set via headers when usin

[jira] [Resolved] (TIKA-1879) Extract recipient information in MSG files with more granularity

2017-03-07 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1879. Resolution: Fixed. Fix Version/s: 1.15, 2.0. I removed the exchange processing.

[jira] [Resolved] (TIKA-2290) PDFParser 'ocr' properties cannot be set via headers when using Tika JAXRS

2017-03-07 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2290. Resolution: Fixed. Fix Version/s: 1.15, 2.0. Thank you for opening this. The pr

[jira] [Commented] (TIKA-2288) Remove metadata within body-element in OutlookExtractor

2017-03-07 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900140#comment-15900140 ] Nick Burch commented on TIKA-2288: I've got a feeling that this was partly because we did

[jira] [Commented] (TIKA-2288) Remove metadata within body-element in OutlookExtractor

2017-03-07 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900117#comment-15900117 ] Tim Allison commented on TIKA-2288: That goes back to the very beginnings of OutlookExtr

[jira] [Commented] (TIKA-2208) Catch missing libraires

2017-03-07 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900095#comment-15900095 ] Tim Allison commented on TIKA-2208: Should have time in the next few days to turn back t

[jira] [Commented] (TIKA-1879) Extract recipient information in MSG files with more granularity

2017-03-07 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900088#comment-15900088 ] Tim Allison commented on TIKA-1879: For "from", I assumed a single sender (which isn't a

[jira] [Assigned] (TIKA-2288) Remove metadata within body-element in OutlookExtractor

2017-03-07 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-2288: Assignee: Tim Allison > Remove metadata within body-element in OutlookExtractor

RE: Tess4j API for TIKA OCR parser

2017-03-07 Thread Thamme Gowda
yes, we can try tika-eval to see the difference. Perfect! Best, TG On Mar 7, 2017 7:44 AM, "Allison, Timothy B." wrote: Y and why not give the new tika-eval module a trial to evaluate the differences in output? :) -Original Message- From: Thamme Gowda [mailto:thammego...@apache.org] S

RE: Tess4j API for TIKA OCR parser

2017-03-07 Thread Allison, Timothy B.
Y and why not give the new tika-eval module a trial to evaluate the differences in output? :) -Original Message- From: Thamme Gowda [mailto:thammego...@apache.org] Sent: Tuesday, March 7, 2017 10:38 AM To: Thejan Wijesinghe Cc: dev@tika.apache.org Subject: Re: Tess4j API for TIKA OCR p

RE: Tess4j API for TIKA OCR parser

2017-03-07 Thread Allison, Timothy B.
+1 Same experience, of same vintage. :) -Original Message- From: Luís Filipe Nassif [mailto:lfcnas...@gmail.com] Sent: Tuesday, March 7, 2017 10:34 AM To: dev@tika.apache.org Subject: Re: Tess4j API for TIKA OCR parser Hi Thejan, Before the first version of TesseractOcrParser was commi

Re: Tess4j API for TIKA OCR parser

2017-03-07 Thread Luís Filipe Nassif
Hi Thejan, Before the first version of TesseractOcrParser was committed I tried to use Tess4j, that was 4 years ago. Unfortunately, at that time I ran into some problems like permanent hangs with tesseract/Tess4j and, even worse, JVM crashes because of bugs in native code (pointers to crazy addresses

Re: Tess4j API for TIKA OCR parser

2017-03-07 Thread Thamme Gowda
Thanks Nick for the reply. Thejan, I am glad to know your progress. Rewriting the TesseractOCRParser would be the ultimate goal if using Tess4j proves to be better than the way it is done currently. But, for now, please consider these: + Rename your class to *Tess4jOCRParser*. It is a new parser

Re: Tess4j API for TIKA OCR parser

2017-03-07 Thread Thejan Wijesinghe
Hi Nick, I thought the same thing. I will try to keep the public method signatures unchanged and will send updates on my progress. On Tue, Mar 7, 2017 at 5:48 PM, Nick Burch wrote: > On Tue, 7 Mar 2017, Thejan Wijesinghe wrote: > >> I have already used the Tess4j API to rewrite the TesseractOCRP

Re: Tess4j API for TIKA OCR parser

2017-03-07 Thread Nick Burch
On Tue, 7 Mar 2017, Thejan Wijesinghe wrote: I have already used the Tess4j API to rewrite the TesseractOCRParser class. Although it successfully extracts content from most of the file types, it fails some particular unit tests in the TesseractOCRParserTest class. I can solve that. However, I want

[jira] [Commented] (TIKA-2262) Supporting Image-to-Text (Image Captioning) in Tika for Image MIME Types

2017-03-07 Thread Avtar Singh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15899340#comment-15899340 ] Avtar Singh commented on TIKA-2262: Hello, My name is Avtar, I have developed a fact bas

[jira] [Created] (TIKA-2292) Update CXF version to 3.0.12

2017-03-07 Thread Sergey Beryozkin (JIRA)
Sergey Beryozkin created TIKA-2292: Summary: Update CXF version to 3.0.12 Key: TIKA-2292 URL: https://issues.apache.org/jira/browse/TIKA-2292 Project: Tika Issue Type: Task Compon

Re: Tess4j API for TIKA OCR parser

2017-03-07 Thread Thejan Wijesinghe
Hi Thamme, I made minimal changes to the TesseractOCRParser class. I basically changed the doOCR() private method. But the existing unit tests fail even though the content and metadata are extracted. Could you provide me with any guidance on resolving these errors by running the test cases? I

Re: CXF Rs Client dependency in tika-parsers

2017-03-07 Thread Sergey Beryozkin
That's fine, just wanted to make sure it was a required dependency, we can deal with blocking it at the CXF master level (to avoid it interfering with the newer version of same module) easily... Cheers, Sergey On 06/03/17 23:13, Allison, Timothy B. wrote: Grobid, IIRC. -Original Message--