[jira] [Assigned] (TIKA-2030) A space is suppressed when parsing Odt file

2016-07-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-2030: Assignee: Tim Allison > A space is suppressed when parsing Odt file

Re: TIKA-1164

2016-07-08 Thread Chris Mattmann
Hi Samuel, I myself haven’t had a chance to look into this yet - maybe someone else on the dev list? Cheers, Chris On 7/8/16, 5:33 AM, "scatherine@gouv.mc" wrote: >Hi, > >Excuse me to this mail but have you seen my problem ? > >Regards, > >Samuel Catherine > > > >Samuel > CATHERINE---05

[jira] [Resolved] (TIKA-2030) A space is suppressed when parsing Odt file

2016-07-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2030. Resolution: Fixed [~dpilato], thank you for opening this and supplying a triggering document. This is

[jira] [Commented] (TIKA-2030) A space is suppressed when parsing Odt file

2016-07-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368176#comment-15368176 ] Hudson commented on TIKA-2030: UNSTABLE: Integrated in Tika-trunk #1080 (See [https://builds

[jira] [Commented] (TIKA-2021) Improving accuracy of Tesseract parser for Serial Number and Part Number (Numeric) Extraction

2016-07-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368198#comment-15368198 ] Tim Allison commented on TIKA-2021: Any chance you could make the check for python stati

tika-2.x-windows - Build # 25 - Still Failing

2016-07-08 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-2.x-windows (build #25) Status: Still Failing Check console output at https://builds.apache.org/job/tika-2.x-windows/25/ to view the results.

[jira] [Commented] (TIKA-2030) A space is suppressed when parsing Odt file

2016-07-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368233#comment-15368233 ] Hudson commented on TIKA-2030: FAILURE: Integrated in tika-2.x-windows #25 (See [https://bui

RE: TIKA-1164

2016-07-08 Thread Allison, Timothy B.
Y, this makes sense. Detector detector = TikaConfig.getDefaultConfig().getDetector(); File file = new File("testPDFVarious.pdf"); try (FileInputStream is = new FileInputStream(file)) { try (InputStream tis = TikaInputStream.get(is)) { System.out.

[jira] [Commented] (TIKA-1164) InputStream get modified by content type detection

2016-07-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368275#comment-15368275 ] Tim Allison commented on TIKA-1164: For anyone stumbling across this issue. It is expec
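
The comment above is cut off in this listing, so the following is only a hedged illustration of the general issue named in the TIKA-1164 title, not a reconstruction of the comment. It is a minimal sketch, using only the JDK and no Tika classes, of how header bytes can be inspected without changing a stream's position when the stream supports mark/reset. A plain FileInputStream does not support mark, which is why wrapping it (here in a BufferedInputStream) matters. The class name MarkResetPeek and the helper peek are hypothetical and appear nowhere in the thread.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MarkResetPeek {

    /**
     * Hypothetical helper (not a Tika API): reads up to n leading bytes
     * and then rewinds the stream to where it was before the call.
     */
    static byte[] peek(InputStream in, int n) throws IOException {
        if (!in.markSupported()) {
            // Without mark support the bytes read here would be lost to the caller.
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(n);
        try {
            byte[] buf = new byte[n];
            int total = 0;
            while (total < n) {
                int read = in.read(buf, total, n - total);
                if (read < 0) {
                    break; // end of stream before n bytes were available
                }
                total += read;
            }
            return Arrays.copyOf(buf, total);
        } finally {
            in.reset(); // restore the position recorded by mark()
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data; a real caller would wrap a FileInputStream instead.
        byte[] data = "%PDF-1.4 rest of file".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data))) {
            String header = new String(peek(in, 5), StandardCharsets.US_ASCII);
            System.out.println(header);           // prints "%PDF-"
            System.out.println((char) in.read()); // prints "%": position was preserved
        }
    }
}
```

The helper rejects streams without mark support instead of silently consuming bytes, so the failure is visible at the call site.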

[jira] [Comment Edited] (TIKA-2021) Improving accuracy of Tesseract parser for Serial Number and Part Number (Numeric) Extraction

2016-07-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368198#comment-15368198 ] Tim Allison edited comment on TIKA-2021 at 7/8/16 7:33 PM: Any c

[jira] [Updated] (TIKA-2029) Add link string to hrefs in PDF

2016-07-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2029: Description: The PDFParser is not including any text in the annotations. It would be great if we could c

[jira] [Commented] (TIKA-2030) A space is suppressed when parsing Odt file

2016-07-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368337#comment-15368337 ] Hudson commented on TIKA-2030: SUCCESS: Integrated in Tika-trunk #1081 (See [https://builds.

[jira] [Commented] (TIKA-2030) A space is suppressed when parsing Odt file

2016-07-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368354#comment-15368354 ] Hudson commented on TIKA-2030: SUCCESS: Integrated in tika-2.x #121 (See [https://builds.apa

[jira] [Commented] (TIKA-2021) Improving accuracy of Tesseract parser for Serial Number and Part Number (Numeric) Extraction

2016-07-08 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368508#comment-15368508 ] Chris A. Mattmann commented on TIKA-2021: Tim this would be great. [~Zarana Parekh

[jira] [Commented] (TIKA-2021) Improving accuracy of Tesseract parser for Serial Number and Part Number (Numeric) Extraction

2016-07-08 Thread Zarana Parekh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368620#comment-15368620 ] Zarana Parekh commented on TIKA-2021: Thank you [~talli...@mitre.org] and [~chrismattm