Re: TikaConfig.getDetector?

2011-10-18 Thread Nick Burch
On Tue, 18 Oct 2011, Jukka Zitting wrote: If we do add such a TikaConfig.getDetector() method, then the equivalent code in the Tika(TikaConfig) constructor should be replaced to call that method to avoid duplication. Sure Also, is there a reason why the Tika facade creates an

[jira] [Created] (TIKA-754) Automatic line break insertion (BR element) instead of '\n' in XHTMLContentHandler

2011-10-18 Thread Pablo Queixalos (Created) (JIRA)
Automatic line break insertion (BR element) instead of '\n' in XHTMLContentHandler Key: TIKA-754 URL: https://issues.apache.org/jira/browse/TIKA-754 Project: Tika

[jira] [Updated] (TIKA-754) Automatic line break insertion (BR element) instead of '\n' in XHTMLContentHandler

2011-10-18 Thread Pablo Queixalos (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Queixalos updated TIKA-754: Attachment: TIKA-754.poc.patch Proof of concept: works fine but breaks tests with 33 Failures and

Re: TikaConfig.getDetector?

2011-10-18 Thread Jukka Zitting
Hi, On Tue, Oct 18, 2011 at 11:07 AM, Nick Burch nick.bu...@alfresco.com wrote: Also, is there a reason why the Tika facade creates an AutoDetectParser (plus DefaultDetector), while TikaConfig will by default create a DefaultParser? Not really, we should unify also that code. For my case,

[jira] [Commented] (TIKA-754) Automatic line break insertion (BR element) instead of '\n' in XHTMLContentHandler

2011-10-18 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129637#comment-13129637 ] Jukka Zitting commented on TIKA-754: I don't think it's necessarily a good idea to make

[jira] [Created] (TIKA-755) Add getDetector() method to TikaConfig

2011-10-18 Thread Nick Burch (Created) (JIRA)
Add getDetector() method to TikaConfig Key: TIKA-755 URL: https://issues.apache.org/jira/browse/TIKA-755 Project: Tika Issue Type: Improvement Components: config Affects Versions: 0.10

Re: TikaConfig.getDetector?

2011-10-18 Thread Nick Burch
On Tue, 18 Oct 2011, Jukka Zitting wrote: For my case, the class already takes a TikaConfig object, as it sometimes needs to do mimetype hierarchy and similar related stuff. Rather than wrapping that internally in a Tika object, it occurred to me that parser and detector should possibly be made

[jira] [Commented] (TIKA-738) Tika fails to extract text from PDF annotations

2011-10-18 Thread Michael McCandless (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129889#comment-13129889 ] Michael McCandless commented on TIKA-738: I opened PDFBOX-1143 to improve

tika-parsers maven dependencies (commons-logging)

2011-10-18 Thread gross
Hi, folks! I see that tika-parsers depends on two logging systems: slf4j-api and commons-logging through pdfbox. I think commons-logging should be excluded, and jcl-over-slf4j added to ensure that all logging will be passed to slf4j. And then to any backing engine selected by developer that

[jira] [Created] (TIKA-756) XMP output from Tika CLI

2011-10-18 Thread Jukka Zitting (Created) (JIRA)
XMP output from Tika CLI Key: TIKA-756 URL: https://issues.apache.org/jira/browse/TIKA-756 Project: Tika Issue Type: New Feature Components: cli, metadata Reporter: Jukka Zitting

[jira] [Commented] (TIKA-756) XMP output from Tika CLI

2011-10-18 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129957#comment-13129957 ] Jukka Zitting commented on TIKA-756: Rough first version committed in revision 1185805.

[jira] [Resolved] (TIKA-718) PDF bookmark text isn't extracted

2011-10-18 Thread Michael McCandless (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved TIKA-718. Resolution: Invalid My bad: I'm using OS X's preview to view PDFs, which lets you add

[jira] [Commented] (TIKA-755) Add getDetector() method to TikaConfig

2011-10-18 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129971#comment-13129971 ] Jukka Zitting commented on TIKA-755: Hmm, I looked at the interaction between Tika and

[jira] [Resolved] (TIKA-755) Add getDetector() method to TikaConfig

2011-10-18 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-755. Resolution: Fixed Add getDetector() method to TikaConfig