On Tue, 18 Oct 2011, Jukka Zitting wrote:
If we do add such a TikaConfig.getDetector() method, then the equivalent
code in the Tika(TikaConfig) constructor should be replaced to call that
method to avoid duplication.
Sure
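
A minimal sketch of the de-duplication suggested above, for illustration only. It does not use the real Tika classes: `Detector`, `SketchConfig` and `SketchFacade` are hypothetical stand-ins, and the magic-byte check is an assumed placeholder for whatever the default detector actually does. The point is only that the facade constructor asks the config for its detector instead of assembling one a second time.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a detector; not the real Tika interface.
interface Detector {
    String detect(byte[] prefix);
}

// Hypothetical stand-in for the config object that owns the default detector.
final class SketchConfig {
    private final Detector detector;

    SketchConfig() {
        // The only place where the default detector is assembled.
        this.detector = prefix -> startsWith(prefix, "%PDF")
                ? "application/pdf"
                : "application/octet-stream";
    }

    // Accessor in the spirit of the proposed getDetector() method.
    Detector getDetector() {
        return detector;
    }

    private static boolean startsWith(byte[] data, String magic) {
        byte[] m = magic.getBytes(StandardCharsets.US_ASCII);
        if (data.length < m.length) {
            return false;
        }
        for (int i = 0; i < m.length; i++) {
            if (data[i] != m[i]) {
                return false;
            }
        }
        return true;
    }
}

// Hypothetical stand-in for the facade that takes a config in its constructor.
final class SketchFacade {
    private final Detector detector;

    SketchFacade(SketchConfig config) {
        // Reuse the config's detector rather than rebuilding it here.
        this.detector = config.getDetector();
    }

    String detect(byte[] prefix) {
        return detector.detect(prefix);
    }
}

class DetectorSketchDemo {
    public static void main(String[] args) {
        SketchFacade facade = new SketchFacade(new SketchConfig());
        byte[] sample = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        System.out.println(facade.detect(sample));
    }
}
```

With this shape, a caller that already holds a config object can call `getDetector()` directly and does not need to wrap the config in a facade just to reach the detector.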
Also, is there a reason why the Tika facade creates an
Automatic line break insertion (BR element) instead of '\n' in
XHTMLContentHandler
--
Key: TIKA-754
URL: https://issues.apache.org/jira/browse/TIKA-754
Project: Tika
[
https://issues.apache.org/jira/browse/TIKA-754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pablo Queixalos updated TIKA-754:
-
Attachment: TIKA-754.poc.patch
Proof of concept: works fine but breaks tests with 33 Failures and
Hi,
On Tue, Oct 18, 2011 at 11:07 AM, Nick Burch nick.bu...@alfresco.com wrote:
Also, is there a reason why the Tika facade creates an AutoDetectParser
(plus DefaultDetector), while TikaConfig will by default create a
DefaultParser?
Not really, we should also unify that code.
For my case,
[
https://issues.apache.org/jira/browse/TIKA-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129637#comment-13129637
]
Jukka Zitting commented on TIKA-754:
I don't think it's necessarily a good idea to make
Add getDetector() method to TikaConfig
--
Key: TIKA-755
URL: https://issues.apache.org/jira/browse/TIKA-755
Project: Tika
Issue Type: Improvement
Components: config
Affects Versions: 0.10
On Tue, 18 Oct 2011, Jukka Zitting wrote:
For my case, the class already takes a TikaConfig object, as it sometimes
needs to do mimetype hierarchy and similar related stuff. Rather than
wrapping that internally in a Tika object, it occurred to me that parser and
detector should possibly be made
[
https://issues.apache.org/jira/browse/TIKA-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129889#comment-13129889
]
Michael McCandless commented on TIKA-738:
-
I opened PDFBOX-1143 to improve
Hi, folks!
I see that tika-parsers depends on two logging systems: slf4j-api and
commons-logging through pdfbox.
I think commons-logging should be excluded, and jcl-over-slf4j added
to ensure that all logging will be passed to slf4j. And then to any
backing engine selected by developer that
XMP output from Tika CLI
Key: TIKA-756
URL: https://issues.apache.org/jira/browse/TIKA-756
Project: Tika
Issue Type: New Feature
Components: cli, metadata
Reporter: Jukka Zitting
[
https://issues.apache.org/jira/browse/TIKA-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129957#comment-13129957
]
Jukka Zitting commented on TIKA-756:
Rough first version committed in revision 1185805.
[
https://issues.apache.org/jira/browse/TIKA-718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless resolved TIKA-718.
-
Resolution: Invalid
My bad: I'm using OS X's preview to view PDFs, which lets you add
[
https://issues.apache.org/jira/browse/TIKA-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129971#comment-13129971
]
Jukka Zitting commented on TIKA-755:
Hmm, I looked at the interaction between Tika and
[
https://issues.apache.org/jira/browse/TIKA-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-755.
-
Resolution: Fixed
Add getDetector() method to TikaConfig
--