[
https://issues.apache.org/jira/browse/TIKA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless reassigned TIKA-736:
---
Assignee: Michael McCandless
OpenOffice parser: master footer text isn't
[
https://issues.apache.org/jira/browse/TIKA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated TIKA-736:
Attachment: TIKA-736.patch
This turned out to be fairly simple to fix, so I worked out a
[
https://issues.apache.org/jira/browse/TIKA-582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Žygimantas Medelis updated TIKA-582:
Attachment: lt.ngp
The previous file had wrong ngrams: they included quote symbols. Place
[
https://issues.apache.org/jira/browse/TIKA-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135885#comment-13135885
]
Ingo Renner commented on TIKA-761:
--
hmm, can't get it to work for me, stream is null.
What
[
https://issues.apache.org/jira/browse/TIKA-582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless reopened TIKA-582:
-
Assignee: Michael McCandless (was: Jukka Zitting)
Reopen to switch to fixed ngp.
[
https://issues.apache.org/jira/browse/TIKA-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135914#comment-13135914
]
Michael McCandless commented on TIKA-582:
-
Thanks Žygimantas!
When testing Tika's
[
https://issues.apache.org/jira/browse/TIKA-582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless resolved TIKA-582.
-
Resolution: Fixed
Fix Version/s: 1.0 (was: 0.9)
Thanks
[
https://issues.apache.org/jira/browse/TIKA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136070#comment-13136070
]
Michael McCandless commented on TIKA-736:
-
bq. Can you also check that parsing
On Tue, Oct 25, 2011 at 5:40 PM, Rob Weir robw...@apache.org wrote:
Is there a list of the complete set of tags you use, or a schema or something?
Hmm, I think technically any tags that are valid XHTML are fair game,
but in practice the parsers seem to use a very limited set of tags
I have also compared Tika's performance with the Nutch language detector
in version 1.0.
It seems that Nutch performs far better than Tika (5 to 6
times faster than Tika).
But my use case is quite special (short texts, ~140 characters long) and
I don't have time to investigate, so I have
Hi,
On Thu, Oct 20, 2011 at 2:26 PM, Michael McCandless
luc...@mikemccandless.com wrote:
But I think API changes, issues a user has hit, new features, changes
in behavior, we really should include. Generally, when I'm unsure, I
try to err on the side of being verbose.
See revision 1189334
See https://builds.apache.org/job/Tika-trunk/692/changes
Changes:
[jukka] Summarize changelog entries by feature rather than by issue
[jukka] TIKA-565: Improved OSGi bundling
Don't use the context class loader of the current thread as the default.
This helps prevent nondeterministic results in
EXIF extraction from PNG images
---
Key: TIKA-762
URL: https://issues.apache.org/jira/browse/TIKA-762
Project: Tika
Issue Type: New Feature
Components: parser
Affects Versions: 1.0
[
https://issues.apache.org/jira/browse/TIKA-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch updated TIKA-762:
Attachment: training.png
The attached file training.png is an example PNG which contains EXIF metadata.