[jira] [Assigned] (TIKA-736) OpenOffice parser: master footer text isn't extracted

2011-10-26 Thread Michael McCandless (Assigned) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned TIKA-736: --- Assignee: Michael McCandless OpenOffice parser: master footer text isn't

[jira] [Updated] (TIKA-736) OpenOffice parser: master footer text isn't extracted

2011-10-26 Thread Michael McCandless (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated TIKA-736: Attachment: TIKA-736.patch This turned out to be fairly simple to fix, so I worked out a

[jira] [Updated] (TIKA-582) Lithuanian language identification

2011-10-26 Thread Žygimantas Medelis (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Žygimantas Medelis updated TIKA-582: Attachment: lt.ngp The previous file had wrong ngrams, they included quote symbols. Place

[jira] [Commented] (TIKA-761) Provide version number by CLI argument -V

2011-10-26 Thread Ingo Renner (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135885#comment-13135885 ] Ingo Renner commented on TIKA-761: -- hmm, can't get it to work for me, stream is null. What

[jira] [Reopened] (TIKA-582) Lithuanian language identification

2011-10-26 Thread Michael McCandless (Reopened) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reopened TIKA-582: - Assignee: Michael McCandless (was: Jukka Zitting) Reopen to switch to fixed ngp.

[jira] [Commented] (TIKA-582) Lithuanian language identification

2011-10-26 Thread Michael McCandless (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135914#comment-13135914 ] Michael McCandless commented on TIKA-582: - Thanks Žygimantas! When testing Tika's

[jira] [Resolved] (TIKA-582) Lithuanian language identification

2011-10-26 Thread Michael McCandless (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved TIKA-582. - Resolution: Fixed Fix Version/s: (was: 0.9) 1.0 Thanks

[jira] [Commented] (TIKA-736) OpenOffice parser: master footer text isn't extracted

2011-10-26 Thread Michael McCandless (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136070#comment-13136070 ] Michael McCandless commented on TIKA-736: - bq. Can you also check that parsing

Re: Tika is waiting for ODFToolkit to improve ODF file format processing

2011-10-26 Thread Michael McCandless
On Tue, Oct 25, 2011 at 5:40 PM, Rob Weir robw...@apache.org wrote: Is there a list of the complete set of tags you use, or a schema or something? Hmm, I think technically any tags that are valid XHTML are fair game, but in practice the parsers seem to use a very limited set of tags

Re: Google's Compact Language Detector

2011-10-26 Thread reinhard schwab
i have also compared tika performance with the nutch language detector in version 1.0. it seems that nutch is far better in performance than tika (5 to 6 times faster). but my use case is so special (short texts, ~140 characters in length) and i don't have time to investigate, so i have

Re: Updating CHANGES.txt?

2011-10-26 Thread Jukka Zitting
Hi, On Thu, Oct 20, 2011 at 2:26 PM, Michael McCandless luc...@mikemccandless.com wrote: But I think API changes, issues a user has hit, new features, changes in behavior, we really should include.  Generally, when I'm unsure, I try to err on the side of being verbose. See revision 1189334

Build failed in Jenkins: Tika-trunk #692

2011-10-26 Thread Apache Jenkins Server
See https://builds.apache.org/job/Tika-trunk/692/changes Changes: [jukka] Summarize changelog entries by feature rather than by issue [jukka] TIKA-565: Improved OSGi bundling Don't use the context class loader of the current thread as the default. This helps prevent nondeterministic results in

[jira] [Created] (TIKA-762) EXIF extraction from PNG images

2011-10-26 Thread Nick Burch (Created) (JIRA)
EXIF extraction from PNG images --- Key: TIKA-762 URL: https://issues.apache.org/jira/browse/TIKA-762 Project: Tika Issue Type: New Feature Components: parser Affects Versions: 1.0

[jira] [Updated] (TIKA-762) EXIF extraction from PNG images

2011-10-26 Thread Nick Burch (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch updated TIKA-762: Attachment: training.png The attached file training.png is an example PNG which contains EXIF metadata.