[jira] [Resolved] (TIKA-816) (XLS/XLSX) Improperly formatted date/time in text content.

2012-04-03 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-816. Resolution: Fixed. As of r1309005 we've upgraded to POI 3.8 Final, which includes the required fixes

[jira] [Resolved] (TIKA-792) NoSuchMethodException "CTMarkupImpl.(org.apache.xmlbeans.SchemaType, boolean)" processing a OOXML document

2012-04-03 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-792. Resolution: Fixed; Fix Version/s: 1.2

[jira] [Resolved] (TIKA-700) Upgrade to POI 3.8 as available

2012-04-03 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-700. Resolution: Fixed; Fix Version/s: 1.2

[jira] [Resolved] (TIKA-890) Improve detection of Android Packages (APK)

2012-04-05 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-890. Resolution: Fixed

[jira] [Resolved] (TIKA-622) Switch from POIFSFileSystem to NPOIFSFileSystem, for speed and memory improvements

2011-10-05 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-622. Resolution: Fixed; Fix Version/s: 0.10. This was fixed back in April in r1091046

[jira] [Resolved] (TIKA-745) MP3 parser should handle genres not in ID3v1

2011-10-06 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-745. Resolution: Fixed. Fixed in r1179669. We try to map to the normalised ID3v1 form, but use the genre as-is i

[jira] [Resolved] (TIKA-749) Avoid using POI's LittleEndian in non-POI parsers

2011-10-07 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-749. Resolution: Fixed

[jira] [Resolved] (TIKA-755) Add getDetector() method to TikaConfig

2011-10-18 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-755. Resolution: Fixed

[jira] [Resolved] (TIKA-779) Detection of Microsoft Works 2000 Word Processor files

2011-11-15 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-779. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-785) TikaCLI should include a --list-detectors option similar to --list-parsers

2011-11-20 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-785. Resolution: Fixed

[jira] [Resolved] (TIKA-784) Mimetype entry for DITA

2011-11-20 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-784. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-786) Tika CLI --detect returns incorrect content-type for files with altered extensions

2011-11-21 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-786. Resolution: Fixed; Fix Version/s: 1.1. Explanation added to CHANGES in r1204479, so I think this is no

[jira] [Resolved] (TIKA-789) Microsoft Project (MPP) basic support

2011-11-25 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-789. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-697) Tika reports the content type of AR archives as "text/plain"

2011-11-27 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-697. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-794) Mime magic logic for Little16 is incorrect

2011-11-27 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-794. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-790) Reduce duplication between POIFSDocumentType (in OfficeParser) and POIFSContainerDetector

2011-11-28 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-790. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-797) MimeType.getExtension for application/vnd.ms-powerpoint returns ppz. I'd expect ppt.

2011-12-02 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-797. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-795) [PATCH] NoSuchMethod - XSLFPowerPointExtractorDecorator.buildXHTML POI - XSLFSlide.getMasterSheet()

2011-12-04 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-795. Resolution: Duplicate. Resolving as a duplicate of TIKA-700, as the change was deliberate and the patch on

[jira] [Resolved] (TIKA-410) textbox content extaction for word documents

2011-12-04 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-410. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-800) mark/reset not supported from POIFSContainerDetector

2011-12-05 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-800. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-804) Parsing outlook format template (.oft )

2011-12-10 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-804. Resolution: Not A Problem

[jira] [Resolved] (TIKA-809) IndexOutOfBoundsException with TikaGUI

2011-12-11 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-809. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-803) Outlook parser to mark the message body in some special way

2011-12-12 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-803. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-423) Parse docx and output to text file missing words

2011-12-19 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-423. Resolution: Fixed; Fix Version/s: 1.1. Having upgraded POI, as of r1221115 the smart tags text is now

[jira] [Resolved] (TIKA-822) MediaType fails to parse charset that has quoted value

2011-12-20 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-822. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-829) Tika lacks preconditions on its input, causing some potential misuse of the API

2011-12-25 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-829. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-831) ForkClient doesn't report error due to widening conversion issue

2011-12-26 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-831. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-793) Invalid ASCII character (65533) when retriving MP3 metadata

2011-12-29 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-793. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-526) OOXMLParser fails to extract text from within smart tags

2012-01-01 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-526. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-826) TikaException / OfficeXmlFileException with .xlsb files

2012-01-02 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-826. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-837) Make inner classes static for performance reasons

2012-01-02 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-837. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-695) Custom properties on xlsx, docx, pptx

2012-01-12 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-695. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-840) OOXML parser content type setting

2012-01-13 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-840. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-805) improvements in XSLFPowerPointExtractorDecorator

2012-01-16 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-805. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-87) MimeTypes should allow modification of MIME types

2012-01-16 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-87?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-87. Resolution: Fixed; Fix Version/s: 1.1. TIKA-746 provides a clean way to do this, as documented in http:/

[jira] [Resolved] (TIKA-841) User supplied parsers should be preferred

2012-01-18 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-841. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-844) Ability to Define an Internal Text Bag Property

2012-01-23 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-844. Resolution: Fixed

[jira] [Resolved] (TIKA-843) Support for Date without a Time Component

2012-01-23 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-843. Resolution: Fixed

[jira] [Resolved] (TIKA-846) Ability to Parse RDF Bag Elements in XML

2012-01-23 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-846. Resolution: Fixed

[jira] [Resolved] (TIKA-845) Check for Existing Value in Multi-Value Fields in XML Metadata Handler

2012-01-23 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-845. Resolution: Fixed

[jira] [Resolved] (TIKA-839) TikaException with testPPT.potm in Tika GUI / CLI

2012-01-24 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-839. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-802) NullPointerException when parsing iWork files

2012-01-24 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-802. Resolution: Cannot Reproduce

[jira] [Resolved] (TIKA-760) NPE XHTMLContentHandler in characters Method

2012-01-24 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-760. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-643) tika hangs parsing doc file (attached)

2012-01-24 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-643. Resolution: Fixed; Fix Version/s: 1.0. I believe this was fixed in Tika 1.0, by a POI upgrade

[jira] [Resolved] (TIKA-616) ArrayIndexOutOfBoundsException from POI

2012-01-24 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-616. Resolution: Fixed; Fix Version/s: 1.0. I believe this was fixed in Tika 1.0 by a POI upgrade (it's cer

[jira] [Resolved] (TIKA-637) Need API to get list of embedded documents

2012-01-24 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-637. Resolution: Not A Problem. Closing as "Not A Problem", as this is handled by supplying a recursing parser o

[jira] [Resolved] (TIKA-195) MSWORD: Tika ignores text from Pieces

2012-01-24 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-195. Resolution: Later. I believe that all text from Word files is now extracted, and has been for at least a li

[jira] [Resolved] (TIKA-851) M4V and M4A detection invalid

2012-01-27 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-851. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-818) Allow PDFBox to be used with RandomAccessFile vs RandomAccessBuffer to allow for a memory vs performance tradeoff

2012-02-10 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-818. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-850) Consistent way to supply document passwords to parsers

2012-02-16 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-850. Resolution: Fixed; Fix Version/s: 1.1

[jira] [Resolved] (TIKA-886) OOXMLExtractorFactory can leave files open

2012-03-28 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-886. Resolution: Fixed