[jira] [Created] (TIKA-897) UTF-8 encoded XML is detected as text/plain because of UTF-8 BOM

2012-04-20 Thread Wade Taylor (Created) (JIRA)
UTF-8 encoded XML is detected as text/plain because of UTF-8 BOM Key: TIKA-897 URL: https://issues.apache.org/jira/browse/TIKA-897 Project: Tika Issue Type: Bug

[jira] [Created] (TIKA-895) Empty title element makes Tika-generated HTML documents not open

2012-04-19 Thread Benoit MAGGI (Created) (JIRA)
Empty title element makes Tika-generated HTML documents not open Key: TIKA-895 URL: https://issues.apache.org/jira/browse/TIKA-895 Project: Tika Issue Type: Bug

[jira] [Created] (TIKA-896) OSGi deployment without declarative services

2012-04-19 Thread Created
OSGi deployment without declarative services Key: TIKA-896 URL: https://issues.apache.org/jira/browse/TIKA-896 Project: Tika Issue Type: Improvement Components: packaging Affects

[jira] [Created] (TIKA-894) Add webapp mode for Tika Server, simplifies deployment

2012-04-17 Thread Chris Wilson (Created) (JIRA)
Add webapp mode for Tika Server, simplifies deployment -- Key: TIKA-894 URL: https://issues.apache.org/jira/browse/TIKA-894 Project: Tika Issue Type: Improvement Components:

[jira] [Created] (TIKA-893) Tika-server bundle includes wrong META-INF/services/org.apache.tika.parser.Parser, doesn't work

2012-04-16 Thread Chris Wilson (Created) (JIRA)
Tika-server bundle includes wrong META-INF/services/org.apache.tika.parser.Parser, doesn't work --- Key: TIKA-893 URL: https://issues.apache.org/jira/browse/TIKA-893

[jira] [Created] (TIKA-888) NetCDF parser uses Java 6 JAR file and test/compilation fails with Java 1.5, although TIKA is Java 1.5

2012-03-30 Thread Uwe Schindler (Created) (JIRA)
NetCDF parser uses Java 6 JAR file and test/compilation fails with Java 1.5, although TIKA is Java 1.5 -- Key: TIKA-888 URL:

[jira] [Created] (TIKA-889) XHTMLContentHandler wont emit newline when html element matches ENDLINE set

2012-03-30 Thread John Conwell (Created) (JIRA)
XHTMLContentHandler wont emit newline when html element matches ENDLINE set --- Key: TIKA-889 URL: https://issues.apache.org/jira/browse/TIKA-889 Project: Tika Issue

[jira] [Created] (TIKA-887) Tika fails to parse some MP3 tags correctly and produces null characters in value

2012-03-29 Thread Created
. This happens with files downloaded from www.jamendo.com, for example this one: http://storage.newjamendo.com/download/track/450545/mp32/Swansong.mp3 It may be that the tags are not created properly on this site, but at least tools like mp3tag display them correctly. The extracted value looks like

[jira] [Created] (TIKA-886) OOXMLExtractorFactory can leave files open

2012-03-28 Thread Nick Burch (Created) (JIRA)
Reporter: Nick Burch Assignee: Nick Burch Fix For: 1.2 As identified in an Alfresco bug (ALF-13106), OOXMLExtractorFactory doesn't currently allow the closing of OPCPackage instances created from Files. This is because the OPCPackage isn't associated

[jira] [Created] (TIKA-884) Dynamic loading of Parser and Detector services

2012-03-27 Thread Jukka Zitting (Created) (JIRA)
Dynamic loading of Parser and Detector services --- Key: TIKA-884 URL: https://issues.apache.org/jira/browse/TIKA-884 Project: Tika Issue Type: Improvement Affects Versions: 1.1

[jira] [Created] (TIKA-885) Possible ConcurrentModificationException while accessing Metadata produced by ParsingReader

2012-03-27 Thread Luis Filipe Nassif (Created) (JIRA)
Possible ConcurrentModificationException while accessing Metadata produced by ParsingReader --- Key: TIKA-885 URL: https://issues.apache.org/jira/browse/TIKA-885

[jira] [Created] (TIKA-882) IllegalArgumentException: No part found for relationship

2012-03-22 Thread Maxim Valyanskiy (Created) (JIRA)
IllegalArgumentException: No part found for relationship Key: TIKA-882 URL: https://issues.apache.org/jira/browse/TIKA-882 Project: Tika Issue Type: Bug Components: parser

[jira] [Created] (TIKA-879) Detection problem: message/rfc822 file is detected as text/plain.

2012-03-21 Thread Kostya Gribov (Created) (JIRA)
Detection problem: message/rfc822 file is detected as text/plain. - Key: TIKA-879 URL: https://issues.apache.org/jira/browse/TIKA-879 Project: Tika Issue Type: Bug

[jira] [Created] (TIKA-880) while integrating microsoft parser it is giving error

2012-03-21 Thread Somenath Mukhopadhyay (Created) (JIRA)
while integrating microsoft parser it is giving error - Key: TIKA-880 URL: https://issues.apache.org/jira/browse/TIKA-880 Project: Tika Issue Type: Wish Components: parser

[jira] [Created] (TIKA-878) Reuse computed MapMediaType, Parser inside CompositeParser

2012-03-19 Thread Luis Filipe Nassif (Created) (JIRA)
Reuse computed MapMediaType, Parser inside CompositeParser Key: TIKA-878 URL: https://issues.apache.org/jira/browse/TIKA-878 Project: Tika Issue Type: Improvement

[jira] [Created] (TIKA-877) Embedded document not extracted (regression)

2012-03-18 Thread Daniel Bonniot de Ruisselet (Created) (JIRA)
Embedded document not extracted (regression) Key: TIKA-877 URL: https://issues.apache.org/jira/browse/TIKA-877 Project: Tika Issue Type: Bug Components: parser Affects Versions: 1.1

[jira] [Created] (TIKA-876) Signed pdf parsing

2012-03-14 Thread Fausto Cruzeiro de Moraes (Created) (JIRA)
Signed pdf parsing -- Key: TIKA-876 URL: https://issues.apache.org/jira/browse/TIKA-876 Project: Tika Issue Type: New Feature Components: parser Affects Versions: 1.0 Environment: Java 6.0, Ubuntu

[jira] [Created] (TIKA-874) Identify FITS (Flexible Image Transport System) files

2012-03-12 Thread Peter May (Created) (JIRA)
Affects Versions: 1.1, 1.2 Reporter: Peter May Priority: Minor Tika does not have a defined signature for application/fits files. I have created a patch (based on file(1) magic) to address identification of such files, including a simple unit test. This patch only

[jira] [Created] (TIKA-872) Tika --extract fails for RTF

2012-03-09 Thread Albert L. (Created) (JIRA)
Tika --extract fails for RTF Key: TIKA-872 URL: https://issues.apache.org/jira/browse/TIKA-872 Project: Tika Issue Type: New Feature Components: general Affects Versions: 1.0 Environment:

[jira] [Created] (TIKA-873) Tika --extract fails for DOC

2012-03-09 Thread Albert L. (Created) (JIRA)
Tika --extract fails for DOC Key: TIKA-873 URL: https://issues.apache.org/jira/browse/TIKA-873 Project: Tika Issue Type: Bug Components: general Affects Versions: 1.0 Environment: Windows

[jira] [Created] (TIKA-871) Text in nested groups within a pptx not parsed

2012-03-08 Thread Curtis Hyder (Created) (JIRA)
Text in nested groups within a pptx not parsed -- Key: TIKA-871 URL: https://issues.apache.org/jira/browse/TIKA-871 Project: Tika Issue Type: Bug Components: parser Affects Versions:

[jira] [Created] (TIKA-869) IdentityHtmlMapper.mapSafeElement() needs to return lower-cased incoming name

2012-03-07 Thread Ken Krugler (Created) (JIRA)
IdentityHtmlMapper.mapSafeElement() needs to return lower-cased incoming name - Key: TIKA-869 URL: https://issues.apache.org/jira/browse/TIKA-869 Project: Tika

[jira] [Created] (TIKA-870) Allow to use call parseToString with a additional parameter of MaxStringLength, so it can be changed per call

2012-03-07 Thread Shay Banon (Created) (JIRA)
Allow to use call parseToString with a additional parameter of MaxStringLength, so it can be changed per call - Key: TIKA-870 URL:

[jira] [Created] (TIKA-868) TXT parser does not honour the specified encoding

2012-02-24 Thread Daniel Bonniot de Ruisselet (Created) (JIRA)
TXT parser does not honour the specified encoding - Key: TIKA-868 URL: https://issues.apache.org/jira/browse/TIKA-868 Project: Tika Issue Type: Bug Reporter: Daniel Bonniot de

[jira] [Created] (TIKA-867) UTF-8 encoding does not work on windows

2012-02-23 Thread Created
UTF-8 encoding does not work on windows --- Key: TIKA-867 URL: https://issues.apache.org/jira/browse/TIKA-867 Project: Tika Issue Type: Bug Components: cli Affects Versions: 1.0

[jira] [Created] (TIKA-866) Incomplete configuration file causes OutOfMemoryException

2012-02-17 Thread Created
Incomplete configuration file causes OutOfMemoryException - Key: TIKA-866 URL: https://issues.apache.org/jira/browse/TIKA-866 Project: Tika Issue Type: Bug Components: config

[jira] [Created] (TIKA-862) JPSS HDF5 files not being detected appropriately

2012-02-16 Thread Chris A. Mattmann (Created) (JIRA)
JPSS HDF5 files not being detected appropriately Key: TIKA-862 URL: https://issues.apache.org/jira/browse/TIKA-862 Project: Tika Issue Type: Bug Reporter: Richard Yu

[jira] [Created] (TIKA-863) MailContentHandler should not create AutoDetectParser on each call

2012-02-16 Thread Andrzej Bialecki (Created) (JIRA)
MailContentHandler should not create AutoDetectParser on each call -- Key: TIKA-863 URL: https://issues.apache.org/jira/browse/TIKA-863 Project: Tika Issue Type: Bug

[jira] [Created] (TIKA-864) Metadata.formatDate should use ThreadLocal

2012-02-16 Thread Andrzej Bialecki (Created) (JIRA)
Metadata.formatDate should use ThreadLocal -- Key: TIKA-864 URL: https://issues.apache.org/jira/browse/TIKA-864 Project: Tika Issue Type: Improvement Components: metadata

[jira] [Created] (TIKA-865) MimeTypes.forName should avoid method-level synchronization

2012-02-16 Thread Andrzej Bialecki (Created) (JIRA)
MimeTypes.forName should avoid method-level synchronization --- Key: TIKA-865 URL: https://issues.apache.org/jira/browse/TIKA-865 Project: Tika Issue Type: Improvement

[jira] [Created] (TIKA-860) Make ZIP bomb detection configureable

2012-02-10 Thread Uwe Schindler (Created) (JIRA)
Make ZIP bomb detection configureable - Key: TIKA-860 URL: https://issues.apache.org/jira/browse/TIKA-860 Project: Tika Issue Type: Improvement Components: parser Affects Versions: 1.0

[jira] [Created] (TIKA-857) Tika TrueTypeParser add metadata from Naming tables

2012-02-02 Thread Craig Stires (Created) (JIRA)
Tika TrueTypeParser add metadata from Naming tables --- Key: TIKA-857 URL: https://issues.apache.org/jira/browse/TIKA-857 Project: Tika Issue Type: Improvement Components: parser

[jira] [Created] (TIKA-858) Tika add parsing support for ANPA-1312 news wire feeds

2012-02-02 Thread Craig Stires (Created) (JIRA)
Tika add parsing support for ANPA-1312 news wire feeds -- Key: TIKA-858 URL: https://issues.apache.org/jira/browse/TIKA-858 Project: Tika Issue Type: New Feature Components:

[jira] [Created] (TIKA-859) DublinCore Metadata Keys Should be Prefixed and Property Objects

2012-02-02 Thread Ray Gauss II (Created) (JIRA)
DublinCore Metadata Keys Should be Prefixed and Property Objects Key: TIKA-859 URL: https://issues.apache.org/jira/browse/TIKA-859 Project: Tika Issue Type: Improvement

[jira] [Created] (TIKA-855) Language Detection not working for Japanese and Chinese.

2012-02-01 Thread James Sullivan (Created) (JIRA)
Language Detection not working for Japanese and Chinese. Key: TIKA-855 URL: https://issues.apache.org/jira/browse/TIKA-855 Project: Tika Issue Type: Bug Components:

[jira] [Created] (TIKA-856) Support CJK (Chinese, Japanese and Korean) language detection

2012-02-01 Thread James Sullivan (Created) (JIRA)
Support CJK (Chinese, Japanese and Korean) language detection - Key: TIKA-856 URL: https://issues.apache.org/jira/browse/TIKA-856 Project: Tika Issue Type: New Feature

[jira] [Created] (TIKA-854) No text extraction Word macroenabled template

2012-01-31 Thread Maxim Valyanskiy (Created) (JIRA)
No text extraction Word macroenabled template - Key: TIKA-854 URL: https://issues.apache.org/jira/browse/TIKA-854 Project: Tika Issue Type: Bug Affects Versions: 1.1 Reporter:

[jira] [Created] (TIKA-853) java.io.IOException with TikaGUI and testMP4.m4a

2012-01-29 Thread John Mastarone (Created) (JIRA)
java.io.IOException with TikaGUI and testMP4.m4a Key: TIKA-853 URL: https://issues.apache.org/jira/browse/TIKA-853 Project: Tika Issue Type: Bug Components: gui, parser Affects

[jira] [Created] (TIKA-852) Quicktime / MP4 Metadata Parser

2012-01-28 Thread Nick Burch (Created) (JIRA)
Quicktime / MP4 Metadata Parser --- Key: TIKA-852 URL: https://issues.apache.org/jira/browse/TIKA-852 Project: Tika Issue Type: Improvement Components: parser Affects Versions: 1.0

[jira] [Created] (TIKA-851) M4V magic detection invalid

2012-01-27 Thread Alexander Chow (Created) (JIRA)
M4V magic detection invalid --- Key: TIKA-851 URL: https://issues.apache.org/jira/browse/TIKA-851 Project: Tika Issue Type: Bug Components: mime Affects Versions: 1.0 Reporter: Alexander

[jira] [Created] (TIKA-850) Consistent way to supply document passwords to parsers

2012-01-24 Thread Nick Burch (Created) (JIRA)
Consistent way to supply document passwords to parsers -- Key: TIKA-850 URL: https://issues.apache.org/jira/browse/TIKA-850 Project: Tika Issue Type: Improvement Components:

[jira] [Created] (TIKA-848) NullPointerException in SecurityHandler.addDictionaryAndSubDictionary(SecurityHandler.java:185)

2012-01-22 Thread Tom Field (Created) (JIRA)
NullPointerException in SecurityHandler.addDictionaryAndSubDictionary(SecurityHandler.java:185) --- Key: TIKA-848 URL: https://issues.apache.org/jira/browse/TIKA-848

[jira] [Created] (TIKA-841) User supplied parsers should be preferred

2012-01-16 Thread Nick Burch (Created) (JIRA)
User supplied parsers should be preferred - Key: TIKA-841 URL: https://issues.apache.org/jira/browse/TIKA-841 Project: Tika Issue Type: Improvement Components: parser Affects Versions:

[jira] [Created] (TIKA-843) Support for Date without a Time Component

2012-01-16 Thread Ray Gauss II (Created) (JIRA)
Support for Date without a Time Component - Key: TIKA-843 URL: https://issues.apache.org/jira/browse/TIKA-843 Project: Tika Issue Type: Improvement Components: metadata Affects Versions:

[jira] [Created] (TIKA-844) Ability to Define an Internal Text Bag Property

2012-01-16 Thread Ray Gauss II (Created) (JIRA)
Ability to Define an Internal Text Bag Property --- Key: TIKA-844 URL: https://issues.apache.org/jira/browse/TIKA-844 Project: Tika Issue Type: Improvement Components: metadata

[jira] [Created] (TIKA-845) Check for Existing Value in Multi-Value Fields in XML Metadata Handler

2012-01-16 Thread Ray Gauss II (Created) (JIRA)
Check for Existing Value in Multi-Value Fields in XML Metadata Handler -- Key: TIKA-845 URL: https://issues.apache.org/jira/browse/TIKA-845 Project: Tika Issue Type:

[jira] [Created] (TIKA-846) Ability to Parse RDF Bag Elements in XML

2012-01-16 Thread Ray Gauss II (Created) (JIRA)
Ability to Parse RDF Bag Elements in XML Key: TIKA-846 URL: https://issues.apache.org/jira/browse/TIKA-846 Project: Tika Issue Type: Improvement Components: parser Affects Versions: 1.0

[jira] [Created] (TIKA-840) OOXML parser content type setting

2012-01-12 Thread Nick Burch (Created) (JIRA)
OOXML parser content type setting - Key: TIKA-840 URL: https://issues.apache.org/jira/browse/TIKA-840 Project: Tika Issue Type: Improvement Components: parser Affects Versions: 1.0

[jira] [Created] (TIKA-839) TikaException with testPPT.potm in Tika GUI / CLI

2012-01-10 Thread John Mastarone (Created) (JIRA)
, and it said that it was a file created with a beta version of Office, and that it would be updated to a more up-to-date format the next time it was saved. I made the contents look like that of the other Office 2007 presentation documents in the test-documents folder, and added this file and its

[jira] [Created] (TIKA-837) Make inner classes static for performance reasons

2012-01-01 Thread Fabian Lange (Created) (JIRA)
Make inner classes static for performance reasons - Key: TIKA-837 URL: https://issues.apache.org/jira/browse/TIKA-837 Project: Tika Issue Type: Sub-task Components: general

[jira] [Created] (TIKA-838) EmptyParser Singleton should be final

2012-01-01 Thread Fabian Lange (Created) (JIRA)
EmptyParser Singleton should be final - Key: TIKA-838 URL: https://issues.apache.org/jira/browse/TIKA-838 Project: Tika Issue Type: Sub-task Components: general Reporter: Fabian

[jira] [Created] (TIKA-836) parsing really slow on some documents

2011-12-29 Thread Rob Tulloh (Created) (JIRA)
parsing really slow on some documents - Key: TIKA-836 URL: https://issues.apache.org/jira/browse/TIKA-836 Project: Tika Issue Type: Improvement Components: parser Affects Versions: 1.0

[jira] [Created] (TIKA-835) TNEF parsing unstable

2011-12-29 Thread Rob Tulloh (Created) (JIRA)
TNEF parsing unstable - Key: TIKA-835 URL: https://issues.apache.org/jira/browse/TIKA-835 Project: Tika Issue Type: Bug Components: parser Affects Versions: 1.0 Environment: CentOS 4.x/5.x/6.x

[jira] [Created] (TIKA-834) server problem only 1st (-m -j) result is correct additional runs include data from previous runs

2011-12-28 Thread George Kappel (Created) (JIRA)
server problem only 1st (-m -j) result is correct additional runs include data from previous runs - Key: TIKA-834 URL: https://issues.apache.org/jira/browse/TIKA-834

[jira] [Created] (TIKA-833) POI Daily beta6 as of 12/27 breaks ExcelParserTest.testExcelParserFormatting()

2011-12-27 Thread Jeremy Anderson (Created) (JIRA)
POI Daily beta6 as of 12/27 breaks ExcelParserTest.testExcelParserFormatting() -- Key: TIKA-833 URL: https://issues.apache.org/jira/browse/TIKA-833 Project: Tika

[jira] [Created] (TIKA-828) TaggedIOException can be passed non Serializable objects

2011-12-23 Thread Jerome Lacoste (Created) (JIRA)
TaggedIOException can be passed non Serializable objects Key: TIKA-828 URL: https://issues.apache.org/jira/browse/TIKA-828 Project: Tika Issue Type: Bug Affects Versions: 1.0

[jira] [Created] (TIKA-824) Extract rel attr with LinkContentHandler

2011-12-21 Thread Markus Jelsma (Created) (JIRA)
Extract rel attr with LinkContentHandler Key: TIKA-824 URL: https://issues.apache.org/jira/browse/TIKA-824 Project: Tika Issue Type: Improvement Components: parser Reporter:

[jira] [Created] (TIKA-825) Extract rel attr with LinkContentHandler

2011-12-21 Thread Markus Jelsma (Created) (JIRA)
Extract rel attr with LinkContentHandler Key: TIKA-825 URL: https://issues.apache.org/jira/browse/TIKA-825 Project: Tika Issue Type: Improvement Components: parser Reporter:

[jira] [Created] (TIKA-826) TikaException / OfficeXmlFileException with .xlsb files

2011-12-21 Thread John Mastarone (Created) (JIRA)
TikaException / OfficeXmlFileException with .xlsb files --- Key: TIKA-826 URL: https://issues.apache.org/jira/browse/TIKA-826 Project: Tika Issue Type: Bug Components: parser

[jira] [Created] (TIKA-821) Support detecting old MIcrosoft Works Word Processor formats

2011-12-20 Thread Antoni Mylka (Created) (JIRA)
Support detecting old MIcrosoft Works Word Processor formats Key: TIKA-821 URL: https://issues.apache.org/jira/browse/TIKA-821 Project: Tika Issue Type: Improvement

[jira] [Created] (TIKA-822) MediaType fails to parse charset that has quoted value

2011-12-20 Thread peter royal (Created) (JIRA)
MediaType fails to parse charset that has quoted value -- Key: TIKA-822 URL: https://issues.apache.org/jira/browse/TIKA-822 Project: Tika Issue Type: Bug Components: mime

[jira] [Created] (TIKA-818) Allow PDFBox to be used with RandomAccessFile vs RandomAccessBuffer to allow for a memory vs performance tradeoff

2011-12-19 Thread Paul Pearcy (Created) (JIRA)
Allow PDFBox to be used with RandomAccessFile vs RandomAccessBuffer to allow for a memory vs performance tradeoff - Key: TIKA-818 URL:

[jira] [Created] (TIKA-812) Improve the detection of Works Spreadsheet 7.0 files

2011-12-13 Thread Antoni Mylka (Created) (JIRA)
Improve the detection of Works Spreadsheet 7.0 files Key: TIKA-812 URL: https://issues.apache.org/jira/browse/TIKA-812 Project: Tika Issue Type: Improvement Components: mime

[jira] [Created] (TIKA-813) Webarchive detection.

2011-12-13 Thread Antoni Mylka (Created) (JIRA)
Webarchive detection. - Key: TIKA-813 URL: https://issues.apache.org/jira/browse/TIKA-813 Project: Tika Issue Type: Improvement Components: mime Affects Versions: 1.1 Reporter: Antoni Mylka

[jira] [Created] (TIKA-814) Increase the amount of bytes read by TextDetector

2011-12-13 Thread Antoni Mylka (Created) (JIRA)
Increase the amount of bytes read by TextDetector - Key: TIKA-814 URL: https://issues.apache.org/jira/browse/TIKA-814 Project: Tika Issue Type: Improvement Affects Versions: 1.1

[jira] [Created] (TIKA-810) Upgrade to PDFbox 1.7.0 as available

2011-12-12 Thread Jeremy Anderson (Created) (JIRA)
Upgrade to PDFbox 1.7.0 as available Key: TIKA-810 URL: https://issues.apache.org/jira/browse/TIKA-810 Project: Tika Issue Type: Improvement Components: parser Affects Versions: 1.0

[jira] [Created] (TIKA-808) Fork Parser doesn't work for PDF files

2011-12-11 Thread Nick Burch (Created) (JIRA)
Fork Parser doesn't work for PDF files -- Key: TIKA-808 URL: https://issues.apache.org/jira/browse/TIKA-808 Project: Tika Issue Type: Bug Components: parser Affects Versions: 1.0

[jira] [Created] (TIKA-809) IndexOutOfBoundsException with TikaGUI

2011-12-11 Thread John Mastarone (Created) (JIRA)
IndexOutOfBoundsException with TikaGUI -- Key: TIKA-809 URL: https://issues.apache.org/jira/browse/TIKA-809 Project: Tika Issue Type: Bug Components: gui Affects Versions: 1.1

[jira] [Created] (TIKA-807) PHP version of Tika

2011-12-10 Thread Ingo Renner (Created) (JIRA)
PHP version of Tika --- Key: TIKA-807 URL: https://issues.apache.org/jira/browse/TIKA-807 Project: Tika Issue Type: New Feature Components: packaging Reporter: Ingo Renner Inspired by #TIKA-773 the

[jira] [Created] (TIKA-805) improvements in XSLFPowerPointExtractorDecorator

2011-12-09 Thread Yegor Kozlov (Created) (JIRA)
improvements in XSLFPowerPointExtractorDecorator - Key: TIKA-805 URL: https://issues.apache.org/jira/browse/TIKA-805 Project: Tika Issue Type: Improvement Components: parser

[jira] [Created] (TIKA-806) MS Word Detection magics are a bit overzealous

2011-12-09 Thread Antoni Mylka (Created) (JIRA)
MS Word Detection magics are a bit overzealous -- Key: TIKA-806 URL: https://issues.apache.org/jira/browse/TIKA-806 Project: Tika Issue Type: Bug Components: mime Affects Versions:

[jira] [Created] (TIKA-801) ContentHandlerDecorator outputs invalid element

2011-12-05 Thread Andrzej Bialecki (Created) (JIRA)
ContentHandlerDecorator outputs invalid element --- Key: TIKA-801 URL: https://issues.apache.org/jira/browse/TIKA-801 Project: Tika Issue Type: Bug Affects Versions: 1.0, 1.1

[jira] [Created] (TIKA-802) NullPointerException when parsing iWork files

2011-12-05 Thread Arthur Meneau (Created) (JIRA)
NullPointerException when parsing iWork files -- Key: TIKA-802 URL: https://issues.apache.org/jira/browse/TIKA-802 Project: Tika Issue Type: Bug Components: parser Affects Versions:

[jira] [Created] (TIKA-797) MimeType.getExtension for application/vnd.ms-powerpoint returns ppz. I'd expect ppt.

2011-12-02 Thread Antoni Mylka (Created) (JIRA)
MimeType.getExtension for application/vnd.ms-powerpoint returns ppz. I'd expect ppt. Key: TIKA-797 URL: https://issues.apache.org/jira/browse/TIKA-797 Project: Tika

[jira] [Created] (TIKA-793) Invalid ASCII character (65533) when retriving MP3 metadata

2011-11-27 Thread William Seemann (Created) (JIRA)
Invalid ASCII character (65533) when retriving MP3 metadata --- Key: TIKA-793 URL: https://issues.apache.org/jira/browse/TIKA-793 Project: Tika Issue Type: Bug Components:

[jira] [Created] (TIKA-794) Mime magic logic for Little16 is incorrect

2011-11-27 Thread Nick Burch (Created) (JIRA)
Mime magic logic for Little16 is incorrect -- Key: TIKA-794 URL: https://issues.apache.org/jira/browse/TIKA-794 Project: Tika Issue Type: Bug Components: mime Affects Versions: 1.0

[jira] [Created] (TIKA-789) Microsoft Project (MPP) basic support

2011-11-25 Thread Nick Burch (Created) (JIRA)
Microsoft Project (MPP) basic support - Key: TIKA-789 URL: https://issues.apache.org/jira/browse/TIKA-789 Project: Tika Issue Type: New Feature Components: parser Affects Versions: 1.0

[jira] [Created] (TIKA-790) Reduce duplication between POIFSDocumentType (in OfficeParser) and POIFSContainerDetector

2011-11-25 Thread Nick Burch (Created) (JIRA)
Reduce duplication between POIFSDocumentType (in OfficeParser) and POIFSContainerDetector - Key: TIKA-790 URL: https://issues.apache.org/jira/browse/TIKA-790

[jira] [Created] (TIKA-788) DWG parser infinite loop on possibly corrupt file

2011-11-24 Thread Stas Shaposhnikov (Created) (JIRA)
DWG parser infinite loop on possibly corrupt file - Key: TIKA-788 URL: https://issues.apache.org/jira/browse/TIKA-788 Project: Tika Issue Type: Bug Components: parser Affects

[jira] [Created] (TIKA-785) TikaCLI should include a --list-detectors option similar to --list-parsers

2011-11-20 Thread Nick Burch (Created) (JIRA)
TikaCLI should include a --list-detectors option similar to --list-parsers -- Key: TIKA-785 URL: https://issues.apache.org/jira/browse/TIKA-785 Project: Tika Issue

[jira] [Created] (TIKA-784) Mimetype entry for DITA

2011-11-18 Thread Nick Burch (Created) (JIRA)
Mimetype entry for DITA --- Key: TIKA-784 URL: https://issues.apache.org/jira/browse/TIKA-784 Project: Tika Issue Type: Improvement Components: mime Affects Versions: 1.0 Reporter: Nick Burch

[jira] [Created] (TIKA-781) RTF parser should ignore most control words in ignore groups

2011-11-11 Thread Arjohn Kampman (Created) (JIRA)
RTF parser should ignore most control words in ignore groups Key: TIKA-781 URL: https://issues.apache.org/jira/browse/TIKA-781 Project: Tika Issue Type: Bug Components:

[jira] [Created] (TIKA-783) MD5 and SHA1 values posted on the download page for the .jar do not match actual computed values

2011-11-11 Thread Kelvin Meeks (Created) (JIRA)
MD5 and SHA1 values posted on the download page for the .jar do not match actual computed values Key: TIKA-783 URL: https://issues.apache.org/jira/browse/TIKA-783

[jira] [Created] (TIKA-779) Detection of Microsoft Works 2000 Word Processor files

2011-11-10 Thread Antoni Mylka (Created) (JIRA)
Detection of Microsoft Works 2000 Word Processor files -- Key: TIKA-779 URL: https://issues.apache.org/jira/browse/TIKA-779 Project: Tika Issue Type: Test Affects Versions: 1.0

[jira] [Created] (TIKA-780) Optimize loading of the media type registry

2011-11-10 Thread Jukka Zitting (Created) (JIRA)
Optimize loading of the media type registry --- Key: TIKA-780 URL: https://issues.apache.org/jira/browse/TIKA-780 Project: Tika Issue Type: Improvement Components: mime Reporter:

[jira] [Created] (TIKA-777) RTF parser incorrectly applies fonts to complete group

2011-11-08 Thread Arjohn Kampman (Created) (JIRA)
RTF parser incorrectly applies fonts to complete group -- Key: TIKA-777 URL: https://issues.apache.org/jira/browse/TIKA-777 Project: Tika Issue Type: Bug Components: parser

[jira] [Created] (TIKA-776) ExifTool Embedder

2011-11-07 Thread Ray Gauss II (Created) (JIRA)
ExifTool Embedder - Key: TIKA-776 URL: https://issues.apache.org/jira/browse/TIKA-776 Project: Tika Issue Type: New Feature Components: metadata Affects Versions: 1.0 Environment: ExifTool is required

[jira] [Created] (TIKA-774) ExifTool Parser

2011-11-06 Thread Ray Gauss II (Created) (JIRA)
ExifTool Parser --- Key: TIKA-774 URL: https://issues.apache.org/jira/browse/TIKA-774 Project: Tika Issue Type: New Feature Components: parser Affects Versions: 1.0 Environment: Requires be installed

[jira] [Created] (TIKA-771) Hello, World! in UTF-8/ASCII gets detected as IBM500

2011-11-03 Thread Jukka Zitting (Created) (JIRA)
Hello, World! in UTF-8/ASCII gets detected as IBM500 -- Key: TIKA-771 URL: https://issues.apache.org/jira/browse/TIKA-771 Project: Tika Issue Type: Bug Reporter: Jukka Zitting

[jira] [Created] (TIKA-770) New ODF metadata keys

2011-11-02 Thread Jukka Zitting (Created) (JIRA)
New ODF metadata keys - Key: TIKA-770 URL: https://issues.apache.org/jira/browse/TIKA-770 Project: Tika Issue Type: Improvement Components: metadata, parser Reporter: Jukka Zitting

[jira] [Created] (TIKA-765) add icu dependency

2011-11-01 Thread Robert Muir (Created) (JIRA)
add icu dependency -- Key: TIKA-765 URL: https://issues.apache.org/jira/browse/TIKA-765 Project: Tika Issue Type: Improvement Components: general Affects Versions: 0.10 Reporter: Robert Muir

[jira] [Created] (TIKA-766) Trim down the NetCDF dependency

2011-11-01 Thread Jukka Zitting (Created) (JIRA)
Trim down the NetCDF dependency --- Key: TIKA-766 URL: https://issues.apache.org/jira/browse/TIKA-766 Project: Tika Issue Type: Improvement Components: packaging, parser Reporter: Jukka

[jira] [Created] (TIKA-768) Parser for EDF files

2011-11-01 Thread Jukka Zitting (Created) (JIRA)
Parser for EDF files Key: TIKA-768 URL: https://issues.apache.org/jira/browse/TIKA-768 Project: Tika Issue Type: New Feature Components: parser Reporter: Jukka Zitting Priority: Minor

[jira] [Created] (TIKA-763) Update license metadata

2011-10-28 Thread Jukka Zitting (Created) (JIRA)
Update license metadata --- Key: TIKA-763 URL: https://issues.apache.org/jira/browse/TIKA-763 Project: Tika Issue Type: Improvement Components: packaging Reporter: Jukka Zitting

[jira] [Created] (TIKA-764) OpenDocumentMetaParser should use common metadata keys for document statistics

2011-10-28 Thread Nick Burch (Created) (JIRA)
OpenDocumentMetaParser should use common metadata keys for document statistics -- Key: TIKA-764 URL: https://issues.apache.org/jira/browse/TIKA-764 Project: Tika

[jira] [Created] (TIKA-762) EXIF extraction from PNG images

2011-10-26 Thread Nick Burch (Created) (JIRA)
EXIF extraction from PNG images --- Key: TIKA-762 URL: https://issues.apache.org/jira/browse/TIKA-762 Project: Tika Issue Type: New Feature Components: parser Affects Versions: 1.0

[jira] [Created] (TIKA-761) Provide version number by CLI argument -V

2011-10-24 Thread Ingo Renner (Created) (JIRA)
Provide version number by CLI argument -V - Key: TIKA-761 URL: https://issues.apache.org/jira/browse/TIKA-761 Project: Tika Issue Type: New Feature Components: cli, general

[jira] [Created] (TIKA-759) Better handling of content type metadata

2011-10-21 Thread Jukka Zitting (Created) (JIRA)
Better handling of content type metadata Key: TIKA-759 URL: https://issues.apache.org/jira/browse/TIKA-759 Project: Tika Issue Type: Improvement Components: metadata, mime

[jira] [Created] (TIKA-760) NPE XHTMLContentHandler in characters Method

2011-10-21 Thread Torsten Krah (Created) (JIRA)
NPE XHTMLContentHandler in characters Method Key: TIKA-760 URL: https://issues.apache.org/jira/browse/TIKA-760 Project: Tika Issue Type: Bug Components: parser Affects Versions:

[jira] [Created] (TIKA-757) Address TODOs when we upgrade to next POI release (3.8 beta 5)

2011-10-20 Thread Michael McCandless (Created) (JIRA)
Address TODOs when we upgrade to next POI release (3.8 beta 5) -- Key: TIKA-757 URL: https://issues.apache.org/jira/browse/TIKA-757 Project: Tika Issue Type: Improvement
