UTF-8 encoded XML is detected as text/plain because of UTF-8 BOM
Key: TIKA-897
URL: https://issues.apache.org/jira/browse/TIKA-897
Project: Tika
Issue Type: Bug
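For illustration only: a UTF-8 byte order mark (bytes EF BB BF) placed in front of the `<?xml` declaration means a check anchored at offset 0 no longer sees the declaration. The sketch below is not Tika's detector; the class and method names are hypothetical, and it only shows the idea of skipping the BOM before testing for the declaration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika: skips a UTF-8 BOM before
// looking for the XML declaration at the start of a buffer.
class BomAwareXmlCheck {
    private static final byte[] XML_DECL = "<?xml".getBytes(StandardCharsets.US_ASCII);

    // Returns 3 if the buffer starts with the UTF-8 BOM (EF BB BF), else 0.
    static int bomLength(byte[] head) {
        boolean hasBom = head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        return hasBom ? 3 : 0;
    }

    // True if "<?xml" appears at offset 0 or directly after a UTF-8 BOM.
    static boolean startsWithXmlDeclaration(byte[] head) {
        int offset = bomLength(head);
        if (head.length < offset + XML_DECL.length) {
            return false;
        }
        for (int i = 0; i < XML_DECL.length; i++) {
            if (head[offset + i] != XML_DECL[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Calling `BomAwareXmlCheck.startsWithXmlDeclaration(bytes)` on the first few bytes of a stream gives the same answer with or without the BOM.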
Empty title element makes Tika-generated HTML documents not open
Key: TIKA-895
URL: https://issues.apache.org/jira/browse/TIKA-895
Project: Tika
Issue Type: Bug
OSGi deployment without declarative services
Key: TIKA-896
URL: https://issues.apache.org/jira/browse/TIKA-896
Project: Tika
Issue Type: Improvement
Components: packaging
Affects
Add webapp mode for Tika Server, simplifies deployment
--
Key: TIKA-894
URL: https://issues.apache.org/jira/browse/TIKA-894
Project: Tika
Issue Type: Improvement
Components:
Tika-server bundle includes wrong
META-INF/services/org.apache.tika.parser.Parser, doesn't work
---
Key: TIKA-893
URL: https://issues.apache.org/jira/browse/TIKA-893
NetCDF parser uses Java 6 JAR file and test/compilation fails with Java 1.5,
although TIKA is Java 1.5
--
Key: TIKA-888
URL:
XHTMLContentHandler won't emit newline when html element matches ENDLINE set
---
Key: TIKA-889
URL: https://issues.apache.org/jira/browse/TIKA-889
Project: Tika
Issue
.
This happens with files downloaded from www.jamendo.com, for example this one:
http://storage.newjamendo.com/download/track/450545/mp32/Swansong.mp3
It may be that the tags are not created properly on this site, but at least
tools like mp3tag display them correctly.
The extracted value looks like
Reporter: Nick Burch
Assignee: Nick Burch
Fix For: 1.2
As identified in an Alfresco bug (ALF-13106), OOXMLExtractorFactory doesn't
currently allow the closing of OPCPackage instances created from Files. This is
because the OPCPackage isn't associated
Dynamic loading of Parser and Detector services
---
Key: TIKA-884
URL: https://issues.apache.org/jira/browse/TIKA-884
Project: Tika
Issue Type: Improvement
Affects Versions: 1.1
Possible ConcurrentModificationException while accessing Metadata produced by
ParsingReader
---
Key: TIKA-885
URL: https://issues.apache.org/jira/browse/TIKA-885
IllegalArgumentException: No part found for relationship
Key: TIKA-882
URL: https://issues.apache.org/jira/browse/TIKA-882
Project: Tika
Issue Type: Bug
Components: parser
Detection problem: message/rfc822 file is detected as text/plain.
-
Key: TIKA-879
URL: https://issues.apache.org/jira/browse/TIKA-879
Project: Tika
Issue Type: Bug
Error while integrating the Microsoft parser
-
Key: TIKA-880
URL: https://issues.apache.org/jira/browse/TIKA-880
Project: Tika
Issue Type: Wish
Components: parser
Reuse computed Map<MediaType, Parser> inside CompositeParser
Key: TIKA-878
URL: https://issues.apache.org/jira/browse/TIKA-878
Project: Tika
Issue Type: Improvement
Embedded document not extracted (regression)
Key: TIKA-877
URL: https://issues.apache.org/jira/browse/TIKA-877
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.1
Signed pdf parsing
--
Key: TIKA-876
URL: https://issues.apache.org/jira/browse/TIKA-876
Project: Tika
Issue Type: New Feature
Components: parser
Affects Versions: 1.0
Environment: Java 6.0, Ubuntu
Affects Versions: 1.1, 1.2
Reporter: Peter May
Priority: Minor
Tika does not have a defined signature for application/fits files. I have
created a patch (based on file(1) magic) to address identification of such
files, including a simple unit test.
This patch only
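The patch itself is not shown in this listing. As a rough sketch of what a magic-based check for FITS can look like, the Java below assumes the signature is the leading header card `SIMPLE  =` (the keyword SIMPLE padded to eight columns, then `=`), which is how FITS primary headers begin. The class name is hypothetical and the real patch may match more or fewer bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the submitted patch: tests whether a buffer
// begins with the FITS header card "SIMPLE  =".
class FitsMagicSketch {
    // Assumed signature: keyword SIMPLE padded to eight columns, then '='.
    private static final byte[] MAGIC = "SIMPLE  =".getBytes(StandardCharsets.US_ASCII);

    // True if the first bytes of the buffer equal the assumed signature.
    static boolean looksLikeFits(byte[] head) {
        if (head.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (head[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```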
Tika --extract fails for RTF
Key: TIKA-872
URL: https://issues.apache.org/jira/browse/TIKA-872
Project: Tika
Issue Type: New Feature
Components: general
Affects Versions: 1.0
Environment:
Tika --extract fails for DOC
Key: TIKA-873
URL: https://issues.apache.org/jira/browse/TIKA-873
Project: Tika
Issue Type: Bug
Components: general
Affects Versions: 1.0
Environment: Windows
Text in nested groups within a pptx not parsed
--
Key: TIKA-871
URL: https://issues.apache.org/jira/browse/TIKA-871
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions:
IdentityHtmlMapper.mapSafeElement() needs to return lower-cased incoming name
-
Key: TIKA-869
URL: https://issues.apache.org/jira/browse/TIKA-869
Project: Tika
Allow calling parseToString with an additional MaxStringLength parameter,
so it can be changed per call
-
Key: TIKA-870
URL:
TXT parser does not honour the specified encoding
-
Key: TIKA-868
URL: https://issues.apache.org/jira/browse/TIKA-868
Project: Tika
Issue Type: Bug
Reporter: Daniel Bonniot de
UTF-8 encoding does not work on windows
---
Key: TIKA-867
URL: https://issues.apache.org/jira/browse/TIKA-867
Project: Tika
Issue Type: Bug
Components: cli
Affects Versions: 1.0
Incomplete configuration file causes OutOfMemoryException
-
Key: TIKA-866
URL: https://issues.apache.org/jira/browse/TIKA-866
Project: Tika
Issue Type: Bug
Components: config
JPSS HDF5 files not being detected appropriately
Key: TIKA-862
URL: https://issues.apache.org/jira/browse/TIKA-862
Project: Tika
Issue Type: Bug
Reporter: Richard Yu
MailContentHandler should not create AutoDetectParser on each call
--
Key: TIKA-863
URL: https://issues.apache.org/jira/browse/TIKA-863
Project: Tika
Issue Type: Bug
Metadata.formatDate should use ThreadLocal
--
Key: TIKA-864
URL: https://issues.apache.org/jira/browse/TIKA-864
Project: Tika
Issue Type: Improvement
Components: metadata
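The issue title suggests keeping one date formatter per thread. `SimpleDateFormat` is not thread-safe, so a shared instance needs either locking or a per-thread copy. The sketch below shows the per-thread approach; the class name, method name and date pattern are assumptions for illustration, not Tika's actual code. It uses `ThreadLocal.withInitial`, which needs Java 8; on older Java the same effect comes from overriding `initialValue()`.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch: one SimpleDateFormat per thread, so no
// synchronization is needed when formatting.
class ThreadLocalDateFormatSketch {
    // Each thread lazily creates its own formatter on first use.
    private static final ThreadLocal<SimpleDateFormat> ISO_UTC =
            ThreadLocal.withInitial(() -> {
                SimpleDateFormat format =
                        new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
                format.setTimeZone(TimeZone.getTimeZone("UTC"));
                return format;
            });

    // Formats the date in UTC using the calling thread's own formatter.
    static String formatDate(Date date) {
        return ISO_UTC.get().format(date);
    }
}
```

The trade-off is one formatter instance per thread in exchange for lock-free formatting.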
MimeTypes.forName should avoid method-level synchronization
---
Key: TIKA-865
URL: https://issues.apache.org/jira/browse/TIKA-865
Project: Tika
Issue Type: Improvement
Make ZIP bomb detection configurable
-
Key: TIKA-860
URL: https://issues.apache.org/jira/browse/TIKA-860
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.0
Tika TrueTypeParser add metadata from Naming tables
---
Key: TIKA-857
URL: https://issues.apache.org/jira/browse/TIKA-857
Project: Tika
Issue Type: Improvement
Components: parser
Tika add parsing support for ANPA-1312 news wire feeds
--
Key: TIKA-858
URL: https://issues.apache.org/jira/browse/TIKA-858
Project: Tika
Issue Type: New Feature
Components:
DublinCore Metadata Keys Should be Prefixed and Property Objects
Key: TIKA-859
URL: https://issues.apache.org/jira/browse/TIKA-859
Project: Tika
Issue Type: Improvement
Language Detection not working for Japanese and Chinese.
Key: TIKA-855
URL: https://issues.apache.org/jira/browse/TIKA-855
Project: Tika
Issue Type: Bug
Components:
Support CJK (Chinese, Japanese and Korean) language detection
-
Key: TIKA-856
URL: https://issues.apache.org/jira/browse/TIKA-856
Project: Tika
Issue Type: New Feature
No text extraction Word macroenabled template
-
Key: TIKA-854
URL: https://issues.apache.org/jira/browse/TIKA-854
Project: Tika
Issue Type: Bug
Affects Versions: 1.1
Reporter:
java.io.IOException with TikaGUI and testMP4.m4a
Key: TIKA-853
URL: https://issues.apache.org/jira/browse/TIKA-853
Project: Tika
Issue Type: Bug
Components: gui, parser
Affects
Quicktime / MP4 Metadata Parser
---
Key: TIKA-852
URL: https://issues.apache.org/jira/browse/TIKA-852
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.0
M4V magic detection invalid
---
Key: TIKA-851
URL: https://issues.apache.org/jira/browse/TIKA-851
Project: Tika
Issue Type: Bug
Components: mime
Affects Versions: 1.0
Reporter: Alexander
Consistent way to supply document passwords to parsers
--
Key: TIKA-850
URL: https://issues.apache.org/jira/browse/TIKA-850
Project: Tika
Issue Type: Improvement
Components:
NullPointerException in
SecurityHandler.addDictionaryAndSubDictionary(SecurityHandler.java:185)
---
Key: TIKA-848
URL: https://issues.apache.org/jira/browse/TIKA-848
User supplied parsers should be preferred
-
Key: TIKA-841
URL: https://issues.apache.org/jira/browse/TIKA-841
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions:
Support for Date without a Time Component
-
Key: TIKA-843
URL: https://issues.apache.org/jira/browse/TIKA-843
Project: Tika
Issue Type: Improvement
Components: metadata
Affects Versions:
Ability to Define an Internal Text Bag Property
---
Key: TIKA-844
URL: https://issues.apache.org/jira/browse/TIKA-844
Project: Tika
Issue Type: Improvement
Components: metadata
Check for Existing Value in Multi-Value Fields in XML Metadata Handler
--
Key: TIKA-845
URL: https://issues.apache.org/jira/browse/TIKA-845
Project: Tika
Issue Type:
Ability to Parse RDF Bag Elements in XML
Key: TIKA-846
URL: https://issues.apache.org/jira/browse/TIKA-846
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.0
OOXML parser content type setting
-
Key: TIKA-840
URL: https://issues.apache.org/jira/browse/TIKA-840
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.0
, and it said that it was a file created with a beta
version of Office, and that it would be updated the next time it was saved to a
more up-to-date format. I made the contents look like that of the other Office
2007 presentation documents in the test-documents folder, and added this file
and its
Make inner classes static for performance reasons
-
Key: TIKA-837
URL: https://issues.apache.org/jira/browse/TIKA-837
Project: Tika
Issue Type: Sub-task
Components: general
EmptyParser Singleton should be final
-
Key: TIKA-838
URL: https://issues.apache.org/jira/browse/TIKA-838
Project: Tika
Issue Type: Sub-task
Components: general
Reporter: Fabian
parsing really slow on some documents
-
Key: TIKA-836
URL: https://issues.apache.org/jira/browse/TIKA-836
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.0
TNEF parsing unstable
-
Key: TIKA-835
URL: https://issues.apache.org/jira/browse/TIKA-835
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.0
Environment: CentOS 4.x/5.x/6.x
Server problem: only the 1st (-m -j) result is correct; additional runs include
data from previous runs
-
Key: TIKA-834
URL: https://issues.apache.org/jira/browse/TIKA-834
POI Daily beta6 as of 12/27 breaks ExcelParserTest.testExcelParserFormatting()
--
Key: TIKA-833
URL: https://issues.apache.org/jira/browse/TIKA-833
Project: Tika
TaggedIOException can be passed non Serializable objects
Key: TIKA-828
URL: https://issues.apache.org/jira/browse/TIKA-828
Project: Tika
Issue Type: Bug
Affects Versions: 1.0
Extract rel attr with LinkContentHandler
Key: TIKA-824
URL: https://issues.apache.org/jira/browse/TIKA-824
Project: Tika
Issue Type: Improvement
Components: parser
Reporter:
Extract rel attr with LinkContentHandler
Key: TIKA-825
URL: https://issues.apache.org/jira/browse/TIKA-825
Project: Tika
Issue Type: Improvement
Components: parser
Reporter:
TikaException / OfficeXmlFileException with .xlsb files
---
Key: TIKA-826
URL: https://issues.apache.org/jira/browse/TIKA-826
Project: Tika
Issue Type: Bug
Components: parser
Support detecting old Microsoft Works Word Processor formats
Key: TIKA-821
URL: https://issues.apache.org/jira/browse/TIKA-821
Project: Tika
Issue Type: Improvement
MediaType fails to parse charset that has quoted value
--
Key: TIKA-822
URL: https://issues.apache.org/jira/browse/TIKA-822
Project: Tika
Issue Type: Bug
Components: mime
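To make the reported case concrete: a content type such as `text/html; charset="UTF-8"` carries the charset as a quoted string, and a parser that only accepts the bare form will miss it. The sketch below is a standalone illustration with hypothetical names, not the `MediaType` implementation. It accepts both the quoted and unquoted forms and does not handle escaped quotes inside the value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: extracts the charset parameter from a content type
// string, accepting both charset=UTF-8 and charset="UTF-8".
class CharsetParamSketch {
    // Group 1 captures a quoted value, group 2 an unquoted token.
    private static final Pattern CHARSET = Pattern.compile(
            ";\\s*charset\\s*=\\s*(?:\"([^\"]*)\"|([^;\\s\"]+))",
            Pattern.CASE_INSENSITIVE);

    // Returns the charset value without quotes, or null if none is present.
    static String charsetOf(String contentType) {
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) {
            return null;
        }
        return m.group(1) != null ? m.group(1) : m.group(2);
    }
}
```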
Allow PDFBox to be used with RandomAccessFile vs RandomAccessBuffer to allow
for a memory vs performance tradeoff
-
Key: TIKA-818
URL:
Improve the detection of Works Spreadsheet 7.0 files
Key: TIKA-812
URL: https://issues.apache.org/jira/browse/TIKA-812
Project: Tika
Issue Type: Improvement
Components: mime
Webarchive detection.
-
Key: TIKA-813
URL: https://issues.apache.org/jira/browse/TIKA-813
Project: Tika
Issue Type: Improvement
Components: mime
Affects Versions: 1.1
Reporter: Antoni Mylka
Increase the amount of bytes read by TextDetector
-
Key: TIKA-814
URL: https://issues.apache.org/jira/browse/TIKA-814
Project: Tika
Issue Type: Improvement
Affects Versions: 1.1
Upgrade to PDFBox 1.7.0 as available
Key: TIKA-810
URL: https://issues.apache.org/jira/browse/TIKA-810
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.0
Fork Parser doesn't work for PDF files
--
Key: TIKA-808
URL: https://issues.apache.org/jira/browse/TIKA-808
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.0
IndexOutOfBoundsException with TikaGUI
--
Key: TIKA-809
URL: https://issues.apache.org/jira/browse/TIKA-809
Project: Tika
Issue Type: Bug
Components: gui
Affects Versions: 1.1
PHP version of Tika
---
Key: TIKA-807
URL: https://issues.apache.org/jira/browse/TIKA-807
Project: Tika
Issue Type: New Feature
Components: packaging
Reporter: Ingo Renner
Inspired by #TIKA-773 the
improvements in XSLFPowerPointExtractorDecorator
-
Key: TIKA-805
URL: https://issues.apache.org/jira/browse/TIKA-805
Project: Tika
Issue Type: Improvement
Components: parser
MS Word Detection magics are a bit overzealous
--
Key: TIKA-806
URL: https://issues.apache.org/jira/browse/TIKA-806
Project: Tika
Issue Type: Bug
Components: mime
Affects Versions:
ContentHandlerDecorator outputs invalid element
---
Key: TIKA-801
URL: https://issues.apache.org/jira/browse/TIKA-801
Project: Tika
Issue Type: Bug
Affects Versions: 1.0, 1.1
NullPointerException when parsing iWork files
--
Key: TIKA-802
URL: https://issues.apache.org/jira/browse/TIKA-802
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions:
MimeType.getExtension for application/vnd.ms-powerpoint returns ppz. I'd expect
ppt.
Key: TIKA-797
URL: https://issues.apache.org/jira/browse/TIKA-797
Project: Tika
Invalid ASCII character (65533) when retrieving MP3 metadata
---
Key: TIKA-793
URL: https://issues.apache.org/jira/browse/TIKA-793
Project: Tika
Issue Type: Bug
Components:
Mime magic logic for Little16 is incorrect
--
Key: TIKA-794
URL: https://issues.apache.org/jira/browse/TIKA-794
Project: Tika
Issue Type: Bug
Components: mime
Affects Versions: 1.0
Microsoft Project (MPP) basic support
-
Key: TIKA-789
URL: https://issues.apache.org/jira/browse/TIKA-789
Project: Tika
Issue Type: New Feature
Components: parser
Affects Versions: 1.0
Reduce duplication between POIFSDocumentType (in OfficeParser) and
POIFSContainerDetector
-
Key: TIKA-790
URL: https://issues.apache.org/jira/browse/TIKA-790
DWG parser infinite loop on possibly corrupt file
-
Key: TIKA-788
URL: https://issues.apache.org/jira/browse/TIKA-788
Project: Tika
Issue Type: Bug
Components: parser
Affects
TikaCLI should include a --list-detectors option similar to --list-parsers
--
Key: TIKA-785
URL: https://issues.apache.org/jira/browse/TIKA-785
Project: Tika
Issue
Mimetype entry for DITA
---
Key: TIKA-784
URL: https://issues.apache.org/jira/browse/TIKA-784
Project: Tika
Issue Type: Improvement
Components: mime
Affects Versions: 1.0
Reporter: Nick Burch
RTF parser should ignore most control words in ignore groups
Key: TIKA-781
URL: https://issues.apache.org/jira/browse/TIKA-781
Project: Tika
Issue Type: Bug
Components:
MD5 and SHA1 values posted on the download page for the .jar do not match
actual computed values
Key: TIKA-783
URL: https://issues.apache.org/jira/browse/TIKA-783
Detection of Microsoft Works 2000 Word Processor files
--
Key: TIKA-779
URL: https://issues.apache.org/jira/browse/TIKA-779
Project: Tika
Issue Type: Test
Affects Versions: 1.0
Optimize loading of the media type registry
---
Key: TIKA-780
URL: https://issues.apache.org/jira/browse/TIKA-780
Project: Tika
Issue Type: Improvement
Components: mime
Reporter:
RTF parser incorrectly applies fonts to complete group
--
Key: TIKA-777
URL: https://issues.apache.org/jira/browse/TIKA-777
Project: Tika
Issue Type: Bug
Components: parser
ExifTool Embedder
-
Key: TIKA-776
URL: https://issues.apache.org/jira/browse/TIKA-776
Project: Tika
Issue Type: New Feature
Components: metadata
Affects Versions: 1.0
Environment: ExifTool is required
ExifTool Parser
---
Key: TIKA-774
URL: https://issues.apache.org/jira/browse/TIKA-774
Project: Tika
Issue Type: New Feature
Components: parser
Affects Versions: 1.0
Environment: Requires be installed
Hello, World! in UTF-8/ASCII gets detected as IBM500
--
Key: TIKA-771
URL: https://issues.apache.org/jira/browse/TIKA-771
Project: Tika
Issue Type: Bug
Reporter: Jukka Zitting
New ODF metadata keys
-
Key: TIKA-770
URL: https://issues.apache.org/jira/browse/TIKA-770
Project: Tika
Issue Type: Improvement
Components: metadata, parser
Reporter: Jukka Zitting
add icu dependency
--
Key: TIKA-765
URL: https://issues.apache.org/jira/browse/TIKA-765
Project: Tika
Issue Type: Improvement
Components: general
Affects Versions: 0.10
Reporter: Robert Muir
Trim down the NetCDF dependency
---
Key: TIKA-766
URL: https://issues.apache.org/jira/browse/TIKA-766
Project: Tika
Issue Type: Improvement
Components: packaging, parser
Reporter: Jukka
Parser for EDF files
Key: TIKA-768
URL: https://issues.apache.org/jira/browse/TIKA-768
Project: Tika
Issue Type: New Feature
Components: parser
Reporter: Jukka Zitting
Priority: Minor
Update license metadata
---
Key: TIKA-763
URL: https://issues.apache.org/jira/browse/TIKA-763
Project: Tika
Issue Type: Improvement
Components: packaging
Reporter: Jukka Zitting
OpenDocumentMetaParser should use common metadata keys for document statistics
--
Key: TIKA-764
URL: https://issues.apache.org/jira/browse/TIKA-764
Project: Tika
EXIF extraction from PNG images
---
Key: TIKA-762
URL: https://issues.apache.org/jira/browse/TIKA-762
Project: Tika
Issue Type: New Feature
Components: parser
Affects Versions: 1.0
Provide version number by CLI argument -V
-
Key: TIKA-761
URL: https://issues.apache.org/jira/browse/TIKA-761
Project: Tika
Issue Type: New Feature
Components: cli, general
Better handling of content type metadata
Key: TIKA-759
URL: https://issues.apache.org/jira/browse/TIKA-759
Project: Tika
Issue Type: Improvement
Components: metadata, mime
NPE XHTMLContentHandler in characters Method
Key: TIKA-760
URL: https://issues.apache.org/jira/browse/TIKA-760
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions:
Address TODOs when we upgrade to next POI release (3.8 beta 5)
--
Key: TIKA-757
URL: https://issues.apache.org/jira/browse/TIKA-757
Project: Tika
Issue Type: Improvement
1 - 100 of 118 matches