[jira] [Updated] (TIKA-896) OSGi deployment without declarative services

2012-04-19 Thread Jörg Ehrlich (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jörg Ehrlich updated TIKA-896: -- Attachment: osgi.patch The attached patch fixes these issues. OSGi deployment without

[jira] [Updated] (TIKA-896) OSGi deployment without declarative services

2012-04-19 Thread Jörg Ehrlich (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jörg Ehrlich updated TIKA-896: -- Attachment: osgi.patch Updated patch that drops the scr plugin from the BundleIT in tika-bundle

[jira] [Updated] (TIKA-894) Add webapp mode for Tika Server, simplifies deployment

2012-04-17 Thread Chris Wilson (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Wilson updated TIKA-894: -- Attachment: tika-server-webapp.patch Add webapp mode for Tika Server, simplifies deployment

[jira] [Updated] (TIKA-892) Tika does not use the HTML5 meta charset tag when determining charset

2012-04-11 Thread Chris Jones (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Jones updated TIKA-892: - Attachment: (was: tika-html5-patch.java) Tika does not use the HTML5 meta charset tag when

[jira] [Updated] (TIKA-892) Tika does not use the HTML5 meta charset tag when determining charset

2012-04-11 Thread Chris Jones (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Jones updated TIKA-892: - Attachment: tika-html5.patch Patch to add support for HTML5 meta charset= tag to HtmlParser

[jira] [Updated] (TIKA-892) Tika does not use the HTML5 meta charset tag when determining charset

2012-04-11 Thread Chris Jones (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Jones updated TIKA-892: - Comment: was deleted (was: Add support for HTML5 meta charset= tag to HtmlParser) Tika does not use

[jira] [Updated] (TIKA-892) Tika does not use the HTML5 meta charset tag when determining charset

2012-04-11 Thread Chris Jones (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Jones updated TIKA-892: - Comment: was deleted (was: Patch to add HTML5 meta charset= support to HtmlParser) Tika does not use

[jira] [Updated] (TIKA-892) Tika does not use the HTML5 meta charset tag when determining charset

2012-04-11 Thread Chris Jones (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Jones updated TIKA-892: - Attachment: (was: tika-html5.patch) Tika does not use the HTML5 meta charset tag when determining

[jira] [Updated] (TIKA-887) Tika fails to parse some MP3 tags correctly and produces null characters in value

2012-03-29 Thread Jens Hübel (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jens Hübel updated TIKA-887: Affects Version/s: 1.1 Thanks for the hint Nick. I missed that there was a new release recently

[jira] [Updated] (TIKA-593) Tika network server

2012-03-27 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-593: --- Attachment: TIKA-593.Mattmann.032612.patch.2.txt - ok tests passing, mostly. Will finish

[jira] [Updated] (TIKA-593) Tika network server

2012-03-27 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-593: --- Attachment: TIKA-593.Mattmann.032712.patch.2.txt Tika network server

[jira] [Updated] (TIKA-593) Tika network server

2012-03-26 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-593: --- Attachment: TIKA-593.Mattmann.032612.patch.txt - Max FYI my current progress. I'm trying

[jira] [Updated] (TIKA-881) HtmlParser sometimes(!) throws IOException while determining Html-Encoding

2012-03-22 Thread Klaus v. Einem (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Klaus v. Einem updated TIKA-881: Attachment: BugfixHtmlParser.java This is my Solution... Sorry, Comments are in German. The Key

[jira] [Updated] (TIKA-881) HtmlParser sometimes(!) throws IOException while determining Html-Encoding

2012-03-22 Thread Klaus v. Einem (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Klaus v. Einem updated TIKA-881: Attachment: HtmlParser.java OK, this is 100% original sourcecode with Bugfix included

[jira] [Updated] (TIKA-877) Embedded document not extracted (regression)

2012-03-18 Thread Daniel Bonniot de Ruisselet (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Bonniot de Ruisselet updated TIKA-877: - Attachment: coffee.xls Embedded document not extracted (regression

[jira] [Updated] (TIKA-874) Identify FITS (Flexible Image Transport System) files

2012-03-12 Thread Peter May (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter May updated TIKA-874: --- Attachment: fits_support.patch This patch identifies FITS files, based on the signature used by the file(1

[jira] [Updated] (TIKA-874) Identify FITS (Flexible Image Transport System) files

2012-03-12 Thread Peter May (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter May updated TIKA-874: --- Attachment: fits_support.patch Identify FITS (Flexible Image Transport System) files

[jira] [Updated] (TIKA-874) Identify FITS (Flexible Image Transport System) files

2012-03-12 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-874: --- Affects Version/s: (was: 1.2) (was: 1.1) Fix Version/s

[jira] [Updated] (TIKA-872) Tika --extract fails for RTF

2012-03-09 Thread Albert L. (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Albert L. updated TIKA-872: --- Issue Type: Bug (was: New Feature) Tika --extract fails for RTF

[jira] [Updated] (TIKA-873) Tika --extract fails for DOC

2012-03-09 Thread Albert L. (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Albert L. updated TIKA-873: --- Description: A file that is embedded in a DOC file doesn't get extracted to disk. To embed a file into a DOC

[jira] [Updated] (TIKA-873) Tika --extract fails for DOC

2012-03-09 Thread Albert L. (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Albert L. updated TIKA-873: --- Description: A file that is embedded in a DOC file doesn't get extracted to disk. To embed a file into a DOC

[jira] [Updated] (TIKA-871) Text in nested groups within a pptx not parsed

2012-03-08 Thread Curtis Hyder (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Curtis Hyder updated TIKA-871: -- Description: Text within objects in a nested group is not parsed. Given the following group hierarchy

[jira] [Updated] (TIKA-871) Text in nested groups within a pptx not parsed

2012-03-08 Thread Curtis Hyder (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Curtis Hyder updated TIKA-871: -- Attachment: test.pptx Text in nested groups within a pptx not parsed

[jira] [Updated] (TIKA-817) (PPT/PPTX) Missing date/time in text content.

2012-03-07 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-817: --- Fix Version/s: (was: 1.1) 1.2 - push out to 1.2 (PPT

[jira] [Updated] (TIKA-861) Parse links in PDF

2012-03-07 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-861: --- Fix Version/s: (was: 1.1) 1.2 - push out to 1.2 Parse

[jira] [Updated] (TIKA-868) TXT parser does not honour the specified encoding

2012-03-07 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-868: --- Fix Version/s: (was: 1.1) 1.2 - push out to 1.2 TXT

[jira] [Updated] (TIKA-715) Some parsers produce non-well-formed XHTML SAX events

2012-03-07 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-715: --- Fix Version/s: (was: 1.1) 1.2 - push out to 1.2 Some

[jira] [Updated] (TIKA-816) (XLS/XLSX) Improperly formatted date/time in text content.

2012-03-07 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-816: --- Fix Version/s: (was: 1.1) 1.2 - push out to 1.2 (XLS

[jira] [Updated] (TIKA-605) Tika GDAL parser

2012-03-07 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-605: --- Fix Version/s: (was: 1.1) 1.2 - push out to 1.2 Tika

[jira] [Updated] (TIKA-819) Make Option to Exclude Embedded Files' Text for Text Content

2012-03-07 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-819: --- Fix Version/s: (was: 1.1) 1.2 - push out to 1.2 Make

[jira] [Updated] (TIKA-758) Address TODOs when we upgrade to next PDFBox release

2012-03-07 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-758: --- Fix Version/s: (was: 1.1) 1.2 - push out to 1.2

[jira] [Updated] (TIKA-776) ExifTool Embedder

2012-03-07 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-776: --- Fix Version/s: (was: 1.1) 1.2 - push out to 1.2

[jira] [Updated] (TIKA-820) Locator is unset for HTML parser

2012-03-07 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-820: --- Fix Version/s: (was: 1.1) 1.2 - push out to 1.2

[jira] [Updated] (TIKA-754) Automatic line break insertion (BR element) instead of '\n' in XHTMLContentHandler

2012-03-07 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-754: --- Fix Version/s: (was: 1.1) 1.2 - push out to 1.2

[jira] [Updated] (TIKA-775) Embed Capabilities

2012-03-07 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-775: --- Fix Version/s: (was: 1.1) 1.2 - push out to 1.2 Embed

[jira] [Updated] (TIKA-593) Tika network server

2012-03-07 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-593: --- Fix Version/s: (was: 1.1) 1.2 - push out to 1.2 Tika

[jira] [Updated] (TIKA-859) DublinCore Metadata Keys Should be Prefixed and Property Objects

2012-03-07 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-859: --- Fix Version/s: (was: 1.1) 1.2 - push out to 1.2

[jira] [Updated] (TIKA-774) ExifTool Parser

2012-03-07 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-774: --- Fix Version/s: (was: 1.1) 1.2 - push out to 1.2

[jira] [Updated] (TIKA-842) IPTC Properties Should be Defined Completely and Independently of the Drew Library

2012-03-07 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-842: --- Fix Version/s: (was: 1.1) 1.2 - push out to 1.2 IPTC

[jira] [Updated] (TIKA-869) IdentityHtmlMapper.mapSafeElement() needs to return lower-cased incoming name

2012-03-07 Thread Ken Krugler (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler updated TIKA-869: - Attachment: TIKA-869.patch IdentityHtmlMapper.mapSafeElement() needs to return lower-cased incoming

[jira] [Updated] (TIKA-870) Allow to use call parseToString with a additional parameter of MaxStringLength, so it can be changed per call

2012-03-07 Thread Michael McCandless (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated TIKA-870: Attachment: TIKA-870.patch Patch, with the sample code plus a test case. The test case

[jira] [Updated] (TIKA-859) DublinCore Metadata Keys Should be Prefixed and Property Objects

2012-03-07 Thread Ray Gauss II (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II updated TIKA-859: -- Attachment: dublincore-prefixed-and-updated-references-parsers-patch dublincore-prefixed

[jira] [Updated] (TIKA-859) DublinCore Metadata Keys Should be Prefixed and Property Objects

2012-03-07 Thread Ray Gauss II (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II updated TIKA-859: -- Attachment: (was: dublincore-prefixed-patch.diff) DublinCore Metadata Keys Should be Prefixed

[jira] [Updated] (TIKA-866) Incomplete configuration file causes OutOfMemoryException

2012-02-17 Thread Stephan Mühlstrasser (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Mühlstrasser updated TIKA-866: -- Attachment: ConfigFile.java Unit test to reproduce the problem

[jira] [Updated] (TIKA-866) Invalid configuration file causes OutOfMemoryException

2012-02-17 Thread Jukka Zitting (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-866: --- Summary: Invalid configuration file causes OutOfMemoryException (was: Incomplete configuration file

[jira] [Updated] (TIKA-864) Metadata.formatDate causes blocking in concurrent use

2012-02-17 Thread Jukka Zitting (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-864: --- Summary: Metadata.formatDate causes blocking in concurrent use (was: Metadata.formatDate should use

[jira] [Updated] (TIKA-862) JPSS HDF5 files not being detected appropriately

2012-02-16 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-862: --- Component/s: parser Affects Version/s: 1.0 - classify and identify version (I think

[jira] [Updated] (TIKA-862) JPSS HDF5 files not being detected appropriately

2012-02-16 Thread Richard Yu (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Yu updated TIKA-862: Attachment: RNSCA-ROLPS_npp_d20120202_t1841338_e1842112_b01382_c20120202203730692328_noaa_ops.h5

[jira] [Updated] (TIKA-862) JPSS HDF5 files not being detected appropriately

2012-02-16 Thread Richard Yu (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Yu updated TIKA-862: Attachment: RNSCA-ROLPS_npp_d20120202_t1841338_e1842112_b01382_c20120202203730692328_noaa_ops.h5

[jira] [Updated] (TIKA-863) MailContentHandler should not create AutoDetectParser on each call

2012-02-16 Thread Andrzej Bialecki (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated TIKA-863: --- Attachment: TIKA-863.patch Patch to address this issue. AutoDetectParser instance is cached

[jira] [Updated] (TIKA-863) MailContentHandler should not create AutoDetectParser on each call

2012-02-16 Thread Andrzej Bialecki (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated TIKA-863: --- Description: MailContentHandler is called from RFC822Parser, and it creates AutoDetectParser

[jira] [Updated] (TIKA-820) Locator is unset for HTML parser

2012-02-08 Thread Daniel Bonniot de Ruisselet (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Bonniot de Ruisselet updated TIKA-820: - Fix Version/s: 1.1 Affects Version/s: 1.0 Locator is unset

[jira] [Updated] (TIKA-818) Allow PDFBox to be used with RandomAccessFile vs RandomAccessBuffer to allow for a memory vs performance tradeoff

2012-02-05 Thread Paul Pearcy (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Pearcy updated TIKA-818: - Attachment: PDFParser.java.patch Allow PDFBox to be used with RandomAccessFile vs RandomAccessBuffer

[jira] [Updated] (TIKA-853) java.io.IOException with TikaGUI and testMP4.m4a

2012-02-04 Thread John Mastarone (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Mastarone updated TIKA-853: Attachment: TIKA-853.patch I've attached a potential patch for the MP4Parser class that prevents

[jira] [Updated] (TIKA-857) Tika TrueTypeParser add metadata from Naming tables

2012-02-02 Thread Craig Stires (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Stires updated TIKA-857: -- Attachment: TrueTypeParser_AddMetadata.patch this is the patch against

[jira] [Updated] (TIKA-858) Tika add parsing support for ANPA-1312 news wire feeds

2012-02-02 Thread Craig Stires (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Stires updated TIKA-858: -- Attachment: tika-mimetypes_ANPA.patch This is the file recognition for ANPA file types. This patch goes

[jira] [Updated] (TIKA-858) Tika add parsing support for ANPA-1312 news wire feeds

2012-02-02 Thread Craig Stires (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Stires updated TIKA-858: -- Attachment: org.apache.tika.parser.Parser_ANPA.patch This is the change to the parser module, which

[jira] [Updated] (TIKA-858) Tika add parsing support for ANPA-1312 news wire feeds

2012-02-02 Thread Craig Stires (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Stires updated TIKA-858: -- Attachment: IptcAnpaParser.java The file which parses and categorizes the ANPA wire feeds. This gets

[jira] [Updated] (TIKA-859) DublinCore Metadata Keys Should be Prefixed and Property Objects

2012-02-02 Thread Ray Gauss II (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II updated TIKA-859: -- Attachment: dublincore-prefixed-patch.diff Patch for changes to DublinCore prefixed Property definitions

[jira] [Updated] (TIKA-842) IPTC Properties Should be Defined Completely and Independently of the Drew Library

2012-02-02 Thread Ray Gauss II (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II updated TIKA-842: -- Attachment: metadata-remove-iptc-patch.diff iptc-dublincore-aliased-patch.diff Changes

[jira] [Updated] (TIKA-854) No text extraction Word macroenabled template

2012-01-31 Thread Maxim Valyanskiy (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-854: -- Attachment: cat50.dotm test data No text extraction Word macroenabled template

[jira] [Updated] (TIKA-854) No text extraction for Word macroenabled template

2012-01-31 Thread Maxim Valyanskiy (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-854: -- Summary: No text extraction for Word macroenabled template (was: No text extraction Word

[jira] [Updated] (TIKA-851) M4V and M4A detection invalid

2012-01-27 Thread Alexander Chow (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Chow updated TIKA-851: Description: When the mime type of an M4V file is detected using its name only, it returns video/x

[jira] [Updated] (TIKA-851) M4V and M4A detection invalid

2012-01-27 Thread Alexander Chow (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Chow updated TIKA-851: Attachment: TIKA-851.patch I've added a patch file that I think should fix the problem for both M4V

[jira] [Updated] (TIKA-847) Add regular expression support to the MagicDetector

2012-01-26 Thread Peter May (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter May updated TIKA-847: --- Attachment: regex_support.patch Patch updating MagicDetector and associated unit tests to incorporate regular

[jira] [Updated] (TIKA-849) Identify and parse the Apple iBooks format

2012-01-23 Thread Andrew Jackson (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Jackson updated TIKA-849: Attachment: ibooks-support.patch This patch identifies *.ibooks files, and passes them through

[jira] [Updated] (TIKA-818) Allow PDFBox to be used with RandomAccessFile vs RandomAccessBuffer to allow for a memory vs performance tradeoff

2012-01-23 Thread Paul Pearcy (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Pearcy updated TIKA-818: - Attachment: choose_inmemory_vs_temp_file_pdf.patch Here is a patch based off the trunk. Please let me know

[jira] [Updated] (TIKA-818) Allow PDFBox to be used with RandomAccessFile vs RandomAccessBuffer to allow for a memory vs performance tradeoff

2012-01-23 Thread Paul Pearcy (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Pearcy updated TIKA-818: - Attachment: choose_inmemory_vs_temp_file_pdf_passes_tests.patch Here's a version that should pass all

[jira] [Updated] (TIKA-842) IPTC Properties Should be Defined Completely and Independently of the Drew Library

2012-01-16 Thread Ray Gauss II (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II updated TIKA-842: -- Attachment: IPTC-metadata-def-patch.diff This metadata interface follows the order, standards

[jira] [Updated] (TIKA-843) Support for Date without a Time Component

2012-01-16 Thread Ray Gauss II (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II updated TIKA-843: -- Attachment: date-format-patch.diff Patch to add support for parsing of dates with no time component

[jira] [Updated] (TIKA-844) Ability to Define an Internal Text Bag Property

2012-01-16 Thread Ray Gauss II (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II updated TIKA-844: -- Attachment: text-bag-property-patch.diff Patch to create an internal text bag Property

[jira] [Updated] (TIKA-845) Check for Existing Value in Multi-Value Fields in XML Metadata Handler

2012-01-16 Thread Ray Gauss II (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II updated TIKA-845: -- Attachment: xml-check-multi-value-existing.diff Patch to check for existing multi-value

[jira] [Updated] (TIKA-846) Ability to Parse RDF Bag Elements in XML

2012-01-16 Thread Ray Gauss II (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II updated TIKA-846: -- Attachment: bag-element-metadata-handler.diff Patch to parse RDF bag elements. Ability

[jira] [Updated] (TIKA-805) improvements in XSLFPowerPointExtractorDecorator

2012-01-15 Thread Yegor Kozlov (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yegor Kozlov updated TIKA-805: -- Attachment: poi-xlsf.patch reworked patch made against trunk (r1231646) improvements

[jira] [Updated] (TIKA-839) TikaException with testPPT.potm in Tika GUI / CLI

2012-01-10 Thread John Mastarone (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Mastarone updated TIKA-839: Attachment: testPPT.potm TIKA-839.patch OOXMLParserTest update, and new valid potm

[jira] [Updated] (TIKA-839) TikaException with testPPT.potm in Tika GUI / CLI

2012-01-10 Thread John Mastarone (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Mastarone updated TIKA-839: Description: Attempting to open the testPPT.potm file found in the parsers' test-documents folder

[jira] [Updated] (TIKA-694) On extraction, get properties AND / OR content extraction

2012-01-04 Thread Etienne Jouvin (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Jouvin updated TIKA-694: Attachment: Tika-1.0.zip Hi. My comment was posted for 0.9 As I saw the version 1.0 and use

[jira] [Updated] (TIKA-694) On extraction, get properties AND / OR content extraction

2012-01-04 Thread Etienne Jouvin (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Jouvin updated TIKA-694: Affects Version/s: (was: 0.9) 1.0 On extraction, get properties

[jira] [Updated] (TIKA-695) Custom properties on xlsx, docx, pptx

2012-01-04 Thread Etienne Jouvin (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Jouvin updated TIKA-695: Affects Version/s: 1.0 I have just posted a solution for that. See the issue TIKA-694: https

[jira] [Updated] (TIKA-837) Make inner classes static for performance reasons

2012-01-01 Thread Fabian Lange (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Lange updated TIKA-837: -- Attachment: static_classes_patch.diff Make inner classes static for performance reasons

[jira] [Updated] (TIKA-838) EmptyParser Singleton should be final

2012-01-01 Thread Fabian Lange (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Lange updated TIKA-838: -- Attachment: EmptyParser.java.patch EmptyParser Singleton should be final

[jira] [Updated] (TIKA-836) parsing really slow on some documents

2011-12-29 Thread Rob Tulloh (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rob Tulloh updated TIKA-836: Description: We are seeing that tika sometimes takes a very long time to parse some content (likely PDF

[jira] [Updated] (TIKA-833) POI Daily beta6 as of 12/27 breaks ExcelParserTest.testExcelParserFormatting()

2011-12-27 Thread Jeremy Anderson (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Anderson updated TIKA-833: - Description: Attn Nick: Changes made to POI(v1221126) for POI-52349 causes

[jira] [Updated] (TIKA-827) ForkServer fails to report issues if an exception is not properly serializable

2011-12-23 Thread Jerome Lacoste (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerome Lacoste updated TIKA-827: Attachment: 0002-TIKA-827-try-to-report-something-if-the-exception-is.patch This is a way to try

[jira] [Updated] (TIKA-827) ForkServer fails to report issues if an exception is not properly serializable

2011-12-23 Thread Jerome Lacoste (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerome Lacoste updated TIKA-827: Attachment: (was: 0002-TIKA-827-try-to-report-something-if-the-exception-is.patch

[jira] [Updated] (TIKA-827) ForkServer fails to report issues if an exception is not properly serializable

2011-12-23 Thread Jerome Lacoste (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerome Lacoste updated TIKA-827: Attachment: 0002-TIKA-827-try-to-report-something-if-the-exception-is.patch ForkServer fails

[jira] [Updated] (TIKA-831) ForkClient doesn't report error due to widening conversion issue

2011-12-23 Thread Jerome Lacoste (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerome Lacoste updated TIKA-831: Attachment: 0007-TIKA-831-fix-for-errors-not-being-reported-properly-.patch The fix

[jira] [Updated] (TIKA-832) ForkParser is unfriendly to code that prints things to its output

2011-12-23 Thread Jerome Lacoste (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerome Lacoste updated TIKA-832: Attachment: TIKA-832_ForkClient_wait_a_bit_when_asked_to_empty_the_initial_buffers.patch

[jira] [Updated] (TIKA-832) ForkParser is unfriendly to code that prints things to its output

2011-12-23 Thread Jukka Zitting (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-832: --- Issue Type: Improvement (was: Bug) bq. java command that causes java to write something to the output

[jira] [Updated] (TIKA-824) Extract rel attr with LinkContentHandler

2011-12-21 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated TIKA-824: --- Fix Version/s: 1.1 Affects Version/s: 1.1 1.0 Extract rel attr

[jira] [Updated] (TIKA-820) Locator is unset for HTML parser

2011-12-20 Thread Daniel Bonniot de Ruisselet (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Bonniot de Ruisselet updated TIKA-820: - Attachment: text-locator.patch Fix+test patch. Locator

[jira] [Updated] (TIKA-823) Detect StarOffice files

2011-12-20 Thread Antoni Mylka (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoni Mylka updated TIKA-823: -- Attachment: testStarOffice-5.2-write.sdw testStarOffice-5.2-impress.sdd

[jira] [Updated] (TIKA-822) MediaType fails to parse charset that has quoted value

2011-12-20 Thread peter royal (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] peter royal updated TIKA-822: - Comment: was deleted (was: the rfc for mime isn't clear on whether single quotes make a valid quoted

[jira] [Updated] (TIKA-682) Creative Suite formats are not supported

2011-12-19 Thread Adei Mandaluniz (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adei Mandaluniz updated TIKA-682: - Attachment: Untitled-1.indd Attaching an InDesign document with dummy metadata

[jira] [Updated] (TIKA-816) (XLS/XLSX) Improperly formatted date/time in text content.

2011-12-19 Thread Albert L. (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Albert L. updated TIKA-816: --- Description: Improperly formatted text content for XLS and XLSX files. The date and time are not formatted

[jira] [Updated] (TIKA-813) Webarchive detection.

2011-12-14 Thread Antoni Mylka (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoni Mylka updated TIKA-813: -- Attachment: (was: tika-webarchive-detection.patch) Webarchive detection

[jira] [Updated] (TIKA-813) Webarchive detection.

2011-12-14 Thread Antoni Mylka (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoni Mylka updated TIKA-813: -- Attachment: testWEBARCHIVE.webarchive tika-813.patch A second version of the patch which

[jira] [Updated] (TIKA-812) Improve the detection of Works Spreadsheet 7.0 files

2011-12-14 Thread Antoni Mylka (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoni Mylka updated TIKA-812: -- Attachment: tika-812-ver2.patch A second version of the patch. Contains a magic pattern

[jira] [Updated] (TIKA-811) Upgrade metadatExtractor version for OpenJDK 7 support

2011-12-13 Thread Emmanuel Hugonnet (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emmanuel Hugonnet updated TIKA-811: --- Attachment: metadata.diff Patch to fix the issue with upgrading to MetadataExtractor 2.5.0-RC3

[jira] [Updated] (TIKA-812) Improve the detection of Works Spreadsheet 7.0 files

2011-12-13 Thread Antoni Mylka (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoni Mylka updated TIKA-812: -- Attachment: tika-812.patch testWORKSSpreadsheet7.0.xlr Attached a test file and a patch
