[jira] [Updated] (TIKA-740) SAX parser used for HTML

2011-10-05 Thread Jukka Zitting (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-740: --- Attachment: a221657.html I attached a copy of the page served at the referenced URL http://www.almasry-

[jira] [Updated] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict

2011-10-05 Thread Jukka Zitting (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-741: --- Affects Version/s: (was: 1.0) 0.10 Issue Type: Bug (was: New Feat

[jira] [Updated] (TIKA-605) Tika GDAL parser

2011-10-05 Thread Jukka Zitting (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-605: --- Attachment: 0001-TIKA-605-Tika-GDAL-parser.patch I guess ideally we should ask the GDAL toolkit to supp

[jira] [Updated] (TIKA-423) Parse docx and output to text file missing words

2011-10-07 Thread Jukka Zitting (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-423: --- Affects Version/s: 0.8 0.9 0.10 This is still a problem w

[jira] [Updated] (TIKA-410) textbox content extraction for word documents

2011-10-07 Thread Jukka Zitting (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-410: --- Affects Version/s: 0.10 This is still an issue with Tika 0.10 and the latest trunk. >

[jira] [Updated] (TIKA-764) OpenDocumentMetaParser should use common metadata keys for document statistics

2011-11-02 Thread Jukka Zitting (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-764: --- Resolving for 1.0 as suggested by Nick on dev@: {quote} We should maybe split it and resolve the first par

[jira] [Updated] (TIKA-773) .NET version of Tika

2011-11-06 Thread Jukka Zitting (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-773: --- Description: As a followup to TIKA-212 and inspired by efforts like [1], I'd like to set up a .NET ver

[jira] [Updated] (TIKA-832) ForkParser is unfriendly to code that prints things to its output

2011-12-23 Thread Jukka Zitting (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-832: --- Issue Type: Improvement (was: Bug) bq. java command that causes java to write something to the output

[jira] [Updated] (TIKA-866) Invalid configuration file causes OutOfMemoryException

2012-02-17 Thread Jukka Zitting (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-866: --- Summary: Invalid configuration file causes OutOfMemoryException (was: Incomplete configuration file ca

[jira] [Updated] (TIKA-864) Metadata.formatDate causes blocking in concurrent use

2012-02-17 Thread Jukka Zitting (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-864: --- Summary: Metadata.formatDate causes blocking in concurrent use (was: Metadata.formatDate should use Th