[jira] [Closed] (TIKA-812) Improve the detection of Works Spreadsheet 7.0 files

2011-12-19 Thread Antoni Mylka (Closed) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoni Mylka closed TIKA-812. - Resolution: Fixed Fix Version/s: 1.1 Committed tika-812-ver2.patch in r1220687. > I

[jira] [Closed] (TIKA-813) Webarchive detection.

2011-12-19 Thread Antoni Mylka (Closed) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoni Mylka closed TIKA-813. - Resolution: Fixed Fix Version/s: 1.1 Committed the magics and the unit tests in r1220696. Thanks for

[jira] [Closed] (TIKA-814) Increase the amount of bytes read by TextDetector

2011-12-19 Thread Antoni Mylka (Closed) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoni Mylka closed TIKA-814. - Resolution: Fixed Fix Version/s: 1.1 Committed in r1220698. This is a change, which theoretically i

[jira] [Commented] (TIKA-291) Adobe InDesign support

2011-12-19 Thread Adei Mandaluniz (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172220#comment-13172220 ] Adei Mandaluniz commented on TIKA-291: -- Not just InDesign but other Adobe products as w

[jira] [Updated] (TIKA-682) Creative Suite formats are not supported

2011-12-19 Thread Adei Mandaluniz (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adei Mandaluniz updated TIKA-682: - Attachment: Untitled-1.indd Attaching an InDesign document with dummy metadata. >

[jira] [Commented] (TIKA-810) Upgrade to PDFbox 1.7.0 as available

2011-12-19 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172335#comment-13172335 ] Jukka Zitting commented on TIKA-810: In revision 1220781 I updated the parser code in PD

[jira] [Created] (TIKA-816) (XLS/XLSX) Missing date/time in text content.

2011-12-19 Thread Albert L. (Created) (JIRA)
(XLS/XLSX) Missing date/time in text content. - Key: TIKA-816 URL: https://issues.apache.org/jira/browse/TIKA-816 Project: Tika Issue Type: Bug Components: general Affects Versions: 1

[jira] [Updated] (TIKA-816) (XLS/XLSX) Improperly formatted date/time in text content.

2011-12-19 Thread Albert L. (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Albert L. updated TIKA-816: --- Description: Improperly formatted text content for XLS and XLSX files. The date and time are not formatted as d

[jira] [Created] (TIKA-817) (PPT/PPTX) Missing date/time in text content.

2011-12-19 Thread Albert L. (Created) (JIRA)
(PPT/PPTX) Missing date/time in text content. - Key: TIKA-817 URL: https://issues.apache.org/jira/browse/TIKA-817 Project: Tika Issue Type: Bug Components: general Affects Versions: 1

[jira] [Commented] (TIKA-817) (PPT/PPTX) Missing date/time in text content.

2011-12-19 Thread Albert L. (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172371#comment-13172371 ] Albert L. commented on TIKA-817: I wonder if "update automatically" Date/Time objects don't

[jira] [Commented] (TIKA-816) (XLS/XLSX) Improperly formatted date/time in text content.

2011-12-19 Thread Albert L. (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172469#comment-13172469 ] Albert L. commented on TIKA-816: XLS files seem to work when calling text extraction via HSS

[jira] [Commented] (TIKA-816) (XLS/XLSX) Improperly formatted date/time in text content.

2011-12-19 Thread Albert L. (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172470#comment-13172470 ] Albert L. commented on TIKA-816: Bug 52369 - XLSX: text extraction malformed "=NOW()" and "=

[jira] [Created] (TIKA-818) Allow PDFBox to be used with RandomAccessFile vs RandomAccessBuffer to allow for a memory vs performance tradeoff

2011-12-19 Thread Paul Pearcy (Created) (JIRA)
Allow PDFBox to be used with RandomAccessFile vs RandomAccessBuffer to allow for a memory vs performance tradeoff - Key: TIKA-818 URL: https://issues.a

[jira] [Commented] (TIKA-817) (PPT/PPTX) Missing date/time in text content.

2011-12-19 Thread Albert L. (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172472#comment-13172472 ] Albert L. commented on TIKA-817: Reported the bug in POI v3.8 beta 5. Bug 52367 - PPT: text

[jira] [Created] (TIKA-819) Make Option to Exclude Embedded Files' Text for Text Content

2011-12-19 Thread Albert L. (Created) (JIRA)
Make Option to Exclude Embedded Files' Text for Text Content Key: TIKA-819 URL: https://issues.apache.org/jira/browse/TIKA-819 Project: Tika Issue Type: New Feature Compo

[jira] [Commented] (TIKA-291) Adobe InDesign support

2011-12-19 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172773#comment-13172773 ] Nick Burch commented on TIKA-291: - I believe Photoshop PSDs are supported, as are Illustrato

[jira] [Commented] (TIKA-682) Creative Suite formats are not supported

2011-12-19 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172775#comment-13172775 ] Nick Burch commented on TIKA-682: - Adei - any chance you could create a much smaller sample

[jira] [Commented] (TIKA-819) Make Option to Exclude Embedded Files' Text for Text Content

2011-12-19 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172825#comment-13172825 ] Nick Burch commented on TIKA-819: - You have to explicitly ask for embedded files to be parse

[jira] [Commented] (TIKA-700) Upgrade to POI 3.8 as available

2011-12-19 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172950#comment-13172950 ] Nick Burch commented on TIKA-700: - Upgraded to POI 3.8 beta 5 in r1221109.

[jira] [Commented] (TIKA-805) improvements in XSLFPowerPointExtractorDecorator

2011-12-19 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172951#comment-13172951 ] Nick Burch commented on TIKA-805: - The patch doesn't seem to apply cleanly against trunk, is

[jira] [Commented] (TIKA-757) Address TODOs when we upgrade to next POI release (3.8 beta 5)

2011-12-19 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172974#comment-13172974 ] Nick Burch commented on TIKA-757: - I believe that as of r1221115 most of these are now tackl

[jira] [Commented] (TIKA-705) Valid OOXML PPT file hits InvalidFormatException thrown in POI

2011-12-19 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172973#comment-13172973 ] Nick Burch commented on TIKA-705: - Code simplified in r1221115 now that we've upgraded POI

[jira] [Resolved] (TIKA-423) Parse docx and output to text file missing words

2011-12-19 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-423. - Resolution: Fixed Fix Version/s: 1.1 Having upgraded POI, as of r1221115 the smart tags text is now

[jira] [Commented] (TIKA-423) Parse docx and output to text file missing words

2011-12-19 Thread David Tran (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172976#comment-13172976 ] David Tran commented on TIKA-423: - I am out of the office until January 16th, please contact

[jira] [Commented] (TIKA-816) (XLS/XLSX) Improperly formatted date/time in text content.

2011-12-19 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172985#comment-13172985 ] Nick Burch commented on TIKA-816: - Now that POI bug #52369 is fixed, we should get the XLSX

[jira] [Commented] (TIKA-818) Allow PDFBox to be used with RandomAccessFile vs RandomAccessBuffer to allow for a memory vs performance tradeoff

2011-12-19 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173000#comment-13173000 ] Nick Burch commented on TIKA-818: - I've just gone to make the change, and discovered that ev