[jira] [Assigned] (TIKA-731) NPE in WordExtractor.handleParagraph()

2011-09-26 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy reassigned TIKA-731: - Assignee: Maxim Valyanskiy NPE in WordExtractor.handleParagraph()

[jira] [Updated] (TIKA-708) NPE Parsing MS Word 12.0.0

2011-09-13 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-708: -- Comment: was deleted (was: This bug required additional commit to Tika, r1169702. ) NPE

[jira] [Created] (TIKA-693) Incorrent mime-type for .pptm, .ppsm and .ppsx in OOXMLParser

2011-08-22 Thread Maxim Valyanskiy (JIRA)
Incorrent mime-type for .pptm, .ppsm and .ppsx in OOXMLParser - Key: TIKA-693 URL: https://issues.apache.org/jira/browse/TIKA-693 Project: Tika Issue Type: Bug

[jira] [Resolved] (TIKA-693) Incorrect mime-type for .pptm, .ppsm and .ppsx in OOXMLParser

2011-08-22 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-693. --- Resolution: Fixed Committed revision 1160216. Incorrect mime-type for .pptm, .ppsm and

[jira] [Updated] (TIKA-693) Incorrect mime-type for .pptm, .ppsm and .ppsx in OOXMLParser

2011-08-22 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-693: -- Summary: Incorrect mime-type for .pptm, .ppsm and .ppsx in OOXMLParser (was: Incorrent

[jira] [Commented] (TIKA-593) Tika network server

2011-08-02 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13076146#comment-13076146 ] Maxim Valyanskiy commented on TIKA-593: --- I updated tika-server component. I replaced

[jira] [Resolved] (TIKA-434) Bug in TagSoup causes IOException

2011-07-04 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-434. --- Resolution: Fixed Fix Version/s: 1.0 Assignee: Maxim Valyanskiy Bug in

[jira] [Commented] (TIKA-671) Initial support for FB2 (fiction book document) format

2011-06-03 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043398#comment-13043398 ] Maxim Valyanskiy commented on TIKA-671: --- Added initial support - basic text extraction

[jira] [Updated] (TIKA-671) Support for FB2 (fiction book document) format

2011-06-03 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-671: -- Summary: Support for FB2 (fiction book document) format (was: Initial support for FB2 (fiction

[jira] [Commented] (TIKA-521) OutOfMemoryError Parsing XSLX File

2011-05-26 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039609#comment-13039609 ] Maxim Valyanskiy commented on TIKA-521: --- Tika from trunk with POI from trunk parses

[jira] [Commented] (TIKA-521) OutOfMemoryError Parsing XSLX File

2011-05-26 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039620#comment-13039620 ] Maxim Valyanskiy commented on TIKA-521: --- Sorry, I missed screenshot with stack trace.

[jira] [Resolved] (TIKA-662) Prevent creating of ZipInputStreamZipEntrySource when reading files from disk

2011-05-19 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-662. --- Resolution: Fixed Fix Version/s: 1.0 This issue almost duplicates TIKA-645. The rest

[jira] [Created] (TIKA-662) Prevent creating of ZipInputStreamZipEntrySource when reading files from disk

2011-05-18 Thread Maxim Valyanskiy (JIRA)
Prevent creating of ZipInputStreamZipEntrySource when reading files from disk - Key: TIKA-662 URL: https://issues.apache.org/jira/browse/TIKA-662 Project: Tika

[jira] [Updated] (TIKA-662) Prevent creating of ZipInputStreamZipEntrySource when reading files from disk

2011-05-18 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-662: -- Attachment: TIKA-662.patch patch Prevent creating of ZipInputStreamZipEntrySource when reading

[jira] [Commented] (TIKA-649) NPE while parsing a .docx

2011-04-28 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026140#comment-13026140 ] Maxim Valyanskiy commented on TIKA-649: --- It was already fixed in TIKA-633:

[jira] [Created] (TIKA-646) tika command line can't extract metadata for OOXML files

2011-04-22 Thread Maxim Valyanskiy (JIRA)
tika command line can't extract metadata for OOXML files Key: TIKA-646 URL: https://issues.apache.org/jira/browse/TIKA-646 Project: Tika Issue Type: Bug Reporter: Maxim

[jira] [Commented] (TIKA-637) Need API to get list of embedded documents

2011-04-11 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018239#comment-13018239 ] Maxim Valyanskiy commented on TIKA-637: --- tika cli app has option -z that extracts all

[jira] Commented: (TIKA-593) Tika network server

2011-02-09 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992540#comment-12992540 ] Maxim Valyanskiy commented on TIKA-593: --- I uploaded my implementation on GitHub for

[jira] Commented: (TIKA-593) Tika network server

2011-02-08 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991838#comment-12991838 ] Maxim Valyanskiy commented on TIKA-593: --- I made HTTP-server with Jersey (JAX-RS) and

[jira] Commented: (TIKA-589) NPE with POI when parsing word docs

2011-01-28 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988044#action_12988044 ] Maxim Valyanskiy commented on TIKA-589: --- There is invalid style declaration in this

[jira] Commented: (TIKA-577) IndexOutOfBounds Exception looking for Picture in Word 03 doc that has no pictures

2011-01-21 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984637#action_12984637 ] Maxim Valyanskiy commented on TIKA-577: --- I see, now we have another exception:

[jira] Resolved: (TIKA-577) IndexOutOfBounds Exception looking for Picture in Word 03 doc that has no pictures

2011-01-12 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-577. --- Resolution: Fixed Fixed in POI, revision 1058176. IndexOutOfBounds Exception looking for

[jira] Commented: (TIKA-577) IndexOutOfBounds Exception looking for Picture in Word 03 doc that has no pictures

2010-12-22 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974156#action_12974156 ] Maxim Valyanskiy commented on TIKA-577: --- Can you reproduce this bug with latest POI

[jira] Updated: (TIKA-574) Support for IBM866 (CP866) encoding in TXTParser

2010-12-17 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-574: -- Attachment: TIKA-574.patch Thank you. I added unit-test for this issue Support for IBM866

[jira] Resolved: (TIKA-573) MimeType.getExtension()

2010-12-17 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-573. --- Resolution: Fixed Fix Version/s: 0.9 committed last patch at r1050340

[jira] Commented: (TIKA-574) Support for IBM866 (CP866) encoding in TXTParser

2010-12-17 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972445#action_12972445 ] Maxim Valyanskiy commented on TIKA-574: --- Thank you. Committed in r1050348 Support for

[jira] Commented: (TIKA-573) MimeType.getExtension() and mailcap's mime.types

2010-12-16 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971994#action_12971994 ] Maxim Valyanskiy commented on TIKA-573: --- I'm not sure that this patch fits correctly

[jira] Updated: (TIKA-573) MimeType.getExtension() and mailcap's mime.types

2010-12-16 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-573: -- Attachment: 0001-TIKA-573-add-MimeType.getExtension.patch new version of patch

[jira] Commented: (TIKA-573) MimeType.getExtension() and mailcap's mime.types

2010-12-16 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972156#action_12972156 ] Maxim Valyanskiy commented on TIKA-573: --- Thank you, Jukka. I found that file extensions

[jira] Created: (TIKA-573) MimeType.getExtension() and mailcap's mime.types

2010-12-15 Thread Maxim Valyanskiy (JIRA)
MimeType.getExtension() and mailcap's mime.types Key: TIKA-573 URL: https://issues.apache.org/jira/browse/TIKA-573 Project: Tika Issue Type: Improvement Components: mime

[jira] Commented: (TIKA-551) Unit test failures in org.apache.tika.parser.image.ImageParserTest on JDK 1.6.0_05

2010-11-13 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931650#action_12931650 ] Maxim Valyanskiy commented on TIKA-551: --- We use 1.6.0_05 on Linux. I think it is time

[jira] Updated: (TIKA-550) Add stable filenames for extracted embedded files from Office binaries

2010-11-12 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-550: -- Attachment: 1 Add stable filenames for extracted embedded files from Office binaries

[jira] Created: (TIKA-550) Add stable filenames for extracted embedded files from Office binaries

2010-11-12 Thread Maxim Valyanskiy (JIRA)
Add stable filenames for extracted embedded files from Office binaries -- Key: TIKA-550 URL: https://issues.apache.org/jira/browse/TIKA-550 Project: Tika Issue Type:

[jira] Resolved: (TIKA-550) Add stable filenames for extracted embedded files from Office binaries

2010-11-12 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-550. --- Resolution: Fixed Committed revision 1034373. Add stable filenames for extracted embedded

[jira] Created: (TIKA-551) Unit test failures in org.apache.tika.parser.image.ImageParserTest on JDK 1.6.0_05

2010-11-12 Thread Maxim Valyanskiy (JIRA)
Unit test failures in org.apache.tika.parser.image.ImageParserTest on JDK 1.6.0_05 -- Key: TIKA-551 URL: https://issues.apache.org/jira/browse/TIKA-551 Project: Tika

[jira] Resolved: (TIKA-511) NPE when POI is configured to prefer event extractors

2010-11-09 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-511. --- Resolution: Fixed Committed revision 1032925. NPE when POI is configured to prefer event

[jira] Commented: (TIKA-540) extract text from .docx footnotes

2010-11-05 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928580#action_12928580 ] Maxim Valyanskiy commented on TIKA-540: --- applied in r1031545 extract text from .docx

[jira] Resolved: (TIKA-540) extract text from .docx footnotes

2010-11-05 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-540. --- Resolution: Fixed Fix Version/s: 0.8 extract text from .docx footnotes

[jira] Created: (TIKA-540) extract text from .docx footnotes

2010-10-27 Thread Maxim Valyanskiy (JIRA)
extract text from .docx footnotes - Key: TIKA-540 URL: https://issues.apache.org/jira/browse/TIKA-540 Project: Tika Issue Type: Improvement Components: parser Affects Versions: 0.8

[jira] Updated: (TIKA-540) extract text from .docx footnotes

2010-10-27 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-540: -- Attachment: footnotes.docx 1 patch and test data

[jira] Commented: (TIKA-521) OutOfMemoryError Parsing XSLX File

2010-10-07 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918932#action_12918932 ] Maxim Valyanskiy commented on TIKA-521: --- If a plain text is enough for you, you can

[jira] Created: (TIKA-510) Use POI API for text extraction from XSLF shape

2010-09-09 Thread Maxim Valyanskiy (JIRA)
Use POI API for text extraction from XSLF shape --- Key: TIKA-510 URL: https://issues.apache.org/jira/browse/TIKA-510 Project: Tika Issue Type: Improvement Components: parser

[jira] Updated: (TIKA-511) NPE when POI is configured to prefer event extractors

2010-09-09 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-511: -- Attachment: event.patch patch NPE when POI is configured to prefer event extractors

[jira] Updated: (TIKA-437) OfficeParser: support for write-protected xlsx files

2010-06-07 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-437: -- Attachment: patch protect.xlsx OfficeParser: support for write-protected xlsx