Re: PUT vs. POST in tika-server

2012-04-05 Thread Maxim Valyanskiy
Hello! 05.04.2012 12:29, Jukka Zitting wrote: I notice the tika-server component (nice work documenting and setting it up, btw!) uses the PUT verb for receiving documents to be parsed. IMO a more appropriate verb to use is POST, that's meant (among other things) for: "Providing a block o
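
The thread above is about which HTTP verb tika-server should accept for uploaded documents. As a rough illustration of the client side, here is a minimal Java sketch, assuming a plain `HttpURLConnection` client. The class name `TikaServerClient` and whatever endpoint URL the caller passes in are hypothetical, not tika-server's documented API. The verb is a parameter, so the same code sends the document body with PUT (what the server used at the time) or POST (what the thread proposes).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class TikaServerClient {
    /**
     * Sends a raw document body to a parse endpoint and returns the response as text.
     * The endpoint URL is supplied by the caller (hypothetical); method is "PUT" or "POST".
     */
    public static String send(String endpoint, String method, byte[] document, String contentType)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
        try {
            conn.setRequestMethod(method);
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", contentType);
            // Stream the body with a known length instead of buffering it in the connection.
            conn.setFixedLengthStreamingMode(document.length);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(document);
            }
            int status = conn.getResponseCode();
            if (status / 100 != 2) {
                throw new IOException("HTTP " + status + " from " + endpoint);
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

For example, `send("http://localhost:9998/tika", "POST", bytes, "application/pdf")`, where the host, port and path are assumptions about a local deployment. Switching verbs is a one-word change on the client; the real question in the thread is which one the server should advertise.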

[jira] [Commented] (TIKA-593) Tika network server

2012-04-05 Thread Maxim Valyanskiy (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247089#comment-13247089 ] Maxim Valyanskiy commented on TIKA-593: --- I updated the documentation in

[jira] [Resolved] (TIKA-593) Tika network server

2012-04-04 Thread Maxim Valyanskiy (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-593. --- Resolution: Fixed Added the shade plugin, so the jar can now run > Tika network ser

[jira] [Commented] (TIKA-593) Tika network server

2012-03-29 Thread Maxim Valyanskiy (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241273#comment-13241273 ] Maxim Valyanskiy commented on TIKA-593: --- I we have another problem with Tika se

[jira] [Issue Comment Edited] (TIKA-593) Tika network server

2012-03-29 Thread Maxim Valyanskiy (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241266#comment-13241266 ] Maxim Valyanskiy edited comment on TIKA-593 at 3/29/12 2:3

[jira] [Commented] (TIKA-593) Tika network server

2012-03-29 Thread Maxim Valyanskiy (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241266#comment-13241266 ] Maxim Valyanskiy commented on TIKA-593: --- > FYI I do not understand how

[jira] [Commented] (TIKA-593) Tika network server

2012-03-29 Thread Maxim Valyanskiy (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241244#comment-13241244 ] Maxim Valyanskiy commented on TIKA-593: --- {quote} The cool part is that we redu

[jira] [Commented] (TIKA-593) Tika network server

2012-03-29 Thread Maxim Valyanskiy (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241230#comment-13241230 ] Maxim Valyanskiy commented on TIKA-593: --- I do not completely understand

[jira] [Commented] (TIKA-593) Tika network server

2012-03-27 Thread Maxim Valyanskiy (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239686#comment-13239686 ] Maxim Valyanskiy commented on TIKA-593: --- Chris, there are two providers in my

Re: Build failed in Jenkins: Tika-trunk #820

2012-03-26 Thread Maxim Valyanskiy
Hello! Can we use Java 6, maybe just for the tika-server component? The current version of Jersey requires it. best wishes, Max 26.03.2012, at 17:30, Apache Jenkins Server wrote: > See > > Changes: > > [maxcom] tika-server: update java versi

[jira] [Commented] (TIKA-593) Tika network server

2012-03-26 Thread Maxim Valyanskiy (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238268#comment-13238268 ] Maxim Valyanskiy commented on TIKA-593: --- Chris, that is not enough just to ch

[jira] [Resolved] (TIKA-883) Extract embedded images in PPT

2012-03-23 Thread Maxim Valyanskiy (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-883. --- Resolution: Fixed > Extract embedded images in

[jira] [Created] (TIKA-883) Extract embedded images in PPT

2012-03-23 Thread Maxim Valyanskiy (Created) (JIRA)
Extract embedded images in PPT -- Key: TIKA-883 URL: https://issues.apache.org/jira/browse/TIKA-883 Project: Tika Issue Type: Improvement Reporter: Maxim Valyanskiy Assignee: Maxim

[jira] [Commented] (TIKA-593) Tika network server

2012-03-23 Thread Maxim Valyanskiy (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236439#comment-13236439 ] Maxim Valyanskiy commented on TIKA-593: --- {noformat} testExe

[jira] [Issue Comment Edited] (TIKA-593) Tika network server

2012-03-23 Thread Maxim Valyanskiy (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236439#comment-13236439 ] Maxim Valyanskiy edited comment on TIKA-593 at 3/23/12 7:5

[jira] [Resolved] (TIKA-882) IllegalArgumentException: No part found for relationship

2012-03-22 Thread Maxim Valyanskiy (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-882. --- Resolution: Fixed Assignee: Maxim Valyanskiy > IllegalArgumentException: No p

[jira] [Created] (TIKA-882) IllegalArgumentException: No part found for relationship

2012-03-22 Thread Maxim Valyanskiy (Created) (JIRA)
Affects Versions: 1.1 Reporter: Maxim Valyanskiy Priority: Minor Fix For: 1.2 Exception on parsing XLSX file: {noformat} Exception in thread "main" org.apache.tika.exception.TikaException: Error creating OOXML extractor

[jira] [Commented] (TIKA-593) Tika network server

2012-03-22 Thread Maxim Valyanskiy (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235560#comment-13235560 ] Maxim Valyanskiy commented on TIKA-593: --- I found that Jersey dependencies ar

Re: Pluggable language detection

2012-03-22 Thread Maxim Valyanskiy
Hello! 21.03.2012 19:51, Julien Nioche wrote: Just wondering about the best way to make the language detection pluggable instead of having it hard-wired as it is now. We know that the resources that are currently in Tika are both slow and inaccurate [1] and there are other libraries that we could
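
One common way to make a component pluggable in Java is `java.util.ServiceLoader`: implementations are discovered on the classpath instead of being hard-wired. The sketch below is a hypothetical illustration of that pattern only, not the API Tika adopted. `LanguageDetection.Detector` is an invented interface, and the caller-supplied fallback stands in for whatever built-in detector would be used when no plug-in is registered.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

public class LanguageDetection {
    /**
     * Hypothetical plug-in point. A provider would be registered in a file named
     * META-INF/services/LanguageDetection$Detector (the interface's binary name).
     */
    public interface Detector {
        /** Returns a language code for the given text. */
        String detect(String text);
    }

    /** Returns the first Detector registered via ServiceLoader, or the fallback if none is found. */
    public static Detector load(Detector fallback) {
        Iterator<Detector> providers = ServiceLoader.load(Detector.class).iterator();
        return providers.hasNext() ? providers.next() : fallback;
    }
}
```

Swapping in a faster or more accurate library then means shipping a jar with a provider registration, with no change to the calling code.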

[jira] [Resolved] (TIKA-873) Tika --extract fails for DOC

2012-03-21 Thread Maxim Valyanskiy (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-873. --- Resolution: Fixed > Tika --extract fails for

[jira] [Commented] (TIKA-873) Tika --extract fails for DOC

2012-03-21 Thread Maxim Valyanskiy (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234310#comment-13234310 ] Maxim Valyanskiy commented on TIKA-873: --- hm, 1.0 extracts something that is not v

[jira] [Commented] (TIKA-873) Tika --extract fails for DOC

2012-03-21 Thread Maxim Valyanskiy (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234294#comment-13234294 ] Maxim Valyanskiy commented on TIKA-873: --- The current trunk version extracts follo

[jira] [Resolved] (TIKA-877) Embedded document not extracted (regression)

2012-03-21 Thread Maxim Valyanskiy (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-877. --- Resolution: Fixed Fix Version/s: (was: 1.1) 1.2 > Embed

[jira] [Commented] (TIKA-877) Embedded document not extracted (regression)

2012-03-21 Thread Maxim Valyanskiy (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234291#comment-13234291 ] Maxim Valyanskiy commented on TIKA-877: --- I think it is not a real problem, bec

[jira] [Commented] (TIKA-877) Embedded document not extracted (regression)

2012-03-21 Thread Maxim Valyanskiy (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234282#comment-13234282 ] Maxim Valyanskiy commented on TIKA-877: --- Hm, no empty files, but file5 siz

[jira] [Commented] (TIKA-877) Embedded document not extracted (regression)

2012-03-21 Thread Maxim Valyanskiy (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234280#comment-13234280 ] Maxim Valyanskiy commented on TIKA-877: --- {noformat} [maxcom@pc-elrond t]$ java

[jira] [Commented] (TIKA-877) Embedded document not extracted (regression)

2012-03-21 Thread Maxim Valyanskiy (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234276#comment-13234276 ] Maxim Valyanskiy commented on TIKA-877: --- It became the same problem after commit

[jira] [Commented] (TIKA-877) Embedded document not extracted (regression)

2012-03-21 Thread Maxim Valyanskiy (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234228#comment-13234228 ] Maxim Valyanskiy commented on TIKA-877: --- I'm not sure about 'file5

[jira] [Commented] (TIKA-877) Embedded document not extracted (regression)

2012-03-21 Thread Maxim Valyanskiy (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234198#comment-13234198 ] Maxim Valyanskiy commented on TIKA-877: --- Hm, I found this problem in my tika-se

[jira] [Assigned] (TIKA-877) Embedded document not extracted (regression)

2012-03-21 Thread Maxim Valyanskiy (Assigned) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy reassigned TIKA-877: - Assignee: Maxim Valyanskiy > Embedded document not extracted (regress

[jira] [Resolved] (TIKA-854) No text extraction for Word macroenabled template

2012-01-31 Thread Maxim Valyanskiy (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-854. --- Resolution: Fixed Fix Version/s: 1.1 > No text extraction for Word macroenab

[jira] [Updated] (TIKA-854) No text extraction for Word macroenabled template

2012-01-31 Thread Maxim Valyanskiy (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-854: -- Summary: No text extraction for Word macroenabled template (was: No text extraction Word

[jira] [Updated] (TIKA-854) No text extraction Word macroenabled template

2012-01-31 Thread Maxim Valyanskiy (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-854: -- Attachment: cat50.dotm test data > No text extraction Word macroenab

[jira] [Created] (TIKA-854) No text extraction Word macroenabled template

2012-01-31 Thread Maxim Valyanskiy (Created) (JIRA)
: Maxim Valyanskiy Assignee: Maxim Valyanskiy POI can extract text from this file, but Tika cannot. The mimetype detected by Tika, "application/vnd.ms-word.template.macroenabledtemplate", is not correct either (I think that the right type is "application/vnd.ms-word.template.

[jira] [Resolved] (TIKA-787) CharsetDetector text buffer is too small to small to correctly detect UTF-8 in HTML page

2011-11-23 Thread Maxim Valyanskiy (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-787. --- Resolution: Fixed > CharsetDetector text buffer is too small to small to correctly det

[jira] [Created] (TIKA-787) CharsetDetector text buffer is too small to small to correctly detect UTF-8 in HTML page

2011-11-23 Thread Maxim Valyanskiy (Created) (JIRA)
Project: Tika Issue Type: Bug Reporter: Maxim Valyanskiy Assignee: Maxim Valyanskiy Priority: Minor Fix For: 1.1 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators
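
Why a small buffer defeats UTF-8 detection can be shown without Tika's `CharsetDetector` itself. The helper below is a hypothetical illustration, not Tika code: it inspects only the first `limit` bytes. If those bytes are pure ASCII, as an HTML head often is, there is no evidence for UTF-8 at all, even though non-ASCII text further down would settle the question.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class PrefixUtf8Check {
    /**
     * Returns true only if the first {@code limit} bytes contain non-ASCII data
     * and that data is valid UTF-8. A pure-ASCII prefix returns false (inconclusive).
     */
    public static boolean looksLikeUtf8(byte[] data, int limit) {
        int n = Math.min(limit, data.length);
        boolean sawNonAscii = false;
        for (int i = 0; i < n; i++) {
            if ((data[i] & 0x80) != 0) {
                sawNonAscii = true;
                break;
            }
        }
        if (!sawNonAscii) {
            return false; // the buffer never reached the multi-byte text
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data, 0, n);
        CharBuffer out = CharBuffer.allocate(n); // UTF-8 never yields more chars than bytes
        // endOfInput=false: a multi-byte sequence cut off at the buffer limit is not an error.
        CoderResult result = decoder.decode(in, out, false);
        return !result.isError();
    }
}
```

With 2000 bytes of ASCII markup followed by Cyrillic text, `looksLikeUtf8(data, 1024)` returns false while `looksLikeUtf8(data, data.length)` returns true, which is the kind of failure the issue title describes.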

Re: [VOTE] Apache Tika 0.10 release rc #1

2011-09-26 Thread Maxim Valyanskiy
+1 26.09.2011 10:50, Mattmann, Chris A (388J) wrote: Hi Folks, A first release candidate for the Tika 0.10 release is available at: http://people.apache.org/~mattmann/apache-tika-0.10/rc1/ The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/tik

[jira] [Resolved] (TIKA-731) NPE in WordExtractor.handleParagraph()

2011-09-26 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-731. --- Resolution: Fixed Fix Version/s: 1.0 I think that we can't use attached documen

[jira] [Assigned] (TIKA-731) NPE in WordExtractor.handleParagraph()

2011-09-26 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy reassigned TIKA-731: - Assignee: Maxim Valyanskiy > NPE in WordExtractor.handleParagr

[jira] [Resolved] (TIKA-726) Provide a way to distinguish generic parse error and parse error due to unknown/wrong decryption key

2011-09-21 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-726. --- Resolution: Fixed > Provide a way to distinguish generic parse error and parse error due

[jira] [Created] (TIKA-726) Provide a way to distinguish generic parse error and parse error due to unknown/wrong decryption key

2011-09-21 Thread Maxim Valyanskiy (JIRA)
/TIKA-726 Project: Tika Issue Type: Improvement Reporter: Maxim Valyanskiy Assignee: Maxim Valyanskiy Fix For: 0.10 Currently there is no way to distinguish between a generic parse failure (i.e. an error in the parser) and the situation when extraction can be
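
A common way to let callers tell these two failure modes apart is an exception subclass that callers catch before the general one. The class names below (`DocumentParseException`, `WrongKeyException`, `ParseErrorDemo`) and the toy `parse` method are hypothetical and only illustrate the pattern; they are not the exception types Tika added for this issue.

```java
/** Hypothetical base exception for any parse failure. */
class DocumentParseException extends Exception {
    DocumentParseException(String message) {
        super(message);
    }
}

/** Hypothetical subclass: the document is encrypted and the key is missing or wrong. */
class WrongKeyException extends DocumentParseException {
    WrongKeyException(String message) {
        super(message);
    }
}

public class ParseErrorDemo {
    /** Toy parser: "encrypted:" input needs the key "secret"; "broken:" input always fails. */
    static String parse(String input, String key) throws DocumentParseException {
        if (input.startsWith("encrypted:")) {
            if (!"secret".equals(key)) {
                throw new WrongKeyException("cannot decrypt: missing or wrong key");
            }
            return input.substring("encrypted:".length());
        }
        if (input.startsWith("broken:")) {
            throw new DocumentParseException("corrupt document structure");
        }
        return input;
    }

    /** Catches the specific subclass first, so a caller could e.g. ask the user for another key. */
    static String classify(String input, String key) {
        try {
            return "ok:" + parse(input, key);
        } catch (WrongKeyException e) {
            return "needs-key";
        } catch (DocumentParseException e) {
            return "parse-error";
        }
    }
}
```

Because the subclass still extends the general exception, existing code that only catches the base type keeps working unchanged.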

[jira] [Updated] (TIKA-708) NPE Parsing MS Word 12.0.0

2011-09-13 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-708: -- Comment: was deleted (was: This bug required additional commit to Tika, r1169702. ) >

[jira] [Commented] (TIKA-708) NPE Parsing MS Word 12.0.0

2011-09-12 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102716#comment-13102716 ] Maxim Valyanskiy commented on TIKA-708: --- This bug required an additional commit to

Re: svn commit: r1165230 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/parser/microsoft/ooxml/ test/java/org/apache/tika/parser/microsoft/ test/resources/test-documents/

2011-09-05 Thread Maxim Valyanskiy
Hello! 05.09.2011, at 16:23, Jukka Zitting wrote: > That was me in revision 1164578 for TIKA-704. :-( > >> -if (root.hasEntry("CONTENTS")) { >> -stream = TikaInputStream.get( >> -fs.createDocumentInputStream("CONTENTS")); > > This was my a

[jira] [Updated] (TIKA-693) Incorrect mime-type for .pptm, .ppsm and .ppsx in OOXMLParser

2011-08-22 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-693: -- Summary: Incorrect mime-type for .pptm, .ppsm and .ppsx in OOXMLParser (was: Incorrent mime

[jira] [Resolved] (TIKA-693) Incorrect mime-type for .pptm, .ppsm and .ppsx in OOXMLParser

2011-08-22 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-693. --- Resolution: Fixed Committed revision 1160216. > Incorrect mime-type for .pptm, .ppsm

[jira] [Created] (TIKA-693) Incorrent mime-type for .pptm, .ppsm and .ppsx in OOXMLParser

2011-08-22 Thread Maxim Valyanskiy (JIRA)
: parser Reporter: Maxim Valyanskiy Assignee: Maxim Valyanskiy Priority: Minor Fix For: 1.0 The current parser sets the mime type "application/vnd.openxmlformats-officedocument.presentationml.presentation" for all PowerPoint XML formats, but pptm

Re: svn commit: r1153097 - in /tika/trunk/tika-server: ./ src/main/java/org/apache/tika/server/ src/main/resources/ src/test/java/org/apache/tika/server/

2011-08-02 Thread Maxim Valyanskiy
Hello! 02.08.2011 16:00, Mattmann, Chris A (388J) wrote: Hey Max, Let's replace it with CXF like I suggested -- it works fine. I'll take a pass at it (we used Apache CXF to replace Jersey in Apache OODT). What do you think? Let's try. I have no experience with CXF. Maybe you can port tika-se

[jira] [Commented] (TIKA-593) Tika network server

2011-08-02 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13076146#comment-13076146 ] Maxim Valyanskiy commented on TIKA-593: --- I updated tika-server component. I repl

[jira] [Resolved] (TIKA-434) Bug in TagSoup causes IOException

2011-07-04 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-434. --- Resolution: Fixed Fix Version/s: 1.0 Assignee: Maxim Valyanskiy > Bug

[jira] [Commented] (TIKA-677) Installing Tika 0.9 using Maven fails tests

2011-06-20 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051840#comment-13051840 ] Maxim Valyanskiy commented on TIKA-677: --- I think it is a problem in ImageIO of

Re: [Aperture-devel] TikaMimeTypeIdentifier in Aperture

2011-06-14 Thread Maxim Valyanskiy
Hello! 14.06.2011 17:53, Antoni Mylka wrote: Doesn't the "we'll need to buffer the whole file for zip anyway" boil down to the question of using the commons-compress ZipFile vs. ZipArchiveInputStream? I know that in a general case the zip file format isn't well suited for streaming processing,

[jira] [Updated] (TIKA-671) Support for FB2 (fiction book document) format

2011-06-03 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-671: -- Summary: Support for FB2 (fiction book document) format (was: Initial support for FB2 (fiction

[jira] [Commented] (TIKA-671) Initial support for FB2 (fiction book document) format

2011-06-03 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043398#comment-13043398 ] Maxim Valyanskiy commented on TIKA-671: --- Added initial support - basic

[jira] [Updated] (TIKA-671) Initial support for FB2 (fiction book document) format

2011-06-03 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-671: -- Description: Add support for FB2 format (https://secure.wikimedia.org/wikipedia/en/wiki

[jira] [Created] (TIKA-671) Initial support for FB2 (fiction book document) format

2011-06-03 Thread Maxim Valyanskiy (JIRA)
: Maxim Valyanskiy Assignee: Maxim Valyanskiy Add support for FB2 format -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira

Re: Image Extraction

2011-06-01 Thread Maxim Valyanskiy
Hello! 31.05.2011 20:01, sgraessle wrote: 2. I need to be able to extract the images within the parsed documents and save them as well. Would the best place to do this be to create my own ImageParser and add a few lines in the Parse method? The Tika command line application with the '-z' switch can ex

[jira] [Commented] (TIKA-521) OutOfMemoryError Parsing XSLX File

2011-05-26 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039620#comment-13039620 ] Maxim Valyanskiy commented on TIKA-521: --- Sorry, I missed the screenshot with stack t

[jira] [Commented] (TIKA-521) OutOfMemoryError Parsing XSLX File

2011-05-26 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039609#comment-13039609 ] Maxim Valyanskiy commented on TIKA-521: --- Tika from trunk with POI from trunk pa

[jira] [Resolved] (TIKA-662) Prevent creating of ZipInputStreamZipEntrySource when reading files from disk

2011-05-19 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-662. --- Resolution: Fixed Fix Version/s: 1.0 This issue almost duplicates TIKA-645. The rest

[jira] [Updated] (TIKA-662) Prevent creating of ZipInputStreamZipEntrySource when reading files from disk

2011-05-18 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-662: -- Attachment: TIKA-662.patch patch > Prevent creating of ZipInputStreamZipEntrySource w

[jira] [Created] (TIKA-662) Prevent creating of ZipInputStreamZipEntrySource when reading files from disk

2011-05-18 Thread Maxim Valyanskiy (JIRA)
Issue Type: Improvement Reporter: Maxim Valyanskiy POI provides two ways to open an OPCPackage: via an InputStream and via a File. Creating an OPCPackage from an InputStream causes creation of a ZipInputStreamZipEntrySource, which buffers all uncompressed data in memory. This takes a lot of memory and
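
The same trade-off exists in the JDK's own ZIP classes, which makes it easy to demonstrate without POI. In this sketch (class and method names are hypothetical), `ZipFile` uses the file's central directory to decompress only the requested entry, whereas `ZipInputStream` delivers entries strictly in sequence, so any later random access means decompressing every entry and holding it in memory, roughly what the issue describes for `ZipInputStreamZipEntrySource`.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

public class ZipAccessDemo {
    /** File-based random access: only the requested entry is decompressed. */
    public static byte[] readOneEntry(File file, String name) throws IOException {
        try (ZipFile zip = new ZipFile(file)) {
            ZipEntry entry = zip.getEntry(name);
            if (entry == null) {
                throw new IOException("no entry " + name + " in " + file);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return in.readAllBytes();
            }
        }
    }

    /** Stream-based: to allow random access later, every entry is decompressed into memory. */
    public static Map<String, byte[]> bufferAllEntries(InputStream in) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                // readAllBytes stops at the end of the current entry.
                entries.put(entry.getName(), zin.readAllBytes());
            }
        }
        return entries;
    }
}
```

For a large .docx or .xlsx the second method's memory use grows with the total uncompressed size of the package, which is why opening from a File is preferable when the document is already on disk.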

commons-httpclient dependency

2011-05-13 Thread Maxim Valyanskiy
Hello! Commons-httpclient is a declared dependency in tika-parsers/pom.xml. I found that it is not required for building Tika or running the unit tests. Does anyone know why it is there? I'm going to delete it if there are no objections. best wishes, Max

[jira] [Commented] (TIKA-649) NPE while parsing a .docx

2011-04-27 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026140#comment-13026140 ] Maxim Valyanskiy commented on TIKA-649: --- It was already fixed in TIKA-633: h

[jira] [Created] (TIKA-646) tika command line can't extract metadata for OOXML files

2011-04-22 Thread Maxim Valyanskiy (JIRA)
tika command line can't extract metadata for OOXML files Key: TIKA-646 URL: https://issues.apache.org/jira/browse/TIKA-646 Project: Tika Issue Type: Bug Reporter:

[jira] [Commented] (TIKA-637) Need API to get list of embedded documents

2011-04-11 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018239#comment-13018239 ] Maxim Valyanskiy commented on TIKA-637: --- The tika CLI app has the option "-z"

[jira] [Commented] (TIKA-636) Taking very high heap space while parsing docx - Resulting in OOM in tha app

2011-04-07 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017311#comment-13017311 ] Maxim Valyanskiy commented on TIKA-636: --- It is a known problem in POI, afaik ther

Re: Build failed in Jenkins: Tika-trunk #507

2011-04-06 Thread Maxim Valyanskiy
Hello! Can anyone restart the Hudson build? best wishes, Max 05.04.2011 16:00, Apache Hudson Server wrote: See -- Started by an SCM change Building remotely on ubuntu2 hudson.util.IOException2: remote fi

[jira] [Resolved] (TIKA-633) NPE in XWPFWordExtractorDecorator.extractHeaders

2011-04-05 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-633. --- Resolution: Fixed Fix Version/s: 1.0 Assignee: Maxim Valyanskiy Fixed. Thanx

[jira] Resolved: (TIKA-593) Tika network server

2011-02-24 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-593. --- Resolution: Fixed Fix Version/s: 1.0 Assignee: Maxim Valyanskiy (was: Jukka

Re: [VOTE] Apache Tika 0.9 Release Candidate #1

2011-02-15 Thread Maxim Valyanskiy
+1 14.02.2011 08:09, Mattmann, Chris A (388J) wrote: Hi Folks, I have posted a candidate for the Apache Tika 0.9 release. The source code is at: http://people.apache.org/~mattmann/apache-tika-0.9/rc1/ See the included CHANGES.txt file for details on release contents and latest changes. The re

[jira] Commented: (TIKA-593) Tika network server

2011-02-09 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992540#comment-12992540 ] Maxim Valyanskiy commented on TIKA-593: --- I uploaded my implementation on GitHub

[jira] Commented: (TIKA-593) Tika network server

2011-02-08 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991838#comment-12991838 ] Maxim Valyanskiy commented on TIKA-593: --- I made an HTTP server with Jersey (JAX-RS)

[jira] Commented: (TIKA-589) NPE with POI when parsing word docs

2011-01-28 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988044#action_12988044 ] Maxim Valyanskiy commented on TIKA-589: --- There is invalid style declaration in

[jira] Commented: (TIKA-577) IndexOutOfBounds Exception looking for Picture in Word 03 doc that has no pictures

2011-01-21 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984637#action_12984637 ] Maxim Valyanskiy commented on TIKA-577: --- I see, now we have another excep

[jira] Commented: (TIKA-577) IndexOutOfBounds Exception looking for Picture in Word 03 doc that has no pictures

2011-01-17 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982770#action_12982770 ] Maxim Valyanskiy commented on TIKA-577: --- Dennis, that is strange. Are you sure

[jira] Resolved: (TIKA-577) IndexOutOfBounds Exception looking for Picture in Word 03 doc that has no pictures

2011-01-12 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-577. --- Resolution: Fixed Fixed in POI, revision 1058176. > IndexOutOfBounds Exception looking

[jira] Commented: (TIKA-577) IndexOutOfBounds Exception looking for Picture in Word 03 doc that has no pictures

2010-12-22 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974156#action_12974156 ] Maxim Valyanskiy commented on TIKA-577: --- Can you reproduce this bug with the latest

Jira permissions

2010-12-20 Thread Maxim Valyanskiy
Hello! It seems that I do not have permission to resolve issues in Tika's Jira. Can anyone fix that? My Jira login is "maxim.valyanskiy" best wishes, Max

[jira] Commented: (TIKA-574) Support for IBM866 (CP866) encoding in TXTParser

2010-12-17 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972445#action_12972445 ] Maxim Valyanskiy commented on TIKA-574: --- Thank you. Committed in r1050348 >

[jira] Resolved: (TIKA-573) MimeType.getExtension()

2010-12-17 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-573. --- Resolution: Fixed Fix Version/s: 0.9 Committed the last patch in r1050340

[jira] Updated: (TIKA-573) MimeType.getExtension()

2010-12-17 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-573: -- Summary: MimeType.getExtension() (was: MimeType.getExtension() and mailcap's mime.

[jira] Updated: (TIKA-574) Support for IBM866 (CP866) encoding in TXTParser

2010-12-17 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-574: -- Attachment: TIKA-574.patch Thank you. I added a unit test for this issue > Support for IBM

[jira] Commented: (TIKA-573) MimeType.getExtension() and mailcap's mime.types

2010-12-16 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972156#action_12972156 ] Maxim Valyanskiy commented on TIKA-573: --- Thank you, Jukka. I found that

[jira] Updated: (TIKA-573) MimeType.getExtension() and mailcap's mime.types

2010-12-16 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-573: -- Attachment: 0001-TIKA-573-add-MimeType.getExtension.patch new version of the patch

[jira] Commented: (TIKA-573) MimeType.getExtension() and mailcap's mime.types

2010-12-16 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971994#action_12971994 ] Maxim Valyanskiy commented on TIKA-573: --- I'm not sure that this patch fits

[jira] Updated: (TIKA-573) MimeType.getExtension() and mailcap's mime.types

2010-12-15 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-573: -- Attachment: TIKA-573.patch patch > MimeType.getExtension() and mailcap's mi

[jira] Created: (TIKA-573) MimeType.getExtension() and mailcap's mime.types

2010-12-15 Thread Maxim Valyanskiy (JIRA)
mime Reporter: Maxim Valyanskiy This patch adds a getExtension() method to MimeType and support for reading mime types from the mime.types format. I added the mime.types file from Fedora Linux; its license says that it is a public domain file: === Red Hat disclaims any copyright on the "mailcap"
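
The mime.types format is line-oriented: a media type followed by zero or more file extensions, with `#` starting a comment. A minimal parser plus a `getExtension()` lookup might look like the sketch below. `MimeTypesFile` is a hypothetical class, not the code in the attached patch, and returning the first listed extension with a leading dot is an assumption about the desired behaviour.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class MimeTypesFile {
    private final Map<String, List<String>> extensionsByType = new HashMap<>();

    /** Parses lines of the form "type/subtype ext1 ext2 ..."; '#' starts a comment. */
    public static MimeTypesFile parse(Reader source) throws IOException {
        MimeTypesFile result = new MimeTypesFile();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // drop the comment part
            }
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) {
                continue; // blank line, comment-only line, or a type with no extensions
            }
            String type = fields[0].toLowerCase(Locale.ROOT);
            result.extensionsByType
                    .computeIfAbsent(type, k -> new ArrayList<>())
                    .addAll(Arrays.asList(fields).subList(1, fields.length));
        }
        return result;
    }

    /** Returns the first listed extension with a leading dot, or "" if none is known. */
    public String getExtension(String mimeType) {
        List<String> exts = extensionsByType.get(mimeType.toLowerCase(Locale.ROOT));
        return (exts == null || exts.isEmpty()) ? "" : "." + exts.get(0);
    }
}
```

Given the line `text/plain txt text conf`, `getExtension("text/plain")` returns `".txt"`; types listed without extensions, or not listed at all, return an empty string.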

Re: TikaMimeTypeIdentifier in Aperture

2010-12-13 Thread Maxim Valyanskiy
Hello! 03.12.2010 03:07, Antoni Mylka wrote: - getExtensionsFor(String mimeType), useful in many apps, in tika the mime knowledge base is hidden in private fields and package-protected classes I think that a getExtension() method for mime types is a good idea. It is useful for creating file

Re: buildbot failure in ASF Buildbot on tika-trunk

2010-11-13 Thread Maxim Valyanskiy
Hello! Hm, the build timed out at the artifact download stage... best wishes, Max 13.11.2010 13:50, build...@apache.org wrote: The Buildbot has detected a new failure of tika-trunk on ASF Buildbot. Full details are available at: http://ci.apache.org/builders/tika-trunk/builds/200 Buildbot URL: http

[jira] Commented: (TIKA-551) Unit test failures in org.apache.tika.parser.image.ImageParserTest on JDK 1.6.0_05

2010-11-13 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931650#action_12931650 ] Maxim Valyanskiy commented on TIKA-551: --- We use 1.6.0_05 on Linux. I think it is

[jira] Commented: (TIKA-551) Unit test failures in org.apache.tika.parser.image.ImageParserTest on JDK 1.6.0_05

2010-11-12 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931382#action_12931382 ] Maxim Valyanskiy commented on TIKA-551: --- 1) testJPEG completely misses

[jira] Updated: (TIKA-551) Unit test failures in org.apache.tika.parser.image.ImageParserTest on JDK 1.6.0_05

2010-11-12 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-551: -- Attachment: org.apache.tika.parser.image.ImageParserTest.txt tika/tika-parsers/target/surefire

[jira] Created: (TIKA-551) Unit test failures in org.apache.tika.parser.image.ImageParserTest on JDK 1.6.0_05

2010-11-12 Thread Maxim Valyanskiy (JIRA)
Issue Type: Bug Reporter: Maxim Valyanskiy $ mvn test . Results : Failed tests: testJPEG(org.apache.tika.parser.image.ImageParserTest) testBMP(org.apache.tika.parser.image.ImageParserTest) testPNG(org.apache.tika.parser.image.ImageParserTest) This bug reproduces

[jira] Resolved: (TIKA-550) Add stable filenames for extracted embedded files from Office binaries

2010-11-12 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-550. --- Resolution: Fixed Committed revision 1034373. > Add stable filenames for extracted embed

[jira] Created: (TIKA-550) Add stable filenames for extracted embedded files from Office binaries

2010-11-12 Thread Maxim Valyanskiy (JIRA)
: Improvement Components: parser Reporter: Maxim Valyanskiy Attachments: 1 This patch uses POI entry names as the basis for the file names of extracted embedded files. This makes file names stable and reproducible. It is intended for debugging and testing -- This message is

[jira] Updated: (TIKA-550) Add stable filenames for extracted embedded files from Office binaries

2010-11-12 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-550: -- Attachment: 1 > Add stable filenames for extracted embedded files from Office binar

[jira] Resolved: (TIKA-549) There is no support for extracting OLE-shapes from PPT

2010-11-12 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-549. --- Resolution: Fixed Fix Version/s: 0.9 fixed in r1034359 and r1034360 > There is

[jira] Created: (TIKA-549) There is no support for extracting OLE-shapes from PPT

2010-11-12 Thread Maxim Valyanskiy (JIRA)
: parser Reporter: Maxim Valyanskiy There is no support for extracting OLE-shapes from PPT -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.

Re: svn commit: r1033937 - in /tika/trunk: tika-core/src/main/java/org/apache/tika/extractor/ tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ tika-parsers/src/main/java/org/apache/tika/pa

2010-11-11 Thread Maxim Valyanskiy
Hello! 11.11.2010 17:05, Jukka Zitting wrote: Log: Extract interface for EmbeddedDocumentExtractor We have a POI-based utility that extracts all embedded files (attachments, pictures, etc.) from different file formats. This utility takes an arbitrary file and returns a ZIP archive with all attac
