Re: [jira] [Created] (TIKA-880) while integrating microsoft parser it is giving error

2012-03-22 Thread som.mukhopadhyay
I am linking the code with the following jars: commons-codec-1.2.jar - done; commons-compress-1.1.jar - done; tika-core-0.9.jar - not needed, as I am using the source code; tika-parsers-0.9.jar - not needed, as I am using the source code; fontbox-1.6.0.jar - done; jempbox-1.6.0.jar - done; pdfbox-1.6

Re: Pluggable language detection

2012-03-22 Thread Maxim Valyanskiy
Hello! On 21.03.2012 19:51, Julien Nioche wrote: Just wondering about the best way to make the language detection pluggable instead of having it hard-wired as it is now. We know that the resources that are currently in Tika are both slow and inaccurate [1] and there are other libraries that we could

Re: Pluggable language detection

2012-03-22 Thread Julien Nioche
If you mean integrating a better third-party detector, that's exactly my point. We don't develop and maintain our own parsers, so why should we follow a different logic when it comes to language identification? There are other resources around; why don't we just use them? I assume that by default our e

[jira] [Created] (TIKA-881) HtmlParser sometimes(!) throws IOException while determining Html-Encoding

2012-03-22 Thread Klaus v. Einem (Created) (JIRA)
HtmlParser sometimes(!) throws IOException while determining Html-Encoding -- Key: TIKA-881 URL: https://issues.apache.org/jira/browse/TIKA-881 Project: Tika Issue Type:

[jira] [Updated] (TIKA-881) HtmlParser sometimes(!) throws IOException while determining Html-Encoding

2012-03-22 Thread Klaus v. Einem (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Klaus v. Einem updated TIKA-881: Attachment: BugfixHtmlParser.java This is my solution... Sorry, the comments are in German. The key is: N

[jira] [Commented] (TIKA-593) Tika network server

2012-03-22 Thread Maxim Valyanskiy (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235560#comment-13235560 ] Maxim Valyanskiy commented on TIKA-593: --- I found that Jersey dependencies are on Maven

[jira] [Updated] (TIKA-881) HtmlParser sometimes(!) throws IOException while determining Html-Encoding

2012-03-22 Thread Klaus v. Einem (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Klaus v. Einem updated TIKA-881: Attachment: HtmlParser.java OK, this is the 100% original source code with the bugfix included.

[jira] [Issue Comment Edited] (TIKA-881) HtmlParser sometimes(!) throws IOException while determining Html-Encoding

2012-03-22 Thread Klaus v. Einem (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235562#comment-13235562 ] Klaus v. Einem edited comment on TIKA-881 at 3/22/12 1:15 PM: --

[jira] [Issue Comment Edited] (TIKA-881) HtmlParser sometimes(!) throws IOException while determining Html-Encoding

2012-03-22 Thread Klaus v. Einem (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235527#comment-13235527 ] Klaus v. Einem edited comment on TIKA-881 at 3/22/12 1:17 PM: --

[jira] [Issue Comment Edited] (TIKA-881) HtmlParser sometimes(!) throws IOException while determining Html-Encoding

2012-03-22 Thread Klaus v. Einem (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235527#comment-13235527 ] Klaus v. Einem edited comment on TIKA-881 at 3/22/12 1:25 PM: --

[jira] [Created] (TIKA-882) IllegalArgumentException: No part found for relationship

2012-03-22 Thread Maxim Valyanskiy (Created) (JIRA)
IllegalArgumentException: No part found for relationship Key: TIKA-882 URL: https://issues.apache.org/jira/browse/TIKA-882 Project: Tika Issue Type: Bug Components: parser

[jira] [Resolved] (TIKA-882) IllegalArgumentException: No part found for relationship

2012-03-22 Thread Maxim Valyanskiy (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-882. --- Resolution: Fixed Assignee: Maxim Valyanskiy > IllegalArgumentException: No part foun

Build failed in Jenkins: Tika-trunk #813

2012-03-22 Thread Apache Jenkins Server
See Changes: [maxcom] TIKA-882 - ignore incorrect part references in OOXML Extractor -- Started by an SCM change Building remotely on ubuntu4 in workspace

[jira] [Assigned] (TIKA-881) HtmlParser sometimes(!) throws IOException while determining Html-Encoding

2012-03-22 Thread Ken Krugler (Assigned) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler reassigned TIKA-881: Assignee: Ken Krugler > HtmlParser sometimes(!) throws IOException while determining Html-Encodin

[jira] [Commented] (TIKA-881) HtmlParser sometimes(!) throws IOException while determining Html-Encoding

2012-03-22 Thread Ken Krugler (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235701#comment-13235701 ] Ken Krugler commented on TIKA-881: -- Hi Klaus - thanks for debugging this. I'll take a look