[jira] [Commented] (TIKA-965) Text Detection Fails on Mostly Non-ASCII UTF-8 Files

2012-07-31 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425784#comment-13425784 ] Nick Burch commented on TIKA-965: Do you have a sample file that shows this problem? And is

[jira] [Commented] (TIKA-965) Text Detection Fails on Mostly Non-ASCII UTF-8 Files

2012-07-31 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425790#comment-13425790 ] Ray Gauss II commented on TIKA-965: I do have a test file and it's more than a few bytes

[jira] [Commented] (TIKA-965) Text Detection Fails on Mostly Non-ASCII UTF-8 Files

2012-07-31 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425797#comment-13425797 ] Jukka Zitting commented on TIKA-965: In the {{TextDetector}} we could also look for the

[jira] [Commented] (TIKA-965) Text Detection Fails on Mostly Non-ASCII UTF-8 Files

2012-07-31 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425981#comment-13425981 ] Ray Gauss II commented on TIKA-965: That's the solution I was looking into and I wanted t

[jira] [Commented] (TIKA-965) Text Detection Fails on Mostly Non-ASCII UTF-8 Files

2012-08-01 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426490#comment-13426490 ] Jukka Zitting commented on TIKA-965: I'm not too big a fan of the {{Charset}} classes in

[jira] [Commented] (TIKA-965) Text Detection Fails on Mostly Non-ASCII UTF-8 Files

2012-08-01 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426525#comment-13426525 ] Ray Gauss II commented on TIKA-965: Are we likely to run into similar issues with other e

[jira] [Commented] (TIKA-965) Text Detection Fails on Mostly Non-ASCII UTF-8 Files

2012-08-01 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426541#comment-13426541 ] Ray Gauss II commented on TIKA-965: I have a test file that I've gotten permission to inc

[jira] [Commented] (TIKA-965) Text Detection Fails on Mostly Non-ASCII UTF-8 Files

2012-08-01 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426550#comment-13426550 ] Jukka Zitting commented on TIKA-965: I see where you're going, but it's a really tricky