[jira] [Commented] (OAK-5519) Skip problematic binaries instead of blocking indexing

2017-11-10 Thread Thomas Mueller (JIRA)
[ https://issues.apache.org/jira/browse/OAK-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16247218#comment-16247218 ] Thomas Mueller commented on OAK-5519: [~jsedding] This only works if text extraction is reading, but in

2017-11-10 Thread Julian Sedding (JIRA)
[ https://issues.apache.org/jira/browse/OAK-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16247189#comment-16247189 ] Julian Sedding commented on OAK-5519: [~tmueller] could the processing thread be terminated by closing

2017-11-09 Thread Thomas Mueller (JIRA)
[ https://issues.apache.org/jira/browse/OAK-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246063#comment-16246063 ] Thomas Mueller commented on OAK-5519: http://svn.apache.org/r1814745 [~chetanm] I have incorporated

2017-11-08 Thread Chetan Mehrotra (JIRA)
[ https://issues.apache.org/jira/browse/OAK-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244308#comment-16244308 ] Chetan Mehrotra commented on OAK-5519: bq. However, after a restart, Oak will not try to extract the

2017-11-08 Thread Thomas Mueller (JIRA)
[ https://issues.apache.org/jira/browse/OAK-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244076#comment-16244076 ] Thomas Mueller commented on OAK-5519: > Going forward we can probably store some hidden property to

2017-11-08 Thread Chetan Mehrotra (JIRA)
[ https://issues.apache.org/jira/browse/OAK-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243754#comment-16243754 ] Chetan Mehrotra commented on OAK-5519: bq. the text extraction cache only puts results in the cache

2017-11-08 Thread Thomas Mueller (JIRA)
[ https://issues.apache.org/jira/browse/OAK-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243702#comment-16243702 ] Thomas Mueller commented on OAK-5519: I found out why there are two threads consuming 100% each, and

2017-11-08 Thread Thomas Mueller (JIRA)
[ https://issues.apache.org/jira/browse/OAK-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243573#comment-16243573 ] Thomas Mueller commented on OAK-5519: My current approach is: extract larger binaries using a separate

2017-07-28 Thread Chetan Mehrotra (JIRA)
[ https://issues.apache.org/jira/browse/OAK-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104727#comment-16104727 ] Chetan Mehrotra commented on OAK-5519: bq. it does nothing except throw an exception / error / out of

2017-07-26 Thread Thomas Mueller (JIRA)
[ https://issues.apache.org/jira/browse/OAK-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101666#comment-16101666 ] Thomas Mueller commented on OAK-5519: [~catholicon] and [~chetanm] I think we should try the "Memory of

2017-05-04 Thread Thomas Mueller (JIRA)
[ https://issues.apache.org/jira/browse/OAK-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15996500#comment-15996500 ] Thomas Mueller commented on OAK-5519: Do we have a test case (for example a PDF file that runs out of

2017-04-20 Thread Chetan Mehrotra (JIRA)
[ https://issues.apache.org/jira/browse/OAK-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976491#comment-15976491 ] Chetan Mehrotra commented on OAK-5519: *Problematic Binary Handling* h3. A - Out of process Best

2017-04-20 Thread Chetan Mehrotra (JIRA)
[ https://issues.apache.org/jira/browse/OAK-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976472#comment-15976472 ] Chetan Mehrotra commented on OAK-5519: bq. It probably makes sense to deal with OOME as well (at

2017-04-20 Thread Thomas Mueller (JIRA)
[ https://issues.apache.org/jira/browse/OAK-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976382#comment-15976382 ] Thomas Mueller commented on OAK-5519: I recently saw OutOfMemory error during the index update; I'm not

2017-01-26 Thread Alexander Klimetschek (JIRA)
[ https://issues.apache.org/jira/browse/OAK-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840316#comment-15840316 ] Alexander Klimetschek commented on OAK-5519: Related issues: * OAK-4939 addresses this in 1.5