[ https://issues.apache.org/jira/browse/NUTCH-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney resolved NUTCH-2287.
-----------------------------------------
    Resolution: Fixed
      Assignee: Lewis John McGibbney

I've merged this in to master [~naegelejd], that way more people can try it out. I'm marking this as resolved for now. Thanks for the contributions.

> Indexer-elastic plugin should use Elasticsearch BulkProcessor and BackoffPolicy
> -------------------------------------------------------------------------------
>
>                 Key: NUTCH-2287
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2287
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer, plugin
>    Affects Versions: 1.12
>            Reporter: Joseph Naegele
>            Assignee: Lewis John McGibbney
>
> Elasticsearch's API (since at least v2.0) includes the {{BulkProcessor}},
> which automatically handles flushing bulk requests given a max doc count
> and/or max bulk size. It also now (I believe since 2.2.0) offers a
> {{BackoffPolicy}} option, allowing the BulkProcessor/Client to retry bulk
> requests when the Elasticsearch cluster is saturated. Using the
> {{BulkProcessor}} was originally suggested
> [here|https://issues.apache.org/jira/browse/NUTCH-1527?focusedCommentId=13666616&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13666616].
> Refactoring the {{indexer-elastic}} plugin to use the {{BulkProcessor}} will
> greatly simplify the existing plugin at the cost of slightly less debug
> logging. Additionally, it will allow the plugin to handle cluster saturation
> gracefully (rather than raising a RuntimeException and killing the reduce
> task), by using a configurable "exponential back-off policy".
> https://www.elastic.co/guide/en/elasticsearch/client/java-api/2.3/java-docs-bulk-processor.html

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
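
As a rough illustration of the "exponential back-off" retry idea mentioned in the issue description, here is a minimal, self-contained sketch in plain Java. It is not Elasticsearch's {{BackoffPolicy}} and not the code merged for NUTCH-2287: the class name BackoffSketch, its two methods, the doubling schedule, and the boolean "bulk send succeeded" callback are all assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Illustrative only: a hand-rolled exponential back-off, NOT Elasticsearch's BackoffPolicy. */
final class BackoffSketch {

    /** Delays (ms) to wait before each retry: initial, 2x, 4x, ... (assumed schedule). */
    static List<Long> delaysMillis(long initialMillis, int maxRetries) {
        List<Long> delays = new ArrayList<>();
        long delay = initialMillis;
        for (int i = 0; i < maxRetries; i++) {
            delays.add(delay);
            delay *= 2; // double the wait after every failed attempt
        }
        return delays;
    }

    /**
     * Calls sendBulk (a hypothetical stand-in for submitting one bulk request)
     * until it returns true or the retries are used up.
     * Returns the number of attempts made, or -1 if all attempts failed
     * or the thread was interrupted while waiting.
     */
    static int sendWithRetry(BooleanSupplier sendBulk, long initialMillis, int maxRetries) {
        if (sendBulk.getAsBoolean()) {
            return 1; // first attempt succeeded, no back-off needed
        }
        int attempts = 1;
        for (long delay : delaysMillis(initialMillis, maxRetries)) {
            try {
                Thread.sleep(delay); // give a saturated cluster time to recover
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
            attempts++;
            if (sendBulk.getAsBoolean()) {
                return attempts;
            }
        }
        return -1; // give up instead of failing on the first rejection
    }
}
```

The point of the sketch is the contrast drawn in the issue: a rejected bulk request is retried after progressively longer waits, rather than failing immediately and killing the reduce task.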