[ https://issues.apache.org/jira/browse/NUTCH-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666616#comment-13666616 ]
Luca Cavanna commented on NUTCH-1527:
-------------------------------------

I just ran into this issue and thought it would be nice if Nutch supported Elasticsearch out of the box. I had a look at the code and saw a few things I would do differently:

- You can use the BulkProcessor instead of manually creating the BulkRequest and handling it yourself. It executes the bulk automatically when needed, and it is also very flexible and configurable, so you would be able to remove a lot of boilerplate code.
- I know multicast discovery is appealing: as you do now, you don't need to specify any URL, and the client node simply joins an existing cluster with the same name. Still, I think I would go for the other type of client here, the TransportClient. It is more lightweight and just sends requests to the configured URLs in round-robin fashion, using the internal binary protocol that Elasticsearch uses for inter-node communication.

Let me know if I can help more. I'm certainly willing to get my hands dirty here if you want ;)

> Port nutch-elasticsearch-indexer to Nutch
> -----------------------------------------
>
>                 Key: NUTCH-1527
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1527
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.6, 2.1
>            Reporter: Lewis John McGibbney
>            Assignee: lufeng
>            Priority: Minor
>             Fix For: 2.4
>
>         Attachments: NUTCH-1527.patch
>
>
> The source repos for this can be found here [0].
> This issue should be inline with the work already done by Julien and others
> over at NUTCH-1047.
> [0] https://github.com/ctjmorgan/nutch-elasticsearch-indexer
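To illustrate the BulkProcessor suggestion, here is a toy, stdlib-only model of the behaviour described: callers queue individual actions, and a batch is sent automatically once a count threshold is reached, with the remainder flushed on close. This is a hypothetical sketch, not the Elasticsearch API. `ToyBulkBuffer`, `bulkActions`, and the `executor` callback are illustrative names. The real BulkProcessor is configured through a builder and can also flush by size or time interval; this sketch models only the count threshold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical model of BulkProcessor-style buffering (NOT the Elasticsearch API).
 * Actions are queued and handed to the executor in batches of bulkActions.
 */
class ToyBulkBuffer<T> implements AutoCloseable {
    private final int bulkActions;            // assumed flush threshold: number of queued actions
    private final Consumer<List<T>> executor; // stands in for "execute one bulk request"
    private final List<T> pending = new ArrayList<>();

    ToyBulkBuffer(int bulkActions, Consumer<List<T>> executor) {
        if (bulkActions <= 0) {
            throw new IllegalArgumentException("bulkActions must be positive");
        }
        this.bulkActions = bulkActions;
        this.executor = executor;
    }

    /** Queue one action; send a batch automatically once the threshold is reached. */
    synchronized void add(T action) {
        pending.add(action);
        if (pending.size() >= bulkActions) {
            flush();
        }
    }

    /** Send whatever is queued as a single batch, if anything is queued. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<T> batch = new ArrayList<>(pending); // hand off a copy, then reset the buffer
        pending.clear();
        executor.accept(batch);
    }

    /** Flush the remainder, as an indexer would when it shuts down. */
    @Override
    public synchronized void close() {
        flush();
    }
}
```

With a threshold of 2, adding `doc1`, `doc2`, and `doc3` and then closing the buffer produces two batches: `[doc1, doc2]` as soon as the second document arrives, and `[doc3]` on close. The indexer code only calls `add`; deciding when to send is left entirely to the buffer, which is where the boilerplate savings come from.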
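The TransportClient suggestion rests on round-robin selection over a fixed list of configured addresses. The following stdlib-only sketch shows just that selection step, under the assumption that addresses are plain `host:port` strings. It is not the TransportClient itself, which also handles connections, the binary transport protocol, and failing nodes. `RoundRobinNodes` and the example hostnames are hypothetical. Port 9300 is Elasticsearch's default transport port.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical round-robin picker over configured node addresses
 * (illustrates the selection policy only, NOT the Elasticsearch TransportClient).
 */
class RoundRobinNodes {
    private final List<String> addresses;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinNodes(List<String> addresses) {
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("at least one address is required");
        }
        // Defensive, read-only copy so the rotation cannot change underneath us.
        this.addresses = Collections.unmodifiableList(new ArrayList<>(addresses));
    }

    /** Return the next address in rotation; floorMod keeps the index valid after int overflow. */
    String next() {
        int i = Math.floorMod(counter.getAndIncrement(), addresses.size());
        return addresses.get(i);
    }
}
```

Given `es1:9300` and `es2:9300`, successive calls return `es1:9300`, `es2:9300`, `es1:9300`, and so on. No discovery or cluster membership is involved, which is what keeps this kind of client lightweight compared with a node client that joins the cluster.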