Hi, I have a question about the SolrUpdateBolt.execute() <https://github.com/apache/storm/blob/master/external/storm-solr/src/main/java/org/apache/storm/solr/bolt/SolrUpdateBolt.java#L92> method.
It seems that SolrUpdateBolt sends every tuple to Solr individually in execute(), and only issues a commit() after a specified number of documents have been sent. Would it be better if we batched the documents in memory and then sent them to Solr in a single request? I am drawing inspiration from EsBolt, another very popular search-engine bolt, which keeps tuples in memory, sends them as one batch request, and then acks or fails the whole batch based on that single request's outcome. Here are some pointers showing how EsBolt does it:

EsBolt.execute() <https://github.com/elastic/elasticsearch-hadoop/blob/master/storm/src/main/java/org/elasticsearch/storm/EsBolt.java#L116-L120>
  --> RestRepository.writeToIndex() <https://github.com/elastic/elasticsearch-hadoop/blob/master/mr/src/main/java/org/elasticsearch/hadoop/rest/RestRepository.java#L154-L164>
  --> RestRepository.doWriteToIndex() <https://github.com/elastic/elasticsearch-hadoop/blob/master/mr/src/main/java/org/elasticsearch/hadoop/rest/RestRepository.java#L182-L214>

If we did the same in SolrUpdateBolt, the number of HTTP calls would be reduced by a factor of N, where N is the batch size of the request, and that would be a good performance boost IMO.
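To make the idea concrete, here is a rough sketch of the buffering pattern I have in mind. This is my own illustration, not the real bolt: BatchingSolrBolt, toDoc() and the field names are made up, and the actual SolrUpdateBolt maps tuples through its SolrMapper rather than a hard-coded mapping like this.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class BatchingSolrBolt extends BaseRichBolt {
    private final String solrUrl;
    private final String collection;
    private final int batchSize;

    private transient SolrClient solrClient;
    private transient OutputCollector collector;
    private transient List<Tuple> queued;            // tuples held back until the batch is flushed
    private transient List<SolrInputDocument> docs;  // their corresponding Solr documents

    public BatchingSolrBolt(String solrUrl, String collection, int batchSize) {
        this.solrUrl = solrUrl;
        this.collection = collection;
        this.batchSize = batchSize;
    }

    @Override
    public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.solrClient = new HttpSolrClient.Builder(solrUrl).build();
        this.queued = new ArrayList<>();
        this.docs = new ArrayList<>();
    }

    @Override
    public void execute(Tuple tuple) {
        // Buffer the tuple instead of sending it to Solr immediately.
        queued.add(tuple);
        docs.add(toDoc(tuple));
        if (queued.size() >= batchSize) {
            flush();
        }
    }

    private void flush() {
        try {
            // One HTTP round trip for the whole batch, followed by one commit.
            solrClient.add(collection, docs);
            solrClient.commit(collection);
            for (Tuple t : queued) {
                collector.ack(t);
            }
        } catch (Exception e) {
            // The whole batch shares one fate: fail everything so the spout replays it.
            for (Tuple t : queued) {
                collector.fail(t);
            }
        } finally {
            queued.clear();
            docs.clear();
        }
    }

    private SolrInputDocument toDoc(Tuple tuple) {
        // Hypothetical mapping for illustration; the real bolt would delegate to its mapper.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", tuple.getStringByField("id"));
        doc.addField("text", tuple.getStringByField("text"));
        return doc;
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // No downstream stream; this bolt only writes to Solr.
    }
}

A real implementation would of course also need to flush partially filled batches on tick tuples (or a timer), so buffered tuples don't sit around past the topology's message timeout.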
Thanks,
Tid