Support for overwrite=false should be in SolrJ; I'm surprised it's been lopped out. Bulk indexing is extremely common.

On Fri, Nov 4, 2011 at 1:16 PM, Ken Krugler <kkrugler_li...@transpac.com> wrote:
> Hi list,
>
> I'm working on improving the performance of the Solr scheme for Cascading.
>
> This supports generating a Solr index as the output of a Hadoop job. We use
> SolrJ to write the index locally (via EmbeddedSolrServer).
>
> There are mentions of using overwrite=false with the CSV request handler, as
> a way of improving performance.
>
> I see that https://issues.apache.org/jira/browse/SOLR-653 removed this
> support from SolrJ, because it was deemed too dangerous for mere mortals.
>
> My question is whether anyone knows just how much performance boost this
> really provides.
>
> For Hadoop-based workflows, it's straightforward to ensure that the unique
> key field is really unique, thus if the performance gain is significant, I
> might look into figuring out some way (with a trigger lock) of re-enabling
> this support in SolrJ.
>
> Thanks,
>
> -- Ken
>
> --------------------------
> Ken Krugler
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Mahout & Solr