Re: Fastest way to import big amount of documents in SolrCloud

2014-05-02 Thread Erick Erickson
re: optimize after every import: This is not recommended in 4.x unless and until you have evidence that it really does help. Reviews are very mixed, and it's been renamed "force merge" in 4.x just so people don't think "Of course I want to do this, who wouldn't?" bq: Doing a commit instead of
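Erick's point is that a routine import should not end with an optimize (the operation Lucene 4.x calls force merge) unless measurements show it helps. As a hedged illustration, the sketch below makes the end-of-import request a plain hard commit, with optimize available only as an explicit opt-in. It uses Solr's `commit` and `optimize` request parameters on the `/update` handler; the host, port and collection name in `UPDATE_URL` are hypothetical placeholders.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint: replace host, port and collection with your own.
UPDATE_URL = "http://localhost:8983/solr/collection1/update"


def finish_import_url(optimize=False):
    """Build the end-of-import request URL.

    Default is a single hard commit. optimize=true (force merge) also commits,
    so it replaces the commit rather than being added to it. Only opt in when
    you have evidence it helps your queries.
    """
    params = {"optimize": "true"} if optimize else {"commit": "true"}
    return f"{UPDATE_URL}?{urllib.parse.urlencode(params)}"


def finish_import(optimize=False):
    """Send the request built above and return the HTTP status code."""
    with urllib.request.urlopen(finish_import_url(optimize)) as resp:
        return resp.status
```

Keeping optimize behind a flag that defaults to `False` makes the cheap operation the normal path and the expensive merge a deliberate choice.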

Re: Fastest way to import big amount of documents in SolrCloud

2014-05-01 Thread Alexander Kanarsky
If you build your index in Hadoop, read this (it is about Cloudera Search, but as I understand it, it should also work with the Solr Hadoop contrib since 4.7): http://www.cloudera.com/content/cloudera-content/cloudera-docs/Search/latest/Cloudera-Search-User-Guide/csug_batch_index_to_solr_servers_using_g

Re: Fastest way to import big amount of documents in SolrCloud

2014-05-01 Thread Costi Muraru
Thanks for the reply, Anshum. Please see my answers to your questions below. * Why do you want to do a full index every day? I'm not sure I understand what you mean by full index. Every day we want to add new documents to the existing ones. Of course, we want to remove older ones as well,
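Costi's snippet is cut off before saying how "older" documents are identified, so the following is only a sketch under an assumption: each document carries a date field (hypothetically named `indexed_at` here) and anything past a fixed retention window (a hypothetical 30 days) should go. It builds a JSON delete-by-query body using Solr date math (`NOW-30DAYS`) and posts it to the `/update` handler at a placeholder URL. It deliberately does not commit, so the deletion becomes visible with the import's next commit.

```python
import json
import urllib.request

# Hypothetical endpoint: replace host, port and collection with your own.
UPDATE_URL = "http://localhost:8983/solr/collection1/update"


def delete_older_than_body(field, days):
    """JSON delete-by-query matching documents whose `field` is older than `days`.

    `field` must be a date field in your schema (assumption); NOW-<n>DAYS is
    Solr date math evaluated on the server.
    """
    return json.dumps({"delete": {"query": f"{field}:[* TO NOW-{days}DAYS]"}})


def delete_older_than(field, days):
    """POST the delete to Solr and return the HTTP status (no commit is sent)."""
    req = urllib.request.Request(
        UPDATE_URL,
        data=delete_older_than_body(field, days).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For example, `delete_older_than("indexed_at", 30)` would run once per day, just before or after the daily import.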

Re: Fastest way to import big amount of documents in SolrCloud

2014-05-01 Thread Anshum Gupta
Hi Costi, I'd recommend SolrJ and parallelizing the inserts. It also helps to set reasonable commit intervals. Just to get a better perspective: * Why do you want to do a full index every day? * How much data are we talking about? * What's your SolrCloud setup like? * Do you already have some be
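Anshum recommends SolrJ. The sketch below is not SolrJ code. It illustrates the same two ideas, parallel inserts and no per-request commits, over Solr's HTTP JSON update endpoint using only the Python standard library. Documents are grouped into batches, each batch is posted as a JSON array with `commitWithin` so Solr schedules commits itself, and a small thread pool sends batches concurrently. The URL, batch size, commit window and worker count are hypothetical values to tune for your own cluster.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Hypothetical settings: tune for your cluster and collection.
UPDATE_URL = "http://localhost:8983/solr/collection1/update"
BATCH_SIZE = 1000          # documents per HTTP request
COMMIT_WITHIN_MS = 60000   # ask Solr to make adds visible within about a minute


def batches(docs, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def post_batch(batch):
    """POST one batch as a JSON array; commitWithin replaces explicit commits."""
    req = urllib.request.Request(
        f"{UPDATE_URL}?commitWithin={COMMIT_WITHIN_MS}",
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def import_all(docs, send=post_batch, batch_size=BATCH_SIZE, workers=4):
    """Send batches concurrently and return the results in batch order.

    Note: Executor.map submits every batch up front, so all batches are held
    in memory at once. For very large imports, bound the number of in-flight
    batches instead.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send, batches(docs, batch_size)))
```

Usage would be something like `import_all({"id": str(i), "title": f"doc {i}"} for i in range(100000))`. The `send` parameter exists so the batching and concurrency logic can be exercised without a running Solr.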

Fastest way to import big amount of documents in SolrCloud

2014-05-01 Thread Costi Muraru
Hi guys, What would you say is the fastest way to import data into SolrCloud? Our use case: once a day, do a single import of a large number of documents. Should we use SolrJ/DataImportHandler/other? Or is there perhaps a bulk import feature in Solr? I came across this promising link: http://wiki.apache