On Mon, Oct 18, 2010 at 8:22 PM, Jason, Kim <hialo...@gmail.com> wrote:
Sorry for the delay in replying. I was caught up in various things this week.

> Thank you for reply, Gora
>
> But I still have several questions.
> Did you use separate index?
> If so, you indexed 0.7 million Xml files per instance
> and merged it. Is it Right?

Yes, that is correct. We sharded the data by user ID, so that each of
the 5 cores held approximately 0.7 million out of the 3.5 million
records. We could have used the sharded indices directly for search,
but at least for now have decided to go with a single, merged index.

> Please let me know how to work multiple instances and cores in your case.
[...]

* Multi-core Solr setup is quite easy, via configuration in solr.xml:
  http://wiki.apache.org/solr/CoreAdmin . The configuration, i.e., the
  schema, solrconfig.xml, etc., needs to be replicated across the cores.
* Decide which XML files you will post to which core, and do the POST
  with curl, as usual. You might need to write a little script to do
  this.
* After indexing on the cores is done, make sure to do a commit on each.
* Merge the sharded indexes (if desired) as described here:
  http://wiki.apache.org/solr/MergingSolrIndexes . One thing to watch
  out for here is disk space. When merging with Lucene IndexMergeTool,
  we found that a rough rule of thumb was that intermediate steps in
  the merge would require about twice as much space as the total size
  of the indexes to be merged. I.e., if one is merging 40GB of data in
  sharded indexes, one should have at least 80GB free for the merge, or
  about 120GB of disk in all counting the shards themselves.

Regards,
Gora
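
The "little script" for routing files to cores and committing could look
roughly like the sketch below. It is an illustration only, not the script
used for the index described above. Everything specific in it is an
assumption: the base URL (a default local Solr), the core names core0 to
core4, the choice of a CRC32 hash of the user ID to pick a core, and the
use of Python's urllib in place of curl. It posts to each core's /update
handler and finishes with a <commit/> on every core.

```python
import urllib.request
import zlib

# Assumptions for illustration: adjust to match your own solr.xml.
SOLR_BASE = "http://localhost:8983/solr"
CORES = ["core0", "core1", "core2", "core3", "core4"]  # hypothetical names


def core_for_user(user_id, cores=CORES):
    """Pick a core from a stable hash of the user ID.

    zlib.crc32 is used instead of the built-in hash(), which is
    randomized per process for strings and so would not shard
    consistently across runs.
    """
    digest = zlib.crc32(str(user_id).encode("utf-8"))
    return cores[digest % len(cores)]


def update_url(core, base=SOLR_BASE):
    """URL of the XML update handler for one core."""
    return f"{base}/{core}/update"


def post_xml(core, xml_bytes):
    """POST an XML body to a core, much as curl would; returns HTTP status."""
    request = urllib.request.Request(
        update_url(core),
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_files(files_by_user):
    """Post each (user_id, path) pair to its core, then commit every core."""
    for user_id, path in files_by_user:
        with open(path, "rb") as handle:
            post_xml(core_for_user(user_id), handle.read())
    for core in CORES:
        post_xml(core, b"<commit/>")
```

A call such as index_files([("user17", "user17.xml")]) (file name made up)
would send that file to whichever core the hash selects and then commit all
five cores. Hashing spreads users roughly evenly, but any stable rule
(ranges of user IDs, for instance) would serve as well.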