On Mon, Oct 18, 2010 at 8:22 PM, Jason, Kim <hialo...@gmail.com> wrote:

Sorry for the delay in replying. Was caught up in various things this
week.

> Thank you for reply, Gora
>
> But I still have several questions.
> Did you use separate index?
> If so, you indexed 0.7 million Xml files per instance
> and merged it. Is it Right?

Yes, that is correct. We sharded the data by user ID, so that each of the 25
cores held approximately 0.7 million out of the 3.5 million records. We could
have used the sharded indices directly for search, but at least for now have
decided to go with a single, merged index.

> Please let me know how to work multiple instances and cores in your case.
[...]

* Multi-core Solr setup is quite easy, via configuration in solr.xml:
  http://wiki.apache.org/solr/CoreAdmin . The configuration, i.e.,
  schema, solrconfig.xml, etc., needs to be replicated across the
  cores.
* Decide which XML files you will post to which core, and do the
  POST with curl, as usual. You might need to write a little script
  to do this.
* After indexing on the cores is done, make sure to do a commit
  on each.
* Merge the sharded indexes (if desired) as described here:
  http://wiki.apache.org/solr/MergingSolrIndexes . One thing to
  watch out for here is disk space. When merging with the Lucene
  IndexMergeTool, we found that a rough rule of thumb was that
  intermediate steps in the merge would require about twice as
  much space as the total size of the indexes to be merged. I.e.,
  if one is merging 40GB of data in sharded indexes, one should
  have at least 80GB free for the merge, or about 120GB in all
  counting the original indexes.
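
For the posting and commit steps above, a little script along the
following lines might do. This is only a rough sketch, not what we
actually ran: the host and port, the core names (core0, core1, ...),
and the modulo rule for picking a core from a user ID are all
assumptions made for illustration. The /update handler and the
<commit/> message are the standard Solr XML update interface.

```python
import urllib.request

# Assumed values for illustration; adjust to your own setup.
SOLR_BASE = "http://localhost:8983/solr"
NUM_CORES = 25


def core_for_user(user_id, num_cores=NUM_CORES):
    """Pick a core name from a user ID (modulo sharding is an assumption)."""
    return "core%d" % (user_id % num_cores)


def update_url(core, base=SOLR_BASE):
    """URL of the XML update handler for the given core."""
    return "%s/%s/update" % (base, core)


def post_xml(core, xml_bytes):
    """POST an XML update message to a core, as one would with curl."""
    req = urllib.request.Request(
        update_url(core),
        data=xml_bytes,  # supplying data makes this a POST
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def commit(core):
    """Send an explicit commit to a core."""
    return post_xml(core, b"<commit/>")


def index_files(files_by_user):
    """Post each (user_id, path) pair to its core, then commit each core used."""
    touched = set()
    for user_id, path in files_by_user:
        core = core_for_user(user_id)
        with open(path, "rb") as fh:
            post_xml(core, fh.read())
        touched.add(core)
    for core in sorted(touched):
        commit(core)
```

One would call index_files() with (user_id, path) pairs for the XML
files to be indexed. The commits are deferred to the end so that each
core is committed only once, after all of its files have been posted.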

Regards,
Gora
