Re: [scottchu] What kind of configuration to use for this size of news data?

scott.chu Tue, 10 May 2016 21:34:55 -0700

A further question: can master-slave replication and SolrCloud coexist in a single
Solr server? If so, how would I set that up?

scott.chu，scott....@udngroup.com
2016/5/11 (Wed)
----- Original Message ----- 
From: scott (myself)
To: solr-user 
CC: 
Date: 2016/5/11 (Wed) 11:11
Subject: [scottchu] What kind of configuration to use for this size of news data?

I want to build a Solr engine for more than 60 years of news articles. My
requirements are (I'm using Solr 5.4.1):

1> Currently over 10M docs.
2> Currently over 60GB of data in total.
3> The number of docs and the data size will keep growing at about 1,000 docs
(or 8MB) per day.
4> There are 5-6 different newspaper types in total.
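As a rough back-of-the-envelope check on requirements 1-3, the sketch below projects the corpus size from the figures above (10M docs, 60GB, plus 1,000 docs / 8MB per day). It assumes strictly linear growth and decimal units (1GB = 1000MB), and it measures raw data, not on-disk index size, which will differ.

```python
# Figures taken from the requirements above; linear growth is an assumption.
CURRENT_DOCS = 10_000_000   # "over 10M docs"
CURRENT_GB = 60.0           # "over 60GB of data in total"
DOCS_PER_DAY = 1_000
MB_PER_DAY = 8.0
DAYS_PER_YEAR = 365

def project(years: int) -> tuple[int, float]:
    """Return (doc count, raw data size in GB) after `years` of linear growth."""
    days = years * DAYS_PER_YEAR
    docs = CURRENT_DOCS + DOCS_PER_DAY * days
    gb = CURRENT_GB + MB_PER_DAY * days / 1000  # decimal MB -> GB
    return docs, gb

for y in (1, 5, 10):
    docs, gb = project(y)
    print(f"after {y:>2} years: {docs:,} docs, ~{gb:.1f} GB")
```

At these rates each year adds about 365,000 docs and roughly 2.9GB, which is small compared with the existing 10M docs / 60GB.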

My questions are:
1> Is it workable to just use the master-slave model, or should I turn to
SolrCloud? (I ask because our system management group has never managed a
distributed system before and has no knowledge of ZooKeeper, shards, etc. They
also don't know how to back up/restore distributed data.)
2> Say I choose SolrCloud anyway. I'd like each shard to own one specific year
of data. Can that be done, and what configuration would it need? (AFAIK,
SolrCloud distributes data based on some intrinsic routing algorithm.)
3> Suppose I want to create another Solr engine containing only one or two
particular paper types. Is it possible to copy their data directly from the big
central Solr engine, or do I have to rebuild the index from the raw article
data? (Our business may well have this need.)
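On question 2: the SolrCloud Collections API `CREATE` action accepts `router.name=implicit` together with an explicit `shards` list, and `router.field` names a document field whose value selects the target shard. That is one way to get one shard per year instead of the default compositeId hash routing. The sketch below only builds the `CREATE` request URL; the host, collection name, config name, year range, and the `pub_year` field are all hypothetical, and each document would have to carry its shard name (e.g. `y1956`) in that field.

```python
from urllib.parse import urlencode

# Hypothetical values: adjust host, collection, config and field names to your setup.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "news"
CONFIG_NAME = "news_conf"   # hypothetical config set already uploaded to ZooKeeper
ROUTE_FIELD = "pub_year"    # each doc stores its shard name here, e.g. "y1956"

def year_shards(first: int, last: int) -> list[str]:
    """One shard name per publication year, e.g. ['y1956', 'y1957', ...]."""
    return [f"y{y}" for y in range(first, last + 1)]

def create_collection_url(first: int, last: int, replicas: int = 2) -> str:
    """Build a Collections API CREATE URL that uses the implicit router."""
    shards = year_shards(first, last)
    params = {
        "action": "CREATE",
        "name": COLLECTION,
        "collection.configName": CONFIG_NAME,
        "router.name": "implicit",    # no hashing: shards are chosen explicitly
        "shards": ",".join(shards),   # shard names must be listed up front
        "router.field": ROUTE_FIELD,  # field value names the target shard
        "replicationFactor": replicas,
        # Assumption: permissive enough to fit every core on one node;
        # lower this if the collection is spread over several nodes.
        "maxShardsPerNode": len(shards) * replicas,
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"

print(create_collection_url(1956, 2016))
```

Under this scheme a new year would be added later with the Collections API `CREATESHARD` action, which only works for collections using the implicit router. The trade-off is that year shards will be very uneven in size, and putting the right shard name on every document becomes the indexer's job.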

I'd appreciate any good suggestions and experiences you can share.

Thanks in advance and best regards.
scott.chu，scott....@udngroup.com
2016/5/11 (Wed)
