Well, 12,000 docs is probably too small a sample for a representative sizing, but
you can try an optimize() and then extrapolate what the size will be for 80 mill docs.
You'll definitely not be able to cache the whole index in memory on one server,
but if you can live with that kind of performance then it's OK. Btw, clustering
may get very expensive, since you need to return many more hits than normal,
and that means disk I/O.
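
As a rough illustration of that extrapolation, here is a minimal Python sketch. It assumes the index grows linearly with the number of docs (which a 12,000-doc sample cannot really confirm), uses the 33 MB you measured, and borrows the 8 GB heap from my earlier 16Gb example as a purely hypothetical value. The function names are made up for this sketch.

```python
# Rough linear extrapolation of index size from a small sample.
# Figures come from this thread; linear scaling is an assumption.

SAMPLE_DOCS = 12_000        # docs in the test index
SAMPLE_INDEX_MB = 33        # measured size of the index folder
TARGET_DOCS = 80_000_000    # planned corpus size
SERVER_RAM_GB = 24          # RAM on the server being considered
HEAP_GB = 8                 # hypothetical JVM heap for Solr


def estimate_index_gb(sample_docs, sample_mb, target_docs):
    """Scale the sample index size linearly to the target doc count (1 GB = 1000 MB)."""
    return sample_mb / sample_docs * target_docs / 1000


def cacheable_fraction(index_gb, ram_gb, heap_gb):
    """Share of the index that fits in the RAM left over for the OS disk cache."""
    free_for_cache = max(ram_gb - heap_gb, 0)
    return min(free_for_cache / index_gb, 1.0)


index_gb = estimate_index_gb(SAMPLE_DOCS, SAMPLE_INDEX_MB, TARGET_DOCS)
fraction = cacheable_fraction(index_gb, SERVER_RAM_GB, HEAP_GB)
print(f"estimated index size: {index_gb:.0f} GB")
print(f"cacheable on one {SERVER_RAM_GB} GB server: {fraction:.0%}")
```

With these numbers the estimate lands around 220 GB, of which only a small fraction would fit in the OS disk cache of a single 24 GB machine. Treat it as an order-of-magnitude figure only; a bigger test index after optimize() will give a better one.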

> Another question, concerning how to run Solr: do I just have to run java -jar
> start.jar?
> Or do you think I should run it another way?

First you must decide what application server to use. You may of course use the 
built-in Jetty server which comes with Solr if you wish. You will have to set 
JVM parameters for memory, garbage collection, logging etc. See 
http://wiki.apache.org/solr/SolrPerformanceFactors and 
http://wiki.apache.org/solr/SolrJetty
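
To make the JVM parameters concrete, here is a small Python sketch that assembles a start command for the built-in Jetty. The heap size, the GC log path and the helper name are assumptions for illustration, not recommendations. The flags are standard HotSpot options on JVMs of this generation (the CMS collector flag is no longer accepted by much newer JDKs), so check them against the JVM you actually run.

```python
# Assemble a "java ... -jar start.jar" command with explicit JVM settings.
# Heap size and log path are hypothetical; tune them for your own server.
import shlex


def solr_start_command(heap_gb=8, gc_log="logs/gc.log"):
    """Return the argument list for starting Solr's bundled Jetty."""
    return [
        "java",
        f"-Xms{heap_gb}g",           # initial heap size
        f"-Xmx{heap_gb}g",           # maximum heap size
        "-XX:+UseConcMarkSweepGC",   # concurrent collector (older HotSpot JVMs)
        "-verbose:gc",               # print garbage collection activity
        f"-Xloggc:{gc_log}",         # write GC activity to a log file
        "-jar",
        "start.jar",
    ]


cmd = solr_start_command()
print(" ".join(shlex.quote(part) for part in cmd))
```

Run from the Solr example directory, the printed line is what you would type instead of the plain java -jar start.jar. Keeping the heap well below total RAM leaves the rest for the OS disk cache.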

--
Jan Høydahl, search solution architect
Cominvent AS - www.facebook.com/Cominvent
Solr Training - www.solrtraining.com

On 22 May 2012, at 15:16, Bruno Mannina wrote:

> Hi Jan,
> 
> Thanks for all these details !
> 
> Answers are below.
> 
> Sincerely,
> Bruno
> 
> 
> On 22/05/2012 13:58, Jan Høydahl wrote:
>> Hi,
>> 
>> It is impossible to guess the required HW size without more knowledge about 
>> data and usage. 80 mill docs is a fair amount.
>> 
>> Here's how I would approach sizing the setup:
>> 1) Get your schema in shape, removing unnecessary stored/indexed fields
> Ok good idea !
>> 2) Do a test index locally of a part of the dataset, e.g. 10 mill docs, and 
>> perform an Optimize
> As for testing, I currently only have a sample of 12,000 docs, no more :'(
>> 3) Measure the size of the index folder, multiply by 8 to get a clue of 
>> total index size
> With 12,000 docs my index folder size is: 33 MB
> ps: I use "solr.clustering.enabled=true"
> 
>> 4) Do some benchmarking with realistic types of queries to identify 
>> performance bottlenecks on query side
> yep, this point is for later.
> 
>> Depending on your requirements for search performance, you can beef up your 
>> RAM to hold the whole index or depend on slow disks as a bottleneck. If you 
>> find that total size of index is 16Gb, you should leave >16Gb free for OS 
>> disk caching, e.g. allocate 8Gb to Tomcat/Solr and leave the rest for the 
>> OS. If I should guess, you probably find that one server gets overloaded or 
>> too slow with your amount of docs, and that you end up with sharding across 
>> 2-4 servers.
> I will check whether I can easily increase the RAM on the server 
> (currently 24 GB)
> 
> Another question, concerning how to run Solr: do I just have to run java -jar
> start.jar?
> Or do you think I should run it another way?
> 
> 
>> PS: Do you always need to search all data? A trick may be to partition your 
>> data such that say 80% of searches go to a "fresh" index with 10% of the 
>> content, while the remaining searches include everything.
> Yes, I need to search the whole index; even old documents must be searchable.
> 
> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.facebook.com/Cominvent
>> Solr Training - www.solrtraining.com
>> 
>> On 22 May 2012, at 11:06, Bruno Mannina wrote:
>> 
>>> My choice: http://www.ovh.com/fr/serveurs_dedies/eg_best_of.xml
>>> 
>>> 24 GB DDR3
>>> 
>>> On 22/05/2012 10:26, findbestopensource wrote:
>>>> A dedicated server may not be required. If you want to cut down cost, then
>>>> prefer a shared server.
>>>> 
>>>> How much RAM?
>>>> 
>>>> Regards
>>>> Aditya
>>>> www.findbestopensource.com
>>>> 
>>>> 
>>>> On Tue, May 22, 2012 at 12:36 PM, Bruno Mannina<bmann...@free.fr>   wrote:
>>>> 
>>>>> Dear Solr users,
>>>>> 
>>>>> My company would like to use Solr to index around 80 000 000 documents
>>>>> (XML files of around 5~10 KB each).
>>>>> My program (robot) will query this Solr with boolean requests.
>>>>> 
>>>>> Number of users: around 1000
>>>>> Number of requests per user per day: 300
>>>>> Number of users per day: 30
>>>>> 
>>>>> I would like to subscribe to a host provider with this configuration:
>>>>> - Dedicated Server
>>>>> - Ubuntu
>>>>> - Intel Xeon i7 2x 266+ GHz 12 GB 2 * 1500 GB
>>>>> - Unlimited bandwidth
>>>>> - Fixed IP
>>>>> 
>>>>> Do you think this configuration is enough?
>>>>> 
>>>>> Thanks for your info,
>>>>> Sincerely
>>>>> Bruno
>>>>> 
>> 
>> 
> 
