Hi Jan,
Thanks for all these details!
Answers are below.
Sincerely,
Bruno
On 22/05/2012 13:58, Jan Høydahl wrote:
Hi,
It is impossible to guess the required HW size without more knowledge about
data and usage. 80 million docs is a fair amount.
Here's how I would approach sizing the setup:
1) Get your schema in shape, removing unnecessary stored/indexed fields
OK, good idea!
2) Do a test index locally of a part of the dataset, e.g. 10 million docs, and
perform an Optimize
Concerning the test, I currently have only a sample of 12,000 docs, no more :'(
3) Measure the size of the index folder, multiply by 8 to get a clue of total
index size
With 12,000 docs, my index folder size is 33 MB.
ps: I use "solr.clustering.enabled=true"
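As a rough illustration of step 3, the extrapolation could be sketched in Python as below. It assumes the index grows linearly with the number of documents, which a 12,000-doc sample may not reflect well (clustering and stored fields can skew it). The function names are only illustrative and are not part of Solr.

```python
import os

def folder_size_bytes(path):
    """Sum the sizes of all files under the index folder."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

def extrapolate_index_size(sample_size, sample_docs, target_docs):
    """Linear extrapolation (an assumption): size scales with doc count."""
    return sample_size * target_docs / sample_docs

# Figures from this thread: 33 MB for 12,000 docs, 80,000,000 docs in total.
estimate_mb = extrapolate_index_size(33, 12_000, 80_000_000)
print(f"Estimated full index size: {estimate_mb:,.0f} MB")
```

With the numbers above this gives about 220,000 MB, i.e. roughly 220 GB, but a larger sample would give a more trustworthy figure.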
4) Do some benchmarking with realistic types of queries to identify performance
bottlenecks on query side
Yep, that point is for later.
Depending on your requirements for search performance, you can beef up your RAM to
hold the whole index, or accept slow disks as a bottleneck. If you find that the
total size of the index is 16 GB, you should leave >16 GB free for OS disk caching,
e.g. allocate 8 GB to Tomcat/Solr and leave the rest for the OS. If I had to guess,
you will probably find that one server gets overloaded or too slow with your amount of
docs, and that you end up with sharding across 2-4 servers.
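The rule of thumb above can be written down as a small calculation. This is only a sketch under the assumption that you want the whole index to fit in the OS disk cache; the function names are hypothetical, and real sizing still needs the benchmarking from step 4.

```python
import math

def min_total_ram_gb(index_gb, heap_gb):
    """Lower bound on RAM: JVM heap plus enough OS cache to hold the index."""
    return index_gb + heap_gb

def servers_to_cache_index(index_gb, ram_per_server_gb, heap_gb):
    """Servers needed if each one caches its share of the index in RAM."""
    cache_per_server = ram_per_server_gb - heap_gb
    if cache_per_server <= 0:
        raise ValueError("heap leaves no RAM for the OS disk cache")
    return math.ceil(index_gb / cache_per_server)

# Example figures from the paragraph above: 16 GB index, 8 GB heap.
print(min_total_ram_gb(16, 8))            # 24
print(servers_to_cache_index(16, 24, 8))  # 1
```

A larger index with the same 24 GB servers would push the second number up, which is where sharding comes in.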
I will check whether I can easily increase the RAM on the server
(currently 24 GB).
Another question, concerning how to run Solr: do I just have to run java
-jar start.jar?
Or do you think I should run it another way?
PS: Do you always need to search all data? A trick may be to partition your data such
that, say, 80% of searches go to a "fresh" index with 10% of the content, while
the remaining searches include everything.
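One way to picture this partitioning trick is a small query router. The sketch below is an assumption, not a tested setup: the host, the core names "fresh" and "archive", and the function name are hypothetical. It relies only on Solr's distributed-search `shards` request parameter to include the archive when a search needs everything.

```python
from urllib.parse import urlencode

# Hypothetical core locations; adjust to your own hosts and core names.
FRESH = "localhost:8983/solr/fresh"
ARCHIVE = "localhost:8983/solr/archive"

def build_select_url(query, include_archive=False):
    """Build a select URL that hits only the fresh core unless told otherwise."""
    params = {"q": query}
    if include_archive:
        # Distributed search: `shards` lists every core the query should cover.
        params["shards"] = f"{FRESH},{ARCHIVE}"
    return f"http://{FRESH}/select?{urlencode(params)}"

print(build_select_url("title:solr AND year:2012"))
print(build_select_url("title:solr AND year:2012", include_archive=True))
```

If every search really must cover the whole collection, this split brings little benefit and plain sharding is the simpler choice.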
Yes, I need to search the whole index; even old documents must be
searchable.
--
Jan Høydahl, search solution architect
Cominvent AS - www.facebook.com/Cominvent
Solr Training - www.solrtraining.com
On 22 May 2012, at 11:06, Bruno Mannina wrote:
My choice: http://www.ovh.com/fr/serveurs_dedies/eg_best_of.xml
24 GB DDR3
On 22/05/2012 10:26, findbestopensource wrote:
A dedicated server may not be required. If you want to cut down costs, then
prefer a shared server.
How much RAM does it have?
Regards
Aditya
www.findbestopensource.com
On Tue, May 22, 2012 at 12:36 PM, Bruno Mannina <bmann...@free.fr> wrote:
Dear Solr users,
My company would like to use Solr to index around 80,000,000 documents
(XML files of around 5-10 KB each).
My program (robot) will connect to this Solr with boolean queries.
Number of users: around 1000
Number of requests per user per day: 300
Number of users per day: 30
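To put these usage figures in perspective, here is a back-of-the-envelope estimate of the average query rate in Python. It assumes requests are spread evenly over 24 hours, which real traffic will not be, so peaks will be higher; the function name is only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def avg_qps(users_per_day, requests_per_user):
    """Average queries per second, assuming an even spread over the day."""
    return users_per_day * requests_per_user / SECONDS_PER_DAY

typical = avg_qps(30, 300)    # 30 active users -> 9,000 requests/day
worst = avg_qps(1000, 300)    # all 1000 users active -> 300,000 requests/day
print(f"typical: {typical:.2f} qps, worst case: {worst:.2f} qps")
```

This works out to roughly 0.10 queries per second on a typical day and about 3.47 if all 1000 users were active on the same day.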
I would like to subscribe to a host provider with this configuration:
- Dedicated Server
- Ubuntu
- Intel Xeon i7 2x 2.66+ GHz, 12 GB RAM, 2 x 1500 GB disk
- Unlimited bandwidth
- Static IP
Do you think this configuration is enough?
Thanks for your info,
Sincerely
Bruno