Shawn, the block cache seems to be off-heap according to 
https://lucene.apache.org/solr/guide/7_4/running-solr-on-hdfs.html 

So you have roughly 800GB of index across 4 nodes, which gives about 500M docs and 200GB of
index data per Solr node, and with 5 shards per node about 40GB per shard.
Initially I'd say this is way too much data and too little RAM per node, but it
obviously works thanks to the very small docs you have.
So the first thing I'd try (after doing some analysis of various metrics for your
running cluster) would be to adjust the size of the HDFS block cache following the
instructions from the link above. You'll have 20-25GB available for this, which
is still only about 1/10 of the index size per node.
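
For reference, here is a sketch of what that could look like in the HdfsDirectoryFactory
section of solrconfig.xml. The HDFS URL and the numbers are placeholders to validate
against the guide and your own metrics, not recommendations; by default each slab is
128MB (16384 blocks of 8KB), so around 160 slabs would give roughly 20GB of cache:

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
  <!-- keep the cache enabled and off-heap (direct memory) -->
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
  <!-- slab size = blocksperbank x block size (16384 x 8KB = 128MB by default),
       so slab.count=160 is roughly 20GB of block cache -->
  <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
  <int name="solr.hdfs.blockcache.slab.count">160</int>
  <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
</directoryFactory>

Since the cache lives in direct memory rather than on the heap, remember to raise
-XX:MaxDirectMemorySize in the JVM options to at least the cache size.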

The next step would be to replace the EC2 instances with ones that have more RAM,
increase the block cache further, and see if that helps.

Next I'd enable autowarmCount on the filterCache, find alternatives to the wildcard
query, and more.
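
As a sketch only (the sizes are placeholders to tune for your load, not recommendations),
filterCache warming is configured in solrconfig.xml like this:

<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="64"/>
<!-- autowarmCount re-executes the most recently used filter queries against each
     new searcher; higher values mean warmer caches but slower searcher opening -->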

But all in all, I'd be very very satisfied with those low response times given 
the size of your data.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 23 Aug 2018, at 15:05, Shawn Heisey <apa...@elyograg.org> wrote:
> 
> On 8/23/2018 5:19 AM, zhenyuan wei wrote:
>> Thanks for your detailed answer @Shawn
>> 
>> Yes, I run the query in SolrCloud mode, and my collection has 20 shards;
>> each shard is 30~50GB in size.
>> There are 4 Solr servers, and each Solr JVM uses 6GB; there are also 4 HDFS
>> datanodes, and each datanode JVM uses 2.5GB.
>> There are 4 Linux server hosts as well; each node has 16 cores/32GB RAM/1600GB SSD.
>> 
>> So, in order to search 2 billion docs fast (HDFS shows 787GB), should I
>> turn on autowarm, and how much Solr RAM / how many Solr nodes should I have?
>> Is there a rough formula to budget with?
> 
> There are no generic answers, no rough formulas.  Every install is different 
> and minimum requirements are dependent on the specifics of the install.
> 
> How many replicas do you have of each of those 20 shards? Is the 787GB of 
> data the size of *one* replica, or the size of *all* replicas?  Based on the 
> info you shared, I suspect that it's the size of one replica.
> 
> Here's a guide I've written:
> 
> https://wiki.apache.org/solr/SolrPerformanceProblems
> 
> That guide doesn't consider HDFS, so the info about the OS disk cache on that 
> page is probably not helpful.  I really have no idea what requirements HDFS 
> has.  I *think* that the HDFS client block cache would replace the OS disk 
> cache, and that the Solr heap must be increased to accommodate that block 
> cache.  This might lead to GC issues, though, because ideally the cache would 
> be large enough to cache all of the index data that the Solr instance is 
> accessing.  In your case, that's a LOT of data, far more than you can fit 
> into the 32GB total system memory. Solr performance will suffer if you're not 
> able to have the system cache Solr's index data.  But I will tell you that 
> achieving a QTime of 125 on a wildcard query against a 2 billion document 
> index is impressive, not something I would expect to happen with the low 
> hardware resources you're using.
> 
> You have 20 shards.  If your replicationFactor is 3, then ideally you would 
> have 60 servers - one for each shard replica. Each server would have enough 
> memory installed that it could cache the 30-50GB of data in that shard, or at 
> least MOST of it.
> 
> IMHO, Solr should be using local storage, not a network filesystem like HDFS. 
>  Things are a lot more straightforward that way.
> 
> Thanks,
> Shawn
> 
