On 6/13/2013 5:53 PM, Utkarsh Sengar wrote:
> *Problems:*
> The initial training pulls 2000 documents from solr to find the most
> probable matches and calculates score (PMI/NPMI). This query is extremely
> slow. Also, a regular query also takes 3-4 seconds.
> I am running solr currently on just one VM with 12GB RAM and 8GB of Heap
> space is allocated to solr, the block storage is an SSD.

Normally, I would say that you should have as much RAM as your heap size
plus your index size, so that the OS can keep the whole index in its disk
cache.  With your 8GB heap and 15GB index, that adds up to 23GB, so you'd
want 24GB total RAM.  With SSD, that requirement should not be quite so
high, but you might want to try 16GB or more.  Solr works much better on
bare metal than it does on virtual machines.
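
To make that arithmetic concrete, here is a rough Python sketch of the
rule of thumb.  The function names are made up for illustration and the
numbers are just the ones from this thread; treat it as a
back-of-the-envelope estimate, not a sizing tool.

```python
def ideal_ram_gb(heap_gb, index_gb):
    """Rule of thumb: room for the Java heap plus the whole index in the OS disk cache."""
    return heap_gb + index_gb


def disk_cache_gb(total_ram_gb, heap_gb):
    """Upper bound on RAM left for the OS disk cache once the heap is taken out.

    Ignores what the OS and other processes use, so the real figure is lower.
    """
    return max(total_ram_gb - heap_gb, 0)


heap, index, ram = 8, 15, 12  # GB, the figures mentioned in this thread

print("ideal RAM:", ideal_ram_gb(heap, index), "GB")
print("disk cache available now:", disk_cache_gb(ram, heap), "GB")
```

With 12GB of RAM and an 8GB heap, at most 4GB is left to cache a 15GB
index.  The 24GB figure is simply 23GB rounded up.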

I suspect that what might be happening here is that your heap is just a
little bit too small for the combination of your index size (both
document count and disk space), how you use Solr, and your config, so
your JVM is constantly doing garbage collections.

> What is the suggested setup for this usecase?
> My guess is, setting up 4 solr nodes will help, but what is the suggested
> RAM/heap for this kind of data?
> And what are the recommended configuration (solrconfig.xml) where I *need
> to speed up reads*?

http://wiki.apache.org/solr/SolrPerformanceProblems
http://wiki.apache.org/solr/SolrPerformanceFactors

Heap size requirements are hard to predict.  I can tell you that it's
highly unlikely that you will need cache sizes as large as you have
configured.  Start with the defaults and only increase them (by small
amounts) if your hitratio is not high enough.  If increasing the size
doesn't increase hitratio, there may be another problem.
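
For reference, hitratio is just hits divided by lookups.  A small Python
sketch, assuming you copy the lookups and hits counters for one cache from
the admin stats page by hand; the function name and the sample numbers are
invented for illustration:

```python
def hit_ratio(lookups, hits):
    """Fraction of cache lookups that were answered from the cache."""
    if lookups == 0:
        return 0.0  # no traffic yet, nothing to judge
    return hits / lookups


# Invented sample counters, not measurements from any real system.
before = hit_ratio(lookups=1000, hits=620)
after = hit_ratio(lookups=1000, hits=625)

print(f"before resize: {before:.2f}, after resize: {after:.2f}")
```

If the ratio barely moves after a size increase, as in this made-up
sample, a bigger cache is probably not the fix.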

> Also, is there a way I can debug what is going on with solr internally? As
> you can see, my queries are not that complex, so I don't need to debug my
> queries but just debug solr and see the troubled pieces in it.

If you add &debugQuery=true to your URL, Solr will give you a lot of
extra information in the response.  One of the things that would be
important here is seeing how much time is spent in various components.
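
As a sketch of how you might pull those timings out, the Python below
builds a debug URL and ranks components by time spent.  The host and core
name are hypothetical, the sample response is hand-written with invented
numbers, and the layout assumed here (debug, then timing, then
prepare/process, then one entry per component with a time value) is what
I'd expect from wt=json, so check it against your own response.

```python
from urllib.parse import urlencode


def debug_url(base_url, params):
    """Return a select URL with debugQuery=true and JSON output added."""
    query = dict(params, debugQuery="true", wt="json")
    return base_url + "?" + urlencode(query)


def slowest_components(response, phase="process"):
    """List (component, milliseconds) pairs for one timing phase, slowest first."""
    timing = response.get("debug", {}).get("timing", {}).get(phase, {})
    # The phase also carries its own total under "time"; keep only component entries.
    pairs = [(name, entry["time"]) for name, entry in timing.items()
             if isinstance(entry, dict) and "time" in entry]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical host and core name; substitute your own.
print(debug_url("http://localhost:8983/solr/collection1/select", {"q": "*:*"}))

# Hand-written sample shaped like the timing section; the numbers are invented.
sample = {"debug": {"timing": {
    "time": 3200.0,
    "prepare": {"time": 5.0, "query": {"time": 5.0}},
    "process": {"time": 3195.0,
                "query": {"time": 95.0},
                "facet": {"time": 3100.0}},
}}}

for name, ms in slowest_components(sample):
    print(name, ms)
```

Whichever component tops the list is where to look first.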

> Also, I am new to solr, so there anything else which I missed to share
> which would help debug the problem?

Sharing the entire config, schema, examples of all fields from your
indexed documents, and examples of your full queries would help.  You can
use a paste site for this:

http://apaste.info

How often do you index and commit, and how many documents each time?
What is your query rate?

Thanks,
Shawn
