Re: How large is your solr index?

Joseph Obernberger Wed, 07 Jan 2015 13:31:51 -0800

Thank you, Toke. Yes, the data is indexed throughout the day. We are handling very few searches, probably 50 a day; this is an R&D system. Our HDFS cache, I believe, is too small at 10 GB per shard. This comes out to 20 GB of HDFS cache per physical machine, plus about 10 GB each for the 2 JVMs running the shards. Each of those machines is also running other services, which leaves very little RAM available for the FS cache.
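As a rough illustration of the memory arithmetic above, here is a minimal Python sketch of the per-machine budget. The 2 shards per box, ~10 GB HDFS cache per shard and ~10 GB per JVM come from the message; the function names are hypothetical, and the total RAM and the memory used by the other services are placeholder values, since the message does not state them.

```python
# Rough per-machine memory budget for a box hosting several Solr shard JVMs.
# Figures from the message: 2 shards (JVMs) per box, ~10 GB HDFS block cache
# per shard, ~10 GB heap per JVM. total_ram_gb and other_services_gb are
# HYPOTHETICAL inputs; the message does not give them.

def solr_committed_gb(shards_per_box: int, hdfs_cache_gb: float, heap_gb: float) -> float:
    """Memory pinned by the Solr JVMs: block cache plus heap for each shard."""
    return shards_per_box * (hdfs_cache_gb + heap_gb)

def os_cache_left_gb(total_ram_gb: float, other_services_gb: float,
                     shards_per_box: int = 2, hdfs_cache_gb: float = 10.0,
                     heap_gb: float = 10.0) -> float:
    """RAM left over for the OS page cache (never reported below zero)."""
    used = other_services_gb + solr_committed_gb(shards_per_box, hdfs_cache_gb, heap_gb)
    return max(0.0, total_ram_gb - used)

if __name__ == "__main__":
    print(solr_committed_gb(2, 10.0, 10.0))  # 40.0
    # Hypothetical box: 64 GB RAM, 16 GB used by other services.
    print(os_cache_left_gb(64.0, 16.0))      # 8.0
```

With ~40 GB already committed to the two shard JVMs, whatever the other services use comes straight out of the OS page cache.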


Current parameters for running each shard are:

JAVA_OPTS="-XX:MaxDirectMemorySize=10g -XX:+UseLargePages \
  -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 \
  -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC \
  -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m \
  -XX:CMSFullGCsBeforeCompaction=1 -XX:+UseCMSInitiatingOccupancyOnly \
  -XX:CMSInitiatingOccupancyFraction=70 -XX:CMSTriggerPermRatio=80 \
  -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled \
  -XX:+ParallelRefProcEnabled -XX:+AggressiveOpts -XX:ParallelGCThreads=7 \
  -Xmx10752m"
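To see how these flags translate into a per-JVM memory ceiling, here is a small Python sketch that pulls -Xmx and -XX:MaxDirectMemorySize out of an options string and adds them up. It assumes the HDFS block cache is allocated off-heap (as the 10g direct-memory setting suggests); the helper names are hypothetical, and metaspace, thread stacks and other native overhead are not counted.

```python
import re

# Size suffixes accepted by HotSpot for -Xmx and -XX:MaxDirectMemorySize.
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}

def parse_size(text: str) -> int:
    """Convert a HotSpot size string such as '10g' or '10752m' to bytes."""
    m = re.fullmatch(r"(\d+)([kmgt]?)", text.strip().lower())
    if m is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]

def jvm_memory_ceiling(java_opts: str) -> dict:
    """Heap cap (-Xmx) plus direct-memory cap, in GiB.

    Missing flags are reported as 0; the JVM's own defaults are not modelled,
    and native overhead (metaspace, thread stacks) is ignored.
    """
    heap = re.search(r"-Xmx(\S+)", java_opts)
    direct = re.search(r"-XX:MaxDirectMemorySize=(\S+)", java_opts)
    heap_b = parse_size(heap.group(1)) if heap else 0
    direct_b = parse_size(direct.group(1)) if direct else 0
    gib = 1024**3
    return {"heap_gib": heap_b / gib, "direct_gib": direct_b / gib,
            "total_gib": (heap_b + direct_b) / gib}

if __name__ == "__main__":
    opts = "-XX:MaxDirectMemorySize=10g -XX:+UseLargePages -Xmx10752m"
    print(jvm_memory_ceiling(opts))
    # {'heap_gib': 10.5, 'direct_gib': 10.0, 'total_gib': 20.5}
```

So each shard JVM may claim roughly 20.5 GiB before native overhead, and two of them per box come to about 41 GiB.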

I'd love to try SSDs, but don't have the budget at present to go that route. I'd really like to get the HDFS option to work well, as it reduces system complexity. It seems to me that if our HDFS cluster has enough spindles, performance should be relatively good, as long as the OS can actually do some caching. We will be adding more HDFS nodes in the future, increasing the spindle count and reducing the amount of data stored in Solr. When we redo our Solr Cloud, we will only run one shard per box, and supply more HDFS cache.
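For a sense of how much of each shard the cache can actually hold, here is a Python sketch using the figures from the original post quoted below (2.9 TB across 22 shards) and the 10 GB per-shard cache. Decimal units (1 TB = 1000 GB) are an assumption, and the function names are hypothetical.

```python
# Fraction of each shard's index that fits in its HDFS block cache.
# Inputs from the thread: 2.9 TB total index, 22 shards, 10 GB cache per shard.
# Decimal units (1 TB = 1000 GB) are an ASSUMPTION; the thread does not say.

def per_shard_index_gb(total_tb: float, shards: int) -> float:
    """Average on-disk index size per shard, assuming an even split."""
    return total_tb * 1000 / shards

def cache_coverage(cache_gb: float, total_tb: float, shards: int) -> float:
    """Share of one shard's index that the block cache can hold (capped at 1.0)."""
    return min(1.0, cache_gb / per_shard_index_gb(total_tb, shards))

if __name__ == "__main__":
    print(round(per_shard_index_gb(2.9, 22), 1))            # 131.8
    print(round(cache_coverage(10.0, 2.9, 22) * 100, 1))    # 7.6 (percent)
    # Hypothetical: the same index with the per-shard cache doubled.
    print(round(cache_coverage(20.0, 2.9, 22) * 100, 1))    # 15.2
```

With under 8% of each shard cacheable, most reads still go to the spindles, which is why both adding cache and shrinking the stored data help.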


-Joe

On 1/7/2015 3:50 PM, Toke Eskildsen wrote:

Joseph Obernberger [j...@lovehorsepower.com] wrote:

[HDFS, 9M docs, 2.9TB, 22 shards, 11 bare metal boxes]

A typical query takes about 7 seconds to run, but we also do faceting
and clustering. Those can take 3-5 minutes, depending on
what was queried, but can be as little as 10 seconds. The index contains
about 100 fields.

7 seconds without faceting seems like a long time. I am guessing your 3M daily 
updates are spread throughout the day, instead of being a nightly batch job? 
How many concurrent searches are you handling?
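To put the 3M daily updates in perspective, here is a Python sketch of the average indexing rate they imply. It assumes the updates are spread evenly over 24 hours and routed evenly across the 22 shards; both are simplifications (real traffic is likely burstier), and the function names are hypothetical.

```python
# Average indexing rate implied by ~3M updates per day.
# ASSUMPTIONS: updates are spread evenly over 24 hours and routed evenly
# across shards; real traffic is likely burstier than this.
SECONDS_PER_DAY = 24 * 60 * 60

def avg_updates_per_second(updates_per_day: int) -> float:
    return updates_per_day / SECONDS_PER_DAY

def per_shard_rate(updates_per_day: int, shards: int) -> float:
    """Average update rate seen by one shard."""
    return avg_updates_per_second(updates_per_day) / shards

if __name__ == "__main__":
    print(round(avg_updates_per_second(3_000_000), 1))   # 34.7
    print(round(per_shard_rate(3_000_000, 22), 2))       # 1.58
```

Even a modest steady rate like this keeps the index changing all day, so caches have less chance to stay warm than with a nightly batch.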

We have no experience with HDFS for Solr indexes, but a quick check indicates 
that it is not a good fit for Solr. At least not out of the box: 
http://hbase.apache.org/book.html#perf.hdfs.curr

We did at one point try to use networked storage for our index. That gave us 1/3 of the 
performance of local storage, but of course your mileage will vary. 
As you are looking into ways of improving performance, what about testing the 
performance difference with local storage (SSD of course)?

- Toke Eskildsen
