Hi, why do you need to change the lockType? Does a readonly instance need locks at all?
thanks, Anders.

On Tue, 14 Sep 2010 15:00:54 +0200, Peter Karich <peat...@yahoo.de> wrote:

> Peter Sturge,
>
> this was a nice hint, thanks again! If you are here in Germany anytime, I
> can invite you to a beer or an Apfelschorle! :-)
> I only needed to change the lockType to none in the solrconfig.xml,
> disable the replication and set the data dir to the master data dir!
>
> Regards,
> Peter Karich.
>
>> Hi Peter,
>>
>> this scenario would be really great for us - I didn't know that this is
>> possible and works, so: thanks!
>> At the moment we are doing something similar by replicating to the
>> read-only instance, but the replication is somewhat lengthy and
>> resource-intensive at this data volume ;-)
>>
>> Regards,
>> Peter.
>>
>>> 1. You can run multiple Solr instances in separate JVMs, with both
>>> having their solr.xml configured to use the same index folder.
>>> You need to be careful that one and only one of these instances will
>>> ever update the index at a time. The best way to ensure this is to use
>>> one instance for writing only, while the other is read-only and never
>>> writes to the index. This read-only instance is the one to tune for
>>> high search performance. Even though the RO instance doesn't write to
>>> the index, it still needs periodic (albeit empty) commits to kick off
>>> autowarming/cache refresh.
>>>
>>> Depending on your needs, you might not need 2 separate instances. We
>>> need them because the 'write' instance is also doing a lot of metadata
>>> pre-write operations in the same JVM as Solr, and so has its own
>>> memory requirements.
>>>
>>> 2. We use sharding all the time, and it works just fine with this
>>> scenario, as the RO instance is simply another shard in the pack.
>>>
>>> On Sun, Sep 12, 2010 at 8:46 PM, Peter Karich <peat...@yahoo.de> wrote:
>>>
>>>> Peter,
>>>>
>>>> thanks a lot for your in-depth explanations!
>>>> Your findings will definitely be helpful for my next performance
>>>> improvement tests :-)
>>>>
>>>> Two questions:
>>>>
>>>> 1. How would I do that:
>>>>
>>>>> or a local read-only instance that reads the same core as the indexing
>>>>> instance (for the latter, you'll need something that periodically
>>>>> refreshes - i.e. runs commit()).
>>>>
>>>> 2. Did you try sharding with your current setup (e.g. one big,
>>>> nearly-static index and a tiny write+read index)?
>>>>
>>>> Regards,
>>>> Peter.
>>>>
>>>>> Hi,
>>>>>
>>>>> Below are some notes regarding Solr cache tuning that should prove
>>>>> useful for anyone who uses Solr with frequent commits (e.g. <5min).
>>>>>
>>>>> Environment:
>>>>> Solr 1.4.1 or branch_3x trunk.
>>>>> Note the 4.x trunk has lots of neat new features, so the notes here
>>>>> are likely less relevant to the 4.x environment.
>>>>>
>>>>> Overview:
>>>>> Our Solr environment makes extensive use of faceting, we perform
>>>>> commits every 30secs, and the indexes tend to be on the large-ish
>>>>> side (>20 million docs).
>>>>> Note: for our data, when we commit, we are always adding new data,
>>>>> never changing existing data.
>>>>> This type of environment can be tricky to tune, as Solr is more
>>>>> geared toward fast reads than frequent writes.
>>>>>
>>>>> Symptoms:
>>>>> If anyone has used faceting in searches where you are also performing
>>>>> frequent commits, you've likely encountered the dreaded OutOfMemory
>>>>> or GC Overhead Exceeded errors.
>>>>> In high commit rate environments, this is almost always due to
>>>>> multiple 'onDeck' searchers and autowarming - i.e. new searchers
>>>>> don't finish autowarming their caches before the next commit() comes
>>>>> along and invalidates them.
>>>>> Once this starts happening on a regular basis, your Solr JVM will
>>>>> likely run out of memory eventually, as the number of searchers (and
>>>>> their cache arrays) will keep growing until the JVM dies of thirst.
>>>>> To check whether your Solr environment is suffering from this, turn
>>>>> on INFO level logging and look for: 'PERFORMANCE WARNING: Overlapping
>>>>> onDeckSearchers=x'.
>>>>>
>>>>> In tests, we've only ever seen this problem when using faceting with
>>>>> facet.method=fc.
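>>>>>
>>>>> Related to that warning: the stock example solrconfig.xml also ships
>>>>> with a cap on how many searchers may warm concurrently, in the
>>>>> <query> section. A sketch of the relevant element (the value here is
>>>>> just the stock example default):
>>>>>
>>>>> <!-- inside <query> in solrconfig.xml; caps concurrent warming searchers -->
>>>>> <maxWarmingSearchers>2</maxWarmingSearchers>
>>>>>
>>>>> With this cap in place, a commit that would push past the limit fails
>>>>> fast instead of piling up warming searchers, so you trade the slow
>>>>> OOM death for a retryable failed commit.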
>>>>>
>>>>> Some solutions to this are:
>>>>> Reduce the commit rate to allow searchers to fully warm before the
>>>>> next commit
>>>>> Reduce or eliminate the autowarming in caches
>>>>> Both of the above
>>>>>
>>>>> The trouble is, if you're doing NRT commits, you likely have a good
>>>>> reason for it, and reducing/eliminating autowarming will very
>>>>> significantly impact search performance in high commit rate
>>>>> environments.
>>>>>
>>>>> Solution:
>>>>> Here are some setup steps we've used that allow lots of faceting (we
>>>>> typically search with at least 20-35 different facet fields, plus
>>>>> date faceting/sorting) on large indexes, while still keeping decent
>>>>> search performance:
>>>>>
>>>>> 1. Firstly, you should consider using the enum method for facet
>>>>> searches (facet.method=enum) unless you've got A LOT of memory on
>>>>> your machine. In our tests, this method uses a lot less memory and
>>>>> autowarms more quickly than fc. (Note, I've not tried the new
>>>>> segment-based 'fcs' option, as I can't find support for it in
>>>>> branch_3x - looks nice for 4.x though.)
>>>>> Admittedly, for our data, enum is not quite as fast for searching as
>>>>> fc, but short of purchasing a Taiwanese RAM factory, it's a
>>>>> worthwhile tradeoff.
>>>>> If you do have access to LOTS of memory, AND you can guarantee that
>>>>> the index won't grow beyond the memory capacity (i.e. you have some
>>>>> sort of deletion policy in place), fc can be a lot faster than enum
>>>>> when searching with lots of facets across many terms.
>>>>>
>>>>> 2. Secondly, we've found that LRUCache is faster at autowarming than
>>>>> FastLRUCache - in our tests, about 20% faster. Maybe this is just our
>>>>> environment - your mileage may vary.
>>>>>
>>>>> So, our filterCache section in solrconfig.xml looks like this:
>>>>> <filterCache
>>>>>   class="solr.LRUCache"
>>>>>   size="3600"
>>>>>   initialSize="1400"
>>>>>   autowarmCount="3600"/>
>>>>>
>>>>> For a 28GB index running in a quad-core x64 VMware instance with 30
>>>>> warmed facet fields, Solr runs at ~4GB. The filterCache size stat
>>>>> usually shows in the region of ~2400.
>>>>>
>>>>> 3. It's also a good idea to have some sort of firstSearcher/newSearcher
>>>>> event listener queries to allow new data to populate the caches (see
>>>>> the sketch after this list).
>>>>> Of course, what you put in these depends on the facets you need/use.
>>>>> We've found a good combination is a firstSearcher with as many facets
>>>>> in the search as your environment can handle, then a subset of the
>>>>> most common facets for the newSearcher.
>>>>>
>>>>> 4. We also set:
>>>>> <useColdSearcher>true</useColdSearcher>
>>>>> just in case.
>>>>>
>>>>> 5. Another key area for search performance with high commits is to
>>>>> use 2 Solr instances - one for the high commit rate indexing, and one
>>>>> for searching.
>>>>> The read-only searching instance can be a remote replica, or a local
>>>>> read-only instance that reads the same core as the indexing instance
>>>>> (for the latter, you'll need something that periodically refreshes -
>>>>> i.e. runs commit()).
>>>>> This way, you can tune the indexing instance for writing performance
>>>>> and the searching instance, as above, for max read performance.
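>>>>>
>>>>> To make step 3 concrete, here's a sketch of the listener entries
>>>>> (they live in the <query> section of solrconfig.xml; the field names
>>>>> are only stand-ins for whatever you actually facet on):
>>>>>
>>>>> <listener event="firstSearcher" class="solr.QuerySenderListener">
>>>>>   <arr name="queries">
>>>>>     <lst>
>>>>>       <str name="q">*:*</str>
>>>>>       <str name="facet">true</str>
>>>>>       <str name="facet.method">enum</str>
>>>>>       <str name="facet.field">category</str>
>>>>>       <str name="facet.field">author</str>
>>>>>       <str name="facet.field">pubdate</str>
>>>>>     </lst>
>>>>>   </arr>
>>>>> </listener>
>>>>> <listener event="newSearcher" class="solr.QuerySenderListener">
>>>>>   <arr name="queries">
>>>>>     <lst>
>>>>>       <str name="q">*:*</str>
>>>>>       <str name="facet">true</str>
>>>>>       <str name="facet.method">enum</str>
>>>>>       <str name="facet.field">category</str>
>>>>>     </lst>
>>>>>   </arr>
>>>>> </listener>
>>>>>
>>>>> The firstSearcher entry carries the wide facet set and the
>>>>> newSearcher entry just the common subset, per the combination
>>>>> described in step 3.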
>>>>>
>>>>> Using the setup above, we get fantastic search speed for small facet
>>>>> sets (well under 1 sec), and really good searching for large facet
>>>>> sets (a couple of secs, depending on index size, number of facets,
>>>>> unique terms, etc.), even when searching against large-ish indexes
>>>>> (>20 million docs).
>>>>> We have yet to see any OOM or GC errors using the techniques above,
>>>>> even in low memory conditions.
>>>>>
>>>>> I hope there are people that find this useful. I know I've spent a
>>>>> lot of time looking for stuff like this, so hopefully this will save
>>>>> someone some time.
>>>>>
>>>>> Peter
>>>>>
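>>>>> P.S. For the 'periodically refreshes - i.e. runs commit()' part of
>>>>> step 5: a minimal sketch, assuming the stock example port and update
>>>>> handler path, is a cron entry on the read-only instance's host that
>>>>> sends an empty commit once a minute:
>>>>>
>>>>> # crontab entry: refresh the RO searcher every minute
>>>>> * * * * * curl -s -H 'Content-Type: text/xml' --data-binary '<commit/>' http://localhost:8983/solr/update > /dev/null
>>>>>
>>>>> The commit adds no documents; it just reopens the searcher so the RO
>>>>> instance sees whatever the write instance has flushed, and kicks off
>>>>> the autowarming described above.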