Ravi, what is the replication configuration on both the master and the slave? Also, could you list the files in the index folder on master and slave before and after the replication?

-Alexander
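For context, the configuration Alexander is asking about is the ReplicationHandler section of solrconfig.xml on each instance. A typical master/slave pair looks roughly like the sketch below; the host, port, and confFiles values are placeholders, not Ravi's actual settings.

  <!-- master solrconfig.xml (illustrative; values are placeholders) -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="replicateAfter">startup</str>
      <str name="confFiles">schema.xml,stopwords.txt</str>
    </lst>
  </requestHandler>

  <!-- slave solrconfig.xml (illustrative; masterUrl is a placeholder) -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/searchcore/replication</str>
      <str name="pollInterval">00:03:00</str>
    </lst>
  </requestHandler>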
On Fri, 2011-05-13 at 18:34 -0400, Ravi Solr wrote:
> Sorry guys, I spoke too soon, I guess. The replication still remains very
> slow even after upgrading to 3.1 and setting compression off. Now I am
> totally clueless. I have tried everything I know of to increase the speed
> of replication, but failed. If anybody has faced the same issue, can you
> please tell me how you solved it?
>
> Ravi Kiran Bhaskar
>
> On Thu, May 12, 2011 at 6:42 PM, Ravi Solr <ravis...@gmail.com> wrote:
> > Thank you Mr. Bell and Mr. Kanarsky. As per your advice, we have moved
> > from 1.4.1 to 3.1 and have made several changes to the configuration.
> > The changes have worked nicely so far, and the replication is finishing
> > within the poll interval instead of backing up. The changes we made are
> > as follows (sketched in solrconfig.xml form below):
> >
> > 1. Increased the mergeFactor from 10 to 15
> > 2. Increased ramBufferSizeMB to 1024
> > 3. Changed lockType to single (previously it was simple)
> > 4. Set maxCommitsToKeep to 1 in the deletionPolicy
> > 5. Set maxPendingDeletes to 0
> > 6. Changed caches from LRUCache to FastLRUCache, since we had hit ratios
> >    well over 75%, to increase warming speed
> > 7. Increased the poll interval to 6 minutes and re-indexed all content
> >
> > Thanks,
> >
> > Ravi Kiran Bhaskar
> >
> > On Wed, May 11, 2011 at 6:00 PM, Alexander Kanarsky
> > <alexan...@trulia.com> wrote:
> >> Ravi,
> >>
> >> if you see what looks like a full replication each time, even when the
> >> master's generation is greater than the slave's, try to watch the index
> >> on both master and slave at the same time to see which files are
> >> getting replicated. You probably need to adjust your merge factor, as
> >> Bill mentioned.
> >>
> >> -Alexander
> >>
> >> On Tue, 2011-05-10 at 12:45 -0400, Ravi Solr wrote:
> >>> Hello Mr. Kanarsky,
> >>> Thank you very much for the detailed explanation, probably the best
> >>> explanation I have found regarding replication. Just to be sure, I
> >>> wanted to test Solr 3.1 to see if it alleviates the problems... I
> >>> don't think it helped. The master index version and generation are
> >>> greater than the slave's, yet the slave still replicates the entire
> >>> index from the master (see the replication admin screen output
> >>> below). Any idea why it would get the whole index every time, even
> >>> in 3.1, or am I misinterpreting the output? However, I must admit
> >>> that 3.1 finished the replication, unlike 1.4.1 which would hang and
> >>> be backed up forever.
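For reference, a minimal sketch of how changes 1-7 above might look in solrconfig.xml, following the usual Solr 3.x layout. This is an illustrative reconstruction, not Ravi's actual configuration; cache sizes, autowarm counts, the master URL, and the exact placement of maxPendingDeletes are assumptions.

  <!-- Illustrative sketch of changes 1-7; not the actual config. -->
  <indexDefaults>
    <mergeFactor>15</mergeFactor>             <!-- 1: raised from 10 -->
    <ramBufferSizeMB>1024</ramBufferSizeMB>   <!-- 2 -->
    <lockType>single</lockType>               <!-- 3: was "simple" -->
  </indexDefaults>

  <mainIndex>
    <maxPendingDeletes>0</maxPendingDeletes>  <!-- 5 -->
    <deletionPolicy class="solr.SolrDeletionPolicy">
      <str name="maxCommitsToKeep">1</str>    <!-- 4 -->
    </deletionPolicy>
  </mainIndex>

  <query>
    <!-- 6: FastLRUCache has cheaper gets, so it warms faster when hit
         ratios are high; sizes here are placeholders -->
    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="256"/>
    <queryResultCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="256"/>
    <documentCache class="solr.FastLRUCache" size="512" initialSize="512"/>
  </query>

  <!-- 7: slave side, poll the master every 6 minutes -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/searchcore/replication</str>
      <str name="pollInterval">00:06:00</str>
    </lst>
  </requestHandler>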
> >>>
> >>> Master: http://masterurl:port/solr-admin/searchcore/replication
> >>> Latest Index Version: null, Generation: null
> >>> Replicatable Index Version: 1296217097572, Generation: 12726
> >>>
> >>> Poll Interval: 00:03:00
> >>>
> >>> Local Index - Index Version: 1296217097569, Generation: 12725
> >>> Location: /data/solr/core/search-data/index
> >>> Size: 944.32 MB
> >>> Times Replicated Since Startup: 148
> >>> Previous Replication Done At: Tue May 10 12:32:42 EDT 2011
> >>> Config Files Replicated At: null
> >>> Config Files Replicated: null
> >>> Times Config Files Replicated Since Startup: null
> >>> Next Replication Cycle At: Tue May 10 12:35:41 EDT 2011
> >>>
> >>> Current Replication Status - Start Time: Tue May 10 12:32:41 EDT 2011
> >>> Files Downloaded: 18 / 108
> >>> Downloaded: 317.48 KB / 436.24 MB [0.0%]
> >>> Downloading File: _ayu.nrm, Downloaded: 4 bytes / 4 bytes [100.0%]
> >>> Time Elapsed: 17s, Estimated Time Remaining: 23902s, Speed: 18.67 KB/s
> >>>
> >>> Thanks,
> >>> Ravi Kiran Bhaskar
> >>>
> >>> On Tue, May 10, 2011 at 4:10 AM, Alexander Kanarsky
> >>> <alexan...@trulia.com> wrote:
> >>> > Ravi,
> >>> >
> >>> > as far as I remember, this is how the replication logic works (see
> >>> > the SnapPuller class, fetchLatestIndex method):
> >>> >
> >>> >> 1. Does the slave get the whole index every time during
> >>> >> replication, or just the delta since the last replication happened?
> >>> >
> >>> > It looks at the index version AND the index generation. If both the
> >>> > slave's version and generation are the same as on the master,
> >>> > nothing gets replicated. If the master's generation is greater than
> >>> > the slave's, the slave fetches the delta files only (even if a
> >>> > partial merge was done on the master) and puts the new files from
> >>> > the master into the same index folder on the slave (either index or
> >>> > index.<timestamp>, see the further explanation). However, if the
> >>> > master's index generation is equal to or less than the slave's, the
> >>> > slave does a full replication by fetching all files of the master's
> >>> > index and placing them into a separate folder on the slave
> >>> > (index.<timestamp>). Then, if the fetch is successful, the slave
> >>> > updates (or creates) the index.properties file and puts there the
> >>> > name of the "current" index folder. The "old" index.<timestamp>
> >>> > folder(s) are kept around in 1.4.x, which was treated as a bug (see
> >>> > SOLR-2156; fixed in 3.1). After this, the slave does a commit or
> >>> > reloads the core, depending on whether the config files were
> >>> > replicated. There is another bug in 1.4.x that fails replication if
> >>> > the slave needs to do a full replication AND the config files were
> >>> > changed; this was also fixed in 3.1 (see SOLR-1983). (A sketch of
> >>> > this decision logic follows right below.)
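Alexander's description boils down to a version/generation comparison. The following Java sketch paraphrases that logic; it is not the actual SnapPuller source, and the helper methods are hypothetical stand-ins for the real file-transfer code.

  // Java paraphrase of the fetch decision described above.
  // NOT the actual SnapPuller.fetchLatestIndex() source.
  class SnapPullerSketch {
      void fetchLatestIndex(long masterVersion, long masterGen,
                            long slaveVersion, long slaveGen) {
          if (masterVersion == slaveVersion && masterGen == slaveGen) {
              return;              // indexes identical: nothing to replicate
          }
          if (masterGen > slaveGen) {
              // Delta: fetch only the files the slave is missing, into the
              // slave's current index dir (index/ or index.<timestamp>/).
              fetchMissingFiles();
          } else {
              // masterGen <= slaveGen: full copy into a fresh
              // index.<timestamp>/ dir, then point index.properties at it.
              fetchAllFilesInto("index." + System.currentTimeMillis());
              updateIndexProperties();
          }
          // Finally commit, or reload the core if config files came over too.
      }

      private void fetchMissingFiles() { /* hypothetical */ }
      private void fetchAllFilesInto(String dir) { /* hypothetical */ }
      private void updateIndexProperties() { /* hypothetical */ }
  }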
> >>> >
> >>> >> 2. If there are a huge number of queries being done on the slave,
> >>> >> will it affect the replication? How can I improve the performance?
> >>> >> (see the replication details at the bottom of the page)
> >>> >
> >>> > From my experience, half of the replication time is spent flushing
> >>> > the transferred data to disk, so the IO impact is important.
> >>> >
> >>> >> 3. Will the segment names be the same on master and slave after
> >>> >> replication? I see that they are different. Is this correct? If it
> >>> >> is correct, how does the slave know what to fetch the next time,
> >>> >> i.e. the delta?
> >>> >
> >>> > They should be the same. The slave fetches the changed files only
> >>> > (see above); also look at the SnapPuller code.
> >>> >
> >>> >> 4. When and why does the index.<timestamp> folder get created? I
> >>> >> see this type of folder getting created only on the slave, and the
> >>> >> slave instance is pointing to it.
> >>> >
> >>> > See above.
> >>> >
> >>> >> 5. Does the replication process copy both the index and
> >>> >> index.<timestamp> folders?
> >>> >
> >>> > The index.<timestamp> folder gets created only if a full replication
> >>> > has happened at least once. Otherwise, the slave will use the index
> >>> > folder.
> >>> >
> >>> >> 6. What happens if the replication kicks off before the previous
> >>> >> invocation has completed? Will the 2nd invocation block, or will it
> >>> >> go through, causing more confusion?
> >>> >
> >>> > There is a lock (snapPullLock in ReplicationHandler) that prevents
> >>> > two replications from running simultaneously. If there is no bug, it
> >>> > should just return silently from the replication call. (I personally
> >>> > never had a problem with this, so it looks like there is no bug :)
> >>> > (A minimal sketch of this behavior follows at the end of this
> >>> > message.)
> >>> >
> >>> >> 7. If I have to prep a new master-slave combination, is it OK to
> >>> >> copy the respective contents into the new master and slave and
> >>> >> start Solr? Or do I have to wipe the new slave and let it replicate
> >>> >> from its new master?
> >>> >
> >>> > If the new master has a different index, the slave will create a new
> >>> > index.<timestamp> folder. There is no need to wipe it.
> >>> >
> >>> >> 8. Doing an 'ls | wc -l' on the index folders of master and slave
> >>> >> gave 194 and 17968 respectively... my slave has a lot of
> >>> >> segments_xxx files. Is this normal?
> >>> >
> >>> > No, it looks like in your case the slave continues to replicate into
> >>> > the same folder for a long period, but the old files are not getting
> >>> > deleted for some reason. Try restarting the slave, or do a core
> >>> > reload on it, to see if the old segments go away.
> >>> >
> >>> > -Alexander
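And a minimal Java sketch of the "second invocation returns silently" behavior from the answer to question 6 above; again a paraphrase, not the actual ReplicationHandler source, with a hypothetical helper.

  import java.util.concurrent.locks.ReentrantLock;

  // Sketch of the non-blocking lock behavior described above.
  class ReplicationLockSketch {
      private final ReentrantLock snapPullLock = new ReentrantLock();

      void doSnapPull() {
          // tryLock() never blocks: if a pull is already running, the
          // second invocation simply returns without doing anything.
          if (!snapPullLock.tryLock()) {
              return;
          }
          try {
              fetchLatestIndexFromMaster();   // hypothetical helper
          } finally {
              snapPullLock.unlock();
          }
      }

      private void fetchLatestIndexFromMaster() { /* hypothetical */ }
  }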