Ravi,

What is the replication configuration on both master and slave?
Also, could you list the files in the index folder on master and slave
before and after the replication?
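
For reference, I'd expect something like the following in solrconfig.xml
on each side (just a sketch - the handler path, masterUrl and poll
interval below are placeholders, not your actual values):

On the master:

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
    </requestHandler>

On the slave:

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://masterurl:port/solr-admin/searchcore/replication</str>
        <str name="pollInterval">00:06:00</str>
      </lst>
    </requestHandler>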

-Alexander


On Fri, 2011-05-13 at 18:34 -0400, Ravi Solr wrote:
> Sorry guys, I spoke too soon I guess. The replication still remains very
> slow even after upgrading to 3.1 and turning compression off. Now
> I am totally clueless. I have tried everything that I know of to
> increase the speed of replication but failed. If anybody has faced the
> same issue, can you please tell me how you solved it?
> 
> Ravi Kiran Bhaskar
> 
> On Thu, May 12, 2011 at 6:42 PM, Ravi Solr <ravis...@gmail.com> wrote:
> > Thank you Mr. Bell and Mr. Kanarsky, as per your advice we have moved
> > from 1.4.1 to 3.1 and have made several changes to the configuration.
> > The configuration changes have worked nicely so far: the replication
> > is finishing within the interval and not backing up. The changes we
> > made are as follows (a rough solrconfig.xml sketch follows the list):
> >
> > 1. Increased the mergeFactor from 10 to 15
> > 2. Increased ramBufferSizeMB to 1024
> > 3. Changed lockType to single (previously it was simple)
> > 4. Set maxCommitsToKeep to 1 in the deletionPolicy
> > 5. Set maxPendingDeletes to 0
> > 6. Changed caches from LRUCache to FastLRUCache to increase warming
> > speed, as we had hit ratios well over 75%
> > 7. Increased the poll interval to 6 minutes and re-indexed all content.
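> >
> > In solrconfig.xml terms, the relevant entries look roughly like the
> > sketch below (a sketch only - element placement, cache names and sizes
> > are illustrative rather than our exact config):
> >
> >     <indexDefaults>
> >       <mergeFactor>15</mergeFactor>
> >       <ramBufferSizeMB>1024</ramBufferSizeMB>
> >       <lockType>single</lockType>
> >       <maxPendingDeletes>0</maxPendingDeletes>
> >     </indexDefaults>
> >
> >     <deletionPolicy class="solr.SolrDeletionPolicy">
> >       <str name="maxCommitsToKeep">1</str>
> >     </deletionPolicy>
> >
> >     <filterCache class="solr.FastLRUCache" size="512"
> >                  initialSize="512" autowarmCount="128"/>
> >
> >     <!-- on the slave, inside the replication handler's slave section -->
> >     <str name="pollInterval">00:06:00</str>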
> >
> > Thanks,
> >
> > Ravi Kiran Bhaskar
> >
> > On Wed, May 11, 2011 at 6:00 PM, Alexander Kanarsky
> > <alexan...@trulia.com> wrote:
> >> Ravi,
> >>
> >> if you see what looks like a full replication each time even though the
> >> master's generation is greater than the slave's, try watching the index
> >> on both master and slave at the same time to see which files are getting
> >> replicated. You may also need to adjust your merge factor, as Bill
> >> mentioned.
> >>
> >> -Alexander
> >>
> >>
> >>
> >> On Tue, 2011-05-10 at 12:45 -0400, Ravi Solr wrote:
> >>> Hello Mr. Kanarsky,
> >>>                 Thank you very much for the detailed explanation,
> >>> probably the best explanation I have found regarding replication. Just
> >>> to be sure, I wanted to test Solr 3.1 to see if it alleviates the
> >>> problems...I don't think it helped. The master index version and
> >>> generation are greater than the slave's, yet the slave still replicates
> >>> the entire index from the master (see the replication admin screen
> >>> output below). Any idea why it would get the whole index every time
> >>> even in 3.1, or am I misinterpreting the output? However, I must admit
> >>> that 3.1 finished the replication, unlike 1.4.1 which would hang and be
> >>> backed up forever.
> >>>
> >>> Master        http://masterurl:port/solr-admin/searchcore/replication
> >>>       Latest Index Version:null, Generation: null
> >>>       Replicatable Index Version:1296217097572, Generation: 12726
> >>>
> >>> Poll Interval         00:03:00
> >>>
> >>> Local Index   Index Version: 1296217097569, Generation: 12725
> >>>
> >>>       Location: /data/solr/core/search-data/index
> >>>       Size: 944.32 MB
> >>>       Times Replicated Since Startup: 148
> >>>       Previous Replication Done At: Tue May 10 12:32:42 EDT 2011
> >>>       Config Files Replicated At: null
> >>>       Config Files Replicated: null
> >>>       Times Config Files Replicated Since Startup: null
> >>>       Next Replication Cycle At: Tue May 10 12:35:41 EDT 2011
> >>>
> >>> Current Replication Status    Start Time: Tue May 10 12:32:41 EDT 2011
> >>>       Files Downloaded: 18 / 108
> >>>       Downloaded: 317.48 KB / 436.24 MB [0.0%]
> >>>       Downloading File: _ayu.nrm, Downloaded: 4 bytes / 4 bytes [100.0%]
> >>>       Time Elapsed: 17s, Estimated Time Remaining: 23902s, Speed: 18.67 KB/s
> >>>
> >>>
> >>> Thanks,
> >>> Ravi Kiran Bhaskar
> >>>
> >>> On Tue, May 10, 2011 at 4:10 AM, Alexander Kanarsky
> >>> <alexan...@trulia.com> wrote:
> >>> > Ravi,
> >>> >
> >>> > as far as I remember, this is how the replication logic works (see
> >>> > SnapPuller class, fetchLatestIndex method):
> >>> >
> >>> >> 1. Does the slave get the whole index every time during replication,
> >>> >> or just the delta since the last replication?
> >>> >
> >>> >
> >>> > It looks at the index version AND the index generation. If both the
> >>> > slave's version and generation are the same as on the master, nothing
> >>> > gets replicated. If the master's generation is greater than the
> >>> > slave's, the slave fetches only the delta files (even if a partial
> >>> > merge was done on the master) and puts the new files from the master
> >>> > into the same index folder on the slave (either index or
> >>> > index.<timestamp>, see further explanation). However, if the master's
> >>> > index generation is equal to or less than the slave's, the slave does
> >>> > a full replication by fetching all files of the master's index and
> >>> > placing them into a separate folder on the slave (index.<timestamp>).
> >>> > Then, if the fetch is successful, the slave updates (or creates) the
> >>> > index.properties file and puts the name of the "current" index folder
> >>> > there (see the sample below). The "old" index.<timestamp> folder(s)
> >>> > were kept around in 1.4.x - this was treated as a bug, see SOLR-2156,
> >>> > and was fixed in 3.1. After this, the slave does a commit or reloads
> >>> > the core, depending on whether the config files were replicated.
> >>> > There is another bug in 1.4.x that fails the replication if the slave
> >>> > needs to do a full replication AND the config files were changed -
> >>> > also fixed in 3.1 (see SOLR-1983).
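> >>> >
> >>> > For illustration, index.properties on the slave just points at the
> >>> > active index folder (the timestamp below is made up):
> >>> >
> >>> >     # index.properties - maintained by the slave's replication
> >>> >     index=index.20110510123241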
> >>> >
> >>> >> 2. If there are a huge number of queries being done on the slave,
> >>> >> will it affect the replication? How can I improve the performance?
> >>> >> (see the replication details at the bottom of the page)
> >>> >
> >>> >
> >>> > From my experience, half of the replication time is spent flushing
> >>> > the transferred data to disk, so the IO impact is important.
> >>> >
> >>> >> 3. Will the segment names be the same on master and slave after
> >>> >> replication? I see that they are different. Is this correct? If it
> >>> >> is correct, how does the slave know what to fetch the next time, i.e.
> >>> >> the delta?
> >>> >
> >>> >
> >>> > They should be the same. The slave fetches the changed files only (see
> >>> > above); also look at the SnapPuller code.
> >>> >
> >>> >> 4. When and why does the index.<TIMESTAMP> folder get created ? I see
> >>> >> this type of folder getting created only on slave and the slave
> >>> >> instance is pointing to it.
> >>> >
> >>> >
> >>> > See above.
> >>> >
> >>> >> 5. Does the replication process copy both the index and
> >>> >> index.<TIMESTAMP> folders?
> >>> >
> >>> >
> >>> > The index.<timestamp> folder gets created only if a full replication
> >>> > has happened at least once. Otherwise, the slave will use the index
> >>> > folder.
> >>> >
> >>> >> 6. What happens if the replication kicks off before the previous
> >>> >> invocation has completed? Will the 2nd invocation block, or will
> >>> >> it go through causing more confusion?
> >>> >
> >>> >
> >>> > There is a lock (snapPullLock in ReplicationHandler) that prevents two
> >>> > replications from running simultaneously. If there is no bug, it should
> >>> > just return silently from the replication call. (I personally never had
> >>> > a problem with this, so it looks like there is no bug :)
> >>> >
> >>> >> 7. If I have to prep a new master-slave combination, is it OK to copy
> >>> >> the respective contents into the new master-slave and start Solr? Or
> >>> >> do I have to wipe the new slave and let it replicate from its new
> >>> >> master?
> >>> >
> >>> >
> >>> > If the new master has a different index, the slave will create a new
> >>> > index.<timestamp> folder. There is no need to wipe it.
> >>> >
> >>> >> 8. Doing an 'ls | wc -l' on the index folder of master and slave gave
> >>> >> 194 and 17968 respectively...the slave has a lot of segments_xxx
> >>> >> files. Is this normal?
> >>> >
> >>> >
> >>> > No, it looks like in your case the slave continues to replicate to the
> >>> > same folder for a long period of time, but the old files are not
> >>> > getting deleted for some reason. Try restarting the slave or doing a
> >>> > core reload on it to see if the old segments are gone.
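> >>> >
> >>> > A core reload can be triggered through the CoreAdmin handler, e.g.
> >>> > (host, port and core name below are placeholders):
> >>> >
> >>> >     http://slavehost:port/solr/admin/cores?action=RELOAD&core=searchcore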
> >>> >
> >>> > -Alexander
> >>> >
> >>> >
> >>
> >>
> >>
> >

