Alexander, sorry for the delay in replying. I wanted to test out a few
hunches I had before getting back to you.
Hurray!!!  I was able to resolve the issue. The problem was with the
cache settings in solrconfig.xml. It was taking almost 15-20 minutes
to warm the caches on each commit, and since we are commit heavy
(every 5 minutes) the replication was screaming for the new searcher
to be warmed and never got a chance to finish, so it was perennially
backed up. We reduced the cache sizes and autowarm counts, and now
the replication happily finishes within 20 seconds!! Thank you again
for all your support.
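
For reference, the cache entries in our solrconfig.xml now look
roughly like this (the sizes and counts below are illustrative, not
our exact production values):

  <filterCache class="solr.FastLRUCache" size="512"
               initialSize="512" autowarmCount="64"/>
  <queryResultCache class="solr.FastLRUCache" size="512"
                    initialSize="512" autowarmCount="64"/>
  <documentCache class="solr.FastLRUCache" size="512"
                 initialSize="512"/>

The main point was cutting autowarmCount way down so a new searcher
warms in seconds rather than minutes (no autowarmCount on the
documentCache, since it cannot usefully be autowarmed).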

Thanks,

Ravi Kiran Bhaskar
The Washington Post
1150 15th St. NW
Washington, DC 20071

On Sun, May 15, 2011 at 3:12 AM, Alexander Kanarsky
<alexan...@trulia.com> wrote:
> Ravi,
>
> what is the replication configuration on both master and slave?
> Also, could you list the files in the index folder on master and slave
> before and after the replication?
>
> -Alexander
>
>
> On Fri, 2011-05-13 at 18:34 -0400, Ravi Solr wrote:
>> Sorry guys, spoke too soon I guess. The replication still remains very
>> slow even after upgrading to 3.1 and setting the compression off. Now
>> I am totally clueless. I have tried everything that I know of to
>> increase the speed of replication but failed. If anybody has faced the
>> same issue, can you please tell me how you solved it?
>>
>> Ravi Kiran Bhaskar
>>
>> On Thu, May 12, 2011 at 6:42 PM, Ravi Solr <ravis...@gmail.com> wrote:
>> > Thank you Mr. Bell and Mr. Kanarsky, as per your advice we have moved
>> > from 1.4.1 to 3.1 and made several changes to the configuration. The
>> > configuration changes have worked nicely so far and the replication
>> > is finishing within the interval and not backing up. The changes we
>> > made are as follows (see the config sketch after the list):
>> >
>> > 1. Increased the mergeFactor from 10 to 15
>> > 2. Increased ramBufferSizeMB to 1024
>> > 3. Changed lockType to single (previously it was simple)
>> > 4. Set maxCommitsToKeep to 1 in the deletionPolicy
>> > 5. Set maxPendingDeletes to 0
>> > 6. Changed caches from LRUCache to FastLRUCache to speed up warming,
>> > as we had hit ratios well over 75%
>> > 7. Increased the poll interval to 6 minutes and re-indexed all content.
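>> >
>> > For reference, the relevant solrconfig.xml pieces now look roughly
>> > like this (a sketch of the settings above, not our exact file):
>> >
>> >   <mergeFactor>15</mergeFactor>
>> >   <ramBufferSizeMB>1024</ramBufferSizeMB>
>> >   <lockType>single</lockType>
>> >   <deletionPolicy class="solr.SolrDeletionPolicy">
>> >     <str name="maxCommitsToKeep">1</str>
>> >   </deletionPolicy>
>> >
>> > and on the slave, inside the slave section of the /replication
>> > handler:
>> >
>> >   <str name="pollInterval">00:06:00</str>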
>> >
>> > Thanks,
>> >
>> > Ravi Kiran Bhaskar
>> >
>> > On Wed, May 11, 2011 at 6:00 PM, Alexander Kanarsky
>> > <alexan...@trulia.com> wrote:
>> >> Ravi,
>> >>
>> >> if you see what looks like a full replication each time even if the
>> >> master generation is greater than the slave's, try to watch the index
>> >> on both master and slave at the same time to see what files are
>> >> getting replicated. You may also need to adjust your merge factor, as
>> >> Bill mentioned.
>> >>
>> >> -Alexander
>> >>
>> >>
>> >>
>> >> On Tue, 2011-05-10 at 12:45 -0400, Ravi Solr wrote:
>> >>> Hello Mr. Kanarsky,
>> >>>                 Thank you very much for the detailed explanation,
>> >>> probably the best explanation I have found regarding replication. Just
>> >>> to be sure, I wanted to test solr 3.1 to see if it alleviates the
>> >>> problems... I don't think it helped. The master index version and
>> >>> generation are greater than the slave's, yet the slave still
>> >>> replicates the entire index from the master (see the replication
>> >>> admin screen output below). Any idea why it would get the whole index
>> >>> every time even in 3.1, or am I misinterpreting the output ? However,
>> >>> I must admit that 3.1 finished the replication, unlike 1.4.1 which
>> >>> would hang and be backed up forever.
>> >>>
>> >>> Master        http://masterurl:port/solr-admin/searchcore/replication
>> >>>       Latest Index Version:null, Generation: null
>> >>>       Replicatable Index Version:1296217097572, Generation: 12726
>> >>>
>> >>> Poll Interval         00:03:00
>> >>>
>> >>> Local Index   Index Version: 1296217097569, Generation: 12725
>> >>>
>> >>>       Location: /data/solr/core/search-data/index
>> >>>       Size: 944.32 MB
>> >>>       Times Replicated Since Startup: 148
>> >>>       Previous Replication Done At: Tue May 10 12:32:42 EDT 2011
>> >>>       Config Files Replicated At: null
>> >>>       Config Files Replicated: null
>> >>>       Times Config Files Replicated Since Startup: null
>> >>>       Next Replication Cycle At: Tue May 10 12:35:41 EDT 2011
>> >>>
>> >>> Current Replication Status    Start Time: Tue May 10 12:32:41 EDT 2011
>> >>>       Files Downloaded: 18 / 108
>> >>>       Downloaded: 317.48 KB / 436.24 MB [0.0%]
>> >>>       Downloading File: _ayu.nrm, Downloaded: 4 bytes / 4 bytes [100.0%]
>> >>>       Time Elapsed: 17s, Estimated Time Remaining: 23902s, Speed: 18.67 KB/s
>> >>>
>> >>>
>> >>> Thanks,
>> >>> Ravi Kiran Bhaskar
>> >>>
>> >>> On Tue, May 10, 2011 at 4:10 AM, Alexander Kanarsky
>> >>> <alexan...@trulia.com> wrote:
>> >>> > Ravi,
>> >>> >
>> >>> > as far as I remember, this is how the replication logic works (see
>> >>> > SnapPuller class, fetchLatestIndex method):
>> >>> >
>> >>> >> 1. Does the slave get the whole index every time during replication,
>> >>> >> or just the delta since the last replication ?
>> >>> >
>> >>> >
>> >>> > It looks at the index version AND the index generation. If both the
>> >>> > slave's version and generation are the same as on the master, nothing
>> >>> > gets replicated. If the master's generation is greater than the
>> >>> > slave's, the slave fetches the delta files only (even if a partial
>> >>> > merge was done on the master) and puts the new files from the master
>> >>> > into the same index folder on the slave (either index or
>> >>> > index.<timestamp>, see the further explanation). However, if the
>> >>> > master's index generation is equal to or less than the one on the
>> >>> > slave, the slave does a full replication by fetching all files of the
>> >>> > master's index and placing them into a separate folder on the slave
>> >>> > (index.<timestamp>). Then, if the fetch is successful, the slave
>> >>> > updates (or creates) the index.properties file and puts there the
>> >>> > name of the "current" index folder. The "old" index.<timestamp>
>> >>> > folder(s) will be kept in 1.4.x - which was treated as a bug - see
>> >>> > SOLR-2156 (this was fixed in 3.1). After this, the slave does a
>> >>> > commit or reloads the core, depending on whether the config files
>> >>> > were replicated. There is another bug in 1.4.x that fails replication
>> >>> > if the slave needs to do a full replication AND the config files were
>> >>> > changed - also fixed in 3.1 (see SOLR-1983).
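>> >>> >
>> >>> > In pseudocode, that decision roughly looks like this (my
>> >>> > paraphrase, not the actual SnapPuller source):
>> >>> >
>> >>> >   if (slaveVersion == masterVersion
>> >>> >       && slaveGeneration == masterGeneration) {
>> >>> >       // in sync: nothing to replicate
>> >>> >   } else if (masterGeneration > slaveGeneration) {
>> >>> >       // incremental: fetch only the new/changed files into
>> >>> >       // the slave's current index folder
>> >>> >   } else {
>> >>> >       // full copy: fetch everything into index.<timestamp>,
>> >>> >       // then point index.properties at that folder
>> >>> >   }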
>> >>> >
>> >>> >> 2. If there is a huge number of queries being done on the slave,
>> >>> >> will it affect the replication ? How can I improve the performance ?
>> >>> >> (see the replication details at the bottom of the page)
>> >>> >
>> >>> >
>> >>> > From my experience, half of the replication time is spent flushing
>> >>> > the transferred data to disk, so the IO impact is significant.
>> >>> >
>> >>> >> 3. Will the segment names be the same on master and slave after
>> >>> >> replication ? I see that they are different. Is this correct ? If it
>> >>> >> is correct, how does the slave know what to fetch the next time,
>> >>> >> i.e. the delta ?
>> >>> >
>> >>> >
>> >>> > They should be the same. The slave fetches only the changed files
>> >>> > (see above); also take a look at the SnapPuller code.
>> >>> >
>> >>> >> 4. When and why does the index.<TIMESTAMP> folder get created ? I
>> >>> >> see this type of folder getting created only on the slave, and the
>> >>> >> slave instance is pointing to it.
>> >>> >
>> >>> >
>> >>> > See above.
>> >>> >
>> >>> >> 5. Does the replication process copy both the index and
>> >>> >> index.<TIMESTAMP> folders ?
>> >>> >
>> >>> >
>> >>> > The index.<timestamp> folder gets created only if a full replication
>> >>> > has happened at least once. Otherwise, the slave will keep using the
>> >>> > index folder.
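>> >>> >
>> >>> > (The index.properties file itself is just a one-line pointer to
>> >>> > the active folder, e.g. index=index.20110510123456 - the
>> >>> > timestamp here is made up for illustration.)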
>> >>> >
>> >>> >> 6. What happens if the replication kicks off before the previous
>> >>> >> invocation has completed ? Will the 2nd invocation block, or will it
>> >>> >> go through causing more confusion ?
>> >>> >
>> >>> >
>> >>> > There is a lock (snapPullLock in ReplicationHandler) that prevents
>> >>> > two replications from running simultaneously. If there is no bug, the
>> >>> > second call should just return silently. (I personally never had a
>> >>> > problem with this, so it looks like there is no bug :)
>> >>> >
>> >>> >> 7. If I have to prep a new master-slave combination, is it OK to
>> >>> >> copy the respective contents into the new master and slave and start
>> >>> >> solr ? Or do I have to wipe the new slave and let it replicate from
>> >>> >> its new master ?
>> >>> >
>> >>> >
>> >>> > If the new master has a different index, the slave will create a new
>> >>> > index.<timestamp> folder. There is no need to wipe it.
>> >>> >
>> >>> >> 8. Doing an 'ls | wc -l' on the index folders of master and slave
>> >>> >> gave 194 and 17968 respectively... The slave has a lot of
>> >>> >> segments_xxx files. Is this normal ?
>> >>> >
>> >>> >
>> >>> > No, it looks like in your case the slave keeps replicating into the
>> >>> > same folder over a long period of time but the old files are not
>> >>> > getting deleted for some reason. Try restarting the slave or doing a
>> >>> > core reload on it to see if the old segments are gone.
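>> >>> >
>> >>> > (If your cores are defined in solr.xml, the reload can be
>> >>> > triggered via CoreAdmin, something like
>> >>> > http://slave:port/solr/admin/cores?action=RELOAD&core=searchcore
>> >>> > - adjust the host, port and core name to your setup.)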
>> >>> >
>> >>> > -Alexander
>> >>> >
>> >>> >
>> >>
>> >>
>> >>
>> >
>
>
>
