Re: Best practices to rebuild index on live system

Shawn Heisey Thu, 11 Nov 2010 23:10:04 -0800


On 11/11/2010 4:45 PM, Robert Gründler wrote:

So far, I can only think of 2 scenarios for rebuilding the index if we need to 
update the schema after the rollout:


1. Create 3 more cores (A1,B1,C1) - Import the data from the database - After 
importing, switch the application to cores A1, B1, C1

This will most likely cause a corrupt index, as in the 1.5 hours of indexing, 
the database might get inserts/updates/deletes.

2. Put the live system in read-only mode and rebuild the index during that 
time. This will ensure data integrity in the index, with the drawback that users 
will not be able to write to the app.

I can tell you how we handle this. The actual build system is more complicated than I have mentioned here, involving replication and error handling, but this is the basic idea. This isn't the only possible approach, but it does work.

I have 6 main static shards and one incremental shard, each on its own machine (Xen VM, actually). Data is distributed by taking the Did value (primary key in the database) and doing a "mod 6" on it; the resulting value is the static shard number.
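A minimal sketch of the "mod 6" distribution described above, in Python. The function and constant names are made up for illustration and are not from the actual build system.

```python
# Number of static shards, as described in the post.
NUM_STATIC_SHARDS = 6


def static_shard_for(did: int) -> int:
    """Return the static shard number (0-5) for a document's Did."""
    return did % NUM_STATIC_SHARDS


if __name__ == "__main__":
    for did in (12, 13, 17):
        print(did, "-> static shard", static_shard_for(did))
```

Because the shard number depends only on the Did, any document can be located again without a lookup table.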

The system tracks two values at all times - minDid and maxDid. The static shards hold documents with Did values <= minDid. The incremental holds those > minDid and <= maxDid. Once an hour, I write the current Did value to an RRD. Once a day, I use that RRD to figure out the Did value corresponding to one week ago (newMinDid). All documents > minDid and <= newMinDid are delta-imported into the static indexes and deleted from the incremental index, and minDid is updated to newMinDid.
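The bookkeeping above could be sketched roughly as follows. This is only an illustration under assumptions: a plain sorted list of (timestamp, Did) samples stands in for the RRD, and all function names are hypothetical.

```python
from bisect import bisect_right

WEEK_SECONDS = 7 * 24 * 3600


def index_for(did, min_did, max_did):
    """Say which kind of index holds a Did, given the two tracked values."""
    if did <= min_did:
        return "static"
    if did <= max_did:
        return "incremental"
    raise ValueError("Did is newer than maxDid")


def did_one_week_ago(samples, now):
    """Return the Did recorded at or before one week ago, or None.

    samples is a list of (unix_timestamp, did) tuples sorted by timestamp,
    standing in here for the hourly values written to the RRD.
    """
    timestamps = [ts for ts, _ in samples]
    pos = bisect_right(timestamps, now - WEEK_SECONDS)
    return samples[pos - 1][1] if pos else None


def daily_roll(min_did, new_min_did):
    """Return ((exclusive_low, inclusive_high), updated_min_did).

    The range covers the documents to delta-import into the static
    indexes and delete from the incremental. It is None if nothing moves.
    """
    if new_min_did is None or new_min_did <= min_did:
        return None, min_did
    return (min_did, new_min_did), new_min_did
```

The lower bound of the returned range is exclusive and the upper bound inclusive, matching "> minDid and <= newMinDid" in the description.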

When it comes time to rebuild, I first rebuild the static indexes in a core named "build", which takes 5-6 hours. When that's done, I rebuild the incremental in its build core, which only takes about 10 minutes. Then on all the machines, I swap the build and live cores. While all the static builds are happening, the incremental continues to get new content, until it too is rebuilt.
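The ordering above (static builds first, then the incremental, then a swap everywhere) could be sketched like this. It only builds a list of steps and the Solr CoreAdmin SWAP URLs; it does not contact any server. The host names and the live core name "live" are assumptions for illustration, and only the core name "build" comes from the description.

```python
from urllib.parse import urlencode


def swap_url(solr_base_url, live_core="live", build_core="build"):
    """Build a CoreAdmin SWAP request URL for one Solr instance."""
    query = urlencode({"action": "SWAP", "core": live_core, "other": build_core})
    return f"{solr_base_url}/admin/cores?{query}"


def rebuild_plan(static_urls, incremental_url):
    """Return the rebuild steps in order as (action, target) tuples."""
    # 1. Rebuild every static shard in its build core (the slow part).
    steps = [("rebuild", url) for url in static_urls]
    # 2. Rebuild the incremental last, so it keeps taking new content meanwhile.
    steps.append(("rebuild", incremental_url))
    # 3. Swap build and live cores on all machines.
    steps += [("swap", swap_url(url)) for url in static_urls + [incremental_url]]
    return steps


if __name__ == "__main__":
    # Hypothetical host names.
    statics = [f"http://static{n}:8983/solr" for n in range(6)]
    for step in rebuild_plan(statics, "http://inc:8983/solr"):
        print(step)
```

Rebuilding the incremental last keeps the gap between its rebuild and the swap down to minutes rather than hours.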


Shawn
