Upon further investigation, I believe this is potentially quite a serious situation for replication slaves.
In SOLR-561, the 'create a new index.<time> folder' concept was introduced mainly, as I understand it, because Windows locks files/folders that are in use. I'm not sure why this is a problem, given that only the Solr process itself is 'using' these files, so the file handles could simply be closed (by terminating fSyncService or similar), the folder deleted, and the process carried on. This is somewhat by-the-by, as the code is out there now.

The real issue that remains is that whenever the slave decides it needs to do a full copy, any existing index folders are left behind. For large indexes and/or long-running slaves, this is a path to disk starvation.

From the admittedly little I know about SnapPuller, I've come up with two possible solutions:

1. Change the inherent behaviour described above, so that only one index folder is ever used (i.e. /index, unless an explicit index.properties is specified).
2. Add an optional parameter that tells SnapPuller to delete all index* folders in dataDir except the new 'live' one.

I've modified SnapPuller in our test environment for Option 2, and it works very well. It takes an optional str parameter in the /replication section of solrconfig.xml:

{{<str name="autoCleanOldIndexes">true</str>}}

This parameter is really only relevant for slaves, but maybe there's a use case for masters. The option is a little 'brute force'-ish, and not as elegant as Option 1, but it does have the advantage of being completely transparent when {{autoCleanOldIndexes}} is not specified.

If the experts in this area feel it is worthwhile, I can create a JIRA issue for this and an associated SnapPuller patch. Comments most welcome.

Thanks,
Peter

On Sat, Apr 3, 2010 at 3:32 PM, Peter Sturge <peter.stu...@googlemail.com> wrote:

> Hi,
>
> I've got a question regarding replication index.<timestamp> folders - can
> anyone help?
>
> Note: There is a somewhat related thread here:
> http://www.lucidimagination.com/search/document/15a740cca17eed56/solr_1_4_replication_index_directories#e4b0af2f321204d7
>
> I have a replica that is pushed fetchindex commands on a periodic basis
> when it's time to replicate (i.e. replication is managed by the server
> application, not by replica polling).
> The master that sends these fetchindex commands tells the replica to
> replicate one of its cores, but which core that is changes over time.
> This has the effect of the replica periodically saying: 'oh, these files
> are totally different, I'll create a brand new index.<timestamp> folder,
> upload the master's files to it and reload'. On its own, this is absolutely
> fine.
> The problem is that any previous index folders are left lying around, i.e.
> not deleted, so eventually (quickly, for large indexes) the replica runs
> out of disk space.
>
> Is there a way to either tell the replica to always 'reuse' the /index
> folder (ideal), regardless of file name/content, or to set its
> deletionPolicy or similar so that it deletes any and all 'old' index.*
> folders and keeps only the current one?
>
> Many thanks,
> Peter
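For what it's worth, here is a minimal sketch of the Option 2 cleanup described above. It uses plain java.io to delete every index* directory under dataDir except the live one; the class and method names (OldIndexCleaner, cleanOldIndexes) are purely illustrative and are not SnapPuller's actual API, and the caller is assumed to already know the live directory name (e.g. from index.properties):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Illustrative sketch only: removes stale index.<timestamp> directories
// left behind by full copies, keeping the one currently in use.
public class OldIndexCleaner {

    /**
     * Delete every "index*" directory directly under dataDir except
     * the one named liveName. Returns the number of directories deleted.
     */
    static int cleanOldIndexes(File dataDir, String liveName) {
        int deleted = 0;
        File[] children = dataDir.listFiles();
        if (children == null) {
            return 0; // dataDir missing or not a directory
        }
        for (File f : children) {
            if (f.isDirectory()
                    && f.getName().startsWith("index")
                    && !f.getName().equals(liveName)) {
                deleteRecursively(f);
                deleted++;
            }
        }
        return deleted;
    }

    // Depth-first delete, since File.delete() only removes empty dirs.
    static void deleteRecursively(File dir) {
        File[] children = dir.listFiles();
        if (children != null) {
            for (File c : children) {
                deleteRecursively(c);
            }
        }
        dir.delete();
    }

    public static void main(String[] args) throws IOException {
        // Simulate a slave dataDir with one live index and two stale ones.
        File dataDir = Files.createTempDirectory("dataDir").toFile();
        new File(dataDir, "index.20100401120000").mkdir();
        new File(dataDir, "index.20100402120000").mkdir();
        new File(dataDir, "index.20100403120000").mkdir(); // live
        int n = cleanOldIndexes(dataDir, "index.20100403120000");
        System.out.println("deleted=" + n); // prints deleted=2
    }
}
```

In the actual SnapPuller patch this would run after the new index is confirmed live (post-reload), so a failed fetch never deletes the only good index.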