Sorry, I should have given more background. We currently have 3.8 million documents averaging 0.7MB/doc, so we have extremely large shards. We build about 400,000 documents to a shard, resulting in about 200GB/shard. We are also using LVM snapshots to manage a snapshot of the shard, which we serve while we continue to build.

To optimize the building shard of around 200GB we need 400GB of disk space to allow for the 2x size increase during the optimize. Because of the way snapshotting works, the volume containing the snapshot has to be as large as the build volume, i.e. 400GB.

If we could write the optimized build shard elsewhere instead of "in place" we could avoid the need for the serving volume to match the size of the building volume.

We'd like to avoid the need to have 200GB+ hanging around just to optimize.
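To make the arithmetic concrete, here is a minimal sketch of the disk budget using only the figures in this message. The function and variable names are made up for illustration, and the 2x factor is the optimize headroom assumed above, not a measured value.

```python
# Figures from this message; the names are hypothetical, for illustration only.
SHARD_GB = 200          # approximate size of one built shard
OPTIMIZE_FACTOR = 2     # assumed headroom needed to optimize in place


def volumes_in_place(shard_gb, factor=OPTIMIZE_FACTOR):
    """Build and serving volume sizes (GB) when optimizing in place."""
    build_gb = shard_gb * factor
    serve_gb = build_gb  # the snapshot volume has to match the build volume
    return build_gb, serve_gb


def volumes_optimize_elsewhere(shard_gb, factor=OPTIMIZE_FACTOR):
    """Hypothetical sizes (GB) if the optimized shard could be written elsewhere."""
    build_gb = shard_gb * factor
    serve_gb = shard_gb  # serving volume only has to hold the optimized shard
    return build_gb, serve_gb


build_gb, serve_gb = volumes_in_place(SHARD_GB)
_, serve_elsewhere_gb = volumes_optimize_elsewhere(SHARD_GB)
saved_gb = serve_gb - serve_elsewhere_gb
print(f"in place: build={build_gb}GB serve={serve_gb}GB; saved elsewhere: {saved_gb}GB")
```

The 200GB difference on the serving side is the space we would like to stop reserving.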

The responses we got on whether optimize can write "elsewhere" make it clear that's not a solution.

I posted another question to the list a short while ago asking whether mergeFactor=1 would give us a single-segment index that is always optimized, so that we don't have the 2x overhead.

However, running a build with mergeFactor=1 shows that lots of segments still get created and merged, and that the index grows in size but also shrinks somewhat at intervals. It is not clear how big the index is at any given point in time.
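One way to find out would be to sample the on-disk size of the index directory while the build runs. The following is only a rough sketch: the function names, the sampling interval and the example path are assumptions, and it measures nothing beyond the total file size under the directory.

```python
import os
import time


def index_size_bytes(index_dir):
    """Sum the sizes of all files currently under index_dir."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A merge may delete a segment file between listing and stat.
                pass
    return total


def sample_sizes(index_dir, samples, interval_s):
    """Take `samples` size readings, sleeping interval_s seconds between them."""
    readings = []
    for i in range(samples):
        readings.append(index_size_bytes(index_dir))
        if i < samples - 1:
            time.sleep(interval_s)
    return readings


def peak_over_final(readings):
    """Ratio of the largest reading to the last one (the overhead we care about)."""
    if not readings or readings[-1] == 0:
        raise ValueError("need at least one reading and a non-empty final index")
    return max(readings) / readings[-1]


# Example (path is a placeholder):
# readings = sample_sizes("SOLR_HOME/data/index", samples=60, interval_s=60)
# print(peak_over_final(readings))
```

A ratio close to 2 during the build would suggest that mergeFactor=1 does not remove the overhead; sampling can miss short-lived peaks, so the interval matters.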


Chris Hostetter wrote:
: Is it possible to tell Solr or Lucene, when optimizing, to write the files
: that constitute the optimized index to somewhere other than
: SOLR_HOME/data/index or is there something about the optimize that requires
: the final segment to be created in SOLR_HOME/data/index?

        For what purpose?

http://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341




-Hoss

