Short summary:

 * If I could make Solr merge the oldest segments (or the ones
   with the most deleted docs) rather than the smallest
   segments, I think I'd almost never need to "optimize".

 * Can I tell Solr to do this?  Or if not, can someone
   point me in the right direction regarding where I might
   patch it to try this myself?


I have a system where documents are refreshed and/or expired
pretty much in a FIFO manner.  In particular, no document
in the system can live for over 1 month.
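A minimal sketch of the "merge the oldest segments" idea, written in Python only to show the selection logic. A real patch would presumably be a subclass of Lucene's `MergePolicy`; Lucene's log-based policies (such as `LogByteSizeMergePolicy`) group segments by size instead. The `SegmentSummary` record, its fields, `pick_oldest_for_merge`, and the default of 10 are all hypothetical, and the example treats each file's mtime as a stand-in for the newest document's indexing time.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SegmentSummary:
    # Hypothetical per-segment record for illustration; not a Lucene class.
    name: str
    size_bytes: int
    newest_doc: datetime  # indexing time of the segment's most recent document


def pick_oldest_for_merge(segments, count=10):
    """Return the `count` segments whose newest document is oldest.

    With FIFO refresh/expiry these are the segments most likely to be mostly
    deleted, regardless of their size. count=10 is an assumed default.
    """
    return sorted(segments, key=lambda s: s.newest_doc)[:count]


# Example: names, sizes, and mtimes from the listing below (year assumed 2010).
segs = [
    SegmentSummary("_13fv", 5406369697, datetime(2010, 8, 10, 21, 14)),
    SegmentSummary("_u63", 291490823897, datetime(2010, 7, 20, 21, 34)),
    SegmentSummary("_143b", 198581, datetime(2010, 8, 13, 11, 4)),
    SegmentSummary("_xkh", 78251326159, datetime(2010, 7, 29, 18, 15)),
]
print([s.name for s in pick_oldest_for_merge(segs, count=2)])  # ['_u63', '_xkh']
```

A size-based choice over the same four segments would pick `_143b` and `_13fv`, the two smallest, and leave the two large July segments untouched.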

Without frequent optimizes, ISTM my indexes tend to get
bloated with mostly deleted content.  I've attached an ls -lrt
listing below, showing that the two largest segments in my
index are both from July.  A query of
   timestamp:([1999-01-01T00:00:00Z TO 2010-08-01T23:59:59Z])
returns no documents, so it appears to me that the first 2
segments are entirely filled with deleted documents.

I imagine this is not too uncommon a situation -- for example,
a web crawler that periodically re-indexes web pages that
contain some dynamic content.

Perhaps another good criterion would be to merge the segments
with the largest number of deleted documents.  In my case the
result would be the same, but I could imagine non-FIFO,
update-heavy systems where that would work better.
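The deleted-documents criterion can be sketched the same way. This is a hypothetical Python illustration: `SegmentDocCounts`, `max_doc`, `deleted`, and `pick_most_deleted` stand in for the per-segment document and deletion counts Lucene keeps, and the toy numbers are made up. It ranks segments either by absolute deleted count, as suggested above, or by deleted ratio.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass(frozen=True)
class SegmentDocCounts:
    # Hypothetical per-segment counts for illustration; not a Lucene class.
    name: str
    max_doc: int  # documents written to the segment, live or deleted
    deleted: int  # documents marked deleted but still occupying space

    @property
    def deleted_ratio(self) -> float:
        return self.deleted / self.max_doc if self.max_doc else 0.0


def pick_most_deleted(segments, count=10, by_ratio=False):
    """Rank merge candidates by deleted-doc count (default) or deleted ratio."""
    key = attrgetter("deleted_ratio" if by_ratio else "deleted")
    return sorted(segments, key=key, reverse=True)[:count]


# Toy numbers, made up for illustration.
segs = [
    SegmentDocCounts("seg_a", 1_000, 900),
    SegmentDocCounts("seg_b", 100_000, 20_000),
    SegmentDocCounts("seg_c", 50, 50),
]
print([s.name for s in pick_most_deleted(segs, count=1)])                 # ['seg_b']
print([s.name for s in pick_most_deleted(segs, count=1, by_ratio=True)])  # ['seg_c']
```

Ranking by absolute count favors big segments with many dead documents, which in a FIFO index are usually the oldest ones; ranking by ratio instead favors small segments that are almost entirely dead.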




$ ls -lrt *.fdt
-rw-rw-r-- 1 ramayer ramayer 291490823897 Jul 20 21:34 _u63.fdt
-rw-rw-r-- 1 ramayer ramayer  78251326159 Jul 29 18:15 _xkh.fdt
-rw-rw-r-- 1 ramayer ramayer  69295141685 Aug  8 01:29 _10f5.fdt
-rw-rw-r-- 1 ramayer ramayer   5406369697 Aug 10 21:14 _13fv.fdt
-rw-rw-r-- 1 ramayer ramayer  66210508029 Aug 10 21:44 _13g1.fdt
-rw-rw-r-- 1 ramayer ramayer   2001873014 Aug 10 23:05 _13io.fdt
-rw-rw-r-- 1 ramayer ramayer   1578531820 Aug 11 14:10 _13m8.fdt
-rw-rw-r-- 1 ramayer ramayer   2254917604 Aug 12 03:49 _13p3.fdt
-rw-rw-r-- 1 ramayer ramayer   2890967852 Aug 12 06:49 _13s6.fdt
-rw-rw-r-- 1 ramayer ramayer   2820285238 Aug 12 09:49 _13v9.fdt
-rw-rw-r-- 1 ramayer ramayer   2905550377 Aug 12 12:52 _13yc.fdt
-rw-rw-r-- 1 ramayer ramayer   2776837514 Aug 12 15:54 _141f.fdt
-rw-rw-r-- 1 ramayer ramayer    259698816 Aug 12 16:15 _141p.fdt
-rw-rw-r-- 1 ramayer ramayer    290083173 Aug 12 16:34 _1420.fdt
-rw-rw-r-- 1 ramayer ramayer    279500106 Aug 12 16:54 _142b.fdt
-rw-rw-r-- 1 ramayer ramayer    277156197 Aug 12 17:17 _142m.fdt
-rw-rw-r-- 1 ramayer ramayer     91360010 Aug 13 00:27 _142x.fdt
-rw-rw-r-- 1 ramayer ramayer      7351514 Aug 13 00:37 _142y.fdt
-rw-rw-r-- 1 ramayer ramayer         7286 Aug 13 00:38 _142z.fdt
-rw-rw-r-- 1 ramayer ramayer           21 Aug 13 01:07 _1430.fdt
-rw-rw-r-- 1 ramayer ramayer           21 Aug 13 02:07 _1431.fdt
-rw-rw-r-- 1 ramayer ramayer           21 Aug 13 03:07 _1432.fdt
-rw-rw-r-- 1 ramayer ramayer           21 Aug 13 04:07 _1433.fdt
-rw-rw-r-- 1 ramayer ramayer      2388369 Aug 13 04:35 _1434.fdt
-rw-rw-r-- 1 ramayer ramayer           21 Aug 13 05:07 _1435.fdt
-rw-rw-r-- 1 ramayer ramayer           21 Aug 13 06:07 _1436.fdt
-rw-rw-r-- 1 ramayer ramayer           21 Aug 13 07:07 _1437.fdt
-rw-rw-r-- 1 ramayer ramayer           21 Aug 13 08:07 _1438.fdt
-rw-rw-r-- 1 ramayer ramayer           21 Aug 13 09:07 _1439.fdt
-rw-rw-r-- 1 ramayer ramayer           21 Aug 13 10:07 _143a.fdt
-rw-rw-r-- 1 ramayer ramayer       198581 Aug 13 11:04 _143b.fdt
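To put a number on the bloat, a small sketch can total the sizes in the listing above and the share held by segments last written before 2010-08-02, which the empty timestamp query implies contain only deleted documents. Assumptions: the year (ls omits it) is 2010, inferred from the query; each file's mtime is when its segment was written; and `timestamp` records indexing time. `.fdt` files hold only stored-field data, so this measures stored fields, not the whole index. `parse_mtime` and `dead_bytes` are hypothetical helpers.

```python
from datetime import datetime

# (segment, .fdt size in bytes, mtime) transcribed from the `ls -lrt *.fdt`
# listing above. The year is not shown by ls; 2010 is assumed from the query.
LISTING = [
    ("_u63", 291490823897, "Jul 20 21:34"),
    ("_xkh", 78251326159, "Jul 29 18:15"),
    ("_10f5", 69295141685, "Aug 8 01:29"),
    ("_13fv", 5406369697, "Aug 10 21:14"),
    ("_13g1", 66210508029, "Aug 10 21:44"),
    ("_13io", 2001873014, "Aug 10 23:05"),
    ("_13m8", 1578531820, "Aug 11 14:10"),
    ("_13p3", 2254917604, "Aug 12 03:49"),
    ("_13s6", 2890967852, "Aug 12 06:49"),
    ("_13v9", 2820285238, "Aug 12 09:49"),
    ("_13yc", 2905550377, "Aug 12 12:52"),
    ("_141f", 2776837514, "Aug 12 15:54"),
    ("_141p", 259698816, "Aug 12 16:15"),
    ("_1420", 290083173, "Aug 12 16:34"),
    ("_142b", 279500106, "Aug 12 16:54"),
    ("_142m", 277156197, "Aug 12 17:17"),
    ("_142x", 91360010, "Aug 13 00:27"),
    ("_142y", 7351514, "Aug 13 00:37"),
    ("_142z", 7286, "Aug 13 00:38"),
    ("_1430", 21, "Aug 13 01:07"),
    ("_1431", 21, "Aug 13 02:07"),
    ("_1432", 21, "Aug 13 03:07"),
    ("_1433", 21, "Aug 13 04:07"),
    ("_1434", 2388369, "Aug 13 04:35"),
    ("_1435", 21, "Aug 13 05:07"),
    ("_1436", 21, "Aug 13 06:07"),
    ("_1437", 21, "Aug 13 07:07"),
    ("_1438", 21, "Aug 13 08:07"),
    ("_1439", 21, "Aug 13 09:07"),
    ("_143a", 21, "Aug 13 10:07"),
    ("_143b", 198581, "Aug 13 11:04"),
]


def parse_mtime(text, year=2010):
    """Parse an ls-style 'Mon D HH:MM' stamp, supplying the assumed year."""
    return datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M")


def dead_bytes(listing, cutoff):
    """Return (bytes in segments last written before `cutoff`, total bytes).

    A segment written before the cutoff can only hold documents indexed
    before it; assuming `timestamp` is the indexing time, an empty timestamp
    query over that range means every document in such a segment is deleted.
    """
    dead = sum(size for _, size, mtime in listing if parse_mtime(mtime) < cutoff)
    total = sum(size for _, size, _ in listing)
    return dead, total


dead, total = dead_bytes(LISTING, cutoff=datetime(2010, 8, 2))
print(f"{dead / 1e9:.1f} GB of {total / 1e9:.1f} GB ({dead / total:.1%})")
# 369.7 GB of 529.1 GB (69.9%)
```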
