Hi Erick,

You're right, I think. On resources, we gain a little in two areas:
- disk: a production implementation with live data would save about 500 MB of disk on the master and on each slave
- network: somewhat less replication traffic (at present we do a full re-index every 24 hours)
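To put the per-node figure in cluster terms, here is a rough back-of-envelope sketch. The slave count is hypothetical. It also assumes that a full re-index rewrites every index file, so each slave re-fetches the whole index once per day. Neither of these is stated above.

```java
// Back-of-envelope savings from normalising stored fields.
// Assumptions (hypothetical, not from the thread): the slave count, and that a
// full re-index forces every slave to re-fetch the whole index once per re-index.
public class ReplicationSavings {

    static final long MB = 1024L * 1024L;

    /** Disk saved across the cluster: the per-node saving on the master plus every slave. */
    static long totalDiskSavedBytes(long perNodeSavedBytes, int slaves) {
        return perNodeSavedBytes * (slaves + 1L);
    }

    /** Replication traffic saved per day: each slave pulls the smaller index once per full re-index. */
    static long dailyReplicationSavedBytes(long perNodeSavedBytes, int slaves, int reindexesPerDay) {
        return perNodeSavedBytes * slaves * reindexesPerDay;
    }

    public static void main(String[] args) {
        long perNode = 500 * MB;   // per-node figure quoted above
        int slaves = 4;            // hypothetical cluster size
        int reindexesPerDay = 1;   // "a full re-index every 24 hours"

        System.out.println("disk saved (MB): " + totalDiskSavedBytes(perNode, slaves) / MB);
        System.out.println("replication saved per day (MB): "
                + dailyReplicationSavedBytes(perNode, slaves, reindexesPerDay) / MB);
    }
}
```

Even with several slaves the totals stay modest, which fits the conclusion that the gain is small.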
On design, we gain a little by being able to support searches at different document levels: a destination search or a hotel search can return documents at the "correct" level for that search without needing field collapsing. But in the cold light of day, I don't think we gain a huge amount (leaving aside the replication of a full index).

cheers
lee c

On 23 October 2011 19:05, Erick Erickson <erickerick...@gmail.com> wrote:
> Yes, stored fields are placed verbatim for every doc. But I wonder
> at the utility of trying to share stored information. The stored
> info is put in certain files in the index, see:
> http://lucene.apache.org/java/3_0_2/fileformats.html#file-names
>
> and the files that store data are pretty much irrelevant to searching;
> the data in them is only referenced when assembling the document
> for return. So by adding this complexity you'll be saving a bit
> on file transfers when replicating your index, but not much else.
>
> Is it worth it? If so, why?
>
> Best
> Erick
>
> On Mon, Oct 17, 2011 at 11:07 AM, lee carroll
> <lee.a.carr...@googlemail.com> wrote:
>> Just as a follow-up:
>>
>> It looks like stored fields are stored verbatim for every doc.
>>
>> Hotel index, destination attributes indexed and stored:
>> index size: 131M
>> number of records: 49147
>>
>> Hotel index, destination attributes indexed only:
>> index size: 111M
>> number of records: 49147
>>
>> ~400 chars (bytes) of destination data * 49147 (number of hotel docs) = ~19M
>>
>> Basically everything is being stored.
>>
>> No difference in time to index (very rough and not scientific :-) )
>>
>> So it does seem an OK strategy to denormalise docs with indexed fields
>> but normalise stored fields?
>> Or have I missed some problems with this?
>>
>> cheers lee c
>>
>> On 16 October 2011 11:54, lee carroll <lee.a.carr...@googlemail.com> wrote:
>>> Hi Chris, thanks for the response.
>>>
>>>> It's an inverted index, so *terms* exist once (per segment) and those terms
>>>> "point" to the documents -- so having the same terms (in the same fields)
>>>> for multiple types of documents in one index is going to take up less
>>>> overall space than having distinct collections for each type of document.
>>>
>>> I'm not asking about the indexed terms but rather the stored values.
>>> By having two doc types, are we gaining anything by "storing"
>>> attributes only for that doc type?
>>>
>>> cheers lee c
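The distinction the thread keeps returning to (indexed terms kept once and pointing at documents, versus stored values copied per document) can be shown with a toy model. This is a minimal sketch and not Lucene's real file format: the class, the whitespace tokenising, and the sample destination text are all hypothetical, and only the 400-byte and 49147-doc figures come from the thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model (hypothetical, not Lucene's format) of indexed terms vs stored values. */
public class StoredVsIndexed {

    /** term -> ids of documents containing it; each distinct term appears once as a key. */
    final Map<String, List<Integer>> postings = new TreeMap<>();
    /** doc id -> stored value, copied verbatim for every document ("" when not stored). */
    final List<String> stored = new ArrayList<>();

    /** Indexes the lower-cased, whitespace-separated terms of text; stores text only if asked. */
    int addDoc(String text, boolean store) {
        int id = stored.size();
        stored.add(store ? text : "");
        for (String term : text.toLowerCase().split("\\s+")) {
            List<Integer> docs = postings.computeIfAbsent(term, t -> new ArrayList<>());
            // record each document at most once per term
            if (docs.isEmpty() || docs.get(docs.size() - 1) != id) {
                docs.add(id);
            }
        }
        return id;
    }

    int distinctTerms() {
        return postings.size();
    }

    long storedBytes() {
        long total = 0;
        for (String s : stored) {
            total += s.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    public static void main(String[] args) {
        String destination = "paris france city breaks"; // hypothetical destination attributes
        int hotels = 1000;                                // hypothetical hotel count

        // Denormalised: every hotel doc indexes AND stores the destination text.
        StoredVsIndexed denorm = new StoredVsIndexed();
        for (int i = 0; i < hotels; i++) denorm.addDoc(destination, true);

        // Normalised stored fields: hotels only index the destination text,
        // and a single destination doc stores it once.
        StoredVsIndexed norm = new StoredVsIndexed();
        for (int i = 0; i < hotels; i++) norm.addDoc(destination, false);
        norm.addDoc(destination, true);

        System.out.println("distinct terms: " + denorm.distinctTerms() + " vs " + norm.distinctTerms());
        System.out.println("stored bytes: " + denorm.storedBytes() + " vs " + norm.storedBytes());

        // The thread's estimate of duplicated stored data: ~400 bytes per hotel doc.
        System.out.println("estimated duplicated bytes: " + (400L * 49147L));
    }
}
```

In this model the term dictionary is the same size either way and only the stored values shrink. That matches Erick's point that the saving sits in the stored-field files rather than in anything used for searching. The estimate of 400 × 49147 = 19,658,800 bytes is also in line with the observed 131M − 111M difference.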