Sorry Shawn, I seem to have worded that wrong. I meant that we want to move away from atomic updates and go back to replacing (reindexing) the entire document whenever changes are made: https://lucene.apache.org/solr/guide/7_5/uploading-data-with-index-handlers.html#adding-documents
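To make the difference concrete, here is a minimal Python sketch of the two JSON bodies one might POST to a collection's /update handler. The first is a full replacement, which sends every field and overwrites the existing document with the same id. The second is an atomic update, which sends only the id plus a "set" modifier for each changed field. It assumes the uniqueKey field is called "id". The helper names and the field names (title, price) are hypothetical. The sketch only builds payloads and does not talk to Solr.

```python
import json


def full_replacement_payload(doc: dict) -> str:
    """Build a JSON update body that re-adds the whole document.

    With the default overwrite=true, Solr replaces any existing document
    that has the same uniqueKey (assumed here to be "id").
    """
    if "id" not in doc:
        raise ValueError("document must include its uniqueKey ('id' assumed)")
    return json.dumps([doc])


def atomic_update_payload(doc_id: str, changes: dict) -> str:
    """Build a JSON update body that sets only the changed fields.

    Each changed field is wrapped in a {"set": value} modifier. Solr fills
    in the remaining fields from the existing stored/docValues data.
    """
    update = {"id": doc_id}
    for field, value in changes.items():
        update[field] = {"set": value}
    return json.dumps([update])


if __name__ == "__main__":
    print(full_replacement_payload({"id": "doc1", "title": "New title", "price": 10}))
    print(atomic_update_payload("doc1", {"title": "New title"}))
```

Under the hood, an atomic update still reindexes the whole document. Either approach therefore leaves the old version marked as deleted until segment merges reclaim it. Only in-place updates avoid that.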
Regards,
Ash

On Mon, Nov 5, 2018 at 11:29 AM Shawn Heisey <apa...@elyograg.org> wrote:

> On 11/3/2018 9:45 PM, Ash Ramesh wrote:
> > My company currently uses SOLR to completely hydrate client objects by
> > storing all fields (stored=true). Therefore we have 2 types of fields:
> >
> > 1. indexed=true | stored=true : For fields that will be used for
> > searching, sorting, etc.
> > 2. indexed=false | stored=true: For fields that only need hydrating for
> > clients
> >
> > We are re-architecting this so that we will eventually only get the id from
> > SOLR (fl=id) and hydrate from another data source. This means we can
> > obviously delete all the indexed=false | stored=true fields to reduce our
> > index size.
> >
> > However, when it comes to the indexed=true | stored=true fields, we are not
> > sure whether to also set them to be stored=false and perform in-place
> > updates or leave it as is and perform atomic updates. We've done a fair bit
> > of research on the archives of this mailing list, but are still a bit
> > confused:
> >
> > 1. Will having the fields be converted from indexed=true | stored=true ->
> > indexed=true | stored=false cause our index size to reduce? Will it also
> > mean that indexing will be less compute expensive due to the compression of
> > stored field logic?
>
> Pretty much anything you change from true to false in the schema will
> reduce index size.
>
> Removal of stored data will not *directly* improve query speed -- stored
> data is not used during the query phase. It might *indirectly* increase
> query speed by removing data from the OS disk cache, leaving more room
> for inverted index data.
>
> The direct improvement from removing stored data will be during data
> retrieval (after the query itself). It will also mean there is less
> data to compress, which means that indexing speed might increase.
>
> > 2. Are atomic updates preferred to in-place updates?
> > Obviously if we move to index only fields, then we have to do in-place
> > updates all the time. This isn't an issue for us, but we are a bit
> > concerned about how SOLR's indexing speed will suffer & deleted docs
> > increase. Currently we perform both.
>
> If you change stored to false, you will most likely not be able to do
> atomic updates. Atomic update functionality has very specific
> requirements:
>
> https://lucene.apache.org/solr/guide/7_5/updating-parts-of-documents.html#field-storage
>
> In-place updates have requirements that are even more strict than atomic
> updates -- the field cannot be indexed:
>
> https://lucene.apache.org/solr/guide/7_5/updating-parts-of-documents.html#in-place-updates
>
> Thanks,
> Shawn
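The requirements in the two Reference Guide sections linked above can be summarized as a small check. The Python sketch below is a simplified reading of the Solr 7.5 guide, not Solr's own logic. Atomic updates work only if every field is stored or has docValues. The exception is copyField destinations, which should be stored=false. A field qualifies for in-place updates only if it is a single-valued numeric docValues field that is neither indexed nor stored. FieldDef and the helper functions are hypothetical. NUMERIC_TYPES assumes the point-type names from Solr's default configset, and a real schema may name them differently. The guide also requires the _version_ field and any copyField targets to meet the in-place conditions, and this sketch does not check that.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumed numeric type names (from the default configset); adjust to your schema.
NUMERIC_TYPES = {"pint", "plong", "pfloat", "pdouble"}


@dataclass
class FieldDef:
    """Hypothetical, simplified view of a <field> entry in the schema."""
    name: str
    type: str
    indexed: bool = True
    stored: bool = True
    doc_values: bool = False
    multi_valued: bool = False
    copy_field_dest: bool = False


def preserved_by_atomic_update(field: FieldDef) -> bool:
    """True if Solr can rebuild this field's value during an atomic update."""
    if field.copy_field_dest:
        # copyField destinations are regenerated from their sources,
        # so they should not be stored themselves.
        return not field.stored
    return field.stored or field.doc_values


def schema_supports_atomic_updates(fields: Iterable[FieldDef]) -> bool:
    """Atomic updates are safe only if every field can be rebuilt."""
    return all(preserved_by_atomic_update(f) for f in fields)


def supports_in_place_update(field: FieldDef) -> bool:
    """Non-indexed, non-stored, single-valued numeric docValues field."""
    return (
        not field.indexed
        and not field.stored
        and not field.multi_valued
        and field.doc_values
        and field.type in NUMERIC_TYPES
    )


if __name__ == "__main__":
    title = FieldDef("title", "text_general", indexed=True, stored=False)
    popularity = FieldDef("popularity", "plong", indexed=False, stored=False, doc_values=True)
    print(schema_supports_atomic_updates([title, popularity]))
    print(supports_in_place_update(title))
    print(supports_in_place_update(popularity))
```

With the indexed=true | stored=false layout discussed in this thread, a text field like the hypothetical "title" fails both checks. Its value would be lost on an atomic update, and it is indexed, so in-place updates are ruled out. That leaves full document replacement as the remaining option.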