Also, thanks for the information, Shawn! :)

On Mon, Nov 5, 2018 at 12:09 PM Ash Ramesh <ash...@canva.com> wrote:
> Sorry Shawn,
>
> I seem to have gotten my wording wrong. I meant that we wanted to move
> away from atomic updates to replacing/reindexing the document entirely
> again when changes are made.
> https://lucene.apache.org/solr/guide/7_5/uploading-data-with-index-handlers.html#adding-documents
>
> Regards,
>
> Ash
>
> On Mon, Nov 5, 2018 at 11:29 AM Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 11/3/2018 9:45 PM, Ash Ramesh wrote:
>> > My company currently uses SOLR to completely hydrate client objects by
>> > storing all fields (stored=true). Therefore we have 2 types of fields:
>> >
>> > 1. indexed=true | stored=true : For fields that will be used for
>> > searching, sorting, etc.
>> > 2. indexed=false | stored=true: For fields that only need hydrating for
>> > clients
>> >
>> > We are re-architecting this so that we will eventually only get the id
>> > from SOLR (fl=id) and hydrate from another data source. This means we can
>> > obviously delete all the indexed=false | stored=true fields to reduce our
>> > index size.
>> >
>> > However, when it comes to the indexed=true | stored=true fields, we are
>> > not sure whether to also set them to be stored=false and perform in-place
>> > updates or leave it as is and perform atomic updates. We've done a fair
>> > bit of research on the archives of this mailing list, but are still a bit
>> > confused:
>> >
>> > 1. Will having the fields be converted from indexed=true | stored=true ->
>> > indexed=true | stored=false cause our index size to reduce? Will it also
>> > mean that indexing will be less compute expensive due to the compression
>> > of stored field logic?
>>
>> Pretty much anything you change from true to false in the schema will
>> reduce index size.
>>
>> Removal of stored data will not *directly* improve query speed -- stored
>> data is not used during the query phase.
>> It might *indirectly* increase query speed by removing data from the OS
>> disk cache, leaving more room for inverted index data.
>>
>> The direct improvement from removing stored data will be during data
>> retrieval (after the query itself). It will also mean there is less
>> data to compress, which means that indexing speed might increase.
>>
>> > 2. Are atomic updates preferred to in-place updates? Obviously if we move
>> > to index only fields, then we have to do in-place updates all the time.
>> > This isn't an issue for us, but we are a bit concerned about how SOLR's
>> > indexing speed will suffer & deleted docs increase. Currently we perform
>> > both.
>>
>> If you change stored to false, you will most likely not be able to do
>> atomic updates. Atomic update functionality has very specific
>> requirements:
>>
>> https://lucene.apache.org/solr/guide/7_5/updating-parts-of-documents.html#field-storage
>>
>> In-place updates have requirements that are even more strict than atomic
>> updates -- the field cannot be indexed:
>>
>> https://lucene.apache.org/solr/guide/7_5/updating-parts-of-documents.html#in-place-updates
>>
>> Thanks,
>> Shawn
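For anyone finding this thread later, here is a minimal Python sketch of the rules discussed above, under stated assumptions. FieldDef and the helper functions are hypothetical (they are not part of Solr or any Solr client library), the uniqueKey field is assumed to be "id", and the field names are made up. It encodes the field-storage requirements from the two 7.5 guide pages linked above: for atomic updates, every field must be stored or have docValues, except copyField destinations, which must not be stored; for in-place updates, the field must be a non-indexed, non-stored, single-valued numeric docValues field. It also shows the JSON bodies for a full-document replace versus an atomic "set" update.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FieldDef:
    """Hypothetical in-memory mirror of a Solr <field> definition (not a Solr API)."""
    name: str
    indexed: bool
    stored: bool
    doc_values: bool = False
    multi_valued: bool = False
    numeric: bool = False
    copy_field_dest: bool = False


def supports_atomic_updates(fields: List[FieldDef]) -> bool:
    """Atomic updates rebuild the whole document from the index, so every field
    must be stored or have docValues; copyField destinations must NOT be stored."""
    for f in fields:
        if f.copy_field_dest:
            if f.stored:
                return False
        elif not (f.stored or f.doc_values):
            return False
    return True


def supports_in_place_update(f: FieldDef) -> bool:
    """In-place updates need a non-indexed, non-stored, single-valued numeric
    docValues field. The guide lists further conditions (e.g. on _version_
    and on copyField targets) that this sketch does not model."""
    return (not f.indexed and not f.stored and f.doc_values
            and not f.multi_valued and f.numeric)


def full_replace_json(doc: Dict) -> str:
    """Plain add: the new document replaces any existing one with the same
    uniqueKey (assumed to be "id"), so no field needs to be stored for this."""
    return json.dumps([doc])


def atomic_set_json(doc_id: str, changes: Dict) -> str:
    """Atomic update: send only the changed fields, each wrapped in {"set": ...}.
    Solr itself decides whether the update can be applied in place."""
    return json.dumps([{"id": doc_id, **{k: {"set": v} for k, v in changes.items()}}])


if __name__ == "__main__":
    schema = [
        FieldDef("id", indexed=True, stored=True),
        FieldDef("title", indexed=True, stored=False),  # index-only field
        FieldDef("popularity", indexed=False, stored=False,
                 doc_values=True, numeric=True),
    ]
    print(supports_atomic_updates(schema))      # False: "title" cannot be rebuilt
    print(supports_in_place_update(schema[2]))  # True
    print(atomic_set_json("doc1", {"popularity": 42}))
    print(full_replace_json({"id": "doc1", "title": "Poster", "popularity": 42}))
```

With full-document replacement, as described at the top of this thread, the client resends every field on each change, so index-only (stored=false) fields are not a problem; the storage rules above only matter if you keep relying on atomic or in-place updates.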