Re: Questions about stored fields and updates.

Ash Ramesh Sun, 04 Nov 2018 17:10:12 -0800

Sorry Shawn,

I seem to have gotten my wording wrong. What I meant is that we want to
move away from atomic updates and instead replace (reindex) the whole
document whenever it changes:
https://lucene.apache.org/solr/guide/7_5/uploading-data-with-index-handlers.html#adding-documents
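
To make the distinction concrete, here is a rough sketch of the two JSON
request bodies involved. It only builds the payloads and does not talk to
Solr. The helper names and the example fields ("title", "width") are made up
for illustration, and "id" is assumed to be the uniqueKey.

```python
import json


def full_replace_payload(doc):
    """Body for re-adding a whole document.

    Solr overwrites any existing document that has the same uniqueKey,
    so every field we still want must be present in `doc`.
    """
    if "id" not in doc:
        raise ValueError("document needs its uniqueKey (assumed to be 'id')")
    return json.dumps([doc])


def atomic_update_payload(doc_id, changes):
    """Body for an atomic update.

    Only the changed fields are sent, each wrapped in a {"set": value}
    modifier; Solr rebuilds the rest of the document from stored data.
    """
    doc = {"id": doc_id}
    for field, value in changes.items():
        doc[field] = {"set": value}
    return json.dumps([doc])


if __name__ == "__main__":
    # Hypothetical document and change, for illustration only.
    print(full_replace_payload({"id": "42", "title": "Poster", "width": 800}))
    print(atomic_update_payload("42", {"width": 1024}))
```

Either body would be POSTed to the collection's /update handler with
Content-Type: application/json, as described on the page linked above.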


Regards,

Ash

On Mon, Nov 5, 2018 at 11:29 AM Shawn Heisey <apa...@elyograg.org> wrote:

> On 11/3/2018 9:45 PM, Ash Ramesh wrote:
> > My company currently uses SOLR to completely hydrate client objects by
> > storing all fields (stored=true). Therefore we have 2 types of fields:
> >
> >     1. indexed=true | stored=true : For fields that will be used for
> >     searching, sorting, etc.
> >     2. indexed=false | stored=true: For fields that only need hydrating
> >     for clients
> >
> > We are re-architecting this so that we will eventually only get the id
> > from SOLR (fl=id) and hydrate from another data source. This means we can
> > obviously delete all the indexed=false | stored=true fields to reduce our
> > index size.
> >
> > However, when it comes to the indexed=true | stored=true fields, we are
> > not sure whether to also set them to be stored=false and perform in-place
> > updates or leave it as is and perform atomic updates. We've done a fair
> > bit of research on the archives of this mailing list, but are still a bit
> > confused:
> >
> > 1. Will having the fields be converted from indexed=true | stored=true ->
> > indexed=true | stored=false cause our index size to reduce? Will it also
> > mean that indexing will be less compute expensive due to the compression
> > of stored field logic?
>
> Pretty much anything you change from true to false in the schema will
> reduce index size.
>
> Removal of stored data will not *directly* improve query speed -- stored
> data is not used during the query phase.  It might *indirectly* increase
> query speed by removing data from the OS disk cache, leaving more room
> for inverted index data.
>
> The direct improvement from removing stored data will be during data
> retrieval (after the query itself).  It will also mean there is less
> data to compress, which means that indexing speed might increase.
>
> > 2. Are atomic updates preferred to in-place updates? Obviously if we move
> > to index only fields, then we have to do in-place updates all the time.
> > This isn't an issue for us, but we are a bit concerned about how SOLR's
> > indexing speed will suffer & deleted docs increase. Currently we perform
> > both.
>
> If you change stored to false, you will most likely not be able to do
> atomic updates.  Atomic update functionality has very specific
> requirements:
>
>
> https://lucene.apache.org/solr/guide/7_5/updating-parts-of-documents.html#field-storage
>
> In-place updates have requirements that are even more strict than atomic
> updates -- the field cannot be indexed:
>
>
> https://lucene.apache.org/solr/guide/7_5/updating-parts-of-documents.html#in-place-updates
>
> Thanks,
> Shawn
>
>
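
The per-field requirements on the two pages Shawn links can be summarized
roughly as follows. This is a simplified reading, not Solr's actual
validation logic: it ignores copyField destinations and the _version_ field
requirements, and FieldDef and both helper functions are hypothetical names
used only for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    """Hypothetical stand-in for one <field> entry in the schema."""
    name: str
    indexed: bool
    stored: bool
    doc_values: bool = False
    multi_valued: bool = False
    numeric: bool = False


def supports_in_place(f):
    """In-place updates: non-indexed, non-stored, single-valued numeric docValues."""
    return (
        not f.indexed
        and not f.stored
        and f.doc_values
        and not f.multi_valued
        and f.numeric
    )


def recoverable_for_atomic(f):
    """Atomic updates rebuild the document, so each field must be stored or docValues."""
    return f.stored or f.doc_values


if __name__ == "__main__":
    # Example fields are made up for illustration.
    fields = [
        FieldDef("title", indexed=True, stored=True),
        FieldDef("tags", indexed=True, stored=False),
        FieldDef("popularity", indexed=False, stored=False,
                 doc_values=True, numeric=True),
    ]
    for f in fields:
        print(f.name, supports_in_place(f), recoverable_for_atomic(f))
```

Under this reading, a field switched to indexed=true | stored=false without
docValues qualifies for neither kind of update, which leaves replacing the
whole document as the remaining option.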

