Re: Questions about stored fields and updates.

Ash Ramesh Sun, 04 Nov 2018 17:11:11 -0800

Also thanks for the information Shawn! :)

On Mon, Nov 5, 2018 at 12:09 PM Ash Ramesh <ash...@canva.com> wrote:


> Sorry Shawn,
>
> I seem to have gotten my wording wrong. I meant that we want to move
> away from atomic updates and instead replace/reindex the entire document
> whenever changes are made.
> https://lucene.apache.org/solr/guide/7_5/uploading-data-with-index-handlers.html#adding-documents
>
> Regards,
>
> Ash
>
> On Mon, Nov 5, 2018 at 11:29 AM Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 11/3/2018 9:45 PM, Ash Ramesh wrote:
>> > My company currently uses Solr to completely hydrate client objects by
>> > storing all fields (stored=true). Therefore we have 2 types of fields:
>> >
>> >     1. indexed=true | stored=true: For fields that will be used for
>> >     searching, sorting, etc.
>> >     2. indexed=false | stored=true: For fields that only need hydrating
>> >     for clients
>> >
>> > We are re-architecting this so that we will eventually only get the id
>> > from Solr (fl=id) and hydrate from another data source. This means we can
>> > obviously delete all the indexed=false | stored=true fields to reduce
>> > our index size.
>> >
>> > However, when it comes to the indexed=true | stored=true fields, we are
>> > not sure whether to also set them to stored=false and perform in-place
>> > updates, or leave them as is and perform atomic updates. We've done a fair
>> > bit of research on the archives of this mailing list, but are still a bit
>> > confused:
>> >
>> > 1. Will converting the fields from indexed=true | stored=true to
>> > indexed=true | stored=false reduce our index size? Will it also make
>> > indexing less compute-intensive, since there is less stored field data
>> > to compress?
>>
>> Pretty much anything you change from true to false in the schema will
>> reduce index size.
>>
>> Removal of stored data will not *directly* improve query speed -- stored
>> data is not used during the query phase.  It might *indirectly* increase
>> query speed because a smaller index takes up less of the OS disk cache,
>> leaving more room for inverted index data.
>>
>> The direct improvement from removing stored data will be during data
>> retrieval (after the query itself).  It will also mean there is less
>> data to compress, which means that indexing speed might increase.
>>
>> > 2. Are atomic updates preferred to in-place updates? Obviously if we
>> > move to index-only fields, then we have to do in-place updates all the
>> > time. This isn't an issue for us, but we are a bit concerned about how
>> > Solr's indexing speed will suffer and deleted docs will increase.
>> > Currently we perform both.
>>
>> If you change stored to false, you will most likely not be able to do
>> atomic updates.  Atomic update functionality has very specific
>> requirements:
>>
>>
>> https://lucene.apache.org/solr/guide/7_5/updating-parts-of-documents.html#field-storage
>>
>> In-place updates have requirements that are even more strict than atomic
>> updates -- the field cannot be indexed or stored, and it must be a
>> single-valued numeric docValues field:
>>
>>
>> https://lucene.apache.org/solr/guide/7_5/updating-parts-of-documents.html#in-place-updates
>>
>> Thanks,
>> Shawn
>>
>>
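
As a rough illustration of the requirements in the two links above, here is a
minimal Python sketch of which update styles a field definition allows. It is
not Solr code: FieldDef and both helper functions are hypothetical names, and
the rules are simplified from the Solr 7.5 reference guide (copyField
destinations and the _version_ field requirements are ignored).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    """Hypothetical stand-in for a schema field definition (not a Solr API)."""
    name: str
    indexed: bool
    stored: bool
    doc_values: bool = False
    multi_valued: bool = False
    numeric: bool = False


def supports_in_place(field: FieldDef) -> bool:
    """In-place updates: non-indexed, non-stored, single-valued numeric docValues."""
    return (
        not field.indexed
        and not field.stored
        and field.doc_values
        and not field.multi_valued
        and field.numeric
    )


def supports_atomic(schema_fields) -> bool:
    """Atomic updates: every field must be retrievable (stored or docValues).

    Simplification: real schemas exempt copyField destinations from this rule.
    """
    return all(f.stored or f.doc_values for f in schema_fields)


if __name__ == "__main__":
    # Hypothetical fields, named only for this example.
    title = FieldDef("title", indexed=True, stored=False)
    popularity = FieldDef("popularity", indexed=False, stored=False,
                          doc_values=True, numeric=True)
    print(supports_in_place(title), supports_in_place(popularity))
    print(supports_atomic([title, popularity]))
```

Under these simplified rules, an indexed=true | stored=false field without
docValues rules out both atomic and in-place updates, which leaves reindexing
the whole document.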

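For comparison, the two kinds of request body discussed in this thread might
look like the following. This is a minimal sketch, assuming the uniqueKey
field is id (as fl=id suggests) and that JSON is sent to the /update handler;
the helper names and the title/popularity fields are made up for illustration.

```python
import json


def full_replace_body(doc: dict) -> str:
    """Body for re-adding a whole document.

    A document with the same id replaces the previously indexed one.
    """
    return json.dumps([doc])


def atomic_update_body(doc_id: str, changes: dict) -> str:
    """Body for an atomic update that sets each field in `changes` to a new value."""
    doc = {"id": doc_id}
    for field, value in changes.items():
        # "set" is the atomic-update modifier that replaces the field value.
        doc[field] = {"set": value}
    return json.dumps([doc])


if __name__ == "__main__":
    # Full replacement: every field must be supplied again by the client.
    print(full_replace_body({"id": "42", "title": "New title", "popularity": 7}))
    # Atomic update: only the changed field is sent.
    print(atomic_update_body("42", {"popularity": 7}))
```

In-place updates use the same set/inc syntax as atomic updates; Solr applies
them in place only when the target field meets the stricter requirements.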
