On 11/3/2018 9:45 PM, Ash Ramesh wrote:
My company currently uses SOLR to completely hydrate client objects by
storing all fields (stored=true). Therefore we have 2 types of fields:

    1. indexed=true | stored=true : For fields that will be used for
    searching, sorting, etc.
    2. indexed=false | stored=true: For fields that only need hydrating for
    clients

We are re-architecting this so that we will eventually only get the id from
SOLR (fl=id) and hydrate from another data source. This means we can
obviously delete all the indexed=false | stored=true fields to reduce our
index size.
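A minimal Python sketch of the id-only pattern described above, using only the standard library. The base URL, the `products` collection, and the dict standing in for "another data source" are all hypothetical. Only `q`, `fl`, `rows`, and `wt` are standard Solr query parameters.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, port and collection for your deployment.
SOLR_SELECT = "http://localhost:8983/solr/products/select"


def build_id_only_query(q: str, rows: int = 10) -> str:
    """Return a /select URL that asks Solr to return only the id field."""
    params = {"q": q, "fl": "id", "rows": rows, "wt": "json"}
    return SOLR_SELECT + "?" + urlencode(params)


def hydrate(solr_response: dict, store: dict) -> list:
    """Look up full objects for the ids Solr returned, keeping Solr's order.

    `store` stands in for the other data source; ids it does not know are skipped.
    """
    ids = [doc["id"] for doc in solr_response["response"]["docs"]]
    return [store[i] for i in ids if i in store]


url = build_id_only_query("title:widget", rows=2)
# A parsed JSON response shaped like Solr's, with made-up ids.
resp = {"response": {"numFound": 2, "start": 0, "docs": [{"id": "b"}, {"id": "a"}]}}
store = {"a": {"id": "a", "title": "Widget A"}, "b": {"id": "b", "title": "Widget B"}}
objects = hydrate(resp, store)  # Widget B first, then Widget A
```

Keeping the ranking order from Solr matters here, because the external store has no idea how relevant each object was.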

However, when it comes to the indexed=true | stored=true fields, we are not
sure whether to also set them to be stored=false and perform in-place
updates or leave it as is and perform atomic updates. We've done a fair bit
of research on the archives of this mailing list, but are still a bit
confused:

1. Will converting the fields from indexed=true | stored=true ->
indexed=true | stored=false reduce our index size? Will it also
make indexing less compute-expensive, since there will be less
stored field data to compress?

Pretty much anything you change from true to false in the schema will reduce index size.

Removal of stored data will not *directly* improve query speed -- stored data is not used during the query phase.  It might *indirectly* increase query speed: with less stored data on disk, more of the OS disk cache is left for the inverted index data that queries actually read.

The direct improvement from removing stored data will be during data retrieval (after the query itself).  It will also mean there is less data to compress, which means that indexing speed might increase.
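If the schema is managed (editable through the Schema API), the stored=false change could be made with a `replace-field` command, sketched below in Python. The collection name and the `title`/`text_general` field are hypothetical. `replace-field` replaces the entire field definition, so every property you want to keep must be repeated, and documents already in the index keep their stored data until they are reindexed. With a classic schema.xml you would edit the file instead.

```python
import json
import urllib.request


def replace_field_payload(name: str, field_type: str, **props) -> dict:
    """Build a Schema API 'replace-field' command.

    replace-field swaps out the whole definition, so list every property
    you still want (indexed, docValues, multiValued, ...) again here.
    """
    definition = {"name": name, "type": field_type}
    definition.update(props)
    return {"replace-field": definition}


def schema_request(base_url: str, collection: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the collection's /schema endpoint."""
    return urllib.request.Request(
        f"{base_url}/solr/{collection}/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical field: keep "title" searchable, stop storing it.
payload = replace_field_payload("title", "text_general", indexed=True, stored=False)
req = schema_request("http://localhost:8983", "products", payload)
# urllib.request.urlopen(req) would send it; the index only shrinks after reindexing.
```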

2. Are atomic updates preferred to in-place updates? Obviously if we move
to indexed-only fields, then we have to do in-place updates all the time.
This isn't an issue for us, but we are a bit concerned about how SOLR's
indexing speed will suffer & deleted docs increase. Currently we perform
both.
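For reference, an atomic update is sent as an ordinary JSON update whose field values are modifier objects such as `set` or `inc`. Solr itself performs the update in place when the updated fields qualify, so the client does not pick one mechanism or the other. A small Python sketch of building such a request body; the helper name, field names, and values are hypothetical.

```python
import json


def atomic_update_doc(doc_id: str, **changes) -> dict:
    """Build one atomic-update document for Solr's JSON /update handler.

    Each keyword maps a field name to a (modifier, value) pair,
    e.g. popularity=("inc", 1) or price=("set", 99).
    """
    doc = {"id": doc_id}
    for field, (op, value) in changes.items():
        doc[field] = {op: value}
    return doc


# The body is a JSON array of documents, POSTed to /solr/<collection>/update.
body = json.dumps([atomic_update_doc("doc1", price=("set", 99), popularity=("inc", 1))])
```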

If you change stored to false, you will most likely not be able to do atomic updates.  Atomic update functionality has very specific requirements:

https://lucene.apache.org/solr/guide/7_5/updating-parts-of-documents.html#field-storage
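A rough Python check of the Field Storage rule at that link: every field must be stored or have docValues, except copyField destinations, which must be stored=false. The schema here is a hypothetical set of plain dicts, not Solr's own format, and the check covers only this storage rule.

```python
def atomic_update_problems(fields: dict, copy_dests: set) -> list:
    """Return the names of fields that break the atomic-update storage rule.

    fields: field name -> properties, e.g. {"stored": True, "docValues": False}
    copy_dests: names of fields that are <copyField/> destinations.
    Missing properties are treated as False, a simplification: in Solr the
    real defaults come from the field type.
    """
    problems = []
    for name, props in fields.items():
        stored = props.get("stored", False)
        doc_values = props.get("docValues", False)
        if name in copy_dests:
            if stored:  # copyField destinations must not be stored
                problems.append(name)
        elif not (stored or doc_values):
            problems.append(name)
    return problems


schema = {
    "id": {"stored": True},
    "title": {"indexed": True, "stored": False},  # indexed-only: breaks the rule
    "price": {"indexed": True, "stored": False, "docValues": True},
    "text_all": {"indexed": True, "stored": False},  # copyField destination
}
print(atomic_update_problems(schema, {"text_all"}))  # ['title']
```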

In-place updates have requirements that are even stricter than atomic updates -- the field cannot be indexed or stored, and it must be a single-valued numeric docValues field:

https://lucene.apache.org/solr/guide/7_5/updating-parts-of-documents.html#in-place-updates
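A similar rough Python check for the in-place conditions at that link. The updated field, the `_version_` field, and any copyField destinations of the field must all be non-indexed, non-stored, single-valued docValues fields. The updated field and its destinations must also be numeric. The dict-based schema and the `numeric` key (standing in for a real field-type check) are hypothetical.

```python
def _docvalues_only(props: dict) -> bool:
    """Non-indexed, non-stored, single-valued docValues field.

    Missing indexed/stored/multiValued are treated as True and missing
    docValues as False, so a field passes only if its definition says so.
    """
    return (
        not props.get("indexed", True)
        and not props.get("stored", True)
        and not props.get("multiValued", True)
        and bool(props.get("docValues", False))
    )


def in_place_eligible(field: str, fields: dict, copy_fields: dict) -> bool:
    """copy_fields maps a source field to its list of copyField destinations."""

    def numeric_dv(name: str) -> bool:
        props = fields[name]
        # "numeric" is a hypothetical flag, not a real Solr field property.
        return _docvalues_only(props) and bool(props.get("numeric", False))

    return (
        numeric_dv(field)
        and _docvalues_only(fields.get("_version_", {}))
        and all(numeric_dv(dest) for dest in copy_fields.get(field, []))
    )


fields = {
    "_version_": {"indexed": False, "stored": False, "multiValued": False, "docValues": True},
    "popularity": {"indexed": False, "stored": False, "multiValued": False,
                   "docValues": True, "numeric": True},
    "price": {"indexed": True, "stored": False, "multiValued": False,
              "docValues": True, "numeric": True},
}
print(in_place_eligible("popularity", fields, {}))  # True
print(in_place_eligible("price", fields, {}))  # False: the field is indexed
```

In other words, a field that is indexed for searching and sorting can never be updated in place, whatever its stored setting.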

Thanks,
Shawn
