Re: to handle expired documents: collection alias or delete by id query

Derek Poh Sun, 26 Mar 2017 20:43:37 -0700

Hi Tom

The moving alias design is interesting, will explore it.

Regarding themethod of creating the collection on a node for indexingonly and adding replicas of it to other nodes for queryinguponcompletionof indexing.Am I right to say this is used in conjunction with collection alias orthe moving alias you mentioned?



On 3/24/2017 10:23 PM, Tom Evans wrote:

On Thu, Mar 23, 2017 at 6:10 AM, Derek Poh <d...@globalsources.com> wrote:

Hi

I have collections of products. I am doing indexing 3-4 times daily.
Every day there are products that expired and I need to remove them from
these collectionsdaily.

Ican think of 2 ways to do this.
1. using collection aliasto switch between a main and temp collection.
- clear and index the temp collection
- create alias to temp collection.
- clear and index the main collection.
- create alias to main collection.

this way require additional collections.

Another way of doing this is to have a moving alias (not constantly
clearing the "temp" collection). If you reindex daily, your index
would be called "products_YYYYmmdd" with an alias to "products". The
advantage of this is that you can roll back to a previous version of
the index if there are problems, and each index is guaranteed to be
freshly created with no artifacts.

The biggest consideration for me would be how long indexing your full
corpus takes you. If you can do it in a small period of time, then
full indexes would be preferable. If it takes a very long time,
deleting is preferable.

If you are doing a cloud setup, full indexes are even more appealing.
You can create the new collection on a single node (even if sharded;
just place each shard on the same node). This would only place the
indexing cost on that one node, whilst other nodes would be unaffected
by indexing degrading regular query response time. You also don't have
to distribute the documents around the cluster. There is no
distributed indexing in Solr, each replica has to index each document
again, even if it is not the leader.

Once indexing is complete, you can expand the collection by adding
replicas of that shard on other nodes - perhaps even removing it from
the node that did the indexing. We have a node that solely does
indexing, before the collection is queried for anything it is added to
the querying nodes.

You can do this manually, or you can automate it using the collections API.

Cheers

Tom



----------------------

CONFIDENTIALITY NOTICEThis e-mail (including any attachments) may contain confidential and/or privileged information. If you are not the intended recipient or have received this e-mail in error, please inform the sender immediately and delete this e-mail (including any attachments) from your computer, and you must not use, disclose to anyone else or copy this e-mail (including any attachments), whether in whole or in part.

This e-mail and any reply to it may be monitored for security, legal, 
regulatory compliance and/or other appropriate reasons.

Re: to handle expired documents: collection alias or delete by id query

Reply via email to