I am using SolrJ to index documents. I agree with you regarding the index update, but I should not see any deleted documents since it is a fresh index. Can we actually identify what those deleted documents are?
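One way to narrow this down: since an update is a delete plus an insert, every deleted document in a fresh index corresponds to a uniqueKey value that was indexed more than once. A minimal, self-contained sketch of a pre-flight check on a batch of ids (the class and method names are hypothetical, and documents are assumed to be keyed by a String uniqueKey):

```java
import java.util.*;

// Hypothetical pre-flight check: list the uniqueKey values that occur more
// than once in a batch. Each repeated id is treated by Solr as an update,
// i.e. a delete of the old version plus an insert, which is what shows up
// as deleted documents even in a freshly built index.
public class DuplicateIdCheck {
    static List<String> duplicateIds(List<String> ids) {
        // Count occurrences, preserving first-seen order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String id : ids) {
            counts.merge(id, 1, Integer::sum);
        }
        // Keep only ids seen more than once.
        List<String> dups = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                dups.add(e.getKey());
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("doc-1", "doc-2", "doc-1", "doc-3");
        System.out.println(duplicateIds(ids)); // prints [doc-1]
    }
}
```

Running this over the ids you feed to SolrJ before indexing would tell you which documents are being silently overwritten.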
If there is no option of adding shards to an existing collection, I do not like the idea of re-indexing the whole data set (which takes hours). We went with a good number of shards, but there has been a rapid increase in data size over the past few days. Do you think it is worth logging a ticket?

On Sat, Aug 1, 2015 at 5:04 PM, Upayavira <u...@odoko.co.uk> wrote:
>
>
> On Sat, Aug 1, 2015, at 10:30 PM, naga sharathrayapati wrote:
> > I have an exception with one of the documents after indexing 6 million
> > documents out of 10 million. Is there any way I can avoid re-indexing
> > the 6 million documents?
>
> How are you indexing your documents? Are you using the DIH? Personally,
> I'd recommend you write your own app to push your content to Solr; then
> you will be able to control exceptions more precisely and have the
> behaviour you expect.
>
> > I also see that there are a few documents that are deleted (based on
> > the count) while indexing. Is there a way to identify what those
> > documents are?
>
> If you see deleted documents but are not actually deleting any, this
> will be because you have updated documents with an existing ID. An
> update is actually a delete followed by an insert.
>
> > Can I add a shard to a collection without re-indexing?
>
> You cannot just add a new shard to an existing collection (at least, one
> that is using the compositeId router, the default). If a shard is too
> large, you will need to split an existing shard, which you can do with
> the Collections API.
>
> It is much better, though, to start with the right number of shards if
> at all possible.
>
> Upayavira
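For reference, the shard split Upayavira mentions is the SPLITSHARD action of the Collections API. A minimal sketch that just builds the request URL (the host, collection, and shard names are placeholders, not taken from this thread):

```java
// Sketch of a Collections API SPLITSHARD request, expressed as a URL string.
// SPLITSHARD divides an existing shard into two sub-shards covering the same
// hash range, so no re-indexing of the collection is required.
public class SplitShardUrl {
    static String splitShardUrl(String solrBase, String collection, String shard) {
        return solrBase + "/admin/collections?action=SPLITSHARD"
                + "&collection=" + collection
                + "&shard=" + shard;
    }

    public static void main(String[] args) {
        // Placeholder host, collection, and shard names.
        System.out.println(splitShardUrl("http://localhost:8983/solr",
                "mycollection", "shard1"));
    }
}
```

The resulting URL could be issued with curl or an HTTP client; SolrJ also exposes the same action through its collection-admin request classes.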