I am using SolrJ to index documents. I agree with you regarding the index update, but I should not see any deleted documents since it is a fresh index. Can we actually identify what those deleted documents are?
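One way to narrow this down: since an update is a delete plus an insert, every deleted document in a fresh index corresponds to a uniqueKey value that was indexed more than once. A minimal, self-contained sketch of a pre-flight check on a batch of ids (the class and method names are hypothetical, and documents are assumed to be keyed by a String uniqueKey):

```java
import java.util.*;

// Hypothetical pre-flight check: list the uniqueKey values that occur more
// than once in a batch. Each repeated id is treated by Solr as an update,
// i.e. a delete of the old version plus an insert, which is what shows up
// as deleted documents even in a freshly built index.
public class DuplicateIdCheck {
    static List<String> duplicateIds(List<String> ids) {
        // Count occurrences, preserving first-seen order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String id : ids) {
            counts.merge(id, 1, Integer::sum);
        }
        // Keep only ids seen more than once.
        List<String> dups = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                dups.add(e.getKey());
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("doc-1", "doc-2", "doc-1", "doc-3");
        System.out.println(duplicateIds(ids)); // prints [doc-1]
    }
}
```

Running this over the ids you feed to SolrJ before indexing would tell you which documents are being silently overwritten.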
If there is no option of adding shards to an existing collection, I do not like the idea of re-indexing the whole data set (which takes hours). We went with a good number of shards, but there has been a rapid increase in data size over the past few days. Do you think it is worth logging a ticket?

On Sat, Aug 1, 2015 at 5:04 PM, Upayavira <u...@odoko.co.uk> wrote:
>
>
> On Sat, Aug 1, 2015, at 10:30 PM, naga sharathrayapati wrote:
> > I have an exception with one of the documents after indexing 6 million
> > documents out of 10 million. Is there any way I can avoid re-indexing
> > the 6 million documents?
>
> How are you indexing your documents? Are you using the DIH? Personally,
> I'd recommend you write your own app to push your content to Solr; then
> you will be able to control exceptions more precisely and have the
> behaviour you expect.
>
> > I also see that there are a few documents that are deleted (based on
> > the count) while indexing. Is there a way to identify what those
> > documents are?
>
> If you see deleted documents but are not actually deleting any, this
> will be because you have updated documents with an existing ID. An
> update is actually a delete followed by an insert.
>
> > Can I add a shard to a collection without re-indexing?
>
> You cannot just add a new shard to an existing collection (at least, one
> that is using the compositeId router, the default). If a shard is too
> large, you will need to split an existing shard, which you can do with
> the Collections API.
>
> It is much better, though, to start with the right number of shards if
> at all possible.
>
> Upayavira
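For reference, the shard split Upayavira mentions is the SPLITSHARD action of the Collections API. A minimal sketch that just builds the request URL (the host, collection, and shard names are placeholders, not taken from this thread):

```java
// Sketch of a Collections API SPLITSHARD request, expressed as a URL string.
// SPLITSHARD divides an existing shard into two sub-shards covering the same
// hash range, so no re-indexing of the collection is required.
public class SplitShardUrl {
    static String splitShardUrl(String solrBase, String collection, String shard) {
        return solrBase + "/admin/collections?action=SPLITSHARD"
                + "&collection=" + collection
                + "&shard=" + shard;
    }

    public static void main(String[] args) {
        // Placeholder host, collection, and shard names.
        System.out.println(splitShardUrl("http://localhost:8983/solr",
                "mycollection", "shard1"));
    }
}
```

The resulting URL could be issued with curl or an HTTP client; SolrJ also exposes the same action through its collection-admin request classes.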