Dawg,

I have a similar setup, and this is what works for me. I have a field which contains a timestamp. The timestamp is set to be identical for all documents added or updated in a run. When the run is complete and some or many documents have been overwritten, I can easily delete all the documents that were not updated: they still carry a timestamp from a previous run.

Cheers -- Rick
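
P.S. A minimal sketch of the cleanup step, in case it helps. It assumes a date field named run_ts_dt and a core called products on localhost; both names are made up for illustration, so substitute your own. It builds a delete-by-query for Solr's JSON update handler that matches everything stamped before the current run. Treat it as a starting point under those assumptions, not a drop-in script.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Hypothetical names for illustration; use whatever your schema and host use.
RUN_FIELD = "run_ts_dt"
UPDATE_URL = "http://localhost:8983/solr/products/update?commit=true"


def solr_timestamp(ts: datetime) -> str:
    """Format an aware datetime as a Solr date value (UTC with trailing Z)."""
    if ts.tzinfo is None:
        raise ValueError("run timestamp must be timezone-aware")
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def stale_delete_payload(run_ts: datetime, field: str = RUN_FIELD) -> dict:
    """Build a delete-by-query body matching docs stamped before this run.

    The closing '}' makes the upper bound exclusive, so documents written
    in the current run (stamped exactly run_ts) are kept.
    """
    query = f"{field}:[* TO {solr_timestamp(run_ts)}}}"
    return {"delete": {"query": query}}


def send_delete(payload: dict, url: str = UPDATE_URL) -> int:
    """POST the delete to the JSON update handler; returns the HTTP status."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Call send_delete(stale_delete_payload(run_ts)) only after the re-index run has finished, passing the same run_ts that was stamped on every document in that run. Documents with no value in the field are not matched by the range query, so they would need separate handling.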

On October 31, 2017 7:54:22 AM EDT, "Emir Arnautović" <emir.arnauto...@sematext.com> wrote:
>Hi,
>There is a possibility that you ended up with documents with the same
>ID and that you are overwriting documents instead of writing new ones.
>
>In any case, I would suggest you change your approach if you have
>enough disk space to keep two copies of the index:
>1. use an alias to read data from the index instead of the index name
>2. index data into a new index
>3. after verification (e.g. a quick check would be the number of docs), switch the alias to the new index
>4. keep the old index available in case you need to switch back
>5. before building the next index, delete the one from the previous day to free up space
>
>If you have updates during the day, you have to account for that as
>well: stop updating while indexing the new index, update both indices if
>you want to be able to switch back at any point, etc.
>
>HTH,
>Emir
>--
>Monitoring - Log Management - Alerting - Anomaly Detection
>Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>> On 31 Oct 2017, at 11:20, o1webdawg <i...@smartweber.com> wrote:
>>
>> I have an index with about a million documents. It is the backend for a
>> shopping cart system.
>>
>> Sometimes the inventory gets out of sync with Solr and the storefront
>> contains out-of-stock items.
>>
>> So I set up a scheduled task on the server to run at 12am every morning to
>> delete the entire Solr index.
>>
>> Then at 12:04am I run another scheduled task to re-index the SQL database
>> containing the inventory.
>>
>> Well, today I checked it around 4am and only a fraction of the products are in
>> the Solr index.
>>
>> However, it did not seem to be idle, and reloading it showed lots of deleted
>> documents.
>>
>> I open up the core and the deletes keep going up, max docs goes up, but the
>> total docs stays the same.
>>
>> It's really confusing me what is happening at this point and why I am
>> viewing these numbers of docs.
>>
>> My theory is that the 12am delete is still running 5 hours later at the same
>> time as the re-indexing.
>>
>> That's the only way I can explain this really odd behavior with my limited
>> knowledge.
>>
>> Is my theory realistic and could the delete still be running?
>>
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

--
Sorry for being brief. Alternate email is rickleir at yahoo dot com