On 10/22/23 12:25, Gus Heck wrote:
Echoing what Thomas says, this problem indicates your indexing system probably has a significant design flaw. For most systems, you should have a notion of document identity that is external to Solr, and that should be used as (or to deterministically generate) the id in Solr.
*If* you can generate the list of "known good" ids, a query like "*:* AND -id:X AND -id:Y ..." could find the bad ones... provided you don't run into the POST size limit or something similar.
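A rough sketch of building that query string in Python, in case it helps. The function names and the default field name "id" are mine, and the quoting only handles backslashes and double quotes inside an id, so treat it as a starting point rather than a complete escaper:

```python
def quote_id(doc_id):
    # Wrap the id in double quotes so it is matched as a single term;
    # escape backslashes and embedded quotes inside it.
    escaped = doc_id.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_exclusion_query(good_ids, field="id"):
    # Start from "match everything" and subtract each known-good id.
    clauses = [f"-{field}:{quote_id(i)}" for i in good_ids]
    return " AND ".join(["*:*"] + clauses)


if __name__ == "__main__":
    print(build_exclusion_query(["doc-1", "doc-2"]))
```

With a long list of ids the string gets large quickly, which is where the POST size concern comes in.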
*If* Solr is not the primary store, you could reload the index from scratch, preferably fixing the primary key problem while you're at it.
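You could dump the documents into a temp SQLite or Postgres table and do the SQL "OVER" trick, i.e. a window function such as ROW_NUMBER() OVER (PARTITION BY ...) that numbers the rows within each group of identical documents. If you don't have natural keys in your documents, that's probably the only thing you can do.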
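Something along these lines, as a minimal sketch with Python's sqlite3 module. It assumes the dumped documents are dicts with an "id" plus whatever content fields you decide to compare on; the table layout and function name are made up for illustration, the field names are interpolated into the SQL so they must be trusted, and window functions need SQLite 3.25 or newer:

```python
import sqlite3


def find_duplicate_ids(docs, key_fields):
    """Return ids of documents that repeat an earlier one on key_fields."""
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f'"{f}"' for f in key_fields)
    conn.execute(f"CREATE TABLE docs (id TEXT, {cols})")
    placeholders = ", ".join("?" for _ in range(len(key_fields) + 1))
    conn.executemany(
        f"INSERT INTO docs VALUES ({placeholders})",
        ([d["id"]] + [d[f] for f in key_fields] for d in docs),
    )
    # Number the rows inside each group of identical content; every row
    # after the first in its group is a duplicate.
    rows = conn.execute(
        f"""
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY {cols} ORDER BY id) AS rn
            FROM docs
        )
        WHERE rn > 1
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


if __name__ == "__main__":
    sample = [
        {"id": "1", "title": "a", "body": "x"},
        {"id": "2", "title": "a", "body": "x"},
        {"id": "3", "title": "b", "body": "y"},
    ]
    print(find_duplicate_ids(sample, ["title", "body"]))
```

Which copy counts as "the first" here is just the lowest id; pick whatever ordering makes sense for your data.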
(If you do have natural key(s) and are using GUIDs in the index, UUID version 3 or 5 is your friend: both are name-based, so the same natural key always produces the same id. You can keep using GUIDs *and* have them stable too.)
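For example, with Python's standard uuid module. The namespace name "docs.example.org" and the helper stable_id are placeholders of mine; any fixed namespace works as long as you never change it:

```python
import uuid

# A fixed, application-specific namespace (hypothetical name).
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "docs.example.org")


def stable_id(*natural_key_parts):
    # Join the key parts with a separator unlikely to appear in the data,
    # then hash them into a version 5 (SHA-1, name-based) UUID.
    name = "\x1f".join(str(p) for p in natural_key_parts)
    return str(uuid.uuid5(NAMESPACE, name))


if __name__ == "__main__":
    # The same natural key gives the same id on every run.
    print(stable_id("customer-42", "invoice", 2023))
```

Re-indexing a document with the same natural key then overwrites the old copy instead of adding a new one.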
Dima
