I've implemented a custom solr2solr ongoing unidirectional replication
mechanism.

A Replicator (acting as a SolrJ client) crawls documents from SolrCloud1 and
writes them to SolrCloud2 in batches.
The replicator's crawl logic is to read documents with a time greater than or
equal to the time of the last replicated document.
Whenever a document is added/updated, a tdate field "last_updated_in_solr"
is automatically updated using TimestampUpdateProcessorFactory.
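
As a rough illustration of the crawl condition described above, here is a minimal sketch of how the range query could be built. The class and method names are hypothetical (not taken from the actual replicator), and it assumes the resulting string would be passed to a SolrJ query sorted by "last_updated_in_solr" ascending:

```java
import java.time.Instant;

// Hypothetical helper; names are illustrative only.
class ReplicatorQuery {
    // Field populated by TimestampUpdateProcessorFactory (see solrconfig.xml).
    static final String FIELD = "last_updated_in_solr";

    // Builds an inclusive range query: time >= lastReplicated.
    // Square brackets make the lower bound inclusive in Solr range syntax.
    static String buildCrawlQuery(Instant lastReplicated) {
        return FIELD + ":[" + lastReplicated.toString() + " TO *]";
    }
}
```

The inclusive lower bound matches the "greater than or equal" logic, which means the last replicated document is always read again on the next crawl.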

*My problem:* When a client indexes a batch of 100 documents, all 100 docs
have the same "last_updated_in_solr" value. This makes my ongoing
replication check for new documents to replicate much more complex than if
the time value were unique.
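
To make the difficulty concrete, here is a small self-contained simulation. It is a sketch under assumptions: the Doc class, the in-memory list standing in for the index, and the batch size are all hypothetical, and no Solr calls are involved. It shows that when more documents share one timestamp than fit in a batch, an inclusive ">=" crawl keeps returning the same leading documents:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of the crawl; not real SolrJ code.
class TieDemo {
    // Minimal document: an id plus its last_updated_in_solr value (epoch millis).
    static final class Doc {
        final String id;
        final long ts;
        Doc(String id, long ts) { this.id = id; this.ts = ts; }
    }

    // Returns up to batchSize docs with ts >= fromTs (inclusive).
    // The index list is assumed to be already sorted by ts ascending.
    static List<Doc> crawl(List<Doc> index, long fromTs, int batchSize) {
        List<Doc> out = new ArrayList<>();
        for (Doc d : index) {
            if (d.ts >= fromTs && out.size() < batchSize) {
                out.add(d);
            }
        }
        return out;
    }
}
```

With five documents that all carry timestamp 1000 and a batch size of 3, the first crawl returns the first three documents, and a second crawl starting from the last replicated time (1000) returns those same three again, so the remaining two are never reached. Switching to a strict ">" comparison would instead skip the remaining two, which is why a non-unique time value needs extra bookkeeping.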

1. Can I use some other processor to generate increasing unique values?
2. Can I use the internal _version_ field for this? Is it guaranteed to be
monotonically increasing for the entire collection, or only per document
with each add/update?
Any other options?

schema.xml:
<field name="last_updated_in_solr" type="tdate" indexed="true"
stored="true" multiValued="false"/>

solrconfig.xml:
<updateRequestProcessorChain name="default">
    <processor class="solr.TimestampUpdateProcessorFactory">
        <str name="fieldName">last_updated_in_solr</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

I know there's work in progress on a built-in replication mechanism, but it's
not yet released.
Using Solr 4.7.2.
