: Subject: TimestampUpdateProcessorFactory updates the field even if the value
:     is present
: 
: Hi,
: 
: Following is the update request processor chain.
: 
: <updateRequestProcessorChain name="DefaultProcessorChain" default="true">
:   <processor class="solr.TimestampUpdateProcessorFactory">
:     <str name="fieldName">index_time_stamp_create</str>
:   </processor>
:   <processor class="solr.LogUpdateProcessorFactory" />
:   <processor class="solr.RunUpdateProcessorFactory" />
: </updateRequestProcessorChain>
: 
: And, here is how the field is defined in schema.xml
: 
: <field name="index_time_stamp_create" type="date" indexed="true" stored="true" />
: 
: Every time I index the same document, above field changes its value with
: latest timestamp. According to TimestampUpdateProcessorFactory  javadoc
: page, if a document does not contain a value in the timestamp field, a new

Based on the wording of your question, I suspect you are confused about 
the overall behavior of how "updating" an existing document works in Solr, 
and how update processors "see" an *input document* when processing an 
add/update command.


First off, completely ignoring TimestampUpdateProcessorFactory and 
assuming just the simplest possible update chain, let's clarify how 
"updates" work.  Let's assume that when you say you "index the same 
document" twice, you do so with a few different field values ...

First Time...

{  id:"x",  title:"xxxx" }

Second time...

{  id:"x",  body:"xxxx xxxx xxxx xxxx xxxx xxxx xxx" }

Solr does not implicitly know that you are trying to *update* that 
document.  The final result will not be a document containing both a 
"title" field and a "body" field in addition to the "id"; it will *only* 
have the "id" and "body" fields, and the "title" field will be lost.

The way to "update" a document *and keep existing field values* is with 
one of the "Atomic Update" command options...

https://lucene.apache.org/solr/guide/8_4/updating-parts-of-documents.html#UpdatingPartsofDocuments-AtomicUpdates

First time...

{  id:"x",  title:"xxxx" }

Second time...

{  id:"x",  body: { set: "xxxx xxxx xxxx xxxx xxxx xxxx xxx" } }


Now, with that background info clarified: let's talk about update 
processors....


The docs for TimestampUpdateProcessorFactory are referring to how it 
modifies an *input* document that it receives (as part of the processor 
chain). It adds the timestamp field if it's not already in the *input* 
document.  It doesn't know anything about whether that document is already 
in the index, or whether it has a value for that field in the index.


When processors like TimestampUpdateProcessorFactory (or any other 
processor that modifies an *input* document) are run, they don't know if 
the document you are "indexing" already exists in the index or not.  Even 
if you are using the "atomic update" options to set/remove/add a field 
value, with the intent of preserving all other field values, the documents 
passed down the processor chain don't include those values until the 
"document merger" logic is run -- as part of the DistributedUpdateProcessor 
(which, if not explicit in your chain, happens immediately before the 
RunUpdateProcessorFactory).

Off the top of my head I don't know if there is an "easy" way to have a 
timestamp added to "new" documents, but left "as is" for existing 
documents.

Untested idea....

Use an explicitly configured DistributedUpdateProcessorFactory, so that 
(in addition to putting TimestampUpdateProcessorFactory before it) you can 
also put MinFieldValueUpdateProcessorFactory on the timestamp field 
*after* DistributedUpdateProcessorFactory (but before 
RunUpdateProcessorFactory).

I think that would work?
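
A rough sketch of what that chain might look like, equally untested.  It 
reuses the chain name and field name from the question; the ordering and 
the use of the "fieldName" selector on MinFieldValueUpdateProcessorFactory 
are my assumptions about how you'd wire it up, not a verified config:

```xml
<updateRequestProcessorChain name="DefaultProcessorChain" default="true">
  <!-- Runs before distribution, so the timestamp is computed only once
       and every replica receives the same value. -->
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">index_time_stamp_create</str>
  </processor>

  <!-- Declared explicitly so that other processors can be placed after it. -->
  <processor class="solr.DistributedUpdateProcessorFactory" />

  <!-- Assumption: if the document at this point carries more than one value
       for the field, keep only the smallest (earliest) one. -->
  <processor class="solr.MinFieldValueUpdateProcessorFactory">
    <str name="fieldName">index_time_stamp_create</str>
  </processor>

  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

You'd want to index a document twice on a test collection and check the 
stored value of index_time_stamp_create before trusting this.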

Just putting TimestampUpdateProcessorFactory after the 
DistributedUpdateProcessorFactory would be dangerous, because it would 
introduce discrepancies -- each replica would wind up with its own 
locally computed timestamp.  Having the timestamp generated before the 
distributed update processor ensures the value is computed only once.

-Hoss
http://www.lucidworks.com/
