After some further investigation, for those interested: the
SignatureUpdateProcessorFactory fields were misconfigured (I guess
they were copied over from another collection). The initial import had
been made using a data import handler: I suppose the update chain isn't
called in this process and no signature field is created - am I right?

The first time a document was updated, a signature field with value
"0000000000000000" was added. The next time, the same signature was
generated for the new update, which triggered the deletion of all
documents with the same signature (i.e. the first one) as overwriteDupes
was set to true. Correct behavior but quite tricky...
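
To illustrate the mechanism described above, here is a minimal sketch
of what such a chain might look like in solrconfig.xml. It is not my
actual configuration: the chain name "dedupe" is taken from the log
line quoted below, while the "signature" field name and the "fields"
list are assumptions made up for the example.

```xml
<!-- Sketch only: field names are hypothetical -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- field that receives the computed signature (assumed name) -->
    <str name="signatureField">signature</str>
    <!-- true: existing documents with the same signature get deleted -->
    <bool name="overwriteDupes">true</bool>
    <!-- fields hashed into the signature; if none of them is present in
         the incoming document, presumably every update hashes the same
         empty input and ends up with the same signature -->
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

With a setup like this, setting overwriteDupes to false, or listing
fields that actually exist in the documents, should avoid the
deletions.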

So my conclusion here (please correct me if I'm wrong) is of course to
fix the signature configuration problem, but also to find a way to call
the update chain (or maybe a simplified one, e.g. one that skips
logging) from the data import handler. Is there an easy way to do this?
Conceptually, shouldn't the update chain be callable from the data
import process - maybe it is?
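
One option I have not tested: as far as I understand, the data import
handler also honours the update.chain request parameter, so the chain
could be named in the handler defaults. A minimal sketch, assuming a
handler registered as /dataimport and a config file called
data-config.xml (both names are assumptions, not my actual setup):

```xml
<!-- Sketch only: handler name and config file name are assumptions -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <!-- DIH configuration file (hypothetical name) -->
    <str name="config">data-config.xml</str>
    <!-- run imported documents through the named update chain -->
    <str name="update.chain">dedupe</str>
  </lst>
</requestHandler>
```

If that works, the import and the later atomic updates would go through
the same chain, so signatures would be computed consistently in both
cases.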

John


On 08/10/15 09:43, Upayavira wrote:
> Yay!
>
> On Thu, Oct 8, 2015, at 08:38 AM, John Smith wrote:
>> Yes indeed, the update chain had been activated... I commented it out
>> again and the problem vanished.
>>
>> Good job, thanks Erick and Upayavira!
>> John
>>
>>
>> On 08/10/15 08:58, Upayavira wrote:
>>> Look for the DedupUpdateProcessor in an update chain.
>>>
>>> It is there, but commented out IIRC, in the techproducts sample
>>> configs.
>>>
>>> Perhaps you uncommented it to use your own update processors, but didn't
>>> remove that component?
>>>
>>> On Thu, Oct 8, 2015, at 07:38 AM, John Smith wrote:
>>>> Oh, I forgot Erick's mention of the logs: there's nothing unusual in
>>>> INFO level, the update request just gets mentioned. No exception. I
>>>> reran it with the DEBUG level, but most of the log was related to jetty.
>>>> Here's a line I noticed though:
>>>>
>>>> org.apache.solr.servlet.HttpSolrCall; Closing out SolrRequest:
>>>> {wt=json&commit=true&update.chain=dedupe}
>>>>
>>>> The update.chain parameter wasn't part of the original request, and
>>>> "dedupe" looks suspicious to me. Perhaps I should investigate further
>>>> there?
>>>>
>>>> Thanks,
>>>> John.
>>>>
>>>>
>>>> On 08/10/15 08:25, John Smith wrote:
>>>>> The ids are all different: they're unique numbers followed by a couple
>>>>> of keywords. I've made a test with a small collection of 10 documents to
>>>>> make sure I can manage them manually: all ids are confirmed as different.
>>>>>
>>>>> I also dumped the exact command, here's one example:
>>>>>
>>>>> <add><doc><field name="Id">101084385_Sebago_ sebago shoes</field><field
>>>>> name="Clicks" update="set">1</field><field name="Boost"
>>>>> update="set">1.8701925463775</field></doc></add>
>>>>>
>>>>> It's sent as the body of a POST request to
>>>>> http://127.0.0.1:8080/solr/ato_test/update?wt=json&commit=true, with a
>>>>> Content-Type: text/xml header. I still noted the consistent loss of
>>>>> another document with the update above.
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>> On 08/10/15 00:38, Upayavira wrote:
>>>>>> What ID are you using? Are you possibly using the same ID field for
>>>>>> both, so the second document you visit causes the first to be
>>>>>> overwritten?
>>>>>>
>>>>>> Upayavira
>>>>>>
>>>>>> On Wed, Oct 7, 2015, at 06:38 PM, Erick Erickson wrote:
>>>>>>> This certainly should not be happening. I'd
>>>>>>> take a careful look at what you actually send.
>>>>>>> My _guess_ is that you're not sending the update
>>>>>>> command you think you are....
>>>>>>>
>>>>>>> As a test you could just curl (or use post.jar) to
>>>>>>> send these types of commands up individually.
>>>>>>>
>>>>>>> Perhaps looking at the solr log would help too...
>>>>>>>
>>>>>>> Best,
>>>>>>> Erick
>>>>>>>
>>>>>>> On Wed, Oct 7, 2015 at 6:32 AM, John Smith <solr-u...@remailme.net>
>>>>>>> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm running into the following problem with update XML messages. The idea
>>>>>>>> is to record the number of clicks for a document: each time, a message
>>>>>>>> is sent to .../update such as this one:
>>>>>>>>
>>>>>>>> <add>
>>>>>>>> <doc>
>>>>>>>> <field name="Id">abc</field>
>>>>>>>> <field name="Clicks" update="set">1</field>
>>>>>>>> <field name="Boost" update="set">1.05</field>
>>>>>>>> </doc>
>>>>>>>> </add>
>>>>>>>>
>>>>>>>> (Clicks is an int field; Boost is a float field, it's updated to
>>>>>>>> reflect the change in popularity using a formula based on the number
>>>>>>>> of clicks).
>>>>>>>>
>>>>>>>> At the moment in the dev environment, changes are committed
>>>>>>>> immediately.
>>>>>>>>
>>>>>>>>
>>>>>>>> When a document is updated, the changes are indeed reflected in the
>>>>>>>> search results. If I click on the same document again, all goes well.
>>>>>>>> But when I click on another document, the latter gets updated as
>>>>>>>> expected but the former is simply deleted. It can no longer be found
>>>>>>>> and the admin core Overview page counts 1 document less. If I click
>>>>>>>> on a 3rd document, so goes the 2nd one.
>>>>>>>>
>>>>>>>>
>>>>>>>> The schema is the default one amended to remove unneeded fields and add
>>>>>>>> new ones, nothing fancy. All fields are stored="true" and there's no
>>>>>>>> <copyField>. I've tried versions 5.2.1 & 5.3.1 in standalone mode, with
>>>>>>>> the same outcome. It looks like a bug to me but I might have overlooked
>>>>>>>> something? This is my first attempt at atomic updates.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> John.
>>>>>>>>
