I am using the Block Join Query Parser with success, following the example
on:

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers

As this example shows, each parent document can have a number of documents
embedded, and each document, be it a parent or a child, has its own unique
identifier.

Now I would like to update some of the parent documents. I have read
horror stories about duplicate documents, scrambled data etc.; the two
prominent JIRA entries for this are:

https://issues.apache.org/jira/browse/SOLR-6700
https://issues.apache.org/jira/browse/SOLR-6096

My question is, how do you usually update such documents, for example to
update a value for the parent or a value for one of its children?

I tried to repost the whole modified document (the parent and ALL of its
children as one file), and it seems to work on a small toy example, but of
course I cannot be sure for a larger instance with thousands of documents,
and I would like to know if this is the correct way to go or not.

To make it clear: if originally I used bin/solr post on the following
file:

<add>
  <doc>
    <field name="id">1</field>
    <field name="title">Solr has block join support</field>
    <field name="content_type">parentDocument</field>
    <doc>
      <field name="id">2</field>
      <field name="comments">SolrCloud supports it too!</field>
    </doc>
  </doc>
</add>

Now I could do bin/solr post on a file:

<add>
  <doc>
    <field name="id">1</field>
    <field name="title">Updated field: Solr has block join support</field>
    <field name="content_type">parentDocument</field>
    <doc>
      <field name="id">2</field>
      <field name="comments">Updated field: SolrCloud supports it too!</field>
    </doc>
  </doc>
</add>
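
For a larger instance, the same whole-block file could be regenerated
from the source data rather than edited by hand. The following is only a
minimal sketch of that idea: the helper name build_block is hypothetical,
the field names are the ones from the example above, and it only builds
the XML; it does not talk to Solr.

```python
import xml.etree.ElementTree as ET


def build_block(parent_fields, children):
    """Build an <add> element with one parent <doc> and its nested child <doc>s.

    build_block is a hypothetical helper, not part of Solr or any client library.
    """
    add = ET.Element("add")
    parent = ET.SubElement(add, "doc")
    # Parent fields come first, as in the hand-written file.
    for name, value in parent_fields.items():
        field = ET.SubElement(parent, "field", name=name)
        field.text = str(value)
    # Every child is nested inside the parent, so the block is sent as a whole.
    for child_fields in children:
        child = ET.SubElement(parent, "doc")
        for name, value in child_fields.items():
            field = ET.SubElement(child, "field", name=name)
            field.text = str(value)
    return add


block = build_block(
    {
        "id": "1",
        "title": "Updated field: Solr has block join support",
        "content_type": "parentDocument",
    },
    [{"id": "2", "comments": "Updated field: SolrCloud supports it too!"}],
)
xml_text = ET.tostring(block, encoding="unicode")
print(xml_text)
```

The printed text can be saved to a file and posted in the same way as the
file above. Because the parent and ALL of its children are always written
together, a child cannot be left out by accident.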

Will this avoid the inconsistent, scrambled, or duplicate data on Solr
instances discussed in the JIRAs? How do you usually do this?
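
One possible sanity check after reposting is to look for ids that occur
more than once in the query results. The sketch below assumes the
documents have already been fetched as a list of dicts, as found under
response.docs in Solr's JSON response; the function name duplicate_ids and
the sample data are hypothetical, and it can only find duplicates that
actually show up in the results.

```python
from collections import Counter


def duplicate_ids(docs):
    """Return the sorted ids that occur more than once in a list of documents.

    docs is assumed to be a list of dicts that each carry an "id" key.
    """
    counts = Counter(doc["id"] for doc in docs)
    return sorted(doc_id for doc_id, n in counts.items() if n > 1)


# Hypothetical sample: child document 2 was indexed twice.
sample_docs = [{"id": "1"}, {"id": "2"}, {"id": "2"}]
print(duplicate_ids(sample_docs))
```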

Thanks for any help or hints.

Tom
