Hey Chris,

I found a separate issue while working on CDCR that may be related to your
problem. Please see the JIRA: *SOLR-12063*
<https://issues.apache.org/jira/projects/SOLR/issues/SOLR-12063>. The bug
was introduced when we added support for the bidirectional approach, which
adds an extra flag to the tlog entry for CDCR.

This is the part of the code that goes wrong, in
*UpdateLog.java, RecentUpdates::update()*:

switch (oper) {
  case UpdateLog.ADD:
  case UpdateLog.UPDATE_INPLACE:
  case UpdateLog.DELETE:
  case UpdateLog.DELETE_BY_QUERY:
    Update update = new Update();
    update.log = oldLog;
    update.pointer = reader.position();
    update.version = version;

    if (oper == UpdateLog.UPDATE_INPLACE && entry.size() == 5) {
      update.previousVersion = (Long) entry.get(UpdateLog.PREV_VERSION_IDX);
    }
    updatesForLog.add(update);
    updates.put(version, update);

    if (oper == UpdateLog.DELETE_BY_QUERY) {
      deleteByQueryList.add(update);
    } else if (oper == UpdateLog.DELETE) {
      deleteList.add(new DeleteUpdate(version, (byte[])entry.get(entry.size()-1)));
    }

    break;

  case UpdateLog.COMMIT:
    break;
  default:
    throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Unknown Operation! " + oper);
}

deleteList.add(new DeleteUpdate(version, (byte[])entry.get(entry.size()-1)));

expects the last element of the entry to be the payload, but everywhere
else in the project *pos:[2]* is the index of the payload. In Solr 7.2 and
later, the last element of the entry in the source code is a *boolean*
denoting whether the update is CDCR-forwarded or a typical one.
UpdateLog.java.RecentUpdates is used in CDCR sync and checkpoint
operations, so this is a legitimate bug that slipped past the tests I
wrote.
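
To make the mismatch concrete, here is a minimal standalone sketch. It is
not Solr code: the entry is modelled as a plain List<Object>, and the
layout [operation, version, payload, optional boolean], the class name, the
helper names and the placeholder operation code 2 are all assumptions made
up for this illustration. It only shows why reading "the last element"
stops returning the payload once a trailing flag is appended, while reading
a fixed index keeps working.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only. The entry layout and every name here are
// assumptions for the example, not Solr's actual API.
class TlogDeleteEntrySketch {
    // Assumed fixed position of the payload, matching "pos:[2]" above.
    static final int PAYLOAD_IDX = 2;

    // Delete entry as it looked before the extra CDCR flag was appended.
    static List<Object> entryWithoutFlag(int oper, long version, byte[] id) {
        return Arrays.<Object>asList(oper, version, id);
    }

    // Delete entry with the trailing boolean flag appended.
    static List<Object> entryWithFlag(int oper, long version, byte[] id,
                                      boolean cdcrForwarded) {
        return Arrays.<Object>asList(oper, version, id, cdcrForwarded);
    }

    // Mirrors entry.get(entry.size()-1): returns whatever sits last.
    static Object readLast(List<Object> entry) {
        return entry.get(entry.size() - 1);
    }

    // Reads the payload from its fixed index, ignoring trailing fields.
    static byte[] readPayload(List<Object> entry) {
        return (byte[]) entry.get(PAYLOAD_IDX);
    }

    public static void main(String[] args) {
        byte[] id = "12345".getBytes(StandardCharsets.UTF_8);
        List<Object> oldEntry = entryWithoutFlag(2, 100L, id); // 2 = placeholder op code
        List<Object> newEntry = entryWithFlag(2, 100L, id, false);

        // Is the last element still the byte[] payload?
        System.out.println(readLast(oldEntry) instanceof byte[]);
        System.out.println(readLast(newEntry) instanceof byte[]);
        // The fixed index returns the id bytes for both layouts.
        System.out.println(new String(readPayload(newEntry), StandardCharsets.UTF_8));
    }
}
```

In this sketch the last element of the flagged entry is a Boolean rather
than the id bytes, whereas readPayload() returns the id for both layouts.
Whether the actual patch takes this exact form is for the JIRA to say.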

I have uploaded a patch with the immediate fix and am awaiting feedback on
it. In the meantime, if you are able to apply the patch, build the jar and
try it out, please do and let us know how it goes.

For *SOLR-9394* <https://issues.apache.org/jira/browse/SOLR-9394>, if you
can comment on the JIRA and post sample docs, Solr logs and any other
relevant information, I can give it a thorough look.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Wed, Mar 7, 2018 at 1:35 AM, Chris Troullis <cptroul...@gmail.com> wrote:

> Hi all,
>
> We recently upgraded to Solr 7.2.0 as we saw that there were some CDCR bug
> fixes and features added that would finally let us be able to make use of
> it (bi-directional syncing was the big one). The first time we tried to
> implement we ran into all kinds of errors, but this time we were able to
> get it mostly working.
>
> The issue we seem to be having now is that any time a document is deleted
> via deleteById from a collection on the primary node, we are flooded with
> "Invalid Number" errors followed by a random sequence of characters when
> CDCR tries to sync the update to the backup site. This happens on all of
> our collections where our id fields are defined as longs (some of them the
> ids are compound keys and are strings).
>
> Here's a sample exception:
>
> org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error
> from server at http://ip/solr/collection_shard1_replica_n1: Invalid
> Number:  ]
> -s
>         at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> directUpdate(CloudSolrClient.java:549)
>         at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> sendRequest(CloudSolrClient.java:1012)
>         at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> requestWithRetryOnStaleState(CloudSolrClient.java:883)
>         at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> requestWithRetryOnStaleState(CloudSolrClient.java:945)
>         at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> requestWithRetryOnStaleState(CloudSolrClient.java:945)
>         at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> requestWithRetryOnStaleState(CloudSolrClient.java:945)
>         at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> requestWithRetryOnStaleState(CloudSolrClient.java:945)
>         at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> requestWithRetryOnStaleState(CloudSolrClient.java:945)
>         at
> org.apache.solr.client.solrj.impl.CloudSolrClient.request(
> CloudSolrClient.java:816)
>         at
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
>         at
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
>         at
> org.apache.solr.handler.CdcrReplicator.sendRequest(
> CdcrReplicator.java:140)
>         at
> org.apache.solr.handler.CdcrReplicator.run(CdcrReplicator.java:104)
>         at
> org.apache.solr.handler.CdcrReplicatorScheduler.lambda$null$0(
> CdcrReplicatorScheduler.java:81)
>         at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.
> lambda$execute$0(ExecutorUtil.java:188)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1149)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>
>
> I'm scratching my head as to the cause of this. It's like it is trying to
> deleteById for the value "]", even though that is not the ID for the
> document that was deleted from the primary. So I don't know if it is
> pulling this from the wrong field somehow or where that value is coming
> from.
>
> I found this issue: https://issues.apache.org/jira/browse/SOLR-9394 which
> looks related, but doesn't look like it has any traction.
>
> Has anyone else experienced this issue with CDCR, or have any ideas as to
> what could be causing this issue?
>
> Thanks,
>
> Chris
>
