Hi all,

We are seeing a strange problem on our 2-node solr4 cluster. This problem
has resultet in data loss.

We have two servers, varnish01 and varnish02. Zookeeper is running on
varnish02, but in a separate jvm.

We index directly to varnish02 and we read from varnish01. Data is thus
replicated from varnish02 to varnish01.

I found this in the varnish01 log:

*INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2}
status=0 QTime=42
Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2}
status=0 QTime=41
Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2}
status=0 QTime=33
Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2}
status=0 QTime=33
Dec 13, 2012 12:23:39 PM org.apache.solr.common.SolrException log
SEVERE: shard update error StdNode:
http://varnish02.lynero.net:8000/solr/default1_Norwegian/:org.apache.solr.client.solrj.SolrServerException:
IOException occured when talking to server at:
http://varnish02.lynero.net:8000/solr/default1_Norwegian
    at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
    at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
    at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:335)
    at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:309)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
Caused by: org.apache.http.NoHttpResponseException: The target server
failed to respond
    at
org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:101)
    at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:252)
    at
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:282)
    at
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:247)
    at
org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:216)
    at
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:298)
    at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
    at
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:647)
    at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
    at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
    at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
    at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
    at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
    ... 11 more

Dec 13, 2012 12:23:39 PM
org.apache.solr.update.processor.DistributedUpdateProcessor doFinish
INFO: try and ask http://varnish02.lynero.net:8000/solr to recover*

It looks like it is sending updates from varnish01 to varnish02. I am not
sure for what since we only index on varnish02. Updates should never be
going from varnish01 to varnish02.

Meanwhile on varnish02:

*INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=16
Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=15
Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=16
Dec 13, 2012 12:23:42 PM org.apache.solr.handler.admin.CoreAdminHandler
handleRequestRecoveryAction
INFO: It has been requested that we recover*
*Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Danish] webapp=/solr path=/select
params={facet=false&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822111&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
status=0 QTime=1
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Danish] webapp=/solr path=/select/
params={fq=site_guid:(2810678)&q=win} hits=0 status=0 QTime=17
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Danish] webapp=/solr path=/select
params={facet=on&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&group.distributed.second=true&version=2&df=text&fl=docid&shard.url=
varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822111&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&qf=%0a++++++++++text^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
status=0 QTime=1
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Danish] webapp=/solr path=/select
params={facet=false&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822138&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=40&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
status=0 QTime=1
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Danish] webapp=/solr path=/select
params={facet=on&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&group.distributed.second=true&version=2&df=text&fl=docid&shard.url=
varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822138&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&group.topgroups.groupby_variant_of_item_guid=2963217&group.topgroups.groupby_variant_of_item_guid=2963223&group.topgroups.groupby_variant_of_item_guid=2963219&group.topgroups.groupby_variant_of_item_guid=2963220&group.topgroups.groupby_variant_of_item_guid=2963221&group.topgroups.groupby_variant_of_item_guid=2963222&group.topgroups.groupby_variant_of_item_guid=2963224&group.topgroups.groupby_variant_of_item_guid=2963218&qf=%0a++++++++++text^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=40&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
status=0 QTime=1
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=26
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=22
Dec 13, 2012 12:23:42 PM org.apache.solr.update.DefaultSolrCoreState
doRecovery
Dec 13, 2012 12:23:42 PM org.apache.solr.update.DefaultSolrCoreState
doRecovery
INFO: Running recovery - first canceling any ongoing recovery
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=25
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=24
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=20
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=25
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=23
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=21
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=23
Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=
http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2}
status=0 QTime=16
Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy run
INFO: Starting recovery process.  core=default1_Norwegian
recoveringAfterStartup=false
Dec 13, 2012 12:23:42 PM org.apache.solr.common.cloud.ZkStateReader
updateClusterState
INFO: Updating cloud state from ZooKeeper...
Dec 13, 2012 12:23:42 PM
org.apache.solr.update.processor.LogUpdateProcessor finish*

And less than a second later:

*Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Attempting to PeerSync from
http://varnish01.lynero.net:8000/solr/default1_Norwegian/core=default1_Norwegian
- recoveringAfterStartup=false
Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=default1_Norwegian url=
http://varnish02.lynero.net:8000/solr START replicas=[
http://varnish01.lynero.net:8000/solr/default1_Norwegian/] nUpdates=100
Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
WARNING: PeerSync: core=default1_Norwegian url=
http://varnish02.lynero.net:8000/solr too many updates received since start
- startingUpdates no longer overlaps with our currentUpdates
Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: PeerSync Recovery was not successful - trying replication.
core=default1_Norwegian
Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Starting Replication Recovery. core=default1_Norwegian
Dec 13, 2012 12:23:42 PM org.apache.solr.client.solrj.impl.HttpClientUtil
createClient
INFO: Creating new http client,
config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
Dec 13, 2012 12:23:42 PM org.apache.solr.common.cloud.ZkStateReader$2
process
INFO: A cluster state change has occurred - updating...*

State change on varnish01 at the same time:

*Dec 13, 2012 12:23:42 PM org.apache.solr.common.cloud.ZkStateReader$2
process
INFO: A cluster state change has occurred - updating...*
*
*And a few seconds later on varnish02, the recovery finishes:
*
Dec 13, 2012 12:23:48 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Replication Recovery was successful - registering as Active.
core=default1_Norwegian
Dec 13, 2012 12:23:48 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Finished recovery process. core=default1_Norwegian
Dec 13, 2012 12:23:48 PM org.apache.solr.core.SolrCore execute
INFO: [default1_Danish] webapp=/solr path=/select
params={facet=false&sort=item_group_56823_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=
varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397828395&group.field=groupby_variant_of_item_guid&facet.field=itemgroups_int_mv&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_56823_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true}
status=0 QTime=8
Dec 13, 2012 12:23:48 PM org.apache.solr.common.cloud.ZkStateReader
updateClusterState
INFO: Updating cloud state from ZooKeeper... *

Which is picked up on varnish01:

*Dec 13, 2012 12:23:48 PM org.apache.solr.common.cloud.ZkStateReader$2
process
INFO: A cluster state change has occurred - updating...*

It looks like it replicated successfully, only it didnt. The
default1_Norwegian core on varnish01 now has 55.071 docs and the same core
on varnish02 has 35.088 docs.

I checked the log files for both JVM's and no stop-the-world GC were taking
place.

There is also nothing in the zookeeper log of interest that I can see.


-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk

Reply via email to