Hi Mark,

We're currently at 4.10.2; the update to 4.10.3 is scheduled for tomorrow.

T

On 12.01.15 at 17:30, Mark Miller wrote:
bq. ClusterState says we are the leader, but locally we don't think so

Generally this is due to some bug. One bug that can lead to it was recently
fixed in 4.10.3 I think. What version are you on?

- Mark

On Mon Jan 12 2015 at 7:35:47 AM Thomas Lamy <t.l...@cytainment.de> wrote:

Hi,

I found no big/unusual GC pauses in the log (at least by manual inspection; I
found no free tool to analyze the GC logs that worked out of the box on a
headless Debian Wheezy box). Eventually I tried -Xmx8G (it was 64G before) on
one of the nodes, after checking that allocation was at about 2-3GB after one
hour of run time. That didn't change the time frame until a restart was
needed, so I don't think Solr's JVM GC is the problem.
We're now trying to get all of our nodes' logs (ZooKeeper and Solr) into
Splunk, just to get a better-sorted view of what's going on in the cloud once
a problem occurs. We're also enabling GC logging for ZooKeeper; maybe we were
missing problems there while focusing on the Solr logs.
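
For reference, this is roughly the set of GC logging flags we're planning to
add (Oracle Java 7 syntax; the log path and rotation sizes are just
placeholders, not recommendations):

  -Xloggc:/var/log/zookeeper/gc.log
  -XX:+PrintGCDetails
  -XX:+PrintGCDateStamps
  -XX:+PrintGCApplicationStoppedTime
  -XX:+UseGCLogFileRotation
  -XX:NumberOfGCLogFiles=5
  -XX:GCLogFileSize=20M

The same flags (with a different log path) should work for the Solr/Tomcat
JVMs as well.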

Thomas


On 08.01.15 at 16:33, Yonik Seeley wrote:
It's worth noting that those messages alone don't necessarily signify
a problem with the system (and it wouldn't be called "split brain").
The async nature of updates (and thread scheduling), along with
stop-the-world GC pauses that can change leadership, causes these
little windows of inconsistency that we detect and log.

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data


On Wed, Jan 7, 2015 at 5:01 AM, Thomas Lamy <t.l...@cytainment.de> wrote:
Hi there,

We are running a 3-server cloud serving a dozen single-shard,
replicated-everywhere collections. The 2 biggest collections are ~15M docs,
and about 13GiB / 2.5GiB in size. Solr is 4.10.2, ZooKeeper 3.4.5, Tomcat
7.0.56, Oracle Java 1.7.0_72-b14.

10 of the 12 collections (the small ones) get filled by a DIH full-import
once a day, starting at 1am. The second-biggest collection is updated using
DIH delta-import every 10 minutes; the biggest one gets bulk JSON updates
with commits every 5 minutes.
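
For context, the recurring updates are roughly of this shape (host, port and
collection names are placeholders):

  # DIH delta-import, every 10 minutes
  curl 'http://solr1:8080/solr/collection2/dataimport?command=delta-import'

  # bulk JSON updates with an explicit commit, every 5 minutes
  curl 'http://solr1:8080/solr/collection1/update?commit=true' \
       -H 'Content-Type: application/json' --data-binary @bulk.json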

On a regular basis, we have a leader information mismatch:
org.apache.solr.update.processor.DistributedUpdateProcessor; Request says it is coming from leader, but we are the leader
or the opposite:
org.apache.solr.update.processor.DistributedUpdateProcessor; ClusterState says we are the leader, but locally we don't think so

One of these pops up once a day at around 8am, sending either some cores into
"recovery failed" state, or all cores of at least one cloud node into state
"gone".
This started out of the blue about 2 weeks ago, without any changes to
software, data, or client behaviour.

Most of the time, we get things going again by restarting Solr on the current
leader node, forcing a new election - can an election be triggered while
keeping Solr (and the caches) up?
But sometimes this doesn't help. We had an incident last weekend where our
admins didn't restart in time, which created millions of entries in
/solr/overseer/queue, made ZooKeeper close the connection, and caused leader
re-election to fail. I had to flush ZooKeeper and re-upload the collection
configs to get Solr up again (just like in
https://gist.github.com/isoboroff/424fcdf63fa760c1d1a7).
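
Roughly what that flush looked like with zkCli.sh (the ZooKeeper host is a
placeholder; the /solr prefix matches where our cloud lives in ZK):

  # inspect the backed-up overseer queue
  ./zkCli.sh -server zk1:2181
  ls /solr/overseer/queue

  # last resort, with Solr stopped: recursively delete the queued events
  rmr /solr/overseer/queue
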
We have a much bigger cloud (7 servers, ~50GiB of data in 8 collections, 1500
requests/s) up and running, which has not had these problems since upgrading
to 4.10.2.


Any hints on where to look for a solution?

Kind regards
Thomas

--
Thomas Lamy
Cytainment AG & Co KG
Nordkanalstrasse 52
20097 Hamburg

Tel.:     +49 (40) 23 706-747
Fax:     +49 (40) 23 706-139

Sitz und Registergericht Hamburg
HRA 98121
HRB 86068
Ust-ID: DE213009476
