I have since removed the files, but when I looked there was an index directory; the only files I remember being there were the segments files, and only one of the _* files was present. I'll watch to see if it happens again, but it happened on 2 of the shards during heavy indexing.
On Wed, Apr 3, 2013 at 10:13 PM, Mark Miller <markrmil...@gmail.com> wrote:

> Is that file still there when you look? Not being able to find an index
> file is not a common error I've seen recently.
>
> Do those replicas have an index directory or when you look on disk, is it
> an index.timestamp directory?
>
> - Mark
>
> On Apr 3, 2013, at 10:01 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>
> > so something is still not right. Things were going ok, but I'm seeing
> > this in the logs of several of the replicas:
> >
> > SEVERE: Unable to create core: dsc-shard3-core1
> > org.apache.solr.common.SolrException: Error opening new searcher
> >     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:822)
> >     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
> >     at org.apache.solr.core.CoreContainer.createFromZk(CoreContainer.java:967)
> >     at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1049)
> >     at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
> >     at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >     at java.lang.Thread.run(Thread.java:662)
> > Caused by: org.apache.solr.common.SolrException: Error opening new searcher
> >     at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1435)
> >     at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1547)
> >     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:797)
> >     ... 13 more
> > Caused by: org.apache.solr.common.SolrException: Error opening Reader
> >     at org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:172)
> >     at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:183)
> >     at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:179)
> >     at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1411)
> >     ... 15 more
> > Caused by: java.io.FileNotFoundException:
> > /cce2/solr/data/dsc-shard3-core1/index/_13x.si (No such file or directory)
> >     at java.io.RandomAccessFile.open(Native Method)
> >     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
> >     at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:193)
> >     at org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:232)
> >     at org.apache.lucene.codecs.lucene40.Lucene40SegmentInfoReader.read(Lucene40SegmentInfoReader.java:50)
> >     at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:301)
> >     at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56)
> >     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
> >     at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
> >     at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:88)
> >     at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34)
> >     at org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:169)
> >     ... 18 more
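When the segments file references an .si that has gone missing, as in the trace above, Lucene's CheckIndex tool can report how much of the index is still readable. A sketch only - the jar name depends on the install, and adding -fix permanently drops unreadable segments, so run it against a copy of the index directory:

    java -cp lucene-core-4.2.0.jar org.apache.lucene.index.CheckIndex /cce2/solr/data/dsc-shard3-core1/index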
> >
> > On Wed, Apr 3, 2013 at 8:54 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >
> >> Thanks I will try that.
> >>
> >> On Wed, Apr 3, 2013 at 8:28 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>
> >>> On Apr 3, 2013, at 8:17 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>
> >>>> I am not using the concurrent low pause garbage collector, I could look
> >>>> at switching. I'm assuming you're talking about adding
> >>>> -XX:+UseConcMarkSweepGC, correct?
> >>>
> >>> Right - if you don't do that, the default is almost always the throughput
> >>> collector (I've only seen OSX buck this trend when Apple handled Java).
> >>> That means stop-the-world garbage collections, so with larger heaps, that
> >>> can be a fair amount of time during which no threads can run. That's not
> >>> great for something as interactive as search at the best of times, and
> >>> it's especially not great when combined with heavy load and a 15 sec
> >>> session timeout between solr and zk.
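The collector being discussed is enabled with JVM flags along these lines. A sketch only: the heap size is a placeholder, and -XX:+UseParNewGC is the young-generation collector typically paired with CMS:

    java -server -Xmx4g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -jar start.jar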
> >>>
> >>> The below is odd - a replica node is waiting for the leader to see it as
> >>> recovering and live - live means it has created an ephemeral node for
> >>> that Solr corecontainer in zk - it's very strange if that didn't happen,
> >>> unless this happened during shutdown or something.
> >>>
> >>>> I also just had a shard go down and am seeing this in the log:
> >>>>
> >>>> SEVERE: org.apache.solr.common.SolrException: I was asked to wait on
> >>>> state down for 10.38.33.17:7576_solr but I still do not see the
> >>>> requested state. I see state: recovering live:false
> >>>>     at org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler.java:890)
> >>>>     at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186)
> >>>>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >>>>     at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:591)
> >>>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:192)
> >>>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
> >>>>     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
> >>>>     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
> >>>>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> >>>>     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
> >>>>     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> >>>>
> >>>> Nothing other than this in the log jumps out as interesting though.
> >>>>
> >>>> On Wed, Apr 3, 2013 at 7:47 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>>>
> >>>>> This shouldn't be a problem though, if things are working as they are
> >>>>> supposed to. Another node should simply take over as the overseer and
> >>>>> continue processing the work queue. It's just best if you configure so
> >>>>> that session timeouts don't happen unless a node is really down. On the
> >>>>> other hand, it's nicer to detect that faster. Your tradeoff to make.
> >>>>>
> >>>>> - Mark
> >>>>>
> >>>>> On Apr 3, 2013, at 7:46 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>>>>
> >>>>>> Yeah. Are you using the concurrent low pause garbage collector?
> >>>>>>
> >>>>>> This means the overseer wasn't able to communicate with zk for 15
> >>>>>> seconds - due to load or gc or whatever. If you can't resolve the root
> >>>>>> cause of that, or the load just won't allow for it, the next best thing
> >>>>>> you can do is raise it to 30 seconds.
> >>>>>>
> >>>>>> - Mark
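The timeout Mark suggests raising is zkClientTimeout. A sketch, assuming the stock 4.x example solr.xml, which reads the value from a system property with a 15 second default:

    <cores adminPath="/admin/cores" host="${host:}" hostPort="${jetty.port:}"
           zkClientTimeout="${zkClientTimeout:15000}" ...>

so starting each node with -DzkClientTimeout=30000 (or hard-coding 30000 in solr.xml) gives the 30 second session timeout discussed above.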
> >>>>>>
> >>>>>> On Apr 3, 2013, at 7:41 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>
> >>>>>>> I am occasionally seeing this in the log, is this just a timeout issue?
> >>>>>>> Should I be increasing the zk client timeout?
> >>>>>>>
> >>>>>>> WARNING: Overseer cannot talk to ZK
> >>>>>>> Apr 3, 2013 11:14:25 PM
> >>>>>>> org.apache.solr.cloud.DistributedQueue$LatchChildWatcher process
> >>>>>>> INFO: Watcher fired on path: null state: Expired type None
> >>>>>>> Apr 3, 2013 11:14:25 PM
> >>>>>>> org.apache.solr.cloud.Overseer$ClusterStateUpdater run
> >>>>>>> WARNING: Solr cannot talk to ZK, exiting Overseer main queue loop
> >>>>>>> org.apache.zookeeper.KeeperException$SessionExpiredException:
> >>>>>>> KeeperErrorCode = Session expired for /overseer/queue
> >>>>>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> >>>>>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> >>>>>>>     at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)
> >>>>>>>     at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:236)
> >>>>>>>     at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:233)
> >>>>>>>     at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65)
> >>>>>>>     at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:233)
> >>>>>>>     at org.apache.solr.cloud.DistributedQueue.orderedChildren(DistributedQueue.java:89)
> >>>>>>>     at org.apache.solr.cloud.DistributedQueue.element(DistributedQueue.java:131)
> >>>>>>>     at org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:326)
> >>>>>>>     at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:128)
> >>>>>>>     at java.lang.Thread.run(Thread.java:662)
> >>>>>>>
> >>>>>>> On Wed, Apr 3, 2013 at 7:25 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> just an update, I'm at 1M records now with no issues. This looks
> >>>>>>>> promising as to the cause of my issues, thanks for the help. Is the
> >>>>>>>> routing method with numShards documented anywhere? I know numShards
> >>>>>>>> is documented, but I didn't know that the routing changed if you
> >>>>>>>> don't specify it.
> >>>>>>>>
> >>>>>>>> On Wed, Apr 3, 2013 at 4:44 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> with these changes things are looking good, I'm up to 600,000
> >>>>>>>>> documents without any issues as of right now. I'll keep going and
> >>>>>>>>> add more to see if I find anything.
> >>>>>>>>>
> >>>>>>>>> On Wed, Apr 3, 2013 at 4:01 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> ok, so that's not a deal breaker for me. I just changed it to match
> >>>>>>>>>> the shards that are auto created and it looks like things are happy.
> >>>>>>>>>> I'll go ahead and try my test to see if I can get things out of sync.
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Apr 3, 2013 at 3:56 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I had thought you could - but looking at the code recently, I don't
> >>>>>>>>>>> think you can anymore. I think that's a technical limitation more
> >>>>>>>>>>> than anything though. When these changes were made, I think support
> >>>>>>>>>>> for that was simply not added at the time.
> >>>>>>>>>>>
> >>>>>>>>>>> I'm not sure exactly how straightforward it would be, but it seems
> >>>>>>>>>>> doable - as it is, the overseer will preallocate shards when first
> >>>>>>>>>>> creating the collection - that's when they get named shard(n). There
> >>>>>>>>>>> would have to be logic to replace shard(n) with the custom shard
> >>>>>>>>>>> name when the core actually registers.
> >>>>>>>>>>>
> >>>>>>>>>>> - Mark
> >>>>>>>>>>>
> >>>>>>>>>>> On Apr 3, 2013, at 3:42 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> answered my own question, it now says compositeId. What is
> >>>>>>>>>>>> problematic though is that in addition to my shards (which are say
> >>>>>>>>>>>> jamie-shard1) I see the solr created shards (shard1). I assume that
> >>>>>>>>>>>> these were created because of the numShards param. Is there no way
> >>>>>>>>>>>> to specify the names of these shards?
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Apr 3, 2013 at 3:25 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> ah interesting.... so I need to specify numShards, blow out zk and
> >>>>>>>>>>>>> then try this again to see if things work properly now. What is
> >>>>>>>>>>>>> really strange is that for the most part things have worked right,
> >>>>>>>>>>>>> and on 4.2.1 I have 600,000 items indexed with no duplicates. In
> >>>>>>>>>>>>> any event I will specify numShards, clear out zk and begin again.
> >>>>>>>>>>>>> If this works properly, what should the router type be?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, Apr 3, 2013 at 3:14 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> If you don't specify numShards after 4.1, you get an implicit doc
> >>>>>>>>>>>>>> router and it's up to you to distribute updates. In the past,
> >>>>>>>>>>>>>> partitioning was done on the fly - but for shard splitting and
> >>>>>>>>>>>>>> perhaps other features, we now divvy up the hash range up front
> >>>>>>>>>>>>>> based on numShards and store it in ZooKeeper. No numShards is now
> >>>>>>>>>>>>>> how you take complete control of updates yourself.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> - Mark
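To make that concrete: when numShards is present the first time the collection is created (for example as a system property on the first node started), the overseer splits the 32-bit hash range among the shards and records both the ranges and the router type in clusterstate.json. A sketch - the command is Jamie's startup line below with only the numShards flag added, and the range value is illustrative rather than computed from a real run:

    java -server -DnumShards=6 -Dcollection=collection1 ... -jar start.jar

    "collection1":{
      "shards":{
        "shard1":{
          "range":"80000000-aaaaaaa9",
          "state":"active",
          "replicas":{ ... }},
        ...},
      "router":"compositeId"}

A collection bootstrapped this way should report compositeId rather than implicit as its router.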
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Apr 3, 2013, at 2:57 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The router says "implicit". I did start from a blank zk state but
> >>>>>>>>>>>>>>> perhaps I missed one of the ZkCLI commands? One of my shards from
> >>>>>>>>>>>>>>> the clusterstate.json is shown below. What is the process that
> >>>>>>>>>>>>>>> should be done to bootstrap a cluster other than the ZkCLI commands
> >>>>>>>>>>>>>>> I listed above? My process right now is run those ZkCLI commands
> >>>>>>>>>>>>>>> and then start solr on all of the instances with a command like
> >>>>>>>>>>>>>>> this:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> java -server -Dshard=shard5 -DcoreName=shard5-core1
> >>>>>>>>>>>>>>> -Dsolr.data.dir=/solr/data/shard5-core1
> >>>>>>>>>>>>>>> -Dcollection.configName=solr-conf -Dcollection=collection1
> >>>>>>>>>>>>>>> -DzkHost=so-zoo1:2181,so-zoo2:2181,so-zoo3:2181
> >>>>>>>>>>>>>>> -Djetty.port=7575 -DhostPort=7575 -jar start.jar
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I feel like maybe I'm missing a step.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> "shard5":{
> >>>>>>>>>>>>>>>   "state":"active",
> >>>>>>>>>>>>>>>   "replicas":{
> >>>>>>>>>>>>>>>     "10.38.33.16:7575_solr_shard5-core1":{
> >>>>>>>>>>>>>>>       "shard":"shard5",
> >>>>>>>>>>>>>>>       "state":"active",
> >>>>>>>>>>>>>>>       "core":"shard5-core1",
> >>>>>>>>>>>>>>>       "collection":"collection1",
> >>>>>>>>>>>>>>>       "node_name":"10.38.33.16:7575_solr",
> >>>>>>>>>>>>>>>       "base_url":"http://10.38.33.16:7575/solr",
> >>>>>>>>>>>>>>>       "leader":"true"},
> >>>>>>>>>>>>>>>     "10.38.33.17:7577_solr_shard5-core2":{
> >>>>>>>>>>>>>>>       "shard":"shard5",
> >>>>>>>>>>>>>>>       "state":"recovering",
> >>>>>>>>>>>>>>>       "core":"shard5-core2",
> >>>>>>>>>>>>>>>       "collection":"collection1",
> >>>>>>>>>>>>>>>       "node_name":"10.38.33.17:7577_solr",
> >>>>>>>>>>>>>>>       "base_url":"http://10.38.33.17:7577/solr"}}}
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 2:40 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> It should be part of your clusterstate.json. Some users have
> >>>>>>>>>>>>>>>> reported trouble upgrading a previous zk install when this change
> >>>>>>>>>>>>>>>> came. I recommended manually updating the clusterstate.json to
> >>>>>>>>>>>>>>>> have the right info, and that seemed to work. Otherwise, I guess
> >>>>>>>>>>>>>>>> you have to start from a clean zk state.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> If you don't have that range information, I think there will be
> >>>>>>>>>>>>>>>> trouble. Do you have a router type defined in the clusterstate.json?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> - Mark
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Apr 3, 2013, at 2:24 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Where is this information stored in ZK? I don't see it in the
> >>>>>>>>>>>>>>>>> cluster state (or perhaps I don't understand it ;) ).
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Perhaps something with my process is broken. What I do when I
> >>>>>>>>>>>>>>>>> start from scratch is the following:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> ZkCLI -cmd upconfig ...
> >>>>>>>>>>>>>>>>> ZkCLI -cmd linkconfig ....
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> but I don't ever explicitly create the collection. What should
> >>>>>>>>>>>>>>>>> the steps from scratch be? I am moving from an unreleased
> >>>>>>>>>>>>>>>>> snapshot of 4.0, so I never did that previously either, so
> >>>>>>>>>>>>>>>>> perhaps I did create the collection in one of my steps to get
> >>>>>>>>>>>>>>>>> this working but have forgotten it along the way.
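Assembled from the commands in this thread, a from-scratch bootstrap might look like the sketch below. Hedged: the ZkCLI classpath is an assumption (it varies by install), the zkhost and config names are the ones quoted above, and supplying numShards on first startup is what triggers the up-front range assignment described earlier:

    java -classpath "solr-webapp/webapp/WEB-INF/lib/*" org.apache.solr.cloud.ZkCLI \
        -cmd upconfig -zkhost so-zoo1:2181,so-zoo2:2181,so-zoo3:2181 \
        -confdir ./solr-conf -confname solr-conf
    java -classpath "solr-webapp/webapp/WEB-INF/lib/*" org.apache.solr.cloud.ZkCLI \
        -cmd linkconfig -zkhost so-zoo1:2181,so-zoo2:2181,so-zoo3:2181 \
        -collection collection1 -confname solr-conf
    java -server -DnumShards=6 -Dcollection=collection1 ... -jar start.jar

After that, cores started with the same collection name should attach to the preallocated shards rather than creating new ones.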
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 2:16 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Thanks for digging Jamie. In 4.2, hash ranges are assigned up
> >>>>>>>>>>>>>>>>>> front when a collection is created - each shard gets a range,
> >>>>>>>>>>>>>>>>>> which is stored in zookeeper. You should not be able to end up
> >>>>>>>>>>>>>>>>>> with the same id on different shards - something very odd going
> >>>>>>>>>>>>>>>>>> on.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hopefully I'll have some time to try and help you reproduce.
> >>>>>>>>>>>>>>>>>> Ideally we can capture it in a test case.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> - Mark
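Jamie mentions below that he exported the keys for each shard and wrote a small java program to check for duplicates. A minimal sketch of that kind of checker, assuming one export file per core named as in the grep listing below (shardN-coreM, one key per line):

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;
    import java.util.TreeSet;

    public class DuplicateKeyCheck {
      public static void main(String[] args) throws IOException {
        // Map each key to the set of shards it was found in. Replicas of the
        // same shard (core1/core2) legitimately share keys, so we key on the
        // shard name, not the file name.
        Map<String, Set<String>> keyToShards = new HashMap<String, Set<String>>();
        for (String file : args) {            // e.g. shard1-core1 ... shard6-core2
          String shard = file.split("-")[0];  // shardN-coreM -> shardN
          for (String key : Files.readAllLines(Paths.get(file), StandardCharsets.UTF_8)) {
            if (key.trim().isEmpty()) continue;
            Set<String> shards = keyToShards.get(key);
            if (shards == null) {
              shards = new TreeSet<String>();
              keyToShards.put(key, shards);
            }
            shards.add(shard);
          }
        }
        // A key present in more than one shard is a routing duplicate.
        for (Map.Entry<String, Set<String>> e : keyToShards.entrySet()) {
          if (e.getValue().size() > 1) {
            System.out.println(e.getKey() + " -> " + e.getValue());
          }
        }
      }
    }

In principle a distributed facet on the id field with facet.mincount=2 should surface the same cross-shard duplicates, though as Jamie notes below it reported nothing with a count above 1 here.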
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Apr 3, 2013, at 1:13 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> no, my thought was wrong, it appears that even with the
> >>>>>>>>>>>>>>>>>>> parameter set I am seeing this behavior. I've been able to
> >>>>>>>>>>>>>>>>>>> duplicate it on 4.2.0 by indexing 100,000 documents on 10
> >>>>>>>>>>>>>>>>>>> threads (10,000 each) when I get to 400,000 or so. I will try
> >>>>>>>>>>>>>>>>>>> this on 4.2.1 to see if I see the same behavior.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 12:37 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Since I don't have that many items in my index I exported all
> >>>>>>>>>>>>>>>>>>>> of the keys for each shard and wrote a simple java program
> >>>>>>>>>>>>>>>>>>>> that checks for duplicates. I found some duplicate keys on
> >>>>>>>>>>>>>>>>>>>> different shards, and a grep of the files for the keys found
> >>>>>>>>>>>>>>>>>>>> does indicate that they made it to the wrong places. If you
> >>>>>>>>>>>>>>>>>>>> notice, documents with the same ID are on shard 3 and shard 5.
> >>>>>>>>>>>>>>>>>>>> Is it possible that the hash is being calculated taking into
> >>>>>>>>>>>>>>>>>>>> account only the "live" nodes? I know that we don't specify
> >>>>>>>>>>>>>>>>>>>> the numShards param @ startup, so could this be what is
> >>>>>>>>>>>>>>>>>>>> happening?
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> grep -c "7cd1a717-3d94-4f5d-bcb1-9d8a95ca78de" *
> >>>>>>>>>>>>>>>>>>>> shard1-core1:0
> >>>>>>>>>>>>>>>>>>>> shard1-core2:0
> >>>>>>>>>>>>>>>>>>>> shard2-core1:0
> >>>>>>>>>>>>>>>>>>>> shard2-core2:0
> >>>>>>>>>>>>>>>>>>>> shard3-core1:1
> >>>>>>>>>>>>>>>>>>>> shard3-core2:1
> >>>>>>>>>>>>>>>>>>>> shard4-core1:0
> >>>>>>>>>>>>>>>>>>>> shard4-core2:0
> >>>>>>>>>>>>>>>>>>>> shard5-core1:1
> >>>>>>>>>>>>>>>>>>>> shard5-core2:1
> >>>>>>>>>>>>>>>>>>>> shard6-core1:0
> >>>>>>>>>>>>>>>>>>>> shard6-core2:0
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 10:42 AM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Something interesting that I'm noticing as well: I just
> >>>>>>>>>>>>>>>>>>>>> indexed 300,000 items, and somehow 300,020 ended up in the
> >>>>>>>>>>>>>>>>>>>>> index. I thought perhaps I messed something up, so I started
> >>>>>>>>>>>>>>>>>>>>> the indexing again and indexed another 400,000, and I see
> >>>>>>>>>>>>>>>>>>>>> 400,064 docs. Is there a good way to find possible
> >>>>>>>>>>>>>>>>>>>>> duplicates? I had tried to facet on key (our id field) but
> >>>>>>>>>>>>>>>>>>>>> that didn't give me anything with more than a count of 1.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 9:22 AM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Ok, so clearing the transaction log allowed things to go
> >>>>>>>>>>>>>>>>>>>>>> again. I am going to clear the index and try to replicate
> >>>>>>>>>>>>>>>>>>>>>> the problem on 4.2.0, and then I'll try on 4.2.1.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 8:21 AM, Mark Miller <markrmil...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> No, not that I know of, which is why I say we need to get
> >>>>>>>>>>>>>>>>>>>>>>> to the bottom of it.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> - Mark
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 10:18 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Mark,
> >>>>>>>>>>>>>>>>>>>>>>>> Is there a particular jira issue that you think may address
> >>>>>>>>>>>>>>>>>>>>>>>> this? I read through it quickly but didn't see one that
> >>>>>>>>>>>>>>>>>>>>>>>> jumped out.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013 10:07 PM, "Jamie Johnson" <jej2...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> I brought the bad one down and back up and it did nothing.
> >>>>>>>>>>>>>>>>>>>>>>>>> I can clear the index and try 4.2.1. I will save off the
> >>>>>>>>>>>>>>>>>>>>>>>>> logs and see if there is anything else odd.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013 9:13 PM, "Mark Miller" <markrmil...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> It would appear it's a bug given what you have said.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Any other exceptions would be useful. Might be best to
> >>>>>>>>>>>>>>>>>>>>>>>>>> start tracking in a JIRA issue as well.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> To fix, I'd bring the behind node down and back again.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Unfortunately, I'm pressed for time, but we really need
> >>>>>>>>>>>>>>>>>>>>>>>>>> to get to the bottom of this and fix it, or determine if
> >>>>>>>>>>>>>>>>>>>>>>>>>> it's fixed in 4.2.1 (spreading to mirrors now).
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> - Mark
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 7:21 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Sorry I didn't ask the obvious question. Is there
> >>>>>>>>>>>>>>>>>>>>>>>>>>> anything else that I should be looking for here, and is
> >>>>>>>>>>>>>>>>>>>>>>>>>>> this a bug? I'd be happy to troll through the logs
> >>>>>>>>>>>>>>>>>>>>>>>>>>> further if more information is needed, just let me know.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Also, what is the most appropriate mechanism to fix
> >>>>>>>>>>>>>>>>>>>>>>>>>>> this? Is it required to kill the index that is out of
> >>>>>>>>>>>>>>>>>>>>>>>>>>> sync and let solr resync things?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> sorry for spamming here....
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> shard5-core2 is the instance we're having issues with...
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> SEVERE: shard update error StdNode:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Server at http://10.38.33.17:7577/solr/dsc-shard5-core2
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> returned non ok status:503, message:Service Unavailable
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>     at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>     at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>     at java.lang.Thread.run(Thread.java:662)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> here is another one that looks interesting:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> SEVERE: org.apache.solr.common.SolrException: ClusterState
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> says we are the leader, but locally we don't think so
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>     at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>     at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>     at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>     at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>     at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>     at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>     at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looking at the master it looks like at some point there
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> were shards that went down. I am seeing things like what
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is below.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: A cluster state change: WatchedEvent state:SyncConnected
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> type:NodeChildrenChanged path:/live_nodes, has occurred -
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> updating... (live nodes size: 12)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Updating live nodes... (9)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Running the leader process.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Checking if I should try and be the leader.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: My last published State was Active, it's okay to be the leader.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: I may be the new leader - try and sync
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I don't think the versions you are thinking of apply here.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Peersync does not look at that - it looks at version
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> numbers for updates in the transaction log - it compares
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the last 100 of them on leader and replica.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What it's saying is that the replica seems to have
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> versions that the leader does not. Have you scanned the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> logs for any interesting exceptions?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Did the leader change during the heavy indexing? Did any
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> zk session timeouts occur?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Mark
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 4:52 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am currently looking at moving our Solr cluster to 4.2
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and noticed a strange issue while testing today.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Specifically the replica has a higher version than the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> master, which is causing the index to not replicate.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Because of this the replica has fewer documents than the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> master. What could cause this, and how can I resolve it
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> short of taking down the index and scping the right
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> version in?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> MASTER:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Last Modified: about an hour ago
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Num Docs: 164880
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Max Doc: 164880
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Deleted Docs: 0
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Version: 2387
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Segment Count: 23
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> REPLICA:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Last Modified: about an hour ago
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Num Docs: 164773
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Max Doc: 164773
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Deleted Docs: 0
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Version: 3001
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Segment Count: 30
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the replica's log it says this:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Creating new http client,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> START replicas=[http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Our versions are newer. ourLowThreshold=1431233788792274944
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> otherHigh=1431233789440294912
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DONE. sync succeeded
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which again seems to point to it thinking it has a newer
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> version of the index, so it aborts.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This happened while having 10 threads
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> indexing 10,000 items each, writing to a 6 shard
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1 replica each) cluster. Any thoughts on this or what
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I should look for would be appreciated.