Just an update: I'm at 1M records now with no issues, so this looks like it
was indeed the cause of my problems. Thanks for the help. Is the routing
behavior tied to numShards documented anywhere? I know numShards itself is
documented, but I didn't realize that the routing changes if you don't
specify it.
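
For anyone who hits the same thing, here is roughly how I am checking it now.
This is only a sketch from my own setup (the zk host below is mine, adjust as
needed), using ZooKeeper's own CLI rather than anything Solr-specific:

    # connect with ZooKeeper's CLI and dump the cluster state
    zkCli.sh -server so-zoo1:2181
    get /clusterstate.json

If numShards was supplied when the collection was first created you should
see "router":"compositeId" and a hash "range" on every shard; if it wasn't,
the router shows up as "implicit", there are no ranges, and distributing
updates is entirely up to you.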


On Wed, Apr 3, 2013 at 4:44 PM, Jamie Johnson <jej2...@gmail.com> wrote:

> with these changes things are looking good, I'm up to 600,000 documents
> without any issues as of right now.  I'll keep going and add more to see if
> I find anything.
>
>
> On Wed, Apr 3, 2013 at 4:01 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>
>> ok, so that's not a deal breaker for me.  I just changed it to match the
>> shards that are auto created and it looks like things are happy.  I'll go
>> ahead and try my test to see if I can get things out of sync.
>>
>>
>> On Wed, Apr 3, 2013 at 3:56 PM, Mark Miller <markrmil...@gmail.com>wrote:
>>
>>> I had thought you could - but looking at the code recently, I don't
>>> think you can anymore. I think that's a technical limitation more than
>>> anything though. When these changes were made, I think support for that was
>>> simply not added at the time.
>>>
>>> I'm not sure exactly how straightforward it would be, but it seems
>>> doable - as it is, the overseer will preallocate shards when first creating
>>> the collection - that's when they get named shard(n). There would have to
>>> be logic to replace shard(n) with the custom shard name when the core
>>> actually registers.
>>>
>>> - Mark
>>>
>>> On Apr 3, 2013, at 3:42 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>
>>> > answered my own question, it now says compositeId.  What is problematic
>>> > though is that in addition to my shards (which are say jamie-shard1) I see
>>> > the solr created shards (shard1).  I assume that these were created
>>> > because of the numShards param.  Is there no way to specify the names of
>>> > these shards?
>>> >
>>> >
>>> > On Wed, Apr 3, 2013 at 3:25 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>> >
>>> >> ah interesting....so I need to specify numShards, blow out zk and then
>>> >> try this again to see if things work properly now.  What is really
>>> >> strange is that for the most part things have worked right and on 4.2.1
>>> >> I have 600,000 items indexed with no duplicates.  In any event I will
>>> >> specify numShards, clear out zk and begin again.  If this works properly
>>> >> what should the router type be?
>>> >>
>>> >>
>>> >> On Wed, Apr 3, 2013 at 3:14 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>> >>
>>> >>> If you don't specify numShards after 4.1, you get an implicit doc router
>>> >>> and it's up to you to distribute updates. In the past, partitioning was
>>> >>> done on the fly - but for shard splitting and perhaps other features, we
>>> >>> now divvy up the hash range up front based on numShards and store it in
>>> >>> ZooKeeper. No numShards is now how you take complete control of updates
>>> >>> yourself.
>>> >>>
>>> >>> - Mark
>>> >>>
>>> >>> On Apr 3, 2013, at 2:57 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>> >>>
>>> >>>> The router says "implicit".  I did start from a blank zk state but
>>> >>>> perhaps I missed one of the ZkCLI commands?  One of my shards from the
>>> >>>> clusterstate.json is shown below.  What is the process that should be
>>> >>>> done to bootstrap a cluster other than the ZkCLI commands I listed
>>> >>>> above?  My process right now is to run those ZkCLI commands and then
>>> >>>> start solr on all of the instances with a command like this:
>>> >>>>
>>> >>>> java -server -Dshard=shard5 -DcoreName=shard5-core1 \
>>> >>>>   -Dsolr.data.dir=/solr/data/shard5-core1 \
>>> >>>>   -Dcollection.configName=solr-conf -Dcollection=collection1 \
>>> >>>>   -DzkHost=so-zoo1:2181,so-zoo2:2181,so-zoo3:2181 \
>>> >>>>   -Djetty.port=7575 -DhostPort=7575 -jar start.jar
>>> >>>>
>>> >>>> I feel like maybe I'm missing a step.
>>> >>>>
>>> >>>> "shard5":{
>>> >>>>       "state":"active",
>>> >>>>       "replicas":{
>>> >>>>         "10.38.33.16:7575_solr_shard5-core1":{
>>> >>>>           "shard":"shard5",
>>> >>>>           "state":"active",
>>> >>>>           "core":"shard5-core1",
>>> >>>>           "collection":"collection1",
>>> >>>>           "node_name":"10.38.33.16:7575_solr",
>>> >>>>           "base_url":"http://10.38.33.16:7575/solr",
>>> >>>>           "leader":"true"},
>>> >>>>         "10.38.33.17:7577_solr_shard5-core2":{
>>> >>>>           "shard":"shard5",
>>> >>>>           "state":"recovering",
>>> >>>>           "core":"shard5-core2",
>>> >>>>           "collection":"collection1",
>>> >>>>           "node_name":"10.38.33.17:7577_solr",
>>> >>>>           "base_url":"http://10.38.33.17:7577/solr"}}}
>>> >>>>
>>> >>>>
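
For comparison, after recreating the collection with numShards set, the
entries in clusterstate.json look roughly like the sketch below.  The range
value here is made up for illustration; the point is the collection-level
"router":"compositeId" and the per-shard "range", both of which are missing
from the snippet above:

    "collection1":{
        "router":"compositeId",
        "shards":{
          "shard1":{
            "range":"80000000-d554ffff",
            "state":"active",
            "replicas":{ ... }},
          ...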
>>> >>>> On Wed, Apr 3, 2013 at 2:40 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>> >>>>
>>> >>>>> It should be part of your clusterstate.json. Some users have reported
>>> >>>>> trouble upgrading a previous zk install when this change came. I
>>> >>>>> recommended manually updating the clusterstate.json to have the right
>>> >>>>> info, and that seemed to work. Otherwise, I guess you have to start
>>> >>>>> from a clean zk state.
>>> >>>>>
>>> >>>>> If you don't have that range information, I think there will be
>>> >>>>> trouble. Do you have a router type defined in the clusterstate.json?
>>> >>>>>
>>> >>>>> - Mark
>>> >>>>>
>>> >>>>> On Apr 3, 2013, at 2:24 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>> >>>>>
>>> >>>>>> Where is this information stored in ZK?  I don't see it in the
>>> >>>>>> cluster state (or perhaps I don't understand it ;) ).
>>> >>>>>>
>>> >>>>>> Perhaps something with my process is broken.  What I do when I start
>>> >>>>>> from scratch is the following:
>>> >>>>>>
>>> >>>>>> ZkCLI -cmd upconfig ...
>>> >>>>>> ZkCLI -cmd linkconfig ....
>>> >>>>>>
>>> >>>>>> but I don't ever explicitly create the collection.  What should the
>>> >>>>>> steps from scratch be?  I am moving from an unreleased snapshot of
>>> >>>>>> 4.0, so I never did that previously either; perhaps I did create the
>>> >>>>>> collection in one of my steps to get this working but have forgotten
>>> >>>>>> it along the way.
>>> >>>>>>
>>> >>>>>>
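
For what it's worth, the rough from-scratch sequence that ended up working for
me is below, plus the Collections API alternative, which I have not tried
myself.  Paths, ports, config/collection names and zk hosts are from my setup:

    # push the config to ZooKeeper and link it to the collection
    ZkCLI -cmd upconfig -zkhost so-zoo1:2181 -confdir ./solr-conf -confname solr-conf
    ZkCLI -cmd linkconfig -collection collection1 -confname solr-conf -zkhost so-zoo1:2181

    # then pass numShards when the first core registers ...
    java -server -DnumShards=6 -Dshard=shard1 -DcoreName=shard1-core1 ... -jar start.jar

    # ... or create the collection explicitly via the Collections API
    curl "http://localhost:7575/solr/admin/collections?action=CREATE&name=collection1&numShards=6&replicationFactor=2&collection.configName=solr-conf"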
>>> >>>>>> On Wed, Apr 3, 2013 at 2:16 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>> >>>>>>
>>> >>>>>>> Thanks for digging Jamie. In 4.2, hash ranges are assigned up front
>>> >>>>>>> when a collection is created - each shard gets a range, which is
>>> >>>>>>> stored in zookeeper. You should not be able to end up with the same
>>> >>>>>>> id on different shards - something very odd going on.
>>> >>>>>>>
>>> >>>>>>> Hopefully I'll have some time to try and help you reproduce. Ideally
>>> >>>>>>> we can capture it in a test case.
>>> >>>>>>>
>>> >>>>>>> - Mark
>>> >>>>>>>
>>> >>>>>>> On Apr 3, 2013, at 1:13 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>> >>>>>>>
>>> >>>>>>>> no, my thought was wrong, it appears that even with the parameter
>>> >>>>>>>> set I am seeing this behavior.  I've been able to duplicate it on
>>> >>>>>>>> 4.2.0 by indexing 100,000 documents on 10 threads (10,000 each)
>>> >>>>>>>> when I get to 400,000 or so.  I will try this on 4.2.1 to see if I
>>> >>>>>>>> see the same behavior.
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> On Wed, Apr 3, 2013 at 12:37 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>> >>>>>>>>
>>> >>>>>>>>> Since I don't have that many items in my index I exported all of
>>> >>>>>>>>> the keys for each shard and wrote a simple java program that
>>> >>>>>>>>> checks for duplicates.  I found some duplicate keys on different
>>> >>>>>>>>> shards, and a grep of the files for the keys that were found does
>>> >>>>>>>>> indicate that they made it to the wrong places.  If you notice,
>>> >>>>>>>>> documents with the same ID are on shard 3 and shard 5.  Is it
>>> >>>>>>>>> possible that the hash is being calculated taking into account
>>> >>>>>>>>> only the "live" nodes?  I know that we don't specify the numShards
>>> >>>>>>>>> param @ startup so could this be what is happening?
>>> >>>>>>>>>
>>> >>>>>>>>> grep -c "7cd1a717-3d94-4f5d-bcb1-9d8a95ca78de" *
>>> >>>>>>>>> shard1-core1:0
>>> >>>>>>>>> shard1-core2:0
>>> >>>>>>>>> shard2-core1:0
>>> >>>>>>>>> shard2-core2:0
>>> >>>>>>>>> shard3-core1:1
>>> >>>>>>>>> shard3-core2:1
>>> >>>>>>>>> shard4-core1:0
>>> >>>>>>>>> shard4-core2:0
>>> >>>>>>>>> shard5-core1:1
>>> >>>>>>>>> shard5-core2:1
>>> >>>>>>>>> shard6-core1:0
>>> >>>>>>>>> shard6-core2:0
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> On Wed, Apr 3, 2013 at 10:42 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>>> Something interesting that I'm noticing as well: I just indexed
>>> >>>>>>>>>> 300,000 items, and somehow 300,020 ended up in the index.  I
>>> >>>>>>>>>> thought perhaps I messed something up so I started the indexing
>>> >>>>>>>>>> again and indexed another 400,000 and I see 400,064 docs.  Is
>>> >>>>>>>>>> there a good way to find possible duplicates?  I had tried to
>>> >>>>>>>>>> facet on key (our id field) but that didn't give me anything with
>>> >>>>>>>>>> more than a count of 1.
>>> >>>>>>>>>>
>>> >>>>>>>>>>
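
In case it helps anyone else chasing duplicates: faceting on the key field can
work for this, but the facet limit has to be lifted so the per-shard term
lists are complete; with the default limit, a key that appears once on two
different shards can be missed in the merged result.  A sketch against my
setup (core name and field name are mine, and facet.limit=-1 is expensive on a
large index):

    http://10.38.33.16:7575/solr/shard1-core1/select?q=*:*&rows=0&facet=true&facet.field=key&facet.mincount=2&facet.limit=-1

Once a suspect key shows up, querying each core with q=key:<the id> and
distrib=false shows which shards the copies actually landed on, which is
effectively what the grep output further up the thread shows.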
>>> >>>>>>>>>> On Wed, Apr 3, 2013 at 9:22 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>>> Ok, so clearing the transaction log allowed things to go again.
>>> >>>>>>>>>>> I am going to clear the index and try to replicate the problem
>>> >>>>>>>>>>> on 4.2.0 and then I'll try on 4.2.1
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> On Wed, Apr 3, 2013 at 8:21 AM, Mark Miller <markrmil...@gmail.com> wrote:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>> No, not that I know of, which is why I say we need to get to
>>> >>>>>>>>>>>> the bottom of it.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> - Mark
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> On Apr 2, 2013, at 10:18 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>> Mark,
>>> >>>>>>>>>>>>> Is there a particular jira issue that you think may address
>>> >>>>>>>>>>>>> this?  I read through it quickly but didn't see one that
>>> >>>>>>>>>>>>> jumped out.
>>> >>>>>>>>>>>>> On Apr 2, 2013 10:07 PM, "Jamie Johnson" <jej2...@gmail.com> wrote:
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> I brought the bad one down and back up and it did nothing.  I
>>> >>>>>>>>>>>>>> can clear the index and try 4.2.1.  I will save off the logs
>>> >>>>>>>>>>>>>> and see if there is anything else odd.
>>> >>>>>>>>>>>>>> On Apr 2, 2013 9:13 PM, "Mark Miller" <markrmil...@gmail.com> wrote:
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> It would appear it's a bug given what you have said.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> Any other exceptions would be useful. Might be best to start
>>> >>>>>>>>>>>>>>> tracking in a JIRA issue as well.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> To fix, I'd bring the behind node down and back again.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> Unfortunately, I'm pressed for time, but we really need to
>>> >>>>>>>>>>>>>>> get to the bottom of this and fix it, or determine if it's
>>> >>>>>>>>>>>>>>> fixed in 4.2.1 (spreading to mirrors now).
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> - Mark
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> On Apr 2, 2013, at 7:21 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> Sorry I didn't ask the obvious question.  Is there anything
>>> >>>>>>>>>>>>>>>> else that I should be looking for here, and is this a bug?
>>> >>>>>>>>>>>>>>>> I'd be happy to troll through the logs further if more
>>> >>>>>>>>>>>>>>>> information is needed, just let me know.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> Also, what is the most appropriate mechanism to fix this?
>>> >>>>>>>>>>>>>>>> Is it required to kill the index that is out of sync and
>>> >>>>>>>>>>>>>>>> let solr resync things?
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> sorry for spamming here....
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> shard5-core2 is the instance we're having issues with...
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>>> >>>>>>>>>>>>>>>>> SEVERE: shard update error StdNode: http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException: Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok status:503, message:Service Unavailable
>>> >>>>>>>>>>>>>>>>>    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
>>> >>>>>>>>>>>>>>>>>    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>>> >>>>>>>>>>>>>>>>>    at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
>>> >>>>>>>>>>>>>>>>>    at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
>>> >>>>>>>>>>>>>>>>>    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>> >>>>>>>>>>>>>>>>>    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>> >>>>>>>>>>>>>>>>>    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>> >>>>>>>>>>>>>>>>>    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>> >>>>>>>>>>>>>>>>>    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>> >>>>>>>>>>>>>>>>>    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> >>>>>>>>>>>>>>>>>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>> >>>>>>>>>>>>>>>>>    at java.lang.Thread.run(Thread.java:662)
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> here is another one that looks interesting
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>>> >>>>>>>>>>>>>>>>>> SEVERE: org.apache.solr.common.SolrException: ClusterState says we are the leader, but locally we don't think so
>>> >>>>>>>>>>>>>>>>>>    at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
>>> >>>>>>>>>>>>>>>>>>    at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
>>> >>>>>>>>>>>>>>>>>>    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
>>> >>>>>>>>>>>>>>>>>>    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>>> >>>>>>>>>>>>>>>>>>    at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
>>> >>>>>>>>>>>>>>>>>>    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
>>> >>>>>>>>>>>>>>>>>>    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>>> >>>>>>>>>>>>>>>>>>    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>>> >>>>>>>>>>>>>>>>>>    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>> >>>>>>>>>>>>>>>>>>    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
>>> >>>>>>>>>>>>>>>>>>    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
>>> >>>>>>>>>>>>>>>>>>    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> Looking at the master it looks like at some point there
>>> >>>>>>>>>>>>>>>>>>> were shards that went down.  I am seeing things like what
>>> >>>>>>>>>>>>>>>>>>> is below.
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> INFO: A cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live nodes size: 12)
>>> >>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process
>>> >>>>>>>>>>>>>>>>>>> INFO: Updating live nodes... (9)
>>> >>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
>>> >>>>>>>>>>>>>>>>>>> INFO: Running the leader process.
>>> >>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
>>> >>>>>>>>>>>>>>>>>>> INFO: Checking if I should try and be the leader.
>>> >>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
>>> >>>>>>>>>>>>>>>>>>> INFO: My last published State was Active, it's okay to be the leader.
>>> >>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
>>> >>>>>>>>>>>>>>>>>>> INFO: I may be the new leader - try and sync
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> I don't think the versions you are thinking of apply
>>> >>>>>>>>>>>>>>>>>>>> here.  Peersync does not look at that - it looks at
>>> >>>>>>>>>>>>>>>>>>>> version numbers for updates in the transaction log - it
>>> >>>>>>>>>>>>>>>>>>>> compares the last 100 of them on leader and replica.
>>> >>>>>>>>>>>>>>>>>>>> What it's saying is that the replica seems to have
>>> >>>>>>>>>>>>>>>>>>>> versions that the leader does not.  Have you scanned the
>>> >>>>>>>>>>>>>>>>>>>> logs for any interesting exceptions?
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> Did the leader change during the heavy indexing?  Did
>>> >>>>>>>>>>>>>>>>>>>> any zk session timeouts occur?
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> - Mark
>>> >>>>>>>>>>>>>>>>>>>>
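
A note for anyone following along: the last 100 versions Mark refers to can be
pulled by hand from the realtime-get handler if you want to compare the two
nodes yourself.  The core URLs below are from my setup, and I'm assuming the
stock /get handler is configured:

    curl "http://10.38.33.16:7575/solr/dsc-shard5-core1/get?getVersions=100&wt=json"
    curl "http://10.38.33.17:7577/solr/dsc-shard5-core2/get?getVersions=100&wt=json"

Diffing the two lists makes it clear which side has updates the other never
saw.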
>>> >>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 4:52 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> I am currently looking at moving our Solr cluster to
>>> >>>>>>>>>>>>>>>>>>>>> 4.2 and noticed a strange issue while testing today.
>>> >>>>>>>>>>>>>>>>>>>>> Specifically the replica has a higher version than the
>>> >>>>>>>>>>>>>>>>>>>>> master, which is causing the index to not replicate.
>>> >>>>>>>>>>>>>>>>>>>>> Because of this the replica has fewer documents than
>>> >>>>>>>>>>>>>>>>>>>>> the master.  What could cause this and how can I
>>> >>>>>>>>>>>>>>>>>>>>> resolve it, short of taking down the index and scping
>>> >>>>>>>>>>>>>>>>>>>>> the right version in?
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> MASTER:
>>> >>>>>>>>>>>>>>>>>>>>> Last Modified: about an hour ago
>>> >>>>>>>>>>>>>>>>>>>>> Num Docs: 164880
>>> >>>>>>>>>>>>>>>>>>>>> Max Doc: 164880
>>> >>>>>>>>>>>>>>>>>>>>> Deleted Docs: 0
>>> >>>>>>>>>>>>>>>>>>>>> Version: 2387
>>> >>>>>>>>>>>>>>>>>>>>> Segment Count: 23
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> REPLICA:
>>> >>>>>>>>>>>>>>>>>>>>> Last Modified: about an hour ago
>>> >>>>>>>>>>>>>>>>>>>>> Num Docs: 164773
>>> >>>>>>>>>>>>>>>>>>>>> Max Doc: 164773
>>> >>>>>>>>>>>>>>>>>>>>> Deleted Docs: 0
>>> >>>>>>>>>>>>>>>>>>>>> Version: 3001
>>> >>>>>>>>>>>>>>>>>>>>> Segment Count: 30
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> in the replica's log it says this:
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> INFO: Creating new http client, config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>>> >>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr START replicas=[http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>>> >>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>>> >>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr  Our versions are newer. ourLowThreshold=1431233788792274944 otherHigh=1431233789440294912
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>>> >>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr DONE. sync succeeded
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> which again seems to point that it thinks it has a
>>> >>>>>>>>>>>>>>>>>>>>> newer version of the index, so it aborts.  This
>>> >>>>>>>>>>>>>>>>>>>>> happened while having 10 threads indexing 10,000 items
>>> >>>>>>>>>>>>>>>>>>>>> each, writing to a 6 shard (1 replica each) cluster.
>>> >>>>>>>>>>>>>>>>>>>>> Any thoughts on this or what I should look for would
>>> >>>>>>>>>>>>>>>>>>>>> be appreciated.
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>
>>> >>>>>
>>> >>>
>>> >>>
>>> >>
>>>
>>>
>>
>
