Hi Drew,

I tried the kafka-server-stop script and it worked for me. Wondering which OS you are using?
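For reference, a minimal sketch of a SIGTERM-based stop script, along the
lines of the signal change discussed below. It is not the shipped
kafka-server-stop.sh, and the process pattern used to find the broker is an
assumption, so adapt it to your own install.

#!/bin/sh
# Sketch only: find the broker JVM by its main class (kafka.Kafka) and send
# SIGTERM so it can shut down cleanly instead of being hard killed.
PIDS=$(ps ax | grep -i 'kafka\.Kafka' | grep java | grep -v grep | awk '{print $1}')
if [ -z "$PIDS" ]; then
  echo "No Kafka server to stop"
  exit 1
fi
kill -s TERM $PIDS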
Guozhang

On Mon, Dec 23, 2013 at 10:57 AM, Drew Goya <d...@gradientx.com> wrote:

> Occasionally I do have to hard kill brokers; the kafka-server-stop.sh
> script stopped working for me a few months ago. I saw another thread on
> the mailing list mentioning the issue too. I'll change the signal back to
> SIGTERM and run that way for a while; hopefully the problem goes away.
>
> This is the commit where it changed:
>
> https://github.com/apache/kafka/commit/51de7c55d2b3107b79953f401fc8c9530bd0eea0
>
> On Mon, Dec 23, 2013 at 10:09 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
>
> > Are you hard killing the brokers? And is this issue reproducible?
> >
> > On Sat, Dec 21, 2013 at 11:39 AM, Drew Goya <d...@gradientx.com> wrote:
> >
> > > Hey guys, another small issue to report for 0.8.1. After a couple of
> > > days, 3 of my brokers had fallen off the ISR list for 2-3 of their
> > > partitions.
> > >
> > > I didn't see anything unusual in the log, so I just restarted one. It
> > > came up fine, but as it loaded its logs these messages showed up:
> > >
> > > [2013-12-21 19:25:19,968] WARN [ReplicaFetcherThread-0-2], Replica 1 for partition [Events2,58] reset its fetch offset to current leader 2's start offset 1042738519 (kafka.server.ReplicaFetcherThread)
> > > [2013-12-21 19:25:19,969] WARN [ReplicaFetcherThread-0-14], Replica 1 for partition [Events2,28] reset its fetch offset to current leader 14's start offset 1043415514 (kafka.server.ReplicaFetcherThread)
> > > [2013-12-21 19:25:20,012] WARN [ReplicaFetcherThread-0-2], Current offset 1011209589 for partition [Events2,58] out of range; reset offset to 1042738519 (kafka.server.ReplicaFetcherThread)
> > > [2013-12-21 19:25:20,013] WARN [ReplicaFetcherThread-0-14], Current offset 1010086751 for partition [Events2,28] out of range; reset offset to 1043415514 (kafka.server.ReplicaFetcherThread)
> > > [2013-12-21 19:25:20,036] WARN [ReplicaFetcherThread-0-14], Replica 1 for partition [Events2,71] reset its fetch offset to current leader 14's start offset 1026871415 (kafka.server.ReplicaFetcherThread)
> > > [2013-12-21 19:25:20,036] WARN [ReplicaFetcherThread-0-2], Replica 1 for partition [Events2,44] reset its fetch offset to current leader 2's start offset 1052372907 (kafka.server.ReplicaFetcherThread)
> > > [2013-12-21 19:25:20,036] WARN [ReplicaFetcherThread-0-14], Current offset 993879706 for partition [Events2,71] out of range; reset offset to 1026871415 (kafka.server.ReplicaFetcherThread)
> > > [2013-12-21 19:25:20,036] WARN [ReplicaFetcherThread-0-2], Current offset 1020715056 for partition [Events2,44] out of range; reset offset to 1052372907 (kafka.server.ReplicaFetcherThread)
> > >
> > > Judging by the network traffic and disk usage changes after the reboot
> > > (both jumped up), a couple of the partition replicas had fallen behind
> > > and are now catching up.
> > >
> > > On Thu, Dec 19, 2013 at 4:37 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> > >
> > > > Hi Drew,
> > > >
> > > > That problem will be fixed by
> > > > https://issues.apache.org/jira/browse/KAFKA-1074. I think we are
> > > > close to checking that in to trunk.
> > > >
> > > > Thanks,
> > > > Neha
> > > >
> > > > On Wed, Dec 18, 2013 at 9:02 AM, Drew Goya <d...@gradientx.com> wrote:
> > > >
> > > > > Thanks Neha, I rolled upgrades and completed a rebalance!
> > > > > I ran into a few small issues I figured I would share.
> > > > >
> > > > > On a few brokers, there were some log directories left over from
> > > > > some failed rebalances, which prevented the 0.8.1 brokers from
> > > > > starting once I completed the upgrade. These directories contained
> > > > > an index file and a zero-size log file; once I cleaned those out,
> > > > > the brokers were able to start up fine. If anyone else runs into
> > > > > the same problem and is running RHEL, this is the bash script I
> > > > > used to clean them out:
> > > > >
> > > > > du --max-depth=1 -h /data/kafka/logs | grep K | sed 's/.*K.//' | xargs sudo rm -r
> > > > >
> > > > > On Tue, Dec 17, 2013 at 10:42 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> > > > >
> > > > > > There are no compatibility issues. You can roll upgrades through
> > > > > > the cluster one node at a time.
> > > > > >
> > > > > > Thanks
> > > > > > Neha
> > > > > >
> > > > > > On Tue, Dec 17, 2013 at 9:15 AM, Drew Goya <d...@gradientx.com> wrote:
> > > > > >
> > > > > > > So I'm going to be going through the process of upgrading a
> > > > > > > cluster from 0.8.0 to the trunk (0.8.1).
> > > > > > >
> > > > > > > I'm going to be expanding this cluster several times, and the
> > > > > > > problems with reassigning partitions in 0.8.0 mean I have to
> > > > > > > move to trunk (0.8.1) asap.
> > > > > > >
> > > > > > > Will it be safe to roll upgrades through the cluster one by one?
> > > > > > >
> > > > > > > Also, are there any client compatibility issues I need to worry
> > > > > > > about? Am I going to need to pause/upgrade all my
> > > > > > > consumers/producers at once, or can I roll upgrades through the
> > > > > > > cluster and then upgrade my clients one by one?
> > > > > > >
> > > > > > > Thanks in advance!

--
-- Guozhang
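A note on the cleanup command quoted above: grep K keys off the
human-readable size printed by du, so it can also match directories that are
merely small rather than truly empty. A more targeted sketch follows; the
/data/kafka/logs path, the one-directory-per-partition layout, and GNU find
are assumptions based on Drew's description, and it only prints candidates so
they can be reviewed before anything is deleted.

#!/bin/sh
# Sketch only: list topic-partition directories whose .log segments are all
# zero bytes, i.e. the leftover directories described above.
LOG_DIR=/data/kafka/logs   # assumed log directory; match your broker config
for d in "$LOG_DIR"/*/; do
  segs=$(find "$d" -maxdepth 1 -name '*.log' | wc -l)
  bytes=$(find "$d" -maxdepth 1 -name '*.log' -printf '%s\n' | awk '{s+=$1} END {print s+0}')
  if [ "$segs" -gt 0 ] && [ "$bytes" -eq 0 ]; then
    # every .log file in this directory is empty; review the list, then remove by hand
    echo "empty partition dir: $d"
  fi
done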