The exceptions seem to indicate that the bookie couldn't find files to compact.
Without more information it is hard to tell, but that doesn't look like the
cause of your problems. Also, the load you're imposing shouldn't saturate the
bookies unless they are underprovisioned. Are you using two disks for each
bookie?
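
For a rough sense of that load, here is a back-of-the-envelope sketch. It
assumes that each publish becomes one ledger entry, that the topic's entries
are striped evenly across the ensemble of 5, and that the replication factor
of 3 is the number of bookies each entry is written to. The function name is
made up for illustration; none of this is measured data.

```python
def per_bookie_write_rate(publish_qps, ensemble_size, write_quorum):
    """Estimate entry writes per second on each bookie in one ensemble.

    Assumes even striping: every entry goes to `write_quorum` of the
    `ensemble_size` bookies, so the ensemble shares publish_qps * write_quorum
    writes equally.
    """
    if not 0 < write_quorum <= ensemble_size:
        raise ValueError("write_quorum must be between 1 and ensemble_size")
    return publish_qps * write_quorum / ensemble_size


# The two load levels mentioned in this thread, for a single topic.
for qps in (1000, 2000):
    rate = per_bookie_write_rate(qps, ensemble_size=5, write_quorum=3)
    print(f"{qps} QPS -> about {rate:.0f} entry writes/s per ensemble bookie")
```

Whether a bookie can keep up with that rate depends on entry size and on the
disks behind it, which is why the disk layout matters here.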

-Flavio

On Apr 4, 2012, at 1:11 AM, Aniruddha Laud wrote:

> There are no bookkeeper errors around the time the hedwig hub disconnects
> from the bookies. However, a few minutes before the disconnection, I see some
> exceptions thrown in the bookkeeper log.
> 
> I've attached the exceptions in a file. 
> 
> We are trying to load test hedwig. We sustained around 1000 QPS on one topic
> for about 30 minutes. We then cranked up the load to around 2000 QPS on the
> same topic and got this error. The setup is 15 hedwig hubs and 15 bookies,
> with an ensemble size of 5 and a replication factor of 3.
> 
> Regards, 
> Aniruddha. 
> 
> On Tue, Apr 3, 2012 at 2:09 AM, Ivan Kelly <iv...@yahoo-inc.com> wrote:
> This type of disconnection occurs when there's a read timeout from one of the 
> bookies. The cause could be something crashing on the bookie side, or simply 
> a very slow network. What type of network are you running this in?
> Do you have any logs on the bookie side?
> 
> -Ivan
> 
> On 3 Apr 2012, at 03:22, Aniruddha Laud wrote:
> 
> > While sending requests to a hedwig hub, the hub seems to disconnect from
> > the bookies and never connects back. The logfile contains
> >
> > 2012-04-02 22:33:09,207 - INFO [Hashed wheel timer #3:PerChannelBookieClient@409] - Disconnected from bookie: /10.35.84.103:3181
> > 2012-04-02 22:33:09,211 - INFO [Hashed wheel timer #4:PerChannelBookieClient@409] - Disconnected from bookie: /10.34.133.114:3181
> > 2012-04-02 22:33:09,214 - INFO [Hashed wheel timer #5:PerChannelBookieClient@409] - Disconnected from bookie: /10.35.89.103:3181
> > 2012-04-02 22:33:09,217 - INFO [Hashed wheel timer #8:PerChannelBookieClient@409] - Disconnected from bookie: /10.35.91.102:3181
> > 2012-04-02 22:33:09,247 - INFO [Hashed wheel timer #10:PerChannelBookieClient@409] - Disconnected from bookie: /10.34.234.125:3181
> > 2012-04-02 22:33:09,256 - INFO [Hashed wheel timer #7:PerChannelBookieClient@409] - Disconnected from bookie: /10.34.235.129:3181
> >
> > Some time before getting this message, the "Got response for ..." messages
> > stop and there are only "Successfully wrote request ..." messages in the
> > hedwig log file. The bookkeeper log file shows no indication of the
> > connection being lost. All the bookies and hedwig hubs are up and running
> > and I am able to connect to them with the hedwig console and able to create
> > new topics and publish/subscribe to them. But I'm not able to publish or
> > subscribe to the topic that caused the errors. About 200,000 entries were
> > created in the topic that caused this error.
> >
> > I'm unable to attach the log files or even portions of them, because the
> > relevant portions are around 3MB.
> >
> > Regards,
> > Aniruddha.
> 
> 
> <bookieexception.txt>

