Hi,
attrd sometimes fails to stop.
Step 1. Start a two-node cluster.
Step 2. Stop the first node (/etc/init.d/heartbeat stop).
Step 3. After a short wait, stop the second node (/etc/init.d/heartbeat
stop).
attrd catches the TERM signal but does not stop.
(snip)
Oct 5 02:37:38
Hi Dejan,
Hi Lars,
In our environment, the problem recurred even with Lars's patch applied.
After the problem occurred I sent a TERM signal, but attrd does not seem to
receive it at all.
The patch needs to be reconsidered in order to solve this problem.
Best Regards,
Hideo Yamauchi.
--- On
On Thu, Dec 22, 2011 at 09:54:47AM +0900, renayama19661...@ybb.ne.jp wrote:
> Hi Dejan,
> Hi Lars,
>
> In our environment, the problem recurred with the patch of Mr. Lars.
> After a problem occurred, I sent TERM signal, but attrd does not seem to
> receive TERM at all.
If you are able to reproduc
Hi Lars,
> If you are able to reproduce,
> you could try to find out what exactly attrd is doing.
>
> various ways to try to do that:
> cat /proc/<pid>/stack # if your platform supports that
> strace it,
> ltrace it,
> attach with gdb and provide a stack trace, or even start to single step it,
> cau
Hi Lars,
I attach the strace file from when the problem reappeared at the end of last
year.
For confirmation, I used a glue build with your patch applied.
The trace was taken from attrd with strace -p right before I stopped
Heartbeat.
In the end attrd caught SIGTERM, but it did not stop.
The attrd stopp
Hi Lars,
Hi Dejan,
I captured an ltrace file when the problem occurred, and I attach it here.
I am continuing the investigation with gdb.
If you have any suggestions for improvement, please tell me.
Best Regards,
Hideo Yamauchi.
--- On Tue, 2012/1/10, renayama19661...@ybb.ne.jp
wrote:
> Hi
On Tue, Jan 10, 2012 at 04:43:51PM +0900, renayama19661...@ybb.ne.jp wrote:
> Hi Lars,
>
> I attach strace file when a problem reappeared at the end of last year.
> I used glue which applied your patch for confirmation.
>
> It is the file which I picked with attrd by strace -p command right befor
Hi Lars,
Thank you for your comments and suggestions.
> > poll([{fd=7, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI}, {fd=5,
> > events=POLLIN|POLLPRI}], 3, -1
>
> Note the -1 (infinity timeout!)
>
> So even though the trigger was (presumably) set,
> and the ->prepare() should have returned
On Sun, Jan 15, 2012 at 1:57 AM, Lars Ellenberg
wrote:
> On Tue, Jan 10, 2012 at 04:43:51PM +0900, renayama19661...@ybb.ne.jp wrote:
>> Hi Lars,
>>
>> I attach strace file when a problem reappeared at the end of last year.
>> I used glue which applied your patch for confirmation.
>>
>> It is the f
On Mon, Jan 16, 2012 at 04:46:58PM +1100, Andrew Beekhof wrote:
> > Now we proceed to the next mainloop poll:
> >
> >> poll([{fd=7, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI}, {fd=5,
> >> events=POLLIN|POLLPRI}], 3, -1
> >
> > Note the -1 (infinity timeout!)
> >
> > So even though the t
I know I could just apply the patch and be done, but I'd like to
understand this so it works for the right reason.
On Mon, Jan 16, 2012 at 7:30 PM, Lars Ellenberg
wrote:
> On Mon, Jan 16, 2012 at 04:46:58PM +1100, Andrew Beekhof wrote:
>> > Now we proceed to the next mainloop poll:
>> >
>> >> pol
On Mon, Jan 16, 2012 at 11:27 PM, Andrew Beekhof wrote:
> I know I could just apply the patch and be done, but I'd like to
> understand this so it works for the right reason.
>
> On Mon, Jan 16, 2012 at 7:30 PM, Lars Ellenberg
> wrote:
>> On Mon, Jan 16, 2012 at 04:46:58PM +1100, Andrew Beekhof w
On Mon, Jan 16, 2012 at 11:30 PM, Andrew Beekhof wrote:
> On Mon, Jan 16, 2012 at 11:27 PM, Andrew Beekhof wrote:
>> I know I could just apply the patch and be done, but I'd like to
>> understand this so it works for the right reason.
>>
>> On Mon, Jan 16, 2012 at 7:30 PM, Lars Ellenberg
>> wrot
On Mon, Jan 16, 2012 at 11:42 PM, Andrew Beekhof wrote:
> On Mon, Jan 16, 2012 at 11:30 PM, Andrew Beekhof wrote:
>> On Mon, Jan 16, 2012 at 11:27 PM, Andrew Beekhof wrote:
>>> I know I could just apply the patch and be done, but I'd like to
>>> understand this so it works for the right reason.
On Mon, Jan 16, 2012 at 11:42:32PM +1100, Andrew Beekhof wrote:
> >>> http://developer.gnome.org/glib/2.30/glib-The-Main-Event-Loop.html#GSourceFuncs
> >>>
> >>> iiuc, mainloop does something similar to (oversimplified):
> >>> timeout = -1; /* infinity */
> >>> for s in all GSource
>
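For readers following the timeout discussion, here is a minimal sketch of the
oversimplified loop quoted above. It is only a model in Python, not GLib code:
the Source class and compute_poll_timeout() are hypothetical names made up for
illustration, and the real GSource prepare() interface differs in detail. It
shows why poll() ends up with -1 when no source reports itself ready and no
source asks for a timeout.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Source:
    """Hypothetical stand-in for a GSource (not the GLib API)."""
    name: str
    ready: bool = False    # what prepare() would report as "ready now"
    timeout_ms: int = -1   # -1 means this source requests no timeout

    def prepare(self) -> Tuple[bool, int]:
        # Return (is_ready, requested_timeout_ms).
        return self.ready, self.timeout_ms


def compute_poll_timeout(sources: List[Source]) -> int:
    """Return the timeout the loop would hand to poll(), in milliseconds."""
    timeout = -1  # infinity
    for s in sources:
        ready, source_timeout = s.prepare()
        if ready:
            # A ready source means poll() must not block at all.
            timeout = 0
        elif source_timeout >= 0 and (timeout < 0 or source_timeout < timeout):
            # Otherwise keep the smallest finite timeout requested.
            timeout = source_timeout
    return timeout


if __name__ == "__main__":
    idle = [Source("ipc"), Source("signal-trigger")]
    pending = [Source("ipc"), Source("signal-trigger", ready=True)]
    print(compute_poll_timeout(idle))
    print(compute_poll_timeout(pending))
```

In this model, a trigger that is set but whose prepare() is not seen as ready
before poll() starts leaves the loop blocked with an infinite timeout, which
matches the -1 visible in the strace output.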
On Tue, Jan 17, 2012 at 09:52:35AM +1100, Andrew Beekhof wrote:
> On Mon, Jan 16, 2012 at 11:42 PM, Andrew Beekhof wrote:
> > On Mon, Jan 16, 2012 at 11:30 PM, Andrew Beekhof wrote:
> >> On Mon, Jan 16, 2012 at 11:27 PM, Andrew Beekhof
> >> wrote:
> >>> I know I could just apply the patch and b
On Tue, Jan 17, 2012 at 10:11 AM, Lars Ellenberg
wrote:
> On Mon, Jan 16, 2012 at 11:42:32PM +1100, Andrew Beekhof wrote:
>> >>> http://developer.gnome.org/glib/2.30/glib-The-Main-Event-Loop.html#GSourceFuncs
>> >>>
>> >>> iiuc, mainloop does something similar to (oversimplified):
>> >>> ti
Hi,
I've seen a very similar problem in a recent release. In fact, I'm in
the process of reproducing it so that it can be properly logged and so
on. When I get the right data for the bug report, I'll attach it to the
bug.
FWIW: I'm pretty sure that the signal was properly received by attrd
Hi Alan,
Thank you for your comment.
We are also trying to reproduce the problem and will send a report.
However, the problem has not reappeared so far.
Best Regards,
Hideo Yamauchi.
--- On Thu, 2011/10/20, Alan Robertson wrote:
> Hi,
>
> I've seen a very similar problem in a recent release. In f
On 10/20/2011 07:30 PM, renayama19661...@ybb.ne.jp wrote:
Hi Alan,
Thank you for your comment.
We are also trying to reproduce the problem and will send a report.
However, the problem has not reappeared so far.
I gather that the folks on the test team for my project have it happen
fairly often when
On Tue, Oct 18, 2011 at 12:19 PM, wrote:
> Hi,
>
> We sometimes fail in a stop of attrd.
>
> Step1. start a cluster in 2 nodes
> Step2. stop the first node.(/etc/init.d/heartbeat stop.)
> Step3. stop the second node after time passed a little.(/etc/init.d/heartbeat
> stop.)
>
> The attrd catches
Hi Andrew,
Hi Alan,
We are working hard to reproduce the problem and collect evidence of it.
However, we have not obtained any evidence yet.
I will wait for information from Alan.
Best Regards,
Hideo Yamauchi.
--- On Wed, 2011/11/2, Andrew Beekhof wrote:
> On Tue, Oct 18, 2011 a
On Thu, Nov 03, 2011 at 01:49:46AM +1100, Andrew Beekhof wrote:
> On Tue, Oct 18, 2011 at 12:19 PM, wrote:
> > Hi,
> >
> > We sometimes fail in a stop of attrd.
> >
> > Step1. start a cluster in 2 nodes
> > Step2. stop the first node.(/etc/init.d/heartbeat stop.)
> > Step3. stop the second node a
On Mon, Nov 7, 2011 at 8:39 AM, Lars Ellenberg
wrote:
> On Thu, Nov 03, 2011 at 01:49:46AM +1100, Andrew Beekhof wrote:
>> On Tue, Oct 18, 2011 at 12:19 PM, wrote:
>> > Hi,
>> >
>> > We sometimes fail in a stop of attrd.
>> >
>> > Step1. start a cluster in 2 nodes
>> > Step2. stop the first node
On Mon, Nov 14, 2011 at 11:58:09AM +1100, Andrew Beekhof wrote:
> On Mon, Nov 7, 2011 at 8:39 AM, Lars Ellenberg
> wrote:
> > On Thu, Nov 03, 2011 at 01:49:46AM +1100, Andrew Beekhof wrote:
> >> On Tue, Oct 18, 2011 at 12:19 PM, wrote:
> >> > Hi,
> >> >
> >> > We sometimes fail in a stop of attr
Hi,
On Mon, Nov 14, 2011 at 01:17:37PM +0100, Lars Ellenberg wrote:
> On Mon, Nov 14, 2011 at 11:58:09AM +1100, Andrew Beekhof wrote:
> > On Mon, Nov 7, 2011 at 8:39 AM, Lars Ellenberg
> > wrote:
> > > On Thu, Nov 03, 2011 at 01:49:46AM +1100, Andrew Beekhof wrote:
> > >> On Tue, Oct 18, 2011 at
Hi Dejan,
Hi Lars,
Understood.
I will test the patch in our environment.
To Alan: will you try the patch?
Best Regards,
Hideo Yamauchi.
--- On Tue, 2011/11/15, Dejan Muhamedagic wrote:
> Hi,
>
> On Mon, Nov 14, 2011 at 01:17:37PM +0100, Lars Ellenberg wrote:
> > On Mon, Nov 14,