Re: [Pacemaker] [Problem or Enhancement]When attrd reboots, a fail count is initialized.

2010-09-28 Thread renayama19661014
Hi Andrew, Thank you for your comment. The problem here is that attrd is supposed to be the authoritative source for this sort of data. Yes. I understand. Additionally, you don't always want attrd reading from the status section - like after the cluster restarts. The problem seems to be able

[Pacemaker] [Problem]Lost fail-count.

2010-09-29 Thread renayama19661014
Hi, We examined a resource failure during a cluster split and the subsequent recovery of the cluster. However, when the cluster recovered, the fail-count disappeared, while the Failed Actions did not. It occurred with the following procedure. Step1) We

Re: [Pacemaker] About behavior in Action Lost.

2010-09-29 Thread renayama19661014
Hi Andrew, Sorry, it probably got rebased before I pushed it. http://hg.clusterlabs.org/pacemaker/1.1/rev/dd8e37df3e96 should be the right link Thanks!! Hideo Yamauchi. --- Andrew Beekhof and...@beekhof.net wrote: Sorry, it probably got rebased before I pushed it.

Re: [Pacemaker] [Problem or Enhancement]When attrd reboots, a fail count is initialized.

2010-09-30 Thread renayama19661014
Hi Andrew, Thank you for your comment. During crmd startup, one could read all the values from attrd into the hashtable. So the hashtable would only do something if only attrd went down. If attrd communicates with crmd when it starts and reads the data in the hash table, the problem seems

[Pacemaker] [GUI]Compatibility issues of Python.

2010-09-30 Thread renayama19661014
Hi Yan, I ran the latest GUI to work on its Japanese localization. However, on RHEL5.5, the GUI raises an error when operating on a resource. The cause seems to be a difference in the Python version. (snip) def on_rsc_action(self, action) : (cur_type, cur_name) =

Re: [Pacemaker] [Problem or Enhancement]When attrd reboots, a fail count is initialized.

2010-10-04 Thread renayama19661014
Hi Andrew, Thank you for your comment. Is it difficult to change attrd and crmd in this way? I don't think so. But it's not a huge priority because I've never heard of attrd actually crashing. So while I agree that it's theoretically a problem, in practice no-one is going to hit this in

[Pacemaker] Election Timeout and node became the Pending state.

2010-10-04 Thread renayama19661014
Hi, We tested a complex node failure scenario, and an Election Timeout error occurred. * Pacemaker:pacemaker-1.0.9.1 * heartbeat-3.0.3-2.3.el5 * cluster-glue:cluster-glue-1.0.6-1.6.el5 * resource-agents-1.0.3-1.0.dev.b7a3b1973ba7 We tested it with the following procedure. Step1) Start all nodes.

Re: [Pacemaker] [Problem or Enhancement]When attrd reboots, a fail count is initialized.

2010-10-05 Thread renayama19661014
Hi Andrew, I registered these contents with Bugzilla as enhancement of the functions. * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2501 Thanks, Hideo Yamauchi. --- renayama19661...@ybb.ne.jp wrote: Hi Andrew, Thank you for comment. Is the change of this attrd and

[Pacemaker] [Problem]The monitor that start-delay is long does not stop.

2010-10-07 Thread renayama19661014
Hi, I ran the following to confirm a report posted to the mailing list. * http://www.gossamer-threads.com/lists/linuxha/pacemaker/66939 Step1) I prepare a cib.xml with a monitor whose start-delay is longer than five minutes. Step2) I start two nodes and send the cib. Last updated: Thu

Re: [Pacemaker] [Problem]The monitor that start-delay is long does not stop.

2010-10-07 Thread renayama19661014
Hi Andrew, Thank you for your comment. Funnily enough I was just looking at that message and saw that the code relevant to this one looked wrong too. I believe this should fix the issue: http://hg.clusterlabs.org/pacemaker/1.1/rev/e06810256413 I registered the logs and more details in Bugzilla.

Re: [Pacemaker] [GUI]Compatibility issues of Python.

2010-10-09 Thread renayama19661014
Hi Yan, It should work with python 2.5 now: http://hg.clusterlabs.org/pacemaker/pygui/rev/16a7d8a5d3eb Thank you for the fix. I will check the Japanese display of the new po-file in the revised GUI and contact you once I have finished. Best Regards, Hideo Yamauchi. --- Yan Gao

Re: [Pacemaker] [Problem]The monitor that start-delay is long does not stop.

2010-10-12 Thread renayama19661014
Hi Andrew, Funnily enough I was just looking at that message and saw that the code relevant to this one looked wrong too. I believe this should fix the issue: http://hg.clusterlabs.org/pacemaker/1.1/rev/e06810256413 I registered the logs and more details in Bugzilla.

Re: [Pacemaker] [GUI]Compatibility issues of Python.

2010-10-12 Thread renayama19661014
Hi Yan, * http://www.gossamer-threads.com/lists/linuxha/pacemaker/67046 Appreciate your good work! Thanks! Pushed them: http://hg.clusterlabs.org/pacemaker/pygui/rev/9920f30d364c http://hg.clusterlabs.org/pacemaker/pygui/rev/af237f362f13 Thank you for revising the GUI in hg. Best Regards,

[Pacemaker] Time to a service stop is very long.

2010-10-21 Thread renayama19661014
Hi, We checked the behavior when no-quorum-policy is set to freeze. In a cluster where the freeze setting had taken effect, we stopped the service. However, stopping the service took a very long time. We set shutdown-escalation to five minutes to shorten the test. But the stop of the service

[Pacemaker] [Problem]Failed in node recovery in no-quorum-policy=freeze.

2010-10-26 Thread renayama19661014
Hi, We found a problem with changes to the internal instance number of a clone resource. Because of this problem, a split cluster fails to re-form. * The split cluster cannot form a single cluster again. Because the log file is large, we registered the details in the following Bugzilla entry.

Re: [Pacemaker] Time to a service stop is very long.

2010-10-27 Thread renayama19661014
Hi Andrew, Wait, I think I read that wrong. I would expect that no-matter what that pacemaker would exit after shutdown-escalation. You're saying it didn't? Better create a bug and attach the logs. At Step4, srv03 and srv04 requested a stop of the Heartbeat service. Looking at the log,

[Pacemaker] [Question]About the recovery procedure from the state that a node was divided.

2010-11-03 Thread renayama19661014
Hi All, We tested the recovery procedure from a state in which the cluster was split. (Of the four nodes, three are active and one is a standby.) This is recovery from a split into two groups of two nodes, with no-quorum-policy=freeze set. The resource

[Pacemaker] [Problem]Number of times control of the fail-count is late.

2010-11-09 Thread renayama19661014
Hi, We built a two-node cluster and set migration-threshold to 2. We observed the phenomenon with the following procedure. Step1) Start two nodes and send config5.crm. (The clnDiskd resource is our own.) Last updated: Tue Nov 9 21:10:49 2010 Stack: Heartbeat

Re: [Pacemaker] [Problem]Number of times control of the fail-count is late.

2010-11-12 Thread renayama19661014
Hi Andrew, Thank you for your comment. The problem seems to be that the fail-count update was late. However, it seems to depend on timing. When the fail-count threshold is not enforced at the right count, the failover time of the resource is affected. Is this problem already

Re: [Pacemaker] [Question]About the recovery procedure from the state that a node was divided.

2010-11-15 Thread renayama19661014
Hi Andrew, Without the Step3 procedure, I think the bug I reported earlier is likely to occur. * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2508 I think this bug is why the Step3 procedure is necessary. Hopefully we'll get that

Re: [Pacemaker] Transition-graph of pengine does a loop when configuration set order of stonith.

2010-12-01 Thread renayama19661014
Hi Andrew, We need to set an order constraint between a stonith resource and the clone resource. * I attached the hb_report to Bugzilla. * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2529 Thanks for the report, I'll try to follow up there soon Thanks!! Hideo

Re: [Pacemaker] [Problem]The movement of the resource is not possible.

2010-12-01 Thread renayama19661014
Hi Andrew, Can this revision be applied to 1.0? Or is that impossible because it affects other things? I have no objection to it being added to 1.0, it should be safe. Thanks. For 1.0, I will ask Mr. Mori to backport it. Will you apply the fix to 1.1? Best Regards, Hideo Yamauchi. --- Andrew Beekhof

Re: [Pacemaker] It affects it that the update of the attribute by attrd is late, and a resource starts with a standby node.

2010-12-01 Thread renayama19661014
Hi Andrew, Step1) 192.168.40.3 addresses invalidate the understanding of ping. Not sure I understand this, can you rephrase? Sorry. For pingd, we use the following two addresses. * 192.168.4.2 * 192.168.4.3 When one address cannot be reached, this problem occurs. When cluster can communicate

Re: [Pacemaker] [Problem]The movement of the resource is not possible.

2010-12-01 Thread renayama19661014
Hi Andrew, I am sending a patch for 1.1. Mr. Mori will do the backport for 1.0. Best Regards, Hideo Yamauchi. --- renayama19661...@ybb.ne.jp wrote: Hi Andrew, Can 1.0 reflect this revision? Because there is influence else, is it impossible? I have no objection to it being added

[Pacemaker] [Problem]post_notify_start_0 is carried out in the node that disappeared.

2011-02-15 Thread renayama19661014
Hi all, We tested a failure during the start of a Master/Slave resource. Step1) We start the first node and send the cib. Last updated: Thu Feb 10 16:32:12 2011 Stack: Heartbeat Current DC: srv01 (c7435833-8bc5-43aa-8195-c666b818677f) - partition with quorum Version:

Re: [Pacemaker] [Problem]post_notify_start_0 is carried out in the node that disappeared.

2011-02-15 Thread renayama19661014
Hi Andrew, Thank you for your comment. Perhaps I misunderstood - does the node fail _while_ we're running post_notify_start_0? Is that the ordering you're talking about? Yes. I think stonith does not have to wait for post_notify_start_0 on the failed node. If so, then the crmd is

[Pacemaker] About the depth appointment of the monitor.

2011-02-17 Thread renayama19661014
Hi all, I specified the monitor depth of the Filesystem resource with OCF_CHECK_LEVEL, and the deeper monitor seems to work well. I tried to do the same thing with the depth attribute of the monitor action, but it does not work. (Specifying it results in an error...) (snip)

Re: [Pacemaker] About the depth appointment of the monitor.

2011-02-17 Thread renayama19661014
Hi Dejan, I tried to do the same thing with the depth attribute of the monitor action, but it does not work. (Specifying it results in an error...) The depth attribute hasn't been implemented, OCF_CHECK_LEVEL has to be used. Thank you for your comment. All right. Thanks!
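Since the depth attribute is not implemented, the check level has to be given as the monitor operation's `OCF_CHECK_LEVEL` instance attribute, which, as far as we understand, reaches the resource agent as an environment variable of the same name. The sketch below shows how an agent might branch on it. It assumes an agent written in C purely for illustration (the Filesystem agent is a shell script and uses levels 0, 10 and 20); the `check_*` helpers and `demo_check_level` are hypothetical, and only the OCF exit codes are standard values.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>

/* OCF exit codes used below (standard OCF resource agent API values). */
#define OCF_SUCCESS      0
#define OCF_ERR_GENERIC  1
#define OCF_NOT_RUNNING  7

/* Hypothetical checks of increasing cost; real ones would inspect the mount. */
static int check_mounted(void)  { return 1; }
static int check_readable(void) { return 1; }
static int check_writable(void) { return 1; }

/* OCF_CHECK_LEVEL comes from the environment; unset or empty means level 0. */
static int check_level(void)
{
    const char *s = getenv("OCF_CHECK_LEVEL");
    return (s != NULL && *s != '\0') ? atoi(s) : 0;
}

/* A deeper level runs every cheaper check plus its own extra check. */
static int monitor(void)
{
    int level = check_level();

    if (!check_mounted())
        return OCF_NOT_RUNNING;
    if (level >= 10 && !check_readable())
        return OCF_ERR_GENERIC;
    if (level >= 20 && !check_writable())
        return OCF_ERR_GENERIC;
    return OCF_SUCCESS;
}

/* Usage: the level is re-read from the environment on every monitor call. */
static void demo_check_level(void)
{
    setenv("OCF_CHECK_LEVEL", "20", 1);
    assert(check_level() == 20);
    unsetenv("OCF_CHECK_LEVEL");
    assert(check_level() == 0);
    assert(monitor() == OCF_SUCCESS);
}
```

Reading the level at call time (rather than once at startup) matches how a monitor action is invoked afresh for each operation.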

Re: [Pacemaker] [Problem]post_notify_start_0 is carried out in the node that disappeared.

2011-02-18 Thread renayama19661014
Hi Andrew, Thank you for your comment. If you need detailed information, please contact me. Should be enough in the bug, i'll follow up there All right. Thanks! Hideo Yamauchi. --- On Tue, 2011/2/15, Andrew Beekhof and...@beekhof.net wrote: On Tue, Feb 15, 2011 at 3:01 PM, 

[Pacemaker] [Problem]The trouble of the slave node influences a master.

2011-03-29 Thread renayama19661014
Hi, We examined a Master/Slave configuration of drbd, with drbd on the iSCSI nodes holding the PostgreSQL data. We tested a drbd stop failure on an iSCSI node. Step1) We start the iSCSI nodes. (Node C and Node D) * We use a stonith module (stonith-helper) to need time for. Last

Re: [Pacemaker] [Problem]The trouble of the slave node influences a master.

2011-04-04 Thread renayama19661014
Hi All, After investigating in various ways, the problem appeared to be in the version of drbd we used. The problem was resolved when we changed to drbd 8.3.9. The details of the cause are unclear. However, please ignore the report in this email because it was

[Pacemaker] Patch to crm command.

2011-05-25 Thread renayama19661014
Hi, When the crm configure load command is given a misspelled method, it is still handled as update. [root@srv01 ~]# crm configure load upadate trac1383.crm.dampen5s --- miss upadate crm_verify[7602]: 2011/04/19_17:09:40 WARN: unpack_nodes: Blind faith: not fencing unseen nodes I am sending a patch. It is displayed
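The report shows that a misspelled method (`upadate`) was silently treated as `update`. A minimal sketch of the stricter check being asked for, written in C only for illustration (crm itself is written in Python) and assuming `replace` and `update` are the only accepted methods; `parse_load_method` and `demo_parse_load_method` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

enum load_method { LOAD_INVALID = -1, LOAD_REPLACE, LOAD_UPDATE };

/* Map the method argument to an enum, accepting exact names only.
 * Anything else (such as the misspelled "upadate") is rejected instead
 * of falling back to update. */
static enum load_method parse_load_method(const char *arg)
{
    if (arg == NULL)
        return LOAD_INVALID;
    if (strcmp(arg, "replace") == 0)
        return LOAD_REPLACE;
    if (strcmp(arg, "update") == 0)
        return LOAD_UPDATE;
    return LOAD_INVALID;
}

/* Usage: a typo is reported as an error rather than applied as update. */
static void demo_parse_load_method(void)
{
    assert(parse_load_method("update") == LOAD_UPDATE);
    assert(parse_load_method("upadate") == LOAD_INVALID);
}
```

Failing closed on an unknown method means the operator sees an error before any configuration is changed.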

Re: [Pacemaker] Patch to crm command.

2011-05-26 Thread renayama19661014
Hi, I confirmed that the patch was applied. * http://hg.clusterlabs.org/pacemaker/devel/rev/954c93bdb8dd Thanks!! Hideo Yamauchi. --- On Thu, 2011/5/26, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi, A load command of the crm command is a wrong order, but is handled as

[Pacemaker] The active trap of the SNMP is delayed.

2011-06-14 Thread renayama19661014
Hi All, I found a problem with an SNMP trap (from hbagent). The trap reporting that a node became active may be delayed. This problem occurs only sometimes. I confirmed it with the following procedure. Step1) Start a node. Last

[Pacemaker] [Patch] Optional and message change of the crm migrate command.

2011-06-14 Thread renayama19661014
Hi all, One of the crm command's messages does not match what it actually does. When an operator executes the migrate command, crm should mention unmigrate in its message. In addition, the crm command has no option corresponding to the Q option of the

Re: [Pacemaker] [Patch] Optional and message change of the crm migrate command.

2011-06-15 Thread renayama19661014
Hi Dejan, Thank you for your reply. Many thanks for the patch. But we need a common procedure to fetch and remove options which are used in many commands from the list of arguments. Options such as force and quiet. Right now I'm quite busy elsewhere, so that may take time ... All right. I

Re: [Pacemaker] The active trap of the SNMP is delayed.

2011-06-16 Thread renayama19661014
Hi All, I registered this problem in Bugzilla. * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2604 Best Regards, Hideo Yamauchi. --- On Wed, 2011/6/15, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi All, I found a problem with a trap of the SNMP.(from

[Pacemaker] [Problem]By the number of the resources, node do not succeed in resource movement.

2011-06-22 Thread renayama19661014
Hi all, I tested resource migration with the following procedure in a PostgreSQL+drbd-like environment. Step1) Start a cluster. Last updated: Thu Jun 23 00:13:20 2011 Stack: Heartbeat Current DC: srv02 (4247e5e4-b76c-4bcd-b81a-d971781fc802) - partition with quorum Version:

[Pacemaker] [Question and Enhancement]About specifications of stonith-ng.

2011-07-07 Thread renayama19661014
Hi, We checked the behavior of stonith-ng in Pacemaker 1.1. However, stonith-ng behaves differently from stonith in Pacemaker 1.0. * In stonith-ng, cannot control the stonith resource that grouped.(Case 1) * In stonith-ng, cannot control the stonith resource that set the priority which does not

Re: [Pacemaker] [Question and Enhancement]About specifications of stonith-ng.

2011-07-07 Thread renayama19661014
Hi Andrew, Thank you for your comment. * In stonith-ng, cannot control the stonith resource that grouped.(Case 1) * In stonith-ng, cannot control the stonith resource that set the priority which does not group.(Case 2) Yes, both of these need to be fixed. Could you file bugs for them so

Re: [Pacemaker] [Question and Enhancement]About specifications of stonith-ng.

2011-07-08 Thread renayama19661014
Hi Andrew, I registered a problem with Bugzilla. * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2616 Best Regards, Hideo Yamauchi. --- On Fri, 2011/7/8, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Andrew, Thank you for comment.   * In stonith-ng,

[Pacemaker] [Question]The value of the attribute is changed by a timing and is sent.

2011-08-22 Thread renayama19661014
Hi All, We observed a situation in which attrd sent a value that had been changed just before it was sent. (snip) Aug 14 20:05:33 srv01 attrd: [13880]: info: attrd_trigger_update: Sending flush op to all hosts for: default_ping_set (100) Aug 14 20:05:33 srv01 pingd: [14137]: info:

[Pacemaker] [Problem]Time-out(action lost) of completed monitor occurs.

2011-09-05 Thread renayama19661014
Hi All, We came across a mysterious phenomenon while testing a drbd environment, using the following procedure. Step 1) Start two nodes. Step 2) Hang the kernel on the active node. Step 3) On the standby node, the drbd monitor is cancelled. The

Re: [Pacemaker] [Question]The value of the attribute is changed by a timing and is sent.

2011-09-11 Thread renayama19661014
Hi All, We want an official answer about this movement. Best Regards, Hideo Yamauchi. --- On Tue, 2011/8/23, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi All, We confirmed the situation that a value from attrd was sent to by the change of the value just before that.

Re: [Pacemaker] [Question]The value of the attribute is changed by a timing and is sent.

2011-09-25 Thread renayama19661014
Hi Andrew, Thank you for your comment. We are not dissatisfied with attrd's functionality. We understand this attrd behavior to be by design. Best Regards, Hideo Yamauchi. --- On Mon, 2011/9/26, Andrew Beekhof and...@beekhof.net wrote: On Tue, Aug 23, 2011 at 11:29 AM, 

Re: [Pacemaker] [Problem]Time-out(action lost) of completed monitor occurs.

2011-09-26 Thread renayama19661014
Hi Andrew, Thank you for your comment. Which still appears to be down :-( Do you have the tarball still? It may not be exactly the same as what I attached to Bugzilla. I am sending the log and pe-file again. * 1655.tar.gz *

Re: [Pacemaker] [Problem]Time-out(action lost) of completed monitor occurs.

2011-10-11 Thread renayama19661014
Hi Andrew, Ok, I've recreated as http://bugs.clusterlabs.org/show_bug.cgi?id=5001 All right. Thanks!. Hideo Yamauchi. On Mon, Sep 26, 2011 at 6:27 PM,  renayama19661...@ybb.ne.jp wrote: Hi Andrew, Thank you for comment. Which still appears to be down :-( Do you have the

[Pacemaker] [Problem] The attrd does not sometimes stop.

2011-10-17 Thread renayama19661014
Hi, Stopping attrd sometimes fails. Step1. start a cluster on 2 nodes Step2. stop the first node. (/etc/init.d/heartbeat stop) Step3. stop the second node a little later. (/etc/init.d/heartbeat stop) attrd catches the TERM signal, but does not stop. (snip) Oct 5 02:37:38

Re: [Pacemaker] [Problem] The attrd does not sometimes stop.

2011-10-20 Thread renayama19661014
Hi Alan, Thank you for your comment. We are also trying to reproduce the problem and will send a report. However, it has not recurred so far. Best Regards, Hideo Yamauchi. --- On Thu, 2011/10/20, Alan Robertson al...@unix.sh wrote: Hi, I've seen a very similar problem in a recent

Re: [Pacemaker] CentOS RPM build for pacemaker-pygui/mgmt

2011-10-30 Thread renayama19661014
Hi Yan, Hi All, (Hideo, IIRC, you host some pre-built packages somewhere?) There are packages for RHEL5 and RHEL6 on the following Japanese site. The GUI rpm is also included in the file (pacemaker-1.0.11-1.2.1.XXX.XXX.repo.tar-repo.tar.gz). * RHEL6(x64) *

Re: [Pacemaker] [Problem] The attrd does not sometimes stop.

2011-11-03 Thread renayama19661014
Hi Andrew, Hi Alan, We are working hard to reproduce the phenomenon and collect evidence of the problem. However, we have not obtained any evidence yet. I will wait for the information from Alan. Best Regards, Hideo Yamauchi. --- On Wed, 2011/11/2, Andrew Beekhof and...@beekhof.net wrote: On

[Pacemaker] [Problem]A monitor of Master stops when crm command repeat the movement of the resource.

2011-11-07 Thread renayama19661014
Hi All, We tested the movement of the resource in Master/Slave. Last updated: Tue Nov 8 14:12:23 2011 Stack: Heartbeat Current DC: bl460g1b (1b34eec8-1d62-488b-a7fb-8e4b38f95ec3) - partition with quorum Version: 1.0.11-9af47ddebcad19e35a61b2a20301dc038018e8e8 2 Nodes configured,

Re: [Pacemaker] [Problem]A monitor of Master stops when crm command repeat the movement of the resource.

2011-11-10 Thread renayama19661014
Hi All, First of all, lrmd does not return a result for a cancellation request that has been left pending. However, crmd needs the result of the cancellation. Isn't a correction like the following necessary? (The correction does not handle errors properly; it is temporary.) example :

Re: [Pacemaker] [Problem]A monitor of Master stops when crm command repeat the movement of the resource.

2011-11-11 Thread renayama19661014
Hi Dejan, Thank you for your comment. This correction seems to be necessary for cancelling the monitor of a Master/Slave resource. With this correction, the Master/Slave states are correct. I tested it, and the correction solved the problem. Best Regards, Hideo Yamauchi. --- On Sat,

Re: [Pacemaker] [Problem]A monitor of Master stops when crm command repeat the movement of the resource.

2011-11-13 Thread renayama19661014
Hi Dejan, I attach the correct patch for Reusable-Cluster-Components-glue--3b800f73ba59. Best Regards, Hideo Yamauchi. --- On Sat, 2011/11/12, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Dejan, Thank you for comment. This correction seems to be necessary for the

Re: [Pacemaker] [Problem]A monitor of Master stops when crm command repeat the movement of the resource.

2011-11-14 Thread renayama19661014
Hi Dejan, Patch applied. Many thanks! All right. Thanks! Hideo Yamauchi. --- On Tue, 2011/11/15, Dejan Muhamedagic deja...@fastmail.fm wrote: Hi Hideo-san, On Mon, Nov 14, 2011 at 09:50:21AM +0900, renayama19661...@ybb.ne.jp wrote: Hi Dejan, I attach a right patch for

Re: [Pacemaker] How to find stonith plugins

2011-12-19 Thread renayama19661014
Hi Mars, You seem to have hit the following Pacemaker bug. * http://www.gossamer-threads.com/lists/linuxha/pacemaker/73257 * https://bugzilla.redhat.com/show_bug.cgi?id=714879 Please update Pacemaker to a newer version. Best Regards, Hideo Yamauchi. --- On Tue,

Re: [Pacemaker] [Problem] The attrd does not sometimes stop.

2011-12-21 Thread renayama19661014
Hi Dejan, Hi Lars, In our environment, the problem recurred with Mr. Lars's patch. After the problem occurred, I sent a TERM signal, but attrd does not seem to receive TERM at all. The patch needs to be reconsidered to solve the problem. Best Regards, Hideo Yamauchi. ---

[Pacemaker] [Problem]It is judged that a stopping resource is starting.

2011-12-26 Thread renayama19661014
Hi All, When Pacemaker is stopped while there is a resource that failed its probe, crmd outputs the following error message. Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource X was active at shutdown. You may ignore this error if it is unmanaged. Because the

Re: [Pacemaker] [Problem] The attrd does not sometimes stop.

2012-01-04 Thread renayama19661014
Hi Lars, If you are able to reproduce, you could try to find out what exactly attrd is doing. various ways to try to do that: cat /proc/<pid-of-attrd>/stack # if your platform supports that strace it, ltrace it, attach with gdb and provide a stack trace, or even start to single step it,

[Pacemaker] [Question] About the rotation of the pe-file.

2012-01-04 Thread renayama19661014
Hi All, Stored pe files normally start from 0. However, when I set pe-input-series-max=2, the rotation cycles between 1 and 2 and never uses 0. Is it by design that 0 is not used in the rotation? I confirmed that 0 is not used by the following calculation. void

Re: [Pacemaker] [Question] About the rotation of the pe-file.

2012-01-05 Thread renayama19661014
Hi Andrew, Thank you for your comments. Could you try with: while(max >= 0 && sequence > max) { This correction does not solve the problem. The rotation still does not use 0. Best Regards, Hideo Yamauchi. --- On Fri, 2012/1/6, Andrew Beekhof and...@beekhof.net

Re: [Pacemaker] [Problem]It is judged that a stopping resource is starting.

2012-01-05 Thread renayama19661014
Hi Andrew, Thank you for your comment. But it should have a subsequent stop action which would set it back to being inactive. Did that not happen in this case? Yes. Only the verify_stopped log is recorded. The stop of the resource that failed its probe was not carried out.

Re: [Pacemaker] [Problem] The attrd does not sometimes stop.

2012-01-09 Thread renayama19661014
Hi Lars, I attach the strace output from when the problem recurred at the end of last year. For confirmation I used glue with your patch applied. I captured it by attaching strace -p to attrd right before stopping Heartbeat. SIGTERM was eventually caught, but attrd did not stop. The attrd

Re: [Pacemaker] [Problem] The attrd does not sometimes stop.

2012-01-11 Thread renayama19661014
Hi Lars, Hi Dejan, I got an ltrace output when the problem occurred and attach it. I am continuing the investigation with gdb. If you have any suggestions for improvement, please tell me. Best Regards, Hideo Yamauchi. --- On Tue, 2012/1/10, renayama19661...@ybb.ne.jp

Re: [Pacemaker] [Question] About the rotation of the pe-file.

2012-01-15 Thread renayama19661014
Hi Lars, Hi Andrew, If you want it to be between [0, max-1], obviously that should be while(max > 0 && sequence >= max) { sequence -= max; } Thanks!! I will try it. Though I wonder why not simply: if (max == 0) return; if (sequence >= max)

Re: [Pacemaker] [Problem] The attrd does not sometimes stop.

2012-01-15 Thread renayama19661014
Hi Lars, Thank you for your comments and suggestion. poll([{fd=7, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI}, {fd=5, events=POLLIN|POLLPRI}], 3, -1 Note the -1 (infinity timeout!) So even though the trigger was (presumably) set, and the ->prepare() should have returned true,
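The strace shows attrd blocked in `poll()` with a timeout of -1 even though the signal had already set the trigger, so nothing wakes it. One common way to make a signal reliably wake a blocked `poll()` is the self-pipe trick, sketched below in plain POSIX C. This illustrates the general technique only; it is not the patch that was eventually committed to glue or Pacemaker, and `setup_self_pipe`, `wait_for_term` and `demo_self_pipe` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int self_pipe[2] = { -1, -1 };

/* Async-signal-safe handler: it only writes one byte into the pipe. */
static void on_signal(int signo)
{
    int saved = errno;
    unsigned char b = (unsigned char)signo;
    ssize_t n = write(self_pipe[1], &b, 1);  /* EAGAIN if full is harmless */
    (void)n;
    errno = saved;
}

/* Create a non-blocking pipe and route SIGTERM into it. Returns 0 on success. */
static int setup_self_pipe(void)
{
    struct sigaction sa;

    if (pipe(self_pipe) != 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        int fl = fcntl(self_pipe[i], F_GETFL);
        if (fl < 0 || fcntl(self_pipe[i], F_SETFL, fl | O_NONBLOCK) < 0)
            return -1;
    }
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Block with an infinite timeout. A SIGTERM delivered at any moment, even
 * just before poll() is entered, leaves a byte in the pipe and wakes us. */
static int wait_for_term(void)
{
    struct pollfd pfd = { .fd = self_pipe[0], .events = POLLIN };
    unsigned char b;

    for (;;) {
        int rc = poll(&pfd, 1, -1);
        if (rc < 0 && errno == EINTR)
            continue;               /* handler ran; the pipe is now readable */
        if (rc < 0)
            return -1;
        if ((pfd.revents & POLLIN) && read(self_pipe[0], &b, 1) == 1)
            return b;               /* number of the signal that arrived */
    }
}

/* Usage: the signal arrives before we block, and we still wake up. */
static void demo_self_pipe(void)
{
    assert(setup_self_pipe() == 0);
    raise(SIGTERM);
    assert(wait_for_term() == SIGTERM);
}
```

In a glib main loop the read end would be watched as an ordinary fd source (for example through a `GIOChannel` and `g_io_add_watch()`); GLib 2.30 and later also provide `g_unix_signal_add()`, which packages the same idea.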

Re: [Pacemaker] [Question] About the rotation of the pe-file.

2012-01-15 Thread renayama19661014
Hi Andrew, Hi Lars, If you want it to be between [0, max-1], obviously that should be while(max > 0 && sequence >= max) { sequence -= max; } The rotation now runs correctly from 0 to max-1. Though I wonder why not simply: if (max == 0)
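The effect of the loop conditions discussed in this thread can be checked with a small sketch. `wrap_original` assumes the original loop used `sequence > max`; the original source is not quoted here, so that is an assumption, chosen because it reproduces the reported rotation between 1 and 2. `wrap_fixed` uses the `max > 0 && sequence >= max` condition, which keeps the index in [0, max-1]. All function names are hypothetical.

```c
#include <assert.h>

/* Assumed shape of the original loop: once sequence passes max, the
 * result cycles through 1..max, so index 0 is never reused. */
static int wrap_original(int sequence, int max)
{
    while (max > 0 && sequence > max)
        sequence -= max;
    return sequence;
}

/* Condition from the thread: the result stays in [0, max-1].
 * max <= 0 leaves sequence unchanged (treated here as "no limit"). */
static int wrap_fixed(int sequence, int max)
{
    while (max > 0 && sequence >= max)
        sequence -= max;
    return sequence;
}

/* Usage with pe-input-series-max=2 for sequence numbers 0..4. */
static void demo_wrap(void)
{
    const int want_orig[]  = { 0, 1, 2, 1, 2 };
    const int want_fixed[] = { 0, 1, 0, 1, 0 };

    for (int seq = 0; seq < 5; seq++) {
        assert(wrap_original(seq, 2) == want_orig[seq]);
        assert(wrap_fixed(seq, 2) == want_fixed[seq]);
    }
}
```

For non-negative `sequence` and `max > 0`, `wrap_fixed` returns the same value as `sequence % max`.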

Re: [Pacemaker] [Question] About the rotation of the pe-file.

2012-01-15 Thread renayama19661014
Hi Andrew, Hi Lars, Its in my private tree so far: https://github.com/beekhof/pacemaker/commit/bfbb73c It will make its way to clusterlabs when I merge next. All right! Many Thanks! Hideo Yamauchi. --- On Mon, 2012/1/16, Andrew Beekhof and...@beekhof.net wrote: On Mon, Jan 16, 2012

Re: [Pacemaker] [Problem]It is judged that a stopping resource is starting.

2012-01-15 Thread renayama19661014
Hi Andrew, Thank you for your comments. Could you send me the PE file related to this log please? Jan 6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing graph 4 (ref=pe_calc-dc-1325845321-26) derived from /var/lib/pengine/pe-input-4.bz2 The old file is gone. I am sending the log and the

Re: [Pacemaker] How to deal with unix signals in a glib mainloop (was: [Problem] The attrd does not sometimes stop.)

2012-01-19 Thread renayama19661014
Hi Lars, Hi Andrew, I am now testing Mr. Lars's patch in the environment where the problem reproduces. * The msgfromIPC_ll patch is not applied. * The crm_trigger_prepare patch is applied. So far the problem has not recurred in several days of testing. I will carry out a test

Re: [Pacemaker] How to deal with unix signals in a glib mainloop (was: [Problem] The attrd does not sometimes stop.)

2012-01-24 Thread renayama19661014
Hi Lars, Hi Andrew, I confirmed that the problem did not occur with Mr. Lars's patch. The test repeatedly starts and stops the cluster. I ran it five times. The results are as follows. Try 1. In 420 cycles, start/stop succeeded. Try 2. In 396 cycles, start/stop

Re: [Pacemaker] How to deal with unix signals in a glib mainloop (was: [Problem] The attrd does not sometimes stop.)

2012-01-31 Thread renayama19661014
Hi Lars, Hi Andrew, I confirmed that the problem did not occur with Mr. Andrew's patch. * https://github.com/beekhof/pacemaker/commit/2a6b296 The test repeatedly starts and stops the cluster. Try 1. In 405 cycles, start/stop succeeded. Try 2. In 407 cycles,

Re: [Pacemaker] How to deal with unix signals in a glib mainloop (was: [Problem] The attrd does not sometimes stop.)

2012-02-01 Thread renayama19661014
Hi Andrew, It should already be in the main repo for 1.1 and we can backport to pacemaker-1.0 I confirmed that the following correction is included in Pacemaker 1.1. * https://github.com/ClusterLabs/pacemaker/commit/2a6b296b7ca42a1b671563f5ab73853ff2a8fcef#lib/common I look forward to

Re: [Pacemaker] [Problem]It is judged that a stopping resource is starting.

2012-02-13 Thread renayama19661014
Hi Andrew, What is the current status of this problem? Best Regards, Hideo Yamauchi. --- On Mon, 2012/1/16, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Andrew, Thank you for comments. Could you send me the PE file related to this log please? Jan  6

Re: [Pacemaker] [Problem]It is judged that a stopping resource is starting.

2012-02-16 Thread renayama19661014
Hi Andrew, Thank you for your comment. I'm getting to this soon, really :-) First it was corosync 2.0 stuff, so that /something/ in fedora-17 works, then fixing everything I broke when adding corosync 2.0 support. All right! I will wait for your answer. Best Regards, Hideo Yamauchi. --- On Thu,

Re: [Pacemaker] [Problem]It is judged that a stopping resource is starting.

2012-02-21 Thread renayama19661014
Hi Andrew, Thank you for your comment. Sorry, I do not fully understand your answer. Does it mean the following? 1) The system administrator needs to take action when a log shows rc=6 (fatal). 2) This needs to be reflected in the documentation. And does it mean that the following log should

Re: [Pacemaker] [Problem]It is judged that a stopping resource is starting.

2012-02-23 Thread renayama19661014
Hi Andrew, I overlooked it. We would like a similar correction applied to Pacemaker 1.0 (e.g., like the patch which I contributed). Best Regards, Hideo Yamauchi. --- On Fri, 2012/2/24, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Andrew, Thank you for comment. 1)It is

[Pacemaker] [Patch]Patch for crmd-transition-delay processing.

2012-03-22 Thread renayama19661014
Hi All, crmd-transition-delay is meant to wait for late attribute updates. However, crmd does not wait for the attribute properly, because the timer is not reset when a delayed attribute update arrives after the timer has been set. As a result, the resource may not be placed correctly. I
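The fix described here amounts to re-arming the delay whenever another attribute update arrives, instead of arming it only once. A minimal sketch of that behaviour, assuming a simple deadline model with caller-supplied timestamps in milliseconds; `struct delay_timer`, `on_attr_update`, `may_start_transition` and `demo_delay_timer` are hypothetical names, not crmd functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Times are milliseconds on a monotonic clock supplied by the caller. */
struct delay_timer {
    long delay_ms;   /* e.g. the crmd-transition-delay value */
    long deadline;   /* a transition may start at or after this time */
    bool armed;
};

/* Every attribute update (re)arms the timer, so an update that arrives
 * after the timer was first set pushes the transition back. */
static void on_attr_update(struct delay_timer *t, long now)
{
    t->deadline = now + t->delay_ms;
    t->armed = true;
}

static bool may_start_transition(const struct delay_timer *t, long now)
{
    return !t->armed || now >= t->deadline;
}

/* Usage: a 5 s delay, with a second update arriving 3 s after the first. */
static void demo_delay_timer(void)
{
    struct delay_timer t = { .delay_ms = 5000, .deadline = 0, .armed = false };

    on_attr_update(&t, 0);
    assert(!may_start_transition(&t, 3000));
    on_attr_update(&t, 3000);                 /* late update re-arms */
    assert(!may_start_transition(&t, 6000));  /* would pass without re-arm */
    assert(may_start_transition(&t, 8000));
}
```

Without the re-arm, the deadline would stay at 5000 ms and the transition at 6000 ms would start before the late update had its full delay.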

Re: [Pacemaker] [Patch]Patch for crmd-transition-delay processing.

2012-03-22 Thread renayama19661014
Hi All, Sorry, my patch was wrong. I am sending the correct patch. Best Regards, Hideo Yamauchi. --- On Thu, 2012/3/22, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi All, The crmd-transition-delay waits for the update of the attribute to be late. However, crmd cannot

Re: [Pacemaker] [Patch]Patch for crmd-transition-delay processing.

2012-03-29 Thread renayama19661014
Hi Andrew, Thank you for your comment. The patch makes sense, could you resend as a github pull request? :-D All right!! I will send it when it is ready. Please wait. Best Regards, Hideo Yamauchi. --- On Thu, 2012/3/29, Andrew Beekhof and...@beekhof.net wrote: The patch makes sense, could you resend as a

Re: [Pacemaker] [Problem] The cluster fails in the stop of the node.

2012-03-29 Thread renayama19661014
Hi Andrew, This appears to be resolved with 1.1.7, perhaps look for a patch to backport? I will check the behavior of Pacemaker 1.1.7 and discuss the backport with Mr. Mori. Best Regards, Hideo Yamauchi. --- On Thu, 2012/3/29, Andrew Beekhof and...@beekhof.net wrote: This appears to be

Re: [Pacemaker] [Patch]Patch for crmd-transition-delay processing.

2012-04-03 Thread renayama19661014
Hi Andrew, I opened a pull request. * https://github.com/ClusterLabs/pacemaker-1.0/pull/3 I forgot the patch to te_action.c, so I will open the pull request again. Best Regards, Hideo Yamauchi. --- On Fri, 2012/3/30, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Andrew, Thank

[Pacemaker] [Problem]colocation condition does not become effective.

2012-06-11 Thread renayama19661014
Hi All, We built a cluster that combines a Master/Slave pgsql resource with our own RA. However, the colocation constraint did not take effect, and as a result promote was carried out. Step1) First start the node that should not become Master. Step2) VIPCheck resources

[Pacemaker] [Problem and Question] About negative setting of colocation.

2012-06-27 Thread renayama19661014
Hi All, We tested a negative colocation setting, namely a colocation that specifies with-rsc-role. We checked it with the following procedure. Step1) Start the first node and send the cib. Last updated: Wed Jun 27 22:58:15 2012 Stack: Heartbeat Current DC: rh62-test1
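As a reminder of the shape of such a constraint, a negative colocation that names a role of the other resource looks roughly like this in the CIB. This is a sketch only. The IDs col-not-with-master, rscA and msB are hypothetical placeholders, not the resources from this report.

```xml
<!-- Hypothetical IDs: rscA must not run on the node where msB is in the Master role -->
<rsc_colocation id="col-not-with-master" rsc="rscA" with-rsc="msB"
                with-rsc-role="Master" score="-INFINITY"/>
```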

Re: [Pacemaker] [Problem and Question] About negative setting of colocation.

2012-06-27 Thread renayama19661014
Hi Andrew, All right. We will wait for a fix. Best Regards, Hideo Yamauchi. --- On Thu, 2012/6/28, Andrew Beekhof and...@beekhof.net wrote: On Wed, Jun 27, 2012 at 4:27 PM, renayama19661...@ybb.ne.jp wrote: Hi All, I registered these contents with Bugzilla. *

Re: [Pacemaker] [Problem]colocation condition does not become effective.

2012-06-27 Thread renayama19661014
Hi Andrew, Thank you for your comment. All right. We will wait for a fix. Best Regards, Hideo Yamauchi. --- On Thu, 2012/6/28, Andrew Beekhof and...@beekhof.net wrote: On Tue, Jun 12, 2012 at 3:33 PM, renayama19661...@ybb.ne.jp wrote: Hi All, ... I registered these contents with

[Pacemaker] [Problem] Order is ignored, and promote is carried out.

2012-06-28 Thread renayama19661014
Hi All, We set an order constraint between a master/slave resource and a primitive resource, as follows. However, promote was carried out even though the primitive resource failed to start. Last updated: Fri Jun 29 19:20:09 2012 Stack: Heartbeat Current DC: rh62-test1

Re: [Pacemaker] [Problem] Order is ignored, and promote is carried out.

2012-06-28 Thread renayama19661014
Hi All, Sorry... I forgot this. We set the constraints as follows. rsc_colocation id=rsc_colocation-7 rsc=vipCheck score=INFINITY with-rsc=msPostgresql with-rsc-role=Master/ rsc_order first=vipCheck first-action=start id=rsc_order-7 score=INFINITY then=msPostgresql

Re: [Pacemaker] [Problem] Order is ignored, and promote is carried out.

2012-07-01 Thread renayama19661014
Hi Phillip, Thank you for your comment. However, the result was the same even when I used a group resource. (snip) group id=GrpvipCheck primitive class=ocf id=vipCheck provider=pacemaker type=Dummy instance_attributes id=vipCheck-instance_attributes/instance_attributes

Re: [Pacemaker] [Problem] Order is ignored, and promote is carried out.

2012-07-01 Thread renayama19661014
Hi All, The cluster behaved correctly after we added a redundant resource and changed the constraints as follows. However, we do not want to use this redundant configuration. (snip) primitive class=ocf id=vipCheck provider=pacemaker type=Dummy instance_attributes

[Pacemaker] [Problem] Though order limitation exists, start which is not carried out is made.

2012-07-05 Thread renayama19661014
Hi All, We applied settings to avoid the following problems. * http://www.gossamer-threads.com/lists/linuxha/pacemaker/80250 - http://bugs.clusterlabs.org/show_bug.cgi?id=5070 * http://www.gossamer-threads.com/lists/linuxha/pacemaker/80549 - http://bugs.clusterlabs.org/show_bug.cgi?id=5075

[Pacemaker] [Problem] Order which combined a master with clone is invalid.

2012-07-20 Thread renayama19661014
Hi All, We checked the behavior of an order constraint that combines a master with a clone, using a very simple configuration. Step1) We modified the Dummy resource so that start fails. (snip) dummy_start() { return $OCF_ERR_GENERIC dummy_monitor (snip) Step2) We start one node and send the cib.

Re: [Pacemaker] [Problem] Order which combined a master with clone is invalid.

2012-07-20 Thread renayama19661014
Hi All, I attached the hb_report file to Bugzilla. * http://bugs.clusterlabs.org/show_bug.cgi?id=5086 Best Regards, Hideo Yamauchi. --- On Fri, 2012/7/20, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi All, We confirmed movement of order which combined a master with

Re: [Pacemaker] [Problem] Order which combined a master with clone is invalid.

2012-07-22 Thread renayama19661014
Hi David, Thank you for your comments. http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch06s03s02.html I tested it with INFINITY. (snip) rsc_colocation id=rsc_colocation-1 rsc=msDrPostgreSQLDB rsc-role=Master score=INFINITY with-rsc=clnPingd/ rsc_order

Re: [Pacemaker] [Problem] Order which combined a master with clone is invalid.

2012-07-29 Thread renayama19661014
Hi Andrew, Thank you for your comments. Online: [ drbd1 drbd2 ] Master/Slave Set: msDrPostgreSQLDB Masters: [ drbd2 ] Slaves: [ drbd1 ] --- Started and Status Slave. Yep, looks like a bug. I'll follow up on the bugzilla. I talked with David by

Re: [Pacemaker] [Patch] When the ID of the resource changes, influence may be reflected on an application of colocation.

2012-08-07 Thread renayama19661014
Hi Andrew, Thank you for your comments. The problem with this approach is that ordering of the constraints in the cib is not preserved between the nodes. I will follow up further on the bugzilla. All right! Many Thanks, Hideo Yamauchi. --- On Tue, 2012/8/7, Andrew Beekhof and...@beekhof.net

Re: [Pacemaker] [Problem] Order which combined a master with clone is invalid.

2012-08-07 Thread renayama19661014
Hi Andrew, The first method) * Set a colocation between clnPingd and msDrPostgreSQLDB. The second method) * Set the interleave option on clnPingd. Is either of my two methods wrong? No. Looking closer, the initial constraint says only that the Master must be on a node running
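The two workarounds discussed in this thread map onto CIB XML roughly as follows. This is a sketch under assumptions. The colocation mirrors rsc_colocation-1 as quoted in the 2012-07-22 message. The meta_attributes and nvpair IDs are hypothetical. The primitive inside clnPingd is left out because the archive does not show it.

```xml
<!-- First method: keep the Master role of msDrPostgreSQLDB on a node running clnPingd -->
<rsc_colocation id="rsc_colocation-1" rsc="msDrPostgreSQLDB" rsc-role="Master"
                score="INFINITY" with-rsc="clnPingd"/>

<!-- Second method: enable interleave on the clone (hypothetical meta_attributes/nvpair IDs) -->
<clone id="clnPingd">
  <meta_attributes id="clnPingd-meta_attributes">
    <nvpair id="clnPingd-meta_attributes-interleave" name="interleave" value="true"/>
  </meta_attributes>
  <!-- the primitive wrapped by clnPingd is not shown in the archive and is omitted here -->
</clone>
```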

Re: [Pacemaker] [Problem] Though order limitation exists, start which is not carried out is made.

2012-08-15 Thread renayama19661014
Hi Andrew, Thank you for your comments. Is this the intended specification, or is it a bug? * I registered the problem in Bugzilla. * http://bugs.clusterlabs.org/show_bug.cgi?id=5079 Excellent. I'll follow up there (likewise for the other bugs you've sent to the list recently :-) I understood

[Pacemaker] [Question] About the stop order at the time of the Probe error.

2012-08-22 Thread renayama19661014
Hi All, We found a problem that occurs when a probe error happens. It occurs with the following simple resource configuration. Last updated: Wed Aug 22 15:19:50 2012 Stack: Heartbeat Current DC: drbd1 (6081ac99-d941-40b9-a4a3-9f996ff291c0) - partition with quorum Version: 1.0.12-c6770b8 1 Nodes
