[Linux-ha-dev] [PATCH][crmsh] deal with the case-insentive hostname

2013-04-10 Thread Junko IKEDA
Hi, I set upper-case hostname (GUEST03/GUEST4) and run Pacemaker 1.1.9 + Corosync 2.3.0. [root@GUEST04 ~]# crm_mon -1 Last updated: Wed Apr 10 15:12:48 2013 Last change: Wed Apr 10 14:02:36 2013 via crmd on GUEST04 Stack: corosync Current DC: GUEST04 (3232242817) - partition with quorum Version:
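A minimal sketch of the case-insensitive hostname comparison the subject line refers to, assuming plain ASCII hostnames. The helper name `same_host` is hypothetical and is not taken from the crmsh patch itself.

```shell
# Hypothetical helper (not from the crmsh patch): compare two hostnames
# while ignoring ASCII case.
same_host() {
    a=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    b=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    [ "$a" = "$b" ]
}

# Example: GUEST04 and guest04 are treated as the same node name.
if same_host "GUEST04" "guest04"; then
    echo "same node"
fi
```

Lower-casing both sides before comparing keeps the check portable to a plain POSIX shell.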

Re: [Linux-ha-dev] [PATCH] handle idmapd using nfsserver RA

2012-05-30 Thread Junko IKEDA
Hi, My previous patch had a spelling error, so I revised it just a bit. Thanks, Junko 2012/5/30 Junko IKEDA tsukishima...@gmail.com: Hi, I am trying to set up an NFSv4 server using the nfsserver RA, and am adding some handling for rpc.idmapd. http://linux.die.net/man/8/rpc.idmapd Please see the attached

Re: [Linux-ha-dev] [PATCH] nfsserver RA : add check statement to start function

2012-05-16 Thread Junko IKEDA
Hi, Thank you for your quick response! This one seems to be missing. Or is it covered now by the monitor test? nfsserver_start () can now return $OCF_SUCCESS if it detects that nfs server is already started. This ocf_log debug, which complains about the argument, will not be called anymore

[Linux-ha-dev] [PATCH] nfsserver RA : add check statement to start function

2012-05-15 Thread Junko IKEDA
, so ocf_log debug complains; Not enough arguments [1] to ocf_log. I added a check statement for this. Please see the attached. Regards, Junko IKEDA NTT DATA INTELLILINK CORPORATION nfsserver-validate-all.patch Description: Binary data nfsserver-check-start.patch Description: Binary data

Re: [Linux-ha-dev] [PATCH] Filesystem RA: remove a status file only when OCF_CHECK_LEVEL is set as 20

2012-05-13 Thread Junko IKEDA
Hi, Is my case hard to understand? multipath means the Fibre Channels, there are two cables for redundancy. Thanks, Junko 2012/5/9 Junko IKEDA tsukishima...@gmail.com: Hi, In my case, the umount succeeded when the Fibre Channel was disconnected, so it seemed that handling the status file

[Linux-ha-dev] [PATCH] Filesystem RA: remove a status file only when OCF_CHECK_LEVEL is set as 20

2012-05-08 Thread Junko IKEDA
OCF_CHECK_LEVE), it's enough to try unmount the file system, isn't it? https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/Filesystem#L774 Regards, Junko IKEDA NTT DATA INTELLILINK CORPORATION Filesystem.patch Description: Binary data

Re: [Linux-ha-dev] [PATCH] Filesystem RA: remove a status file only when OCF_CHECK_LEVEL is set as 20

2012-05-08 Thread Junko IKEDA
Hi, In my case, the umount succeeded when the Fibre Channel was disconnected, so it seemed that handling the status file caused a longer failover, as Dejan said. If the umount fails, it will go into a timeout and might call a stonith action, and this case also makes sense (though I couldn't see this).

Re: [Linux-ha-dev] [PATCH] named RA: support IPv6

2012-01-16 Thread Junko IKEDA
Hi, Thank you for pointing that out! Regards, Junko IKEDA 2012/1/17 Dejan Muhamedagic de...@suse.de: On Mon, Jan 16, 2012 at 03:10:14PM +0100, Dejan Muhamedagic wrote: On Sat, Jan 14, 2012 at 12:32:20PM +0100, Lars Ellenberg wrote: On Mon, Jan 09, 2012 at 05:50:14PM +0100, Dejan Muhamedagic

[Linux-ha-dev] [PATCH] named RA: support IPv6

2011-12-12 Thread Junko IKEDA
, right? named_monitor() output=`$OCF_RESKEY_host $OCF_RESKEY_monitor_request $OCF_RESKEY_monitor_ip` if [ $? -ne 0 ] || ! echo $output | grep -q '.* has address '$OCF_RESKEY_monitor_response Would you please give me some advice? Regards, Junko IKEDA NTT DATA INTELLILINK CORPORATION named_ipv6
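A rough sketch of how the response check in the snippet above could accept both record types, assuming the `host` utility prints "has address" for A records and "has IPv6 address" for AAAA records. The helper name `response_matches` is hypothetical and is not part of the named RA.

```shell
# Hypothetical helper (not part of the named RA): succeed if the output of
# `host` reports the expected address, for either an A record
# ("has address") or an AAAA record ("has IPv6 address").
# Note: this is a fixed-string substring match, so an expected address of
# 192.0.2.1 would also match a line ending in 192.0.2.10.
response_matches() {
    output=$1
    expected=$2
    printf '%s\n' "$output" |
        grep -q -F -e " has address $expected" -e " has IPv6 address $expected"
}

# Intended use inside a monitor function (not executed here):
#   output=$($OCF_RESKEY_host $OCF_RESKEY_monitor_request $OCF_RESKEY_monitor_ip)
#   response_matches "$output" "$OCF_RESKEY_monitor_response"
```

Using `grep -F` avoids treating the dots and colons in an address as regular-expression metacharacters.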

Re: [Linux-ha-dev] [PATCH] add the new parameter for replication network in mysql RA

2011-11-21 Thread Junko IKEDA
Hi Raoul, Thank you for your comments! this method should leave the slave be if the master did not change since the last sync. consider:   crm node standby node02; crm node online node02 the slave should pick up where it left using mysql's own way of saving the last replication information

Re: [Linux-ha-dev] [PATCH] add the new parameter for replication network in mysql RA

2011-11-14 Thread Junko IKEDA
Hi, sorry, again. My previous patch was wrong. I attached the new one. Thanks, Junko 2011/11/11 Junko IKEDA tsukishima...@gmail.com: Hi, The current mysql RA sets the hostname (= uname -n) as its replication network, but I have the following restriction. # uname -n node01 # cat /etc

Re: [Linux-ha-dev] [PATCH] add the new parameter for replication network in mysql RA

2011-11-14 Thread Junko IKEDA
Hi Raoul, Sure, thanks! Regards, Junko 2011/11/14 Raoul Bhatia [IPAX] r.bha...@ipax.at: hello junko-san! i propose the following documentation update to clarify the parameter's usage. <parameter name="replication_hostname_suffix" unique="0" required="0"> <longdesc lang="en"> A hostname suffix that

Re: [Linux-ha-dev] [PATCH] prevent Slave promotion in mysql RA

2011-11-14 Thread Junko IKEDA
Hi Marek, Florian, Thank you for your comments! Did you set evict_outdated_slaves? No, If set to false (the default), then the slave will be allowed to stay in the cluster, but its master preference will be pushed down so it's not promoted, and this seems to be Ikeda-san's preferred

[Linux-ha-dev] [PATCH] change the monitor log level of mysql RA

2011-11-11 Thread Junko IKEDA
noisy. I think there is no problem if we change these log level from info to debug. Please see attached. Regards, Junko IKEDA NTT DATA INTELLILINK CORPORATION mysql-log.patch Description: Binary data ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux

[Linux-ha-dev] [PATCH] prevent Slave promotion in mysql RA

2011-11-11 Thread Junko IKEDA
]; then # Sanitize a below-zero preference to just zero master_pref=0 fi $CRM_MASTER -v $master_pref fi I'm less familiar with the replication behavior, please advise me how to do it. Regards, Junko IKEDA NTT DATA INTELLILINK CORPORATION mysql
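The sanitizing step quoted in the snippet above can be sketched as a small standalone function. The name `sanitize_master_pref` is hypothetical, and the call to `$CRM_MASTER` is left out so the sketch runs without a cluster.

```shell
# Hypothetical helper (illustration only): clamp a master preference so that
# a below-zero value becomes zero, as the comment in the snippet describes.
sanitize_master_pref() {
    pref=$1
    if [ "$pref" -lt 0 ]; then
        pref=0
    fi
    echo "$pref"
}

# In the RA the result would then be passed on, e.g.
#   $CRM_MASTER -v "$master_pref"
master_pref=$(sanitize_master_pref -50)
echo "$master_pref"
```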

[Linux-ha-dev] [PATCH] add the new parameter for replication network in mysql RA

2011-11-11 Thread Junko IKEDA
? Regards, Junko IKEDA NTT DATA INTELLILINK CORPORATION mysql-replication_hostname_suffix.patch Description: Binary data

Re: [Linux-ha-dev] [PATCH] specify the full path for ipmi command

2011-09-26 Thread Junko IKEDA
Hi Dejan, Many thanks! Can I get it from http://hg.linux-ha.org/glue/ ? Regards, Junko 2011/9/22 Dejan Muhamedagic de...@suse.de: Hi Junko-san, On Wed, Aug 17, 2011 at 10:22:40AM +0900, Junko IKEDA wrote: Hi Dejan, Thank you for your reply! I attached the revised patch. Just applied

Re: [Linux-ha-dev] [PATCH] specify the full path for ipmi command

2011-08-16 Thread Junko IKEDA
Hi Dejan, Thank you for your reply! I attached the revised patch. http://www.gossamer-threads.com/lists/linuxha/pacemaker/74350 I don't see the connection between the two. I am trying to use /tmp/ipmitool command for some tests, and add its path for root. so $PATH for root is here; # echo

[Linux-ha-dev] [PATCH] specify the full path for ipmi command

2011-08-15 Thread Junko IKEDA
? Best Regards, Junko IKEDA NTT DATA INTELLILINK CORPORATION ipmi.patch Description: Binary data

[Linux-ha-dev] [PATCH]add sfex_init man to .spec

2011-06-21 Thread Junko IKEDA
Hi, The latest resource-agents package has a man page for sfex_init, so I added it to the .spec file. Please see the attached patch. Best Regards, Junko IKEDA NTT DATA INTELLILINK CORPORATION sfex_init.patch Description: Binary data

[Linux-ha-dev] [PATCH]modify description for ethmonitor RA

2011-06-21 Thread Junko IKEDA
' gmake[1]: *** [all-recursive] Error 1 gmake[1]: Leaving directory `/root/Desktop/work/20110622/resource-agents' make: *** [all] Error 2 Best Regards, Junko IKEDA NTT DATA INTELLILINK CORPORATION ethmonitor.patch Description: Binary data

Re: [Linux-HA] using the pacemaker logo for the xing group

2011-06-21 Thread Junko IKEDA
Hi Erkan, The pacemaker logos were created by the NTT group. I asked for my boss's permission, and I think I can send them to you directly soon :) Did you post a similar mail to the Japanese mailing list before this? Sorry for the inconvenience. Thanks, Junko IKEDA NTT DATA INTELLILINK

[Linux-HA] [PATCH]modify description for ethmonitor RA

2011-06-21 Thread Junko IKEDA
-agents' make: *** [all] Error 2 Best Regards, Junko IKEDA NTT DATA INTELLILINK CORPORATION ethmonitor.patch Description: Binary data ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also

Re: [Linux-ha-dev] Translate crm_cli.txt to Japanese

2011-04-27 Thread Junko IKEDA
Hi, May I suggest that you go with the devel version, because crm_cli.txt was converted to crm.8.txt. There are not many textual changes, just some obsolete parts removed. OK, I got crm.8.txt from devel. Each directory structure for Pacemaker 1.0,1.1 and devel is just a bit different. Does

[Linux-ha-dev] execute permission for exportfs RA

2010-04-22 Thread Junko IKEDA
Hi, I tried to compile the latest agents package from mercurial repository, but new exportfs RA complained about something like this; # hg clone http://hg.linux-ha.org/agents/ # cd agents # ./autogen.sh # ./configure --localstatedir=/var --disable-fatal-warnings # make /bin/sh:

Re: [Linux-HA] socket of lrmd

2009-12-01 Thread Junko IKEDA
Hi Dejan, This is an old issue, http://www.gossamer-threads.com/lists/linuxha/users/56449 but I remember this from the release plan announcement of Heartbeat 3.0.2. I have done the test for the following your patch, it seemed there was no problem. Please apply this to the new release. On Thu,

[Linux-ha-dev] Fwd: Re: [PATCH] recovering from the online backup failure

2009-11-16 Thread Junko IKEDA
Hi, I had done some tests for this patch, and I could get the desired results. I think this patch wouldn't affect the current usage. Serge, Thank you for your review! Thanks, Junko --- Forwarded message --- From: Serge Dubrouski serge...@gmail.com To: Junko IKEDA ike

[Linux-ha-dev] [PATCH] recovering from the online backup failure

2009-11-10 Thread Junko IKEDA
Hi, If some failures happen during the online backup of PostgreSQL, pgsql can not handle the fail over, because backup_label, this is a file for a backup process of Postgres, remains on the shared disk. pgsql can not start DB if this file remains. Please see the attached. Thanks, Junko
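A minimal sketch of the recovery idea described above, assuming the leftover `backup_label` sits directly in the PostgreSQL data directory. The function name and the way the directory is passed in are hypothetical, and the attached patch may handle this differently.

```shell
# Hypothetical helper (illustration only): remove a backup_label left behind
# by an interrupted online backup, so the database can be started again.
remove_stale_backup_label() {
    pgdata=$1
    if [ -f "$pgdata/backup_label" ]; then
        rm -f "$pgdata/backup_label"
    fi
}

# Intended use before starting the database (not executed here):
#   remove_stale_backup_label "$OCF_RESKEY_pgdata"
```

Removing the file is only appropriate when no backup is actually in progress, so a real agent would want to log the removal.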

Re: [Linux-ha-dev] route del in IPaddr RA

2009-11-10 Thread Junko IKEDA
; then - return $OCF_SUCCESS - fi +MSG=`$PING $PINGARGS 2>&1` +if [ $? = 0 ]; then +return $OCF_SUCCESS +fi done - + +ocf_log err $MSG return $OCF_ERR_GENERIC } Thanks, Junko On Mon, 09 Nov 2009 18:13:29 +0900, Junko IKEDA ike...@intellilink.co.jp

[Linux-ha-dev] route del in IPaddr RA

2009-11-09 Thread Junko IKEDA
Hi, I wonder why IPaddr RA needs to run route del before it deletes the target interface. Does the old version of IPaddr contain route add? If route del fails, RA will be able to return $OCF_SUCCESS, but I feel a little strange when I see the error message from route command like this.

Re: [Linux-ha-dev] route del in IPaddr RA

2009-11-09 Thread Junko IKEDA
Hi, By the way, this is a really trivial thing, I have some requests about logging messages of IPaddr. Please see the modified attachment. Thanks, Junko On Mon, 09 Nov 2009 18:13:29 +0900, Junko IKEDA ike...@intellilink.co.jp wrote: Hi, I wonder why IPaddr RA needs to run route del

Re: [Linux-HA] LHAIFStatus Shows Down

2009-08-07 Thread Junko IKEDA
Hi, Heartbeat 2.1.3 (snmp_subagent) had some bugs around LHAIFStatus. I think you should use Heartbeat 2.1.4. or you can try to change the value for -r option for hbagent. ex.) respawn root /usr/lib64/heartbeat/hbagent -r 1 -d Thanks, Junko On Fri, 07 Aug 2009 15:12:42 +0900, Jiann-Ming Su

Re: [Linux-HA] SNMP Subagent in OpenAIS

2009-07-07 Thread Junko IKEDA
Hi, On Mon, 2009-07-06 at 16:14 +0200, Michael Schwartzkopff wrote: Am Montag, 6. Juli 2009 16:06:55 schrieb Michael Schwartzkopff: Hi, anybody gained already some experience using the heartbeat snmp subagent in an openais environment? any problems? Thanks, Ok. Found the answer

Re: [Linux-HA] SNMP Subagent in OpenAIS

2009-07-07 Thread Junko IKEDA
On Tue, 07 Jul 2009 17:23:41 +0900, Michael Schwartzkopff mi...@multinet.de wrote: Am Dienstag, 7. Juli 2009 10:10:09 schrieb Junko IKEDA: Hi, On Mon, 2009-07-06 at 16:14 +0200, Michael Schwartzkopff wrote: Am Montag, 6. Juli 2009 16:06:55 schrieb Michael Schwartzkopff: Hi

Re: [Linux-HA] socket of lrmd

2009-07-03 Thread Junko IKEDA
Hi, On Fri, 03 Jul 2009 10:59:21 +0900, Junko IKEDA ike...@intellilink.co.jp wrote: Hi Dejan, Your patch could stop the error message from LVM RA. Many thanks! But I run Heartbeat 2.1.4 so I worry about whether 2.1.4 still have a problem about stonithd that you pointed. By the way

Re: [Linux-HA] socket of lrmd

2009-07-03 Thread Junko IKEDA
the latest code, of course. :) Thanks, Junko On Fri, 03 Jul 2009 16:28:20 +0900, Dejan Muhamedagic deja...@fastmail.fm wrote: Hi again Junko-san, On Fri, Jul 03, 2009 at 04:15:40PM +0900, Junko IKEDA wrote: Hi, On Fri, 03 Jul 2009 10:59:21 +0900, Junko IKEDA ike...@intellilink.co.jp wrote

Re: [Linux-HA] socket of lrmd

2009-07-03 Thread Junko IKEDA
, On Fri, Jul 03, 2009 at 04:37:44PM +0900, Junko IKEDA wrote: Hi again, :) Thank you for your quick reply. Our customer might hesitate to apply the new patch for their running system at once. Of course. (I don't know their upgrade plan unfortunately) So I want to know whether Heartbeat can

Re: [Linux-HA] Error compiling Heartbeat

2009-05-28 Thread Junko IKEDA
Hi, I'm not sure... But I found the following message in config.log. /usr/bin/ld: crt1.o: No such file: No such file or directory I have RHEL5.3, so # rpm -qf /usr/bin/ld binutils-2.17.50.0.6-9.el5 I think RHEL4-3 should have binutils-2.15.92.0.2-1. Thanks, Junko -Original

Re: [Linux-HA] Error compiling Heartbeat

2009-05-28 Thread Junko IKEDA
Sorry, I'm really wrong... Thanks, Junko -Original Message- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Junko IKEDA Sent: Thursday, May 28, 2009 6:41 PM To: 'General Linux-HA mailing list' Subject: Re: [Linux-HA] Error

Re: [Linux-HA] Error compiling Heartbeat

2009-05-28 Thread Junko IKEDA
Sorry for many posting. # rpm -qf /usr/lib64/crt1.o glibc-devel-2.5-34 need glibc-devel-2.3.4-2.19? -Original Message- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Junko IKEDA Sent: Thursday, May 28, 2009 6:44 PM To: 'General

Re: [Linux-HA] crm CLI

2009-04-28 Thread Junko IKEDA
-Original Message- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Andrew Beekhof Sent: Tuesday, April 28, 2009 5:42 PM To: General Linux-HA mailing list Subject: Re: [Linux-HA] crm CLI On Tue, Apr 28, 2009 at 09:39, Cristina

RE: [Linux-HA] Cancel a STONITH

2009-04-15 Thread Junko IKEDA
Hi, I think I can use hb_delnode when I want to remove one node from the cluster, Should I do hb_delnode on DC? Is there any distinction between DC or not to do that? Thanks, Junko -Original Message- From: linux-ha-boun...@lists.linux-ha.org

RE: [Linux-ha-dev] xm dump-core from xen0

2009-03-18 Thread Junko IKEDA
Development List Subject: Re: [Linux-ha-dev] xm dump-core from xen0 Hi, On Mon, Mar 16, 2009 at 07:22:02PM +0900, Junko IKEDA wrote: Hi, I run the new xen0 on domU now, and need an additional feature for a dump destination. I have RHEL5.2 x86_64 and xen 3.1. This would dump

RE: [Linux-HA] STONITH to the node which active(have some resources) and DC

2009-03-17 Thread Junko IKEDA
I found the following stonithd behavior. It might be an expected one, but I'm just wondering. My operation is here; (1) start Heartbeat 2.1.4 on two nodes(dom-d1, dom-2). (2) start the resource on active node(dom-d2), and dom-d2 is also DC in this case. (3) modify the RA and cause

RE: [Linux-ha-dev] xm dump-core from xen0

2009-03-06 Thread Junko IKEDA
Hi, My operation is here; # ssh x3650g # export dom0=x3650g # export hostlist=dom-d1:/etc/xen/dom-d1 dom-d2:/etc/xen/dom-d2 # /usr/lib64/stonith/plugins/external/xen0 on dom-d1 # echo $? 0 dom-d1 was created well. # /usr/lib64/stonith/plugins/external/xen0 reset dom-d1 # echo $? # 1

RE: [Linux-ha-dev] xm dump-core from xen0

2009-03-06 Thread Junko IKEDA
Sorry for all of my mistakes... I had a wrong /etc/hosts. It works well for now. By the way, could I configure this plugin on two Dom0 and two DomU? ex.) domU-1 on Dom0-1, and domU-2 on Dom0-2 Thanks, Junko Hi, My operation is here; # ssh x3650g # export dom0=x3650g # export

RE: [Linux-ha-dev] xm dump-core from xen0

2009-03-06 Thread Junko IKEDA
I run the attached cib.xml. It seems that this configuration works well (but I need more tests) If there is any strange elements, please let me know. Thanks, Junko Sorry for all of my mistakes... I have a wrong /etc/hosts. It works well for now. By the way, Could I config this plugin on

RE: [Linux-ha-dev] xm dump-core from xen0

2009-03-04 Thread Junko IKEDA
be a big deal. I can add one more config parameter like run_dump, then if it's set the script will call xm dump-core before destroying xunU. On Tue, Mar 3, 2009 at 10:38 PM, Junko IKEDA ike...@intellilink.co.jp wrote: Hi Serge, I'm trying to manage xen domain-U with xen0 plugin

RE: [Linux-ha-dev] xm dump-core from xen0

2009-03-04 Thread Junko IKEDA
4, 2009 at 6:45 PM, Junko IKEDA ike...@intellilink.co.jp wrote: Hi, Attached is a patch that adds that functionality. Many thanks! I'll give it a try. By the way, xen0 plugin should run on domain-0, right? Is it possible to run it on domain-U? Thanks, Junko On Tue, Mar 3

[Linux-ha-dev] xm dump-core from xen0

2009-03-03 Thread Junko IKEDA
Hi Serge, I'm trying to manage xen domain-U with the xen0 plugin. There are two xm commands, xm destroy and xm create, in xen0. What do you think about adding xm dump-core to it? If possible, I want to get the dump of domain-U when some fence events happen. Best Regards, Junko Ikeda

RE: [Linux-HA] The upper limit of cib.xml for cibadmin

2009-02-03 Thread Junko IKEDA
we always face the IPC problem... We handle the big cib.xml to put /var/lib/heartbeat/crm when its first boot, and modify some TCP/UDP parameters as makeshift measures. Thanks, Junko On Tue, Feb 3, 2009 at 08:12, Junko IKEDA ike...@intellilink.co.jp wrote: Hi, We have 16 nodes

[Linux-HA] The upper limit of cib.xml for cibadmin

2009-02-02 Thread Junko IKEDA
the timeout... Best Regards, Junko Ikeda NTT DATA INTELLILINK CORPORATION

RE: [Linux-HA] The upper limit of cib.xml for cibadmin

2009-02-02 Thread Junko IKEDA
Hi, We have 16 nodes, and the size of cib.xml is now about 150kbyte. Heartbeat is 2.1.4. When I call cibadmin command, the following message comes. # cibadmin -U -x cib.xml No messages received in 30 seconds.. aborting Is the size of cib.xml too big? Quite possibly. I

RE: [Linux-HA] The upper limit of cib.xml for cibadmin

2009-02-02 Thread Junko IKEDA
Hi, We have 16 nodes, and the size of cib.xml is now about 150kbyte. Heartbeat is 2.1.4. When I call cibadmin command, the following message comes. # cibadmin -U -x cib.xml No messages received in 30 seconds.. aborting Is the size of cib.xml too big? Quite

RE: [Linux-HA] crm_mon --one-shot shouldnt hang/block forever if thereis an issue with heartbeat

2009-01-18 Thread Junko IKEDA
Hi, im using the crm_mon output to parse the cluster status for other applications. Problem is, when heartbeat is either not running or there is some connection issue in the cluster or some random issue i cant make out - crm_mon will never return (ok as soon as the issue is repaired it may be
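One way to keep a status-parsing script from blocking forever is to bound the call with a time limit. The sketch below assumes the GNU coreutils `timeout` utility is available; the wrapper name `run_with_limit` and the 10-second value are hypothetical.

```shell
# Hypothetical wrapper (illustration only): run a command, but give up after
# a fixed number of seconds. GNU `timeout` exits with status 124 when the
# limit is reached.
run_with_limit() {
    limit=$1
    shift
    timeout "$limit" "$@"
}

# Intended use (not executed here):
#   if ! run_with_limit 10 crm_mon -1 > /tmp/cluster_status.txt; then
#       echo "crm_mon failed or did not answer in time" >&2
#   fi
```

The caller can tell a hang from an ordinary failure by checking for exit status 124.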

[Linux-HA] resource restart after recovering split brain

2008-11-27 Thread Junko IKEDA
://developerbugs.linux-foundation.org//show_bug.cgi?id=2004 Best Regards, Junko Ikeda NTT DATA INTELLILINK CORPORATION

RE: [Linux-HA] instance id after a split brain

2008-11-10 Thread Junko IKEDA
The latest Pacemaker 1.0 can help our problem which I posted to the following entry. http://developerbugs.linux-foundation.org/show_bug.cgi?id=1990 A split brain under 4 nodes circumstances can be recovered successfully! It seems that these patches have the effect for this behavior.

RE: [Linux-HA] instance id after a split brain

2008-11-09 Thread Junko IKEDA
...) and join the cluster member. hac02, hac06 received instance=17 again, and can notice the DC election, but they freeze... the newest id doesn't come. Other nodes would take hac02 and hac06 as OFFLINE node. This situation is very rare, so is this some timing bug? Best Regards, Junko

[Linux-HA] NACK'd after split brain

2008-11-06 Thread Junko IKEDA
DC nodes during a split brain, so it seems that DC election conflicts and some nodes can not share the status when its recovering stage. It might be a bug of ccm. Can I get any obvious evidences why the node is shut down from log? Best Regards, Junko Ikeda NTT DATA INTELLILINK CORPORATION

[Linux-HA] join id when split brain comes

2008-10-30 Thread Junko IKEDA
/10/30_14:38:32 debug: handle_request: Raising I_JOIN_OFFER: join-4 crmd[6912]: 2008/10/30_14:38:32 debug: handle_request: Raising I_JOIN_OFFER: join-5 crmd[6912]: 2008/10/30_14:38:32 debug: handle_request: Raising I_JOIN_RESULT: join-14 Best Regards, Junko Ikeda NTT DATA INTELLILINK CORPORATION

RE: [Linux-HA] Updated from 2.1.3 to 2.99.x w/ Pacemaker 1.x and CIBno longer conforms to DTD

2008-10-28 Thread Junko IKEDA
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Alex Strachan Sent: Monday, October 27, 2008 10:31 AM To: 'General Linux-HA mailing list' Subject: RE: [Linux-HA] Updated from 2.1.3 to 2.99.x w/ Pacemaker 1.x and CIBno longer conforms to DTD See

RE: [Linux-ha-dev] SFEX resource agent for heartbeat

2008-10-16 Thread Junko IKEDA
Hi, See also this page, please. http://www.linux-ha.org/sfex/ Thanks, Junko -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Xinwei Hu Sent: Thursday, October 16, 2008 6:55 PM To: High-Availability Linux Development List Subject: Re: [Linux-ha-dev]

RE: [Linux-HA] the maximum message size which bcast can handle

2008-09-09 Thread Junko IKEDA
net/ipv4/udp.c (2.6.18-92.el5)   495 int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,   496 size_t len)   497 {   511 if (len > 0xFFFF)   512 return -EMSGSIZE; In line 511, the limit for a UDP packet is 65535
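The 65535-byte limit mentioned in the message can be expressed as a simple pre-send check. This is only an illustration; the helper name `fits_in_udp` is hypothetical and is not Heartbeat code.

```shell
# Hypothetical check (illustration only): a UDP send longer than 0xFFFF
# (65535) bytes is refused by the kernel with EMSGSIZE, so test the length
# before handing the message to the socket layer.
fits_in_udp() {
    [ "$1" -le 65535 ]
}

if fits_in_udp 262144; then
    echo "can be sent as one datagram"
else
    echo "must be split before sending"
fi
```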

RE: [Linux-HA] the maximum message size which bcast can handle

2008-09-08 Thread Junko IKEDA
Hmm. Perhaps this (the maximum packet size) has been checked by somebody before, then forgotten and it never got into discussion about the message compression. When I started working on the compression, the MAXMSG was already temporarily set to 2MB. Also, I can distinctly recall that Lars

RE: [Linux-HA] the maximum message size which bcast can handle

2008-09-08 Thread Junko IKEDA
It means that heartbeat can't deliver a message if the uncompressed size is bigger than 2MB. It also means that heartbeat can't deliver a message if, after compressing the message, the size is still bigger than 256kB. I see, First control gate is 2MB, second is 256kB. If MAXMSG(256kbyte)
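The two size gates described above can be sketched as one check. The variable and function names are hypothetical, and the sketch assumes "2MB" and "256kB" are meant in binary units (multiples of 1024).

```shell
# Hypothetical limits (illustration only), assuming binary units.
MAX_UNCOMPRESSED=$((2 * 1024 * 1024))   # first gate: 2MB before compression
MAX_COMPRESSED=$((256 * 1024))          # second gate: 256kB after compression

# Succeed only if a message passes both gates.
can_deliver() {
    uncompressed=$1
    compressed=$2
    [ "$uncompressed" -le "$MAX_UNCOMPRESSED" ] &&
        [ "$compressed" -le "$MAX_COMPRESSED" ]
}

# Example: a 1MB message that compresses to 300kB fails at the second gate.
if ! can_deliver $((1024 * 1024)) $((300 * 1024)); then
    echo "message rejected"
fi
```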

RE: [Linux-HA] the maximum message size which bcast can handle

2008-09-08 Thread Junko IKEDA
it, because the max size for sendto() is 64kbyte. 256kbyte message should be split into pieces before sending as packet. by the way, I set bcast in ha.cf as media. I assume you're working on a patch for this? That means, heartbeat doesn't care for 256kB message before putting it

[Linux-ha-dev] uninstall Heartbeat 2.1.4-1 on RedHat

2008-08-19 Thread Junko IKEDA
with /sbin/ldconfig for RedHat? Best Regards, Junko Ikeda NTT DATA INTELLILINK CORPORATION

RE: [Linux-ha-dev] crm_mon doesn't exit immediately

2008-08-11 Thread Junko IKEDA
If there's no objection I would like to push this patch into the lha-2.1 repository, but any problem on that? sure It seems that the latest pacemaker also presents the same behavior so I think the both needs to be fixed as well. I thought it was fine? sorry, that might have been

RE: [Linux-HA] rsc_order constraints behavior changed?

2008-07-28 Thread Junko IKEDA
If you don't want non_clone_group1 to be restarted when this happens, make the ordering constraint advisory-only by setting adding score=0 to the constraint. I tried this configuration, but non_clone_group1 was restarted when clone1 resources fail-count was cleared. you're right -

RE: [Linux-HA] does cib process need a lot of cpu power?

2008-07-27 Thread Junko IKEDA
Of course. More resources == more actions to perform == more CIB updates to perform == more work for the CIB This is reasonable, but 1 group which contains 15 resources gets 100% CPU is overdone even if it's a fleeting behavior. So the machine should waste CPU cycles so that 100% of

RE: [Linux-HA] does cib process need a lot of cpu power?

2008-07-25 Thread Junko IKEDA
threading Best Regards, Junko Ikeda NTT DATA INTELLILINK CORPORATION opreport.txt libcrmcommon.txt

RE: [Linux-HA] does cib process need a lot of cpu power?

2008-07-25 Thread Junko IKEDA
power? Or are there any reasonable causes that cib process occupies lots of cpu instantaneously? By the way, cib_notify_client() is also called a lot. cpuinfo: Intel(R) Xeon(R) CPU 5160 @ 3.00GHz Core 2 Hyper-Threading Best Regards, Junko Ikeda NTT

RE: [Linux-HA] does cib process need a lot of cpu power?

2008-07-25 Thread Junko IKEDA
@ 3.00GHz Core 2 Hyper-Threading Best Regards, Junko Ikeda NTT DATA INTELLILINK CORPORATION opreport.txt libcrmcommon.txt

RE: [Linux-HA] does cib process need a lot of cpu power?

2008-07-24 Thread Junko IKEDA
which codebase is this? I tried this with Heartbeat 2.1.3 first. Heartbeat-Dev + Pacemaker+Dev also showed the same behavior. Thanks, Junko On Jul 24, 2008, at 1:58 PM, Junko IKEDA wrote: Hi, I have 4 nodes, (3 active + 1 standby), and each active node has 15 resources which

RE: [Linux-HA] New release ahead ?

2008-07-14 Thread Junko IKEDA
Hi, It seems that we can get Heartbeat 2.1.4 soon, so I am trying STABLE 2.1 (http://hg.linux-ha.org/lha-2.1/) as a release candidate. These are trivial bugs, but user can find them easily. It would be convenient if they are fixed before a release. 1) When I quit crm_mon -i1 using Ctrl + C,

RE: [Linux-HA] speed up fail over time

2008-07-11 Thread Junko IKEDA
Hi, We are now trying to show a good performance report to the potential customer. Our customer's requests are here; * There are more than 100 resources on one node. * 100 resources are included in one group, so they would start/stop sequentially. * Fail over for all of 100

RE: [Linux-HA] behavior of lrmd/crmd when lrmd process is killed

2008-06-30 Thread Junko IKEDA
the lrmd died but whatever mechanism the IPC code is using doesn't seem able to. On Fri, Jun 27, 2008 at 12:51, Junko IKEDA [EMAIL PROTECTED] wrote: It might be worth seeing if you can repeat the result with a resource based on a simple daemon process ( while(1) { sleep(1

RE: [Linux-HA] failcount not increased above 1

2008-06-27 Thread Junko IKEDA
Hi, It seems that you face the same problem which I did before. I think you shouldn't use 2.1.3. Please refer to this list: http://www.gossamer-threads.com/lists/linuxha/users/47008 http://developerbugs.linux-foundation.org/show_bug.cgi?id=1859 The latest package includes the above fix.

RE: [Linux-HA] sometimes crm_resource -F fails

2008-06-25 Thread Junko IKEDA
released 0.6.5). On Fri, Jun 20, 2008 at 09:51, Junko IKEDA [EMAIL PROTECTED] wrote: Hi, I run this combination; Pacemaker:0df5ae633188 Heartbeat:c94051dc16a5 There are three Filesystem and one IPaddr on one node. If IPaddr is forced into the other node with crm_resource

RE: [Linux-HA] sometimes crm_resource -F fails

2008-06-25 Thread Junko IKEDA
Ah ok, sorry just wanted to make sure the intended functionality was clear. I had a look at the report and analysis.txt highlights the problem quite well: pengine[20727]: 2008/06/23_11:02:40 ERROR: unpack_rsc_op: Hard error: prmApPostgreSQLDB_fail_6 failed with rc=2.

RE: [Linux-HA] sometimes crm_resource -F fails

2008-06-25 Thread Junko IKEDA
Ah ok, sorry just wanted to make sure the intended functionality was clear. I had a look at the report and analysis.txt highlights the problem quite well: pengine[20727]: 2008/06/23_11:02:40 ERROR: unpack_rsc_op: Hard error: prmApPostgreSQLDB_fail_6 failed with rc=2.

RE: [Linux-HA] A demand to the lrmd function.

2008-06-17 Thread Junko IKEDA
When the lrmd process falls, lrmd reboots. But, the monitor stops after having rebooted. In this status, lrmd cannot detect the obstacle of the resource after it. Actually, there may be little possibility that lrmd reboots. But, I think that it is necessary when I think about the worst

RE: [Linux-HA] A demand to the lrmd function.

2008-06-17 Thread Junko IKEDA
When the lrmd process falls, lrmd reboots. But, the monitor stops after having rebooted. In this status, lrmd cannot detect the obstacle of the resource after it. Actually, there may be little possibility that lrmd reboots. But, I think that it is necessary when I think about the

RE: [Linux-HA] A demand to the lrmd function.

2008-06-17 Thread Junko IKEDA
I can only think of two things - system load (or CPU power, both of which would affect the timing) and whether you both have stonith enabled. Intel(R) Xeon(R) CPU5160 @ 3.00GHz (x2) - lrmd restart. That's it. Intel(R) Pentium(R) 4 CPU 3.20GHz (x2) - lrmd restart, and somehow,

RE: [Linux-HA] A demand to the lrmd function.

2008-06-17 Thread Junko IKEDA
Intel(R) Xeon(R) CPU5160 @ 3.00GHz (x2) - lrmd restart. That's it. actually, i was wrong... if you're using crm on then the node should (based on how the code works) always commit suicide. can you create a bug and attach a hb_report for this case please? Is suicide the

RE: [Linux-HA] showscores script for Heartbeat 2.1.3

2008-06-05 Thread Junko IKEDA
Hi, iirc, 2.1.3 did not have the -s option on ptest. According to file revisions, ( http://hg.clusterlabs.org/pacemaker/stable-0.6/log/e63da7d9940d/contrib/showscores.sh ) this should be the version to try (before change to use ptest -s):

[Linux-HA] showscores script for Heartbeat 2.1.3

2008-06-04 Thread Junko IKEDA
://hg.clusterlabs.org/pacemaker/dev/file/tip/contrib/showscores.sh So some values can not be displayed if I use it with Heartbeat 2.1.3. Best Regards, Junko Ikeda NTT DATA INTELLILINK CORPORATION

RE: [Linux-HA] the stop sequence for group resource

2008-06-03 Thread Junko IKEDA
Then (because of the probe) we find out it _is_ running afterall and we end up in the situation contained in pe-input-6.bz2 We only guarantee that the probe for rscX completes before we start the rscX. Start failures which are set as on_fail=block also induce the unmanaged status as the same

RE: [Linux-HA] SCSI Reservation OCF Agent ?

2008-06-02 Thread Junko IKEDA
. Don't know if anybody's using it. It might not be SCSI reservations to be exact, but it would control the ownership of shared disk. Try SFEX (Shared Disk File EXclusiveness Control Program) from here; http://linux-ha.org/sfex Best Regards, Junko Ikeda NTT DATA INTELLILINK CORPORATION

RE: [Linux-HA] unwanted moving of resource clones

2008-05-07 Thread Junko IKEDA
Hi, I'm using latest 2.1.3 from CentOS. If somebody's interested, hb_report is available at http://nik.lbox.cz/downloads/vb.tar.gz thanks a lot in advance BR nik Is this the same problem as this? Clone instance might be shuffled unexpectedly.

RE: [Linux-ha-dev] [RFC] heartbeat-2.1.4 --- Master resource's demote operation goes into an infinite loop

2008-04-21 Thread Junko IKEDA
Btw. You do realize that setting ordered=false for the master resource also means that the group's actions wont be ordered either don't you? You mean, there's a possibility that slave resource will start/stop before master's action complete if I don't set ordered=true, right? No.

RE: [Linux-ha-dev] [RFC] heartbeat-2.1.4 --- Master resource's demote operation goes into an infinite loop

2008-04-20 Thread Junko IKEDA
of it next time. Thanks, Junko 2008/4/18 Junko IKEDA [EMAIL PROTECTED]: Fixed by: http://hg.clusterlabs.org/pacemaker/stable-0.6/rev/4817a7094683 It works well with group-master/slave, too. Many thanks! Please merge it into Heartbeat 2.1.4. Thanks, Junko

RE: [Linux-ha-dev] [RFC] heartbeat-2.1.4 --- build on RHEL5.1

2008-04-17 Thread Junko IKEDA
any ideas as to why the current code doesn't work for you? I failed to build rpm on open suse 10.1 too... It might be a potential problem in Heartbeat 2.1.3. See attached configure-213.log. It's sure that the summary says CIM provider and TSA plugin would not be built, Build CRM

RE: [Linux-ha-dev] [RFC] heartbeat-2.1.4 --- Master resource's demote operation goes into an infinite loop

2008-04-16 Thread Junko IKEDA
. Is there something wrong with cib.xml ? This is similar case to what Yamauchi-san posted. Best Regards, Junko Ikeda NTT DATA INTELLILINK CORPORATION ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux

RE: [Linux-ha-dev] [RFC] heartbeat-2.1.4 --- build on RHEL5.1

2008-04-16 Thread Junko IKEDA
Hi, I keep failing to build lha-2.1 on RHEL5.1 for now. It seems that --enable-cim-provider=no and --enable-tsa-plugin=no are ineffective for ConfigureMe. We don't need CIM providers or TSA plugin, so have a try to make patch about it. Please check the attached. Sorry for annoying. The

RE: [Linux-ha-dev] [RFC] heartbeat-2.1.4

2008-04-15 Thread Junko IKEDA
Hi again, Another request; Would it be possible to include the following patch in release 2.1.4? http://hg.linux-ha.org/dev/rev/6307bb091d02 It will help the problems which are posted into Bugzilla 1814, for all platform not only ppc.

RE: [Linux-ha-dev] [RFC] heartbeat-2.1.4

2008-04-14 Thread Junko IKEDA
Hi, So, that said, I've pushed my proposed code to http://hg.linux-ha.org/lha-2.1/. It, for reasons outlined above, likely doesn't build yet (because the in-tree packaging is broken), but I wanted to share the scope of changes with you. There are some fixes about failcount in

RE: [Linux-HA] HELP: can't get secondary HB server back up after trying to manually edit cib.xml file

2008-03-10 Thread Junko IKEDA
Hi, I finally have the primary server back up (fspbro213.rchland.ibm.com), but I can't get the secondary server back up. I get these messages in the /var/log/ha-log file of the secondary server (fspbro214.rchland.ibm.com): heartbeat[12894]: 2008/03/10_11:57:37 ERROR: should_drop_message:

RE: [Linux-HA] ERROR: crm_abort: ha_set_tm_time: Triggered assert at iso8601.c:887

2008-02-29 Thread Junko IKEDA
Hi, I've seen these messages appearing when I connect the hb_gui to the mgmtd: mgmtd[6819]: 2008/02/29_08:03:51 ERROR: crm_abort: ha_set_tm_time: Triggered assert at iso8601.c:887 : rhs-tm_mday 0 || lhs-days == rhs-tm_mday I also found the same error right now! Not only from mgmtd, crmd

[Linux-HA] crm_standby error out on glib (GLib-CRITICAL)

2008-01-30 Thread Junko IKEDA
); if (source) g_source_destroy (source); return source != NULL; } Best Regards, Junko Ikeda NTT DATA INTELLILINK CORPORATION # crm_standby -U prec370e -v on (process:10181): GLib-CRITICAL **: g_source_remove: assertion `tag > 0' failed (process:10181): GLib-CRITICAL **: g_source_remove
