Hi,

I think the problem is this: /etc/hosts resolves node3 to 192.168.1.15 (eth1), but internally HBase sometimes uses 192.168.1.13 (eth0). When I run "ifdown eth0" on node3 and then run stop-hbase.sh, the log shows these messages:
2011-07-06 10:25:50,683 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner to stop
2011-07-06 10:25:50,683 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root scanners notified
2011-07-06 10:26:00,685 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner to stop
2011-07-06 10:26:00,685 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner to stop
2011-07-06 10:26:00,685 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root scanners notified
2011-07-06 10:26:10,687 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner to stop
2011-07-06 10:26:10,687 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner to stop
2011-07-06 10:26:10,687 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root scanners notified
2011-07-06 10:26:20,689 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner to stop
2011-07-06 10:26:20,689 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner to stop
2011-07-06 10:26:20,689 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root scanners notified
2011-07-06 10:26:30,691 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner to stop

When I then run "ifup eth0" on node3, it works and HBase stops normally:

2011-07-06 10:28:47,139 INFO org.apache.hadoop.hbase.master.ServerManager: Region server node3,60020,1309860160318 quiesced
2011-07-06 10:28:47,139 INFO org.apache.hadoop.hbase.master.ServerManager: All user tables quiesced. Proceeding with shutdown
2011-07-06 10:28:47,139 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner to stop
2011-07-06 10:28:47,139 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner to stop
2011-07-06 10:28:47,139 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root scanners notified
2011-07-06 10:28:47,338 INFO org.apache.hadoop.hbase.master.ServerManager: Removing server's info node3,60020,1309860160318
2011-07-06 10:28:47,338 INFO org.apache.hadoop.hbase.master.ServerManager: Region server node3,60020,1309860160318: MSG_REPORT_EXITING
2011-07-06 10:28:50,719 INFO org.apache.hadoop.hbase.master.HMaster: Stopping infoServer

2011/7/5 Jameson Li <hovlj...@gmail.com>

> Hi,
>
> When I start my HBase cluster, there are some errors in the master log.
> (node3, 192.168.1.15 and 192.168.1.13 are the same machine, which has
> two NICs.)
>
> 2011-07-05 17:13:13,820 INFO org.apache.zookeeper.ClientCnxn: zookeeper.disableAutoWatchReset is false
> 2011-07-05 17:13:13,840 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server node3/192.168.1.15:2181
> ....
> 2011-07-05 17:13:13,975 DEBUG org.apache.hadoop.hbase.master.HMaster: Checking cluster state...
> 2011-07-05 17:13:13,979 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 192.168.1.13:60020
> ....
> 2011-07-05 17:13:19,732 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Updated ZNode /hbase/rs/1309857199677 with data 192.168.1.15:60020
> ....
> 2011-07-05 17:22:01,041 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /192.168.1.13:60020 could not be reached after 1 tries, giving up.
> 2011-07-05 17:22:01,042 WARN org.apache.hadoop.hbase.master.BaseScanner: Scan one META region: {server: 192.168.1.13:60020, regionname: .META.,,1, startKey: <>}
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /192.168.1.13:60020 after attempts=1
>     at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:429)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getHRegionConnection(HConnectionManager.java:918)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getHRegionConnection(HConnectionManager.java:934)
>     at org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:173)
>     at org.apache.hadoop.hbase.master.MetaScanner.scanOneMetaRegion(MetaScanner.java:73)
>     at org.apache.hadoop.hbase.master.MetaScanner.maintenanceScan(MetaScanner.java:129)
>     at org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:153)
>     at org.apache.hadoop.hbase.Chore.run(Chore.java:68)
>
> Sometimes the .META. region is not assigned to node3, which has two
> NICs (eth0: 192.168.1.13 and eth1: 192.168.1.15) and whose hosts entry
> is "192.168.1.15 node3". That is, when .META. is assigned to one of the
> other servers, which have only one NIC, HBase works well.
>
> Here is some information about my HBase cluster:
> HBase version: 0.20.6
> Hadoop version: 0.20-append+4
> ZooKeeper version: 3.3.0
>
> The hbase-site.xml:
> <configuration>
>   <property>
>     <name>hbase.rootdir</name>
>     <value>hdfs://node3:54310/hbase</value>
>   </property>
>
>   <property>
>     <name>hbase.master</name>
>     <value>hadoop5:60000</value>
>   </property>
>
>   <property>
>     <name>hbase.zookeeper.quorum</name>
>     <value>node3,hadoop5,hadoopoffice85,hadoopoffice88,hdofficelj001</value>
>   </property>
>
>   <property>
>     <name>hbase.cluster.distributed</name>
>     <value>true</value>
>   </property>
>
>   <!--property>
>     <name>hbase.master.dns.interface</name>
>     <value>eth1</value>
>     <description>The name of the Network Interface from which a master
>     should report its IP address.
>     </description>
>   </property>
>
>   <property>
>     <name>hbase.regionserver.dns.interface</name>
>     <value>eth1</value>
>     <description>The name of the Network Interface from which a region server
>     should report its IP address.
>     </description>
>   </property>
>
>   <property>
>     <name>hbase.zookeeper.dns.interface</name>
>     <value>eth1</value>
>     <description>The name of the Network Interface from which a ZooKeeper server
>     should report its IP address.
>     </description>
>   </property-->
>
>   <!--property>
>     <name>hbase.zookeeper.property.clientPort</name>
>     <value>2222</value>
>     <description>Property from ZooKeeper's config zoo.cfg.
>     The port at which the clients will connect.
>     </description>
>   </property>
>   <property>
>     <name>hbase.zookeeper.property.dataDir</name>
>     <value>/opt/zookeeper/data</value>
>     <description>Property from ZooKeeper's config zoo.cfg.
>     The directory where the snapshot is stored.
>     </description>
>   </property-->
> </configuration>
>
> cat /opt/hbase/conf/regionservers
> hadoop5
> node3
> hadoopoffice85
> hadoopoffice88
> hdofficelj001
>
> ----------------------------------------------------------------------
> Below is node3's info.
> ifconfig info:
> [root@node3 ~]# ifconfig
> eth0      Link encap:Ethernet  HWaddr 00:0C:29:23:2E:D3
>           inet addr:192.168.1.13  Bcast:192.168.1.255  Mask:255.255.255.0
>           inet6 addr: fe80::20c:29ff:fe23:2ed3/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:1424620 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:17897973 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:150231810 (143.2 MiB)  TX bytes:2834085782 (2.6 GiB)
>           Base address:0x2000 Memory:d8920000-d8940000
>
> eth1      Link encap:Ethernet  HWaddr 00:0C:29:23:2E:DD
>           inet addr:192.168.1.15  Bcast:192.168.1.255  Mask:255.255.255.0
>           inet6 addr: fe80::20c:29ff:fe23:2edd/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:1172226 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:1445 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:168873362 (161.0 MiB)  TX bytes:293447 (286.5 KiB)
>           Base address:0x2040 Memory:d8940000-d8960000
>
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>           RX packets:370550 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:370550 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:64864387 (61.8 MiB)  TX bytes:64864387 (61.8 MiB)
>
> The hosts info:
> [root@node3 ~]# cat /etc/hosts
> # Do not remove the following line, or various programs
> # that require network functionality will fail.
> 127.0.0.1       localhost.localdomain localhost
> ::1             localhost6.localdomain6 localhost6
> 192.168.1.27    hadoop5
> 192.168.1.15    node3
> 192.168.1.85    hadoopoffice85
> 192.168.1.88    hadoopoffice88
> 192.168.3.227   hdofficelj001
>
> [root@node3 ~]# netstat -nap | grep 600
> tcp        0      0 ::ffff:192.168.1.15:60020   :::*                        LISTEN      19064/java
> tcp        0      0 :::60030                    :::*                        LISTEN      19064/java
> tcp        0      0 ::ffff:192.168.1.13:44350   ::ffff:192.168.1.27:60000   ESTABLISHED 19064/java
>
> [root@node3 ~]# route
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> 239.2.11.71     *               255.255.255.255 UH    0      0        0 eth1
> 192.168.1.0     *               255.255.255.0   U     0      0        0 eth0
> 192.168.1.0     *               255.255.255.0   U     0      0        0 eth1
> 169.254.0.0     *               255.255.0.0     U     0      0        0 eth1
> default         192.168.1.254   0.0.0.0         UG    0      0        0 eth0
>
> ----------------------------------------------------------------------
> I have set the dns.interface properties to eth1, but I still get the
> same error.
> <property>
>   <name>hbase.master.dns.interface</name>
>   <value>eth1</value>
>   <description>The name of the Network Interface from which a master
>   should report its IP address.
>   </description>
> </property>
>
> <property>
>   <name>hbase.regionserver.dns.interface</name>
>   <value>eth1</value>
>   <description>The name of the Network Interface from which a region server
>   should report its IP address.
>   </description>
> </property>
>
> <property>
>   <name>hbase.zookeeper.dns.interface</name>
>   <value>eth1</value>
>   <description>The name of the Network Interface from which a ZooKeeper server
>   should report its IP address.
>   </description>
> </property>
>
> ----------------------------------------------------------------------
> Even after I changed the default route, I still get the same error:
> [root@node3 ~]# route del default
> [root@node3 ~]# route add -net default gw 192.168.1.254 eth1
> [root@node3 ~]# route
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> 239.2.11.71     *               255.255.255.255 UH    0      0        0 eth1
> 192.168.1.0     *               255.255.255.0   U     0      0        0 eth0
> 192.168.1.0     *               255.255.255.0   U     0      0        0 eth1
> 169.254.0.0     *               255.255.0.0     U     0      0        0 eth1
> default         192.168.1.254   0.0.0.0         UG    0      0        0 eth1
> [root@node3 ~]# netstat -nap | grep 600
> tcp        0      0 ::ffff:192.168.1.15:60020   :::*                        LISTEN      23282/java
> tcp        0      0 :::60030                    :::*                        LISTEN      23282/java
> tcp        0      0 ::ffff:192.168.1.13:45783   ::ffff:192.168.1.27:60000   ESTABLISHED 23282/java
>
> Help.
>
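
For anyone checking a similar multi-NIC setup, the mismatch described above (the hosts file says 192.168.1.15 for node3, but 192.168.1.13:60020 is also advertised) can be spotted with a small script. This is only a diagnostic sketch and not part of HBase: the helper names parse_hosts and mismatched are hypothetical, and the sample data is copied by hand from the hosts file and master log in this thread rather than read from ZooKeeper.

```python
"""Diagnostic sketch: compare the addresses a cluster advertises for a host
against what an /etc/hosts-style file says that host should resolve to."""


def parse_hosts(text):
    """Return {hostname: ip} from /etc/hosts-style text (first entry wins)."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)
    return mapping


def mismatched(hostname, advertised, hosts):
    """Return the 'ip:port' entries whose IP differs from the hosts entry."""
    expected = hosts.get(hostname)
    return [a for a in advertised if a.rsplit(":", 1)[0] != expected]


# Sample data copied from this thread (assumption: trimmed to relevant lines).
HOSTS = """\
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.1.27 hadoop5
192.168.1.15 node3
"""

# Region server addresses seen for node3 in the master log.
ADVERTISED = ["192.168.1.13:60020", "192.168.1.15:60020"]

if __name__ == "__main__":
    hosts = parse_hosts(HOSTS)
    for addr in mismatched("node3", ADVERTISED, hosts):
        print(f"{addr} does not match hosts entry {hosts['node3']} for node3")
```

On a real cluster, the ADVERTISED list would be filled in from whatever the ZooKeeper znodes or the master log report; the comparison itself stays the same.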