Re: Neverending DataNode.clienttrace logging after starting Hbase

2010-03-03 Thread Stack
This looks like the product of the addition below: r687868 | cdouglas | 2008-08-21 14:27:31 -0700 (Thu, 21 Aug 2008) | 3 lines HADOOP-3062. Add metrics to DataNode and TaskTracker to record network traffic for HDFS reads/writ

Re: efficient storage of timeseries

2010-03-03 Thread Stack
On Wed, Mar 3, 2010 at 6:39 PM, Saptarshi Guha wrote: > But if I specify which column families I want, will it still load the > entire row? No. It will just load the content of the specified column families. Suppose row key R has col. families A,B, C, D and A itself > has 100 column labels, B
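Stack's answer can be illustrated with a tiny Python model (a toy stand-in, not the HBase API): when the request names column families C and D, the wide families A and B are simply never materialized.

```python
# Toy model of server-side column-family selection. Family and qualifier
# names follow Saptarshi's example; none of this is real HBase code.
row = {
    "A": {f"a{i}": b"v" for i in range(100)},  # 100 column labels
    "B": {f"b{i}": b"v" for i in range(150)},  # 150 column labels
    "C": {"c1": b"v"},
    "D": {"d1": b"v"},
}

def get(row, families=None):
    """Return only the requested families; None means all of them."""
    if families is None:
        return dict(row)
    return {f: row[f] for f in families if f in row}

result = get(row, families=["C", "D"])
print(sorted(result))   # ['C', 'D']
print("A" in result)    # False -- the wide families were never loaded
```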

Re: Timestamp of specific row and column

2010-03-03 Thread Slava Gorelik
Hi Jonathan. Thank you for the clarification, yes, I saw this in the code. I'll try to avoid the map creation. Best Regards. On Thu, Mar 4, 2010 at 12:21 AM, Jonathan Gray wrote: > Just to reiterate and confirm what Erik is saying, building the Map will > internally iterate all of the KeyValues, di

Region Server Count

2010-03-03 Thread Jean-Daniel Cryans
@Steve Severance It seems your email bounced on almost everyone's email account. I was able to retrieve it but you should see if there's anything wrong by contacting in...@apache.org So your email was: - From: "Severance, Steve" To: "hbase-user@hadoop.apache.org" Date: Wed, 17 Feb

Re: efficient storage of timeseries

2010-03-03 Thread Saptarshi Guha
But if I specify which column families I want, will it still load the entire row? Suppose row key R has col. families A,B, C, D and A itself has 100 column labels, B has 150 and C 5, D 5 I request row key R, but specify column families C and D. Will it still load A,B? Regards Saptarshi th > talle

Re: HFile backup while cluster running

2010-03-03 Thread Vaibhav Puranik
Kevin, Are you using EBS? If yes, just take a snapshot of your volumes. And create new volumes from the snapshot. Regards, Vaibhav Puranik GumGum On Wed, Mar 3, 2010 at 1:12 PM, Jonathan Gray wrote: > Kevin, > > Taking writes during the transition time will be the issue. > > If you don't take

RE: timestamps / versions revisited

2010-03-03 Thread Jonathan Gray
Yes, you could have issues if data has the same timestamp (only one of them being returned). As far as inserting things not in chronological order, there are no issues if you are doing scans and not deleting anything. If you're asking for the latest version of something with a Get, there are s

Re: Trying to understand HBase/ZooKeeper Logs

2010-03-03 Thread Chad Metcalf
The src rpms we package are always available. In this case it's in the same repo. http://archive.cloudera.com/redhat/cdh/contrib/hbase-0.20-0.20.3-1.cloudera.src.rpm Cheers Chad On Wed, Mar 3, 2010 at 3:19 PM, A

Re: hbase shell count crashes

2010-03-03 Thread Bluemetrix Development
Thanks. I'll take a look at that in depth as soon as I have a chance. Seriously tho, brilliant work and thanks to all involved - it's progressed such a great deal even in the last 9 months I've been following / using the product. Really enjoying it. On Wed, Mar 3, 2010 at 5:58 PM, Jean-Daniel Crya

timestamps / versions revisited

2010-03-03 Thread Bluemetrix Development
Hi, I've been having a few issues with certain versions of data "missing" and/or less data than expected in my tables. I've read over quite a few old threads and I think I understand what is actually happening, but just wanted to possibly confirm my thinking. I hope I am not rehashing too many old

Re: Trying to understand HBase/ZooKeeper Logs

2010-03-03 Thread Andrew Purtell
I built the HBase RPMs for Cloudera. Just for future reference if someone needs patched versions of those RPMs, it's easy enough for me to spin them for you. Just drop me a note. And/or you may want to send a note to Cloudera explaining your needs. I put together a version of Cloudera-ized HBas

Re: hbase shell count crashes

2010-03-03 Thread Jean-Daniel Cryans
Mmm then you might be hitting http://issues.apache.org/jira/browse/HBASE-2244 As you can see we are working hard to stabilize HBase as much as possible ;) J-D On Wed, Mar 3, 2010 at 2:56 PM, Bluemetrix Development wrote: > Yes, upgrading to 0.20.3 should be added to my list above. I have > sinc

Re: hbase shell count crashes

2010-03-03 Thread Bluemetrix Development
Yes, upgrading to 0.20.3 should be added to my list above. I have since done this. Thanks very much for that. On Wed, Mar 3, 2010 at 4:44 PM, Jean-Daniel Cryans wrote: > There were a lot of problems with Hadoop pre 0.20.2 for clusters > smaller than 10, especially 3 when having node failure. If y

Neverending DataNode.clienttrace logging after starting Hbase

2010-03-03 Thread Rod Cope
We have a problem on our Hadoop cluster where a random subset of our datanode logs are filling up with GB's of clienttrace messages after starting HBase. The logs are fine, fsck shows a perfect report, and all is well until HBase is started. Without running any MR jobs or using HBase/HDFS, we've

RE: Timestamp of specific row and column

2010-03-03 Thread Jonathan Gray
Just to reiterate and confirm what Erik is saying, building the Map will internally iterate all of the KeyValues, dissect each one, and do lots of insertions in the map. It will be less efficient than just iterating the list of KVs directly yourself and pulling out only what you need from each. J
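Jonathan's point can be modeled in a few lines of Python (a simplified stand-in for the Java Result/KeyValue API, not HBase code): building the nested map allocates and inserts for every KeyValue, while a direct scan of the list stops at the first match and allocates nothing.

```python
from collections import namedtuple

# Minimal stand-in for HBase's KeyValue; field names are illustrative.
KeyValue = namedtuple("KeyValue", ["family", "qualifier", "timestamp", "value"])

kvs = [
    KeyValue("A", "q1", 1003, b"x"),
    KeyValue("A", "q2", 1001, b"y"),
    KeyValue("C", "q1", 1002, b"z"),
]

def build_map(kvs):
    """Model of getMap(): family -> qualifier -> timestamp -> value.
    Touches every KeyValue and does an insertion for each one."""
    out = {}
    for kv in kvs:
        out.setdefault(kv.family, {}).setdefault(kv.qualifier, {})[kv.timestamp] = kv.value
    return out

def timestamp_of(kvs, family, qualifier):
    """Direct list scan: return the first matching column's timestamp."""
    for kv in kvs:
        if kv.family == family and kv.qualifier == qualifier:
            return kv.timestamp
    return None

print(timestamp_of(kvs, "C", "q1"))  # 1002
```

If you only need one or two timestamps, the scan does strictly less work; the map only pays off if you will do many lookups against the same Result.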

Re: Timestamp of specific row and column

2010-03-03 Thread Slava Gorelik
Hi Erik, Wow, if it's 10 times faster, it's worth the lost elegance. As for further usage of the data, I'm going to access columns by name (random access) and I think creating the map will help me speed up the access later, so I think in my case the map is good enough. But I will also consider the a

Re: Timestamp of specific row and column

2010-03-03 Thread Erik Holstad
Hey Slava! I actually think that iterating the list is going to be faster since you don't have to create the map first, but that kinda depends on how you are planning to use the data afterwards, but please let me know if you get a different result. The reason that there isn't a more "elegant" way o

Re: Timestamp of specific row and column

2010-03-03 Thread Slava Gorelik
Hi Erik. No, you don't miss :-) In my situation I already have a Result with all columns for the specific family and I need to get the timestamp. Iterating the list, I think, could take more time than a map lookup. So I think I'll get a map using the getMap() method and then I'll find a specific col

Re: hbase shell count crashes

2010-03-03 Thread Jean-Daniel Cryans
There were a lot of problems with Hadoop pre 0.20.2 for clusters smaller than 10 nodes, especially 3-node clusters when a node fails. If you are talking about just region servers, you are using 0.20.2 and 0.20.3 has stability fixes. J-D On Wed, Mar 3, 2010 at 12:41 PM, Bluemetrix Development wrote: > For com

RE: HFile backup while cluster running

2010-03-03 Thread Jonathan Gray
Kevin, Taking writes during the transition time will be the issue. If you don't take any writes, then you can flush all your tables and do an HDFS copy the same way. HBase doesn't actually have to be shutdown, that's just recommended to prevent things from changing mid-backup. If you're careful

Re: hbase shell count crashes

2010-03-03 Thread Bluemetrix Development
For completeness' sake, I'll update here. The issues with shell count and rowcounter crashing were fixed by upping - open files to 32K (ulimit -n) - dfs.datanode.max.xcievers to 2048 (I had overlooked this when moving to a larger cluster) As for recovering from crashes, I haven't had much luck. I'm
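For reference, the second setting lives in hdfs-site.xml on each datanode (and yes, the property name really is spelled "xcievers" in this Hadoop generation). The value 2048 is the poster's, not a universal recommendation; datanodes must be restarted to pick it up.

```xml
<!-- hdfs-site.xml on every datanode -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2048</value>
</property>
```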

Re: NotServingRegionException

2010-03-03 Thread Ted Yu
I use org.apache.hadoop.hbase.mapreduce.Import to import which is launched on the same VM. On Wed, Mar 3, 2010 at 11:37 AM, Jean-Daniel Cryans wrote: > Yes that's one thing, also make sure your client has connectivity... > doesn't seem so. > > J-D > > On Wed, Mar 3, 2010 at 11:32 AM, Ted Yu wrot

Re: HFile backup while cluster running

2010-03-03 Thread Ted Yu
If you disable writing, you can use org.apache.hadoop.hbase.mapreduce.Export to export all your data, copy them to your new HDFS, then use org.apache.hadoop.hbase.mapreduce.Import, finally switch your clients to the new HBase cluster. On Wed, Mar 3, 2010 at 11:27 AM, Kevin Peterson wrote: > My cu
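Ted's procedure can be sketched as shell commands. The jar name, table name, paths, and namenode addresses are all illustrative assumptions for an 0.20-era install; the "export" and "import" program names are dispatched by the hbase jar's MapReduce driver to the classes Ted names.

```shell
# 1) With writes disabled, export the table to a directory on the old HDFS.
bin/hadoop jar hbase-0.20.3.jar export mytable /backup/mytable

# 2) Copy the exported files to the new cluster's HDFS.
bin/hadoop distcp hdfs://old-nn:9000/backup/mytable \
                  hdfs://new-nn:9000/backup/mytable

# 3) On the new cluster, create the table with the same schema, then import.
bin/hadoop jar hbase-0.20.3.jar import mytable /backup/mytable
```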

Re: NotServingRegionException

2010-03-03 Thread Jean-Daniel Cryans
Yes that's one thing, also make sure your client has connectivity... doesn't seem so. J-D On Wed, Mar 3, 2010 at 11:32 AM, Ted Yu wrote: > But querying zookeeper shows: >  lsr /hbase > > hbase >    safe-mode >    rs >   1267640372165 >    root-region-server >    master >    shutdown > > > On

HFile backup while cluster running

2010-03-03 Thread Kevin Peterson
My current setup in EC2 is a Hadoop MapReduce cluster and an HBase cluster sharing the same HDFS. That is, I have a batch of nodes that run datanode and tasktracker and a bunch of nodes that run datanode and regionserver. I'm trying to move HBase off this cluster to a new cluster with its own HDFS.

RE: Trying to understand HBase/ZooKeeper Logs

2010-03-03 Thread Michael Segel
Thanks. Cloudera's release doesn't ship with the source code, but luckily we had the source when we wanted to test the 0.20.3 code. Thanks again! > Date: Wed, 3 Mar 2010 10:54:26 -0800 > Subject: Re: Trying to understand HBase/ZooKeeper Logs > From: jdcry...@apache.org > To: hbase-user@hadoop.apac

Re: NotServingRegionException

2010-03-03 Thread Jean-Daniel Cryans
Looks like a connectivity issue, it says: > 10/03/03 10:45:55 WARN zookeeper.ZooKeeperWrapper: Failed to create /hbase > -- check quorum servers, currently=tyu-linux:2181 Do what it says to do ;) Also make sure that that client can reach that address. In my experience using a VM can be troublesom

Re: NotServingRegionException

2010-03-03 Thread Ted Yu
Hi, J-D: I restarted hbase and am not seeing NotServingRegionException now. I tried to import a table that I exported from hbase 0.20.1 into this 0.20.3 instance. After some time I got: 10/03/03 10:45:55 WARN zookeeper.ZooKeeperWrapper: Failed to create /hbase -- check quorum servers, currently=tyu

Re: Trying to understand HBase/ZooKeeper Logs

2010-03-03 Thread Jean-Daniel Cryans
So get the patch in your hbase root, on linux do: wget https://issues.apache.org/jira/secure/attachment/12436659/HBASE-2174_0.20.3.patch then run: patch -p0 < HBASE-2174_0.20.3.patch finally compile: ant tar The new tar will be in build/ J-D On Wed, Mar 3, 2010 at 10:52 AM, Michael Segel wrot

RE: Trying to understand HBase/ZooKeeper Logs

2010-03-03 Thread Michael Segel
Hey! Thanks for the responses. It looks like the patch I was pointed to may solve the issue. We've had some network latency issues. Again the 50ms was something I found quickly in the logs and if I had a failure after turning on all of the debugging, I think I could have drilled down to the is

Re: Trying to understand HBase/ZooKeeper Logs

2010-03-03 Thread Patrick Hunt
Also check the ZK server logs and see if you notice any session expirations (esp during this timeframe). "grep -i expir " Patrick Jean-Daniel Cryans wrote: Michael, Grep your master log for "Received report from unknown server" and if you do find it, it means that you have DNS flapping. This

Re: NotServingRegionException

2010-03-03 Thread Jean-Daniel Cryans
Ted, With such a small snippet it's hard to tell ;) Looks like that region server wasn't assigned with .META. but -ROOT- contains that address for that region. Look at the logs for when 1) the master assigns the region and 2) when the region server opens the region. In between I expect you should

RE: Trying to understand HBase/ZooKeeper Logs

2010-03-03 Thread Jonathan Gray
What version of HBase are you running? There were some recent fixes related to DNS issues causing regionservers to check in to the master under a different name. Anything strange about the network or DNS setup of your cluster? ZooKeeper is sensitive to pauses and network latency, as would any fault

Re: Trying to understand HBase/ZooKeeper Logs

2010-03-03 Thread Jean-Daniel Cryans
Michael, Grep your master log for "Received report from unknown server" and if you do find it, it means that you have DNS flapping. This may explain why you see a "new instance" which in this case would be the master registering the region server a second or third time. This patch in this jira fix

Re: fail to startup regionserver

2010-03-03 Thread Jean-Daniel Cryans
WRT the connection refused, I see this: 2010-03-02 19:18:06,565 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server cactus208/127.0.0.1: Which means that that machine is cactus208 since it resolves to 127.0.0.1 and either that Zookeeper server is full of client connections f

Re: NotServingRegionException

2010-03-03 Thread Ted Yu
Previous attempt wasn't delivered. On Wed, Mar 3, 2010 at 9:30 AM, Ted Yu wrote: > Hi, > I started hbase 0.20.3 successfully on my Linux VM. Master and regionserver > are on the same VM. > There're two empty tables. > > Soon I saw the following in regionserver.log: > 2010-03-03 09:18:31,643 INFO

Re: Timestamp of specific row and column

2010-03-03 Thread Erik Holstad
Hey Slava! Do you want to get the timestamp for a specific column after you have gotten multiple columns already? I think the fastest way to do that is to iterate the List that you have and look for the specific column. You can also create one of the different maps, which will take longer. Or am I

RE: Trying to understand HBase/ZooKeeper Logs

2010-03-03 Thread Michael Segel
> Date: Wed, 3 Mar 2010 09:17:06 -0800 > From: ph...@apache.org > To: hbase-user@hadoop.apache.org > Subject: Re: Trying to understand HBase/ZooKeeper Logs [SNIP] > There are a few issues involved with the ping time: > > 1) the network (obv :-) ) > 2) the zk server - if the server is highly loa

Re: Trying to understand HBase/ZooKeeper Logs

2010-03-03 Thread Patrick Hunt
Michael Segel wrote: Hi, I'm trying to debug an issue where I am getting 'partial' failures. For some reason the region servers seem to end up with multiple 'live' servers on a node. (We start with 3 servers and the next morning we see 4,5 or 6 servers where a server has multiple servers 'li

Timestamp of specific row and column

2010-03-03 Thread Slava Gorelik
Hi. I'm trying to find a way to get the timestamp for a specific row and column (cell), but the Cell class is deprecated, as is RowResult. The only way I found is to get a list of KeyValue from the Result class and find the particular column. I'm wondering, is there a more elegant way to

Trying to understand HBase/ZooKeeper Logs

2010-03-03 Thread Michael Segel
Hi, I'm trying to debug an issue where I am getting 'partial' failures. For some reason the region servers seem to end up with multiple 'live' servers on a node. (We start with 3 servers, and the next morning we see 4, 5, or 6 servers, where a single server has multiple entries 'live'.) Yet if you do a

Re: fail to startup regionserver

2010-03-03 Thread Zheng Lv
Hello J-D, We updated to hbase 0.20.3, and the NPE did not appear, and the regionserver started up successfully. But when our job was running, there were some exceptions like this: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 172.16.1.208:60020 for reg