Re: WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping

2009-04-11 Thread Billy Pearson
Grepping the datanode log, it looks like I get these messages when it happens: [r...@server-5 hadoop]# tail -n500 -f hadoop-root-datanode-server-5.log | grep WARN 2009-04-12 01:06:51,099 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.0.1.5:50010, storageID=DS-234949010-1

Re: WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping

2009-04-11 Thread Andrew Purtell
The "Blocks not replicated yet" is an HDFS problem. Maybe I am not understanding what you are saying? So you have not increased the number of xceivers in the datanode configs? Are there any messages of interest in the datanode logs? - Andy > From: Billy Pearson > Subject: Re: WARN org.apache
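
For anyone hitting the same wall: the xceiver ceiling Andy mentions is raised in hdfs-site.xml on each datanode. A minimal sketch, assuming the property name dfs.datanode.max.xcievers (Hadoop's historical misspelling) and a value commonly suggested for HBase loads; check your Hadoop release's defaults before copying:

    <!-- hdfs-site.xml on every datanode; restart datanodes after changing. -->
    <!-- Property name and value are assumptions based on common HBase advice. -->
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>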

Re: WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping

2009-04-11 Thread Billy Pearson
Everything is default on them except max open files; it's some really high number. The only change I know of that could be affecting it is the nice level of hbase and hadoop: hadoop nice = 5, hbase nice = 10. That way hbase runs slower than the rest when we get a load. I run other stuff on the nodes about 6
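
For context, renicing running daemons to get the priorities Billy describes looks roughly like this; the pid-file paths are hypothetical and vary by install:

    # Raise the niceness of the daemons so hbase yields CPU before hadoop.
    # Pid-file locations are assumptions; adjust to your layout.
    renice -n 10 -p $(cat /var/run/hbase/hbase-root-regionserver.pid)
    renice -n 5  -p $(cat /var/run/hadoop/hadoop-root-datanode.pid)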

Re: WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping

2009-04-11 Thread Andrew Purtell
Hi Billy, It makes sense to me that you'd see this on the HLogs first. HDFS blocks are allocated most frequently for them, except during compaction. Seems like a classic sign of DFS stress to me. What are your configuration details in terms of max open files, maximum xceiver limit, and datanode
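
On the open-files side, the usual check and fix are below; the limits.conf values are example numbers, not a recommendation for every cluster:

    # Check the limit for the user running the datanode/region server.
    ulimit -n

    # Raise it persistently in /etc/security/limits.conf (example values):
    # hadoop  soft  nofile  32768
    # hadoop  hard  nofile  32768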

WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping

2009-04-11 Thread Billy Pearson
I'm getting a bunch of WARNs: WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping. This is only happening on the hlogs on the servers while under heavy import, 30K/sec on 7 servers. I tried to bump the hlog size between rolls to 100K instead of 10K thinking that would help, but the
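
For readers trying the same tuning: log-roll behavior is set in hbase-site.xml. A sketch, assuming the entry-count property is hbase.regionserver.maxlogentries as in HBase of this era; verify the exact name in your release's hbase-default.xml before relying on it:

    <!-- hbase-site.xml; property name is an assumption for this HBase era. -->
    <property>
      <name>hbase.regionserver.maxlogentries</name>
      <value>100000</value>
    </property>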

Re: Setting "dfs.datanode.socket.write.timeout=0" in heavy write environment

2009-04-11 Thread Andrew Purtell
Thanks Yair for the observation and advice. - Andy > From: Yair Even-Zohar > Subject: Setting "dfs.datanode.socket.write.timeout=0" in heavy write > environment > To: hbase-user@hadoop.apache.org > Date: Thursday, April 2, 2009, 11:09 AM > I have seen several emails regarding setting > "dfs.
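
The setting under discussion, for reference; a value of 0 disables the datanode socket write timeout entirely, a workaround for write-side timeouts under heavy load with the trade-offs discussed in the thread:

    <!-- hdfs-site.xml (also set on the HBase client side in hbase-site.xml). -->
    <property>
      <name>dfs.datanode.socket.write.timeout</name>
      <value>0</value>
    </property>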

Re: Scan across multiple columns

2009-04-11 Thread Ryan Rawson
Unless the row is read from disk, how can one know it's not the one you want? This is true for any DB system; relational DBs can just hide the extra reads better. HBase doesn't provide any query language, so the full cost is realized and apparent. Server-side filters can help reduce network IO, but ulti
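
As an illustration of the server-side-filter point, a minimal Java sketch using the client filter API from later HBase releases (0.20+); class and method names are assumptions if you are on 0.19, and the table and column names are hypothetical:

    // Sketch: push a value test to the region servers so non-matching rows
    // are dropped before crossing the network. HBase 0.20+ style API.
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.CompareFilter;
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FilteredScan {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable("mytable");   // hypothetical table name
        Scan scan = new Scan();
        scan.setFilter(new SingleColumnValueFilter(
            Bytes.toBytes("info"), Bytes.toBytes("status"),
            CompareFilter.CompareOp.EQUAL, Bytes.toBytes("active")));
        ResultScanner scanner = table.getScanner(scan);
        int matches = 0;
        for (Result r : scanner) {
          // Rows still had to be read from disk server side; the filter
          // only saves network IO, as Ryan notes.
          matches++;
        }
        scanner.close();
        System.out.println("matching rows: " + matches);
      }
    }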

Re: shell 'table_att'

2009-04-11 Thread Billy Pearson
I had to search for the issues that made these options active; that's why I think it should be on the wiki somewhere so others will know about them, or maybe in the help command, a list of options. Billy "Jean-Daniel Cryans" wrote in message news:31a243e70904111456l69bd7aedld780741f65087...@mai

Re: shell 'table_att'

2009-04-11 Thread Jean-Daniel Cryans
Billy, These are currently the only attributes that you can set at the table level in the shell. J-D On Sat, Apr 11, 2009 at 5:37 PM, Billy Pearson wrote: > can we get someone to post all the correct options for 'table_att' in > the shell on the wiki FAQ or somewhere? > I know there is thes

shell 'table_att'

2009-04-11 Thread Billy Pearson
Can we get someone to post all the correct options for 'table_att' in the shell on the wiki FAQ or somewhere? I know there are these below, but I think there is a major compaction setting also and cannot find anywhere that all the table-level options are listed. alter 't1', {METHOD => 'table_att',
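
For the archive, the attributes known to work at the time, as a hedged sketch; MEMCACHE_FLUSHSIZE in particular is a guess at the 0.19-era spelling, so confirm against your shell's help output:

    # hbase shell; attribute names are assumptions for this era.
    alter 't1', {METHOD => 'table_att', MAX_FILESIZE => '268435456'}
    alter 't1', {METHOD => 'table_att', READONLY => 'true'}
    alter 't1', {METHOD => 'table_att', MEMCACHE_FLUSHSIZE => '67108864'}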

Re: How to get all versions of the data in HBase

2009-04-11 Thread Erik Holstad
Hi Ideal! It looks like the get call is the right call to make. Not really sure why you are not getting more than 1 in return; you should get at least 3 back, since that is the default setting for versions to keep. Not sure if you changed this setting, but when you create your HColumnDescriptor you se
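
A minimal Java sketch of the two pieces Erik describes: setting the versions to keep on the column family, and asking for more than one version on the read. Names follow the later (0.20+) client API and the table, family, and row names are hypothetical; treat all of it as an assumption on 0.19:

    // Keep up to 10 versions per cell in family "f", then read them back.
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class VersionsExample {
      public static void main(String[] args) throws Exception {
        HTableDescriptor desc = new HTableDescriptor("mytable");
        HColumnDescriptor family = new HColumnDescriptor("f");
        family.setMaxVersions(10);   // default is 3, as Erik says
        desc.addFamily(family);
        // ... create the table via HBaseAdmin, then:

        HTable table = new HTable("mytable");
        Get get = new Get(Bytes.toBytes("row1"));
        get.setMaxVersions();        // return all stored versions
        Result result = table.get(get);
        System.out.println("cells returned: " + result.size());
      }
    }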

Re: Scan across multiple columns

2009-04-11 Thread Lars George
Hi Vincent, What I did is also have a custom getSplits() implementation in the TableInputFormat. When the splits are determined, I mask out those regions that have no key of interest. Since the start and end keys are ordered as a total list, I can safely assume that if I scan the last few thousa
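
A rough shape of Lars's approach under the old mapred API; the TableSplit accessors are believed correct for this era, and the overlap predicate is a hypothetical placeholder sketched for illustration:

    // Sketch of a TableInputFormat that drops splits whose key range
    // cannot contain any key of interest.
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.mapred.TableInputFormat;
    import org.apache.hadoop.hbase.mapred.TableSplit;
    import org.apache.hadoop.mapred.InputSplit;
    import org.apache.hadoop.mapred.JobConf;

    public class PrunedTableInputFormat extends TableInputFormat {
      @Override
      public InputSplit[] getSplits(JobConf job, int numSplits)
          throws IOException {
        InputSplit[] all = super.getSplits(job, numSplits);
        List<InputSplit> kept = new ArrayList<InputSplit>();
        for (InputSplit split : all) {
          TableSplit ts = (TableSplit) split;
          // Keep only regions whose [startRow, endRow) range overlaps
          // the keys we want; everything else is masked out.
          if (overlapsKeysOfInterest(ts.getStartRow(), ts.getEndRow())) {
            kept.add(split);
          }
        }
        return kept.toArray(new InputSplit[kept.size()]);
      }

      private boolean overlapsKeysOfInterest(byte[] start, byte[] end) {
        return true; // placeholder; real logic is application specific
      }
    }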

Re: Region Servers going down frequently

2009-04-11 Thread Jean-Daniel Cryans
Ninad, I'm not sure why you posted in this thread as it does not seem related, but to answer your question: a region will only split when a family reaches 256MB, so I guess your small number of records wasn't enough. To force a split, go into the shell and type " split 'tablename' " with the name of your tab

Re: Scan across multiple columns

2009-04-11 Thread Vaibhav Puranik
I tried to solve the same problem a week ago. Here is what I learned: There are no good indexing solutions. 0.19.1 has indexing in it, but it's not very helpful if you are using column names as data. All the other current solutions involve iterating over rows. The only good way is to denormalize y

RE: Data not stored in lexicographical order exactly in HBase

2009-04-11 Thread Puri, Aseem
Thanks J-D, I got my answer. Aseem -Original Message- From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans Sent: Friday, April 10, 2009 6:54 PM To: hbase-user@hadoop.apache.org Subject: Re: Data not stored in lexicographical order exactly

Re: How to check the distributed degree of table?

2009-04-11 Thread Edward J. Yoon
Oh, thanks for the nice information. On Fri, Apr 10, 2009 at 10:31 AM, Ryan Rawson wrote: > Hey, > > In HBase, each table is split into regions. Each region is a contiguous set > of keys. Once any specific region has a file that exceeds 256 MB, it is > split in half to 2 regions. HBase master gener