Re: Should a data node restart cause a region server to go down?

2012-02-06 Thread Andrew Purtell
I'm guessing HBASE-4222 is not in that version of CDH HBase? Best regards, - Andy. Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) - Original Message - > From: Ted Yu > To: user@hbase.apache.org > Cc: > Sent: Tuesday, February 7, 2012

Re: Hadoop Version

2012-02-06 Thread Saumitra Chowdhury
Thanks Harsh. I had no problem with hbase-0.90.2 and hadoop-0.20. Before moving to hbase-0.92.0 I am just confirming that it works perfectly. On Tue, Feb 7, 2012 at 1:06 PM, Harsh J wrote: > Saumita, > > You do not necessarily need to change Hadoop version with HBase > upgrades (Or at least not for 0.9

{kundera-discuss} Kundera 2.0.5 Released

2012-02-06 Thread Vivek Mishra
Hi All, We are happy to announce the release of Kundera 2.0.5. Kundera is a JPA 2.0 based Object-Datastore Mapping Library for NoSQL Datastores. The idea behind Kundera is to make working with NoSQL Databases drop-dead simple and fun. It currently supports Cassandra, HBase, MongoDB and relational d

Re: Hadoop Version

2012-02-06 Thread Harsh J
Saumita, You do not necessarily need to change Hadoop version with HBase upgrades (Or at least not for 0.92 yet). 0.92 will work just fine with CDH3 (Which carries 0.20-append). Are you facing any specific issues? The guidelines at http://hbase.apache.org/book.html#hadoop would still apply for 0

Re: Hadoop Version

2012-02-06 Thread Saumitra Chowdhury
Sorry for the last line. Is there anyone who is using hbase-0.92.0 with CDH3? Should I continue with that? On Tue, Feb 7, 2012 at 11:37 AM, Saumitra Chowdhury < saumi...@smartitengineering.com> wrote: > Dear all, > > We are going to setup our hbase cluster with hadoop. We were in test > fo

Re: xceiver count, regionserver shutdown

2012-02-06 Thread Jean-Daniel Cryans
On Mon, Feb 6, 2012 at 4:47 PM, Bryan Keller wrote: > I increased the max region file size to 4gb so I should have fewer than 200 > regions per node now, more like 25. With 2 column families that will be 50 > memstores per node. 5.6gb would then flush files of 112mb. Still not close to > the me

Re: xceiver count, regionserver shutdown

2012-02-06 Thread Bryan Keller
I increased the max region file size to 4gb so I should have fewer than 200 regions per node now, more like 25. With 2 column families that will be 50 memstores per node. 5.6gb would then flush files of 112mb. Still not close to the memstore limit but shouldn't I be much better off than before?

Re: xceiver count, regionserver shutdown

2012-02-06 Thread Jean-Daniel Cryans
Good but... Keep in mind that if you just increase max filesize and memstore size without changing anything else then you'll be in the same situation except with 16GB it'll take just a bit more time to get there. Here's the math: 200 regions of 2 families means 400 memstores to fill. Assuming a
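The memstore sizing arithmetic in this thread can be worked through in a short script. A sketch under the thread's own numbers: the 5.6 GB global memstore limit, the 200-region/2-family "before" case, and the ~25-region "after" case all come from the emails above; the binary/decimal rounding is why this yields ~115 MB where the thread quotes ~112 MB.

```python
# Sketch of the memstore-pressure math discussed in this thread.
# The 5.6 GB global limit and the region/family counts come from the
# emails above; everything else is plain arithmetic.

global_limit_mb = 5.6 * 1024  # global memstore limit per regionserver, in MB

def flush_size_under_pressure(regions, families, limit_mb=global_limit_mb):
    """Average flush size per memstore once the global limit forces flushes."""
    memstores = regions * families
    return limit_mb / memstores

# Before: 200 regions x 2 families = 400 memstores -> tiny flushes
before = flush_size_under_pressure(200, 2)
# After raising max filesize: ~25 regions x 2 families = 50 memstores
after = flush_size_under_pressure(25, 2)

print(round(before, 1))  # ~14.3 MB per flush
print(round(after, 1))   # ~114.7 MB per flush (the ~112 MB figure in the thread)
```

The point of the math: with enough memstores competing for the same global limit, each one flushes long before it reaches its configured flush size, producing many small files and a growing compaction queue.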

Re: xceiver count, regionserver shutdown

2012-02-06 Thread Bryan Keller
Yes, insert pattern is random, and yes, the compactions are going through the roof. Thanks for pointing me in that direction. I am going to try increasing the region max filesize to 4gb (it was set to 512mb) and the memstore flush size to 512mb (it was 128mb). I'm also going to increase the hea

Kundera 2.0.5 Released

2012-02-06 Thread Amresh Singh
Hi All, We are happy to announce release of Kundera 2.0.5. Kundera is a JPA 2.0 based, Object-Datastore Mapping Library for NoSQL Datastores. The idea behind Kundera is to make working with NoSQL Databases drop-dead simple and fun. It currently supports Cassandra, HBase, MongoDB and relational da

Re: xceiver count, regionserver shutdown

2012-02-06 Thread Jean-Daniel Cryans
Ok, this helps. We're still missing your insert pattern, but I bet it's pretty random considering what's happening to your cluster. I'm guessing you didn't set up metrics, else you would have told us that the compaction queues are through the roof during the import, but at this point I'm pr

Re: xceiver count, regionserver shutdown

2012-02-06 Thread Bryan Keller
This is happening during heavy update. I have a "wide" table with around 4 million rows that have already been inserted. I am adding billions of columns to the rows. Each row can have 20k+ columns. I perform the updates in batch, i.e. I am using the HTable.put(List<Put>) API. The batch size is 1000

Re: Should a data node restart cause a region server to go down?

2012-02-06 Thread Jeff Whiting
So I restart one of the data nodes and everything continues to work just fine even though the local one is no longer valid. Additionally, I can restart n-1 nodes without any problem and hbase continues to work. However, as soon as I restart the last data node, RSs start dying. hbck and fsck say

Re: Should a data node restart cause a region server to go down?

2012-02-06 Thread Harsh J
This is the normal behavior of the sync API (when the first DN in the pipeline fails, the whole op is failed); correct me if I am wrong. The rule here, I think, was that you do not want RSes to switch over to writing to a remote DN because the first one in the pipeline (always the local one) failed. He

Re: Should a data node restart cause a region server to go down?

2012-02-06 Thread Jeff Whiting
How would "hadoop fsck /" report that type of problem if there really were no nodes with that data? The worst I've seen is: Target Replicas is 4 but found 3 replica(s). ~Jeff On 2/6/2012 12:45 PM, Ted Yu wrote: In your case Error Recovery wasn't successful because of: All datanodes 10.49.29.92:500

Re: Should a data node restart cause a region server to go down?

2012-02-06 Thread Jeff Whiting
I've been able to reproduce this on multiple clusters. I'm basically doing a rolling restart of data nodes with 1 every 5-10+ minutes. However the region servers will just die. "hadoop fsck /" shows it is healthy, the web interface says all the data nodes are up, and region servers logs seem q

Re: xceiver count, regionserver shutdown

2012-02-06 Thread Jean-Daniel Cryans
The number of regions is the first thing to check, then it's about the actual number of blocks opened. Is the issue happening during a heavy insert? In this case I guess you could end up with hundreds of opened files if the compactions are piling up. Setting a bigger memstore flush size would defin

Re: Should a data node restart cause a region server to go down?

2012-02-06 Thread Ted Yu
In your case Error Recovery wasn't successful because of: All datanodes 10.49.29.92:50010 are bad. Aborting... On Mon, Feb 6, 2012 at 10:28 AM, Jeff Whiting wrote: > I was increasing the storage on some of my data nodes and thus had to do a > restart of the data node. I use cdh3u2 and ran "/etc

xceiver count, regionserver shutdown

2012-02-06 Thread Bryan Keller
I am trying to resolve an issue with my cluster when I am loading a bunch of data into HBase. I am reaching the "xciever" limit on the data nodes. Currently I have this set to 4096. The data node is logging "xceiverCount 4097 exceeds the limit of concurrent xcievers 4096". The regionservers even
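The limit in question is a datanode setting in hdfs-site.xml. A minimal fragment, assuming the 4096 value mentioned in this thread (the property name really is misspelled "xcievers" in Hadoop of this era, and a datanode restart is needed to pick up the change):

```xml
<!-- hdfs-site.xml on each datanode; requires a datanode restart.
     Note the property name is genuinely misspelled ("xcievers"). -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```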

Re: randomWrite tests gives random results

2012-02-06 Thread Jean-Daniel Cryans
If you didn't configure anything more than the heap, PE will by default create a table with 1 region and a low (albeit default) memstore size. This means it's spending its time waiting on splits and recompacting your data all the time, which wastes a lot of iops. You didn't tell us which vers

Re: Problem accessing /master-status

2012-02-06 Thread Jimmy Xiang
First, check how you configured the region servers, i.e. the host names of your region servers; they are in the regionservers file. Then check which host name is not properly configured in /etc/hosts or DNS. On Mon, Feb 6, 2012 at 10:30 AM, devrant wrote: > > Thanks for the res
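The check Jimmy suggests can be sketched in a few lines: verify that every hostname listed for the region servers actually resolves. The hostname list here is a stand-in; in practice you would read it from the regionservers file.

```python
# Sketch: verify each regionserver hostname resolves (via /etc/hosts or DNS).
# The list below is a placeholder; normally read it from conf/regionservers.
import socket

hostnames = ["localhost"]  # e.g. open("conf/regionservers").read().split()

def resolve(host):
    """Return the host's IP, or None if it does not resolve."""
    try:
        return socket.gethostbyname(host)
    except socket.gaierror:
        return None

for h in hostnames:
    ip = resolve(h)
    print(f"{h} -> {ip if ip else 'DOES NOT RESOLVE: check /etc/hosts or DNS'}")
```

Any hostname that prints as unresolvable is a candidate for the "hostname can't be null" symptom discussed in this thread.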

Re: Problem accessing /master-status

2012-02-06 Thread devrant
Thanks for the response, Jimmy. Do you know if this is an error on the server side (/etc/hosts, etc.) or in the config files for hbase (i.e. conf/hbase-site.xml, etc.)? Jimmy Xiang wrote: > > It may not be null actually. It is most likely because the hostname > cannot > be resolved to an IP address. > > Than

Should a data node restart cause a region server to go down?

2012-02-06 Thread Jeff Whiting
I was increasing the storage on some of my data nodes and thus had to do a restart of the data node. I use cdh3u2 and ran "/etc/init.d/hadoop-0.20-datanode restart" (I don't think this is a cdh problem). Unfortunately doing the restart caused region servers to go offline. Is this expected beha

Re: Problem accessing /master-status

2012-02-06 Thread Jimmy Xiang
It may not be null actually. It is most likely because the hostname cannot be resolved to an IP address. Thanks, Jimmy On Mon, Feb 6, 2012 at 10:10 AM, devrant wrote: > > I received this error below...does anyone know why the hostname is "null"? > > HTTP ERROR 500 > > Problem accessing /master

Problem accessing /master-status

2012-02-06 Thread devrant
I received this error below... does anyone know why the hostname is "null"? HTTP ERROR 500 Problem accessing /master-status. Reason: hostname can't be null Caused by: java.lang.IllegalArgumentException: hostname can't be null at java.net.InetSocketAddress.<init>(InetSocketAddress.java:12

Re: HBase Read Performance - Multiget vs TableInputFormat Job

2012-02-06 Thread Stack
On Mon, Feb 6, 2012 at 8:58 AM, Jon Bender wrote: > When you say it'll sort regions for you, does that mean I'll need to > identify the regions before dividing up the maps?  Or just deal with the > fact that multiple maps might read from the same regionserver? > If you do a multiget on N rows, int
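The region grouping under discussion (a multiget bucketing its row keys by the region that serves each one) can be sketched as follows. The region start keys here are hypothetical; a real client gets them from the table's region metadata.

```python
# Sketch of client-side multiget grouping: sort row keys, then bucket each
# key into the region whose start key is the greatest one <= the key.
# Region boundaries below are hypothetical.
from bisect import bisect_right
from collections import defaultdict

region_starts = ["", "g", "p"]  # sorted hypothetical region start keys

def group_by_region(row_keys):
    groups = defaultdict(list)
    for key in sorted(row_keys):
        idx = bisect_right(region_starts, key) - 1  # region whose start <= key
        groups[region_starts[idx]].append(key)
    return dict(groups)

print(group_by_region(["zebra", "apple", "mango", "horse"]))
```

Each bucket can then be fetched from (or scheduled against) a single regionserver, which is why a sorted multiget touches each region at most once.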

Re: randomWrite tests gives random results

2012-02-06 Thread Ben West
You can try turning on verbose garbage collection logs and see if the slow times correspond to a GC pause. Cloudera has a series of blog posts regarding GC pauses in HBase and how to avoid them: http://www.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffer
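Verbose GC logging is enabled through JVM flags in hbase-env.sh. A sketch using standard Java 6/7-era HotSpot options; the log path is a placeholder, and the regionserver must be restarted to pick the flags up:

```shell
# conf/hbase-env.sh -- illustrative HotSpot GC-logging flags (Java 6/7 era);
# the log path is a placeholder.
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc-hbase.log"
```

With this in place, long pauses in the GC log can be lined up against the timestamps of the slow test runs.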

Re: HBase Read Performance - Multiget vs TableInputFormat Job

2012-02-06 Thread Jon Bender
Thanks for the responses! >What percentage of total data is the 300k new rows? A constantly shrinking percentage--we may retain upwards of 5 years of data here, so running against the full table will get very expensive going forward. I think the second approach sounds best. >If you have the lis

Re: HBase Read Performance - Multiget vs TableInputFormat Job

2012-02-06 Thread Stack
On Sun, Feb 5, 2012 at 8:56 PM, Jon Bender wrote: > The two alternatives I am exploring are > >   1. Running a TableInputFormat MR job that filters for data added in the >   past day (Scan on the internal timestamp range of the cells) You'll touch all your data when you do this. What percentage

Re:HBase Read Performance - Multiget vs TableInputFormat Job

2012-02-06 Thread Gill
1. TableInputFormat splits the rows by regionLocation; with multiGet you would have to do that yourself. 2. A Get is a Scan; does multiGet mean multiScan? (not sure) 3. With Scan, you can use the batch & caching features. Gill -- Original -- From: "Jon Bender"; Date: Mon, Feb

Re: storing logs in hbase

2012-02-06 Thread Eric
It sounds to me like you are better off using Hive. HBase is suited to real-time access to specific records. If you want to do batch processing (MapReduce) on your data, as you said yourself, then Hive removes all the HBase overhead and gives you a powerful query language to search through yo

randomWrite tests gives random results

2012-02-06 Thread Assarsson, Emil
Hi, I'm trying to optimize an HBase cluster (on HDFS) with the randomWrite test. I have 7 nodes: 1 zookeeper/name/hbase-master/jobtracker and 6 region/data/tasktrackers, each with 1 disk, 16G memory, 2 x 4 cores. I know that I really should have more disks, but for the time being I'm trying to do

Re: help with schema

2012-02-06 Thread Michel Segel
Not easy to visualize... Assuming your access path to the data is based on students, then you would serialize your college data as a column in the student's table. You need to forget your relational way of thinking. You need to think not just in terms of data, but how you intend to use the d

LeaseException while extracting data via pig/hbase integration

2012-02-06 Thread Mikael Sitruk
Hi all, Recently I upgraded my cluster from HBase 0.90.1 to 0.90.4 (using Cloudera, from cdh3u0 to cdh3u2). Everything was OK till I ran the pig extract on the new cluster; on the old cluster everything worked well. Now each time I run the extract in conjunction with other work performed on the clus