On Wed, Mar 28, 2012 at 7:36 PM, Juhani Connolly wrote:
> Since we haven't heard anything on expected throughput, we're downgrading our
> HDFS back to 0.20.2. I'd be curious to hear how other people do with 0.23
> and the throughput they're getting.
>
We don't have much experience running on 0.23.
As you said, the number of errors and drops you are seeing is very small
compared to your overall traffic, so I doubt that is a significant
contributor to the throughput problems you are seeing.
- Dave
On Wed, Mar 28, 2012 at 7:36 PM, Juhani Connolly <juhani_conno...@cyberagent.co.jp> wrote:
>
Ron,
Thanks for sharing those settings. Unfortunately they didn't help with
our read throughput, but every little bit helps.
Another suspicious thing that has come up is with the network... While
overall throughput has been verified to be able to go much higher than
the tax hbase is putting
Bing:
Your pid file location can be set up via hbase-env.sh; the default is /tmp ...
# The directory where pid files are stored. /tmp by default.
# export HBASE_PID_DIR=/var/hadoop/pids
On Wed, Mar 28, 2012 at 3:04 PM, Peter Vandenabeele wrote:
> On Wed, Mar 28, 2012 at 9:53 PM, Bing Li wrote:
>> D
On Wed, Mar 28, 2012 at 9:53 PM, Bing Li wrote:
> Dear Peter,
>
> When I just started the Ubuntu machine, there was nothing in /tmp.
>
> After starting $HADOOP/bin/start-dfs.sh and $HBase/bin/start-hbase.sh, the
> following files were under /tmp. Do you think anything is wrong? Thanks!
>
> libing@gre
The one thing I wanted to point out in this latest update was that I broke
Case Studies out into a separate chapter (from the single entry that I put in
Troubleshooting a few weeks ago).
http://hbase.apache.org/book.html#casestudies
Several people have posted links to some great research, so e
Hi folks-
The HBase RefGuide has been updated on the website.
Doug Meil
Chief Software Architect, Explorys
doug.m...@explorys.com
Dear Peter,
When I just started the Ubuntu machine, there was nothing in /tmp.
After starting $HADOOP/bin/start-dfs.sh and $HBase/bin/start-hbase.sh, the
following files were under /tmp. Do you think anything is wrong? Thanks!
libing@greatfreeweb:/tmp$ ls -alrt
total 112
drwxr-xr-x 22 root root
Hmmm... I couldn't find it either, so I looked at the history of that
file, and sure enough, a few check-ins back it had that message.
I have no idea how something like this could happen. I know I had some
merge issues when I first got the latest version and built that project, but
I've then revert
On Wed, Mar 28, 2012 at 7:27 PM, Bing Li wrote:
> Dear all,
>
> I found that some configuration information is saved in /tmp on my system, so
> when some of that information is lost, HBase cannot start normally.
>
> But in my system, I have tried to change the HDFS directory to another
> locat
Dear all,
I found that some configuration information is saved in /tmp on my system, so
when some of that information is lost, HBase cannot start normally.
But in my system, I have tried to change the HDFS directory to another
location. Why are there still some files under /tmp?
To change th
Then you should have an error in the master logs.
If not, it is worth checking that the master & the region servers speak to
the same ZK...
As it's HBase related, I'm redirecting the question to the hbase user mailing list
(hadoop common is in BCC).
On Wed, Mar 28, 2012 at 8:03 PM, Nabib El-Rahman <
nabib.e
Stack,
We're about 80% random read and 20% random write. So, that would have been the
mix that we were running.
We'll try a test with Nagle on and then Nagle off, random write only, later
this afternoon and see if the same pattern emerges.
Ron
-Original Message-
From: saint@gmail.
Thanks Stack, that's correct. It is kind of hard to describe, though I
guess it's easiest to think of it as a 2d array where the 2nd dimension is
sorted.
I think your idea would be doable, too. I'm going to try testing them both
and see how well they perform. Luckily I'm not TOO concerned about
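For what it's worth, below is a rough, untested sketch of one way such a sorted second dimension could be modeled: qualifiers within a column family are kept in byte order inside a row, so encoding the second-dimension key into the qualifier keeps it sorted on read. The family name is made up and this is only one possible layout.
import java.util.Map;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class SortedSecondDimension {
  // hypothetical column family
  static final byte[] FAMILY = Bytes.toBytes("d");

  // store value at (rowKey, secondDim); Bytes.toBytes(long) is fixed-width, so
  // non-negative second-dimension keys compare in numeric order as bytes
  public static void write(HTable table, String rowKey, long secondDim, byte[] value)
      throws java.io.IOException {
    Put put = new Put(Bytes.toBytes(rowKey));
    put.add(FAMILY, Bytes.toBytes(secondDim), value);
    table.put(put);
  }

  // read the whole row back; getFamilyMap returns qualifiers already sorted
  public static void readSorted(HTable table, String rowKey) throws java.io.IOException {
    Result result = table.get(new Get(Bytes.toBytes(rowKey)));
    Map<byte[], byte[]> columns = result.getFamilyMap(FAMILY);
    if (columns == null) {
      return; // row or family not present
    }
    for (Map.Entry<byte[], byte[]> e : columns.entrySet()) {
      System.out.println(Bytes.toLong(e.getKey()) + " -> " + Bytes.toStringBinary(e.getValue()));
    }
  }
}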
On Wed, Mar 28, 2012 at 5:41 AM, Buckley,Ron wrote:
> For us, setting these two got rid of all of the 20 and 40 ms response
> times and dropped the average response time we measured from HBase by
> more than half. Plus, we can push HBase a lot harder.
>
That had an effect on random read worklo
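In case anyone else wants to try the same experiment, here is a minimal, untested sketch. I'm assuming the "two settings" being discussed are the Nagle-related TCP_NODELAY switches; please double-check the property names against your HBase version.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class NoDelayConf {
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    // disable Nagle's algorithm on client RPC sockets
    conf.setBoolean("hbase.ipc.client.tcpnodelay", true);
    // server-side switch; this one normally belongs in hbase-site.xml
    // on the region servers rather than in client code
    conf.setBoolean("ipc.server.tcpnodelay", true);
    return conf;
  }
}
The same two properties can of course simply go into hbase-site.xml on the clients and region servers respectively.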
On Tue, Mar 27, 2012 at 2:36 PM, Bryan Beaudreault wrote:
> I imagine it isn't a great idea to create a ton of scans
> (1 for each row), which is the only way I can think to do the above with
> what we have.
>
You want to step through some set of rows in lock-step? That is, get
first N on row A,
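If the goal really is "the first N columns of every row in a range", one alternative to a Scan per row might be a single Scan over the range plus a ColumnPaginationFilter (assuming your HBase version ships that filter). Rough, untested sketch; the caching value and names are arbitrary.
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.ColumnPaginationFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class FirstNPerRow {
  public static void scanFirstN(HTable table, byte[] startRow, byte[] stopRow, int n)
      throws java.io.IOException {
    Scan scan = new Scan(startRow, stopRow);
    scan.setFilter(new ColumnPaginationFilter(n, 0)); // at most n columns per row
    scan.setCaching(100);                             // rows fetched per RPC
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result row : scanner) {
        System.out.println(Bytes.toStringBinary(row.getRow()) + ": " + row.size() + " cells");
      }
    } finally {
      scanner.close();
    }
  }
}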
On Tue, Mar 27, 2012 at 5:56 PM, Sindy wrote:
> Hadoop 1.0.1
> HBase 0.90.2
>
You mean 0.92.0?
> 2012-03-27 22:04:06,607 WARN org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 60 on 60120 caught:
> java.nio.channels.ClosedChannelException
Client timed out. Check its logs. When server w
On Wed, Mar 28, 2012 at 12:59 AM, Michel Segel wrote:
> Wouldn't that mean having the NAS attached to all of the nodes in the cluster?
>
Yes. That was the presumption.
St.Ack
Eran:
The error indicated a ZooKeeper-related issue.
Do you see a KeeperException after the ERROR log line?
I searched the 0.90 codebase but couldn't find the exact log phrase:
zhihyu$ find src/main -name '*.java' -exec grep "getting node's version in CLOSI" {} \; -print
zhihyu$ find src/main -name '*.jav
Can you look even further? Like a day?
J-D
On Wed, Mar 28, 2012 at 9:45 AM, Eran Kutner wrote:
> I don't see any prior HDFS issues in the 15 minutes before this exception.
> The logs on the datanode reported as problematic are clean as well.
> However, I now see the log is full of errors like th
I don't see any prior HDFS issues in the 15 minutes before this exception.
The logs on the datanode reported as problematic are clean as well.
However, I now see the log is full of errors like this:
2012-03-28 00:15:05,358 DEBUG
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Proce
On Wed, Mar 28, 2012 at 7:49 AM, Whitney Sorenson wrote:
> This happens reliably when we run large jobs against our cluster which
> perform many reads and writes, but it does not always happen on the
> same keys.
>
Interesting. Anything you can figure about a particular key if you
dig in some?
On Wed, Mar 28, 2012 at 1:18 AM, Roberto Alonso wrote:
> Hello,
>
> I don't think the problem is that. If I don't set the env
> variable $HBASE_CONF_DIR, everything starts correctly, so I don't think I need
> to set it. My problem is that the map reduce job is not executing in parallel.
How are you start
Any chance we can see what happened before that too? Usually you
should see a lot more HDFS spam before getting that all the datanodes
are bad.
J-D
On Wed, Mar 28, 2012 at 4:28 AM, Eran Kutner wrote:
> Hi,
>
> We have region servers sporadically stopping under load, supposedly due to
> errors writ
Thanks Stack and Harsh, I'll try both suggestions and update the list with
the results.
-eran
On Wed, Mar 28, 2012 at 17:21, Harsh J wrote:
> Eran,
>
> For 0.90.7 SNAPSHOT, set "hbase.regionserver.logroll.errors.tolerated"
> to a value greater than 0 (the default). This will help the RS survive transient HLog sync
> fa
Eran,
For 0.90.7 SNAPSHOT, set "hbase.regionserver.logroll.errors.tolerated"
to a value greater than 0 (the default). This will help the RS survive transient HLog sync
failures (with local DN) by retrying a few times before the RS decides
to shut itself down.
Also worth investigating if you had too much IO load/etc. on the
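A small, untested sketch of how one might double-check the value that actually gets picked up. The setting itself belongs in hbase-site.xml on each region server and, as far as I know, is read when the region server starts, so a restart is needed; the fallback of 0 below just mirrors the default mentioned above.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class LogrollToleranceCheck {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // prints whatever the hbase-site.xml on the classpath resolves this to
    System.out.println("hbase.regionserver.logroll.errors.tolerated = "
        + conf.getInt("hbase.regionserver.logroll.errors.tolerated", 0));
  }
}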
On Wed, Mar 28, 2012 at 8:09 AM, Eran Kutner wrote:
> Hi Jimmy,
> HBase is built from latest sources of 0.90 branch (0.90.7-SNAPSHOT), I had
> the same problem with 0.90.4
> Hadoop 0.20.2 from Cloudera CDH3u1
>
Can you upgrade to CDH3u3 Eran? I don't remember if CDH3u1 had
support for sync (The
Hi Jimmy,
HBase is built from the latest sources of the 0.90 branch (0.90.7-SNAPSHOT); I had
the same problem with 0.90.4
Hadoop 0.20.2 from Cloudera CDH3u1
This failure happens during large M/R jobs. I have 10 servers, and usually
no more than one would fail like this, sometimes none.
One thing worth mentio
I'm noticing a failure on about .0001% of Gets, wherein instead of the
actual row I request, I get the next logical row.
For example, I create a Get with this key: \x00\x00\xB8\xB210291 and
instead get back the row key: \x00\x00\xB8\xB2103 .
This happens reliably when we run large jobs against ou
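To help pin this down, here is a rough sketch of a check for a single key, building the binary key the way it is printed above (the \x00\x00\xB8\xB2 prefix plus the ASCII digits). Untested, and the table handle is assumed to already exist.
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class GetKeyCheck {
  public static void check(HTable table) throws java.io.IOException {
    byte[] key = Bytes.add(new byte[] { 0x00, 0x00, (byte) 0xB8, (byte) 0xB2 },
                           Bytes.toBytes("10291"));
    Result result = table.get(new Get(key));
    // a Get should only ever return the requested row; report anything else
    if (!result.isEmpty() && !Bytes.equals(key, result.getRow())) {
      System.out.println("asked for " + Bytes.toStringBinary(key)
          + " but got " + Bytes.toStringBinary(result.getRow()));
    }
  }
}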
Which version of HDFS and HBase are you using?
When the problem happens, can you access the HDFS, for example, from hadoop dfs?
Thanks,
Jimmy
On Wed, Mar 28, 2012 at 4:28 AM, Eran Kutner wrote:
> Hi,
>
> We have region servers sporadically stopping under load, supposedly due to
> errors writing t
Juhani,
We've been working on some similar performance testing on our 50-node
cluster running 0.92.1 and CDH3u3.
We were looking mostly at reads, but observed similar behavior. HBase
wasn't particularly busy, but we couldn't make it go faster.
Some debugging later, we found that many (sometimes
R
- Original Message -
From: Bing Li [mailto:lbl...@gmail.com]
Sent: Wednesday, March 28, 2012 01:32 AM
To: user@hbase.apache.org ; hbase-u...@hadoop.apache.org
Subject: Re: Starting Abnormally After Shutting Down For Some Time
Jean-Daniel,
I changed dfs.data.dir and dfs.name.dir to ne
Hi,
We have region servers sporadically stopping under load, supposedly due to
errors writing to HDFS. Things like:
2012-03-28 00:37:11,210 WARN org.apache.hadoop.hdfs.DFSClient: Error while
syncing
java.io.IOException: All datanodes 10.1.104.10:50010 are bad. Aborting..
It's happening with a diff
Thank you very much!
On Tue, Mar 27, 2012 at 6:54 PM, Ted Yu wrote:
> Index 20 corresponds to RS_ZK_REGION_FAILED_OPEN which was added by:
>
> HBASE-5490 Move the enum RS_ZK_REGION_FAILED_OPEN to the last of the enum
> list in 0.90 EventHandler
> (Ram)
>
> As of now, is there any
I think there is a lot of stuff in this thread and the situation has changed a
bit, so I'd like to summarize the current situation and verify a few points:
Our current environment:
- CDH 4b1: hdfs 0.23 and hbase 0.92
- separate master and namenode, 64 GB and 24 cores each, co-located with
zookeepers (third
Dear Manish,
Thank you so much for your replies!
The system tmp directory is changed to another location in my hdfs-site.xml.
When I ran $HADOOP_HOME/bin/start-all.sh, all of the services were listed,
including the job tracker and task tracker.
10211 SecondaryNameNode
10634 Jps
9992 Dat
Hello,
I don't think the problem is that. If I don't set the env
variable $HBASE_CONF_DIR, everything starts correctly, so I don't think I need
to set it. My problem is that the map reduce job is not executing in parallel.
If I ask:
Configuration config = HBaseConfiguration.create();
config.get("hbase.c
Wouldn't that mean having the NAS attached to all of the nodes in the cluster?
Sent from a remote device. Please excuse any typos...
Mike Segel
On Mar 26, 2012, at 11:07 PM, Stack wrote:
> On Mon, Mar 26, 2012 at 4:31 PM, Ted Tuttle wrote:
>> Is there a method of exporting that skips the
Bing,
As per my experience with the configuration, I can list some points, one of
which may be your solution.
- First and foremost, don't store your service metadata in the system tmp
directory, because it may get cleaned up on every start and you lose all your
job tracker and datanode information.
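Along the same lines, here is a quick, untested sketch that prints where the HBase-side directories actually resolve to with the configuration on your classpath; if any of these still point under /tmp, the overrides in hbase-site.xml are not being picked up. (dfs.name.dir and dfs.data.dir are read by the HDFS daemons from hdfs-site.xml and are not visible this way.)
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class WhereIsMyData {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // properties that default to locations under /tmp if not overridden
    String[] keys = { "hbase.rootdir", "hbase.tmp.dir",
                      "hbase.zookeeper.property.dataDir" };
    for (String key : keys) {
      System.out.println(key + " = " + conf.get(key));
    }
  }
}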