On Wed, Mar 28, 2012 at 7:36 PM, Juhani Connolly wrote:
> Since we haven't heard anything on expected throughput, we're downgrading our
> HDFS back to 0.20.2. I'd be curious to hear how other people do with 0.23
> and the throughput they're getting.
>
We don't have much experience running on 0.23.
As you said, the number of errors and drops you are seeing is very small
compared to your overall traffic, so I doubt that is a significant
contributor to the throughput problems you are seeing.
- Dave
On Wed, Mar 28, 2012 at 7:36 PM, Juhani Connolly <juhani_conno...@cyberagent.co.jp> wrote:
>
Ron,
Thanks for sharing those settings. Unfortunately they didn't help with
our read throughput, but every little bit helps.
Another suspicious thing that has come up is with the network... While
overall throughput has been verified to be able to go much higher than
the tax hbase is putting
Bing:
Your pid file location can be set up via hbase-env.sh; the default is /tmp ...
# The directory where pid files are stored. /tmp by default.
# export HBASE_PID_DIR=/var/hadoop/pids
On Wed, Mar 28, 2012 at 3:04 PM, Peter Vandenabeele wrote:
> On Wed, Mar 28, 2012 at 9:53 PM, Bing Li wrote:
>> D
On Wed, Mar 28, 2012 at 9:53 PM, Bing Li wrote:
> Dear Peter,
>
> When I just started the Ubuntu machine, there was nothing in /tmp.
>
> After starting $HADOOP/bin/start-dfs.sh and $HBase/bin/start-hbase.sh, the
> following files were under /tmp. Do you think anything is wrong? Thanks!
>
> libing@gre
The one thing I wanted to point out in this latest update was that I broke
Case Studies out into a separate chapter (from the single entry that I put in
Troubleshooting a few weeks ago).
http://hbase.apache.org/book.html#casestudies
Several people have posted links to some great research, so e
Hi folks-
The HBase RefGuide has been updated on the website.
Doug Meil
Chief Software Architect, Explorys
doug.m...@explorys.com
Dear Peter,
When I just started the Ubuntu machine, there was nothing in /tmp.
After starting $HADOOP/bin/start-dfs.sh and $HBase/bin/start-hbase.sh, the
following files were under /tmp. Do you think anything is wrong? Thanks!
libing@greatfreeweb:/tmp$ ls -alrt
total 112
drwxr-xr-x 22 root root
Hmmm... I couldn't find it either, so I looked at the history of that
file, and sure enough, a few check-ins back it had that message.
I have no idea how something like this could happen. I know I had some
merge issues when I first got the latest version and built that project, but
I've then revert
On Wed, Mar 28, 2012 at 7:27 PM, Bing Li wrote:
> Dear all,
>
> I found that some configuration information is saved in /tmp on my system, so
> when some of that information is lost, HBase cannot start normally.
>
> But in my system, I have tried to change the HDFS directory to another
> locat
Dear all,
I found that some configuration information is saved in /tmp on my system, so
when some of that information is lost, HBase cannot start normally.
But in my system, I have tried to change the HDFS directory to another
location. Why are there still some files under /tmp?
To change th
Then you should have an error in the master logs.
If not, it is worth checking that the master & the region servers speak to
the same ZK...
As it's HBase related, I'm redirecting the question to the hbase user mailing list
(hadoop common is in BCC).
On Wed, Mar 28, 2012 at 8:03 PM, Nabib El-Rahman <
nabib.e
Stack,
We're about 80% random read and 20% random write. So, that would have been the
mix that we were running.
We'll try a test with Nagle on and then Nagle off, random write only, later
this afternoon and see if the same pattern emerges.
Ron
-Original Message-
From: saint@gmail.
Thanks Stack, that's correct. It is kind of hard to describe, though I
guess it's easiest to think of it as a 2d array where the 2nd dimension is
sorted.
I think your idea would be doable, too. I'm going to try testing them both
and see how well they perform. Luckily I'm not TOO concerned about
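For what it's worth, below is a rough, untested sketch of one way such a sorted second dimension could be modeled: qualifiers within a column family are kept in byte order inside a row, so encoding the second-dimension key into the qualifier keeps it sorted on read. The family name is made up and this is only one possible layout.
import java.util.Map;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class SortedSecondDimension {
  // hypothetical column family
  static final byte[] FAMILY = Bytes.toBytes("d");

  // store value at (rowKey, secondDim); Bytes.toBytes(long) is fixed-width, so
  // non-negative second-dimension keys compare in numeric order as bytes
  public static void write(HTable table, String rowKey, long secondDim, byte[] value)
      throws java.io.IOException {
    Put put = new Put(Bytes.toBytes(rowKey));
    put.add(FAMILY, Bytes.toBytes(secondDim), value);
    table.put(put);
  }

  // read the whole row back; getFamilyMap returns qualifiers already sorted
  public static void readSorted(HTable table, String rowKey) throws java.io.IOException {
    Result result = table.get(new Get(Bytes.toBytes(rowKey)));
    Map<byte[], byte[]> columns = result.getFamilyMap(FAMILY);
    if (columns == null) {
      return; // row or family not present
    }
    for (Map.Entry<byte[], byte[]> e : columns.entrySet()) {
      System.out.println(Bytes.toLong(e.getKey()) + " -> " + Bytes.toStringBinary(e.getValue()));
    }
  }
}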
On Wed, Mar 28, 2012 at 5:41 AM, Buckley,Ron wrote:
> For us, setting these two got rid of all of the 20 and 40 ms response
> times and dropped the average response time we measured from HBase by
> more than half. Plus, we can push HBase a lot harder.
>
That had an effect on random read worklo
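In case anyone else wants to try the same experiment, here is a minimal, untested sketch. I'm assuming the "two settings" being discussed are the Nagle-related TCP_NODELAY switches; please double-check the property names against your HBase version.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class NoDelayConf {
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    // disable Nagle's algorithm on client RPC sockets
    conf.setBoolean("hbase.ipc.client.tcpnodelay", true);
    // server-side switch; this one normally belongs in hbase-site.xml
    // on the region servers rather than in client code
    conf.setBoolean("ipc.server.tcpnodelay", true);
    return conf;
  }
}
The same two properties can of course simply go into hbase-site.xml on the clients and region servers respectively.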
On Tue, Mar 27, 2012 at 2:36 PM, Bryan Beaudreault wrote:
> I imagine it isn't a great idea to create a ton of scans
> (1 for each row), which is the only way I can think to do the above with
> what we have.
>
You want to step through some set of rows in lock-step? That is, get
first N on row A,
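If the goal really is "the first N columns of every row in a range", one alternative to a Scan per row might be a single Scan over the range plus a ColumnPaginationFilter (assuming your HBase version ships that filter). Rough, untested sketch; the caching value and names are arbitrary.
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.ColumnPaginationFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class FirstNPerRow {
  public static void scanFirstN(HTable table, byte[] startRow, byte[] stopRow, int n)
      throws java.io.IOException {
    Scan scan = new Scan(startRow, stopRow);
    scan.setFilter(new ColumnPaginationFilter(n, 0)); // at most n columns per row
    scan.setCaching(100);                             // rows fetched per RPC
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result row : scanner) {
        System.out.println(Bytes.toStringBinary(row.getRow()) + ": " + row.size() + " cells");
      }
    } finally {
      scanner.close();
    }
  }
}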
On Tue, Mar 27, 2012 at 5:56 PM, Sindy wrote:
> Hadoop 1.0.1
> HBase 0.90.2
>
You mean 0.92.0?
> 2012-03-27 22:04:06,607 WARN org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 60 on 60120 caught:
> java.nio.channels.ClosedChannelException
Client timed out. Check its logs. When server w
On Wed, Mar 28, 2012 at 12:59 AM, Michel Segel wrote:
> Wouldn't that mean having the NAS attached to all of the nodes in the cluster?
>
Yes. That was the presumption.
St.Ack
Eran:
The error indicated a ZooKeeper-related issue.
Do you see a KeeperException after the ERROR log line?
I searched the 0.90 codebase but couldn't find the exact log phrase:
zhihyu$ find src/main -name '*.java' -exec grep "getting node's version in CLOSI" {} \; -print
zhihyu$ find src/main -name '*.jav
Can you look even further? Like a day?
J-D
On Wed, Mar 28, 2012 at 9:45 AM, Eran Kutner wrote:
> I don't see any prior HDFS issues in the 15 minutes before this exception.
> The logs on the datanode reported as problematic are clean as well.
> However, I now see the log is full of errors like th
I don't see any prior HDFS issues in the 15 minutes before this exception.
The logs on the datanode reported as problematic are clean as well.
However, I now see the log is full of errors like this:
2012-03-28 00:15:05,358 DEBUG
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Proce
On Wed, Mar 28, 2012 at 7:49 AM, Whitney Sorenson wrote:
> This happens reliably when we run large jobs against our cluster which
> perform many reads and writes, but it does not always happen on the
> same keys.
>
Interesting. Anything you can figure about a particular key if you
dig in some?
On Wed, Mar 28, 2012 at 1:18 AM, Roberto Alonso wrote:
> Hello,
>
> I don't think the problem is that. If I don't set the env
> variable $HBASE_CONF_DIR, everything starts correctly, so I don't think I need
> to set it. My problem is that the map reduce job is not executing in parallel.
How are you start
Any chance we can see what happened before that too? Usually you
should see a lot more HDFS spam before getting that all the datanodes
are bad.
J-D
On Wed, Mar 28, 2012 at 4:28 AM, Eran Kutner wrote:
> Hi,
>
> We have region servers sporadically stopping under load, supposedly due to
> errors writ
Thanks Stack and Harsh, I'll try both suggestions and update the list with
the results.
-eran
On Wed, Mar 28, 2012 at 17:21, Harsh J wrote:
> Eran,
>
> For 0.90.7 SNAPSHOT, set "hbase.regionserver.logroll.errors.tolerated"
> to a value greater than 0 (the default). This will help the RS survive transient HLog sync
> fa
Eran,
For 0.90.7 SNAPSHOT, set "hbase.regionserver.logroll.errors.tolerated"
to a value greater than 0 (the default). This will help the RS survive transient HLog sync
failures (with local DN) by retrying a few times before the RS decides
to shut itself down.
Also worth investigating if you had too much IO load/etc. on the
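A small, untested sketch of how one might double-check the value that actually gets picked up. The setting itself belongs in hbase-site.xml on each region server and, as far as I know, is read when the region server starts, so a restart is needed; the fallback of 0 below just mirrors the default mentioned above.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class LogrollToleranceCheck {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // prints whatever the hbase-site.xml on the classpath resolves this to
    System.out.println("hbase.regionserver.logroll.errors.tolerated = "
        + conf.getInt("hbase.regionserver.logroll.errors.tolerated", 0));
  }
}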
On Wed, Mar 28, 2012 at 8:09 AM, Eran Kutner wrote:
> Hi Jimmy,
> HBase is built from latest sources of 0.90 branch (0.90.7-SNAPSHOT), I had
> the same problem with 0.90.4
> Hadoop 0.20.2 from Cloudera CDH3u1
>
Can you upgrade to CDH3u3 Eran? I don't remember if CDH3u1 had
support for sync (The
Hi Jimmy,
HBase is built from the latest sources of the 0.90 branch (0.90.7-SNAPSHOT); I had
the same problem with 0.90.4
Hadoop 0.20.2 from Cloudera CDH3u1
This failure happens during large M/R jobs. I have 10 servers, and usually
no more than one would fail like this, sometimes none.
One thing worth mentio
I'm noticing a failure on about .0001% of Gets, wherein instead of the
actual row I request, I get the next logical row.
For example, I create a Get with this key: \x00\x00\xB8\xB210291 and
instead get back the row key: \x00\x00\xB8\xB2103 .
This happens reliably when we run large jobs against ou
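To help pin this down, here is a rough sketch of a check for a single key, building the binary key the way it is printed above (the \x00\x00\xB8\xB2 prefix plus the ASCII digits). Untested, and the table handle is assumed to already exist.
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class GetKeyCheck {
  public static void check(HTable table) throws java.io.IOException {
    byte[] key = Bytes.add(new byte[] { 0x00, 0x00, (byte) 0xB8, (byte) 0xB2 },
                           Bytes.toBytes("10291"));
    Result result = table.get(new Get(key));
    // a Get should only ever return the requested row; report anything else
    if (!result.isEmpty() && !Bytes.equals(key, result.getRow())) {
      System.out.println("asked for " + Bytes.toStringBinary(key)
          + " but got " + Bytes.toStringBinary(result.getRow()));
    }
  }
}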
Which version of HDFS and HBase are you using?
When the problem happens, can you access the HDFS, for example, from hadoop dfs?
Thanks,
Jimmy
On Wed, Mar 28, 2012 at 4:28 AM, Eran Kutner wrote:
> Hi,
>
> We have region servers sporadically stopping under load, supposedly due to
> errors writing t
Juhani,
We've been working on some similar performance testing on our 50-node
cluster running 0.92.1 and CDH3u3.
We were looking mostly at reads, but observed similar behavior. HBase
wasn't particularly busy, but we couldn't make it go faster.
Some debugging later, we found that many (sometimes
R
- Original Message -
From: Bing Li [mailto:lbl...@gmail.com]
Sent: Wednesday, March 28, 2012 01:32 AM
To: user@hbase.apache.org ; hbase-u...@hadoop.apache.org
Subject: Re: Starting Abnormally After Shutting Down For Some Time
Jean-Daniel,
I changed dfs.data.dir and dfs.name.dir to ne
Hi,
We have region servers sporadically stopping under load, supposedly due to
errors writing to HDFS. Things like:
2012-03-28 00:37:11,210 WARN org.apache.hadoop.hdfs.DFSClient: Error while
syncing
java.io.IOException: All datanodes 10.1.104.10:50010 are bad. Aborting..
It's happening with a diff
Thank you very much!
On Tue, Mar 27, 2012 at 6:54 PM, Ted Yu wrote:
> Index 20 corresponds to RS_ZK_REGION_FAILED_OPEN which was added by:
>
> HBASE-5490 Move the enum RS_ZK_REGION_FAILED_OPEN to the last of the enum
> list in 0.90 EventHandler
> (Ram)
>
> As of now, is there any
I think there is a lot of stuff in this thread and the situation has changed a
bit, so I'd like to summarize the current situation and verify a few points:
Our current environment:
- CDH 4b1: hdfs 0.23 and hbase 0.92
- separate master and namenode, 64 GB and 24 cores each, co-located with
zookeepers (third
Dear Manish,
Thank you so much for your replies!
The system tmp directory is changed to another location in my hdfs-site.xml.
When I ran $HADOOP_HOME/bin/start-all.sh, all of the services were listed,
including the job tracker and task tracker.
10211 SecondaryNameNode
10634 Jps
9992 Dat
Hello,
I don't think the problem is that. If I don't set the env
variable $HBASE_CONF_DIR, everything starts correctly, so I don't think I need
to set it. My problem is that the map reduce job is not executing in parallel.
If I ask:
Configuration config = HBaseConfiguration.create();
config.get("hbase.c
Wouldn't that mean having the NAS attached to all of the nodes in the cluster?
Sent from a remote device. Please excuse any typos...
Mike Segel
On Mar 26, 2012, at 11:07 PM, Stack wrote:
> On Mon, Mar 26, 2012 at 4:31 PM, Ted Tuttle wrote:
>> Is there a method of exporting that skips the
Bing,
As per my experience with the configuration, I can list some points, one of
which may be your solution.
- First and foremost, don't store your service metadata in the system tmp
directory, because it may get cleaned up on every start and you lose all your
job tracker and datanode information.
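Along the same lines, here is a quick, untested sketch that prints where the HBase-side directories actually resolve to with the configuration on your classpath; if any of these still point under /tmp, the overrides in hbase-site.xml are not being picked up. (dfs.name.dir and dfs.data.dir are read by the HDFS daemons from hdfs-site.xml and are not visible this way.)
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class WhereIsMyData {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // properties that default to locations under /tmp if not overridden
    String[] keys = { "hbase.rootdir", "hbase.tmp.dir",
                      "hbase.zookeeper.property.dataDir" };
    for (String key : keys) {
      System.out.println(key + " = " + conf.get(key));
    }
  }
}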