Re: provide a 0.20-append tarball?

2010-12-21 Thread Bill Graham
Hi Andrew, Just to make sure I'm clear, are you saying that HBase 0.90.0 is incompatible with CDH3b3 due to the security changes? We're just getting going with HBase and have been running 0.90.0rc1 on an un-patched version of Hadoop in dev. We were planning on upgrading to CDH3b3 to get the sync

Non DFS space usage blows up.

2010-12-21 Thread rajgopalv
I'm doing a map reduce job to create the HFileOutputFormat out of CSVs. * The mapreduce job operates on 75 files, each containing 1 million rows. Total comes to 16GB. [With a replication factor of 2, the total DFS used is 32GB.] * There are 300 Map jobs. * The map job ends perfectly. * There are

hadoop jar in hbase lib?

2010-12-21 Thread Pete Haidinyak
Is there a reason there is a hadoop jar in the hbase lib directory? Couldn't hbase use the jar in the Hadoop directory? Thanks -Pete On Tue, 21 Dec 2010 21:37:28 -0800, Pete Haidinyak wrote: Good call, looks like that might have been my problem. Seems simple now. ;-) Thanks -Pete O

Re: Slow MR data load to table

2010-12-21 Thread Bradford Stephens
Unfortunately, changing swappiness didn't seem to help. On Tue, Dec 21, 2010 at 4:28 PM, Andrew Purtell wrote: >> Yes, a good point. Swappiness is set to 60 -- suppose I should set it to 0? > > Yes. > > Best regards, > >    - Andy > > Problems worthy of attack prove their worth by hitting back. >

Re: I give up, help please

2010-12-21 Thread Pete Haidinyak
Good call, looks like that might have been my problem. Seems simple now. ;-) Thanks -Pete On Tue, 21 Dec 2010 20:33:31 -0800, Claudio Martella wrote: Could you check if you're using the same hadoop jar version on hdfs and hbase? I had a similar problem once. On 12/22/10 12:32 AM, Pet

Works

2010-12-21 Thread Pete Haidinyak
Hi, Well after 21 hours I got my cluster to work. I went back to hadoop version 0.20.2 and hbase version 0.20.6 -Pete

RE: Query on getFirstMetaRegionForRegion() in RegionManager.java;Very Intersesting logic and unusual at same time.

2010-12-21 Thread Mohit
Thanks a lot, Stack. You explained it to me very well. No, I'm not seeing the issue till now.

Re: question on indexes in RDBMS vs. noSQL self created indexes...(disk space wise)

2010-12-21 Thread Hari Sreekumar
A related question JG, do null column families take space? e.g., what if I create a column family which gets filled only in like 1 in a million rows and remains empty otherwise? thanks, hari On Wed, Dec 22, 2010 at 7:14 AM, Jonathan Gray wrote: > > > 1. It's a column based sparse table so

Re: I give up, help please

2010-12-21 Thread Claudio Martella
Could you check if you're using the same hadoop jar version on hdfs and hbase? I had a similar problem once. On 12/22/10 12:32 AM, Pete Haidinyak wrote: > After 11 hours I give up. I am trying to get a 5 node system up on my > ESXi server. > OS CentOS 5.5 x64 > Hadoop Version 0.20.2+737 > HBase V
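One way to run the check suggested above is to compare the Hadoop jars bundled in the HBase lib directory against those in the Hadoop installation, byte for byte. The following is a minimal sketch only: the directory arguments and the `hadoop*.jar` glob pattern are assumptions for illustration, not paths or names taken from the thread.

```python
import hashlib
from pathlib import Path


def jar_digests(lib_dir, pattern="hadoop*.jar"):
    """Map each matching jar file name in lib_dir to its SHA-256 digest."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(Path(lib_dir).glob(pattern))
    }


def mismatched_jars(hadoop_dir, hbase_lib_dir):
    """Return names of Hadoop jars under the HBase lib directory that have
    no byte-identical counterpart in the Hadoop directory."""
    hadoop_digests = set(jar_digests(hadoop_dir).values())
    return [
        name
        for name, digest in jar_digests(hbase_lib_dir).items()
        if digest not in hadoop_digests
    ]
```

An empty result means every Hadoop jar that HBase ships with also exists, unchanged, in the Hadoop installation. A non-empty result lists the jars worth replacing or investigating.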

provide a 0.20-append tarball?

2010-12-21 Thread Andrew Purtell
The latest CDH3 beta includes security changes that currently HBase 0.90 and trunk don't incorporate. Of course we can help out with clear HBase issues, but for security exceptions or similar, what about that? Do we draw a line? Where? I've looked over the CDH3B3 installation documentation but

Re: rest memory issues

2010-12-21 Thread Andrew Purtell
If that doesn't help and you can make a heap dump available via jhat somehow, I'll take a look. Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) --- On Tue, 12/21/10, Jack Levin wrote: > From: Jack Levin > Subject: Re: res

RE: I give up, help please

2010-12-21 Thread Jonathan Gray
>> Caused by: java.io.EOFException >> at >> java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323) >> at java.io.DataInputStream.readUTF(DataInputStream.java:572) >> at >> org.apache.hadoop.hbase.util.FSUtils.getVersion(FSUtils.java:151) >> at >> org

RE: question on indexes in RDBMS vs. noSQL self created indexes...(disk space wise)

2010-12-21 Thread Jonathan Gray
> 1. It's a column based sparse table so null's take up no space(ie. > More room when we need to duplicate) Correct. Nulls take up no space. > 2. Indexes take up space in an RDBMS already and are essentially > duplication in your old RDBMS anyways Secondary indexes in an RDBMS use
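The point that nulls take up no space can be illustrated with a toy model in which only written cells are stored. This is a sketch for intuition only: the `SparseTable` class and its methods are hypothetical, and it does not reflect HBase's actual on-disk format.

```python
class SparseTable:
    """Toy sparse table: a cell exists only if it was explicitly written."""

    def __init__(self):
        # Keyed by (row key, "family:qualifier"); absent keys cost nothing.
        self.cells = {}

    def put(self, row, column, value):
        self.cells[(row, column)] = value

    def get(self, row, column):
        # A missing cell reads back as None without ever being stored.
        return self.cells.get((row, column))

    def stored_cells(self):
        return len(self.cells)


table = SparseTable()
for i in range(1000):
    table.put(f"row{i}", "main:name", f"user{i}")
# A rarely used column is written for a single row only.
table.put("row42", "rare:note", "special case")

print(table.stored_cells())            # 1001
print(table.get("row7", "rare:note"))  # None
```

In this model, the rarely filled column adds one stored cell rather than one per row, which is the property the reply relies on.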

Re: I give up, help please

2010-12-21 Thread Pete Haidinyak
Tried that, same thing. Quick question, when I do a 'stop-hbase.sh' zookeeper dies but then an HMaster process appears? Thanks -Pete On Tue, 21 Dec 2010 17:24:36 -0800, Jonathan Gray wrote: You have existing data? Try clearing out your hbase directory in hdfs. Looks like some weird pro

RE: I give up, help please

2010-12-21 Thread Jonathan Gray
You have existing data? Try clearing out your hbase directory in hdfs. Looks like some weird problem reading the hbase.version file out of HDFS. > -Original Message- > From: Pete Haidinyak [mailto:javam...@cox.net] > Sent: Tuesday, December 21, 2010 3:32 PM > To: HBase Group > Subject:
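The stack trace in this thread fails inside `DataInputStream.readUTF`, which first reads a 2-byte big-endian length and then that many bytes of string data. The sketch below mimics that framing in Python to show why an empty or truncated file produces an EOF error at the length-reading step. It is an illustration only: it decodes plain UTF-8 rather than Java's modified UTF-8, the version value `7` is a placeholder, and whether the poster's `hbase.version` file was actually empty is an assumption.

```python
import io
import struct


def read_utf(stream):
    """Read a length-prefixed string the way DataInputStream.readUTF frames it."""
    header = stream.read(2)
    if len(header) < 2:
        raise EOFError("stream ended before the 2-byte length prefix")
    (length,) = struct.unpack(">H", header)  # unsigned short, big-endian
    body = stream.read(length)
    if len(body) < length:
        raise EOFError("stream ended inside the string body")
    return body.decode("utf-8")


# A well-formed file: length prefix 0x0001 followed by the placeholder "7".
print(read_utf(io.BytesIO(b"\x00\x017")))  # 7

# An empty file fails while reading the length prefix.
try:
    read_utf(io.BytesIO(b""))
except EOFError as exc:
    print("EOF:", exc)
```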

Re: rest memory issues

2010-12-21 Thread Jack Levin
We can, sure... Will let you know if it gets better. -Jack On Tue, Dec 21, 2010 at 5:05 PM, Ryan Rawson wrote: > Try turning off the incremental mode? > On Dec 21, 2010 5:04 PM, "Jack Levin" wrote: >> bummer, gc log rolled over... our Xmx: >> >> root 12774 8.2 1.5 8833008 248832 pts/3 Sl 14:57

Re: rest memory issues

2010-12-21 Thread Ryan Rawson
Try turning off the incremental mode? On Dec 21, 2010 5:04 PM, "Jack Levin" wrote: > bummer, gc log rolled over... our Xmx: > > root 12774 8.2 1.5 8833008 248832 pts/3 Sl 14:57 10:02 > /usr/java/jdk1.6.0_12/bin/java -Xmx6000m -XX:+UseConcMarkSweepGC > -XX:+CMSIncrementalMode -XX:MaxDirectMemorySi

Re: rest memory issues

2010-12-21 Thread Jack Levin
bummer, gc log rolled over... our Xmx: root 12774 8.2 1.5 8833008 248832 pts/3 Sl 14:57 10:02 /usr/java/jdk1.6.0_12/bin/java -Xmx6000m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:MaxDirectMemorySize=6G -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/usr/lib/hbase/
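To answer questions such as "What is your -Xmx?" or "is incremental mode on?" from a `ps` line like the one above, the flags can be pulled out of the command line mechanically. A minimal sketch, assuming the flags are whitespace-separated; the function name `parse_jvm_flags` is made up for illustration, and the sample command is cut at `-verbose:gc` because the rest of the original line is truncated.

```python
import re


def parse_jvm_flags(cmdline):
    """Return (xmx, enabled, disabled) for a java command line.

    xmx is the raw -Xmx value (or None); enabled/disabled are the sets of
    boolean -XX:+Name / -XX:-Name options that were found.
    """
    xmx = None
    enabled, disabled = set(), set()
    for token in cmdline.split():
        if token.startswith("-Xmx"):
            xmx = token[len("-Xmx"):]
        match = re.fullmatch(r"-XX:([+-])(\w+)", token)
        if match:
            target = enabled if match.group(1) == "+" else disabled
            target.add(match.group(2))
    return xmx, enabled, disabled


cmd = (
    "/usr/java/jdk1.6.0_12/bin/java -Xmx6000m -XX:+UseConcMarkSweepGC "
    "-XX:+CMSIncrementalMode -XX:MaxDirectMemorySize=6G -verbose:gc"
)
xmx, enabled, disabled = parse_jvm_flags(cmd)
print(xmx)                              # 6000m
print("CMSIncrementalMode" in enabled)  # True
```

Turning incremental mode off, as suggested elsewhere in this thread, would mean dropping `-XX:+CMSIncrementalMode` from the command line, after which the second check prints `False`.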

Re: Slow MR data load to table

2010-12-21 Thread Andrew Purtell
> Yes, a good point. Swappiness is set to 60 -- suppose I should set it to 0? Yes. Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
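On Linux the current swappiness value can be read from `/proc/sys/vm/swappiness`. The sketch below reads it and reports whether it differs from the 0 suggested in this thread. The helper names are hypothetical, and actually changing the value (which needs root privileges) is outside the scope of the sketch.

```python
from pathlib import Path


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the vm.swappiness value as an int, read from a procfs-style file."""
    return int(Path(path).read_text().strip())


def needs_change(current, target=0):
    """True when the current value differs from the target suggested in the thread."""
    return current != target
```

Running `needs_change(read_swappiness())` on each node is a quick way to confirm that the setting was applied everywhere, since one node left at the default can skew the results.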

Re: I give up, help please

2010-12-21 Thread Andrew Purtell
My bad, their latest is 0.89.20100924+28, and now I see that you just transposed 87 for 89, so please disregard that. But do ask Cloudera for assistance. Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) --- On Tue, 12/21/10

Re: I give up, help please

2010-12-21 Thread Andrew Purtell
This is a mailing list for the Apache version of HBase. Since you are using the HBase of CDH3, you need to ask Cloudera for help. > HBase Version 0.87.20100924+28 The current version of HBase in CDH3 is 0.89.20100621+17. I think you should start there. Best regards, - Andy Problems wor

Re: rest memory issues

2010-12-21 Thread Andrew Purtell
What is your -Xmx ? Easiest way to scale is run more REST gateways. Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) --- On Tue, 12/21/10, Jack Levin wrote: > From: Jack Levin > Subject: rest memory issues > To: user@hbas

Re: rest memory issues

2010-12-21 Thread Jean-Daniel Cryans
Jack, the timestamps have a 2 hours difference. You sure it's the right GC log? J-D On Tue, Dec 21, 2010 at 3:00 PM, Jack Levin wrote: > here is the gc right around that time: > > 2010-12-21T14:12:05.220-0800: 6925.670: [GC [PSYoungGen: > 2336K->64K(2432K)] 11042K->8778K(14400K), 0.0012160 secs]
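The two-hour gap can be checked by parsing one timestamp from each log. This sketch uses the error timestamp from the first message of the thread (`2010-12-21 12:07:51,700`) and the first GC entry quoted above. It assumes both logs were written in the same time zone, so the `-0800` offset on the GC entry is simply dropped.

```python
from datetime import datetime


def parse_gc_timestamp(stamp):
    """Parse a -XX:+PrintGCDateStamps timestamp and drop its UTC offset."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f%z")
    return parsed.replace(tzinfo=None)


def parse_log4j_timestamp(stamp):
    """Parse a log4j-style 'YYYY-MM-DD HH:MM:SS,mmm' timestamp."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


error_time = parse_log4j_timestamp("2010-12-21 12:07:51,700")
gc_time = parse_gc_timestamp("2010-12-21T14:12:05.220-0800")
gap = gc_time - error_time
print(gap)  # 2:04:13.520000
```

A gap of just over two hours means the quoted GC entries were written well after the OutOfMemoryError, so they say little about the heap state at the time of the failure.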

question on indexes in RDBMS vs. noSQL self created indexes...(disk space wise)

2010-12-21 Thread Hiller, Dean (Contractor)
I was asked a question about a concern on having indexes and in one case having to duplicate 7 times the data if we move from RDBMS to noSQL DB. My reply that I wanted to get feedback on (please let me know if I am dead wrong or what else I may be missing) was 1. It's a column based sp

I give up, help please

2010-12-21 Thread Pete Haidinyak
After 11 hours I give up. I am trying to get a 5 node system up on my ESXi server. OS CentOS 5.5 x64 Hadoop Version 0.20.2+737 HBase Version 0.87.20100924+28 Name Node - Zoo Keeper - HBase Master on one machine 4 Region Server Hadoop starts up on every node without any problems. When I try to

Re: Slow MR data load to table

2010-12-21 Thread Bradford Stephens
Yes, a good point. Swappiness is set to 60 -- suppose I should set it to 0? On Tue, Dec 21, 2010 at 5:58 AM, Lars George wrote: > Hi Bradford, > > I heard this before recently and one of the things that bit the person > in question in the butt was swapping. Could you check that all > machines are

Re: rest memory issues

2010-12-21 Thread Jack Levin
here is the gc right around that time: 2010-12-21T14:12:05.220-0800: 6925.670: [GC [PSYoungGen: 2336K->64K(2432K)] 11042K->8778K(14400K), 0.0012160 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 2010-12-21T14:12:05.677-0800: 6926.128: [GC [PSYoungGen: 2304K->128K(2432K)] 11018K->8858K(14400K),

rest memory issues

2010-12-21 Thread Jack Levin
We are getting those errors in the logs: 12781689 is mtag2.prod.imageshack.com:60020 2010-12-21 12:07:51,700 ERROR org.apache.zookeeper.ClientCnxn: from 1741604...@qtp-1278414937-4-sendthread(mtag3:2181) java.lang.OutOfMemoryError: GC overhead limit exceeded 2010-12-21 12:07:51,701 DEBUG org.apac

RE: Jean-Daniel: RE: some data replication support in hbase?

2010-12-21 Thread Jonathan Gray
Seems like hooking into replication would be a good approach. There's also a JIRA open about a changes API. https://issues.apache.org/jira/browse/HBASE-3247 Or you could use Coprocessors which are committed in 0.92 / trunk. The pre/post hooks can be used as a per-operation trigger mechanism.

Jean-Daniel: RE: some data replication support in hbase?

2010-12-21 Thread Hiller, Dean (Contractor)
Actually yes. Very nice read. Is there an API that we could hook triggers into? I.e., this could help us in two ways: we could more easily do replication to Sybase AND we have this need where we want to just write data into clusters and have code that just runs as soon as that happens on those

Re: all regionserver shutdown after close hdfs datanode

2010-12-21 Thread Stack
2010/12/20 Zhou Shuaifeng : > Hi, > I checked the log, It's not the master caused the regionserver shutdown, but > the regionserver log rolling failed caused regionserver shutdown. > The problem block only had one replica? If you look in the hdfs emissions, it'll usually log other nodes that have

hbase triggers? / map/reduce

2010-12-21 Thread Hiller, Dean (Contractor)
I have what seems to be a unique situation, or I can't find a comparable example. Every night, we slam our data storage with new information coming in and need to use old and new information to calculate values to be used by the web apps, which is very intensive (some calculus involved as well)..

Re: some data replication support in hbase?

2010-12-21 Thread Jean-Daniel Cryans
Have you read this? http://hbase.apache.org/docs/r0.89.20100924/replication.html It's still experimental but we've been using it here since September with success (I also happen to be the one who wrote the feature). J-D On Tue, Dec 21, 2010 at 9:14 AM, Hiller, Dean (Contractor) wrote: > Are th

Re: Query on getFirstMetaRegionForRegion() in RegionManager.java;Very Intersesting logic and unusual at same time.

2010-12-21 Thread Stack
On Mon, Dec 20, 2010 at 10:32 PM, Mohit wrote: > I call it static assignment, i.e. before putting it into regionInTransistion > and before load factor actually being considered. > > Correct me if I'm wrong, > Sounds right. > This static  assignment is based on the byte comparison result, whichev

some data replication support in hbase?

2010-12-21 Thread Hiller, Dean (Contractor)
Are there any hooks in hbase to do data replication? We have to try to move our 12 hour batch jobs down to 3 hours or so and are looking at moving into a noSQL environment, but currently, customers have replicated data (only a small subset of tables because our data set size is so big). Are there

Re: Slow MR data load to table

2010-12-21 Thread Lars George
Hi Bradford, I heard this before recently and one of the things that bit the person in question in the butt was swapping. Could you check that all machines are positively healthy and not swapping etc. - just to rule out the (not so) obvious stuff. Lars On Mon, Dec 20, 2010 at 8:22 PM, Bradford S