Re: Hmaster is OutOfMemory

2011-05-09 Thread Gaojinchao
My first cluster needs to save 147 TB of data. If one region holds 512 MB or 1 GB, that will be 300K or 147K regions. In the future, if we store several PB, there will be even more regions. I have a silly question: how is this done at Facebook? -Original Message- From: saint@gmail.com [mailto:saint@gmail.com] On behalf of Sta

Re: A question about client

2011-05-09 Thread Gaojinchao
HBase version: 0.90.2. I merged these patches: HBASE-3773 Set ZK max connections much higher in 0.90; HBASE-3771 All jsp pages don't clean their HBA; HBASE-3783 hbase-0.90.2.jar exists in hbase root and in 'lib/'; HBASE-3756 Can't move META or ROOT from shell; HBASE-3744 createTable

Re: Adding new disks to an Hadoop Cluster

2011-05-09 Thread lohit
Yes, you have to bounce the datanode so that it can start using the disk. Also note that you have to tell the datanode to use this disk via the dfs.data.dir config parameter in hdfs-site.xml. Same with the tasktracker: if you want the tasktracker to use this disk for its temp output, you have to tell it via mapred-sit
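A minimal sketch of the hdfs-site.xml change lohit describes; the mount points are hypothetical examples, and the value is the full comma-separated list of data directories, old and new:

```xml
<!-- hdfs-site.xml: add the new disk's directory to the existing list -->
<property>
  <name>dfs.data.dir</name>
  <value>/data/disk1/dfs/data,/data/disk2/dfs/data</value>
</property>
```

After editing, restart the datanode so it picks up the new directory.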

Adding new disks to an Hadoop Cluster

2011-05-09 Thread Pete Haidinyak
Hi all, When you add a disk to a Hadoop data node do you have to bounce the node (restart mapreduce and dfs) before Hadoop can use the new disk? Thanks -Pete

Re: Hmaster is OutOfMemory

2011-05-09 Thread Stack
2011/5/9 Gaojinchao : > Hbase version : 0.90.3RC0 > > It happened when creating table with Regions > I find master started needs so much memory when the cluster has 100K regions Do you need to have 100k regions in the cluster, Gao? Or are you just testing how we do with 100k regions? > It seems l

Re: Hmaster is OutOfMemory

2011-05-09 Thread Stack
Want to dump the heap and put it somewhere so we can pull it and take a look, Gao? St.Ack 2011/5/9 Gaojinchao : > Hbase version : 0.90.3RC0 > > It happened when creating table with Regions > I find master started needs so much memory when the cluster has 100K regions > It seems likes zkclientcnxn.

Re: Hmaster is OutOfMemory

2011-05-09 Thread Gaojinchao
HBase version: 0.90.3RC0. It happened when creating a table with many regions. I find the master needs a lot of memory at startup when the cluster has 100K regions. It seems related to the ZK ClientCnxn. It seems the master's region assignment needs improvement. top -c | grep 5834 5834 root 20 0 8875m 7.9g 11m S2 50.5 33

Background Jobs

2011-05-09 Thread sa_le
Given my system with 1.5 lakh (150,000) users and 2 lakh (200,000) content items (say documents with some attributes such as title, tags, description, etc.), I need to run a loop within a loop: I need to iterate over these 150,000 users and, for each user in the outer loop, iterate over the 2 ...

Re: Error of "Got error in response to OP_READ_BLOCK for file"

2011-05-09 Thread Jean-Daniel Cryans
Very often the "cannot open filename" happens when the region in question was reopened somewhere else and that region was compacted. As to why it was reassigned, most of the time it's because of garbage collections taking too long. The master log should have all the required evidence, and the regio

Re: Hmaster has some warn logs.

2011-05-09 Thread Jean-Daniel Cryans
Anything closer to 05:13:05 in the region server? That's an hour and a half before. J-D On Thu, May 5, 2011 at 6:03 PM, Gaojinchao wrote: > Thanks for your reply. > > I got it. But master and Region server is same machine and the cluster is > free. > > In my scenario: > 1/ before 2011-05-04 03:

Re: any performance results of transferring tera bytes from db to hbase?

2011-05-09 Thread Jean-Daniel Cryans
That could be easily done with bulk loads: http://hbase.apache.org/bulk-loads.html. It will really depend on how fast you can get the data out of Sybase, providing that you have the appropriate hardware for HBase. More than a year ago we loaded a bit more than 1TB (pre-replication) into 20 machines

RE: just trying to get into HBase from java

2011-05-09 Thread Doug Meil
Read this section in the HBase book... http://hbase.apache.org/book.html#d427e2108 "Java client configuration" -Original Message- From: James McGlothlin [mailto:mcglothli...@gmail.com] Sent: Monday, May 09, 2011 4:57 PM To: hbase-u...@hadoop.apache.org Subject: just trying to get into

Re: just trying to get into HBase from java

2011-05-09 Thread Ted Yu
The most important conf parameter for a client to connect to an HBase cluster is hbase.zookeeper.quorum On Mon, May 9, 2011 at 1:57 PM, James McGlothlin wrote: > > I am able to utilize HBase from the shell with no problems. > > However, I have been unable to access it from Java code. I may very well
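A sketch of how that parameter is typically supplied: place an hbase-site.xml on the client's classpath (or set the same key programmatically on the Configuration). The hostnames below are hypothetical examples:

```xml
<!-- hbase-site.xml on the client classpath; hosts are illustrative -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
</property>
```

Equivalently, after `HBaseConfiguration.create()`, one can call `conf.set("hbase.zookeeper.quorum", "zk1.example.com")` before constructing the HTable.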

just trying to get into HBase from java

2011-05-09 Thread James McGlothlin
I am able to utilize HBase from the shell with no problems. However, I have been unable to access it from Java code. I may very well be making some simple error, but I am at a loss, as all the examples I find do it the same way. I have tried: Configuration conf = HBaseConfiguration.create(); HBa

Re: Performance test results

2011-05-09 Thread Eran Kutner
I tried flushing the table, not a specific region. -eran On Mon, May 9, 2011 at 20:03, Stack wrote: > On Mon, May 9, 2011 at 9:31 AM, Eran Kutner wrote: > > OK, I tried it, truncated the table and ran inserts for about a day. Now > I > > tried flushing the table but I get a "Region is not on

Re: VMWare and Hadoop/Hbase

2011-05-09 Thread Andrew Purtell
It is not advisable to do this. Hadoop/HBase is very I/O intensive. They should have dedicated hardware. Why add the overhead of Hypervisor mediation on the I/O path then? --- On Mon, 5/9/11, Vishal Kapoor wrote: > From: Vishal Kapoor > Subject: VMWare and Hadoop/Hbase > To: user@hbase.apache

RE: VMWare and Hadoop/Hbase

2011-05-09 Thread Doug Meil
For a dev cluster (i.e., something where you aren't trying to do performance testing) it's a reasonable approach. But I wouldn't do it on a production cluster. -Original Message- From: Vishal Kapoor [mailto:vishal.kapoor...@gmail.com] Sent: Monday, May 09, 2011 3:24 PM To: user@hbase.

VMWare and Hadoop/Hbase

2011-05-09 Thread Vishal Kapoor
We were wondering if it's advisable to provision HBase/Hadoop nodes as VMware instances? Any suggestions? Thanks, Vishal

RS data-transfer metric suggestion...

2011-05-09 Thread Doug Meil
Hi everybody - I just thought I'd ping the group to see what everybody thought about this RS metric suggestion... https://issues.apache.org/jira/browse/HBASE-3869 Doug Meil Chief Software Architect, Explorys doug.m...@explorys.com

Re: Data retention in HBase

2011-05-09 Thread Ophir Cohen
PS The deletion is a matter of privacy, security, and terms-of-service, not only of storage problems... On Mon, May 9, 2011 at 8:33 PM, Ophir Cohen wrote: > Tell it to my company ;) > > It looks like a nice tool to have such an a region dropper... > I'll take a look and will come back to discuss it. >

Re: Data retention in HBase

2011-05-09 Thread Ted Dunning
If you change your key to "date - customer id - time stamp - session id" then you shouldn't lose any important data locality, but you would be able to delete things more efficiently. For one thing, any map-reduce programs that are running for deleting would be doing dense scans over a small part o
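A minimal sketch of Ted's suggested key layout, "date - customer id - time stamp - session id". The field widths, zero-padding, and separator are hypothetical illustrations, not the poster's actual schema; the point is that a leading day number makes one day's rows a dense, contiguous key range, so a deletion scan touches only a small part of the table:

```java
public class RowKeyDemo {
    // Build a row key with the day number leading, so rows from the same
    // day sort together and can be scanned (and deleted) as one dense range.
    static String rowKey(long customerId, long timestampMs, String sessionId) {
        long dayNumber = timestampMs / 86400000L; // whole days since the epoch
        // Fixed-width, zero-padded fields keep the lexicographic sort order
        // consistent with the numeric order of each component.
        return String.format("%05d-%010d-%013d-%s",
                dayNumber, customerId, timestampMs, sessionId);
    }

    public static void main(String[] args) {
        // 2011-05-09T00:00:00Z in epoch millis
        System.out.println(rowKey(42L, 1304899200000L, "abc123"));
        // → 15103-0000000042-1304899200000-abc123
    }
}
```

Dropping a day of data then becomes a scan over a single key prefix rather than a sparse sweep of the whole table.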

Re: Data retention in HBase

2011-05-09 Thread Ophir Cohen
Tell it to my company ;) It would be a nice tool to have, such a region dropper... I'll take a look and will come back to discuss it. If I go this direction I'm surely going to automate it... Ophir On Mon, May 9, 2011 at 8:29 PM, Stack wrote: > On Mon, May 9, 2011 at 10:09 AM, Ophir Cohen

Re: Data retention in HBase

2011-05-09 Thread Stack
On Mon, May 9, 2011 at 10:09 AM, Ophir Cohen wrote: > Actually the main motivation to remove old rows is that we have storage > limitations (and too much data...). > Ophir: Haven't you heard. 'Real' bigdata men and women don't delete! I think you should try the sequence outlined in the previous

any performance results of transferring tera bytes from db to hbase?

2011-05-09 Thread Hiller, Dean (Contractor)
We were recently trying to transfer 2.5 terabytes from Sybase to a NoSQL environment and I am wondering if HBase on HDFS would be much faster. Does anyone know how quickly they have been able to go from 1 terabyte in a database to 1 terabyte in HBase? We are still working on something that eve

Re: Hmaster is OutOfMemory

2011-05-09 Thread Jean-Daniel Cryans
It looks like the master entered a GC loop of death (since there are a lot of "We slept 76166ms" messages) and finally died. Was it splitting logs? Did you get a heap dump? Did you inspect it and can you tell what was using all that space? Thx, J-D 2011/5/8 Gaojinchao : > Hbase version 0.90.2: >
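To gather evidence for a GC loop like the one J-D describes, GC logging can be enabled in hbase-env.sh. The flags below are for the pre-Java-9 HotSpot JVMs of that era, and the log path is a hypothetical example:

```shell
# hbase-env.sh: append GC diagnostics to the JVM options so long pauses
# ("We slept Nms" messages) can be correlated with collection activity.
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc.log"
```

A heap dump on OOM (`-XX:+HeapDumpOnOutOfMemoryError`) pairs well with this when answering "what was using all that space?".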

Re: Data retention in HBase

2011-05-09 Thread Ophir Cohen
Thanks for the answer! A little bit more info: Our data is internal events grouped into sessions (i.e. groups of events). There are different sessions for different customers. We're talking about millions of sessions per day. The key is *customer id - time stamp - session id. * So, yes, it is sorted by custom

Re: A question about client

2011-05-09 Thread Jean-Daniel Cryans
TreeMap isn't concurrent and it seems it was used that way? I know you guys are testing a bunch of different things at the same time so which HBase version and which patches were you using when you got that? Thx, J-D On Mon, May 9, 2011 at 5:22 AM, Gaojinchao wrote: >    I used ycsb to put data
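A minimal sketch of the hazard J-D points at, with illustrative keys and values rather than HBase's actual region cache: TreeMap gives no thread-safety guarantees, so a sorted map mutated by several client threads needs a synchronized wrapper or a ConcurrentSkipListMap:

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class SortedCacheDemo {
    public static void main(String[] args) {
        // Unsafe under concurrent mutation: a plain TreeMap.
        // Safe alternative 1: a synchronized view of the TreeMap.
        SortedMap<String, String> synced =
                Collections.synchronizedSortedMap(new TreeMap<String, String>());

        // Safe alternative 2: a lock-free concurrent sorted map that still
        // supports range views such as headMap().
        ConcurrentSkipListMap<String, String> lockFree =
                new ConcurrentSkipListMap<String, String>();
        lockFree.put("row-a", "location1");
        lockFree.put("row-m", "location2");

        // headMap(k) yields the entries strictly before key k.
        System.out.println(lockFree.headMap("row-m").size()); // → 1
    }
}
```

Note that with `synchronizedSortedMap`, iteration over a `headMap` view still needs manual synchronization on the wrapper; the skip-list map avoids that caveat.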

Re: Performance test results

2011-05-09 Thread Stack
On Mon, May 9, 2011 at 9:31 AM, Eran Kutner wrote: > OK, I tried it, truncated the table and ran inserts for about a day. Now I > tried flushing the table but I get a "Region is not online" error, although > all the servers are up, no regions are in transition and as far as I can > tell all the re

Re: Data retention in HBase

2011-05-09 Thread Stack
What Ted says, and then some comments inline below. On Mon, May 9, 2011 at 2:59 AM, Ophir Cohen wrote: >   3. Need to perform major compaction afterwards - that will affect >   performance or even stop service (is that right???). > It will do the former. It should not do the latter. That's a prob

Re: Performance test results

2011-05-09 Thread Eran Kutner
OK, I tried it, truncated the table and ran inserts for about a day. Now I tried flushing the table but I get a "Region is not online" error, although all the servers are up, no regions are in transition and as far as I can tell all the regions seem up. I can even read rows which are supposedly in

Re: Data retention in HBase

2011-05-09 Thread Ted Dunning
Can you say a bit more about your data organization? Are you storing transactions of some kind? If so, can your key involve time? I think that putting some extract of time (day number perhaps) as a leading Are you storing profiles where the key is the user (or something) id and the data is essen

A question about client

2011-05-09 Thread Gaojinchao
I used YCSB to put data and it threw an exception. Who can give me some suggestions? HBase code: // Cut the cache so that we only get the part that could contain // regions that match our key SoftValueSortedMap matchingRegions = tableLocations.headMap(row); //

Data retention in HBase

2011-05-09 Thread Ophir Cohen
Hi All, In my company we are currently working hard on deploying our cluster with HBase. We're talking about ~20 nodes to hold pretty big data (~1TB per day). As there is a lot of data, we need a retention method, i.e. a way to remove old data. The problem is that I can't/don't want to do it using TTL caus

Re: Problem with zookeeper port while using completebulkupload

2011-05-09 Thread Lukas
Hi there, I ran into the same issue in a pseudo-distributed setting with a custom HBase config location. It seems to be the same issue as HBASE-3578 [1] and another thread here on the mailing list [2]. I quickly fixed it by calling HBaseConfiguration.addHbaseResources(this.getConf()) in the Load