My first cluster needs to store 147 TB of data. With one region holding 512 MB
or 1 GB, that comes to roughly 300K or 147K regions.
In the future, if we store several PB, there will be even more regions.
I have a silly question:
how is this handled at Facebook?
-Original Message-
From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Sta
HBase version: 0.90.2.
I merged these patches:
HBASE-3773 Set ZK max connections much higher in 0.90
HBASE-3771 All jsp pages don't clean their HBA
HBASE-3783 hbase-0.90.2.jar exists in hbase root and in 'lib/'
HBASE-3756 Can't move META or ROOT from shell
HBASE-3744 createTable
Yes, you have to bounce the datanode so that it can start using the disk. Also
note that you have to tell the datanode to use this disk via the dfs.data.dir
config parameter in hdfs-site.xml. Same with the tasktracker: if you want the
tasktracker to use this disk for its temp output, you have to tell it via
mapred-site.xml.
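For example, a minimal hdfs-site.xml entry listing both the existing and the
new data directory (the paths here are placeholders):

  <property>
    <name>dfs.data.dir</name>
    <value>/data/disk1/dfs/data,/data/disk2/dfs/data</value>
  </property>

The datanode spreads its blocks across every directory in the comma-separated list.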
Hi all,
When you add a disk to a Hadoop data node, do you have to bounce the
node (restart MapReduce and DFS) before Hadoop can use the new disk?
Thanks
-Pete
2011/5/9 Gaojinchao :
> HBase version: 0.90.3RC0
>
> It happened when creating a table with many regions.
> I found the master needs a lot of memory at startup when the cluster has 100K regions.
Do you need to have 100k regions in the cluster, Gao? Or are you just
testing how we do with 100k regions?
> It seems related to the ZooKeeper ClientCnxn.
Want to dump the heap and put it somewhere so we can pull it and take
a look, Gao?
St.Ack
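(For reference, a heap dump can be grabbed with the stock JDK jmap tool; the
pid here is the master pid from the top output in Gao's mail:)

  jmap -dump:live,format=b,file=master-heap.hprof 5834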
2011/5/9 Gaojinchao :
> HBase version: 0.90.3RC0
>
> It happened when creating a table with many regions.
> I found the master needs a lot of memory at startup when the cluster has 100K regions.
> It seems related to the ZooKeeper ClientCnxn.
HBase version: 0.90.3RC0
It happened when creating a table with many regions.
I found the master needs a lot of memory at startup when the cluster has 100K regions.
It seems related to the ZooKeeper ClientCnxn.
It also seems the master's region assignment needs improvement.
top -c | grep 5834
5834 root 20 0 8875m 7.9g 11m S 2 50.5 33
Given my system with 1.5 lakh (150,000) users and 2 lakh (200,000) content
items (say, documents with attributes like title, tags, description, etc.),
I need to run a loop within a loop: iterate over these 150,000 users, and for
each user in the outer loop iterate over the 2 ...
Very often the "cannot open filename" error happens when the region in
question was reopened somewhere else and that region was compacted. As
to why it was reassigned, most of the time it's because garbage
collections are taking too long. The master log should have all the
required evidence, and the regio
Anything closer to 05:13:05 in the region server? That's an hour and a
half before.
J-D
On Thu, May 5, 2011 at 6:03 PM, Gaojinchao wrote:
> Thanks for your reply.
>
> I got it. But the master and the region server are on the same machine, and the
> cluster is idle.
>
> In my scenario:
> 1/ before 2011-05-04 03:
That could be easily done with bulk loads
(http://hbase.apache.org/bulk-loads.html). It will really depend on how
fast you can get the data out of Sybase, provided that you have the
appropriate hardware for HBase.
More than a year ago we loaded a bit more than 1 TB (pre-replication)
into 20 machines
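(For anyone following that link: the page describes the importtsv and
completebulkload tools. A rough sketch of the flow, assuming TSV input;
the table, column, and path names are placeholders and exact flags can
vary by version:)

  # Write HFiles directly, bypassing the regionserver write path
  hadoop jar hbase-0.90.2.jar importtsv \
    -Dimporttsv.columns=HBASE_ROW_KEY,cf:col \
    -Dimporttsv.bulk.output=/tmp/bulk-out mytable /tmp/tsv-in
  # Atomically move the generated HFiles into the live table
  hadoop jar hbase-0.90.2.jar completebulkload /tmp/bulk-out mytable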
Read this section in the HBase book...
http://hbase.apache.org/book.html#d427e2108
"Java client configuration"
-Original Message-
From: James McGlothlin [mailto:mcglothli...@gmail.com]
Sent: Monday, May 09, 2011 4:57 PM
To: hbase-u...@hadoop.apache.org
Subject: just trying to get into
The most important conf parameter for a client to connect to an HBase cluster is
hbase.zookeeper.quorum
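(A minimal Java sketch of setting it programmatically; the quorum hosts and
table name are placeholders:)

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;

  Configuration conf = HBaseConfiguration.create();
  // Point the client at the cluster's ZooKeeper ensemble
  conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com");
  HTable table = new HTable(conf, "mytable");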
On Mon, May 9, 2011 at 1:57 PM, James McGlothlin wrote:
>
> I am able to utilize HBase from the shell with no problems.
>
> However, I have been unable to access it from Java code. I may very well
I am able to utilize HBase from the shell with no problems.
However, I have been unable to access it from Java code. I may very well be
making some simple error, but I am at a loss, as all the examples I find do it
the same way.
I have tried:
Configuration conf = HBaseConfiguration.create();
HBa
I tried flushing the table, not a specific region.
-eran
On Mon, May 9, 2011 at 20:03, Stack wrote:
> On Mon, May 9, 2011 at 9:31 AM, Eran Kutner wrote:
> > OK, I tried it, truncated the table and ran inserts for about a day. Now
> I
> > tried flushing the table but I get a "Region is not online"
It is not advisable to do this.
Hadoop/HBase is very I/O intensive. They should have dedicated hardware. Why
add the overhead of Hypervisor mediation on the I/O path then?
--- On Mon, 5/9/11, Vishal Kapoor wrote:
> From: Vishal Kapoor
> Subject: VMWare and Hadoop/Hbase
> To: user@hbase.apache
For a dev cluster (i.e., something where you aren't trying to do performance
testing) it's a reasonable approach. But I wouldn't do it on a production
cluster.
-Original Message-
From: Vishal Kapoor [mailto:vishal.kapoor...@gmail.com]
Sent: Monday, May 09, 2011 3:24 PM
To: user@hbase.
We were wondering whether it's advisable to provision HBase/Hadoop nodes as
VMware instances.
any suggestions?
thanks,
Vishal
Hi everybody-
I just thought I'd ping the group to see what everybody thought about
this RS metric suggestion...
https://issues.apache.org/jira/browse/HBASE-3869
Doug Meil
Chief Software Architect, Explorys
doug.m...@explorys.com
P.S.
Deletion is a matter of privacy, security, and terms of service, not only
a storage problem...
On Mon, May 9, 2011 at 8:33 PM, Ophir Cohen wrote:
> Tell it to my company ;)
>
> It looks like a nice tool to have, such a region dropper...
> I'll take a look and will come back to discuss it.
>
If you change your key to "date - customer id - time stamp - session id"
then you shouldn't lose any important data locality, but you would be able
to delete things more efficiently. For one thing, any map-reduce programs
that are running the deletes would be doing dense scans over a small
part of the table.
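(A quick sketch of building such a composite key with HBase's Bytes utility;
the field types, widths, and values are assumptions for illustration:)

  import org.apache.hadoop.hbase.util.Bytes;

  // Leading with the day keeps each day's rows contiguous, so a delete
  // job can scan exactly one day's key range.
  long customerId = 42L, timestamp = System.currentTimeMillis(), sessionId = 7L;
  byte[] rowKey = Bytes.add(
      Bytes.toBytes("20110509"),              // date bucket
      Bytes.add(Bytes.toBytes(customerId),    // customer id
                Bytes.toBytes(timestamp)),    // time stamp
      Bytes.toBytes(sessionId));              // session id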
Tell it to my company ;)
It looks like a nice tool to have, such a region dropper...
I'll take a look and will come back to discuss it.
If I go in this direction I'm certainly going to automate it...
Ophir
On Mon, May 9, 2011 at 8:29 PM, Stack wrote:
> On Mon, May 9, 2011 at 10:09 AM, Ophir Cohen
On Mon, May 9, 2011 at 10:09 AM, Ophir Cohen wrote:
> Actually the main motivation to remove old rows is that we have storage
> limitations (and too much data...).
>
Ophir: Haven't you heard? 'Real' bigdata men and women don't delete!
I think you should try the sequence outlined in the previous
We were recently trying to transfer 2.5 terabytes from Sybase to a NoSQL
environment, and I am wondering if HBase on HDFS would be much faster.
Does anyone know how quickly they have been able to go from 1 terabyte in a
database to 1 terabyte in HBase?
We are still working on something that eve
It looks like the master entered a GC loop of death (since there are a
lot of "We slept 76166ms" messages) and finally died. Was it splitting
logs? Did you get a heap dump? Did you inspect it and can you tell
what was using all that space?
Thx,
J-D
2011/5/8 Gaojinchao :
> HBase version 0.90.2:
>
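(Side note for anyone chasing similar pauses: GC logging can be turned on for
the master via HBASE_OPTS in conf/hbase-env.sh with standard JVM flags; the
log path is a placeholder:)

  export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
    -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc-master.log"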
Thanks for the answer!
A little bit more info:
Our data is internal events grouped into sessions (i.e. groups of events).
There are different sessions for different customers.
We're talking about millions of sessions per day.
The key is *customer id - time stamp - session id*.
So, yes, it's sorted by custom
TreeMap isn't concurrent and it seems it was used that way? I know you
guys are testing a bunch of different things at the same time so which
HBase version and which patches were you using when you got that?
Thx,
J-D
On Mon, May 9, 2011 at 5:22 AM, Gaojinchao wrote:
> I used ycsb to put data
On Mon, May 9, 2011 at 9:31 AM, Eran Kutner wrote:
> OK, I tried it, truncated the table and ran inserts for about a day. Now I
> tried flushing the table but I get a "Region is not online" error, although
> all the servers are up, no regions are in transition and as far as I can
> tell all the re
What Ted says and then some comments inline below.
On Mon, May 9, 2011 at 2:59 AM, Ophir Cohen wrote:
> 3. Need to perform major compaction afterwards - that will affect
> performance or even stop service (is that right???).
>
It will do the former. It should not do the latter. That's a prob
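(For reference, a major compaction can also be kicked off by hand from the
HBase shell; the table name is a placeholder:)

  hbase> major_compact 'mytable'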
OK, I tried it, truncated the table and ran inserts for about a day. Now I
tried flushing the table but I get a "Region is not online" error, although
all the servers are up, no regions are in transition and as far as I can
tell all the regions seem up. I can even read rows which are supposedly in
Can you say a bit more about your data organization?
Are you storing transactions of some kind? If so, does your key involve time?
I think that putting some extract of time (day number perhaps) as a
leading
Are you storing profiles where the key is the user (or something) id and the
data is essen
I used YCSB to put data and it threw an exception.
Can anyone give me some suggestions?
HBase code:
// Cut the cache so that we only get the part that could contain
// regions that match our key
SoftValueSortedMap<byte[], HRegionLocation> matchingRegions =
tableLocations.headMap(row);
//
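(Following J-D's note above that TreeMap isn't safe for concurrent use: a
generic Java sketch of the standard thread-safe alternative,
ConcurrentSkipListMap, which offers the same headMap-style views. The names
here are illustrative, not the actual HBase fix:)

  import java.util.concurrent.ConcurrentNavigableMap;
  import java.util.concurrent.ConcurrentSkipListMap;
  import org.apache.hadoop.hbase.HRegionLocation;
  import org.apache.hadoop.hbase.util.Bytes;

  ConcurrentNavigableMap<byte[], HRegionLocation> tableLocations =
      new ConcurrentSkipListMap<byte[], HRegionLocation>(Bytes.BYTES_COMPARATOR);
  // headMap returns a view that is safe under concurrent readers and writers
  byte[] row = Bytes.toBytes("some-row");
  ConcurrentNavigableMap<byte[], HRegionLocation> matchingRegions =
      tableLocations.headMap(row);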
Hi All,
In my company we are currently working hard on deploying our cluster with
HBase.
We're talking about ~20 nodes holding pretty big data (~1 TB per day).
As there is a lot of data, we need a retention method, i.e. a way to remove
old data.
The problem is that I can't/don't want to do it using TTL caus
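(For context on what's being ruled out: TTL-based retention is normally set
per column family, e.g. from the HBase shell; table and family names are
placeholders, and the table must be disabled before altering on 0.90:)

  hbase> disable 'mytable'
  hbase> alter 'mytable', {NAME => 'cf', TTL => '2592000'}   # seconds = 30 days
  hbase> enable 'mytable'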
Hi there,
I ran into the same issue in a pseudo-distributed setting with a custom
HBase config location.
It seems to be the same issue as in HBASE-3578 [1] and in another
thread here on the mailing list [2]. I quickly fixed it by calling
HBaseConfiguration.addHbaseResources(this.getConf()) to the