Hi,
Have you formatted your name node before starting the daemons? If
not, that is likely the reason.
Try, on the name node:
hadoop namenode -format
There is no corresponding -format command for data nodes; if a data node
will not rejoin the reformatted namenode, clear its configured
dfs.data.dir directories instead.
Then try to start the daemons.
-elton
On Tue, Apr 5, 2011 at 4:04 PM, prasunb wrote:
>
> Hello,
>
> I am new in Hadoop
Hello,
I am new in Hadoop and I am struggling to configure it in fully
distributed mode.
I have created three virtual machines (hadoop1, hadoop2 and hadoop3) with
Fedora 12 and installed Hadoop in pseudo-distributed mode on each of them
successfully. I have followed the steps from the Cloudera site
Nothing comes to mind as to why it would "fix" it; maybe I don't
understand what you did instead.
BTW I created https://issues.apache.org/jira/browse/HBASE-3734 to
track the issue.
J-D
On Mon, Apr 4, 2011 at 9:54 PM, Hari Sreekumar wrote:
> Ah, I didn't notice it was happening for tableExists()
Ah, I didn't notice it was happening for tableExists() in this instance. But
it was always happening for flush() and majorCompact() methods earlier so I
didn't check the log when copying it. So I thought it might have something
to do with these methods. Yes, I see the "Too many connections" error in the
ZooKeeper logs.
On Mon, Apr 4, 2011 at 3:30 PM, Ted Dunning wrote:
> OpenTSDB does an interesting thing where they put a primary key in front of
> the date. This limits some of the hot-spotting on inserts. Each different
> kind of query goes to a different machine as well. The query balancing
> won't be as good as the insert balancing since some queries are much more p
Hi users,
I just want to share a useful tip when storing very fat values into
HBase: we were able to make some of our MR jobs an order of magnitude
faster by simply using Java's Deflater and then passing the byte[] to
Put (and the equivalent when retrieving the values with Inflater). We
also use LZO
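A minimal sketch of that Deflater/Inflater round trip (not the poster's
actual code; the helper class and buffer size are assumptions, and the
byte[] from deflate() is what you would hand to Put):

import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class CompressUtil {
    // Compress a fat value before handing it to Put.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress a value read back with Get.
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inflater.finished()) {
            out.write(buf, 0, inflater.inflate(buf));
        }
        inflater.end();
        return out.toByteArray();
    }
}

A Put would then carry the compressed bytes, e.g.
put.add(family, qualifier, CompressUtil.deflate(fatValue)).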
OpenTSDB does an interesting thing where they put a primary key in front of
the date. This limits some of the hot-spotting on inserts. Each different
kind of query goes to a different machine as well. The query balancing
won't be as good as the insert balancing since some queries are much more
p
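A sketch of that key layout (names are hypothetical; assumes fixed-width
metric ids so prefix scans line up):

import java.nio.ByteBuffer;

class RowKeys {
    // [metric id][timestamp] instead of [timestamp][...]: inserts for
    // different metrics land on different regions, limiting hot-spotting,
    // while each metric's data stays time-ordered for scans.
    static byte[] rowKey(byte[] metricId, long timestamp) {
        ByteBuffer key = ByteBuffer.allocate(metricId.length + 8);
        key.put(metricId);
        key.putLong(timestamp); // big-endian, so byte order == time order
        return key.array();
    }
}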
Thanks for all your help.
I will try your solutions. I also saw this link
http://static.last.fm/johan/huguk-20090414/fredrik-hypercubes-in-hbase.pdf.
I will try OpenTSDB and maybe Zohmg
Miguel
-Original Message-
From: Peter Haidinyak [mailto:phaidin...@local.com]
Sent: Monday
<property>
  <name>hbase.client.pause</name>
  <value>1000</value>
  <description>General client pause value. Used mostly as value to wait
  before running a retry of a failed get, region lookup, etc.</description>
</property>
<property>
  <name>hbase.client.retries.number</name>
  <value>10</value>
  <description>Maximum retries. Used as maximum for all retryable
  operations such as fetching of the root region from root region server,
  getting a cell's value, starting a row update, etc.</description>
</property>
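For what it's worth, the same knobs can be set on the client-side
Configuration without editing XML (a fragment; the table name is made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

Configuration conf = HBaseConfiguration.create();
conf.setInt("hbase.client.pause", 1000);         // ms to wait between retries
conf.setInt("hbase.client.retries.number", 10);  // cap on retryable operations
HTable table = new HTable(conf, "mytable");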
On Apr 4, 2011, at 10:48 AM, tsuna wrote:
> On Mon, Apr 4, 2011 at 10:40 AM, Stack wrote:
>> Want to make an issue to change it Joe? (As Ryan says, no
>> justification that I remember other than that is how it's always been).
>
> Personally I think that 3 is a good reasonable default. Maybe most
> people don't really need 3 versions, but most of
I've done almost the same thing at my work. Since I'm running on a VERY small
number of servers (2), I pre-aggregate my data into tables in the format...
[YYYY-MM-DD]|[Keyword]|[Referrer] for the row key
And then for the data column I store the hit count for that referrer. This
approach has a
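A sketch of that pre-aggregated write path (table, family, and qualifier
names are invented; day/keyword/referrer stand in for the real fields):

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

// Row key: [YYYY-MM-DD]|[Keyword]|[Referrer]; the cell is an atomic counter.
String row = day + "|" + keyword + "|" + referrer;
HTable table = new HTable(conf, "keyword_hits");
table.incrementColumnValue(Bytes.toBytes(row),
    Bytes.toBytes("d"), Bytes.toBytes("hits"), 1L);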
I'm glad you figured it out, Venkatesh. St.Ack
On Mon, Apr 4, 2011 at 10:57 AM, Venkatesh wrote:
> Sorry about this..It was indeed an environment issue..my core-site.xml was
> pointing to the wrong hadoop
> thanks for the tips
>
> -Original Message-
> From: Venkatesh
> To: use
I would approach this problem by trying to find the common
characteristics of the rows that are missing. A common pattern I've
seen is rows missing at the end of a batch (meaning some issues with
flushing the buffers). If the missing rows aren't in sequences,
meaning one missing every few other rows
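For the buffer-flushing case above, the usual client-side pattern looks
like this (a sketch; "puts" is whatever batch you are writing). With the
write buffer on, the tail of a batch only reaches the server when
flushCommits() runs:

HTable table = new HTable(conf, "mytable");
table.setAutoFlush(false);        // buffer Puts client-side
try {
    for (Put put : puts) {
        table.put(put);
    }
} finally {
    table.flushCommits();         // forgetting this drops the buffered tail
    table.close();
}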
Sorry about this..It was indeed an environment issue..my core-site.xml was
pointing to the wrong hadoop
thanks for the tips
-Original Message-
From: Venkatesh
To: user@hbase.apache.org
Sent: Fri, Apr 1, 2011 4:51 pm
Subject: Re: row_counter map reduce job & 0.90.1
Yeah.. I
On Mon, Apr 4, 2011 at 10:40 AM, Stack wrote:
> Want to make an issue to change it Joe? (As Ryan says, no
> justification that I remember other than that is how it's always been).
Personally I think that 3 is a good reasonable default. Maybe most
people don't really need 3 versions, but most of
Want to make an issue to change it Joe? (As Ryan says, no
justification that I remember other than that is how it's always been).
St.Ack
On Mon, Apr 4, 2011 at 9:31 AM, Joe Pallas wrote:
>
> On Apr 3, 2011, at 11:52 PM, Ryan Rawson wrote:
>
>> because it always has been? I think the original BT paper probably
>> had the number '3' in there somewhere...
Hi JD,
Sorry for taking a while - I was traveling. Thank you very much for looking
through these.
See answers below:
On Apr 1, 2011, at 11:19 AM, Jean-Daniel Cryans wrote:
> Thanks for taking the time to upload all those logs, I really appreciate it.
>
> So from the looks of it, only 1 reg
On Mon, Apr 4, 2011 at 12:45 AM, Ashish Shinde wrote:
> We are using hbase to power a web application. The current
> implementation of the data access classes maintains a static HTable
> instance to read and write, the reason being that getting hold of an
> HTable instance looks costly.
>
> In this scenario the HTable instances could more or less be perpetually cached.
From http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html
"Instances of HTable passed the same Configuration instance will share
connections to servers out on the cluster and to the zookeeper
ensemble as well as caches of region locations. This is usually a
*good* thing. Thi
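In code, the sharing that javadoc describes just means building one
Configuration and passing it to every HTable (sketch; table names are
made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

Configuration conf = HBaseConfiguration.create(); // create once...
HTable users  = new HTable(conf, "users");        // ...then share it: same
HTable events = new HTable(conf, "events");       // connection, ZK session,
                                                  // and region cache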
As far as I can tell the async nature of those operations has nothing
to do with what you see since it's not even able to get a session from
ZooKeeper (so it's not even talking to the region servers). If you
look at the stack trace:
org.apache.hadoop.hbase.client.HBaseAdmin.getCatalogTracker(HBase
Take a look at OpenTSDB.
I think you will be impressed with the speed.
Regarding the exponential explosion. Yes. That is a risk in theory. But
what happens in practice is that you only create the alternative forms of
the file where the simpler key forms are unacceptable due to volume of data.
For 2, HBASE-3488 is for Cell Counter.
In Vishal's case, 3 years of data is stored for a given row key. Issuing a
'get' command would not help much.
TIMERANGE support has been added in HBASE-3729
Cheers
On Sun, Apr 3, 2011 at 11:40 PM, Eric Charles wrote:
> 1.- On my side, I could imagine to use t
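For reference, the shell syntax HBASE-3729 adds should look roughly like
this (a guess at the exact spelling; check the shell help in a build that
includes the patch):
get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2]}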
Ted, thanks for your help.
I considered the last option that you mentioned, "pushing one of your
dimensions to the key".
With that I can have results for that single dimension: For example key:
Time+Site+Referrer
But what if I now want the top Keywords (where top can be any metric) of
that key?
On Apr 3, 2011, at 11:52 PM, Ryan Rawson wrote:
> because it always has been? I think the original BT paper probably
> had the number '3' in there somewhere...
>
> But yes, not too big, not too small. There probably isn't a reasonable
> setting here, I'm guessing 1 isn't quite right either.
Why
Miguel,
One option is to use the simplest design and use the key you have. Scanning
for a particular period of time will give you all the data in that time
period which you can reduce in any way that you like.
If that becomes too inefficient, a common trick is to build a secondary file
that cont
Hi,
I need some help to a schema design on HBase.
I have 5 dimensions (Time, Site, Referrer, Keyword, Country).
My row key is Site+Time.
Now I want to answer some questions like: what is the top Referrer by Keyword
for a site over a period of time?
Basically I want to cross all the dimensions
I'm trying to use REST to post data to an HBase table. I currently try
something like:
curl -v -H "Content-Type: text/xml" -T test.txt
http://localhost:8080/testtable/testrowkey
The contents of test.txt are:
test data
I'm not sure about this XML: there are no examples that I can find of how
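For anyone hitting the same wall: the 0.90-era REST gateway (Stargate)
expects a CellSet document in which the row key, column name, and cell
value are all Base64-encoded. A sketch of what test.txt could contain for
a row "testrowkey", column "cf:qualifier", and value "test data" (the
column family is an assumption):

<?xml version="1.0" encoding="UTF-8"?>
<CellSet>
  <Row key="dGVzdHJvd2tleQ==">
    <Cell column="Y2Y6cXVhbGlmaWVy">dGVzdCBkYXRh</Cell>
  </Row>
</CellSet>

The upload would then target the row (or row/column) URL, e.g.
curl -v -H "Content-Type: text/xml" -T test.txt http://localhost:8080/testtable/testrowkey/cf:qualifier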
https://issues.apache.org/jira/browse/HBASE-3729
Get cells via shell with a time range predicate
Tks,
- Eric
On 4/04/2011 16:09, Ted Yu wrote:
Please file a JIRA.
On Mon, Apr 4, 2011 at 2:50 AM, Eric Charles wrote:
Hi,
The shell allows to specify a timestamp to get a value
- get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
Please file a JIRA.
On Mon, Apr 4, 2011 at 2:50 AM, Eric Charles wrote:
> Hi,
>
> The shell allows you to specify a timestamp to get a value
> - get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
>
> If you don't give the exact timestamp, you get nothing...
>
> I didn't find a way to get a list of values
Hi Ryan,
Thanks, HTablePool fits the bill. Will start using it.
I kinda discovered the re-use of the Configuration object after ZooKeeper
"too many connections" errors, although I could not find it documented
anywhere. Had to dig into the HTable code to figure it out.
Thanks and regards,
- Ashish
On
I have this in the zookeeper logs, which might be helpful:
2011-04-04 21:26:40,356 INFO org.apache.zookeeper.server.NIOServerCnxn:
Closed socket connection for client /192.168.1.49:51467 which had
sessionid 0x12f20e3713b02ee
2011-04-04 21:26:40,357 WARN org.apache.zookeeper.server.NIOServerCnxn:
Hi,
I get this exception when I try to flush META using
HbaseAdmin.flush(".META."). I get the same exception when I do major
compact:
11/04/04 21:26:31 INFO org.apache.zookeeper.ClientCnxn: Socket connection
established to hadoopqa2/192.168.1.50:2181, initiating session
11/04/04 21:26:31 WARN org
Hi,
The shell allows you to specify a timestamp to get a value
- get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
If you don't give the exact timestamp, you get nothing...
I didn't find a way to get a list of values (different versions) via a
command such as
- get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2]}
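A side note in case it helps: if the goal is simply a list of stored
versions rather than a time range, the shell's VERSIONS option already
does that, e.g.
get 't1', 'r1', {COLUMN => 'c1', VERSIONS => 4}
which returns up to four versions of the cell.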
Hey,
HTable instances are not really thread safe at this time. You can
cache them; check out HTablePool. But the creation cost of an HTable
instance isn't that high; the actual TCP socket creation and management
is done at a lower level, and all HTable instances share these common
caches and sockets.
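A sketch of that HTablePool usage (0.90-era API; the pool size, table
name, and somePut are placeholders):

import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;

HTablePool pool = new HTablePool(conf, 10);       // up to 10 per table name
HTableInterface table = pool.getTable("mytable");
try {
    table.put(somePut);                           // use like a normal HTable
} finally {
    pool.putTable(table);                         // hand it back to the pool
}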
Hi,
We are using hbase to power a web application. The current
implementation of the data access classes maintains a static HTable
instance to read and write, the reason being that getting hold of an
HTable instance looks costly.
In this scenario the HTable instances could more or less be perpetually
cached
Sounds good to me as a "de facto" standard.
People should simply know that they will get 3 versions by default when
inserting data into HBase.
Tks,
- Eric
On 4/04/2011 08:52, Ryan Rawson wrote:
because it always has been? I think the original BT paper probably
had the number '3' in there somewhere...
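And for anyone who wants a different default: max versions is set per
column family at table creation, e.g. from the shell:
create 't1', {NAME => 'cf', VERSIONS => 1}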