Re: Getting Error in Hadoop Cluster Mode

2011-04-04 Thread elton sky
Hi, Have you formatted your name node and data nodes before starting the daemons? If not, that is the reason. Try: hadoop namenode -format then on each data node: hadoop datanode -format Then try to start the daemons. -elton On Tue, Apr 5, 2011 at 4:04 PM, prasunb wrote: > > Hello, > > I am new in Hado

Getting Error in Hadoop Cluster Mode

2011-04-04 Thread prasunb
Hello, I am new in Hadoop and I am struggling to configure it in fully distributed mode. I have created three virtual machines (hadoop1, hadoop2 and hadoop3) with Fedora 12 and installed Hadoop in pseudo-distributed mode on each of them successfully. I have followed the steps from the Cloudera site

Re: ZooKeeperConnectionException whenever table is flushed or majorcompacted

2011-04-04 Thread Jean-Daniel Cryans
Nothing comes to mind as to why it would "fix" it, maybe I don't understand what you did instead. BTW I created https://issues.apache.org/jira/browse/HBASE-3734 to track the issue. J-D On Mon, Apr 4, 2011 at 9:54 PM, Hari Sreekumar wrote: > Ah, I didn't notice it was happening for tableExists()

Re: ZooKeeperConnectionException whenever table is flushed or majorcompacted

2011-04-04 Thread Hari Sreekumar
Ah, I didn't notice it was happening for tableExists() in this instance. But it was always happening for flush() and majorCompact() methods earlier so I didn't check the log when copying it. So I thought it might have something to do with these methods. Yes, I see "Too many connections" error in th

Re: HBase design schema

2011-04-04 Thread tsuna
On Mon, Apr 4, 2011 at 3:30 PM, Ted Dunning wrote: > OpenTSDB does an interesting thing where they put a primary key in front of > the date.  This limits some of the hot-spotting on inserts.  Each different > kind of query goes to a different machine as well.  The query balancing > won't be as goo

Compressing values before inserting them

2011-04-04 Thread Jean-Daniel Cryans
Hi users, I just want to share a useful tip when storing very fat values into HBase: we were able to make some of our MR jobs an order of magnitude faster by simply using Java's Deflater and then passing the byte[] to Put (and the equivalent when retrieving the values with Inflater). We also use LZ
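J-D's tip can be sketched with plain java.util.zip, leaving the HBase Put/Get calls aside since they are unchanged; the class and method names here are illustrative, not an HBase API:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Deflate a fat value before handing the byte[] to Put, and inflate it
// again after Get. Only the java.util.zip side is shown here.
public class ValueCompression {
    public static byte[] compress(byte[] value) {
        Deflater deflater = new Deflater();
        deflater.setInput(value);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream(value.length);
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();  // this byte[] is what you would pass to Put
    }

    public static byte[] decompress(byte[] stored) {
        Inflater inflater = new Inflater();
        inflater.setInput(stored);
        ByteArrayOutputStream out = new ByteArrayOutputStream(Math.max(32, stored.length * 4));
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                out.write(buf, 0, inflater.inflate(buf));
            }
        } catch (DataFormatException e) {
            throw new RuntimeException("corrupt stored value", e);
        }
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "some very fat repetitive value value value".getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(original);
        byte[] unpacked = decompress(packed);
        System.out.println(packed.length + " compressed / " + original.length + " original bytes");
    }
}
```

The win J-D describes comes from shipping fewer bytes over the wire and through the memstore; it is largest when values are big and repetitive.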

Re: HBase design schema

2011-04-04 Thread Ted Dunning
OpenTSDB does an interesting thing where they put a primary key in front of the date. This limits some of the hot-spotting on inserts. Each different kind of query goes to a different machine as well. The query balancing won't be as good as the insert balancing since some queries are much more p
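The row-key layout Ted describes, an ID in front of the date, can be sketched as follows. The fixed widths (3-byte metric ID, 4-byte seconds timestamp) mirror OpenTSDB's documented layout, but this is an illustration, not OpenTSDB's actual code:

```java
import java.nio.ByteBuffer;

// A fixed-width metric ID in front of the timestamp spreads writes for
// different metrics across regions, instead of all inserts hammering the
// single "latest time" region of a purely time-ordered key.
public class TsdbStyleKey {
    public static byte[] rowKey(int metricId, long timestampSeconds) {
        byte[] key = new byte[3 + 4];
        key[0] = (byte) (metricId >>> 16);  // 3-byte big-endian metric ID
        key[1] = (byte) (metricId >>> 8);
        key[2] = (byte) metricId;
        ByteBuffer.wrap(key, 3, 4).putInt((int) timestampSeconds);
        return key;
    }

    public static void main(String[] args) {
        byte[] k = rowKey(42, 1301961600L);  // some Unix timestamp in seconds
        System.out.println("key length: " + k.length);
    }
}
```

Because the metric ID leads, a scan for one metric over a time range is still a single contiguous key range, which is why reads stay efficient even though inserts are spread out.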

RE: HBase design schema

2011-04-04 Thread Miguel Costa
Thanks for all your help. I will try your solutions. I also saw this link http://static.last.fm/johan/huguk-20090414/fredrik-hypercubes-in-hbase.pdf. I will try OpenTSDB and maybe Zohmg.   Miguel -Original Message- From: Peter Haidinyak [mailto:phaidin...@local.com] Sent: segund

question about RS to DN timeouts

2011-04-04 Thread Jack Levin
hbase.client.pause (default 1000): General client pause value. Used mostly as the time to wait before running a retry of a failed get, region lookup, etc. hbase.client.retries.number (default 10): Maximum retries. Used as the maximum for all retryable operations such as fetching of th

Re: Why HTableDescriptor DEFAULT_VERSIONS is 3?

2011-04-04 Thread Joe Pallas
On Apr 4, 2011, at 10:48 AM, tsuna wrote: > On Mon, Apr 4, 2011 at 10:40 AM, Stack wrote: >> Want to make an issue to change it Joe? (As Ryan says, no >> justification that I remember other than that is how its always been). > > Personally I think that 3 is a good reasonable default. Maybe mo

RE: HBase design schema

2011-04-04 Thread Peter Haidinyak
I've done almost the same thing at my work. Since I'm running on a VERY small number of servers (2), I pre-aggregate my data into tables in the format... [YYYY-MM-DD]|[Keyword]|[Referrer] for the row key. And then for the data column I store the hit count for that referrer. This approach has a
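Peter's pre-aggregated key can be sketched in a couple of lines; the helper name and delimiter handling are ours, assuming the [YYYY-MM-DD]|[Keyword]|[Referrer] layout from the message:

```java
import java.time.LocalDate;

// Builds a pre-aggregated row key of the form [YYYY-MM-DD]|[Keyword]|[Referrer].
// The hit count for that referrer would be stored in a data column under this key.
public class PreAggKey {
    public static String rowKey(LocalDate day, String keyword, String referrer) {
        return day + "|" + keyword + "|" + referrer;  // LocalDate prints as YYYY-MM-DD
    }

    public static void main(String[] args) {
        System.out.println(rowKey(LocalDate.of(2011, 4, 4), "hbase", "google.com"));
    }
}
```

Because the date leads the key, one day's report is a single short scan over the date prefix, which is what makes this workable on only two servers.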

Re: row_counter map reduce job & 0.90.1

2011-04-04 Thread Stack
I'm glad you figured it out, Venkatesh. St.Ack On Mon, Apr 4, 2011 at 10:57 AM, Venkatesh wrote: > Sorry about this..It was indeed an environment issue..my core-site.xml was > pointing to wrong hadoop > thanks for the tips > -Original Message- > From: Venkatesh > To: use

Re: HBase strangeness and double deletes of HDFS blocks and writing to closed blocks

2011-04-04 Thread Jean-Daniel Cryans
I would approach this problem by trying to find the common characteristics of the rows that are missing. A common pattern I've see is rows missing at the end of a batch (meaning some issues with flushing the buffers). If the missing rows aren't in sequences, meaning one missing every few other rows

Re: row_counter map reduce job & 0.90.1

2011-04-04 Thread Venkatesh
Sorry about this..It was indeed an environment issue..my core-site.xml was pointing to wrong hadoop thanks for the tips -Original Message- From: Venkatesh To: user@hbase.apache.org Sent: Fri, Apr 1, 2011 4:51 pm Subject: Re: row_counter map reduce job & 0.90.1 Yeah.. I

Re: Why HTableDescriptor DEFAULT_VERSIONS is 3?

2011-04-04 Thread tsuna
On Mon, Apr 4, 2011 at 10:40 AM, Stack wrote: > Want to make an issue to change it Joe?  (As Ryan says, no > justification that I remember other than that is how its always been). Personally I think that 3 is a good reasonable default. Maybe most people don't really need 3 versions, but most of

Re: Why HTableDescriptor DEFAULT_VERSIONS is 3?

2011-04-04 Thread Stack
Want to make an issue to change it Joe? (As Ryan says, no justification that I remember other than that is how its always been). St.Ack On Mon, Apr 4, 2011 at 9:31 AM, Joe Pallas wrote: > > On Apr 3, 2011, at 11:52 PM, Ryan Rawson wrote: > >> because it always has been?  I think the original B

Re: HBase strangeness and double deletes of HDFS blocks and writing to closed blocks

2011-04-04 Thread Chris Tarnas
Hi JD, Sorry for taking a while - I was traveling. Thank you very much for looking through these. See answers below: On Apr 1, 2011, at 11:19 AM, Jean-Daniel Cryans wrote: > Thanks for taking the time to upload all those logs, I really appreciate it. > > So from the looks of it, only 1 reg

Re: Is HTable threadsafe and cachable?

2011-04-04 Thread tsuna
On Mon, Apr 4, 2011 at 12:45 AM, Ashish Shinde wrote: > We are using hbase to power a web application. The current > implementation of the data access classes maintain a static HTable > instance to read and write. The reason being getting hold of HTable > instance looks costly. > > In this scenari

Re: Is HTable threadsafe and cachable?

2011-04-04 Thread Jean-Daniel Cryans
From http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html "Instances of HTable passed the same Configuration instance will share connections to servers out on the cluster and to the zookeeper ensemble as well as caches of region locations. This is usually a *good* thing. Thi

Re: ZooKeeperConnectionException whenever table is flushed or majorcompacted

2011-04-04 Thread Jean-Daniel Cryans
As far as I can tell the async nature of those operations has nothing to do with what you see since it's not even able to get a session from ZooKeeper (so it's not even talking to the region servers). If you look at the stack trace: org.apache.hadoop.hbase.client.HBaseAdmin.getCatalogTracker(HBase

Re: HBase design schema

2011-04-04 Thread Ted Dunning
Take a look at OpenTSDB. I think you will be impressed with the speed. Regarding the exponential explosion. Yes. That is a risk in theory. But what happens in practice is that you only create the alternative forms of the file where the simpler key forms are unacceptable due to volume of data.

Re: versions stored in a cell

2011-04-04 Thread Ted Yu
For 2, HBASE-3488 is for Cell Counter. In Vishal's case, 3 years of data is stored for a given row key. Issuing the 'get' command would not help much. TIMERANGE support has been added in HBASE-3729 Cheers On Sun, Apr 3, 2011 at 11:40 PM, Eric Charles wrote: > 1.- On my side, I could imagine to use t

RE: HBase design schema

2011-04-04 Thread Miguel Costa
Ted, thanks for your help. I considered the last option that you mentioned, "pushing one of your dimensions to the key". With that I can have results for that single dimension: For example key: Time+Site+Referrer But if I now want the top Keywords (where top can be any metric) of that Key.

Re: Why HTableDescriptor DEFAULT_VERSIONS is 3?

2011-04-04 Thread Joe Pallas
On Apr 3, 2011, at 11:52 PM, Ryan Rawson wrote: > because it always has been? I think the original BT paper probably > had the number '3' in there somewhere... > > But yes, not too big, not too small. There probably isnt a reasonable > setting here, I'm guessing 1 isnt quite right either. Why

Re: HBase design schema

2011-04-04 Thread Ted Dunning
Miguel, One option is to use the simplest design and use the key you have. Scanning for a particular period of time will give you all the data in that time period which you can reduce in any way that you like. If that becomes too inefficient, a common trick is to build a secondary file that cont
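Ted's simplest design, scan a time period and reduce client-side, can be sketched with a TreeMap standing in for the sorted table; subMap plays the role of a Scan with start and stop rows, and all names here are illustrative, not an HBase API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Rows sorted by Site+Time make one site's time period a contiguous key
// range: scan it, then reduce (here: tally referrers and pick the top one).
public class ScanAndReduce {
    public static String topReferrer(TreeMap<String, String> table,
                                     String site, String from, String to) {
        Map<String, Integer> counts = new HashMap<>();
        // Equivalent of a Scan with startRow = site+from and stopRow = site+to.
        for (String referrer : table.subMap(site + from, site + to).values()) {
            counts.merge(referrer, 1, Integer::sum);
        }
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();  // value = referrer of one hit
        table.put("site1|2011-04-01T10:00", "google.com");
        table.put("site1|2011-04-02T11:00", "google.com");
        table.put("site1|2011-04-03T12:00", "bing.com");
        System.out.println(topReferrer(table, "site1|", "2011-04-01", "2011-04-30"));
    }
}
```

This is the baseline Ted suggests trying first; only if scanning and reducing like this is too slow does it pay to materialize secondary files keyed by the other dimensions.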

HBase design schema

2011-04-04 Thread Miguel Costa
Hi, I need some help with a schema design on HBase. I have 5 dimensions (Time, Site, Referrer, Keyword, Country). My row key is Site+Time. Now I want to answer some questions like: what is the top Referrer by Keyword for a site over a period of time? Basically I want to cross all the dimension

Stargate does not accept my PUTs

2011-04-04 Thread Eric
I'm trying to use REST to post data to an HBase table. I currently try something like: curl -v -H "Content-Type: text/xml" -T test.txt http://localhost:8080/testtable/testrowkey The contents of test.txt are: test data I'm not sure about this XML: there are no examples that I can find of how
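As far as we understand the Stargate XML format, the body must be a CellSet document in which the row key, column (family:qualifier), and cell value are all base64-encoded; a plain string like "test data" will be rejected. A sketch of building such a payload (table, column, and value names here are illustrative):

```java
import java.util.Base64;

// Builds a Stargate-style CellSet body: <CellSet><Row key="..."> with the
// row key, column, and value base64-encoded. Our understanding of the
// format, not official HBase code.
public class StargatePayload {
    public static String cellSetXml(String rowKey, String column, String value) {
        Base64.Encoder b64 = Base64.getEncoder();
        return "<CellSet><Row key=\"" + b64.encodeToString(rowKey.getBytes())
                + "\"><Cell column=\"" + b64.encodeToString(column.getBytes())
                + "\">" + b64.encodeToString(value.getBytes())
                + "</Cell></Row></CellSet>";
    }

    public static void main(String[] args) {
        // PUT this body (Content-Type: text/xml) to
        // http://localhost:8080/testtable/testrowkey
        System.out.println(cellSetXml("testrowkey", "cf:qual", "test data"));
    }
}
```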

Re: Get values via shell with timeRange

2011-04-04 Thread Eric Charles
https://issues.apache.org/jira/browse/HBASE-3729 Get cells via shell with a time range predicate Tks, - Eric On 4/04/2011 16:09, Ted Yu wrote: Please file a JIRA. On Mon, Apr 4, 2011 at 2:50 AM, Eric Charles wrote: Hi, The shell allows to specify a timestamp to get a value - get 't1', 'r1', {

Re: Get values via shell with timeRange

2011-04-04 Thread Ted Yu
Please file a JIRA. On Mon, Apr 4, 2011 at 2:50 AM, Eric Charles wrote: > Hi, > > The shell allows to specify a timestamp to get a value > - get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1} > > If you don't give the exact timestamp, you get nothing... > > I didn't find a way to a list of values

Re: Is HTable threadsafe and cachable?

2011-04-04 Thread Ashish Shinde
Hi Ryan, Thanks, HTablePool fits the bill. Will start using it. I kinda discovered the re-use of the Configuration object after zookeeper "too many connections" errors, although I could not find it documented anywhere. Had to dig into the HTable code to figure it out. Thanks and regards, - Ashish On

Re: ZooKeeperConnectionException whenever table is flushed or majorcompacted

2011-04-04 Thread Hari Sreekumar
I have this in the zookeeper logs, which might be helpful: 2011-04-04 21:26:40,356 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /192.168.1.49:51467 which had sessionid 0x12f20e3713b02ee 2011-04-04 21:26:40,357 WARN org.apache.zookeeper.server.NIOServerCnxn:

ZooKeeperConnectionException whenever table is flushed or majorcompacted

2011-04-04 Thread Hari Sreekumar
Hi, I get this exception when I try to flush META using HbaseAdmin.flush(".META."). I get the same exception when I do major compact: 11/04/04 21:26:31 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hadoopqa2/192.168.1.50:2181, initiating session 11/04/04 21:26:31 WARN org

Get values via shell with timeRange

2011-04-04 Thread Eric Charles
Hi, The shell allows you to specify a timestamp to get a value - get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1} If you don't give the exact timestamp, you get nothing... I didn't find a way to get a list of values (different versions) via a command such as - get 't1', 'r1', {COLUMN => 'c1', TIMER

Re: Is HTable threadsafe and cachable?

2011-04-04 Thread Ryan Rawson
Hey, HTable instances are not really thread safe at this time. You can cache them; check out HTablePool. But the creation cost of an HTable instance isn't that high, the actual TCP socket creation and management is done at a lower level and all HTable interfaces share these common caches and socke
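The check-out/check-in pattern HTablePool provides can be sketched generically; this is plain Java illustrating the idea, not HBase's actual HTablePool implementation:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

// Each thread borrows an instance, uses it, and returns it, so no instance
// is ever shared by two threads at once; instances are created lazily and
// reused across requests instead of being constructed per call.
public class SimplePool<T> {
    private final Queue<T> idle = new ConcurrentLinkedQueue<>();
    private final Supplier<T> factory;

    public SimplePool(Supplier<T> factory) { this.factory = factory; }

    public T borrow() {
        T t = idle.poll();
        return t != null ? t : factory.get();  // create only when the pool is empty
    }

    public void giveBack(T t) { idle.offer(t); }

    public static void main(String[] args) {
        SimplePool<StringBuilder> pool = new SimplePool<>(StringBuilder::new);
        StringBuilder sb = pool.borrow();
        pool.giveBack(sb);
        System.out.println(sb == pool.borrow());  // the returned instance is reused
    }
}
```

As Ryan notes, with HTable the expensive parts (TCP sockets, region caches) are shared underneath anyway, so the pool mainly buys thread confinement rather than connection reuse.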

Is HTable threadsafe and cachable?

2011-04-04 Thread Ashish Shinde
Hi, We are using hbase to power a web application. The current implementation of the data access classes maintains a static HTable instance to read and write, the reason being that getting hold of an HTable instance looks costly. In this scenario the HTable instances could more or less be perpetually cac

Re: Why HTableDescriptor DEFAULT_VERSIONS is 3?

2011-04-04 Thread Eric Charles
Works for me as a "de facto" standard. People should simply know that they will get 3 versions by default when inserting data into HBase. Tks, - Eric On 4/04/2011 08:52, Ryan Rawson wrote: because it always has been? I think the original BT paper probably had the number '3' in there somewhere...