Re: Observer/Observable MapReduce

2011-03-25 Thread Harsh J
Instead of using a table, how about using the available ZooKeeper service itself? It can hold small bits of information pretty well. On Sat, Mar 26, 2011 at 12:29 AM, Vishal Kapoor wrote: > David, > how about waking up my second map reduce job as soon as I see some > rows updated in
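A minimal sketch of the suggestion above, using the ZooKeeper Java client to keep a small coordination flag in a znode instead of an HBase table. The znode path `/jobs/scope`, the payload, and the connect string are illustrative assumptions, not from the thread; this is not runnable without a ZooKeeper server.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Hedged sketch: store a small coordination flag in ZooKeeper.
// Path and payload are placeholders for illustration only.
public class ZkFlagExample {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, new Watcher() {
            public void process(WatchedEvent event) { /* no-op watcher */ }
        });
        byte[] scope = "rows-updated".getBytes("UTF-8");
        if (zk.exists("/jobs", false) == null) {
            zk.create("/jobs", new byte[0],
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
        // Create or update the flag; a second job can set a watch on it.
        if (zk.exists("/jobs/scope", false) == null) {
            zk.create("/jobs/scope", scope,
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } else {
            zk.setData("/jobs/scope", scope, -1); // -1 = any version
        }
        zk.close();
    }
}
```

A downstream job would read the znode with `zk.getData("/jobs/scope", true, null)` and receive a watch event when it changes.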

Re: zookeeper-3.3.2 has default maxClientCnxns set to 10?

2011-03-25 Thread Todd Lipcon
I came upon this independently today, actually. Filed ZOOKEEPER-1030 On Fri, Mar 25, 2011 at 3:21 PM, Alex Baranau wrote: > Right, from the same host (same ip). But in HBase I think the default max > number of connections is set to 30. Please correct me if I'm wrong. If I'm > right, then we shoul

Re: zookeeper-3.3.2 has default maxClientCnxns set to 10?

2011-03-25 Thread Alex Baranau
Right, from the same host (same ip). But in HBase I think the default max number of connections is set to 30. Please correct me if I'm wrong. If I'm right, then we should probably change either of the defaults. No? Alex Baranau Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hado

Re: Observer/Observable MapReduce

2011-03-25 Thread Andrey Stepachev
Take a look at http://yahoo.github.com/oozie/. Maybe it will help you. 2011/3/25 Vishal Kapoor > Can someone give me a direction on how to start a map reduce based on > an outcome of another map reduce? ( nothing common between them apart > from the first decides about the scope of the second. > > I

Re: zookeeper-3.3.2 has default maxClientCnxns set to 10?

2011-03-25 Thread Alex Baranau
I see what you are asking. I'm using stand-alone Zookeeper, not "internal" one of HBase. So it reads configuration only from zoo.cfg. And it seems that by default (when maxClientCnxns is absent in it) it acts like maxClientCnxns=10. I'd expect it to be unlimited when this property is omitted. At le

RE: Observer/Observable MapReduce

2011-03-25 Thread Doug Meil
The simplest way to do this is with a thread that executes the jobs you want to run synchronously Job job1 = ... job1.waitForCompletion(true); Job job2 = ... job2.waitForCompletion(true); -Original Message- From: Vishal Kapoor [mailto:vishal.kapoor.
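A hedged sketch of the approach above: run the two jobs sequentially in one driver, launching the second only if the first succeeds. Class and job names are placeholders; the mapper/input/output wiring depends on your own jobs.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Hedged sketch: chain two MapReduce jobs synchronously.
// "first-pass" / "second-pass" are illustrative names only.
public class ChainedJobs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        Job job1 = new Job(conf, "first-pass");   // decides the scope
        job1.setJarByClass(ChainedJobs.class);
        // ... set mapper/input/output for job1 here ...
        if (!job1.waitForCompletion(true)) {
            System.exit(1);                       // stop if job1 failed
        }

        Job job2 = new Job(conf, "second-pass");  // consumes job1's outcome
        job2.setJarByClass(ChainedJobs.class);
        // ... set mapper/input/output for job2, scoped by job1's output ...
        System.exit(job2.waitForCompletion(true) ? 0 : 1);
    }
}
```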

Re: Zookeeper connection error on mapreduce HBase writes

2011-03-25 Thread Jonathan Bender
I actually created another Configuration object (cfg) within the map() method itself, so it still worked. Now I have a much better idea of how the Mapper is called. Moving the HTable object configuration to the setup() method was the right call. Thanks! On Fri, Mar 25, 2011 at 12:01 PM, Buttler

RE: How could I re-calculate every entries in hbase efficiently through mapreduce?

2011-03-25 Thread Michael Segel
Well there goes my weekend. :-P > From: buttl...@llnl.gov > To: user@hbase.apache.org > Date: Fri, 25 Mar 2011 10:00:26 -0700 > Subject: RE: How could I re-calculate every entries in hbase efficiently > through mapreduce? > > I would certainly find it use

Re: zookeeper-3.3.2 has default maxClientCnxns set to 10?

2011-03-25 Thread Stack
On Fri, Mar 25, 2011 at 12:36 PM, Alex Baranau wrote: > As far as I know HBase configured to initiate up to 30 connections by > default, and maxClientCnxns for Zookeeper was meant to be 30 as well. Yes I'm not sure how it'd go from 30 to 10 (Is 10 the default connections for zk?). Is it possibl

zookeeper-3.3.2 has default maxClientCnxns set to 10?

2011-03-25 Thread Alex Baranau
Hello, I've set up a test HBase+Hadoop cluster yesterday and got the following error in logs during running MR job (which internally creates HTable for Reducer): KeeperErrorCode = ConnectionLoss for /hbase Then I went to Zookeeper logs and found this: 2011-03-24 22:41:49,884 - WARN [NIOServerC
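For reference, the fix discussed in this thread is a one-line change in zoo.cfg on a stand-alone ZooKeeper. In 3.3.x, leaving maxClientCnxns unset applies a per-IP default of 10; the value 30 below is illustrative, chosen to match the HBase connection count mentioned in the thread.

```ini
# zoo.cfg fragment (illustrative values)
# Max concurrent connections per client IP; 3.3.x defaults to 10 if unset.
maxClientCnxns=30
```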

Re: Region server crashes when using replication

2011-03-25 Thread Eran Kutner
Thanks, J-D, that managed to solve part of the problem. The servers have stopped crashing and the master now properly detects when a RS goes down. By the way, since the RS does detect this, it may be a good idea to stop the server on this event, as it indicates a significant configuration issue. However n

Re: Drop an inconsistent table.

2011-03-25 Thread Stack
What version of hbase? How many regions? Can you get a list? (Scan .META.) You need to close the regions out on each regionserver, remove them from .META. then remove the table from the filesystem. The first step can be tricky. If only a few regions, you could try doing each one in turn sendi

RE: Zookeeper connection error on mapreduce HBase writes

2011-03-25 Thread Buttler, David
I would suggest that you have each mapper have its own HTable, rather than having a static HTable in the outer class. Configure it from the setup method of the mapper. Hmm, I am not exactly sure how the configuration from your HTable is passed to the mapper in the first place. You are conf
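A hedged sketch of the advice above, against the 0.90-era HBase API: each mapper creates its own HTable in setup() from the task's Configuration instead of sharing a static one, and closes it in cleanup(). Table, family, and qualifier names are placeholders.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;

// Hedged sketch: per-mapper HTable created in setup(), not a static field.
// "output_table", "cf", "q", "v" are placeholders for illustration.
public class UpdateMapper extends TableMapper<ImmutableBytesWritable, Put> {
    private HTable table;

    @Override
    protected void setup(Context context) throws IOException {
        // One table handle per mapper task, built from the task's config.
        table = new HTable(context.getConfiguration(), "output_table");
    }

    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
        Put put = new Put(row.get());
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
        table.put(put);  // reuses the single connection made in setup()
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        table.close();   // release the connection when the task ends
    }
}
```

Creating the HTable once per task (rather than per map() call, or statically) keeps the ZooKeeper connection count at one per mapper.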

Re: Observer/Observable MapReduce

2011-03-25 Thread Vishal Kapoor
David, how about waking up my second map reduce job as soon as I see some rows updated in that table. any thoughts on observing a column update? thanks, Vishal On Fri, Mar 25, 2011 at 2:56 PM, Buttler, David wrote: > What about just storing some metadata in a special table? > Then on you second

RE: Observer/Observable MapReduce

2011-03-25 Thread Buttler, David
What about just storing some metadata in a special table? Then on your second job startup you can read that metadata and set your scan/input splits appropriately? Dave -Original Message- From: Vishal Kapoor [mailto:vishal.kapoor...@gmail.com] Sent: Friday, March 25, 2011 11:21 AM To: us

Re: How could I re-calculate every entries in hbase efficiently through mapreduce?

2011-03-25 Thread Stack
On Thu, Mar 24, 2011 at 7:36 PM, Stanley Xu wrote: > But I have two doubts here: > 1. It looks the partitioner will do a lots of shuffling, I am wondering why > it couldn't just do the put on the local region since the read and write on > the same entry should be on the same region, isn't it? > T

Observer/Observable MapReduce

2011-03-25 Thread Vishal Kapoor
Can someone give me a direction on how to start a map reduce based on an outcome of another map reduce? ( nothing common between them apart from the first decides about the scope of the second. I might also want to set the scope of my second map reduce (from/after) my first map reduce(scope as in

Zookeeper connection error on mapreduce HBase writes

2011-03-25 Thread Jonathan Bender
Hello all, I wrote a routine that scans an HBase table, and writes to another table from within the map function using HTable.put(). When I run the job, it works fine for the first few rows but ZooKeeper starts having issues opening up a connection after a while. Am I just overloading the ZK ser

Re: Stargate+hbase

2011-03-25 Thread Weishung Chung
+1 Thank you David for the great explanation. It's complicated. I am pretty new to this BigData space and find it really interesting; I always want to learn more about it. I will definitely look into OpenTSDB as suggested. Thanks again :D On Fri, Mar 25, 2011 at 12:18 PM, Buttler, David wrote:

RE: Stargate+hbase

2011-03-25 Thread Buttler, David
Hmmm maybe my mental model is deficient. How do you propose building a secondary index without a transaction? The reason indexes work is that they store the data in a different way than the primary table. That implies a second, independent data storage. Without a transaction you can't be

Re: Stargate+hbase

2011-03-25 Thread Stack
Ugh. Redo. I added pointer to David Butler's response above as an intro to secondary indexing issues in hbase. St.Ack On Fri, Mar 25, 2011 at 10:09 AM, Stack wrote: > I added pointer to below into our book as 'intro to secondary indexing > in hbase'. > St.Ack > > On Fri, Mar 25, 2011 at 8:39 AM,

Re: Stargate+hbase

2011-03-25 Thread Stack
I added pointer to below into our book as 'intro to secondary indexing in hbase'. St.Ack On Fri, Mar 25, 2011 at 8:39 AM, Buttler, David wrote: > Do you know what it means to make secondary indexing a feature?  There are > two reasonable outcomes: > 1) adding ACID semantics (and thus killing sca

RE: How could I re-calculate every entries in hbase efficiently through mapreduce?

2011-03-25 Thread Buttler, David
I would certainly find it useful if you wrote such a blog post. Dave -Original Message- From: Michael Segel [mailto:michael_se...@hotmail.com] Sent: Friday, March 25, 2011 8:55 AM To: user@hbase.apache.org Subject: RE: How could I re-calculate every entries in hbase efficiently through m

Re: Stargate+hbase

2011-03-25 Thread Weishung Chung
Thank you so much for the informative reply. It really helps me out. For secondary index, even without transaction, I would think one could still build a secondary index on another key, especially if we have row level locking. Correct me if I am wrong. Also, I have read about clustered B-Tree used

RE: How could I re-calculate every entries in hbase efficiently through mapreduce?

2011-03-25 Thread Michael Segel
"During inserts into the table, there was one field that was populated from hand-crafted HTML that should only have a small range of values (e.g. a primary color). We wanted to keep a log of all of the unique values that were found here, and so the values were the map job output and then sorte

RE: How could I re-calculate every entries in hbase efficiently through mapreduce?

2011-03-25 Thread Buttler, David
We ran across a use-case this week. During inserts into the table, there was one field that was populated from hand-crafted HTML that should only have a small range of values (e.g. a primary color). We wanted to keep a log of all of the unique values that were found here, and so the values wer

RE: Stargate+hbase

2011-03-25 Thread Buttler, David
Do you know what it means to make secondary indexing a feature? There are two reasonable outcomes: 1) adding ACID semantics (and thus killing scalability) 2) allowing the secondary index to be out of date (leading to every naïve user claiming that there is a serious bug that must be fixed). Sec

Re: Query Regarding Design Strategy behind Abortable.

2011-03-25 Thread Stack
On Fri, Mar 25, 2011 at 1:56 AM, Mohit wrote: > Why not reconnect back to the zookeeper (at least try once and then abort, if > unsuccessful) and reset trackers/watchers instead of aborting/killing > HMaster/HRegionServers, just like it is done in one of the implementations of > Abortable named

RE: How could I re-calculate every entries in hbase efficiently through mapreduce?

2011-03-25 Thread Michael Segel
Yeah... Uhm I don't know of many use cases where you would want or need a reducer step when dealing with HBase. I'm sure one may exist, but from past practical experience... you shouldn't need one. > From: buttl...@llnl.gov > To: user@hbase.apache.org

Drop an inconsistent table.

2011-03-25 Thread Vivek Krishna
The table is in an inconsistent state. The reason is that it was not able to locate a few regions. When I disable this table using hbase shell, the master log says RetriesException and is in the process of transition. This takes a lot of time. Is it possible to force drop this table? Or rather what are

RE: How could I re-calculate every entries in hbase efficiently through mapreduce?

2011-03-25 Thread Buttler, David
There is no reason to use a reducer in this scenario. I frequently do map-only update jobs. Skipping the reduce step saves a lot of unnecessary work. Dave -Original Message- From: Stanley Xu [mailto:wenhao...@gmail.com] Sent: Thursday, March 24, 2011 7:37 PM To: user@hbase.apache.org S
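A hedged sketch of the map-only pattern described above: setting the reducer count to 0 skips the shuffle/sort and reduce phases entirely, so each mapper's updates go straight to the sink. The class and job names are placeholders, and the table-mapper wiring is left as a comment.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Hedged sketch: a map-only HBase update job (no reduce step).
public class MapOnlyUpdate {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "map-only-update");
        job.setJarByClass(MapOnlyUpdate.class);
        // ... e.g. TableMapReduceUtil.initTableMapperJob(...) here ...
        job.setNumReduceTasks(0);  // map output bypasses shuffle/sort
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```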

BerlinBuzzwords 2011 Early Bird Ticket Period ends on April 7th.

2011-03-25 Thread Isabel Drost
Hey folks, just a short notice for those who haven't noticed: we have only a limited amount of Early-Bird tickets left, and the Early-Bird period ends on April 7th. If you want to get one of the 30 remaining tickets go and get one now here: http://berlinbuzzwords.de/content/tickets While we are

Query Regarding Design Strategy behind Abortable.

2011-03-25 Thread Mohit
Hello Users/Authors Well, we've observed in our cluster that HMaster went down due to a watched event triggered from zookeeper, of type session expired. Why not reconnect back to the zookeeper (at least try once and then abort, if unsuccessful) and reset trackers/watchers instead of abort

Re: Stargate+hbase

2011-03-25 Thread Wei Shung Chung
I need to use secondary indexing too, hopefully this important feature will be made available soon :) Sent from my iPhone On Mar 25, 2011, at 12:48 AM, Stack wrote: There is no native support for secondary indices in HBase (currently). You will have to manage it yourself. St.Ack On Thu, Ma