Re: Data Processing in hbase

2009-07-21 Thread Amandeep Khurana
HBase is meant to store large tables. The intention is to store data in a way that's more scalable than traditional database systems. Now, HBase is built over Hadoop and has the option of being used as the data store for MR jobs. However, that's not the only purpose. In all data storage sy

Re: Data Processing in hbase

2009-07-21 Thread bharath vissapragada
That means it is not very useful to write Java code (using the API), because it is not using the real power of Hadoop (distributed processing) and instead has the overhead of fetching data from other machines, right? On Wed, Jul 22, 2009 at 12:12 PM, Amandeep Khurana wrote: > Yes.. Only

Re: Data Processing in hbase

2009-07-21 Thread Amandeep Khurana
Yes.. Only if you use MR. If you are writing your own code, it'll pull the records to the place where you run the code. On Tue, Jul 21, 2009 at 11:39 PM, Fernando Padilla wrote: > That is if you use Hadoop MapReduce right? Not if you simply access HBase > through a standard api (like java)? > > >
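
For illustration, a minimal client-side scan in the 0.20-style Java API looks roughly like this (the table name is made up); every Result is shipped over the network from the region servers to wherever this code runs, which is the point being made above:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class ScanClient {
        public static void main(String[] args) throws Exception {
            HBaseConfiguration conf = new HBaseConfiguration();
            HTable table = new HTable(conf, "mytable");   // table name is illustrative
            ResultScanner scanner = table.getScanner(new Scan());
            for (Result row : scanner) {
                // each Result is pulled from the region server to this client
                System.out.println(row);
            }
            scanner.close();
        }
    }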

Re: Data Processing in hbase

2009-07-21 Thread Fernando Padilla
That is if you use Hadoop MapReduce, right? Not if you simply access HBase through a standard API (like Java)? On 7/21/09 9:49 PM, Amandeep Khurana wrote: Bharath, The processing is done as local to the RS as possible. The first attempt is at doing it local on the same node. If that's not possi

Re: Data Processing in hbase

2009-07-21 Thread Amandeep Khurana
Bharath, The processing is done as local to the RS as possible. The first attempt is at doing it locally on the same node. If that's not possible, it's done on the same rack. -ak On Tue, Jul 21, 2009 at 9:43 PM, bharath vissapragada < bhara...@students.iiit.ac.in> wrote: > Hi all, > > I have one s

Data Processing in hbase

2009-07-21 Thread bharath vissapragada
Hi all, I have one simple question about HBase. Suppose I use a scanner to iterate through all the rows in HBase and process the data in the table corresponding to those rows. Is the processing of that data done locally on the region server in which that particular region is located, or is it trans

secondary indexes

2009-07-21 Thread Fernando Padilla
Hi. I have a very good use for a secondary index, but I'm trying to find some information (pros, cons, performance, anything) about them to validate my need for them. :) But doing a Google search I can only come across two blog posts that show the how-to of using it, but do not have

Re: TableMap.initJob() function

2009-07-21 Thread bharath vissapragada
In this case I am thinking of giving one table as input to the map class and using the HBase API to extract rows from the other table, but I have one question: can a single Map instance emit more than one (key, value) pair? Is the following situation feasible (inputs to the map are table1 rowkey
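
To the multiple-emit question: yes, a single map call may emit any number of (key, value) pairs by calling collect() repeatedly. A minimal sketch with the old org.apache.hadoop.mapred API (class name and emitted keys are made up):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class MultiEmitMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable key, Text value,
                OutputCollector<Text, Text> out, Reporter reporter)
                throws IOException {
            // one input record may produce as many output pairs as needed
            out.collect(new Text("table1-key"), value);
            out.collect(new Text("table2-key"), value);
        }
    }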

Re: TableMap.initJob() function

2009-07-21 Thread stack
In the past, not knowing any better, I've made dummy inputs. Make a file with as many lines as you'd like mappers, use TextInputFormat, and just ignore the input or trigger what the particular mapper does off the passed input. St.Ack On Tue, Jul 21, 2009 at 8:13 PM, bharath vissapragada < bharathvissa
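
A sketch of that trick under the old JobConf API (the path, class name, and line count are illustrative). Note that TextInputFormat splits by file blocks rather than lines, so the mapper count tracks splits; the lines mainly give each map call something to key its work off:

    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;

    public class DummyInputSetup {
        public static void main(String[] args) throws Exception {
            JobConf job = new JobConf(DummyInputSetup.class);
            Path dummy = new Path("/tmp/dummy-input.txt");
            FileSystem fs = FileSystem.get(job);
            FSDataOutputStream out = fs.create(dummy);
            for (int i = 0; i < 10; i++) {
                out.writeBytes(i + "\n");   // one line per unit of work
            }
            out.close();
            job.setInputFormat(TextInputFormat.class);
            FileInputFormat.setInputPaths(job, dummy);
            // the mapper then ignores its Text input, or keys its work off the line
        }
    }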

Re: TableMap.initJob() function

2009-07-21 Thread bharath vissapragada
By saying "using API in your job to pull from multiple table" do you mean that in the Map phase itself we use HBase API to fetch table rows ? If yes then what should we give as input to the map function , can we leave it as a blank? On Wed, Jul 22, 2009 at 12:43 AM, Jonathan Gray wrote: > Curren

org.apache.hadoop.hbase.client.RetriesExhaustedException

2009-07-21 Thread stchu
Hi, Recently I tried to import HDFS text files into HBase. The map function reads each line (record) in the files and calculates the index of this record. TableReduce is used for the reduce function, which combines the records in value.iterator into a String with "," separators and
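
A rough sketch of the reduce side being described, against the 0.19-era org.apache.hadoop.hbase.mapred API; the key type, column name, and class name are guesses rather than the poster's actual code:

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.hbase.io.BatchUpdate;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapred.TableReduce;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class IndexReduce extends MapReduceBase
            implements TableReduce<Text, Text> {
        public void reduce(Text key, Iterator<Text> values,
                OutputCollector<ImmutableBytesWritable, BatchUpdate> out,
                Reporter reporter) throws IOException {
            StringBuilder sb = new StringBuilder();
            while (values.hasNext()) {              // join the values with ","
                if (sb.length() > 0) sb.append(",");
                sb.append(values.next().toString());
            }
            byte[] row = Bytes.toBytes(key.toString());
            BatchUpdate update = new BatchUpdate(row);
            update.put("data:index", Bytes.toBytes(sb.toString()));  // column is illustrative
            out.collect(new ImmutableBytesWritable(row), update);
        }
    }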

Re: zookeeper problem starting hbase 0.20.0 trunk

2009-07-21 Thread Ken Weiner
Got the answer to my problem. I had to enable UDP traffic on port 2888 for the Amazon security group in which I'm running my HBase instances. Everything started working once I did that. Thanks to Jon Gray and others in the IRC channel for helping. On Tue, Jul 21, 2009 at 5:57 PM, Ken Weiner wr

zookeeper problem starting hbase 0.20.0 trunk

2009-07-21 Thread Ken Weiner
I am trying to start HBase in distributed mode on EC2. HBase is not starting properly and it seems to be caused by Zookeeper not being able to elect a leader. We are using an HBase-managed Zookeeper. There is one ZK on each of 3 regionservers. I didn't change any default ports. The following erro
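
For reference, a three-node quorum needs its peer and election ports reachable between all ZK hosts, not just the client port. In zoo.cfg terms (hostnames illustrative), 2888 is the peer port and 3888 the leader-election port, and the EC2 security group must allow both between the instances:

    clientPort=2181
    server.1=zk1.example.com:2888:3888
    server.2=zk2.example.com:2888:3888
    server.3=zk3.example.com:2888:3888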

Re: IndexedTable and Delete

2009-07-21 Thread Andrew McCall
Cool, will do. Andrew On 21 Jul 2009, at 22:13, Clint Morgan wrote: Yeah, you've basically got it right. It's a bug. Please open a JIRA (and perhaps take a stab at a patch). It's low on my priority list as we mostly just do updates or delete whole rows. -clint On Tue, Jul 21, 2009 at 1:0

Re: IndexedTable and Delete

2009-07-21 Thread Clint Morgan
Yeah, you've basically got it right. It's a bug. Please open a JIRA (and perhaps take a stab at a patch). It's low on my priority list as we mostly just do updates or delete whole rows. -clint On Tue, Jul 21, 2009 at 1:04 PM, Andrew McCall wrote: > Hi, > > I've been using the IndexedTable stuf

IndexedTable and Delete

2009-07-21 Thread Andrew McCall
Hi, I've been using the IndexedTable stuff from contrib and have come across a bit of an issue. When I delete a column, my indexes are removed for that column. I've run through the code in IndexedRegion and used very similar code in my own classes to recreate the index after I've run the delete

Re: TableMap.initJob() function

2009-07-21 Thread Jonathan Gray
Currently, there is not. You would have multiple MR jobs, or you would directly use the API in your job to pull from multiple tables. I suppose it would be feasible, but as it is now, you are not told which table your Result comes from. bharath vissapragada wrote: Hi all, Generally Tabl

TableMap.initJob() function

2009-07-21 Thread bharath vissapragada
Hi all, Generally the TableMap.initJob() method takes a "table name" as input to the map while using MapReduce in HBase. Is there a way I can use more than one table, i.e., so that the input to the map contains more than one table? Thanks
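
For context, initJob wires exactly one table into the job, which is why the question arises. A sketch against the old mapred-style API (the exact signature varies across HBase versions, and the table, column, and mapper names are made up):

    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.io.RowResult;
    import org.apache.hadoop.hbase.mapred.TableMap;
    import org.apache.hadoop.mapred.JobConf;

    public class SingleTableDriver {   // hypothetical driver class
        public static void main(String[] args) throws Exception {
            JobConf job = new JobConf(SingleTableDriver.class);
            // one table name and one column list -- there is no multi-table variant
            // MyTableMapper is hypothetical and would extend TableMap (not shown)
            TableMap.initJob("table1", "cf:col", MyTableMapper.class,
                ImmutableBytesWritable.class, RowResult.class, job);
        }
    }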

Re: Fail to read properties from zoo.cfg

2009-07-21 Thread y_823910
Hi, Because I submit my code through GridGain and run it with GridGain, which has its own classpath, I've put zoo.cfg into the GridGain configuration path. That works. Fleming

Re: Zookeeper config in 0.20.0

2009-07-21 Thread tim robertson
Hi Stack, Just FYI - when I change to only 1 zookeeper IP, I get the stacktrace as above, and then the master will not shut down. Cheers Tim On Mon, Jul 20, 2009 at 8:12 PM, tim robertson wrote: > Hi Stack, > > Thanks, I now have it loading in using mapreduce on my (cough cough) > cluster of

Annotating all rows

2009-07-21 Thread tim robertson
Hi all, I plan to write a mapreduce job that will use HBase as a source and annotate each record (e.g., add a column to each record). I think Stack said I might run into issues doing this (region splits?) but this was a while ago. - Is it correct that I should take care with this? - Even if I write to
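
A minimal sketch of such an annotating job under the 0.20-style org.apache.hadoop.hbase.mapreduce API (family and qualifier names are made up); it scans the table and writes one extra cell per row back to the same table, wired up in the driver with TableMapReduceUtil.initTableMapperJob and initTableReducerJob:

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;

    public class AnnotateMapper extends TableMapper<ImmutableBytesWritable, Put> {
        public void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            Put put = new Put(row.get());
            // add the annotation column to the row just read
            put.add(Bytes.toBytes("meta"), Bytes.toBytes("annotation"),
                    Bytes.toBytes("processed"));
            context.write(row, put);
        }
    }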

Fail to read properties from zoo.cfg

2009-07-21 Thread y_823910
Fail to read properties from zoo.cfg: java.io.IOException: zoo.cfg not found. I put zoo.cfg in my path /HaDoop/hbase/conf and set my CLASSPATH to /HaDoop/hbase/conf on a Linux PC without installing Hadoop and HBase. While I run the following code, I get error messages about "Fail to read properties
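
For reference, the usual fix is simply making sure the directory holding zoo.cfg (and hbase-site.xml) is on the client's classpath, e.g. with the path from the post:

    export CLASSPATH=$CLASSPATH:/HaDoop/hbase/conf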