RE: Problems Loading RegionObserver Coprocessor

2012-05-25 Thread Anoop Sam John
Hi Kevin, (quoting your config: hbase.coprocessor.region.classes = com.hbase.example.region.coprocessors.MyCustomRegionObserver) Instead of configuring the FQCN of your region observer against "hbase.coprocessor.region.classes", you can go with configuring the same against the config param "hbase.coprocessor.
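
A hedged sketch of one related way to scope a region observer to a single table (rather than having it load for every region) by attaching it to the table descriptor; this is an illustration only, not necessarily the exact parameter Anoop has in mind, and the table name and column family below are hypothetical.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class AttachObserverPerTable {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Hypothetical table; the observer is registered on this table only.
    HTableDescriptor desc = new HTableDescriptor("mytable");
    desc.addFamily(new HColumnDescriptor("cf"));
    // The class must be on the region server classpath (or shipped in a jar).
    desc.addCoprocessor("com.hbase.example.region.coprocessors.MyCustomRegionObserver");

    admin.createTable(desc);
    admin.close();
  }
}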

Re: : question on filters

2012-05-25 Thread Dhaval Shah
Yes, instead of a single Get you can supply a list of Gets to the same htable.get call. It will sort and partition the list on a per-region basis, make requests in parallel, aggregate the responses, and return an array of Result. Make sure you apply your filter to each Get.
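
A minimal sketch of the multi-get Dhaval describes, against the 0.92/0.94-era client API; the table name, row keys, and filter here are hypothetical illustrations.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiGetExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");   // hypothetical table name
    try {
      List<Get> gets = new ArrayList<Get>();
      for (String key : new String[] {"row-a", "row-m", "row-z"}) {   // non-contiguous keys
        Get get = new Get(Bytes.toBytes(key));
        // The filter has to be attached to each individual Get.
        get.setFilter(new ColumnPrefixFilter(Bytes.toBytes("attr")));
        gets.add(get);
      }
      // One client call; the Gets are grouped per region and issued in parallel.
      Result[] results = table.get(gets);
      for (Result r : results) {
        System.out.println(r);
      }
    } finally {
      table.close();
    }
  }
}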

Re: Problems Loading RegionObserver Coprocessor

2012-05-25 Thread Kevin
I didn't think about it like that. I just figured that HBase would block certain components until the cluster was stable. However, like you said, since I configured my Observer as a system coprocessor, your explanation makes sense. I'll have to come up with another example. Thank you for your answer.

Re: improve performance of a MapReduce job with HBase input

2012-05-25 Thread Ey-Chih chow
Thanks. This helps. Ey-Chih Chow On May 25, 2012, at 11:23 AM, Dave Revell wrote: > Here's what I do: > > Scan scan = new Scan(...) > scan.setCaching(5000); > scan.setWhatever(...); > > TableMapReduceUtil.initTableMapperJob(tablename, scan, mapClass, > mapOutKeyClass, mapOut

Re: Occasional regionserver crashes following socket errors writing to HDFS

2012-05-25 Thread Dave Revell
Have you verified that your nodes are not swapping? This has caused serious issues for many people, including me. Swapping can occur even if you have lots of available memory, for complicated reasons. Best, Dave On Thu, May 24, 2012 at 4:39 PM, Stack wrote: > On Thu, May 24, 2012 at 4:15 AM, E

Re: improve performance of a MapReduce job with HBase input

2012-05-25 Thread Ey-Chih chow
Thanks. This works. Ey-Chih Chow On May 25, 2012, at 11:33 AM, Jean-Daniel Cryans wrote: > TIF should be configured via TableMapReduceUtil.initTableMapperJob > which takes a Scan object. > > J-D > > On Fri, May 25, 2012 at 11:30 AM, Ey-Chih chow wrote: >> Thanks. Since we use TableInputForm

Re: Version issue

2012-05-25 Thread Suraj Varma
>> I added this to hbase-site.xml, and that got hbase started but trying to run >> a program to Put rows throws the above error. This seems to indicate that your program is picking up a different version of the hbase jars than your hbase cluster is running, perhaps? Check your classpath to ensure that the version

Re: : question on filters

2012-05-25 Thread Alok Kumar
Hi, try creating a FilterList and attaching it to your scan object. A FilterList may contain any number of filters, combined with a MUST_PASS_ALL or MUST_PASS_ONE condition. This way, a scan with scan.setFilter(yourFilterList) returns you a ResultScanner whose rows need not be contiguous. Or you can make use of HTable.batch(.
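
A short sketch of the FilterList approach Alok describes; the table name, column family, qualifiers, and values are hypothetical.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class FilterListExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");   // hypothetical table name
    try {
      // MUST_PASS_ALL acts as a logical AND; MUST_PASS_ONE as a logical OR.
      FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
      filters.addFilter(new SingleColumnValueFilter(
          Bytes.toBytes("cf"), Bytes.toBytes("status"),
          CompareFilter.CompareOp.EQUAL, Bytes.toBytes("active")));
      filters.addFilter(new SingleColumnValueFilter(
          Bytes.toBytes("cf"), Bytes.toBytes("type"),
          CompareFilter.CompareOp.EQUAL, Bytes.toBytes("premium")));

      Scan scan = new Scan();
      scan.setFilter(filters);
      // The matching rows need not be contiguous in the table.
      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result r : scanner) {
          System.out.println(r);
        }
      } finally {
        scanner.close();
      }
    } finally {
      table.close();
    }
  }
}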

Re: improve performance of a MapReduce job with HBase input

2012-05-25 Thread Jean-Daniel Cryans
TIF should be configured via TableMapReduceUtil.initTableMapperJob which takes a Scan object. J-D On Fri, May 25, 2012 at 11:30 AM, Ey-Chih chow wrote: > Thanks.  Since we use TableInputFormat in our map/reduce job.  The scan > object is created inside TableInputFormat.  Is there any way to get

Re: Problems Loading RegionObserver Coprocessor

2012-05-25 Thread Andrew Purtell
> When I restart my HBase cluster the cluster does not ever finish assigning the > META region. In the master's log there are a lot of > NotServingRegionExceptions: Region is not online: .META.,,1. Other than > that I can't see any log messages that indicate anything specific about loading > the coprocessor

Re: improve performance of a MapReduce job with HBase input

2012-05-25 Thread Ey-Chih chow
Thanks. Since we use TableInputFormat in our map/reduce job, the scan object is created inside TableInputFormat. Is there any way to get the scan object to set caching? Ey-Chih Chow On May 25, 2012, at 11:24 AM, Alok Kumar wrote: > Hi, > > you can make use of 'setCaching' method of your sc

Re: question on filters

2012-05-25 Thread Jean-Daniel Cryans
What you need is a secondary index and HBase doesn't have that. For some tips see: http://hbase.apache.org/book.html#secondary.indexes J-D On Thu, May 24, 2012 at 5:06 PM, jack chrispoo wrote: > Hi, > > I'm new to HBase and I have a question about using filters. I know that I > can use filters w

Re: improve performance of a MapReduce job with HBase input

2012-05-25 Thread Alok Kumar
Hi, you can make use of the 'setCaching' method of your scan object. E.g.: Scan objScan = new Scan(); objScan.setCaching(100); // set it to some integer, as per your use case. http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setCaching(int) thanks, Alok On Fri, May 25, 2012 at

Re: RefGuide updated

2012-05-25 Thread anil gupta
Hi Doug, Nice work. I went through the bulk loader part. It would be great if you could incorporate a note on loading a file with a separator other than the tab character. Here is the mailing list discussion regarding the problem: http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/24852 Here is

Re: Of hbase key distribution and query scalability, again.

2012-05-25 Thread Ian Varley
Yeah, I think you're right Dmitriy; there's nothing like that in HBase today as far as I know. If it'd be useful for you, maybe it would be for others, too; work up a rough patch and see what people think on the dev list. Ian On May 25, 2012, at 1:02 PM, Dmitriy Lyubimov wrote: > Thanks, Ian.

Re: improve performance of a MapReduce job with HBase input

2012-05-25 Thread Dave Revell
Here's what I do: Scan scan = new Scan(...) scan.setCaching(5000); scan.setWhatever(...); TableMapReduceUtil.initTableMapperJob(tablename, scan, mapClass, mapOutKeyClass, mapOutValueClass, job); Does that help? -Dave On Fri, May 25, 2012 at 11:03 AM, Ey-Chih chow wrote: >
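
A fuller sketch of what Dave outlines, with the placeholders filled in by hypothetical names (table "mytable", mapper MyMapper); the caching value of 5000 follows his example and should be tuned to your row size and memory.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class CachedScanJob {

  // Hypothetical mapper; emits one record per HBase row.
  static class MyMapper extends TableMapper<Text, LongWritable> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
        throws IOException, InterruptedException {
      context.write(new Text(key.get()), new LongWritable(1));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "hbase-input-with-caching");
    job.setJarByClass(CachedScanJob.class);

    Scan scan = new Scan();
    scan.setCaching(5000);        // rows fetched per RPC; larger values cut round trips
    scan.setCacheBlocks(false);   // commonly disabled for full-table MapReduce scans

    // The Scan (with its caching setting) is handed to TableInputFormat here.
    TableMapReduceUtil.initTableMapperJob(
        "mytable", scan, MyMapper.class,
        Text.class, LongWritable.class, job);

    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}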

improve performance of a MapReduce job with HBase input

2012-05-25 Thread Ey-Chih chow
Hi, we have a MapReduce job whose input data comes from HBase. We would like to improve the performance of the job. According to the HBase book, we can do that by setting scan caching to a number higher than the default. We use TableInputFormat to read data for the job. I look at the implementatio

Re: Of hbase key distribution and query scalability, again.

2012-05-25 Thread Dmitriy Lyubimov
Thanks, Ian. I am talking about the situation where, even when we have uniform keys, the query distribution over them is still non-uniform and impossible to predict without sampling, and the query skewness is surprisingly great (as in, the least active and most active user may differ in activity 100 times

Re: : question on filters

2012-05-25 Thread jack chrispoo
Thanks Dhaval. Is there a way to get multiple rows (whose keys are not contiguous) from the HBase server with only one request? It seems to me it's expensive to send one Get request for each row. jack On Thu, May 24, 2012 at 5:40 PM, Dhaval Shah wrote: > > Jack, you can use filters on Get's too.

Re: Of hbase key distribution and query scalability, again.

2012-05-25 Thread Ian Varley
Dmitriy, If I understand you right, what you're asking about might be called "Read Hotspotting". For an obvious example, if I distribute my data nicely over the cluster but then say: for (int x = 0; x < 100; x++) { htable.get(new Get(Bytes.toBytes("row1"))); } Then naturally I'm onl

Of hbase key distribution and query scalability, again.

2012-05-25 Thread Dmitriy Lyubimov
Hello, I'd like to collect opinions from HBase experts on query uniformity and whether any advanced technique currently exists in HBase to cope with the problems of query uniformity beyond just maintaining a uniform key distribution. I know we start with the statement that in order t

Re: Version issue

2012-05-25 Thread Prashant Kommireddi
I am building off of trunk (0.95-SNAPSHOT); this is not for production. I have updated hbase-site.xml in my conf dir but that does not work. On May 25, 2012, at 8:29 AM, Marcos Ortiz wrote: > Are you using 0.95-SNAPSHOT? > The 0.94 version just released in May 16, so, If you have a production >

Re: A question about HBase MapReduce

2012-05-25 Thread Doug Meil
re: "data from raw data file into hbase table" One approach is bulk loading.. http://hbase.apache.org/book.html#arch.bulk.load If he's talking about using an Hbase table as the source of a MR job, then see this... http://hbase.apache.org/book.html#splitter On 5/25/12 2:35 AM, "Florin P

Re: fast scan VS hot regions

2012-05-25 Thread Simon Kelly
Hi Andre Have a look at HBaseWD from Sematext: https://github.com/sematext/HBaseWD The strategy there is to prefix monotonic row keys with a bin number. This spreads the writes across N bins but still allows efficient scans, assuming N is not large (N scans are required). -Simon On May 25, 2012 11:
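
A minimal sketch of the bucket-prefix idea Simon describes (HBaseWD automates this); the bucket count of 8 and the timestamp key are hypothetical choices assumed here for illustration.

import org.apache.hadoop.hbase.util.Bytes;

public class BucketedKeys {
  // N buckets: keep it small enough that issuing N parallel scans stays cheap.
  private static final int NUM_BUCKETS = 8;

  // Prefix the original key with a one-byte bucket derived from its hash,
  // so monotonically increasing keys spread across NUM_BUCKETS regions.
  static byte[] toBucketedKey(byte[] originalKey) {
    int bucket = (Bytes.hashCode(originalKey) & 0x7fffffff) % NUM_BUCKETS;
    byte[] salted = new byte[originalKey.length + 1];
    salted[0] = (byte) bucket;
    System.arraycopy(originalKey, 0, salted, 1, originalKey.length);
    return salted;
  }

  public static void main(String[] args) {
    byte[] key = Bytes.toBytes(System.currentTimeMillis());   // hypothetical monotonic key
    System.out.println(Bytes.toStringBinary(toBucketedKey(key)));
    // Reading a key range back means issuing one scan per bucket
    // over [bucket + startKey, bucket + stopKey] and merging the N results.
  }
}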

Re: Version issue

2012-05-25 Thread Marcos Ortiz
Are you using 0.95-SNAPSHOT? The 0.94 version was just released on May 16, so if you have a production environment, you should use a stable version. Anyway, which specific version are you using? Based on this error, it is telling you that you are using a different version from your hbase-site.

Re: Problems Loading RegionObserver Coprocessor

2012-05-25 Thread Kevin
I don't think my issue stems from clocks being out of sync. All my servers are syncing to an NTP server. Checking the dates on each machine shows that they differ by far less than 30 seconds. On Fri, May 25, 2012 at 10:00 AM, shashwat shriparv <dwivedishash...@gmail.com> wrote: > Check this if it so

Enhancing hbase bulk import performance

2012-05-25 Thread sakin cali
Hi all, I have a few questions regarding bulk load; some of them may be "novice", sorry for that... I am trying to improve my bulk loading performance into HBase. Setup: - I have one table with one column family and 10 columns. - 4 PC cluster (each: i5 2400 CPU, 1 TB hard disk, 4 GB RAM) - Ubun
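
For reference, a hedged sketch of the standard HFileOutputFormat bulk-load preparation job from that era; the input format, the table name "mytable", and the ParseMapper below are hypothetical stand-ins, not sakin's actual setup. Writing HFiles and then handing them to the cluster with the completebulkload tool (LoadIncrementalHFiles) usually beats issuing many small Puts.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadJob {

  // Hypothetical parser: the first tab-separated field is the row key,
  // the remaining fields become columns c0..c9 in family "cf".
  static class ParseMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] fields = line.toString().split("\t");
      byte[] rowKey = Bytes.toBytes(fields[0]);
      Put put = new Put(rowKey);
      for (int i = 1; i < fields.length; i++) {
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("c" + (i - 1)), Bytes.toBytes(fields[i]));
      }
      context.write(new ImmutableBytesWritable(rowKey), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "bulk-load-prepare");
    job.setJarByClass(BulkLoadJob.class);
    job.setMapperClass(ParseMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Configures the partitioner, reducer and output format so the HFiles
    // line up with the table's current region boundaries.
    HTable table = new HTable(conf, "mytable");
    HFileOutputFormat.configureIncrementalLoad(job, table);

    // After the job succeeds, load the HFiles with the completebulkload tool.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}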

Re: Problems Loading RegionObserver Coprocessor

2012-05-25 Thread shashwat shriparv
Check this if it solves your problem : http://helpmetocode.blogspot.in/2012/05/issueif-you-master-machines-region.html On Fri, May 25, 2012 at 7:25 PM, Kevin wrote: > Hi, > > I'm starting to give coprocessors a try, but I'm having trouble getting the > HBase cluster to start up properly after

Problems Loading RegionObserver Coprocessor

2012-05-25 Thread Kevin
Hi, I'm starting to give coprocessors a try, but I'm having trouble getting the HBase cluster to start up properly after deploying the new configuration. My coprocessor is trivial, as it is only meant to get my feet wet. I override the prePut method to add the row being put into a table into another ta
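
A hedged sketch of the kind of observer Kevin describes, written against the 0.92/0.94 coprocessor API; the index table, column family, and qualifier are hypothetical, and error handling is omitted.

import java.io.IOException;

import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

public class MyCustomRegionObserver extends BaseRegionObserver {

  private static final byte[] INDEX_TABLE = Bytes.toBytes("index_table");   // hypothetical

  @Override
  public void prePut(ObserverContext<RegionCoprocessorEnvironment> e,
      Put put, WALEdit edit, boolean writeToWAL) throws IOException {
    // Mirror the incoming row key into a second table before the main Put is applied.
    HTableInterface indexTable = e.getEnvironment().getTable(INDEX_TABLE);
    try {
      Put indexPut = new Put(put.getRow());
      indexPut.add(Bytes.toBytes("cf"), Bytes.toBytes("seen"), Bytes.toBytes(true));
      indexTable.put(indexPut);
    } finally {
      indexTable.close();
    }
  }
}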

Re: RefGuide updated

2012-05-25 Thread Mikael Sitruk
Thanks Doug, very instructive. Do you have numbers to get a feel for the gain from using bulk loading? On May 24, 2012 4:18 PM, "Doug Meil" wrote: > > > Hi folks- > > The RefGuide was updated in a big way at the Hackathon yesterday. Two things to note: > > http://hbase.apache.org/book.html#arch.bulk.load > > The

fast scan VS hot regions

2012-05-25 Thread Andre Reiter
I'm starting a new project which is pretty simple; it will be something like Google Analytics, but of course a bit smaller. What is required: web servers handle requests with a kind of generic key/value list. The requests will come at a pretty high rate, let's say 1000 req per second. So far i