Re: Row distribution

2012-07-25 Thread Mohit Anchlia
On Wed, Jul 25, 2012 at 6:53 AM, Alex Baranau wrote: > Hi Mohit, > > 1. When talking about a particular table: > > To view row distribution you can check out how regions are > distributed. Each region is defined by its start/stop key, so depending on > your key format, etc. you can see which records go into each region…

Re: silently aborted scans when using hbase.client.scanner.max.result.size

2012-07-25 Thread Jean-Daniel Cryans
That looks nasty. Could it be that your client doesn't know about the max result size? Looking at ClientScanner.next() we iterate while this is true: } while (remainingResultSize > 0 && countdown > 0 && nextScanner(countdown, values == null)); Let's say the region server returns fewer rows than…
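
If the client-side setting is indeed missing, a minimal sketch of making the client aware of the cap (the property name comes from the thread subject; the 1 MB value is an arbitrary example):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ScannerSizeConfig {
        public static void main(String[] args) {
            Configuration conf = HBaseConfiguration.create();
            // Cap on bytes a single scanner round-trip may return. If only
            // the server knows this limit, the client can misread a short
            // batch as end-of-scan, which matches the silent-abort symptom.
            conf.setLong("hbase.client.scanner.max.result.size", 1L << 20);
            System.out.println(conf.getLong(
                "hbase.client.scanner.max.result.size", -1));
        }
    }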

Re: HBase tap for cascading

2012-07-25 Thread Andrew Purtell
Hi Pranav, This is a question better posed to the Cascading mailing lists. On Wed, Jul 25, 2012 at 10:21 AM, Pranav Modi wrote: > I have an HBase table where the column names for a column family are not > known in advance; the timestamp is part of the column name. For example, a > column name could be - events:1336343168013…

Re: MR hbase export is failing

2012-07-25 Thread Jeff Whiting
Thanks for the replies everyone. We are on an old version of HBase and plan on upgrading pretty soon. Immediately it looks like we have to tell the region server to have a longer timeout rather than telling the client to have a longer timeout. I was hoping to just change a parameter in the MR…
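
The scanner lease is enforced server-side, so the relevant knob lives in the region servers' hbase-site.xml rather than in the job configuration. A sketch, assuming the 0.90-era property name hbase.regionserver.lease.period (the 5-minute value is just an example):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ScannerLease {
        public static void main(String[] args) {
            Configuration conf = HBaseConfiguration.create();
            // Time a scanner may sit idle before the region server expires
            // its lease. Raising it only in the MR client has no effect;
            // the region servers must pick it up and be restarted.
            conf.setLong("hbase.regionserver.lease.period", 300000L);
            System.out.println(conf.getLong(
                "hbase.regionserver.lease.period", -1));
        }
    }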

Re: Coprocessors vs MapReduce?

2012-07-25 Thread Andrew Purtell
Answers inline below. On Wed, Jul 25, 2012 at 1:09 AM, Bertrand Dechoux wrote: > #1 > > As Andrew pointed out, Cascading is indeed for MapReduce. I know the use > case was discussed, I wanted to know what the state is now. (The blog > entry is from 2010.) The use case is simple. I am doing log analysis…

HBase tap for cascading

2012-07-25 Thread Pranav Modi
Hello, I have an HBase table where the column names for a column family are not known in advance; the timestamp is part of the column name. For example, a column name could be - events:1336343168013 My question is - has anyone been able to read such tables with a Cascading tap for HBase? Cascading/m…
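
Outside of Cascading, the plain client API handles unknown qualifiers by scanning the whole family and walking each row's qualifier-to-value map. A sketch against the 0.92-era API; the table name is hypothetical and the family is taken from the events: example above:

    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScanUnknownQualifiers {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "mytable"); // hypothetical name
            byte[] family = Bytes.toBytes("events");
            Scan scan = new Scan();
            scan.addFamily(family); // qualifiers are not known up front
            ResultScanner scanner = table.getScanner(scan);
            for (Result r : scanner) {
                // qualifier -> value map for this row and family
                for (Map.Entry<byte[], byte[]> e
                        : r.getFamilyMap(family).entrySet()) {
                    long ts = Long.parseLong(Bytes.toString(e.getKey()));
                    System.out.println(Bytes.toStringBinary(r.getRow())
                        + " @" + ts + " = "
                        + Bytes.toStringBinary(e.getValue()));
                }
            }
            scanner.close();
            table.close();
        }
    }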

Re: HBase Replication : Do I need to create table programmatically on Replica Cluster?

2012-07-25 Thread Alok Kumar
Thank you, JD. Now I can update my codebase accordingly. -Alok On Wed, Jul 25, 2012 at 8:28 PM, Jean-Daniel Cryans wrote: > On Wed, Jul 25, 2012 at 1:34 AM, Alok Kumar wrote: >> Q 1. Do I need to create a table programmatically on the Backup cluster every >> time a new table gets created on the Production cluster?…

Re: HBase Replication : Do I need to create table programmatically on Replica Cluster?

2012-07-25 Thread Jean-Daniel Cryans
On Wed, Jul 25, 2012 at 1:34 AM, Alok Kumar wrote: > Q 1. Do I need to create a table programmatically on the Backup cluster every > time a new table gets created on the Production cluster? Yes. > Can't it be created automatically using the WAL? No, we don't replicate .META. edits as it would be a mess.
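
Since schema never travels with the replication stream, the master cluster's descriptor has to be copied over by hand. A minimal sketch of doing that with HBaseAdmin; the ZooKeeper quorums and table name are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MirrorTable {
        public static void main(String[] args) throws Exception {
            Configuration prodConf = HBaseConfiguration.create();
            prodConf.set("hbase.zookeeper.quorum", "prod-zk1");     // hypothetical
            Configuration backupConf = HBaseConfiguration.create();
            backupConf.set("hbase.zookeeper.quorum", "backup-zk1"); // hypothetical

            HBaseAdmin prod = new HBaseAdmin(prodConf);
            HBaseAdmin backup = new HBaseAdmin(backupConf);

            // Replication ships WAL edits, never schema, so families and
            // settings must match on both sides before edits arrive.
            HTableDescriptor desc =
                prod.getTableDescriptor(Bytes.toBytes("mytable"));
            if (!backup.tableExists("mytable")) {
                backup.createTable(desc);
            }
        }
    }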

Re: Index building process design

2012-07-25 Thread Eric Czech
Thank you both for the responses! Michael, I'll elaborate on the use case in response to Amandeep's questions, but I'm pretty clear on what you mean with regard to using inverted indexes built from a base table. Amandeep, I think I can answer all of your questions with a better explanation of what…
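
For readers new to the pattern: an inverted index here is just a second table whose row key leads with the attribute value, so lookups by value become a prefix scan. A sketch of building one from a base table with the 0.92-era client API; all table, family, and qualifier names are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BuildInvertedIndex {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable base = new HTable(conf, "base");        // hypothetical
            HTable index = new HTable(conf, "base_index"); // hypothetical
            byte[] family = Bytes.toBytes("f");
            byte[] qualifier = Bytes.toBytes("email"); // attribute to index
            ResultScanner scanner =
                base.getScanner(new Scan().addColumn(family, qualifier));
            for (Result r : scanner) {
                byte[] value = r.getValue(family, qualifier);
                if (value == null) continue;
                // Index key = attribute value + base row key: lookups by
                // value become a contiguous prefix scan on the index table.
                Put p = new Put(Bytes.add(value, r.getRow()));
                p.add(family, Bytes.toBytes("ref"), r.getRow());
                index.put(p);
            }
            scanner.close();
            base.close();
            index.close();
        }
    }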

Re: Modify rowKey in prePut hook

2012-07-25 Thread Alex Baranau
I don't think you can do this. Apart from the fact that you cannot change a Put's row key (there's simply no interface for that), consider the following notes: Note-1: the row key is used by the HBase client (yes, in your app) to find out where (which RegionServer) should handle that Put. Note-2: coprocesso…
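
For completeness, the workaround people usually reach for is to bypass the original Put in prePut and issue a rewritten one against the region, but Alex's notes explain why it is fragile: the client has already routed the Put using the original key, so this only holds if old and new keys always land in the same region. A sketch against the 0.92-era coprocessor API; rewrite() stands in for a hypothetical, idempotent key transform:

    import java.io.IOException;
    import java.util.List;
    import java.util.Map;

    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
    import org.apache.hadoop.hbase.util.Bytes;

    public class KeyRewritingObserver extends BaseRegionObserver {
        @Override
        public void prePut(ObserverContext<RegionCoprocessorEnvironment> e,
                           Put put, WALEdit edit, boolean writeToWAL)
                throws IOException {
            byte[] newRow = rewrite(put.getRow());
            // The rewritten Put re-enters this hook; an idempotent
            // rewrite() makes the second pass a no-op, avoiding recursion.
            if (Bytes.equals(newRow, put.getRow())) {
                return;
            }
            Put replacement = new Put(newRow);
            for (Map.Entry<byte[], List<KeyValue>> entry
                    : put.getFamilyMap().entrySet()) {
                for (KeyValue kv : entry.getValue()) {
                    replacement.add(kv.getFamily(), kv.getQualifier(),
                                    kv.getTimestamp(), kv.getValue());
                }
            }
            e.getEnvironment().getRegion().put(replacement);
            e.bypass(); // drop the original Put
        }

        private byte[] rewrite(byte[] row) {
            return row; // hypothetical transform goes here
        }
    }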

Re: Row distribution

2012-07-25 Thread Alex Baranau
Hi Mohit, 1. When talking about a particular table: To view row distribution you can check out how regions are distributed. Each region is defined by its start/stop key, so depending on your key format, etc. you can see which records go into each region. You can see the region distribution…
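
To make the same inspection programmatic, a small sketch that prints each region's boundaries via HTable.getStartEndKeys() (0.92-era API; the table name is hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.hbase.util.Pair;

    public class ShowRegionBoundaries {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "mytable"); // hypothetical
            // Rows sort lexicographically into [start, stop) key ranges,
            // one range per region.
            Pair<byte[][], byte[][]> keys = table.getStartEndKeys();
            for (int i = 0; i < keys.getFirst().length; i++) {
                System.out.println("region " + i
                    + " start=" + Bytes.toStringBinary(keys.getFirst()[i])
                    + " stop=" + Bytes.toStringBinary(keys.getSecond()[i]));
            }
            table.close();
        }
    }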

Re: Issue with hbase

2012-07-25 Thread Kevin O'dell
Irwan, Just to be clear: you don't have the META, ROOT, or any .regioninfo files, but the storefiles are still intact? If so, you should be able to recreate your tables using the correct HBase home and bulk load your data back in. I think that would work, but I will defer to Stack or Michael…
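
If that route is taken, the bulk-load half could look roughly like this (a sketch, assuming the table has been recreated with its original column families and the surviving HFiles copied into one subdirectory per family; all paths and names are hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

    public class ReloadStoreFiles {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // Expects a layout like /recovered/mytable/<family>/<hfiles>;
            // each file is assigned to a region of the recreated table.
            LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
            loader.doBulkLoad(new Path("/recovered/mytable"),
                              new HTable(conf, "mytable"));
        }
    }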

silently aborted scans when using hbase.client.scanner.max.result.size

2012-07-25 Thread Ferdy Galema
I was experiencing aborted scans under certain conditions. In these cases I was simply missing so many rows that only a fraction was read in, without warning. After lots of testing I was able to pinpoint and reproduce the error when scanning over a single region, single column family, single store file…
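
A minimal way to reproduce the symptom is to cap the result size low, scan, and compare the row count against the table's known row count. A sketch (table and family names are hypothetical; the 64 KB cap is arbitrary):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CountScannedRows {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // Small cap so a scanner batch ends mid-region.
            conf.setLong("hbase.client.scanner.max.result.size", 64 * 1024);
            HTable table = new HTable(conf, "mytable"); // hypothetical
            Scan scan = new Scan();
            scan.addFamily(Bytes.toBytes("f")); // hypothetical family
            scan.setCaching(1000);
            ResultScanner scanner = table.getScanner(scan);
            long count = 0;
            for (Result r : scanner) {
                count++;
            }
            scanner.close();
            table.close();
            // Compare against the table's known row count; a shortfall
            // reproduces the silent abort described above.
            System.out.println("rows seen: " + count);
        }
    }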

FW: {kundera-discuss} Kundera 2.0.7 Released

2012-07-25 Thread Vivek Mishra
From: kundera-disc...@googlegroups.com [kundera-disc...@googlegroups.com] on behalf of Amry [amresh1...@gmail.com] Sent: 25 July 2012 16:41 To: kundera-disc...@googlegroups.com Subject: {kundera-discuss} Kundera 2.0.7 Released Hi All, We are happy to announce…

HBase Replication : Do I need to create table programmatically on Replica Cluster?

2012-07-25 Thread Alok Kumar
Hello, I have two Hadoop+HBase clusters set up (production + backup in different regions) and I'm using HBase replication. Q 1. Do I need to create a table programmatically on the Backup cluster every time a new table gets created on the Production cluster? Can't it be created automatically using the WAL? Your help…

Re: host:port problem

2012-07-25 Thread Mohammad Tariq
Did it work? Regards, Mohammad Tariq On Mon, Jul 23, 2012 at 9:09 PM, Mohammad Tariq wrote: > Hi Rajendra, > > If the web service core was written with the zookeeper jar > included in some older HBase release, and you have now upgraded your > HBase version, then this could happen. Try…

Re: Coprocessors vs MapReduce?

2012-07-25 Thread Bertrand Dechoux
#1 As Andrew pointed out, Cascading is indeed for MapReduce. I know the use case was discussed; I wanted to know what the state is now. (The blog entry is from 2010.) The use case is simple. I am doing log analysis and would like to perform fast aggregations. These aggregations are common (count/…
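
For the common aggregations mentioned (count and friends), HBase 0.92 shipped an aggregation coprocessor that pushes the work server-side instead of running a full MapReduce pass. A sketch, assuming AggregateImplementation is loaded as a coprocessor on the table; the table and family names are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
    import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FastCount {
        public static void main(String[] args) throws Throwable {
            Configuration conf = HBaseConfiguration.create();
            AggregationClient aggregationClient = new AggregationClient(conf);
            Scan scan = new Scan();
            scan.addFamily(Bytes.toBytes("f")); // hypothetical family
            // Row count computed region-by-region on the servers and
            // summed client-side; no MapReduce job involved.
            long rows = aggregationClient.rowCount(Bytes.toBytes("logs"),
                    new LongColumnInterpreter(), scan);
            System.out.println("row count: " + rows);
        }
    }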