RE: Nosqls schema design

2012-11-08 Thread Pamecha, Abhishek
Nick, short answer: it all depends on how well your key design overlaps with the use cases you want to address. If (all of) your use cases map very closely to your key design, you're in good hands; otherwise some tricks are warranted, like more tables with duplicated data, pre-computations through M/R job
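
One of the tricks mentioned, more tables holding duplicated data, can be sketched roughly as below. This is a hypothetical in-memory model, not HBase client code: plain Python dicts stand in for tables, and the names `users_by_id`, `users_by_email` and `put_user` are invented for illustration. Each write lands in two tables keyed for two different queries, so both queries become direct key lookups.

```python
# Hypothetical model: each "table" is a dict of row key -> {column: value}.
# Not the HBase client API; it only shows writing one record twice,
# once per access pattern.
users_by_id = {}     # row key: user id
users_by_email = {}  # row key: email (duplicated data for a second query)


def put_user(user_id, email, name):
    """Write the record to both tables so both lookups are direct key gets."""
    users_by_id[user_id] = {"info:email": email, "info:name": name}
    users_by_email[email] = {"info:id": user_id, "info:name": name}


def get_by_id(user_id):
    return users_by_id.get(user_id)


def get_by_email(email):
    return users_by_email.get(email)


put_user("u42", "nick@example.com", "Nick")
print(get_by_email("nick@example.com")["info:id"])  # u42
```

The cost of this design is extra storage and two writes per record, and keeping the copies consistent is left to the application.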

queries and MR jobs

2013-02-15 Thread Pamecha, Abhishek
Hi, is there a way to partition HDFS [replication factor, say 3] or route requests to specific RS nodes so that one set of nodes serves operations like put and get, another set of nodes runs MR on the same replicated data set, and those two sets don't share the same nodes? I mean, if we are repl

HBase Put

2012-08-21 Thread Pamecha, Abhishek
Hi, I had a question on the HBase Put call. In the scenario where data is inserted without any order to column qualifiers, how does HBase maintain sortedness with respect to column qualifiers in its store files/blocks? I checked the code base and I can see checks

RE: HBase Put

2012-08-21 Thread Pamecha, Abhishek
, changes are batched in memory first. -- Lars From: "Pamecha, Abhishek" To: "user@hbase.apache.org" Sent: Tuesday, August 21, 2012 4:00 PM Subject: HBase Put Hi, I had a question on the HBase Put call. In the scenario where data is i
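
Lars's reply (changes are batched in memory first) can be illustrated with the toy model below. It is a sketch under stated assumptions, not HBase's MemStore: cells are kept as (row, qualifier, value) tuples in a list that stays sorted on every insert, so puts arriving in any qualifier order come out sorted when the buffer is flushed. Real cells also carry a timestamp and type, which this sketch omits.

```python
import bisect


class MemStoreSketch:
    """Toy in-memory write buffer kept sorted by (row, qualifier).

    Assumption: this only mimics the idea that writes are buffered and
    sorted in memory before being written out; it is not HBase's MemStore.
    """

    def __init__(self):
        self._cells = []  # sorted list of (row, qualifier, value)

    def put(self, row, qualifier, value):
        # insort keeps the list ordered no matter what order puts arrive in
        bisect.insort(self._cells, (row, qualifier, value))

    def flush(self):
        """Return the cells in sorted order (what a store file would hold) and reset."""
        snapshot, self._cells = self._cells, []
        return snapshot


ms = MemStoreSketch()
for q in ["q3", "q1", "q2"]:
    ms.put("row1", q, "v-" + q)
flushed = ms.flush()
print([q for _, q, _ in flushed])  # ['q1', 'q2', 'q3']
```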

RE: HBase Put

2012-08-22 Thread Pamecha, Abhishek
anding. Thanks, Abhishek -----Original Message----- From: lars hofhansl [mailto:lhofha...@yahoo.com] Sent: Tuesday, August 21, 2012 5:55 PM To: user@hbase.apache.org Subject: Re: HBase Put That is correct. From: "Pamecha, Abhishek" To: "user@hba

RE: Using HBase serving to replace memcached

2012-08-22 Thread Pamecha, Abhishek
Great explanation. Maybe this diverges from the thread's original question, but could you also explain the difference, if any, between searching for a rowkey [that you mentioned below] vs. searching for a specific column qualifier? Are there any optimizations for column qualifier search too or

RE: HBase Put

2012-08-22 Thread Pamecha, Abhishek
bject: Re: HBase Put On Wed, Aug 22, 2012 at 10:20 AM, Pamecha, Abhishek wrote: > So then a GET query means one needs to look in every HFile where the key > falls within the min/max range of the file. > > From another parallel thread, I gather, HFiles comprise blocks > which, I thin
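
The point quoted above, that a GET must consult every HFile whose min/max key range covers the requested key, can be sketched as follows. `StoreFileSketch`, `candidate_files` and `get` are hypothetical names, and treating the last file in the list as the newest is an assumption for illustration; HBase itself resolves versions by timestamp rather than by list position.

```python
from dataclasses import dataclass


@dataclass
class StoreFileSketch:
    """Hypothetical stand-in for a store file: its first/last row keys and its cells."""
    first_key: str
    last_key: str
    cells: dict  # row key -> value


def candidate_files(files, row):
    # Only files whose [first_key, last_key] range covers the row need to be read.
    return [f for f in files if f.first_key <= row <= f.last_key]


def get(files, row):
    # Assumption: newer files come later in the list, so check them first.
    for f in reversed(candidate_files(files, row)):
        if row in f.cells:
            return f.cells[row]
    return None


files = [
    StoreFileSketch("a", "m", {"apple": 1, "kiwi": 2}),
    StoreFileSketch("c", "z", {"kiwi": 3, "pear": 4}),
]
print(len(candidate_files(files, "kiwi")))  # 2: both ranges cover "kiwi"
print(get(files, "kiwi"))                   # 3, from the newer file
print(len(candidate_files(files, "pear")))  # 1
```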

RE: HBase Put

2012-08-22 Thread Pamecha, Abhishek
will include qualifiers: http://hbase.apache.org/book.html#schema.bloom -Jason On Wed, Aug 22, 2012 at 1:49 PM, Pamecha, Abhishek wrote: > Can I enable bloom filters per block at the column qualifier level too? > That way, with small block sizes, I can selectively load only a few data

Re: Using HBase serving to replace memcached

2012-08-22 Thread Pamecha, Abhishek
Thanks all. Sent from my iPad with iMstakes On Aug 22, 2012, at 20:53, "J Mohamed Zahoor" wrote: > If you need to search row and column qualifiers, you can pick a row+col bloom > to help you skip blocks. > > ./Zahoor@iPad > > On 22-Aug-2012, at 10:58 PM
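
The difference between a row-level and a row+column Bloom filter discussed in this thread can be illustrated with a minimal Bloom filter. `BloomSketch` is a toy written for illustration (SHA-256 based, fixed bit count), not HBase's implementation; the only point is what gets added as the key: the row alone, or the row joined with the qualifier.

```python
import hashlib


class BloomSketch:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all((self.bits >> p) & 1 for p in self._positions(key))


# A row-level filter keys on the row only; a row+col filter keys on row and qualifier.
row_bloom, rowcol_bloom = BloomSketch(), BloomSketch()
for row, qual in [("r1", "q1"), ("r1", "q2"), ("r2", "q1")]:
    row_bloom.add(row)
    rowcol_bloom.add(f"{row}/{qual}")

print(row_bloom.might_contain("r1"))        # True
print(rowcol_bloom.might_contain("r1/q1"))  # True
```

A row+col filter can answer "might row r1 contain qualifier q1?", which helps skip blocks when searching for a specific column, whereas a row filter only answers "might this row be here at all?".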

Re: client cache for all region server information?

2012-08-22 Thread Pamecha, Abhishek
I think for the refresh case, the client first uses the older region server location from its cache and connects to that older region server, which responds with a failure code. The client then talks to ZooKeeper and then to the meta node server to find the new region server for that key. The
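
The refresh flow described above (try the cached location, get a failure, look the location up again, retry) can be modeled as follows. This is a hypothetical sketch: `lookup_fn` stands in for the ZooKeeper and meta lookup, `send` stands in for the RPC, and the cache is keyed per row for simplicity, whereas a real client caches locations per region key range.

```python
class RegionLocationCacheSketch:
    """Hypothetical client-side cache of row -> region server, refreshed on failure."""

    def __init__(self, lookup_fn):
        self._lookup = lookup_fn  # stands in for the ZooKeeper + meta lookup
        self._cache = {}

    def locate(self, row):
        if row not in self._cache:
            self._cache[row] = self._lookup(row)
        return self._cache[row]

    def call(self, row, send):
        server = self.locate(row)
        ok, result = send(server, row)
        if ok:
            return result
        # The cached server rejected the request (e.g. the region moved):
        # drop the stale entry, look the location up again, and retry once.
        del self._cache[row]
        server = self.locate(row)
        ok, result = send(server, row)
        if not ok:
            raise RuntimeError(f"request for {row!r} failed on {server}")
        return result


# Simulated cluster state: which server currently hosts each row.
current = {"row1": "rs1"}
lookups = []


def lookup(row):
    lookups.append(row)
    return current[row]


def send(server, row):
    # Succeeds only if the request reached the server that hosts the row now.
    return server == current[row], f"{row}@{server}"


cache = RegionLocationCacheSketch(lookup)
print(cache.call("row1", send))  # row1@rs1
current["row1"] = "rs2"          # the region moves to another server
print(cache.call("row1", send))  # row1@rs2, after a failed try and a refresh
print(len(lookups))              # 2
```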

RE: how client location a region/tablet?

2012-08-23 Thread Pamecha, Abhishek
I too thought there were multiple META regions but just one ROOT. Maybe I am mixing up Bigtable and HBase. Thanks, Abhishek -----Original Message----- From: Lin Ma [mailto:lin...@gmail.com] Sent: Thursday, August 23, 2012 9:41 AM To: user@hbase.apache.org; ha...@cloudera.com Cc: doug.m

limit on number of blocks per HFile and files per region

2012-08-23 Thread Pamecha, Abhishek
Hi, I have a few questions on blocks per file and files per region. 1. Can there be multiple row keys per block and then per HFile? Or is a block or HFile dedicated to a single row key? I have a scenario where, for the same column family, some rowkeys will have very wide rows, say rowkey W, and

Re: hbase many-to-many design

2012-08-23 Thread Pamecha, Abhishek
Hi Jong, you can add a new column unannounced. This means your current 'put' does not have to recall which other columns are already present in the row or, for that matter, in the table. You just issue a put command as if it were your first one, and the column will be added. Unlike an RDBMS, there
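
The "add a column unannounced" behavior can be modeled with a dict of dicts. This is a hypothetical sketch, not the HBase API: `table` and `put` are invented names, and the `friends:...` strings are example columns. In HBase only the column families are declared up front; qualifiers are simply part of each write.

```python
from collections import defaultdict

# Hypothetical model of a table as row key -> {"family:qualifier": value}.
table = defaultdict(dict)


def put(row, column, value):
    # No schema change and no read of the existing row: just write the cell.
    table[row][column] = value


put("jong", "friends:alice", "1")
put("jong", "friends:bob", "1")   # a new qualifier added "unannounced"
put("pat", "friends:carol", "1")  # other rows need not share the same columns

print(sorted(table["jong"]))            # ['friends:alice', 'friends:bob']
print("friends:alice" in table["pat"])  # False
```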

Re: limit on number of blocks per HFile and files per region

2012-08-23 Thread Pamecha, Abhishek
> J-D > > On Thu, Aug 23, 2012 at 4:21 PM, Pamecha, Abhishek wrote: >> 1. Can there be multiple row keys per block and then per HFile? Or is >> a block or Hfile dedicated to a single row key? > > Multiple row keys per HFile block. Read > http://hbase.apach
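
The answer above (multiple row keys per HFile block) can be illustrated with a toy block builder. The names `build_blocks`, `block_index` and `find_block` are hypothetical, and cell size is approximated as the byte length of row, qualifier and value, which is an assumption; real HFile blocks also hold key metadata and may be encoded or compressed. The example shows narrow rows sharing a block while a wide row spills across several blocks.

```python
import bisect


def build_blocks(cells, block_size):
    """Split sorted (row, qualifier, value) cells into blocks of roughly block_size bytes.

    Assumption: a cell's size is approximated as len(row) + len(qualifier) + len(value).
    """
    blocks, current, size = [], [], 0
    for cell in cells:
        current.append(cell)
        size += sum(len(part) for part in cell)
        if size >= block_size:
            blocks.append(current)
            current, size = [], 0
    if current:
        blocks.append(current)
    return blocks


def block_index(blocks):
    # The index keeps the first (row, qualifier) of each block.
    return [(b[0][0], b[0][1]) for b in blocks]


def find_block(index, row, qualifier):
    # Rightmost block whose first key is <= the search key.
    return max(bisect.bisect_right(index, (row, qualifier)) - 1, 0)


# Two narrow rows ("a", "b") and one wide row "w" with six qualifiers.
cells = sorted([("a", "q1", "xx"), ("b", "q1", "xx")]
               + [("w", f"q{i}", "xxxx") for i in range(6)])
blocks = build_blocks(cells, 12)
print(len(blocks))                                 # 4
print(sorted({row for row, _, _ in blocks[0]}))    # ['a', 'b', 'w']
print(find_block(block_index(blocks), "w", "q4"))  # 2
```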

sorting by value

2012-08-28 Thread Pamecha, Abhishek
Hi, I probably know the usual answer, but are there any tricks to do some sort of sort-by-value in HBase? The only option I know is to somehow embed the value in the key part. The value is not a timestamp but a normal number. I want to find out, say, the top 10 from a range of columns. The range could be
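
The "embed the value in the key" option can be sketched as below, assuming the value is a non-negative integer that fits in 64 bits. HBase compares keys as raw bytes, so a fixed-width big-endian encoding sorts numerically; subtracting the value from the maximum makes larger values sort first, so the first N keys under a prefix are the top N. `value_key`, `decode_value` and the `scores|` prefix are hypothetical.

```python
import struct

MAX_U64 = 2**64 - 1


def value_key(prefix: bytes, value: int, item_id: bytes) -> bytes:
    """Build a hypothetical key whose byte order sorts by value, descending.

    Assumes 0 <= value <= MAX_U64. Packing (MAX_U64 - value) big-endian makes
    larger values compare smaller under plain byte comparison.
    """
    return prefix + struct.pack(">Q", MAX_U64 - value) + item_id


def decode_value(key: bytes, prefix_len: int) -> int:
    (inverted,) = struct.unpack(">Q", key[prefix_len:prefix_len + 8])
    return MAX_U64 - inverted


scores = {b"item-a": 17, b"item-b": 250, b"item-c": 3, b"item-d": 42}
keys = sorted(value_key(b"scores|", v, k) for k, v in scores.items())
top2 = [decode_value(k, len(b"scores|")) for k in keys[:2]]
print(top2)  # [250, 42]
```

Since qualifiers within a row are also sorted as bytes, the same encoding could be placed in the column qualifier to get a top N over a range of columns. The trade-off is that changing a value means deleting the old key and writing a new one.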

Re: sorting by value

2012-08-31 Thread Pamecha, Abhishek
; everyone, but it does allow us to do some limited sorting. > > --Tom > > On Thursday, August 30, 2012, Stack wrote: > >> On Tue, Aug 28, 2012 at 4:11 PM, Pamecha, Abhishek >> > >> wrote: >>> Hi >>> >>> I probably know the usual an

Re: sorting by value

2012-08-31 Thread Pamecha, Abhishek
> > --Tom > > On Thursday, August 30, 2012, Stack wrote: > >> On Tue, Aug 28, 2012 at 4:11 PM, Pamecha, Abhishek >> > >> wrote: >>> Hi >>> >>> I probably know the usual answer but are there any tricks to do some >> sort of

RE: [Schema] Put or Increment ?

2012-09-25 Thread Pamecha, Abhishek
Hi Shrijeet, what's your use case? That should drive your decision. Put will overwrite in case your userid and IP address are the same. Increment would just bump up the counter. -abhishek -----Original Message----- From: Shrijeet Paliwal [mailto:shrij...@rocketfuel.com] Sent: Tuesday, September
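
The difference described above can be shown with a small in-memory model. It is a hypothetical sketch, not the HBase client: `put` and `increment` are invented functions, and versioning is ignored (HBase can retain older cell versions, but a default read returns the latest one). Three identical puts leave a single value, while three increments leave a count of 3.

```python
from collections import defaultdict

# Hypothetical model: one cell per (userid, ip) row key.
put_table = {}
incr_table = defaultdict(int)


def put(userid, ip, value):
    # Put semantics (ignoring versions): the latest write wins.
    put_table[(userid, ip)] = value


def increment(userid, ip, amount=1):
    # Increment semantics: add to the stored counter and return the new total.
    incr_table[(userid, ip)] += amount
    return incr_table[(userid, ip)]


for _ in range(3):
    put("u1", "10.0.0.1", 1)
    increment("u1", "10.0.0.1")

print(put_table[("u1", "10.0.0.1")])   # 1
print(incr_table[("u1", "10.0.0.1")])  # 3
```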

RE: HBase table row key design question.

2012-10-02 Thread Pamecha, Abhishek
For 1, I wouldn't worry about that problem until it actually happens. Just my opinion. If you really want to solve it, you will need to generate a unique id per row-key 'put' outside of HBase [say, some hash of server IP + timestamp, etc.] and append it to the end of your row key. For 2, you can inv
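
The unique-suffix idea can be sketched as follows. `unique_row_key` is a hypothetical helper; the `|` separator and the 8-hex-character SHA-256 prefix are arbitrary illustrative choices. Keeping the original key as a prefix means a prefix scan can still find all rows for that logical key.

```python
import hashlib
import time


def unique_row_key(base_key: str, server_ip: str, ts_ms=None) -> str:
    """Append a short hash of (server ip, timestamp) to the logical row key.

    Hypothetical helper: separator and hash length are illustrative choices.
    """
    if ts_ms is None:
        ts_ms = int(time.time() * 1000)
    suffix = hashlib.sha256(f"{server_ip}:{ts_ms}".encode()).hexdigest()[:8]
    return f"{base_key}|{suffix}"


k1 = unique_row_key("user123", "10.0.0.1", 1349190000000)
print(k1.startswith("user123|"), len(k1))  # True 16
```

Two puts from the same server in the same millisecond would still collide, so a per-process counter could be mixed into the hashed string if that matters.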

RE: Efficient way to sample from large HBase table.

2012-10-12 Thread Pamecha, Abhishek
Although I have no idea of your use case, I would be surprised if you need sampling to stop exactly at the 1M mark. Here is one approach you might use: if you store the total count of rows separately, say 90M, then you can randomly pick 1 in 90 rows in your MR job doing a global sc
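
The 1-in-90 idea amounts to Bernoulli sampling with probability target / total. The sketch below is a hypothetical helper, not code from the thread: each row is kept independently, so the sample size is only approximately the target, which fits the point that stopping exactly at 1M is rarely needed. In an MR job, each mapper could apply the same test to the rows it scans.

```python
import random


def sample_rows(rows, total_rows, target, seed=None):
    """Keep each row independently with probability target / total_rows.

    total_rows is assumed to be a separately maintained count; the result
    size is approximately, not exactly, target.
    """
    rng = random.Random(seed)
    p = min(1.0, target / total_rows)
    return [row for row in rows if rng.random() < p]


sample = sample_rows(range(90_000), total_rows=90_000, target=1_000, seed=7)
print(len(sample))  # close to 1000, not exactly 1000
```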

hbase deployment using VMs for data nodes and SAN for data storage

2012-10-15 Thread Pamecha, Abhishek
Hi, we are deciding between using local disks on bare-metal hosts vs. VMs using a SAN for data storage. Has anyone compared performance, availability and scalability between these two options? IMO, this is somewhat similar to a typical AWS or other cloud deployment. Thanks, A

Re: hbase deployment using VMs for data nodes and SAN for data storage

2012-10-15 Thread Pamecha, Abhishek
:46, "lars hofhansl" wrote: If you have a SAN, why would you want to use HBase? -- Lars ________ From: "Pamecha, Abhishek" To: "user@hbase.apache.org" Sent: Monday, October 15, 2012 3:00 PM Subject: hbase deployment using VMs for data no

RE: High IPC Latency

2012-10-18 Thread Pamecha, Abhishek
Is it sustained for the same client hitting the same region server, or does it get better for the same client-RS combination when run for a longer duration? I am trying to eliminate ZooKeeper from this. Thanks, Abhishek From: Yousuf Ahmad [mailto:myahm...@gmail.com] Sent: Thursday, October 18, 2012 11

RE: High IPC Latency

2012-10-19 Thread Pamecha, Abhishek
couple of minutes of the experiment running, it wouldn't need to > re-visit ZooKeeper, I believe. Correct me if I am wrong please. > > Regards, > Yousuf > > > On Thu, Oct 18, 2012 at 2:42 PM, lars hofhansl > wrote: > > > Also, what version of HBase/HDF

RE: scaling a low latency service with HBase

2012-10-19 Thread Pamecha, Abhishek
Here are a few of my thoughts: if possible, you might want to localize your data to a few regions and then maybe have exclusive access to those regions. This way, external load will not impact you. I have heard that the write penalty of SSDs is quite high. But I think they will still

RE: How to Manage Data Architecture & Modeling for HBase

2015-04-06 Thread Pamecha, Abhishek
I would stress that if you envision any joins or arbitrary slicing and dicing at a later point in your application, you might want to either redesign your schema very carefully or be ready for more time-consuming (not near-real-time) answers. We had explored a possible solution on similar line

Re: Zookeeper load balancing hbase requests

2015-06-25 Thread Pamecha, Abhishek
Tunable Consistency of writes On 6/25/15, 6:48 PM, "Bharath Kumar" wrote: >Hi Team, > I have a query , with having an ensemble of zookeeper instances . >Does hbase requests get load balanced as against having a single zookeeper >instance.? > >Apart from zookeeper availability is there any