Hi guys,
Supposing I have a table in HBase, how can I estimate its storage footprint?
Thanks.
regards,
Lin
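
One rough way to estimate it is to sum the HDFS usage of the table's directory.
A minimal sketch, assuming a pre-0.96 layout where each table directory sits
directly under hbase.rootdir (the table name comes in as an argument):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class TableFootprint {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // hbase.rootdir must point at the cluster root, e.g. hdfs://nn:8020/hbase
    Path rootDir = new Path(conf.get("hbase.rootdir"));
    FileSystem fs = rootDir.getFileSystem(conf);
    Path tableDir = new Path(rootDir, args[0]);
    // getContentSummary() walks the directory and sums the file lengths
    long bytes = fs.getContentSummary(tableDir).getLength();
    // this is one replica; multiply by dfs.replication for raw HDFS usage
    System.out.println(args[0] + " ~ " + bytes + " bytes on disk (one replica)");
  }
}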
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[],%20boolean%29
>
> Because the HBase client talks directly to each RS, it has to know the
> region boundaries.
>
>
>
> From: Lin Ma
> Date: Thursday, September 6, 20
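
For illustration, a minimal sketch of that getRegionLocation() lookup, assuming
the 0.94-era client API (table name and row key are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class WhereIsMyRow {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");
    // false = answer from the client's location cache when possible;
    // true would force a fresh lookup against META
    HRegionLocation loc = table.getRegionLocation(Bytes.toBytes("row-123"), false);
    System.out.println("row-123 is served by " + loc);
    table.close();
  }
}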
6, 2012 at 12:59 AM, Doug Meil wrote:
>
> Hi there, if you look in the source code for HTable there is a list of Put
> objects. That's the buffer, and it's a client-side buffer.
>
>
>
>
>
> On 9/5/12 12:04 PM, "Lin Ma" wrote:
>
> >Thank
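
For reference, a rough sketch of how that client-side buffer is typically used,
assuming the 0.94-era API (table, family and values are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedPuts {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");
    table.setAutoFlush(false);                  // keep Puts in the client-side buffer
    table.setWriteBufferSize(2 * 1024 * 1024);  // flush once roughly 2 MB accumulate
    for (int i = 0; i < 10000; i++) {
      Put p = new Put(Bytes.toBytes("row-" + i));
      p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v" + i));
      table.put(p);         // lands in the buffered Put list, not straight on the RS
    }
    table.flushCommits();   // push whatever is still buffered
    table.close();
  }
}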
There are use cases where either one is the easier way to achieve things.
>
> Cheers
>
> Julian
>
> 2012/9/2 Lin Ma
>
> > Hello HBase masters,
> >
> > For the two add methods of Put class,
> >
> >
> >
> http://hbase.apache.org/apidocs/or
Sun, Sep 2, 2012 at 2:13 AM, Lin Ma wrote:
> > Hello guys,
> >
> > I am reading the book "HBase, the definitive guide", at the beginning of
> > chapter 3, it is mentioned in order to reduce performance impact for
> > clients to update the same row (lock
see
> https://github.com/sematext/HBaseHUT
>
>
>
> ----------
> Lin Ma wrote on Sun, 2 Sep 2012 11:13 MESZ:
>
> >Hello guys,
> >
> >I am reading the book "HBase, the definitive guide", at the beginning of
> >chapter 3, it i
Hello HBase masters,
For the two add methods of the Put class,
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html#add%28org.apache.hadoop.hbase.KeyValue%29
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html#add%28byte[],%20byte[],%20long,%20byte[]%29
I thin
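
A tiny sketch of how the two variants relate, assuming the 0.94-era API: the
byte[] form builds the KeyValue for you, while the KeyValue form hands one in
directly and must use the Put's own row (names below are made up):

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class TwoWaysToAdd {
  public static void main(String[] args) throws Exception {
    byte[] row = Bytes.toBytes("row-1");
    byte[] cf  = Bytes.toBytes("cf");
    byte[] q   = Bytes.toBytes("q");
    byte[] v   = Bytes.toBytes("value");
    long ts = System.currentTimeMillis();

    // variant 1: HBase builds the KeyValue internally
    Put p1 = new Put(row);
    p1.add(cf, q, ts, v);

    // variant 2: hand in a ready-made KeyValue; its row has to match the Put's
    // row, otherwise add(KeyValue) throws an IOException
    Put p2 = new Put(row);
    p2.add(new KeyValue(row, cf, q, ts, v));
  }
}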
Hello guys,
I am reading the book "HBase: The Definitive Guide". At the beginning of
chapter 3, it is mentioned that, in order to reduce the performance impact of
clients updating the same row (lock contention issues for automatic writes),
batch updates are preferred. My question is, for an MR job, what are
th
am wrong.
regards,
Lin
On Tue, Aug 28, 2012 at 2:41 PM, Harsh J wrote:
> Lin,
>
> On Tue, Aug 28, 2012 at 9:09 AM, Lin Ma wrote:
> > Thanks Harsh,
> >
> > Two more comments / thoughts,
> >
> > 1. For mapper: mapper normally runs on the same regional ser
nt to the RS. Whatever is accumulated
> as the result of the Scan operation (server-side) is accumulated in
> sizes of 500 rows and returned in one Scanner.next() call from the
> client.
>
> Does this clear it up Lin?
>
> On Mon, Aug 27, 2012 at 8:40 PM, Lin Ma wrote:
> >
this value to 500,
for example, will transfer 500 rows at a time to the client to be
processed."
regards,
Lin
On Thu, Aug 23, 2012 at 11:37 PM, Harsh J wrote:
> Hi Lin,
>
> On Thu, Aug 23, 2012 at 7:56 PM, Lin Ma wrote:
> > Harsh, thanks for the detailed information.
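
A small sketch of the scanner-caching setting quoted above, assuming the
0.94-era API (table name is illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class CachingScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");
    Scan scan = new Scan();
    scan.setCaching(500);   // each RPC to the region server returns up to 500 rows
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // most next() calls are served from the rows already fetched;
        // a new RPC happens only when that client-side batch is exhausted
        System.out.println(Bytes.toString(r.getRow()));
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}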
le and Hbase.
>
> Thanks,
> Abhishek
>
>
> -Original Message-
> From: Lin Ma [mailto:lin...@gmail.com]
> Sent: Thursday, August 23, 2012 9:41 AM
> To: user@hbase.apache.org; ha...@cloudera.com
> Cc: doug.m...@explorysmedical.com
> Subject: Re: how client loca
utilizing one room for living before having more children. :-)
regards,
Lin
On Fri, Aug 24, 2012 at 12:46 AM, Harsh J wrote:
> Lin,
>
> On Thu, Aug 23, 2012 at 10:10 PM, Lin Ma wrote:
> > Thanks, Harsh!
> >
> > - "HBase currently keeps a single META region (Doesn'
regards,
Lin
On Thu, Aug 23, 2012 at 11:48 PM, Harsh J wrote:
> HBase currently keeps a single META region (Doesn't split it). ROOT
> holds META region location, and META has a few rows in it, a few of
> them for each table. See also the class MetaScanner.
>
> On Thu, Aug
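
One way to see what the client ends up learning from META is to ask a table for
its region boundaries; a minimal sketch, assuming the 0.94-era API (table name
is illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Pair;

public class ShowRegionBoundaries {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");
    // start/end keys come from the META rows the client looks up and caches
    Pair<byte[][], byte[][]> keys = table.getStartEndKeys();
    for (int i = 0; i < keys.getFirst().length; i++) {
      System.out.println("region " + i + ": ["
          + Bytes.toStringBinary(keys.getFirst()[i]) + ", "
          + Bytes.toStringBinary(keys.getSecond()[i]) + ")");
    }
    table.close();
  }
}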
out which META region server to
access.
Not sure if I get the points. Please feel free to correct me.
regards,
Lin
On Thu, Aug 23, 2012 at 11:15 PM, Lin Ma wrote:
> Doug, very informative document. Thanks a lot!
>
> I read through it and have some thoughts,
>
> - Supposing at the
On Thu, Aug 23, 2012 at 8:21 PM, Doug Meil wrote:
>
> For further information about the catalog tables and region-regionserver
> assignment, see this…
>
> http://hbase.apache.org/book.html#arch.catalog
>
>
>
>
>
>
> On 8/19/12 7:36 AM, "Lin Ma" w
you could point me to some more detailed information.
regards,
Lin
On Thu, Aug 23, 2012 at 9:35 PM, Harsh J wrote:
> Hi Lin,
>
> On Thu, Aug 23, 2012 at 4:31 PM, Lin Ma wrote:
> > Thank you Abhishek,
> >
> > Two more comments,
> >
> > -- "Client onl
server.
>
> Btw, the client only caches information as needed for its queries and not
> necessarily for 'all' region servers.
>
> Abhishek
>
>
> i Sent from my iPad with iMstakes
>
> On Aug 22, 2012, at 23:31, "Lin Ma" wrote:
>
> > Hello
Hello HBase masters,
I am wondering whether, in the current implementation, each HBase client
caches all region server information, for example where each region server is
(the physical machine hosting it), and also caches the row-key range managed
by each region server. If so, two more questions,
e such that 'x' comes in between and HBase
> will load that block. So using blooms can avoid this I/O. Hope this is
> clear for you now.
>
> -Anoop-
>
> From: Lin Ma [lin...@gmail.com]
> Sent: Wednesday, August 22, 2012 5:41 PM
> To: J Mohamed Zahoor;
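
For completeness, this is roughly how a bloom filter gets enabled on a column
family; a sketch assuming the 0.94-era API, where the BloomType enum still
lives under StoreFile (table and family names are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.regionserver.StoreFile;

public class CreateTableWithBloom {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("mytable");
    HColumnDescriptor cf = new HColumnDescriptor("cf");
    // ROW blooms let a random read skip store files that cannot contain the
    // row key, so fewer data blocks are loaded for reads that would miss
    cf.setBloomFilterType(StoreFile.BloomType.ROW);
    desc.addFamily(cf);
    admin.createTable(desc);
    admin.close();
  }
}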
Thanks Zahoor,
I read through the document you referred to, but I am confused about what the
leaf-level index, intermediate-level index and root-level index mean. It would
be appreciated if you could give more details on what they are, or point me to the
related documents.
BTW: the document you pointed me to is very
Thanks Zahoor,
> If there is no bloom... you have to load every block and scan to find if
the row exists..
I could be wrong, but I think the HFile index block (which is located at the
end of the HFile) is a binary search tree containing all row-key values of the
HFile. Searching a
Thank you Zahoor,
Two more comments,
1. After reading the materials you sent to me, I am confused about how a Bloom
filter could save I/O during random reads. Supposing I am not using a Bloom
filter, in order to find whether a row (or row-key) exists, we need to scan
the index block, which is at the end part
e are some requests
> > which will be handled incorrectly.
> >
> > Memcached is great but also look at Guava cache for similar use cases.
> >
> > Asif Ali
> >
> >
> > On Mon, Aug 20, 2012 at 9:09 AM, Lin Ma wrote:
> >
> > > Thank you D
reat but also look at Guava cache for similar use cases.
>
> Asif Ali
>
>
> On Mon, Aug 20, 2012 at 9:09 AM, Lin Ma wrote:
>
> > Thank you Drew. I like your reply, especially the block cache nature
> > provided by HBase. A quick question, for traditional memcached,
odel is to persist things in HBase and then cache things
> with memcached just as you would with any other data store. If you're
> looking for a spiffy memcached replacement I'd recommend checking out
> Redis.
>
>
> On Sat, Aug 18, 2012 at 3:12 AM, Lin Ma wrote:
>
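
A rough sketch of that persist-in-HBase, cache-in-front pattern using a Guava
LoadingCache; table, family and qualifier names are made up:

import java.util.concurrent.TimeUnit;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadThroughCache {
  static final byte[] CF = Bytes.toBytes("cf");
  static final byte[] Q  = Bytes.toBytes("q");

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    final HTable table = new HTable(conf, "mytable");
    LoadingCache<String, byte[]> cache = CacheBuilder.newBuilder()
        .maximumSize(10000)
        .expireAfterWrite(5, TimeUnit.MINUTES)
        .build(new CacheLoader<String, byte[]>() {
          @Override
          public byte[] load(String rowKey) throws Exception {
            // only reached on a cache miss; HBase stays the source of truth
            Result r = table.get(new Get(Bytes.toBytes(rowKey)));
            byte[] v = r.getValue(CF, Q);
            // Guava caches reject null values, so map "missing" to empty
            return v != null ? v : new byte[0];
          }
        });
    byte[] v = cache.get("row-123");   // first call hits HBase, later calls do not
    System.out.println(Bytes.toString(v));
    table.close();
  }
}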
hich is
region / physical server mapping data. Why do you say not data (do you mean
real content in each region)?
regards,
Lin
On Sun, Aug 19, 2012 at 12:40 PM, Stack wrote:
> On Sat, Aug 18, 2012 at 2:13 AM, Lin Ma wrote:
> > Hello guys,
> >
> > I am referencing the Big T
Hello guys,
I am referencing the Big Table paper about how a client locates a tablet.
In section 5.1, Tablet Location, it is mentioned that the client will cache
all tablet locations. I think it means the client will cache the root tablet
in the METADATA table, and all other tablets in the METADATA table (which means
llowing link comparing "traditional" columnar databases against
> HBase/BigTable interesting:
>
>
> http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.html
>
> -Jason
>
> On Sun, Aug 5, 2012 at 8:03 PM, Lin Ma wrote:
>
> > Thank
region servers serve a single region gets you into the land
> of maintaining consistency across copies, which is challenging. It might be
> doable but that's not the design choice Bigtable (and hence HBase) made
> initially.
>
> On Thu, Aug 9, 2012 at 11:04 AM, Lin Ma wrote:
:41 PM, Amandeep Khurana wrote:
> HDFS also chooses to degrade availability in the face of partitions.
>
>
> On Thu, Aug 9, 2012 at 11:08 AM, Lin Ma wrote:
>
>> Amandeep, thanks for your comments, and I will definitely read the paper
>> you suggested.
>>
>>
will be visible to all
> clients. There is no concept of multiple different versions that the
> clients need to reconcile between. When you read, you always get the same
> version of the row you are reading. In other words, HBase is strongly
> consistent.
>
> Hope that clears things u
>
> On Wed, Aug 8, 2012 at 10:32 PM, Lin Ma wrote:
>
> > Thank you Lars.
> >
> > Is the same data stored as a duplicated copy across region servers? If so,
> > if one primary server for the region dies, the client just needs to read
> > from the secondary server for the
>
> - Original Message -
> From: Lin Ma
> To: user@hbase.apache.org
> Cc:
> Sent: Wednesday, August 8, 2012 8:47 AM
> Subject: Re: consistency, availability and partition pattern of HBase
>
> And consistency is not sacrificed? i.e. all distributed clients' upda
ility is sacrificed in the sense that if a region server
> fails, clients will have data inaccessible for the time the region comes up on
> some other server, not to be confused with data loss.
>
> Sent from my iPad
>
> On Aug 7, 2012, at 11:56 PM, Lin Ma wrote:
>
> > Thank
> Wei Tan
> Research Staff Member
> IBM T. J. Watson Research Center
> 19 Skyline Dr, Hawthorne, NY 10532
> w...@us.ibm.com; 914-784-6752
>
>
>
> From: Lin Ma
> To: user@hbase.apache.org,
> Date: 08/07/2012 09:30 PM
> Subject: consistency, availa
Hello guys,
According to the notes by Werner, "He presented the CAP theorem, which
states that of three properties of shared-data systems—data consistency,
system availability, and tolerance to network partition—only two can be
achieved at any given time." =>
http://www.allthingsdistributed.com/
you'd better take a look
> at the content of HFile.
>
> regards!
>
> Yong
>
> On Mon, Aug 6, 2012 at 5:03 AM, Lin Ma wrote:
> > Thank you for the informative reply, Mohit!
> >
> > Some more comments,
> >
> > 1. actually my confusion about co
2, region B has row3.
> A region is a shard of a table based on the row key and just
>
> #1 above means that HBase will never place key values for "row1" in
> different regions.
> #2 means you can very efficiently locate specific keys, as they are always
> stored sorted.
>
&
>
> I tried to write this up a while back:
> http://hadoop-hbase.blogspot.com/2011/12/introduction-to-hbase.html
>
> -- Lars
>
>
>
> - Original Message -
> From: Lin Ma
> To: user@hbase.apache.org
> Cc:
> Sent: Sunday, August 5, 2012 6:04 AM
> Subject: colu
is used to
describe the pattern of storing a sparse, large number of columns (with NULL
for free). Any comments?
regards,
Lin
On Mon, Aug 6, 2012 at 12:08 AM, Mohit Anchlia wrote:
> On Sun, Aug 5, 2012 at 6:04 AM, Lin Ma wrote:
>
> > Hi guys,
> >
> > I am wondering whether H
Hi guys,
I am wondering whether HBase uses column-based storage or row-based
storage?
- I read some technical documents which mentioned that an advantage of HBase is
using column-based storage to store similar data together to foster
compression. So it means the same columns of different rows are
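
For what it's worth: physically, each column family gets its own store files,
and within them cells are KeyValues sorted by (row, family, qualifier,
timestamp). A sketch of that sort order, assuming the 0.94-era KeyValue API
(row and column names are made up):

import java.util.Set;
import java.util.TreeSet;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Bytes;

public class CellOrdering {
  public static void main(String[] args) {
    // KeyValue.COMPARATOR is the same ordering the store files use on disk
    Set<KeyValue> cells = new TreeSet<KeyValue>(KeyValue.COMPARATOR);
    cells.add(new KeyValue(Bytes.toBytes("row2"), Bytes.toBytes("cf"),
        Bytes.toBytes("colA"), Bytes.toBytes("v")));
    cells.add(new KeyValue(Bytes.toBytes("row1"), Bytes.toBytes("cf"),
        Bytes.toBytes("colB"), Bytes.toBytes("v")));
    cells.add(new KeyValue(Bytes.toBytes("row1"), Bytes.toBytes("cf"),
        Bytes.toBytes("colA"), Bytes.toBytes("v")));
    for (KeyValue kv : cells) {
      // prints row1/colA, row1/colB, row2/colA: rows first, then columns
      System.out.println(kv);
    }
  }
}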