...without reading the 64MB HDFS block.
That's right.
> Given that HDFS doesn't support
> random reads within a block, how is that possible?
It does support reading at an explicit offset. See [1] and the pread method
that follows.
> Or does HBase somehow short-circuit and go directly to the OS, bypassing
> HDFS?
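For reference, here is a minimal sketch of the positioned-read ("pread") API the answer above is pointing at; the file path and offset are made-up placeholders, not values from this thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PreadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/hbase/example/hfile");   // hypothetical HFile path
        byte[] block = new byte[64 * 1024];             // one 64KB HBase block
        FSDataInputStream in = fs.open(path);
        try {
            // Positioned read: fetch 64KB at an arbitrary offset without
            // streaming through the rest of the 64MB HDFS block.
            in.readFully(12345678L, block, 0, block.length);
        } finally {
            in.close();
        }
    }
}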
I'd like to understand HBase block reads better. Assume my HBase
block is 64KB and my HDFS block is 64MB.
I've read that HBase can just do a random read of the 64KB block,
without reading the 64MB HDFS block. Given that HDFS doesn't support
random reads within a block, how is that possible?
What are your KV size and HFile block size for the table? For a random-read
type of use case, a lower value for the HFile block size might help.
-Anoop-
On Fri, Aug 15, 2014 at 1:56 AM, Esteban Gutierrez wrote:
> If not set in hbase-site.xml, both tcpnodelay and tcpkeepalive are set to
> true (that's the default behavior since 0.95/0.96).
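In case it helps, here is a rough sketch of lowering the HFile block size on an existing family with the Java admin API of that era; the table and family names are placeholders, and the new size only applies to HFiles written after the change (flushes and compactions).

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class LowerHFileBlockSize {
    public static void main(String[] args) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
        HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("mytable")); // placeholder table
        HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("cf"));                 // placeholder family
        cf.setBlocksize(16 * 1024);           // 16KB HFile blocks instead of the 64KB default
        admin.disableTable("mytable");        // safest path for a schema change
        admin.modifyColumn("mytable", cf);
        admin.enableTable("mytable");
        admin.close();
    }
}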
If not set in hbase-site.xml, both tcpnodelay and tcpkeepalive are set to
true (that's the default behavior since 0.95/0.96).
Have you noticed if the call processing times or the call queue size are too
high? How does the IO look when you try these random gets? Are those
gets going 100% of the time t...
Thomas:
Have you set tcpnodelay to true?
See http://hbase.apache.org/book.html for explanation of
hbase.ipc.client.tcpnodelay
Cheers
On Thu, Aug 14, 2014 at 11:41 AM, Thomas Kwan wrote:
> Hi Esteban,
>
> Thanks for sharing ideas.
>
> We are on HBase 0.96 and Java 1.6. I have enabled short-circuit read, ...
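For completeness, the property Ted mentions can also be set (or checked) programmatically on the client configuration; normally it simply goes into hbase-site.xml. A trivial sketch:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class TcpNoDelayCheck {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Defaults to true since 0.95/0.96, as noted above; override explicitly if needed.
        conf.setBoolean("hbase.ipc.client.tcpnodelay", true);
        System.out.println("tcpnodelay = " + conf.getBoolean("hbase.ipc.client.tcpnodelay", false));
    }
}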
Hi Esteban,
Thanks for sharing ideas.
We are on HBase 0.96 and Java 1.6. I have enabled short-circuit read,
and the heap size is around 16G for each region server. We have about 20
of them.
The list of rowkeys that I need to process is about 10M. I am using
batch gets already, and the batch size is ~
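For what it's worth, a minimal sketch of batched gets with the 0.94/0.96-era client API; the table, family, and row keys below are placeholders, not the real ones from this job.

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchGetSketch {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(HBaseConfiguration.create(), "mytable");  // placeholder table
        List<Get> batch = new ArrayList<Get>();
        for (String rowkey : new String[] { "row1", "row2", "row3" }) {     // e.g. keys read from HDFS
            Get get = new Get(Bytes.toBytes(rowkey));
            get.addFamily(Bytes.toBytes("cf"));                             // placeholder family
            batch.add(get);
        }
        // The client groups the gets by region server, so this is far fewer
        // round trips than one RPC per row.
        Result[] results = table.get(batch);
        for (Result r : results) {
            System.out.println(r.isEmpty() ? "missing" : Bytes.toString(r.getRow()));
        }
        table.close();
    }
}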
Hello Thomas,
What version of HBase are you using? Sorting and grouping based on the
regions the rows belong to is going to help for sure. I don't think you should
focus too much on the locality side of the problem unless your HDFS input set
is too large (100s or 1000s of MBs per task); otherwise it might b...
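One possible way to do the sorting/grouping by region that Esteban mentions, sketched with the RegionLocator API of 1.x+ clients (on 0.96, HTable#getRegionLocation can be used the same way); the table name is a placeholder.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class GroupRowsByRegion {
    // Group row keys by the start key of the region that hosts them, so each
    // batch of gets can be sent to a single region (and region server).
    public static Map<String, List<byte[]>> group(List<byte[]> rowkeys) throws Exception {
        Map<String, List<byte[]>> byRegion = new TreeMap<String, List<byte[]>>();
        Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
        try {
            RegionLocator locator = conn.getRegionLocator(TableName.valueOf("mytable")); // placeholder
            for (byte[] row : rowkeys) {
                String startKey = Bytes.toStringBinary(
                        locator.getRegionLocation(row).getRegionInfo().getStartKey());
                List<byte[]> bucket = byRegion.get(startKey);
                if (bucket == null) {
                    bucket = new ArrayList<byte[]>();
                    byRegion.put(startKey, bucket);
                }
                bucket.add(row);
            }
        } finally {
            conn.close();
        }
        return byRegion;
    }
}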
Hi there
I have a use-case where I need to do a read to check if an HBase entry
is present, then do a put to create the entry when it is not there.
I have a script to get a list of rowkeys from Hive and put them in an
HDFS directory. Then I have an MR job that reads the rowkeys and does
batch reads.
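Depending on the exact semantics needed, the read-then-put might also collapse into a single server-side operation with checkAndPut, which applies the Put only if the named cell does not already exist. A rough sketch with the older client API (all names are placeholders):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PutIfAbsentSketch {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(HBaseConfiguration.create(), "mytable");  // placeholder table
        byte[] row = Bytes.toBytes("row1");                                  // placeholder row key
        byte[] cf = Bytes.toBytes("cf");
        byte[] qual = Bytes.toBytes("q");
        Put put = new Put(row);
        put.add(cf, qual, Bytes.toBytes("value"));
        // A null expected value means "only if the cell does not exist", so the
        // existence check and the put happen atomically on the region server.
        boolean created = table.checkAndPut(row, cf, qual, null, put);
        System.out.println(created ? "created" : "already present");
        table.close();
    }
}

Note that checkAndPut works one row per call, so it trades the batching of multi-gets for an atomic check-and-create.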
Jonathan,
hbase(main):002:0> describe 'states'
DESCRIPTION                                                             ENABLED
 {NAME => 'states', FAMILIES => [{NAME => 'cf', BLOOMFILTER => 'NONE',  true
 REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
 MIN_VERSIONS => '0', TTL => '2147483647', ...
On Wed, Sep 26, 2012 at 12:01 PM, Jonathan Bishop wrote:
> Kevin,
>
> So, which configuration sets the HBase block size?
>
> I just tried the Hadoop short-circuit option and I see it does improve the
> performance, perhaps twice as fast, although it is hard to tell whether
> this was due to some other load on the network/machines changing.
Kevin,
So, which configuration sets the HBase block size?
I just tried the Hadoop short-circuit option and I see it does improve the
performance, perhaps twice as fast, although it is hard to tell whether
this was due to some other load on the network/machines changing.
Jon
Though I haven't personally tried it yet, I have been told that enabling
the short-circuit for local client reads is very effective at speeding
up random reads in HBase. More here:
https://issues.apache.org/jira/browse/HDFS-2246
We are using the Cloudera package, which includes this patch.
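For reference, the properties involved are roughly the following. They normally live in hdfs-site.xml on the datanodes and region server hosts rather than in code, and the exact set differs between the older HDFS-2246 implementation and the newer HDFS-347 one, so treat this as a sketch:

import org.apache.hadoop.conf.Configuration;

public class ShortCircuitReadSettings {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.setBoolean("dfs.client.read.shortcircuit", true);
        // HDFS-2246-era implementation: the reading user (e.g. the one running
        // the region servers) must be whitelisted on the datanodes.
        conf.set("dfs.block.local-path-access.user", "hbase");
        // HDFS-347-era implementation instead passes file descriptors over a
        // domain socket, e.g.:
        // conf.set("dfs.domain.socket.path", "/var/run/hadoop-hdfs/dn._PORT");
        System.out.println(conf.getBoolean("dfs.client.read.shortcircuit", false));
    }
}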
> large number of columns in a row?
>
> Thanks for the suggestions everyone.
>
> Jon
>
> On Wed, Sep 26, 2012 at 6:06 AM, Kevin O'dell wrote:
>
> > What block size are you using? Typically a smaller block size can
> > help with random reads
On Wed, Sep 26, 2012 at 9:05 AM, Jonathan Bishop wrote:
> I am using a block size in HDFS of 64MB - the default, I believe. I'll try
> something smaller, say 16MB or even 4MB.
>
> I'll also give bloom filters a try, but I don't believe that will help
> because I have so few columns. Isn't bloom filter...
...large number of columns in a row?
Thanks for the suggestions everyone.
Jon
On Wed, Sep 26, 2012 at 6:06 AM, Kevin O'dell wrote:
> What block size are you using? Typically a smaller block size can
> help with random reads, but will have a longer create time.
> -Kevin
What block size are you using? Typically a smaller block size can
help with random reads, but will have a longer create time.
-Kevin
On Wed, Sep 26, 2012 at 2:18 AM, Anoop Sam John wrote:
> Can you try with bloom filters? This can help in get()
>
Can you try with bloom filters? This can help in get()
-Anoop-
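A sketch of what enabling a row-level bloom filter on the family could look like with the Java admin API; the table and family names are taken from the describe output shown earlier, and on 0.92 the enum lives at StoreFile.BloomType rather than BloomType.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.regionserver.BloomType;
import org.apache.hadoop.hbase.util.Bytes;

public class EnableRowBloomFilter {
    public static void main(String[] args) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
        HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("states"));
        HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("cf"));
        cf.setBloomFilterType(BloomType.ROW);   // ROWCOL is also possible but costs more space
        admin.disableTable("states");
        admin.modifyColumn("states", cf);
        admin.enableTable("states");
        admin.close();
    }
}

As with the block size, bloom filters are only written into new HFiles, so they take effect as data is flushed and compacted.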
From: Jonathan Bishop [jbishop@gmail.com]
Sent: Wednesday, September 26, 2012 11:34 AM
To: user@hbase.apache.org
Subject: Tuning HBase for random reads
Hi,
I am running hbase-0.92.1 and
Yes. What I meant was a low number of slots on these TTs alone (those
that are co-located with the RS, if you want to do that), by having a
limited maximum of map and reduce slots configured on them specifically. Or,
if you use MR2 over YARN, you must limit the NodeManager's maximum
memory usage.
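Presumably the knobs being referred to are along these lines; the values are arbitrary examples, and in practice these settings go into mapred-site.xml / yarn-site.xml on the co-located nodes rather than into code.

import org.apache.hadoop.conf.Configuration;

public class LowSlotSettings {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // MR1: cap the map/reduce slots on TaskTrackers co-located with region servers.
        conf.setInt("mapred.tasktracker.map.tasks.maximum", 2);     // example value
        conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 1);  // example value
        // MR2/YARN: cap the total memory a NodeManager will hand out to containers.
        conf.setInt("yarn.nodemanager.resource.memory-mb", 4096);   // example value
        System.out.println(conf.get("yarn.nodemanager.resource.memory-mb"));
    }
}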
How would you define "low slotted"?
A reduced scheduling capacity, to avoid a high number of mappers?
On Sat, Aug 25, 2012 at 3:32 PM, Harsh J wrote:
> Hi Marc,
>
> On Sat, Aug 25, 2012 at 12:56 AM, Marc Sturlese wrote:
>> The reasons for that would be:
>> -After running full compaction, HFiles end up in the RS nodes, so we would
>> achieve data locality.
Hi Marc,
On Sat, Aug 25, 2012 at 12:56 AM, Marc Sturlese wrote:
> The reasons for that would be:
> -After running full compaction, HFiles end up in the RS nodes, so we would
> achieve data locality.
> -As I have replication factor 3 and just 2 HBase nodes, I know that no map
> task would try to read...
...different clusters)
...HBase, and comparing it with other distributed databases I know much
better. I am currently stressing my testing platform (servers with 32 GB RAM,
16 GB allocated to the HBase JVM) and I'm observing strange performance...
I'm putting in tons of well-spread data (100 tables of 100M rows in a single
column family) and then I'm performing random reads. I get good read
performance while the table does not have too much data in it, but in a big
table I only get around 100-300 qps. I'm not swapping, I don't see any long
pauses due to GC, and inser...
2011/6/24 Sateesh Lakkarsu
> I'll look into HDFS-347, but in terms of driving more reads through, does
> having more disks help, or would the RS be the bottleneck? Any thoughts on
> this, please?
>
Increasing the number of disks should increase your read throughput.
We did an experiment with 5 disks and 10 disks...
Yes.
If you have blown the cache, then getting more IOPS is good.
On Fri, Jun 24, 2011 at 4:08 PM, Sateesh Lakkarsu wrote:
> I'll look into HDFS-347, but in terms of driving more reads through, does
> having more disks help, or would the RS be the bottleneck? Any thoughts on
> this, please?
>
I'll look into HDFS-347, but in terms of driving more reads through, does
having more disks help, or would the RS be the bottleneck? Any thoughts on
this, please?
If you are defeating caching you will want to patch in HDFS-347.
Good luck!
On Fri, Jun 24, 2011 at 3:25 PM, Sateesh Lakkarsu wrote:
> The block cache was at the default 0.2 (i.e., 20% of heap), the IDs being
> looked up don't repeat, and each one has a lot of versions, so I'm not
> expecting cache hits - I was also seeing a lot of cache evictions as is.
The block cache was at the default 0.2 (i.e., 20% of heap), the IDs being
looked up don't repeat, and each one has a lot of versions, so I'm not
expecting cache hits - I was also seeing a lot of cache evictions as is. Can
we get better performance in such a scenario?
Does having more disks help, or would the RS be the bottleneck?
Lo...
We have been testing random reads, and from a 6-node cluster (1 NN, 5 DN, 1 HM,
5 RS, each with 48G and 5 disks) we are right now seeing a throughput of 1100
per sec per node. Most of the configs are default, except 4G for the RS,
*handler.count, and GC (
http://www.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase )
Under our load at SU, the new gen would grow to max size and take 800+ ms. I
would consider setting the ms goal to 20-40ms (what we get in prod now). At
1GB ParNew I would expect large pauses. Plus, in my previous tests the
promotion was like 75% even with a huge ParNew.
This is all based on my b...
On Fri, Feb 4, 2011 at 12:20 AM, Lars George wrote:
> I saw the -XX:MaxGCPauseMillis option too and assumed it is not that
> effective as it was never suggested so far. So it was simply not tried
> yet and someone has to be the guinea pig?
>
Yeah, I haven't had good experience with these upper-bound...
...increasing the RAM?
I am adding some more info about the app:
We are storing web page data in HBase. We don't plan to do scans.
We have LZO compression set on this column family. We were noticing 1500
reads when reading the page content. We have a column family which stores
just metadata of the page ("title" etc.); when reading this, the performance
is a whopping 12000 TPS.
We thought the issue could be because of the network bandwidth used between
HBase and the clients, so we disabled LZO compression on the column family
and started...
Sequential writes are always faster than random reads on disk. You want
caching. Lots of it :)
On Feb 3, 2011 10:24 PM, "charan kumar" wrote:
> Hello,
>
> I am using HBase 0.90.0 with hadoop-append, on a 30 m/c cluster (1950, 2
> CPU, 6 G).
>
> Writes peak at 5000 per second...
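For reference, the region server block cache fraction is controlled by hfile.block.cache.size (a fraction of the RS heap, 0.2 by default in that era). It is normally set in hbase-site.xml; the sketch below only names the knob. Keep in mind the block cache and memstore fractions together must leave room for everything else in the heap.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class BlockCacheFraction {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Example: give 40% of the region server heap to the block cache.
        conf.setFloat("hfile.block.cache.size", 0.4f);
        System.out.println(conf.getFloat("hfile.block.cache.size", 0.2f));
    }
}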
...doing the compression of the raw page on the client and decompressing it
when reading (LZO).
With this my write performance jumped up from 2000 to 5000 at peak.
With this approach, the servers are crashing... Not sure why only...
Hello,
I am using HBase 0.90.0 with hadoop-append, on a 30 m/c cluster (1950, 2
CPU, 6 G).
Writes peak at 5000 per second, but reads are only at 1000 QPS. We hash
the key for even distribution across regions. Any
recommendations/suggestions?
Thanks,
Charan
On Thu, Feb 3, 2011 at 12:13 PM, Jonathan Gray wrote:
> How much heap are you running on your RegionServers?
>
> 6GB of total RAM is on the low end. For high-throughput applications, I
> would recommend at least 6-8GB of heap (so 8+ GB of RAM).
-Original Message-
From: charan kumar [mailto:charan.ku...@gmail.com]
Sent: Thursday, February 03, 2011 11:47 AM
To: user@hbase.apache.org
Subject: Region Servers Crashing during Random Reads
Hello,
I am using HBase 0.90.0 with hadoop-append; h/w (Dell 1950, 2 CPU, 6 GB
RAM).
I had 9 Region Servers crash (out of 30) in a span of 30 minutes during
heavy reads. It looks like a GC / ZooKeeper connection timeout thingy to me.
I did all the recommended configuration from the HBase wiki... An...
On Fri, Sep 24, 2010 at 10:18 AM, Sharma, Avani wrote:
> My HBase is 0.20.6; Hadoop is 0.20.2.
> We have a 3-node cluster with master, namenode, jobtracker, tasktracker,
> datanode and regionserver on one machine, and the other two machines are
> tasktracker, datanode and regionserver.
> The heap
I am getting the below errors in my datanode and regionserver logs when doing
random reads from HBase tables using Stargate.
My HBase is 0.20.6; Hadoop is 0.20.2.
We have a 3-node cluster with master, namenode, jobtracker, tasktracker,
datanode and regionserver on one machine and the other two...