Re: Understanding HBase random reads

2016-07-04 Thread Stack
DFS block. That's right. > Given that HDFS doesn't support random reads within a block, how is that possible? It does support reading at an explicit offset. See [1] and the pread method that follows. > Or does HBase somehow short circuit and go directly to the OS, bypassing HDFS
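A sketch of the positioned-read ("pread") idea Stack describes: the reader asks for bytes at an explicit offset rather than streaming from the start of the block. This uses POSIX `pread` via Python's `os.pread` as a stand-in for the positional read on HDFS's `FSDataInputStream`; the file layout and offsets are illustrative assumptions, not anything from the thread.

```python
import os
import tempfile

# A local file standing in for a large HDFS block: 64KB of padding,
# an 11-byte "HBase block", then more padding.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"A" * 65536 + b"HBASE-BLOCK" + b"B" * 65536)
    path = f.name

fd = os.open(path, os.O_RDONLY)
try:
    # os.pread(fd, n, offset) reads n bytes at the given offset and
    # does NOT move the descriptor's current position -- so a small
    # region can be fetched without reading the 64MB before it.
    chunk = os.pread(fd, 11, 65536)
finally:
    os.close(fd)
    os.unlink(path)

print(chunk)  # b'HBASE-BLOCK'
```

The HDFS client adds network hops and checksum verification on top of this, but the offset-addressed read is the core mechanism that lets HBase pull one HFile block out of the middle of a DFS block.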

Understanding HBase random reads

2016-07-04 Thread Robert James
I'd like to understand HBase block reads better. Assume my HBase block is 64KB and my HDFS block is 64MB. I've read that HBase can just do a random read of the 64KB block, without reading the 64MB HDFS block. Given that HDFS doesn't support random reads within a block, how is th

Re: random reads

2014-08-15 Thread Anoop John
What are your KV size and HFile block size for the table? For a random-read type of use case, a lower value for the HFile block size might help. -Anoop- On Fri, Aug 15, 2014 at 1:56 AM, Esteban Gutierrez wrote: > If not set in hbase-site.xml both tcpnodelay and tcpkeepalive are set to > true (th
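A back-of-envelope sketch of why Anoop's advice helps: an uncached random Get must read at least one whole HFile block off disk, so small cells in large blocks mean mostly wasted I/O. The KV size here is an illustrative assumption.

```python
def bytes_read_per_get(kv_size, block_size):
    """Minimum bytes fetched from disk for one uncached random Get:
    at least one full HFile block (or the cell itself, if larger)."""
    return max(kv_size, block_size)

kv = 200  # bytes per cell -- an assumed, illustrative value
for block in (65536, 16384, 4096):
    total = bytes_read_per_get(kv, block)
    print(f"block={block // 1024}KB -> read {total} bytes, {total - kv} wasted")
```

The trade-off, noted later in these threads, is that smaller blocks mean a larger block index and slower file creation, so this only pays off for genuinely random, small-value read patterns.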

Re: random reads

2014-08-14 Thread Esteban Gutierrez
If not set in hbase-site.xml, both tcpnodelay and tcpkeepalive are set to true (that's the default behavior since 0.95/0.96). Have you noticed whether the call processing times or the call queue are too high? What does I/O look like when you try these random gets? Are those gets going 100% of the time t

Re: random reads

2014-08-14 Thread Ted Yu
Thomas: Have you set tcpnodelay to true ? See http://hbase.apache.org/book.html for explanation of hbase.ipc.client.tcpnodelay Cheers On Thu, Aug 14, 2014 at 11:41 AM, Thomas Kwan wrote: > Hi Esteban, > > Thanks for sharing ideas. > > We are on Hbase 0.96 and java 1.6. I have enabled short-ci
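For reference, the property Ted cites lives in hbase-site.xml. A minimal fragment, assuming the default is being overridden on an older release (on 0.95+ it is already true by default, per Esteban's reply):

```xml
<!-- hbase-site.xml: disable Nagle's algorithm on client RPC sockets
     so small random-read RPCs are not delayed for batching. -->
<property>
  <name>hbase.ipc.client.tcpnodelay</name>
  <value>true</value>
</property>
```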

Re: random reads

2014-08-14 Thread Thomas Kwan
Hi Esteban, Thanks for sharing ideas. We are on HBase 0.96 and Java 1.6. I have enabled short-circuit read, and the heap size is around 16G for each region server. We have about 20 of them. The list of rowkeys that I need to process is about 10M. I am using batch gets already and the batch size is ~

Re: random reads

2014-08-14 Thread Esteban Gutierrez
Hello Thomas, What version of HBase are you using? Sorting and grouping based on the regions of the rows is going to help for sure. I don't think you should focus too much on the locality side of the problem unless your HDFS input set is too large (100s or 1000s of MBs per task), otherwise it might b

random reads

2014-08-14 Thread Thomas Kwan
Hi there, I have a use-case where I need to do a read to check if an HBase entry is present, then do a put to create the entry when it is not there. I have a script to get a list of rowkeys from hive and put them on a HDFS directory. Then I have a MR job that reads the rowkeys and does batch reads.
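The batching Thomas describes can be sketched as simple chunking of the rowkey list; sorting first means each batch tends to hit fewer regions, which is the grouping Esteban recommends later in the thread. Key names and the batch size are illustrative assumptions.

```python
def batches(keys, size):
    """Yield successive chunks of rowkeys for batched Gets.
    Sorting first groups keys that fall in the same region."""
    keys = sorted(keys)
    for i in range(0, len(keys), size):
        yield keys[i:i + size]

# 2500 assumed rowkeys, batched 1000 at a time.
rowkeys = [f"row-{i:07d}" for i in range(2500)]
sizes = [len(b) for b in batches(rowkeys, 1000)]
print(sizes)  # [1000, 1000, 500]
```

In the real job each chunk would become one multi-get RPC (a list of Gets passed to a single table.get() call); this sketch only shows the client-side slicing.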

Re: Tuning HBase for random reads

2012-09-26 Thread Kevin O'dell
Jonathan, hbase(main):002:0> describe 'states' DESCRIPTION (ENABLED => true): {NAME => 'states', FAMILIES => [{NAME => 'cf', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647

Re: Tuning HBase for random reads

2012-09-26 Thread Stack
On Wed, Sep 26, 2012 at 12:01 PM, Jonathan Bishop wrote: > Kevin, > > So, setting HBase block size is which configuration? > > Just tried the hadoop shortcircuit option and I see it does improve the > performance, perhaps twice as fast, although it is hard to tell whether > this was due to some ot

Re: Tuning HBase for random reads

2012-09-26 Thread Jonathan Bishop
Kevin, So, setting HBase block size is which configuration? Just tried the hadoop shortcircuit option and I see it does improve the performance, perhaps twice as fast, although it is hard to tell whether this was due to some other load on the network/machines changing. Jon

Re: Tuning HBase for random reads

2012-09-26 Thread Paul Mackles
Though I haven't personally tried it yet, I have been told that enabling the shortcut for local-client reads is very effective at speeding up random reads in hbase. More here: https://issues.apache.org/jira/browse/HDFS-2246 We are using the cloudera package which includes this pat
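For reference, a hedged sketch of the short-circuit configuration. The original HDFS-2246 patch Paul links used `dfs.block.local-path-access.user`; the later HDFS-347 rework (now the standard form) uses a domain socket instead. The socket path below is an assumed, conventional value:

```xml
<!-- hdfs-site.xml: let a client co-located with the DataNode read
     block files directly from local disk, bypassing the DataNode's
     data-transfer protocol (HDFS-347 form). -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
```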

Re: Tuning HBase for random reads

2012-09-26 Thread Kevin O'dell
> large number of columns in a row? > > Thanks for the suggestions everyone. > > Jon > > On Wed, Sep 26, 2012 at 6:06 AM, Kevin O'dell >wrote: > > > What is your block size you are using? Typically a smaller block size > can > > help with random read

Re: Tuning HBase for random reads

2012-09-26 Thread Stack
On Wed, Sep 26, 2012 at 9:05 AM, Jonathan Bishop wrote: > I am using block size in HDFS of 64MB - the default I believe. I'll try > something smaller, say 16MB or even 4MB. > > I'll also give bloom filters a try, but I don't believe that will help > because I have so few columns. Isn't bloom filte

Re: Tuning HBase for random reads

2012-09-26 Thread Jonathan Bishop
of columns in a row? Thanks for the suggestions everyone. Jon On Wed, Sep 26, 2012 at 6:06 AM, Kevin O'dell wrote: > What block size are you using? Typically a smaller block size can > help with random reads, but will have a longer create time. > -Kevin > > On Wed,

Re: Tuning HBase for random reads

2012-09-26 Thread Kevin O'dell
What block size are you using? Typically a smaller block size can help with random reads, but will have a longer create time. -Kevin On Wed, Sep 26, 2012 at 2:18 AM, Anoop Sam John wrote: > Can you try with bloom filters? This can help in get() >

RE: Tuning HBase for random reads

2012-09-25 Thread Anoop Sam John
Can you try with bloom filters? This can help in get() -Anoop- From: Jonathan Bishop [jbishop@gmail.com] Sent: Wednesday, September 26, 2012 11:34 AM To: user@hbase.apache.org Subject: Tuning HBase for random reads Hi, I am running hbase-0.92.1 and
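A toy sketch of why Anoop's suggestion helps Get(): a store file whose Bloom filter reports "absent" need not be opened at all, so a random read touches fewer files. This is a from-scratch illustration, not HBase's implementation (HBase uses ROW/ROWCOL blooms per HFile); sizes and hash counts are assumptions.

```python
import hashlib

class TinyBloom:
    """Minimal Bloom filter: set membership with no false negatives
    and a tunable false-positive rate."""
    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes = bits, hashes
        self.bitmap = 0

    def _positions(self, key):
        # Derive k independent bit positions from salted MD5 digests.
        for i in range(self.hashes):
            h = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.bits

    def add(self, key):
        for p in self._positions(key):
            self.bitmap |= 1 << p

    def might_contain(self, key):
        # False means definitely absent -> skip this store file.
        return all(self.bitmap >> p & 1 for p in self._positions(key))

bloom = TinyBloom()
for row in ("row-1", "row-2", "row-3"):
    bloom.add(row)
print(bloom.might_contain("row-2"))    # True -- was added
print(bloom.might_contain("row-999"))  # False, barring a rare false positive
```

With many store files per region, this "skip the file" answer is what turns a Get from several disk seeks into one.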

Re: RS, TT, shared DN and good performance on random Hbase random reads.

2012-08-25 Thread Harsh J
Yes. What I meant was to keep the number of slots low on these TTs alone (those that are co-located with an RS, if you want to do that), by configuring a limited maximum of map and reduce slots on them specifically. Or if you use MR2 over YARN, you must limit the NodeManager's maximum memory usage. On Sat, Aug

Re: RS, TT, shared DN and good performance on random Hbase random reads.

2012-08-25 Thread Adrien Mogenet
How would you define "low slotted"? A reduced scheduling capacity to avoid a high number of mappers? On Sat, Aug 25, 2012 at 3:32 PM, Harsh J wrote: > Hi Marc, > > On Sat, Aug 25, 2012 at 12:56 AM, Marc Sturlese > wrote: >> The reasons for that would be: >> -After running full compaction, HFiles

Re: RS, TT, shared DN and good performance on random Hbase random reads.

2012-08-25 Thread Harsh J
Hi Marc, On Sat, Aug 25, 2012 at 12:56 AM, Marc Sturlese wrote: > The reasons for that would be: > -After running full compaction, HFiles end up in the RS nodes, so would > achieve data locality. > -As I have replication factor 3 and just 2 Hbase nodes, I know that no map > task would try to read

RS, TT, shared DN and good performance on random Hbase random reads.

2012-08-24 Thread Marc Sturlese
ferent clusters)

Re: Slow random reads, SocketTimeoutExceptions

2012-07-22 Thread Minh Duc Nguyen
se and comparing it with another distributed database I know much better. I am currently stressing my testing platform (servers with 32 GB RAM, 16 GB allocated to the HBase JVM) and I'm observing strange performance... I'm putting in tons of well-spread data (100 Tables

Re: Slow random reads, SocketTimeoutExceptions

2012-07-11 Thread Adrien Mogenet
latform (servers with 32 GB RAM, 16 GB allocated to the HBase JVM) and I'm observing strange performance... I'm putting in tons of well-spread data (100 tables of 100M rows in a single column family) and then I'm performing random reads. I get good read performance whi

Re: Slow random reads, SocketTimeoutExceptions

2012-07-11 Thread Asaf Mesika
with another distributed database I know much better. I am currently stressing my testing platform (servers with 32 GB RAM, 16 GB allocated to the HBase JVM) and I'm observing strange performance... I'm putting in tons of well-spread data (100 tables of 100M rows in a single column family) and then I

Slow random reads, SocketTimeoutExceptions

2012-07-11 Thread Adrien Mogenet
(100 tables of 100M rows in a single column family) and then I'm performing random reads. I get good read performance while the table does not have too much data in it, but on a big table I only get around 100-300 QPS. I'm not swapping, I don't see any long pauses due to GC, and inser

Re: Random Reads throughput/performance

2011-06-24 Thread lohit
2011/6/24 Sateesh Lakkarsu > I'll look into HDFS-347, but in terms of driving more reads thru, does > having more discs help? or would RS be the bottleneck? Any thoughts on this > plz? > Increasing the number of disks should increase your read throughput. We did an experiment with 5 disks and 10 dis

Re: Random Reads throughput/performance

2011-06-24 Thread Ted Dunning
Yes. If you have blown the cache, then getting more IOPS is good. On Fri, Jun 24, 2011 at 4:08 PM, Sateesh Lakkarsu wrote: > I'll look into HDFS-347, but in terms of driving more reads thru, does > having more discs help? or would RS be the bottleneck? Any thoughts on this > plz? >

Re: Random Reads throughput/performance

2011-06-24 Thread Sateesh Lakkarsu
I'll look into HDFS-347, but in terms of driving more reads thru, does having more discs help? or would RS be the bottleneck? Any thoughts on this plz?

Re: Random Reads throughput/performance

2011-06-24 Thread Ryan Rawson
If you are defeating caching you will want to patch in HDFS-347. Good luck! On Fri, Jun 24, 2011 at 3:25 PM, Sateesh Lakkarsu wrote: > block cache was at default 0.2%, the id's being looked up don't repeat and > each one has a lot of versions, so not expecting cache hits - also was > seeing a l

Re: Random Reads throughput/performance

2011-06-24 Thread Sateesh Lakkarsu
block cache was at the default 0.2 (20% of heap); the IDs being looked up don't repeat, and each one has a lot of versions, so I'm not expecting cache hits. I was also seeing a lot of cache evictions as is. Can we get better performance in such a scenario? Does having more discs help? or would RS be the bottleneck? Lo

Re: Random Reads throughput/performance

2011-06-24 Thread Ted Dunning
been testing random reads and from a 6-node cluster (1NN, 5DN, 1HM, 5RS each with 48G, 5 disks) right now seeing a throughput of 1100 per sec per node. Most of the configs are default, except 4G for RS, *handler.count and gc ( http://www.c

Re: Random Reads throughput/performance

2011-06-24 Thread lohit
2011/6/23 Sateesh Lakkarsu > We have been testing random reads and from a 6 node cluster (1NN, 5DN, 1HM, > 5RS each with 48G, 5 disks) right now seeing a throughput of 1100 per sec > per node. Most of the configs are default, except 4G for RS, *handler.count > and g

Random Reads throughput/performance

2011-06-23 Thread Sateesh Lakkarsu
We have been testing random reads, and from a 6-node cluster (1NN, 5DN, 1HM, 5RS each with 48G, 5 disks) we are right now seeing a throughput of 1100 per sec per node. Most of the configs are default, except 4G for RS, *handler.count and gc ( http://www.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase

Re: Region Servers Crashing during Random Reads

2011-02-04 Thread Ryan Rawson
Under our load at SU, the new gen would grow to max size and take 800+ ms. I would consider setting the ms goal to 20-40 ms (what we get in prod now). At 1 GB ParNew I would expect large pauses. Plus, in my previous tests the promotion rate was around 75% even with a huge ParNew. This is all based on my b
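A hedged sketch of the flag set this exchange is circling around (CMS with a modest ParNew and a low pause goal); every value is an assumption to be tuned against your own GC logs, and note Stack's caution below about the pause-goal flag:

```sh
# Illustrative region server JVM options (hbase-env.sh), 2011-era
# CMS/ParNew tuning. Values are assumptions, not recommendations.
export HBASE_REGIONSERVER_OPTS="-Xmx8g \
  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
  -XX:MaxNewSize=128m -XX:MaxGCPauseMillis=40 \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -verbose:gc -XX:+PrintGCDetails"
```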

Re: Region Servers Crashing during Random Reads

2011-02-04 Thread Stack
On Fri, Feb 4, 2011 at 12:20 AM, Lars George wrote: > I saw the -XX:MaxGCPauseMillis option too and assumed it is not that > effective as it was never suggested so far. So it was simply not tried > yet and someone has to be the guinea pig? > Yeah, haven't had good experience with these upper-boun

Re: Region Servers Crashing during Random Reads

2011-02-04 Thread Lars George
> increasing the RAM? > I am adding some more info about the app. > We are storing web page data in HBase.

Re: Region Servers Crashing during Random Reads

2011-02-04 Thread Todd Lipcon
nt plan to do scans... > We have LZOCompression set on this column family. > We were noticing 1500 Reads, when reading the page conte

Re: Region Servers Crashing during Random Reads

2011-02-03 Thread Lars George
olumn family. > We were noticing 1500 Reads, when reading the page content. > We have a column family which stores just metadata of the page ("title" etc.). When reading this the perf

Re: Region Servers Crashing during Random Reads

2011-02-03 Thread Stack
performance is a whopping 12000 TPS. > We thought the issue could be because of network bandwidth used between HBase and clients. So we disabled LZO compression on the column family and started

Re: Why Random Reads are much slower than the Writes

2011-02-03 Thread Ryan Rawson
Sequential writes are always faster than random reads on disk. You want caching. Lots of it :) On Feb 3, 2011 10:24 PM, "charan kumar" wrote: > Hello, > > I am using Hbase 0.90.0 with hadoop-append. on a 30 m/c cluster (1950, 2 > CPU, 6 G). > > Writes peak at 5000 per s
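The caching Ryan means is HBase's in-memory block cache: hot HFile blocks are served from memory so a random read skips the disk seek entirely. A minimal LRU sketch of the idea (HBase's LruBlockCache is more elaborate, with priority tiers; keys and capacity here are assumptions):

```python
from collections import OrderedDict

class LruBlockCache:
    """Minimal LRU cache: recently used blocks stay, the coldest one
    is evicted when capacity is exceeded."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def get(self, key):
        if key not in self.blocks:
            return None                 # miss -> caller reads from HDFS
        self.blocks.move_to_end(key)    # mark as most recently used
        return self.blocks[key]

    def put(self, key, block):
        self.blocks[key] = block
        self.blocks.move_to_end(key)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used

cache = LruBlockCache(capacity=2)
cache.put("hfile1:0", b"block-a")
cache.put("hfile1:64k", b"block-b")
cache.get("hfile1:0")               # touch -> now most recent
cache.put("hfile2:0", b"block-c")   # over capacity -> evicts hfile1:64k
print(cache.get("hfile1:64k"))      # None
print(cache.get("hfile1:0"))        # b'block-a'
```

This is also why the "defeating caching" workloads elsewhere in these threads (non-repeating keys, evictions) fall back to raw disk IOPS.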

Re: Region Servers Crashing during Random Reads

2011-02-03 Thread charan kumar
> doing the compression of the raw page on the client and decompressing it when reading (LZO). > With this my write performance jumped up from 2000 to 5000 at peak. > With this approach, th

Why Random Reads are much slower than the Writes

2011-02-03 Thread charan kumar
Hello, I am using HBase 0.90.0 with hadoop-append, on a 30-machine cluster (1950, 2 CPU, 6 G). Writes peak at 5000 per second, but reads are only at 1000 QPS. We hash the key for even distribution across regions. Any recommendations/suggestions? Thanks, Charan
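The key hashing Charan mentions is commonly done by salting the rowkey with a short hash prefix. A sketch of one way to do it (the prefix length and key names are assumptions; the trade-off is that contiguous range scans over the natural key order are lost):

```python
import hashlib

def hashed_rowkey(key: str) -> str:
    """Prefix the natural key with 4 hex chars of its MD5 so rows
    spread evenly across regions instead of hot-spotting one."""
    prefix = hashlib.md5(key.encode()).hexdigest()[:4]
    return f"{prefix}-{key}"

keys = [hashed_rowkey(f"user{i}") for i in range(3)]
print(keys)  # three salted keys, each with a 4-hex-char prefix
```

Because the hash is deterministic, point Gets still work: recompute the prefix from the natural key at read time.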

Re: Region Servers Crashing during Random Reads

2011-02-03 Thread Todd Lipcon
the raw page on the client and decompressing it when reading (LZO). > With this my write performance jumped up from 2000 to 5000 at peak. > With this approach, the servers are crashing... Not sure why only

Re: Region Servers Crashing during Random Reads

2011-02-03 Thread charan kumar
On Thu, Feb 3, 2011 at 12:13 PM, Jonathan Gray wrote: > How much heap are you running on your RegionServers? > 6GB of total RAM is on the low end. For high-throughput applications, I would recommend at least 6-8GB of

Re: Region Servers Crashing during Random Reads

2011-02-03 Thread Charan K
total RAM is on the low end. For high-throughput applications, I would recommend at least 6-8GB of heap (so 8+ GB of RAM). > -Original Message- > From: charan kumar [mailto:charan.ku...@gmail.com] > Sent: Thursday, Febru

Re: Region Servers Crashing during Random Reads

2011-02-03 Thread Todd Lipcon
M is on the low end. For high-throughput applications, I would recommend at least 6-8GB of heap (so 8+ GB of RAM). > -Original Message- > From: charan kumar [mailto:charan.ku...@gmail.com] > Sent: Thursday, February 03, 2011 11

Re: Region Servers Crashing during Random Reads

2011-02-03 Thread charan kumar
-Original Message- > From: charan kumar [mailto:charan.ku...@gmail.com] > Sent: Thursday, February 03, 2011 11:47 AM > To: user@hbase.apache.org > Subject: Region Servers Crashing during Random Reads > Hello, > I am using hbase 0.9

RE: Region Servers Crashing during Random Reads

2011-02-03 Thread Jonathan Gray
bruary 03, 2011 11:47 AM > To: user@hbase.apache.org > Subject: Region Servers Crashing during Random Reads > > Hello, > > I am using hbase 0.90.0 with hadoop-append. h/w ( Dell 1950, 2 CPU, 6 GB > RAM) > > I had 9 Region Servers crash (out of 30) in a span of 30 m

Region Servers Crashing during Random Reads

2011-02-03 Thread charan kumar
Hello, I am using hbase 0.90.0 with hadoop-append. h/w ( Dell 1950, 2 CPU, 6 GB RAM) I had 9 Region Servers crash (out of 30) in a span of 30 minutes during heavy reads. It looks like a GC, ZooKeeper Connection Timeout thingy to me. I did all the recommended configuration from the HBase wiki... An
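The failure mode Charan describes (a GC pause outlasting the ZooKeeper session, so the master declares the RS dead) is often given headroom with the session timeout. A hedged fragment; the value is an illustrative assumption, and raising it is a band-aid, as the GC pauses discussed in the replies still need tuning:

```xml
<!-- hbase-site.xml: time a region server may be unresponsive (e.g.
     in a stop-the-world GC) before ZooKeeper expires its session. -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>120000</value>
</property>
```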

Re: DataXceiver problem slowing down random reads

2010-09-24 Thread Stack
On Fri, Sep 24, 2010 at 10:18 AM, Sharma, Avani wrote: > My HBase is 0.20.6. hadoop is 0.20.2. > We have a 3 node cluster with master, namenode, jobtracker, tasktracker, > datanode and regionserver on one machine and the other two machines are > tasktracker, datanode and regionserver. > The heap

RE: DataXceiver problem slowing down random reads

2010-09-24 Thread Sharma, Avani
reads I am getting the below errors in my datanode and regionserver logs when doing random reads from HBase tables using Stargate. My HBase is 0.20.6. hadoop is 0.20.2. We have a 3 node cluster with master, namenode, jobtracker, tasktracker, datanode and regionserver on one machine and the

DataXceiver problem slowing down random reads

2010-09-23 Thread Sharma, Avani
I am getting the below errors in my datanode and regionserver logs when doing random reads from HBase tables using Stargate. My HBase is 0.20.6. hadoop is 0.20.2. We have a 3 node cluster with master, namenode, jobtracker, tasktracker, datanode and regionserver on one machine and the other two