I have come back to the benchmark. I executed this command: ycsb run hbase -P workloada -p columnfamily=cf -p operationcount=100000 -threads 32
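Fanning that command out across ten clients can be scripted along these lines (a dry-run sketch that only prints the launch lines; the workload path and the log file names are assumptions, not something from this thread):

```shell
# Dry-run sketch: print one launch line per YCSB client instead of executing it.
# Assumes ycsb is on the PATH and workloads/workloada is the read-heavy workload.
N=10
i=1
while [ "$i" -le "$N" ]; do
  cmd="ycsb run hbase -P workloads/workloada -p columnfamily=cf -p operationcount=100000 -threads 32"
  echo "$cmd > client-$i.log &"
  i=$((i + 1))
done
```

To actually run the clients concurrently, replace the echo with `eval "$cmd" > "client-$i.log" &` and add a `wait` after the loop; the aggregate number is then the sum of the per-client [OVERALL] Throughput(ops/sec) lines in the logs.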
And I got a throughput of about 2000 ops/sec. What I did later was to
execute ten of those commands in parallel, and I got about 18000 ops/sec
in total: not 2000 ops/sec for each execution, but about 1800 ops/sec
each. I don't know if it's an HBase question, but I don't understand why
I get more performance by executing more commands in parallel when each
client already runs 32 threads. I took a look at "top": with just one
process the CPU was working at about 20-60%; when I launch more
processes the CPU is at about 400-500%.

2015-01-29 18:23 GMT+01:00 Guillermo Ortiz <[email protected]>:
> There's an option when you execute ycsb to say how many client
> threads you want to use. I tried with 1/8/16/32. Those results are
> with 16; the improvement from 1 to 8 is pretty high, not as much from
> 16 to 32. I only use one ycsb instance, could that be that important?
>
> -threads : the number of client threads. By default, the YCSB Client
> uses a single worker thread, but additional threads can be specified.
> This is often done to increase the amount of load offered against the
> database.
>
> 2015-01-29 17:27 GMT+01:00 Nishanth S <[email protected]>:
>> How many instances of ycsb do you run and how many threads do you use per
>> instance? I guess these ops are per instance and you should get similar
>> numbers if you run more instances. In short, try running more workload
>> instances...
>>
>> -Nishanth
>>
>> On Thu, Jan 29, 2015 at 8:49 AM, Guillermo Ortiz <[email protected]>
>> wrote:
>>
>>> Yes, I'm using 40%. I can't access those data either.
>>> I don't know how YCSB executes the reads, whether they are random and
>>> could take advantage of the cache.
>>>
>>> Do you think that it's an acceptable performance?
>>>
>>>
>>> 2015-01-29 16:26 GMT+01:00 Ted Yu <[email protected]>:
>>> > What's the value for hfile.block.cache.size ?
>>> >
>>> > By default it is 40%. You may want to increase its value if you're using
>>> > the default.
>>> >
>>> > Andrew published some ycsb results :
>>> > http://people.apache.org/~apurtell/results-ycsb-0.98.8/ycsb-0.98.0-vs-0.98.8.pdf
>>> >
>>> > However, I couldn't access the above now.
>>> >
>>> > Cheers
>>> >
>>> > On Thu, Jan 29, 2015 at 7:14 AM, Guillermo Ortiz <[email protected]>
>>> > wrote:
>>> >
>>> >> Is there any result for that benchmark to compare against?
>>> >> I'm executing the different workloads and, for example, for 100% reads
>>> >> on a table with 10 million records I only get a throughput of
>>> >> 2000 operations/sec. I hoped for much better performance, but I could
>>> >> be wrong. I'd like to know if that's normal performance or whether I
>>> >> have something badly configured.
>>> >>
>>> >> I have split the table, all the records are balanced, and I use
>>> >> Snappy.
>>> >> The cluster has a master and 4 region servers, each with 256 GB and 2
>>> >> CPUs (32 w/ Hyperthreading), running 0.98.6-cdh5.3.0.
>>> >>
>>> >> The RegionServer is executed with these parameters:
>>> >> /usr/java/jdk1.7.0_67-cloudera/bin/java -Dproc_regionserver
>>> >> -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m
>>> >> -Djava.net.preferIPv4Stack=true -Xms640679936 -Xmx640679936
>>> >> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
>>> >> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
>>> >> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh
>>> >> -Dhbase.log.dir=/var/log/hbase
>>> >> -Dhbase.log.file=hbase-cmf-hbase-REGIONSERVER-cnsalbsrvcl23.lvtc.gsnet.corp.log.out
>>> >> -Dhbase.home.dir=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hbase
>>> >> -Dhbase.id.str= -Dhbase.root.logger=INFO,RFA
>>> >> -Djava.library.path=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/lib/native
>>> >> -Dhbase.security.logger=INFO,RFAS
>>> >> org.apache.hadoop.hbase.regionserver.HRegionServer start
>>> >>
>>> >> The results for 100% reads are:
>>> >> [OVERALL], RunTime(ms), 42734.0
>>> >> [OVERALL], Throughput(ops/sec), 2340.0570973931763
>>> >> [UPDATE], Operations, 1.0
>>> >> [UPDATE], AverageLatency(us), 103170.0
>>> >> [UPDATE], MinLatency(us), 103168.0
>>> >> [UPDATE], MaxLatency(us), 103171.0
>>> >> [UPDATE], 95thPercentileLatency(ms), 103.0
>>> >> [UPDATE], 99thPercentileLatency(ms), 103.0
>>> >> [READ], Operations, 100000.0
>>> >> [READ], AverageLatency(us), 412.5534
>>> >> [READ], AverageLatency(us,corrected), 581.6249026771276
>>> >> [READ], MinLatency(us), 218.0
>>> >> [READ], MaxLatency(us), 268383.0
>>> >> [READ], MaxLatency(us,corrected), 268383.0
>>> >> [READ], 95thPercentileLatency(ms), 0.0
>>> >> [READ], 95thPercentileLatency(ms,corrected), 0.0
>>> >> [READ], 99thPercentileLatency(ms), 0.0
>>> >> [READ], 99thPercentileLatency(ms,corrected), 0.0
>>> >> [READ], Return=0, 100000
>>> >> [CLEANUP], Operations, 1.0
>>> >> [CLEANUP], AverageLatency(us), 103598.0
>>> >> [CLEANUP], MinLatency(us), 103596.0
>>> >> [CLEANUP], MaxLatency(us), 103599.0
>>> >> [CLEANUP], 95thPercentileLatency(ms), 103.0
>>> >> [CLEANUP], 99thPercentileLatency(ms), 103.0
>>> >>
>>> >> hbase(main):030:0> describe 'username'
>>> >> DESCRIPTION                                                     ENABLED
>>> >> 'username', {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE',       true
>>> >> BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0',
>>> >> VERSIONS => '1', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0',
>>> >> TTL => 'FOREVER', KEEP_DELETED_CELLS => 'false',
>>> >> BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
>>> >> 1 row(s) in 0.0170 seconds
>>> >>
>>> >> 2015-01-29 5:27 GMT+01:00 Ted Yu <[email protected]>:
>>> >> > Maybe ask on the Cassandra mailing list for the benchmark tool they use?
>>> >> >
>>> >> > Cheers
>>> >> >
>>> >> > On Wed, Jan 28, 2015 at 1:23 PM, Guillermo Ortiz <[email protected]>
>>> >> > wrote:
>>> >> >
>>> >> >> I was checking that web page; do you know if there's another
>>> >> >> possibility, since the last update for Cassandra was two years ago
>>> >> >> and I'd like to compare both of them with kind of the same tool/code.
>>> >> >>
>>> >> >> 2015-01-28 22:10 GMT+01:00 Ted Yu <[email protected]>:
>>> >> >> > Guillermo:
>>> >> >> > If you use hbase 0.98.x, please consider Andrew's ycsb repo:
>>> >> >> >
>>> >> >> > https://github.com/apurtell/ycsb/tree/new_hbase_client
>>> >> >> >
>>> >> >> > Cheers
>>> >> >> >
>>> >> >> > On Wed, Jan 28, 2015 at 12:41 PM, Nishanth S <[email protected]>
>>> >> >> > wrote:
>>> >> >> >
>>> >> >> >> You can use ycsb for this purpose. See here:
>>> >> >> >>
>>> >> >> >> https://github.com/brianfrankcooper/YCSB/wiki/Getting-Started
>>> >> >> >>
>>> >> >> >> -Nishanth
>>> >> >> >>
>>> >> >> >> On Wed, Jan 28, 2015 at 1:37 PM, Guillermo Ortiz <[email protected]>
>>> >> >> >> wrote:
>>> >> >> >>
>>> >> >> >> > Hi,
>>> >> >> >> >
>>> >> >> >> > I'd like to do some benchmarks of HBase but I don't know what
>>> >> >> >> > tool I could use. I started to write some code but I guess
>>> >> >> >> > there are easier options.
>>> >> >> >> >
>>> >> >> >> > I've taken a look at JMeter, but I guess that I'd attack
>>> >> >> >> > directly from Java; JMeter looks great but I don't know if it
>>> >> >> >> > fits well in this scenario. What tool could I use to take some
>>> >> >> >> > measures such as the time to respond to read and write
>>> >> >> >> > requests, etc.? I'd like to be able to run the same benchmarks
>>> >> >> >> > against Cassandra.
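For reference, the hfile.block.cache.size setting discussed above is the fraction of the region server heap given to the block cache (0.4 by default) and is configured in hbase-site.xml. A sketch only; 0.55 is an illustrative value, not a recommendation, and the block cache plus memstore fractions must still fit within the heap:

```xml
<!-- hbase-site.xml on each region server; the setting takes effect on restart -->
<property>
  <name>hfile.block.cache.size</name>
  <!-- fraction of the heap used for the block cache (default 0.4) -->
  <value>0.55</value>
</property>
```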
