Re: Confirming a Bug

2012-03-19 Thread Lars George
Hi Peter, Lars #1 here again :) That is fine, the caching is done transparently for you. But what I also suggest is counting the number of KeyValues you get back, just to confirm. In other words, iterate over the result and check how many actual KVs you get back. The reason I am asking is
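
A minimal sketch of that KV-counting check against the 0.92-era client API (the table name and caching value are placeholders, not Peter's actual setup):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class CountKVs {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(HBaseConfiguration.create(), "testtable"); // hypothetical table
        Scan scan = new Scan();
        scan.setCaching(100);                  // the caching value under test
        ResultScanner scanner = table.getScanner(scan);
        int results = 0, kvs = 0;
        for (Result r : scanner) {
          results++;
          kvs += r.raw().length;               // actual KeyValues in this row
        }
        scanner.close();
        table.close();
        System.out.println("results=" + results + ", keyvalues=" + kvs);
      }
    }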

0.92 and Read/writes not scaling

2012-03-19 Thread Juhani Connolly
Hi, We're running into a brick wall where our throughput numbers will not scale as we increase server counts, using both custom in-house tests and YCSB. We're using HBase 0.92 on Hadoop 0.20.2 (we also experienced the same issues using 0.90 before switching our testing to this version). Our

RE: 0.92 and Read/writes not scaling

2012-03-19 Thread Ramkrishna.S.Vasudevan
Hi Juhani, Can you tell us more about how the regions are balanced? Are you overloading only a specific region server? Regards Ram -Original Message- From: Juhani Connolly [mailto:juha...@gmail.com] Sent: Monday, March 19, 2012 4:11 PM To: user@hbase.apache.org Subject: 0.92 and

Re: 0.92 and Read/writes not scaling

2012-03-19 Thread Juhani Connolly
Our custom tests are randomly distributed over 64-bit keys. The YCSB tests use the Zipfian request distribution (an uneven distribution that hits certain rows more frequently). Monitoring the web interface, most of the time the load is pretty even (though occasionally a region will briefly stop

Re: 0.92 and Read/writes not scaling

2012-03-19 Thread Mingjian Deng
@Juhani: How many clients did you test with? Maybe the bottleneck is the client? 2012/3/19 Ramkrishna.S.Vasudevan ramkrishna.vasude...@huawei.com Hi Juhani Can you tell more on how the regions are balanced? Are you overloading only specific region server alone? Regards Ram -Original

Re: 0.92 and Read/writes not scaling

2012-03-19 Thread Christian Schäfer
Referring to my own experience, I expect the client to be the bottleneck, too. So try to increase the count of client machines (not client threads), each with its own unshared network interface. In my case I could double write throughput by doubling the client machine count with a much smaller system

RE: 0.92 and Read/writes not scaling

2012-03-19 Thread Ramkrishna.S.Vasudevan
Hi, In our experience, rather than increasing threads, increase the number of clients. Increasing the client count has given us better throughput. Regards Ram -Original Message- From: Juhani Connolly [mailto:juha...@gmail.com] Sent: Monday, March 19, 2012 5:33 PM To:

Re: 0.92 and Read/writes not scaling

2012-03-19 Thread Juhani Connolly
Actually, we did try running off two machines, both running our own tests in parallel. Unfortunately the load just split between them, resulting in the same total throughput. We also did the same thing with iperf running from each machine to another machine, indicating 800Mb of additional throughput between

ethernet channel bonding experiences

2012-03-19 Thread Oliver Meyn (GBIF)
Hi all, I've been experimenting with PerformanceEvaluation over the last few weeks, and on a whim thought I'd give channel bonding a try to see whether network bandwidth was acting as the bottleneck. It would seem that it's not quite as trivial as it sounds, so I'm looking for other
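
For anyone trying the same thing, a rough sketch of what the bonding setup looks like on a RHEL/CentOS-style box (device names, addresses, and the bonding mode are illustrative assumptions; 802.3ad in particular needs matching switch-side configuration, which may be part of why it isn't trivial):

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    BOOTPROTO=static
    IPADDR=10.0.0.11
    NETMASK=255.255.255.0
    ONBOOT=yes
    BONDING_OPTS="mode=802.3ad miimon=100"   # LACP: requires switch support

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for each slave NIC)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none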

Re: There is no data value information in HLog?

2012-03-19 Thread Ted Yu
Hi, Have you noticed this in HLogPrettyPrinter? options.addOption("p", "printvals", false, "Print values"); Looks like you should have specified the above option. On Mon, Mar 19, 2012 at 7:31 AM, yonghu yongyong...@gmail.com wrote: Hello, I used the $ ./bin/hbase
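
So the invocation would look something like the following (the class path is my assumption for 0.92; the -p/--printvals flag comes from the addOption line above, and the log path is a placeholder):

    $ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLogPrettyPrinter \
        --printvals /hbase/.logs/<regionserver>/<hlog-file>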

Re: 0.92 and Read/writes not scaling

2012-03-19 Thread Matt Corgan
I'd be curious to see what happens if you split the table into 1 region per CPU core, so 24 cores * 11 servers = 264 regions. Each region has 1 memstore which is a ConcurrentSkipListMap, and you're currently hitting each CSLM with 8 cores which might be too contentious. Normally in production
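
A minimal sketch of that pre-splitting with the 0.92 HBaseAdmin API (table and family names are placeholders; the start/end keys assume the random 64-bit key space mentioned earlier in the thread):

    import java.util.Arrays;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class PreSplitTable {
      public static void main(String[] args) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
        HTableDescriptor desc = new HTableDescriptor("testtable"); // placeholder name
        desc.addFamily(new HColumnDescriptor("f1"));               // placeholder family
        byte[] start = new byte[8];                 // 0x00...00
        byte[] end = new byte[8];
        Arrays.fill(end, (byte) 0xFF);              // 0xFF...FF
        // 24 cores * 11 servers = 264 regions, evenly split across the key space
        admin.createTable(desc, start, end, 264);
      }
    }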

Re: ethernet channel bonding experiences

2012-03-19 Thread Jean-Daniel Cryans
Hi Oliver, Unless you are network-bound you shouldn't see an improvement, verify that first. J-D On Mon, Mar 19, 2012 at 8:58 AM, Oliver Meyn (GBIF) om...@gbif.org wrote: Hi all, I've been experimenting with PerformanceEvaluation in the last weeks and on a whim thought I'd give channel

HBase shell - control characters in row key

2012-03-19 Thread Jon Bender
Hi everyone, I've got a couple of keys in my HBase table that are delimited by the EOT (\x04) character. I've tried a couple of ways to query this, e.g.: get 'table', 'foo\x04bar' and get 'table', 'foo\cDbar', but haven't had any luck. If I scan the table the keys come back as 'foo\x04bar' in the shell but

Re: HBase shell - control characters in row key

2012-03-19 Thread Lars George
Hi Jon, Please see the help the shell prints out; it has a section on how to use binary characters. The important thing is to enclose the code points in double quotes - courtesy of JRuby. Single quotes produce literal strings only. HTH, Lars On Mar 19, 2012, at 6:03 PM, Jon Bender wrote: Hi everyone,
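
In other words, following Lars's advice, something like this should work in the shell (table and key taken from Jon's example):

    hbase(main):001:0> get 'table', "foo\x04bar"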

Hbase Transactional support

2012-03-19 Thread Deepika Khera
Hi, I have some map reduce jobs that write to HBase. I am trying to pick a library that could provide transactional support for HBase. I looked at Omid and hbase-trx. Could you please provide me with a comparison between the two, so that I can make the right choice? Are there any other ways to do

Thrift and coprocessors

2012-03-19 Thread Ben West
Hi all, We use thrift to access HBase, and I've been playing around with endpoint coprocessors. I'm wondering how I can use thrift to access these - it seems like they're mostly supported with Java clients. So far, I've just been adding each function to the thrift schema and then manually

Re: Thrift and coprocessors

2012-03-19 Thread Gary Helmling
Currently endpoint coprocessors are only callable via the Java client. Please do open a JIRA describing what you would like to see here. If you'd like to try working up a patch, that would be even better! On Mon, Mar 19, 2012 at 11:03 AM, Ben West bwsithspaw...@yahoo.com wrote: Hi all, We
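
For comparison, the Java-client call shape in 0.92 looks roughly like this (the endpoint interface and method here are hypothetical stand-ins for whatever you deployed):

    import java.io.IOException;
    import java.util.Map;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.coprocessor.Batch;
    import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;

    // Hypothetical endpoint interface; stands in for your deployed one.
    interface CounterProtocol extends CoprocessorProtocol {
      long count() throws IOException;
    }

    public class EndpointCall {
      static long totalCount(HTable table) throws Throwable {
        // null start/end keys = run the endpoint on every region
        Map<byte[], Long> perRegion = table.coprocessorExec(
            CounterProtocol.class, null, null,
            new Batch.Call<CounterProtocol, Long>() {
              public Long call(CounterProtocol instance) throws IOException {
                return instance.count();
              }
            });
        long total = 0;
        for (Long v : perRegion.values()) {
          total += v;                       // aggregate the per-region results
        }
        return total;
      }
    }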

Re: Confirming a Bug

2012-03-19 Thread Peter Wolf
Hello Lars and Lars, Thank you for your help and attention. I wrote a standalone test that exhibits the bug. http://dl.dropbox.com/u/68001072/HBaseScanCacheBug.java Here is the output. It shows how the number of results and key-value pairs varies as caching is changed, and families are
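
Not Peter's actual file (that's at the link above), but the shape of such a repro might be a sweep like this, where the totals should not depend on the caching value (the family name is a placeholder):

    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScanCacheSweep {
      static void sweep(HTable table) throws Exception {
        for (int caching : new int[] {1, 10, 100}) {
          Scan scan = new Scan();
          scan.setCaching(caching);
          scan.addFamily(Bytes.toBytes("f1"));   // toggle families between runs
          ResultScanner scanner = table.getScanner(scan);
          int results = 0, kvs = 0;
          for (Result r : scanner) {
            results++;
            kvs += r.raw().length;
          }
          scanner.close();
          // a correct scanner gives identical totals for every caching value
          System.out.println("caching=" + caching
              + " results=" + results + " kvs=" + kvs);
        }
      }
    }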

Re: Hbase Transactional support

2012-03-19 Thread Maysam Yabandeh
Hi Deepika, Omid provides Snapshot Isolation (SI), which is a well-known isolation guarantee in database systems such as Oracle. In short, each transaction reads from a consistent snapshot that does not include partial changes by concurrent (or failed) transactions. SI also prevents
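
A toy illustration of the two rules Maysam describes - snapshot reads and concurrent-write conflict detection (this models the SI concept only; it is not Omid's API):

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    public class SnapshotIsolationSketch {
      static class Version {
        long commitTs;    // timestamp at which this version committed
        String value;
      }
      // committed versions per key, newest first
      static Map<String, List<Version>> store =
          new HashMap<String, List<Version>>();

      // Read rule: a transaction sees only versions committed at or before
      // its start timestamp, so partial/concurrent writes stay invisible.
      static String read(String key, long startTs) {
        List<Version> versions = store.get(key);
        if (versions == null) return null;
        for (Version v : versions) {
          if (v.commitTs <= startTs) return v.value;
        }
        return null;
      }

      // Commit rule: abort if any key in the write set was committed by a
      // transaction that ran concurrently (i.e., after our snapshot was taken).
      static boolean canCommit(Set<String> writeSet, long startTs) {
        for (String key : writeSet) {
          List<Version> versions = store.get(key);
          if (versions != null && !versions.isEmpty()
              && versions.get(0).commitTs > startTs) {
            return false;
          }
        }
        return true;
      }
    }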

RE: Hbase Transactional support

2012-03-19 Thread Sandy Pratt
Maysam, I wasn't aware of Omid before this post, so thanks for sharing that. I really like the approach and indeed our own implementation of transactions on HBase uses MVCC and optimistic concurrency control with a centralized transaction manager. I think it's a great fit for HBase. One

Rows vs. Columns

2012-03-19 Thread Konrad Tendera
Hello, I'm designing a schema for my use case and I'm considering which will be better: rows or columns. Here's what I need - my schema currently looks like this (it will be used for keeping small PDF files or single pages of larger documents): table files: family info:

Re: SocketTimeoutException upon 'create' command

2012-03-19 Thread Yermalkar, Sanjay
Just in case somebody else stumbles upon this problem: this was due to the combination of https://issues.apache.org/jira/browse/HBASE-3744 and some table inconsistencies. After fixing the table inconsistencies, all region servers reported back in time and the SocketTimeoutException

Re: Hbase Transactional support

2012-03-19 Thread Deepika Khera
Thanks Maysam. I am trying out Omid to see if it will fit my needs. As I said, I am writing to HBase from map reduce jobs. If my commit and rollback are scoped to a reducer task, then it will be quite straightforward. But if the commit should happen only if all tasks of the M/R job succeed (which is