Also see http://blog.sematext.com/2012/12/24/hbasewd-and-hbasehut-handy-hbase-libraries-available-in-public-maven-repo/ if you use Maven and want to use HBaseWD.

Otis
--
HBASE Performance Monitoring - http://sematext.com/spm/index.html

On Sat, Apr 20, 2013 at 11:24 AM, Amit Sela <am...@infolinks.com> wrote:

> Hope I'm not too late here... regarding hot spotting with sequential keys,
> I'd suggest you read this Sematext blog -
> http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
> They present a nice idea there for this kind of issue.
>
> Good Luck!
>
> On Mon, Apr 15, 2013 at 11:18 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> bq. write performance would be lower
>>
>> The above means poorer performance.
>>
>> bq. I could batch them up application side
>>
>> Please do that.
>>
>> bq. I guess there is no way to turn that off?
>>
>> That's right.
>>
>> On Mon, Apr 15, 2013 at 11:15 AM, Kireet <kir...@feedly.com> wrote:
>>
>> > Thanks for the reply. "write performance would be lower" -> this means
>> > better?
>> >
>> > Also I think I used the wrong terminology regarding batching. I meant to
>> > ask if it uses the client side write buffer. I would think not since the
>> > append() method returns a Result. I could batch them up application side
>> > I suppose. Append also seems to return the updated value. This seems
>> > like a lot of unnecessary I/O in my case since I am not immediately
>> > interested in the updated value. I guess there is no way to turn that
>> > off?
>> >
>> > On 4/15/13 1:28 PM, Ted Yu wrote:
>> >
>> >> I assume you would select HBase 0.94.6.1 (the latest release) for this
>> >> project.
>> >>
>> >> For #1, write performance would be lower if you choose to use Append
>> >> (vs. using Put).
>> >>
>> >> bq. Can appends be batched by the client or do they execute immediately?
>> >> This depends on your use case.
>> >> Take a look at the following method in HTable where you can send a
>> >> list of actions (Appends):
>> >>
>> >> public void batch(final List<? extends Row> actions, final Object[] results)
>> >>
>> >> For #2
>> >> bq. The other would be to prefix the timestamp row key with a random
>> >> leading byte.
>> >>
>> >> This technique has been used elsewhere and is better than the first one.
>> >>
>> >> Cheers
>> >>
>> >> On Mon, Apr 15, 2013 at 6:09 AM, Kireet Reddy
>> >> <kireet-Teh5dPVPL8nQT0dZR+a...@public.gmane.org> wrote:
>> >>
>> >>> We are planning to create a "scheduled task list" table in our hbase
>> >>> cluster. Essentially we will define a table with key timestamp and
>> >>> then the row contents will be all the tasks that need to be processed
>> >>> within that second (or whatever time period). I am trying to do the
>> >>> "reasonably wide rows" design mentioned in the hbasecon opentsdb talk.
>> >>> A couple of questions:
>> >>>
>> >>> 1. Should we use append or put to create tasks? Since these rows will
>> >>> not live forever, storage space is not a concern; read/write
>> >>> performance is more important. As concurrency increases I would guess
>> >>> the row lock may become an issue in append? Can appends be batched by
>> >>> the client or do they execute immediately?
>> >>>
>> >>> 2. I am a little worried about hotspots. This basic design may cause
>> >>> issues in terms of the table's performance. Many tasks will execute
>> >>> and reschedule themselves using the same interval, t + 1 hour for
>> >>> example. So many of the writes may all go to the same block. Also, we
>> >>> have a lot of other data so I am worried it may impact performance of
>> >>> unrelated data if the region server gets too busy servicing the task
>> >>> list table. I can think of 2 strategies to avoid this.
>> >>> One would be to create N different tables and read/write tasks to
>> >>> them randomly. This may spread load across servers, but there is no
>> >>> guarantee hbase will place the tables on different region servers,
>> >>> correct? The other would be to prefix the timestamp row key with a
>> >>> random leading byte. Then when reading from the task list table,
>> >>> consumers could scan from any/all possible values of the random byte +
>> >>> current timestamp to obtain tasks. Both strategies seem like they could
>> >>> spread out load, but at the cost of more work/complexity to read tasks
>> >>> from the table. Does either of those approaches make sense?
>> >>>
>> >>> On the read side, it seems like a similar problem exists in that all
>> >>> consumers will be reading rows based on the current timestamp. Is this
>> >>> good because the block will very likely be cached or bad because the
>> >>> region server may become overloaded? I have a feeling the answer is
>> >>> going to be "it depends". :)
>> >>>
>> >>> I did see the previous posts on queues and the tips there - use
>> >>> zookeeper for coordination, schedule major compactions, etc. Sorry if
>> >>> these questions are basic, I am pretty new to hbase. Thanks!
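
The random-leading-byte scheme discussed in this thread can be sketched in
plain Java as follows. This is only a minimal illustration under stated
assumptions. It is not HBaseWD's implementation and it uses no HBase API. The
class name SaltedKeys, the bucket count of 16, and the key layout (one salt
byte followed by an 8-byte big-endian timestamp in seconds) are all
hypothetical choices made for the example. It also assumes non-negative
timestamps, so that byte-wise key order matches numeric order.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper for illustration; not part of HBase or HBaseWD.
final class SaltedKeys {
    // Number of salt buckets. 16 is an arbitrary illustrative value.
    static final int BUCKETS = 16;

    // Assumed key layout: [1 salt byte][8-byte big-endian timestamp in seconds].
    static byte[] saltedKey(int bucket, long timestampSeconds) {
        if (bucket < 0 || bucket >= BUCKETS) {
            throw new IllegalArgumentException("bucket out of range: " + bucket);
        }
        return ByteBuffer.allocate(1 + Long.BYTES)
                .put((byte) bucket)
                .putLong(timestampSeconds) // ByteBuffer is big-endian by default
                .array();
    }

    // Writer side: a random salt spreads writes for the same second
    // across BUCKETS separate key ranges.
    static byte[] randomSaltedKey(long timestampSeconds) {
        int bucket = ThreadLocalRandom.current().nextInt(BUCKETS);
        return saltedKey(bucket, timestampSeconds);
    }

    // Reader side: one {startKey, stopKey} pair per bucket, covering tasks
    // due in [fromSeconds, toSeconds). Each pair would drive one scan.
    static List<byte[][]> scanRanges(long fromSeconds, long toSeconds) {
        List<byte[][]> ranges = new ArrayList<>();
        for (int b = 0; b < BUCKETS; b++) {
            ranges.add(new byte[][] { saltedKey(b, fromSeconds), saltedKey(b, toSeconds) });
        }
        return ranges;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000L;
        byte[] key = randomSaltedKey(now + 3600L); // reschedule at t + 1 hour
        System.out.println("salt bucket used: " + key[0]);
        System.out.println("scans needed per poll: " + scanRanges(now, now + 1).size());
    }
}
```

A consumer would run one scan per returned range (sequentially or in
parallel) and merge the results. The price is BUCKETS scans per poll instead
of one, which is the extra read-side work/complexity raised in the original
question.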