Re: The write process in the Region Server

2012-06-16 Thread Infolinks
Hi Harsh J, I'm not using WAL in my writes. Is there still log rolling? On Jun 17, 2012, at 7:40, Harsh J wrote: > Amit, > > Your values for HLog block size (hbase.regionserver.hlog.blocksize, > default is the HDFS default block size (64 MB unless you've raised it > properly), too low un

Re: The write process in the Region Server

2012-06-16 Thread Harsh J
Amit, Your values for HLog block size (hbase.regionserver.hlog.blocksize, default is the HDFS default block size (64 MB unless you've raised it properly), too low unless you also have HLog compression) and the factor of max-hlogs-to-keep (hbase.regionserver.maxlogs, default 32 files) can easily ca
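
To put rough numbers on the two settings named above, here is a minimal sketch that multiplies the HLog block size by the max-logs count. It assumes each log file is rolled at about one block, which is a simplification (the real roll threshold may be a fraction of the block size); the function name is hypothetical and only for illustration.

```python
MB = 1024 * 1024

def max_wal_bytes(hlog_blocksize_bytes, max_logs):
    """Rough upper bound on WAL data kept across all HLog files.

    Assumption: every log file is rolled at roughly one block.
    """
    return hlog_blocksize_bytes * max_logs

# Defaults quoted in the message: 64 MB block size, 32 log files.
default_total = max_wal_bytes(64 * MB, 32)
print(default_total // MB, "MB")

# A much smaller block size shrinks the bound proportionally.
small_total = max_wal_bytes(8 * MB, 32)
print(small_total // MB, "MB")
```

With the defaults this comes to about 2 GB of log data; lowering either value lowers the bound in proportion.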

Re: Timestamp as a key good practice?

2012-06-16 Thread Rob Verkuylen
Just to add from my experience: Yes, hotspotting is bad, but so are devops headaches. A reasonable machine can handle 3,000-4,000 puts a second with ease, and a simple timerange scan can give you the records you need. I have my doubts you will be hitting these amounts anytime soon. A simple setup will

Re: Timestamp as a key good practice?

2012-06-16 Thread Michael Segel
Jean-Marc, You indicated that you didn't want to do full table scans when you want to find out which files hadn't been touched since X time has passed. (X could be months, weeks, days, hours, etc ...) So here's the thing. First, I am not convinced that you will have hot spotting. Second, you e

Re: Timestamp as a key good practice?

2012-06-16 Thread Jean-Marc Spaggiari
Let's imagine the timestamp is "123456789". If I salt it with a letter from 'a' to 'z', then it will always be split between a few RegionServers. I will have something like "t123456789". The issue is that I will have to do 26 queries to be able to find all the entries. I will need to query from A0 to Axxx
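
The 26-query cost described above can be sketched as follows. This is only an illustration of the idea, not HBase client code: the function names are hypothetical, timestamps are assumed to be fixed-width strings so that they sort lexicographically, and the salt is assumed to be derived from some other identifier (here called row_id) with a CRC.

```python
import string
import zlib

SALTS = string.ascii_lowercase  # 'a' to 'z', 26 prefixes

def salted_key(timestamp, row_id):
    """Prefix the timestamp with a letter derived from row_id (assumption)."""
    salt = SALTS[zlib.crc32(row_id.encode("utf-8")) % len(SALTS)]
    return salt + timestamp

def salted_scan_ranges(start_ts, stop_ts):
    """One (start_row, stop_row) pair per salt: a time-range query
    becomes one scan per prefix."""
    return [(salt + start_ts, salt + stop_ts) for salt in SALTS]

ranges = salted_scan_ranges("123456000", "123457000")
print(len(ranges))   # 26 separate scans
print(ranges[19])    # the range under the 't' prefix
```

Each range would have to be scanned separately and the results merged, which is the trade-off being weighed against hotspotting.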

Re: Timestamp as a key good practice?

2012-06-16 Thread Michel Segel
You can't salt the key in the second table. By salting the key, you lose the ability to do range scans, which is what you want to do. Sent from a remote device. Please excuse any typos... Mike Segel On Jun 16, 2012, at 6:22 AM, Jean-Marc Spaggiari wrote: > Thanks all for your comments and

Re: The write process in the Region Server

2012-06-16 Thread Amit Sela
Thanks Doug, I read the regions section from the book like you recommended but I still have some questions left. When running a massive write job, the regionserver log shows the memsize that is flushed. The problem is that most of the time the memsize is either much smaller than the memstore.flush.

Re: Multiple HBase instances per machine

2012-06-16 Thread Lars George
Hi, I have done this at a customer site to overcome the 0.90.x slow WAL performance. With one RS per DN we bottlenecked, with 5-7 RS per DN we were able to hit the target rate. Please note that we did this in lieu of the proper built-in options like WAL compression, multiple WAL, or n-way wri

Re: Timestamp as a key good practice?

2012-06-16 Thread Jean-Marc Spaggiari
Thanks all for your comments and suggestions. Regarding the hotspotting I will try to salt the key in the 2nd table and see the results. Yesterday I finished installing my 4-server cluster of old machines. It's slow, but it's working. So I will do some testing. You are recommending to modify th

Re: HBase first steps: Design a table

2012-06-16 Thread Jean-Marc Spaggiari
Hi Doug, You're right. I missed it :( I received Lars' book yesterday, so I will read a lot more before my next question ;) JM 2012/6/13, Doug Meil : > > Just wanted to point out that is also discussed under the autoFlush entry > in this chapter.. > > http://hbase.apache.org/book.html#perf.writi

Re: Multiple HBase instances per machine

2012-06-16 Thread Em
Stack, I have no issues with HBase, the question is purely theoretical. > So, you intend doubling the datanode instances per machine too? Everything else would not make sense to me, or what do you think? Thanks for your feedback! Regards, Em On 16.06.2012 at 07:12, Stack wrote: > On Fri, Jun 15