Build failed in Jenkins: HBase-0.92 #294

2012-02-21 Thread Apache Jenkins Server
See https://builds.apache.org/job/HBase-0.92/294/changes Changes: [stack] HBASE-5209 HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup -- FIX BUILD ON 0.92; ADDENDUM -- [...truncated

LIRS cache as an alternative to LRU cache

2012-02-21 Thread yuzhihong
Hi, shall we experiment with the Low Inter-reference Recency Set (LIRS) replacement policy to see whether the block cache becomes more effective? Cheers

Handling EOFexception while splitlog

2012-02-21 Thread Ramkrishna.S.Vasudevan
Hi Devs, we ran into an issue while splitting HLogs due to an EOFException (version 0.90.6). For some reason the DNs were not able to connect to the NN (network fluctuation) while the master was splitting the logs. While parsing the hlog, we get the length and we expect it might be 0. (No

Re: Handling EOFexception while splitlog

2012-02-21 Thread yuzhihong
Can you provide the stack trace for this issue? Thanks On Feb 21, 2012, at 8:52 AM, Ramkrishna.S.Vasudevan ramkrishna.vasude...@huawei.com wrote: Hi Devs We ran into one issue while splitting HLogs due to EOFException. (0.90.6 version). Due to some reason the DNs were not able

RE: Handling EOFexception while splitlog

2012-02-21 Thread Ramkrishna.S.Vasudevan
Please find the logs from the HRegionServer. Did you mean this, Ted? 2012-02-18 00:44:38,808 INFO org.apache.hadoop.hbase.util.FSUtils: Finished lease recover attempt for hdfs://158-1-130-13:9000/hbase/.logs/linux1,20020,1329487401169/linux1%3A20020.1329492399793 2012-02-18 00:44:38,808 WARN

Re: LIRS cache as an alternative to LRU cache

2012-02-21 Thread Nicolas Spiegelberg
We had the author of LIRS come to Facebook last year to talk about his algorithm and general benefits. At the time, we were looking at increasing block cache efficiency. The general consensus was that it wasn't an exponential perf gain, so we could get bigger wins from cache-on-write

RE: LIRS cache as an alternative to LRU cache

2012-02-21 Thread Vladimir Rodionov
AFAIK, the existing LruBlockCache is not exactly an LRU cache. It uses a more advanced algorithm to avoid cache thrashing during scan ops by dividing the cache into three sub-caches (for newly added blocks, for promoted blocks, and for in-memory blocks). Best regards, Vladimir Rodionov Principal Platform
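
The three-sub-cache idea described in this message can be illustrated with a small sketch. This is not HBase's LruBlockCache: the class name `ThreeTierCache`, the tier names, and the simple "evict from the newest tier first" order are all assumptions made for illustration, and the real class may size and evict its sub-caches differently. The point is only to show why a one-pass scan does not push out blocks that were read more than once.

```python
from collections import OrderedDict


class ThreeTierCache:
    """Toy scan-resistant cache (hypothetical, not HBase code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Each tier is kept in LRU order: the oldest entry comes first.
        self.tiers = {
            "single": OrderedDict(),  # newly added blocks
            "multi": OrderedDict(),   # blocks promoted after a repeat access
            "memory": OrderedDict(),  # blocks flagged as in-memory
        }

    def put(self, key, value, in_memory=False):
        # Drop any stale copy, then insert into the entry tier.
        for tier in self.tiers.values():
            tier.pop(key, None)
        self.tiers["memory" if in_memory else "single"][key] = value
        self._evict()

    def get(self, key):
        for name, tier in self.tiers.items():
            if key in tier:
                value = tier.pop(key)
                # A hit on a newly added block promotes it; hits in the
                # other tiers only refresh the block's recency.
                target = "multi" if name == "single" else name
                self.tiers[target][key] = value
                return value
        return None

    def _evict(self):
        # Assumed policy: empty "single" before touching the other tiers.
        while sum(len(t) for t in self.tiers.values()) > self.capacity:
            for name in ("single", "multi", "memory"):
                if self.tiers[name]:
                    self.tiers[name].popitem(last=False)
                    break
```

With a capacity of 3, a block that has been read twice survives a burst of new blocks, while the oldest block that was only ever inserted is the one evicted.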

Re: LIRS cache as an alternative to LRU cache

2012-02-21 Thread Li Pi
I thought about this over the summer when I was working on 4027. Pretty much the same idea as Nicolas here. I figured LIRS might be troublesome to implement, and also thought that newer features, such as 4027 or the reference counting patch, were a better use of time. The larger the cache gets, the

Re: LIRS cache as an alternative to LRU cache

2012-02-21 Thread Nicolas Spiegelberg
Vlad, You're correct. The existing algorithm is called an Adaptive Replacement Cache. However, note that this cache does need some proper tuning for optimal efficiency. Nicolas On 2/21/12 12:09 PM, Vladimir Rodionov vrodio...@carrieriq.com wrote: afaik, existing LruBlockCache is not exactly

Re: LIRS cache as an alternative to LRU cache

2012-02-21 Thread Jean-Daniel Cryans
If it was ARC (which uses both LRU and LFU) we'd have patent issues with IBM; what we have is closer to 2Q: http://www.vldb.org/conf/1994/P439.PDF J-D On Tue, Feb 21, 2012 at 9:19 AM, Nicolas Spiegelberg nspiegelb...@fb.com wrote: Vlad, You're correct.  The existing algorithm is called

Re: LIRS cache as an alternative to LRU cache

2012-02-21 Thread Dhruba Borthakur
I think we should make the BlockCache pluggable for HBase. A simple reflection-based enhancement to CacheConfig.instantiateBlockCache should do the trick, shouldn't it? If people think that this is valuable, I can submit a patch. This will enable people to play with their own versions of the
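
The reflection-based approach proposed here amounts to reading a class name from configuration and instantiating it at runtime. The sketch below shows that pattern in Python rather than Java, so it is an analogy and not the proposed patch: the function name `instantiate_block_cache`, the configuration keys `blockcache.class` and `blockcache.capacity`, and the stand-in `LruCache` class are all hypothetical.

```python
import importlib


class LruCache:
    """Stand-in default implementation (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity


def instantiate_block_cache(conf):
    """Build the cache class named in conf, or the default if none is set."""
    capacity = conf.get("blockcache.capacity", 1024)  # hypothetical key
    class_name = conf.get("blockcache.class")         # hypothetical key
    if class_name is None:
        return LruCache(capacity)
    # Split "package.module.ClassName" into module path and class name,
    # import the module, and look the class up by name.
    module_name, _, attr = class_name.rpartition(".")
    cls = getattr(importlib.import_module(module_name), attr)
    return cls(capacity)
```

Any class whose constructor accepts a single capacity argument can be plugged in this way; a real implementation would also need to validate the name and check that the loaded class satisfies the cache interface.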

Re: LIRS cache as an alternative to LRU cache

2012-02-21 Thread Nicolas Spiegelberg
In general, I agree about making isolated algorithms pluggable. In this particular case, I think that Ted was trying to gather consensus on ways to increase cache efficiency with LIRS being the strawman. I think it's good to bring up these discussions because it's really easy to add 10k lines to

Re: HBase wire compatibility

2012-02-21 Thread Enis Söztutar
On Fri, Feb 17, 2012 at 5:42 PM, Gregory Chanan gcha...@cloudera.com wrote: Enis, You mean until Tuesday (the 21st), right? Oops, my bad. Will be there today. Enis Greg On Fri, Feb 17, 2012 at 5:30 PM, Enis Söztutar enis@gmail.com wrote: While working on keeping bw compatibility

Re: Scan performance on a big table as combination of multiple logic tables

2012-02-21 Thread Jean-Daniel Cryans
On Sun, Feb 19, 2012 at 1:45 PM, Mikael Sitruk mikael.sit...@gmail.com wrote: During compaction the region is not out of service. According to documentation the max region size for V2 format is 20G And now the question: Assuming that 20G is the limit and the number of regions in a single RS

Re: Detecting HBase cluster idle

2012-02-21 Thread Jean-Daniel Cryans
(this is a user question so sending to the relevant mailing list and putting dev@ in BCC) Yeah, it could be one way. Be aware that some janitorial processes in the master do a full .META. scan every few minutes, so you might see very quick bursts on one machine. J-D On Sat, Feb 18, 2012 at 4:57 PM,

Re: Scan performance on a big table as combination of multiple logic tables

2012-02-21 Thread Mikael Sitruk
This is interesting J.D. so, is there a limitation on the region size or not? Can it be really any number? If so beside the collection time is there any impact (perhaps the documentation should be updated too)? Regarding the number of regions you have (14,398) is it for a single RS? What is your

Re: Scan performance on a big table as combination of multiple logic tables

2012-02-21 Thread Jean-Daniel Cryans
On Tue, Feb 21, 2012 at 1:17 PM, Mikael Sitruk mikael.sit...@gmail.com wrote: This is interesting J.D. so, is there a limitation on the region size or not? Your imagination? Like I said nothing blocks you in the code. Can it be really any number? That's what it implies. If so beside the

Re: Scan performance on a big table as combination of multiple logic tables

2012-02-21 Thread Mikael Sitruk
See inline On Feb 21, 2012 11:40 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Tue, Feb 21, 2012 at 1:17 PM, Mikael Sitruk mikael.sit...@gmail.com wrote: This is interesting J.D. so, is there a limitation on the region size or not? Your imagination? Like I said nothing blocks you

Re: Scan performance on a big table as combination of multiple logic tables

2012-02-21 Thread Jean-Daniel Cryans
On Tue, Feb 21, 2012 at 1:57 PM, Mikael Sitruk mikael.sit...@gmail.com wrote: If so beside the collection time is there any impact (perhaps the documentation should be updated too)? Collection time? You mean GC? Sorry I don't get what you mean. *Sorry, typo mistake (from mobile) I meant

Re: Scan performance on a big table as combination of multiple logic tables

2012-02-21 Thread Mikael Sitruk
Ok, so this is approx 150 regions per RS. What is the math relating the memory (index size) to the number of regions? (Btw, at the beginning when I mentioned 500 regions it was per RS.) I'm trying to figure out what my cluster configuration should be, regarding region, region size, memory size, and

Meeting notes

2012-02-21 Thread Todd Lipcon
As advertised on the list over the last few weeks, several of us met today to discuss wire compatibility improvements. Here are the notes: http://wiki.apache.org/hadoop/HBaseWireCompatibility20120221 I'll file JIRAs for the subtasks discussed later today. -Todd -- Todd Lipcon Software Engineer,

Re: Scan performance on a big table as combination of multiple logic tables

2012-02-21 Thread Jean-Daniel Cryans
This describes how they are written, with your knowledge of your data size and key average size you can do the math: http://hbase.apache.org/book.html#d0e9542 J-D On Tue, Feb 21, 2012 at 2:30 PM, Mikael Sitruk mikael.sit...@gmail.com wrote: Ok, so this is approx 150 regions per RS What are

Build failed in Jenkins: HBase-0.92 #296

2012-02-21 Thread Apache Jenkins Server
See https://builds.apache.org/job/HBase-0.92/296/changes Changes: [tedyu] HBASE-5209 Addendum adds znode creation call (David Wang) -- [...truncated 6668 lines...] Running org.apache.hadoop.hbase.regionserver.TestMemStore Tests run: 21, Failures: 0,

Re: Meeting notes

2012-02-21 Thread Stack
On Tue, Feb 21, 2012 at 3:01 PM, Todd Lipcon t...@cloudera.com wrote: As advertised on the list over the last few weeks, several of us met today to discuss wire compatibility improvements: Here's the notes: http://wiki.apache.org/hadoop/HBaseWireCompatibility20120221 I'll file JIRAs for the

Re: Scan performance on a big table as combination of multiple logic tables

2012-02-21 Thread Stack
On Tue, Feb 21, 2012 at 1:17 PM, Mikael Sitruk mikael.sit...@gmail.com wrote: This is interesting J.D. so, is there a limitation on the region size or not? Can it be really any number? If so beside the collection time is there any impact (perhaps the documentation should be updated too)? Yes.

Re: Scan performance on a big table as combination of multiple logic tables

2012-02-21 Thread M. C. Srivas
On Tue, Feb 21, 2012 at 12:08 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Sun, Feb 19, 2012 at 1:45 PM, Mikael Sitruk mikael.sit...@gmail.com wrote: During compaction the region is not out of service. According to documentation the max region size for V2 format is 20G And now the

Re: Scan performance on a big table as combination of multiple logic tables

2012-02-21 Thread Jean-Daniel Cryans
In the documentation 20GB is given as an example of a larger size that can be supported, but nothing blocks you from going way higher than that. I've done some import tests and had 100GB regions. It just takes a while to compact the bigger files. With no impact on Java GC going nuts?  FB

Re: Scan performance on a big table as combination of multiple logic tables

2012-02-21 Thread Stack
On Tue, Feb 21, 2012 at 5:44 PM, M. C. Srivas mcsri...@gmail.com wrote: With no impact on Java GC going nuts?  FB reported (a few months ago) it was bad to run a region-server with -Xmx larger than 15G or 16G. Unless its no longer true, wouldn't that be limiting factor for how large one

Build failed in Jenkins: HBase-0.92 #297

2012-02-21 Thread Apache Jenkins Server
See https://builds.apache.org/job/HBase-0.92/297/ -- [...truncated 6706 lines...] Running org.apache.hadoop.hbase.rest.TestStatusResource Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 17.055 sec Forking command line: /bin/sh -c cd

Re: Scan performance on a big table as combination of multiple logic tables

2012-02-21 Thread M. C. Srivas
On Tue, Feb 21, 2012 at 6:16 PM, Stack st...@duboce.net wrote: On Tue, Feb 21, 2012 at 5:44 PM, M. C. Srivas mcsri...@gmail.com wrote: With no impact on Java GC going nuts? FB reported (a few months ago) it was bad to run a region-server with -Xmx larger than 15G or 16G. Unless its no

Re: Scan performance on a big table as combination of multiple logic tables

2012-02-21 Thread Stack
On Tue, Feb 21, 2012 at 9:29 PM, M. C. Srivas mcsri...@gmail.com wrote: Yes, that was my thinking: to do a major compaction the region-server would have to load all the flushed files for that region, merge them, and then write out the new region. If the region-file was 20g in size, the
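
The quoted concern assumes that merging the flushed files means loading them whole. As a general point about merging sorted inputs, and not a description of HBase's compaction code (which this truncated message does not show), sorted runs can be merged as a stream that holds only one pending entry per input. The helper name `merge_sorted_runs` and the `(key, value)` tuple layout are assumptions for illustration.

```python
import heapq


def merge_sorted_runs(runs):
    """Lazily merge already-sorted (key, value) iterables into one sorted stream."""
    # heapq.merge keeps one pending item per run in a small heap and yields
    # results on demand, so memory use grows with the number of runs rather
    # than with their total size.
    return heapq.merge(*runs, key=lambda kv: kv[0])
```

Because the result is an iterator, a caller can write each merged entry out as it arrives instead of materializing the combined data first.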

Build failed in Jenkins: HBase-0.92-security #93

2012-02-21 Thread Apache Jenkins Server
See https://builds.apache.org/job/HBase-0.92-security/93/changes Changes: [tedyu] HBASE-5209 Addendum adds znode creation call (David Wang) -- [...truncated 6731 lines...] Running org.apache.hadoop.hbase.rest.TestScannersWithFilters Tests run: 10,

Build failed in Jenkins: HBase-0.92 #298

2012-02-21 Thread Apache Jenkins Server
See https://builds.apache.org/job/HBase-0.92/298/ -- [...truncated 6703 lines...] Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.156 sec Forking command line: /bin/sh -c cd https://builds.apache.org/job/HBase-0.92/ws/trunk

Re: Meeting notes

2012-02-21 Thread Todd Lipcon
I've just filed a bunch of JIRAs as subtasks of HBASE-5305 based on the things we discussed at the meeting: https://issues.apache.org/jira/browse/HBASE-5305#summary I didn't fill in any target version yet... before I did so, I had the idea that we may want to do this work on a feature branch.

Re: Meeting notes

2012-02-21 Thread Stack
On Tue, Feb 21, 2012 at 10:56 PM, Todd Lipcon t...@cloudera.com wrote: I didn't fill in any target version yet... before I did so, I had the idea that we may want to do this work on a feature branch. What do you folks think? It would allow us to commit partial work without leaving us in a spot