Hi guys,

1/ I quickly checked the GC logs and saw nothing unusual. Since I need very fast lookups, I set zookeeper.session.timeout to 10s so that an RS is considered dead after even a short pause, and that did not occur.
2/ I have not checked yet, but I don't think I ran out of sockets since the ulimit is set very high. I'll verify!

3/ The benchmark can launch several R/W threads, but even the simplest program triggers my issue:

    Configuration config = HBaseConfiguration.create();
    HTable table = new HTable(config, "test");
    List<Get> getsList = new ArrayList<Get>();
    // one multi-get of 1, 10, 100 or 1000 random keys
    for (<1, 10, 100 or 1000>)
        getsList.add(new Get(<randomKey>));
    table.get(getsList);
    table.close();

4/ I will share more logs tomorrow to dig deeper, I personally need a long STW-pause :-)

Cheers,

On Thu, Aug 23, 2012 at 7:49 PM, N Keywal <nkey...@gmail.com> wrote:
> Hi Adrien,
>
> As well, if you can share the client code (number of threads, regions,
> is it a set of single get, or are they multi gets, this kind of
> stuff).
>
> Cheers,
>
> N.
>
>
> On Thu, Aug 23, 2012 at 7:40 PM, Jean-Daniel Cryans <jdcry...@apache.org>
> wrote:
>> Hi Adrien,
>>
>> I would love to see the region server side of the logs while those
>> socket timeouts happen, also check the GC log, but one thing people
>> often hit while doing pure random read workloads with tons of clients
>> is running out of sockets because they are all stuck in CLOSE_WAIT.
>> You can check that by using lsof. There are other discussion on this
>> mailing list about it.
>>
>> J-D
>>
>> On Thu, Aug 23, 2012 at 10:24 AM, Adrien Mogenet
>> <adrien.moge...@gmail.com> wrote:
>>> Hi there,
>>>
>>> While I'm performing read-intensive benchmarks, I'm seeing storm of
>>> "CallerDisconnectedException" in certain RegionServers. As the
>>> documentation says, my client received a SocketTimeoutException
>>> (60000ms etc...) at the same time.
>>> It's always happening and I get very poor read-performances (from 10
>>> to 5000 reads/sc) in a 10 nodes cluster.
>>>
>>> My benchmark consists in several iterations launching 10, 100 and 1000
>>> Get requests on a given random rowkey with a single CF/qualifier.
>>> I'm using HBase 0.94.1 (a few commits before the official stable
>>> release) with Hadoop 1.0.3.
>>> Bloom filters have been enabled (at the rowkey level).
>>>
>>> I do not find very clear informations about these exceptions. From the
>>> reference guide :
>>> (...) you should consider digging in a bit more if you aren't doing
>>> something to trigger them.
>>>
>>> Well... could you help me digging? :-)

--
AM
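
J-D's suggestion in the thread above (look for sockets stuck in CLOSE_WAIT with lsof) can be turned into a quick count per TCP state. The following is only a minimal sketch, assuming the usual lsof layout in which the TCP state appears in parentheses at the end of each line, e.g. "(CLOSE_WAIT)". The class name SocketStateCounter and the sample lines are hypothetical and are not taken from the cluster discussed here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SocketStateCounter {

    /**
     * Counts the TCP states found in a trailing "(STATE)" on each line.
     * Lines without such a suffix (header, UDP or listening-only rows
     * with no state) are ignored.
     */
    public static Map<String, Integer> countStates(List<String> lsofLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lsofLines) {
            String trimmed = line.trim();
            int open = trimmed.lastIndexOf('(');
            if (open >= 0 && trimmed.endsWith(")")) {
                String state = trimmed.substring(open + 1, trimmed.length() - 1);
                counts.merge(state, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample; in practice, feed the captured lsof output
        // of the client or region server process instead.
        List<String> sample = Arrays.asList(
            "COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME",
            "java 1234 hbase 45u IPv4 111 0t0 TCP client:50001->rs1:60020 (CLOSE_WAIT)",
            "java 1234 hbase 46u IPv4 112 0t0 TCP client:50002->rs1:60020 (ESTABLISHED)",
            "java 1234 hbase 47u IPv4 113 0t0 TCP client:50003->rs2:60020 (CLOSE_WAIT)");
        System.out.println(countStates(sample));
    }
}
```

A CLOSE_WAIT count that keeps growing while the benchmark runs would point toward the socket exhaustion J-D describes; a small, stable count would suggest looking elsewhere, such as the region server logs.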