On Mon, Nov 28, 2016 at 10:25 AM, Timothy Brown <t...@siftscience.com> wrote:
> Responses inlined.
>
> ...
>
> > What is the difference when you compare servers? More requests? More
> > i/o? Thread dump the metadata server and let us see a link in here?
> > (What you attached below is cut-off... just as it is getting to the
> > good part).
>
> There are more requests to the server containing meta. The network in
> bytes are greater for the meta regionserver than the others but the
> network out bytes are less.
>
> Here's a dropbox link to the output:
> https://dl.dropboxusercontent.com/u/54494127/thread_dump.txt. I
> apologize for the cliffhanger.

The in bytes are < the out bytes on the hbase:meta server? Or compared to
other servers? Queries are usually smaller than responses, and in the
hbase:meta case I'd think we'd be mostly querying/reading, with "out" much
bigger than "in". Anything else running on this machine besides Master?

If you turn on RPC-level TRACE logging for a minute or so, is there
anything about the client addresses that seems interesting?

Looking at the thread dump (thanks), you have 1k handlers running?

  Thread 1037 (B.defaultRpcServer.handler=999,queue=99,port=60020):

They are all idle in this thread dump (same for the readers). I've found
that having handlers == # of CPUs seems to do best when the workload is
mostly random reads. If there are lots of writes, it's good to have a few
extras in case one gets occupied, but 1k is a little OTT. Any particular
reason for this many handlers? Would suggest trying way less. Might help
w/ CPU. 1k is a lot.

G1GC? (See HBASE-17072, "CPU usage starts to climb up to 90-100% when
using G1GC; purge ThreadLocal usage".)
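If you do try dropping the handler count, it is set via hbase-site.xml on
the RegionServers. A minimal sketch follows; the value of 32 is only
illustrative (start near the machine's CPU count and tune from there), and
since you are on CDH you would likely change this through Cloudera Manager
rather than editing the file by hand:

    <!-- hbase-site.xml on each RegionServer; needs a restart to apply. -->
    <!-- 32 is illustrative: start near the CPU count and tune. -->
    <property>
      <name>hbase.regionserver.handler.count</name>
      <value>32</value>
    </property>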
> > > Here's some more info about our cluster:
> > > HBase version 1.2
> >
> > Which 1.2?
>
> 1.2.0 which is bundled with CDH 5.8.0
>
> > > Number of regions: 72
> > > Number of tables: 97
> >
> > On whole cluster? (Can't have more tables than regions...)
>
> An error on my part, I meant to put 72 region servers.
>
> > > Approx. requests per second to meta region server: 3k
> >
> > That is not much. If all cached, it should be able to do way more than
> > that.
> >
> > Can you see who is hitting the meta region most? (Enable RPC-level
> > TRACE logging on the server hosting meta for a minute or so and see
> > where the requests are coming in from.)
> >
> > What is your cache hit rate? Can you get it higher?
>
> Cache hit rate is above 99%. We see very little disk reads.
>
> > Is there much writing going on against meta? Or is the cluster stable
> > as regards region movement/creation?
>
> Writing is very infrequent. The cluster is stable with regards to region
> movement and creation.
>
> > > Approx. requests per second to entire HBase cluster: 90k
> > >
> > > Additional info:
> > >
> > > From Storefile Metrics:
> > > Stores Num: 1
> > > Storefiles: 1
> > > Storefile Size: 30m
> > > Uncompressed Storefile Size: 30m

Super small.

St.Ack

> > > Index Size: 459k
> >
> > This from meta table? That is very small.
>
> Yes, this is from the meta table.
>
> > > I/O for the region server with only meta on it:
> > > 48M bytes in
> >
> > What's all the writing about?
>
> I'm not sure. According to the AWS dashboard there are no disk writes at
> that time.
>
> > > 5.9B bytes out
> >
> > This is disk or network? If network, is that 5.9 bytes?
>
> This is network and that's 5.9 billion bytes. (I'm using the AWS
> dashboard for this.)
>
> > Thanks Tim,
> > S
>
> > > I used the debug dump on the region server's UI but it was too large
> > > for pastebin so here's a portion of it: http://pastebin.com/nkYhEceE
> > >
> > > Thanks for the help,
> > > Tim
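P.S. On the RPC-level TRACE logging mentioned above: one way to do it,
assuming the stock log4j.properties setup (the exact logger/package name
can vary a little between versions), is to bump the hbase.ipc package to
TRACE on the server hosting hbase:meta:

    # Temporary only -- TRACE output here is voluminous; revert after a
    # minute or so of capture.
    log4j.logger.org.apache.hadoop.hbase.ipc=TRACE

You can also toggle it without a restart from the Log Level page on that
RegionServer's web UI (http://<regionserver>:60030/logLevel by default in
1.x); set it back to INFO when done.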