Hi Sathya and Nick,

Here are the stack traces of the region server dumps when the huge .tmp
files are created:

https://drive.google.com/open?id=0B1tQg4D17jKQNDdFZkFQTlg4ZjQ&authuser=0

As background, we are not using compression. Compaction occurs every hour.
Everything else is default.

OpenTSDB v2.0 is running on top of Cloudera 5.3.1 in AWS. We have a 7-node
Cloudera cluster (each node with 32GB RAM and 3TB disk space), with 5
OpenTSDB instances dedicated to writing and 2 to reading. We are using AWS
ELBs in front of OpenTSDB to balance the reads and writes.

We are load testing OpenTSDB using sockets, but are running into several
issues. Let me first explain how we do this load testing:

1. We have written a testing framework that generates load from another AWS
system.

2. The framework takes several parameters: we can specify the number of
threads, the loop size (i.e. the number of sockets that each thread will
open) and the batch size (i.e. the number of PUTs, or inserts, that each
socket connection will handle).

3. To simplify troubleshooting, we removed variables from the tests: we have
just 1 OpenTSDB instance behind the AWS ELB, so the load is being sent to 1
instance only.

4. We are initially creating the OpenTSDB tables without any pre-splitting
of regions.

5. We are doing the loading with 1 metric only for ease of querying in the
UI.

6. We are sending under 5000 inserts per second.

7. At the top of the hour, the row compaction kicks in and the region server
is too busy, so we lose data. It recovers the first time. But in the 2nd hour
there is presumably so much data that it doesn’t recover. To fix it, we
have to restart Cloudera, reboot the nodes, drop the tsdb tables and
re-create them. Otherwise the .tmp file keeps growing until it fills the 3TB
disks and the system is unresponsive.

8. We see problems with region splits happening under heavy load. We noted a
code fix committed on Jan 11 for this, but I presume that is not in RC2.1.
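
For anyone who wants to reproduce the load pattern from items 1, 2 and 6,
it can be sketched roughly as below. This is only an illustrative sketch and
not the actual framework: the function names, the metric name "load.test",
the "thread" tag and the host/port passed in are all assumptions. It writes
OpenTSDB's telnet-style "put <metric> <timestamp> <value> <tagk=tagv>" lines
over plain sockets, with one batch per socket connection.

```python
import socket
import threading
import time


def build_put_lines(metric, start_ts, batch_size, tags):
    """Build batch_size telnet-style put lines, one data point per second."""
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return [
        f"put {metric} {start_ts + i} {i} {tag_str}\n"
        for i in range(batch_size)
    ]


def send_batch(host, port, lines):
    """Open one socket, write one batch of put lines, then close it."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall("".join(lines).encode("ascii"))


def worker(host, port, metric, thread_id, loop_size, batch_size):
    """One thread: open loop_size sockets in sequence, one batch per socket."""
    # Start far enough in the past that every timestamp is <= now.
    base_ts = int(time.time()) - loop_size * batch_size
    for loop in range(loop_size):
        # Each loop gets its own timestamp range so points do not collide.
        start_ts = base_ts + loop * batch_size
        # A per-thread tag value keeps the threads in separate time series
        # while still using a single metric (assumption, see lead-in).
        tags = {"thread": str(thread_id)}
        send_batch(host, port, build_put_lines(metric, start_ts, batch_size, tags))


def run_load(host, port, metric, threads, loop_size, batch_size):
    """Run all worker threads and return the number of points sent."""
    pool = [
        threading.Thread(
            target=worker,
            args=(host, port, metric, t, loop_size, batch_size),
        )
        for t in range(threads)
    ]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return threads * loop_size * batch_size
```

A call such as run_load("tsd.example.com", 4242, "load.test", threads=4,
loop_size=10, batch_size=100) would send 4 x 10 x 100 = 4000 points; the
hostname and port there are placeholders. For scale, a sustained rate just
under 5000 inserts per second works out to at most about 18 million data
points per hour (5000 x 3600) waiting for the hourly row compaction.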

Thanks

--
View this message in context: 
http://apache-hbase.679495.n3.nabble.com/HBase-with-opentsdb-creates-huge-tmp-file-runs-out-of-hdfs-space-tp4067577p4068627.html
Sent from the HBase User mailing list archive at Nabble.com.