Huge GC pauses can be mitigated by ensuring you're using the Accumulo native maps library.
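If you go that route, this is roughly what the relevant settings look like in accumulo-site.xml. Treat it as a sketch: the only real requirement is tserver.memory.maps.native.enabled=true, the 32G value is just taken from Max's existing config, and the native library has to be built/installed on each tserver first (the 1.x tarball ships a build script for that, bin/build_native_library.sh if I remember correctly).

  <property>
    <name>tserver.memory.maps.native.enabled</name>
    <value>true</value>
  </property>
  <property>
    <!-- with the native maps enabled, this memory is allocated off-heap -->
    <name>tserver.memory.maps.max</name>
    <value>32G</value>
  </property>

With the native maps in use the in-memory map no longer lives in the Java heap, so a 32 GB tserver heap can usually be made much smaller, which by itself tends to shorten GC pauses.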
On Wed, Jul 5, 2017 at 11:05 AM Cyrille Savelief <[email protected]> wrote:

> Hi Massimilian,
>
> Using a MultiTableBatchWriter we are able to ingest about 600K entries/s
> on a single node (30 GB of memory, 8 vCPUs) running Hadoop, ZooKeeper,
> Accumulo and our ingest process. For us, the "valleys" came from huge GC
> pauses.
>
> Best,
>
> Cyrille
>
> On Wed, Jul 5, 2017 at 2:37 PM, Massimilian Mattetti <[email protected]> wrote:
>
>> Hi all,
>>
>> I have an Accumulo 1.8.1 cluster made of 12 bare-metal servers. Each
>> server has 256 GB of RAM and 2 x 10-core CPUs. Two machines are used as
>> masters (running the HDFS NameNodes, the Accumulo Master and the Monitor).
>> The other 10 machines have 12 disks of 1 TB each (11 used by the HDFS
>> DataNode process) and run the Accumulo TServer processes. All machines are
>> connected via a 10Gb network and 3 of them run ZooKeeper. I have run some
>> heavy ingestion tests on this cluster but have never been able to push CPU
>> usage on the tablet servers above 20%. I am running an ingestion process
>> (using batch writers) on each data node. The table is pre-split so that
>> each tablet server holds 4 tablets. Monitoring the network, I have seen
>> data received/sent by each node at peak rates of about 120 MB/s and
>> 100 MB/s respectively, while the aggregate disk write throughput on each
>> tablet server is around 120 MB/s.
>>
>> The table configuration I am playing with is:
>> "table.file.replication": "2",
>> "table.compaction.minor.logs.threshold": "10",
>> "table.durability": "flush",
>> "table.file.max": "30",
>> "table.compaction.major.ratio": "9",
>> "table.split.threshold": "1G"
>>
>> while the tablet server configuration is:
>> "tserver.wal.blocksize": "2G",
>> "tserver.walog.max.size": "8G",
>> "tserver.memory.maps.max": "32G",
>> "tserver.compaction.minor.concurrent.max": "50",
>> "tserver.compaction.major.concurrent.max": "8",
>> "tserver.total.mutation.queue.max": "50M",
>> "tserver.wal.replication": "2",
>> "tserver.compaction.major.thread.files.open.max": "15"
>>
>> The tablet server heap has been set to 32 GB.
>>
>> From the Monitor UI: [ingest-rate graph omitted]
>>
>> As you can see, there are a lot of valleys in which the ingestion rate
>> drops to 0. What would be a good procedure to identify the bottleneck
>> that causes these zero-ingestion periods?
>> Thanks.
>>
>> Best Regards,
>> Max
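For comparison, the batch-writer ingest pattern being discussed looks roughly like this with the 1.8 client API. This is only a sketch: the instance name, ZooKeeper quorum, credentials, table name and tuning values below are placeholders, not the actual settings from either cluster.

import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;

public class IngestSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder connection details.
    Connector conn = new ZooKeeperInstance("myInstance", "zk1:2181,zk2:2181,zk3:2181")
        .getConnector("ingest", new PasswordToken("secret"));

    BatchWriterConfig cfg = new BatchWriterConfig()
        .setMaxMemory(64 * 1024 * 1024)      // client-side buffer before mutations are sent
        .setMaxWriteThreads(8)               // parallel sends to the tablet servers
        .setMaxLatency(2, TimeUnit.MINUTES); // flush even if the buffer never fills

    BatchWriter bw = conn.createBatchWriter("mytable", cfg);
    try {
      for (int i = 0; i < 1_000_000; i++) {
        Mutation m = new Mutation(String.format("row_%08d", i));
        m.put("cf", "cq", new Value(("value_" + i).getBytes(StandardCharsets.UTF_8)));
        bw.addMutation(m); // buffered; RPCs happen on background write threads
      }
    } finally {
      bw.close(); // flushes anything still buffered
    }
  }
}

When the tablet servers look as under-utilized as Max describes, the client-side knobs (setMaxMemory, setMaxWriteThreads, and how many writer processes run per node) are usually worth ruling out before digging into server-side tuning.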
