That's a good point. I would also look at increasing 
tserver.total.mutation.queue.max. Are you seeing hold times? If not, I would 
keep pushing harder until you do, then move to multiple tablet servers. Do you 
have any GC logs?
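
(For reference, that property can be bumped from the shell or from the client API; a minimal sketch using the Java API, where the instance name, ZooKeepers, credentials, and the 256M value are all placeholders, not recommendations:)

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;

    public class RaiseMutationQueue {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details.
            Connector conn = new ZooKeeperInstance("myInstance", "zk1:2181,zk2:2181,zk3:2181")
                    .getConnector("root", new PasswordToken("secret"));
            // 256M is only an example starting point; watch hold times and adjust.
            conn.instanceOperations().setProperty("tserver.total.mutation.queue.max", "256M");
        }
    }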


> On July 6, 2017 at 4:47 AM Cyrille Savelief <[email protected]> wrote:
> 
>     Are you sure Accumulo is not waiting for your app's data? There might be 
> GC pauses in your ingest code (we have run into that before).
> 
>     On Thu, Jul 6, 2017 at 10:32 AM, Massimilian Mattetti <[email protected]> wrote:
> 
> >         Thank you all for the suggestions.
> > 
> >         About the native memory map: I checked the logs on each tablet 
> > server and it was loaded correctly (of course 
> > tserver.memory.maps.native.enabled was set to true), so the GC pauses 
> > should not be the problem after all. I managed to get a much better 
> > ingestion graph by reducing the native map size to 2GB and increasing the 
> > number of Batch Writer threads from the default of 3 (which was really bad 
> > for my configuration) to 10 (I think it does not make sense to have more 
> > threads than tablet servers, am I right?).
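> > 
> >         (For reference, the thread count is set on the BatchWriterConfig; a 
> > minimal sketch with example values, where conn is an existing Connector and 
> > the table name is a placeholder:)
> > 
> >     import org.apache.accumulo.core.client.BatchWriter;
> >     import org.apache.accumulo.core.client.BatchWriterConfig;
> >     import org.apache.accumulo.core.client.Connector;
> > 
> >     BatchWriter openWriter(Connector conn, String table) throws Exception {
> >         BatchWriterConfig cfg = new BatchWriterConfig();
> >         cfg.setMaxWriteThreads(10);           // roughly one per tablet server here
> >         cfg.setMaxMemory(64 * 1024 * 1024L);  // 64MB client-side buffer, example value
> >         return conn.createBatchWriter(table, cfg);
> >     }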
> > 
> >         The configuration that I used for the table is:
> >         "table.file.replication": "2",
> >         "table.compaction.minor.logs.threshold": "3",
> >         "table.durability": "flush",
> >         "table.split.threshold": "1G"
> > 
> >         while the tablet server configuration is:
> >         "tserver.wal.blocksize": "1G",
> >          "tserver.walog.max.size": "2G",
> >         "tserver.memory.maps.max": "2G",
> >         "tserver.compaction.minor.concurrent.max": "50",
> >         "tserver.compaction.major.concurrent.max": "20",
> >         "tserver.wal.replication": "2",
> >          "tserver.compaction.major.thread.files.open.max": "15"
> > 
> >         The new graph:
> > 
> > 
> >         I still have the problem that CPU usage stays below 20%. So I 
> > am thinking of running multiple tablet servers per node (like 5 or 10) in order 
> > to maximize the CPU usage. Besides that, I do not have any other ideas on how 
> > to stress those servers with ingestion.
> >         Any suggestions are very welcome. Meanwhile, thank you all again 
> > for your help.
> > 
> > 
> >         Best Regards,
> >         Massimiliano
> > 
> > 
> > 
> >         From:        Jonathan Wonders <[email protected]>
> >         To:        [email protected]
> >         Date:        06/07/2017 04:01
> >         Subject:        Re: maximize usage of cluster resources during 
> > ingestion
> > 
> >         ---------------------------------------------
> > 
> > 
> > 
> >         Hi Massimilian,
> > 
> >         Are you seeing held commits during the ingest pauses?  Just based 
> > on having looked at many similar graphs in the past, this might be one of 
> > the major culprits.  A tablet server has a memory region with a bounded 
> > size (tserver.memory.maps.max) where it buffers data that has not yet been 
> > written to RFiles (through the process of minor compaction).  The region is 
> > segmented by tablet and each tablet can have a buffer that is undergoing 
> > ingest as well as a buffer that is undergoing minor compaction.  A memory 
> > manager decides when to initiate minor compactions for the tablet buffers 
> > and the default implementation tries to keep the memory region 80-90% full 
> > while preferring to compact the largest tablet buffers.  Creating larger 
> > RFiles during minor compaction should lead to fewer major compactions.  
> > During a minor compaction, the tablet buffer still "consumes" memory within 
> > the in-memory map, and high ingest rates can lead to exhausting the remaining 
> > capacity.  The default memory manager uses an adaptive strategy to predict 
> > the expected memory usage and makes compaction decisions that should 
> > maintain some free memory.  Batch writers can be bursty and a bit 
> > unpredictable, which could throw off these estimates.  Also, depending on 
> > the ingest profile, sometimes an in-memory tablet buffer will consume a 
> > large percentage of the total buffer.  This leads to long minor compactions 
> > when the buffer size is large, which can give ingest enough time to exhaust 
> > the buffer before that memory can be reclaimed.  When a tablet server has 
> > to block ingest, it can affect client ingest rates to other tablet servers 
> > due to the way that batch writers work.  This can lead to other tablet 
> > servers underestimating future ingest rates, which can further exacerbate 
> > the problem.
> > 
> >         There are some configuration changes that could reduce the severity 
> > of held commits, although they might reduce peak ingest rates.  Reducing 
> > the in-memory map size can reduce the maximum pause time due to held 
> > commits.  Adding additional tablets should help avoid the problem of a 
> > single tablet buffer consuming a large percentage of the memory region.  It 
> > might be better to aim for ~20 tablets per server if your problem allows 
> > for it.  It is also possible to replace the memory manager with a custom 
> > one.  I've tried this in the past and have seen stability improvements by 
> > making the memory thresholds less aggressive (50-75% full).  This did 
> > reduce peak ingest rate in some cases, but that was a reasonable tradeoff.
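> > 
> >         (If memory serves, the manager is pluggable through the 
> > tserver.memory.manager property; a rough sketch, where 
> > com.example.ConservativeMemoryManager is a hypothetical implementation and 
> > conn is an existing Connector:)
> > 
> >     // Point the tablet servers at a custom, less aggressive memory manager.
> >     conn.instanceOperations().setProperty("tserver.memory.manager",
> >             "com.example.ConservativeMemoryManager");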
> > 
> >         Based on your current configuration, if a tablet server is serving 
> > 4 tablets and has a 32GB buffer, your first minor compactions will be at 
> > least 8GB and they will probably grow larger over time until the tablets 
> > naturally split.  Consider how long it would take to write this RFile 
> > compared to your peak ingest rate.  As others have suggested, make sure to 
> > use the native maps.  Based on your current JVM heap size, using the Java 
> > in-memory map would probably lead to OOME or very bad GC performance.
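> > 
> >         (Back-of-the-envelope, using the ~120MB/s aggregate disk write rate 
> > from the original mail: flushing a single 8GB tablet buffer would take on the 
> > order of 8192MB / 120MB/s, roughly 70 seconds, which is plenty of time for 
> > the batch writers to exhaust the rest of the 32GB region and trigger held 
> > commits.)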
> > 
> >         Accumulo can trace minor compaction durations so you can get a feel 
> > for max pause times or measure the effect of configuration changes.
> > 
> >         Cheers,
> >         --Jonathan
> > 
> >         On Wed, Jul 5, 2017 at 7:16 PM, Dave Marion <[email protected]> wrote:
> >          
> > 
> >         Based on what Cyrille said, I would look at garbage collection, 
> > specifically at how many of your newly allocated objects spill 
> > into the old generation before they are flushed to disk. Additionally, I 
> > would turn off the debug log or log to SSDs if you have them. Another 
> > thought, seeing that you have 256GB of RAM per node, is to run multiple tablet 
> > servers per node. Do you have 10 threads on your Batch Writers? What about 
> > the Batch Writer latency: is it so low that you are not filling the 
> > buffer?
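> > 
> >         (Both knobs live on the BatchWriterConfig; a rough sketch with 
> > placeholder values:)
> > 
> >     import java.util.concurrent.TimeUnit;
> >     import org.apache.accumulo.core.client.BatchWriterConfig;
> > 
> >     BatchWriterConfig cfg = new BatchWriterConfig();
> >     cfg.setMaxMemory(128 * 1024 * 1024L);    // 128MB client buffer, example value
> >     cfg.setMaxLatency(2, TimeUnit.SECONDS);  // allow up to 2s for the buffer to fill before flushing
> >     cfg.setMaxWriteThreads(10);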
> > 
> >          
> > 
> >         From: Massimilian Mattetti [mailto:[email protected]]
> >         Sent: Wednesday, July 05, 2017 8:37 AM
> >         To: [email protected]
> >         Subject: maximize usage of cluster resources during ingestion
> > 
> >          
> > 
> >         Hi all,
> > 
> >         I have an Accumulo 1.8.1 cluster made up of 12 bare metal servers. 
> > Each server has 256GB of RAM and 2 x 10-core CPUs. 2 machines are used as 
> > masters (running the HDFS NameNodes, Accumulo Master and Monitor). The other 10 
> > machines have 12 disks of 1TB each (11 used by the HDFS DataNode process) and are 
> > running Accumulo TServer processes. All the machines are connected via a 
> > 10Gb network and 3 of them are running ZooKeeper. I have run some heavy 
> > ingestion tests on this cluster but I have never been able to reach more 
> > than 20% CPU usage on each Tablet Server. I am running an ingestion process 
> > (using batch writers) on each data node. The table is pre-split in order to 
> > have 4 tablets per tablet server. Monitoring the network, I have seen that 
> > data is received/sent from each node at peak rates of about 120MB/s / 
> > 100MB/s, while the aggregate disk write throughput on each tablet server 
> > is around 120MB/s.
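> > 
> >         (A pre-split like that is typically created with addSplits; a minimal 
> > sketch, where conn is an existing Connector and the table name and the 
> > single-character split points are placeholders for the real row scheme:)
> > 
> >     import java.util.TreeSet;
> >     import org.apache.hadoop.io.Text;
> > 
> >     TreeSet<Text> splits = new TreeSet<>();
> >     for (char c = 'b'; c <= 'z'; c++) {  // placeholder split points
> >         splits.add(new Text(String.valueOf(c)));
> >     }
> >     conn.tableOperations().addSplits("ingest_table", splits);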
> > 
> >         The table configuration properties I am playing with are:
> >         "table.file.replication": "2",
> >         "table.compaction.minor.logs.threshold": "10",
> >         "table.durability": "flush",
> >         "table.file.max": "30",
> >         "table.compaction.major.ratio": "9",
> >         "table.split.threshold": "1G"
> > 
> >         while the tablet server configuration is:
> >         "tserver.wal.blocksize": "2G",
> >         "tserver.walog.max.size": "8G",
> >         "tserver.memory.maps.max": "32G",
> >         "tserver.compaction.minor.concurrent.max": "50",
> >         "tserver.compaction.major.concurrent.max": "8",
> >         "tserver.total.mutation.queue.max": "50M",
> >         "tserver.wal.replication": "2",
> >         "tserver.compaction.major.thread.files.open.max": "15"
> > 
> >         The tablet server heap has been set to 32GB.
> > 
> >         From the Monitor UI:
> > 
> > 
> >         As you can see, there are a lot of valleys in which the ingestion rate 
> > drops to 0.
> >         What would be a good procedure to identify the bottleneck that 
> > causes these zero-ingestion periods?
> >         Thanks.
> > 
> >         Best Regards,
> >         Max
> > 
> > 
 
