> Alternatively, if the servers in question are under constant memory
> pressure and receive a fair number of updates, they may be
> prioritizing flushing of inserted rows at the expense of updates,
> causing the tablets to retain a great number of WAL segments
> (containing older updates) for durability's sake.

Just an FYI in case it helps confirm or rule it out: this refers to
KUDU-3002 <https://issues.apache.org/jira/browse/KUDU-3002>, which will be
fixed in the upcoming release. Can you determine whether your tablet
servers are under memory pressure?

On Mon, Mar 30, 2020 at 11:17 AM Adar Lieber-Dembo <a...@cloudera.com> wrote:

> > - the number of open files in the Kudu process in the tablet servers
> > has increased to now more than 150,000 (as counted using 'lsof'); we
> > raised the limit of maximum number of open files twice already to
> > avoid a crash, but we (and our vendor) are concerned that something
> > might not be right with such a high number of open files.
>
> Using lsof, can you figure out which files are open? WAL segments?
> Data files? Something else? Given the high WAL usage, I'm guessing
> it's the former and these are actually one and the same problem, but
> would be good to confirm nonetheless.
>
> > - in some of the tablet servers the disk space used by the WALs is
> > significantly (and concerningly) higher than in most of the other
> > tablet servers; we use a 1TB SSD drive (about 950GB usable) to store
> > the WALs on each tablet server, and this week was the second time
> > where we saw a tablet server almost fill the whole WAL disk. We had to
> > stop and restart the tablet server, so its tablets would be migrated
> > to different TS's, and we could manually clean up the WALs directory,
> > but this is definitely not something we would like to do in the
> > future. We took a look inside the WAL directory on that TS before
> > wiping it, and we observed that there were a few tablets whose WALs
> > were in excess of 30GB. Another piece of information is that the table
> > that the largest of these tablets belongs to receives about 15M
> > transactions a day, of which about 25% are new inserts and the rest
> > are updates of existing rows.
>
> Sounds like there are at least several tablets with follower replicas
> that have fallen behind their leaders and are trying to catch up. In
> these situations, a leader will preserve as many WAL segments as
> necessary in order to catch up the lagging follower replica, at least
> until some threshold is reached (at which point the master will bring
> a new replica online and the lagging replica will be evicted). These
> calculations are done in terms of the number of WAL segments; in the
> affected tablets, do you recall how many WAL segment files there were
> before you deleted the directories?
>
> Alternatively, if the servers in question are under constant memory
> pressure and receive a fair number of updates, they may be
> prioritizing flushing of inserted rows at the expense of updates,
> causing the tablets to retain a great number of WAL segments
> (containing older updates) for durability's sake. If you recall the
> affected tablet IDs, do your logs indicate the nature of the
> background operations performed for those tablets?
>
> Some of these questions can also be answered via Kudu metrics. There's
> the ops_behind_leader tablet-level metric, which can tell you how far
> behind a replica may be. Unfortunately I can't find a metric for
> average number of WAL segments retained (or a histogram); I thought we
> had that, but maybe not.

--
Andrew Wong
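
P.S. In case it helps with the lsof question quoted above, here is a rough sketch (not an official Kudu tool) for bucketing the tablet server's open files into WAL files, data files, and everything else, and for counting open WAL files per tablet. The function names are made up for illustration, the WAL and data directories are whatever you pass in, and the per-tablet count assumes each tablet has its own subdirectory directly under the WAL directory you pass; adjust if your layout differs. The input is the output of `lsof -p <pid> -Fn`, where file names appear on lines starting with 'n'.

```python
from collections import Counter
from pathlib import PurePosixPath


def names_from_lsof(field_output):
    """Extract file names from `lsof -Fn` output (lines starting with 'n')."""
    return [line[1:] for line in field_output.splitlines() if line.startswith("n")]


def classify_open_files(paths, wal_root, data_root):
    """Count open files by kind, plus open WAL files per tablet directory.

    wal_root and data_root are supplied by the caller. The per-tablet
    count assumes the layout <wal_root>/<tablet_id>/<file>.
    """
    wal_root = PurePosixPath(wal_root)
    data_root = PurePosixPath(data_root)
    kinds = Counter()
    wal_per_tablet = Counter()
    for raw in paths:
        path = PurePosixPath(raw)
        if wal_root in path.parents:
            kinds["wal"] += 1
            rel = path.relative_to(wal_root)
            if len(rel.parts) >= 2:
                # First component under wal_root is taken as the tablet ID.
                wal_per_tablet[rel.parts[0]] += 1
        elif data_root in path.parents:
            kinds["data"] += 1
        else:
            # Sockets, pipes, libraries, log files, and so on.
            kinds["other"] += 1
    return kinds, wal_per_tablet
```

Feeding it the lsof output and printing `wal_per_tablet.most_common(10)` should show whether a handful of tablets account for most of the open files, which would point back at the WAL retention problem rather than at a separate issue.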