Hi Gary,

Please find my answers inline.

On Sun, Aug 12, 2018 at 01:17:03PM +0800, Gary Gao wrote:
> I have a kudu cluster of 40 nodes, when I realized that
> maintenance_manager_num_threads=1 is too small, I updated config file and
> restarted a kudu tablet server, but it took too long to start, longer than
> --follower_unavailable_considered_failed_sec=600, causing tablet
> redistribution.
> Even if the kudu server started, it also spent too much copying tablet, as
> the following tablet block copying log:
>
>
> Tablet 1ecbe230e14a4d9f9125dbc49c32860e of table
> 'impala::venus.ods_xk_pay_fee_order' is under-replicated: 1 replica(s) not
> RUNNING
> 41e4489d38924c85a4810bd33ef60d80 (bj-yz-hadoop01-1-12:7050): bad state
> State: INITIALIZED
> Data state: TABLET_DATA_COPYING
> Last status: Tablet Copy: Downloading block 0000000084111077
> (299837/1177225)
> 52a9ede038a04566860ecd2e54388738 (bj-yz-hadoop01-1-51:7050): RUNNING
> b133f6fd0c274b93b21ffcbdcbbde830 (bj-yz-hadoop01-1-14:7050): RUNNING
> [LEADER]
>

Which version are you using? Recent versions use 3-4-3 replica
replacement, so the tablet copy should be canceled automatically if the
third replica comes back online before the copy has finished.

>
> My Question are:
>
> 1. It seems kudu server spent a long time to open log block container, how
> to speed up restarting kudu server ?

The startup time of a tablet server depends mostly on the number of
tablets it hosts. I'm not sure there is any way to tune it, aside from
reducing the number of tablets. How many tablets do you have per tablet
server?

>
> 2. I think the number of blocks have an influence on kudu server restarting
> time and query time on specific tablet, more number of blocks, more
> restarting time and query time. Is this right ?

I'm not sure how much the number of blocks influences the restart time;
maybe someone else can shed some light on this one. I'd focus on the
number of tablets, though.
Query latency depends on how many blocks the server needs to read, but
that is a matter of how well the data is compacted (either because it was
written sequentially rather than randomly, or because the maintenance
manager threads have already compacted it) rather than of the total
number of blocks.

>
> 3. Why there are more than 1 million blocks in a tablet, as shown in above
> Tablet Copy log, while there are less than 500 thousands of records in the
> tablet ?
>

Each rowset has multiple blocks: one per column, plus blocks for the UNDO
and REDO deltas and for the bloom filters. The number of rowsets depends
on the number of rows.

> 4. How to reduce the number of block in tablet ?

The maintenance manager threads perform compactions, which reduce the
number of blocks per tablet. Other than that, fewer columns or fewer rows
also result in fewer blocks, of course.

- Attila
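
P.S. If you want to keep an eye on how far a tablet copy like the one in
your log has progressed, the counters at the end of the "Last status"
line can be read off with a few lines of Python. This is only a rough
sketch: it assumes the two numbers in parentheses are blocks downloaded
and total blocks, and that the status text keeps the shape shown in your
log. The helper name copy_progress is made up for illustration and is
not part of Kudu.

```python
import re

# Matches the "(done/total)" counters in a status line such as
# "Tablet Copy: Downloading block 0000000084111077 (299837/1177225)".
_PROGRESS_RE = re.compile(r"\((\d+)/(\d+)\)")


def copy_progress(status):
    """Return (done, total, percent) parsed from a tablet copy status line.

    Hypothetical helper, not a Kudu API. Returns None when the line
    carries no "(done/total)" counters.
    """
    match = _PROGRESS_RE.search(status)
    if match is None:
        return None
    done, total = int(match.group(1)), int(match.group(2))
    # Guard against a zero total so the division cannot fail.
    percent = 100.0 * done / total if total else 0.0
    return done, total, percent


# Status text copied from the log quoted in this thread.
status = "Tablet Copy: Downloading block 0000000084111077 (299837/1177225)"
done, total, percent = copy_progress(status)
print(f"{done} of {total} blocks copied ({percent:.1f}%)")
```

For the line in your log this works out to roughly a quarter of the
blocks copied.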