*Hi:*

 Here is my understanding, hope it useful to you:

1. Batch sending is recommend for data collecting, such as website activity
tracking or log aggregation, which requires high throughput and could
tolerate some latency. On the contrary, simple sending is good for message
delivery in latency-sensitive distributed systems, such as trading or
financial senarios.

2.If REJECTREQUEST happens, it usually means the broker is carrying a heavy
load. It is suggested to expand the cluster. Just as you say, add new
Master into the cluster.


dongeforever
Best Regards

2018-06-15 10:17 GMT+08:00 Von Gosling <[email protected]>:

> Hi,
>
> Batch Message is designed for the streaming case in which you could get a
> high throughput.
>
> Best Regards,
>
>
> 在 2018年6月15日,09:39,Jason Joo <[email protected]> 写道:
>
> Hello,
>
> We have a cluster with 2m-2s architecture. TPS is between 1000 to 6000 and
> about 200 million messages in 24 hours typically.
> The commit log size is about 200GB and can be cached in memory fully.
>
> Our server(one rocketmq each, 4 servers in this cluster):
>
> 6 core Xeon-E5-2620 x 2 (24 thread total)
> 256G ddr3 1600MHz
> 240G Intel SSD 530(SSDSC2BW240H6) x 6 (Raid 1)
>
>
> To "IMPROVE" producing performance(mainly reduce network packages) we
> start to use batch sending feature, and we begain to receive so much "slow
> log" after that(batch is 50~70, 100~1000B per message body).
>
> After some tuning(osPageCacheBusyTimeOutMills, waitTimeMillsInSendQueue
> to 500ms, etc) we only increasing the "time cost" printed in slow log.
>
> The pagecache rt in log before using batch sending:
>
> *[PAGECACHERT] TotalPut 50650, PutMessageDistributeTime [<=0ms]:44250
> [0~10ms]:5865 [10~50ms]:514 [50~100ms]:10 [100~200ms]:11 [200~500ms]:0
> [500ms~1s]:0 [1~2s]:0 [2~3s]:0 [3~4s]:0 [4~5s]:0 [5~10s]:0 [10s~]:0 *
>
>
> after:
>
> *[PAGECACHERT] TotalPut 1970, PutMessageDistributeTime [<=0ms]:1428
> [0~10ms]:459 [10~50ms]:4 [50~100ms]:2 [100~200ms]:5 [200~500ms]:10
> [500ms~1s]:10 [1~2s]:25 [2~3s]:24 [3~4s]:3 [4~5s]:0 [5~10s]:0 [10s~]:0*
>
>
>
>
> After reading CommitLog i found the whole batch message is surrounded in
> one lock together. So it make sense the average executing time of PUT would
> increase. but we alse will receive some "in lock slow log" which will
> trigger flow control for a while:
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> *2018-06-14 21:02:42 WARN SendMessageThread_102 - [NOTIFYME]putMessages in
> lock cost time(ms)=1373, bodyLength=49380
> AppendMessageResult=AppendMessageResult{status=PUT_OK,
> wroteOffset=68238809866237, wroteBytes=54420, 
> msgId='0A0A63CF00002A9F00003E1016059FFD,0A0A63CF00002A9F00003E101605A37D,0A0A63CF00002A9F00003E101605A6DE,0A0A63CF00002A9F00003E101605A8C3,0A0A63CF00002A9F00003E101605ABA3,0A0A63CF00002A9F00003E101605AE81,0A0A63CF00002A9F00003E101605B066,0A0A63CF00002A9F00003E101605B24B,0A0A63CF00002A9F00003E101605B4CC,0A0A63CF00002A9F00003E101605B7F5,0A0A63CF00002A9F00003E101605BB78,0A0A63CF00002A9F00003E101605BE70,0A0A63CF00002A9F00003E101605C19F,0A0A63CF00002A9F00003E101605C4BF,0A0A63CF00002A9F00003E101605C78B,0A0A63CF00002A9F00003E101605CAE4,0A0A63CF00002A9F00003E101605CDC4,0A0A63CF00002A9F00003E101605D206,0A0A63CF00002A9F00003E101605D581,0A0A63CF00002A9F00003E101605D83B,0A0A63CF00002A9F00003E101605DB5B,0A0A63CF00002A9F00003E101605DED0,0A0A63CF00002A9F00003E101605E1AA,0A0A63CF00002A9F00003E101605E54D,0A0A63CF00002A9F00003E101605E8C7,0A0A63CF00002A9F00003E101605EB92,0A0A63CF00002A9F00003E101605EF0C,0A0A63CF00002A9F00003E101605F0D8,0A0A63CF00002A9F00003E101605F346,0A0A63CF00002A9F00003E101605F5FB,0A0A63CF00002A9F00003E101605F8D6,0A0A63CF00002A9F00003E101605FC51,0A0A63CF00002A9F00003E101605FFD1,0A0A63CF00002A9F00003E101606019D,0A0A63CF00002A9F00003E1016060509,0A0A63CF00002A9F00003E1016060816,0A0A63CF00002A9F00003E1016060B7A,0A0A63CF00002A9F00003E1016060E61,0A0A63CF00002A9F00003E10160611DF,0A0A63CF00002A9F00003E1016061558,0A0A63CF00002A9F00003E10160618A6,0A0A63CF00002A9F00003E1016061B68,0A0A63CF00002A9F00003E1016061E6B,0A0A63CF00002A9F00003E10160621E5,0A0A63CF00002A9F00003E10160624A2,0A0A63CF00002A9F00003E101606281D,0A0A63CF00002A9F00003E1016062AE3,0A0A63CF00002A9F00003E1016062E60,0A0A63CF00002A9F00003E101606313E,0A0A63CF00002A9F00003E101606343A,0A0A63CF00002A9F00003E10160637B5,0A0A63CF00002A9F00003E1016063AE3,0A0A63CF00002A9F00003E1016063E67,0A0A63CF00002A9F00003E1016064218,0A0A63CF00002A9F00003E10160644E0,0A0A63CF00002A9F00003E101606487A,0A0A63CF00002A9F00003E1016064BF4,0A0A63CF00002A9F00003E1016064E61,0A0A63CF00002A9F00003E101606512E,0A0A63CF00002A9F00003E1016065489,0A0A63CF00002A9F00003E101606574F,0A0A63CF00002A9F00003E1016065AE3,0A0A63CF00002A9F00003E1016065CC8,0A0A63CF00002A9F00003E1016065FED,0A0A63CF00002A9F00003E10160662E2,0A0A63CF00002A9F00003E10160664C7,0A0A63CF00002A9F00003E10160667F6,0A0A63CF00002A9F00003E1016066B81,0A0A63CF00002A9F00003E1016066E54,0A0A63CF00002A9F00003E1016067132',
> storeTimestamp=1528981361218, logicsOffset=5784993480, pagecacheRT=1373,
> msgNum=70}2018-06-14 21:02:42 WARN SendMessageThread_102 - not in lock
> eclipse time(ms)=1374, bodyLength=493802018-06-14 21:02:42 WARN
> SendMessageThread_79 - not in lock eclipse time(ms)=688,
> bodyLength=2742018-06-14 21:02:42 WARN SendMessageThread_84 - not in lock
> eclipse time(ms)=562, bodyLength=482422018-06-14 21:02:42 WARN
> SendMessageThread_107 - not in lock eclipse time(ms)=839,
> bodyLength=2742018-06-14 21:02:42 WARN SendMessageThread_97 - not in lock
> eclipse time(ms)=1199, bodyLength=494692018-06-14 21:02:42 WARN
> SendMessageThread_111 - not in lock eclipse
> time(ms)=1307, bodyLength=473292018-06-14 21:02:42 WARN
> SendMessageThread_91 - not in lock eclipse time(ms)=786,
> bodyLength=2752018-06-14 21:02:42 WARN SendMessageThread_67 - not in lock
> eclipse time(ms)=807, bodyLength=5752018-06-14 21:02:42 WARN
> SendMessageThread_65 - not in lock eclipse time(ms)=1326,
> bodyLength=472942018-06-14 21:02:43 INFO FlushRealTimeService - Flush data
> to disk costs 2271 ms2018-06-14 21:02:43 WARN SendMessageThread_116 - not
> in lock eclipse time(ms)=1542, bodyLength=500392018-06-14 21:02:43 WARN
> SendMessageThread_119 - not in lock eclipse
> time(ms)=1202, bodyLength=500242018-06-14 21:02:43 WARN
> SendMessageThread_55 - not in lock eclipse time(ms)=1436, bodyLength=907*
>
>
> So the question is:
>
>    - Is batch sending recommended?
>    - It maybe "slower" compared to simple sending, should i increase the
>    in-lock-timeout?
>    - how we can solve the performance problem(many REJECTREQUEST will
>    generated in producers), add new Master into the cluster?
>
>
>
> My question
>
> best regards,
>
> Jason
>
>
>

Reply via email to