Re: Optimize libprocess performance

Benjamin Mahler Wed, 04 Jan 2017 17:27:12 -0800

In which areas does the performance not meet your needs? There are a lot of
aspects of libprocess that can be optimized, so it would be good to focus
on each of your particular use cases via benchmarks. This gives us
a shared way to profile and measure improvements.

Copy elimination is one area where a lot of improvement can be made across
libprocess. Note that libprocess was implemented before we had C++11 move
support available. We've recently made some improvements to move the HTTP
serving path towards zero copies, but that work is not completely done. Can
you submit patches for copy elimination in the ProcessBase::send() path? We
can add a move overload for ProcessBase::send() and have
ProtobufProcess::send() and encode() perform moves instead of copies.
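
To illustrate the idea of a move overload, here is a minimal sketch. The
`Message` struct and `makeMessage()` are hypothetical stand-ins, not the
actual libprocess types or signatures. The point is only that an rvalue
overload lets the callee take over the caller's buffer rather than duplicate
it:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for an encoded message; not the real libprocess type.
struct Message {
  std::string name;
  std::string body;
};

// Copying overload: the caller keeps its string, so the body is duplicated.
inline Message makeMessage(const std::string& name, const std::string& body) {
  return Message{name, body};
}

// Move overload: chosen when the caller passes an rvalue (e.g. std::move(s)).
// The body's heap buffer is handed over instead of being copied.
inline Message makeMessage(const std::string& name, std::string&& body) {
  return Message{name, std::move(body)};
}
```

A caller that no longer needs its serialized payload would write
`makeMessage("ping", std::move(payload))`. On mainstream standard library
implementations, a string that is too long for the small-string optimization
keeps the same underlying buffer after the move, so no byte copy of the body
takes place.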

With respect to the MessageEncoder, since it's less trivial, you can submit
a benchmark that captures the use case you care about and we can drive
improvements using it. I have some suggestions here as well, but we can
discuss them once we have the benchmarks committed.
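
As a rough idea of what such a benchmark could measure, here is a minimal
timing sketch. Everything in it is an assumption for illustration:
`buildRequest()` is a simplified hypothetical encoder, not the real
MessageEncoder or the libprocess wire format, and `nanosPerIteration()` is a
hand-rolled helper rather than an existing benchmark harness:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical encoder: wraps a message body in a simplified HTTP request.
// This is not libprocess's MessageEncoder or its actual wire format.
inline std::string buildRequest(const std::string& path,
                                const std::string& body) {
  std::string out;
  out.reserve(path.size() + body.size() + 64);  // one allocation up front
  out += "POST /";
  out += path;
  out += " HTTP/1.1\r\nContent-Length: ";
  out += std::to_string(body.size());
  out += "\r\n\r\n";
  out += body;
  return out;
}

// Calls `f` `iterations` times and returns the average wall-clock cost of
// one call in nanoseconds. `f` must return a value convertible to size_t.
template <typename F>
double nanosPerIteration(F&& f, int iterations) {
  assert(iterations > 0);
  std::size_t sink = 0;  // accumulate results so the calls are not elided
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) {
    sink += f();
  }
  const auto stop = std::chrono::steady_clock::now();
  volatile std::size_t keep = sink;
  (void)keep;
  const std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / iterations;
}
```

For example, `nanosPerIteration([&] { return buildRequest("ping", body).size(); }, 100000)`
gives a per-message cost that can be compared before and after a change. A
committed benchmark would exercise the real encoding path and realistic
message sizes instead of this stand-in.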

How does that sound to start?

On Tue, Jan 3, 2017 at 7:31 PM, pangbingqiang <pangbingqi...@huawei.com>
wrote:

> Hi All:
>
>   We use libprocess as our underlying communication library, but we find
> its performance doesn't meet our needs, and we want to optimize it. For
> example, in the ‘send’ function implementation, sending one piece of
> metadata involves four memory copies:
>
> 1. ProtobufMessage SerializeToString, then ProcessBase ‘encode’ constructs
> a string once;
>
> 2. In the ‘encode’ function, the message body is copied again;
>
> 3. In MessageEncoder, the body is copied again in order to construct the
> HTTP request;
>
> 4. The MessageEncoder return value is copied again.
>
>   Suggestions on how to optimize this scenario would be useful.
>
>   Also, libprocess has many locks:
>
> 1. SocketManager: std::recursive_mutex mutex;
>
> 2. ProcessManager: std::recursive_mutex processes_mutex;
> std::recursive_mutex runq_mutex; std::recursive_mutex firewall_mutex;
>
> In particular, every event enqueue/dequeue needs to acquire a lock;
> a lock-free structure might be better.
>
>
>
> If you have any optimization suggestions or would like to discuss, please
> let me know. Thanks.
>
> Bingqiang Pang(庞兵强)
>
>
>
> Distributed and Parallel Software Lab
>
> Huawei Technologies Co., Ltd.
>
> Email:pangbingqi...@huawei.com <sut...@huawei.com>
