This is an automated email from the ASF dual-hosted git repository.

wusheng pushed a commit to branch master
in repository 
https://gitbox.apache.org/repos/asf/incubator-skywalking-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 96d7dee  Mesh performance test blog (#30)
96d7dee is described below

commit 96d7dee1cdb19d0bae5fb1319f415a9e50b057d7
Author: Gao Hongtao <hanahm...@gmail.com>
AuthorDate: Fri Jan 25 23:10:18 2019 +0800

    Mesh performance test blog (#30)
    
    * add mesh performance test blog
    
    * fix some issues
---
 .../blog/2019-01-25-mesh-loadtest/image1.png       | Bin 0 -> 39044 bytes
 .../blog/2019-01-25-mesh-loadtest/image2.png       | Bin 0 -> 6095 bytes
 .../blog/2019-01-25-mesh-loadtest/image3.png       | Bin 0 -> 317634 bytes
 .../blog/2019-01-25-mesh-loadtest/image4.png       | Bin 0 -> 308347 bytes
 .../blog/2019-01-25-mesh-loadtest/image5.png       | Bin 0 -> 248460 bytes
 .../blog/2019-01-25-mesh-loadtest/image6.png       | Bin 0 -> 51443 bytes
 docs/blog/2019-01-25-mesh-loadtest.md              |  80 +++++++++++++++++++++
 docs/blog/README.md                                |   5 ++
 8 files changed, 85 insertions(+)

diff --git 
a/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image1.png 
b/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image1.png
new file mode 100755
index 0000000..2b11cd4
Binary files /dev/null and 
b/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image1.png differ
diff --git 
a/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image2.png 
b/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image2.png
new file mode 100755
index 0000000..f666067
Binary files /dev/null and 
b/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image2.png differ
diff --git 
a/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image3.png 
b/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image3.png
new file mode 100755
index 0000000..dcdde59
Binary files /dev/null and 
b/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image3.png differ
diff --git 
a/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image4.png 
b/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image4.png
new file mode 100755
index 0000000..721bc6e
Binary files /dev/null and 
b/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image4.png differ
diff --git 
a/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image5.png 
b/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image5.png
new file mode 100755
index 0000000..99d0786
Binary files /dev/null and 
b/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image5.png differ
diff --git 
a/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image6.png 
b/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image6.png
new file mode 100755
index 0000000..24a3ea2
Binary files /dev/null and 
b/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image6.png differ
diff --git a/docs/blog/2019-01-25-mesh-loadtest.md 
b/docs/blog/2019-01-25-mesh-loadtest.md
new file mode 100644
index 0000000..cb840b2
--- /dev/null
+++ b/docs/blog/2019-01-25-mesh-loadtest.md
@@ -0,0 +1,80 @@
+# SkyWalking performance in Service Mesh scenario
+
+- Author: Hongtao Gao, Apache SkyWalking & ShardingSphere PMC
+- [GitHub](https://github.com/hanahmily), [Twitter](https://twitter.com/hanahmily), [LinkedIn](https://www.linkedin.com/in/gao-hongtao-47b835168/)
+
+Jan. 25th, 2019
+
+The service mesh receiver was first introduced in Apache SkyWalking 6.0.0-beta. It is designed to provide a common entrance for receiving telemetry data from service mesh frameworks, for instance Istio, Linkerd, Envoy, etc. What is a service mesh? According to Istio’s explanation:
+
+The term service mesh is used to describe the network of microservices that 
make up such applications and the interactions between them.
+
+As a PMC member of Apache SkyWalking, I have tested the trace receiver and understand the collector's performance in the trace scenario well. I would also like to figure out the performance of the service mesh receiver.
+
+## Difference between trace and service mesh
+
+The following chart presents a typical trace map:
+
+![](../.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image5.png)
+
+You can find a variety of elements in it, such as web services, local methods, databases, caches, MQs, and so on. But for now the service mesh only collects service network telemetry data, which contains the entrance and exit data of a service (more elements, such as databases, will be added soon). A smaller quantity of data is sent to the service mesh receiver than to the trace receiver.
+
+![](../.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image1.png)
+
+But using a sidecar is a little different. When a client requests “A”, a segment is sent to the service mesh receiver from “A”’s sidecar. If “A” depends on “B”, another segment is sent from “A”’s sidecar. But in a trace system, only one segment is received by the collector. The sidecar model splits one segment into smaller segments, which increases the network overhead of the service mesh receiver.
+
+## Deployment Architecture
+
+In this test, I pick two different backend deployments. One is called the mini unit, which consists of one collector and one Elasticsearch instance. The other is a standard production cluster, which contains three collectors and three Elasticsearch instances.
+
+The mini unit is a suitable architecture for a dev or test environment. It saves your time and VM resources and speeds up the deployment process.
+
+The standard cluster provides good performance and HA for a production scenario. Though you will pay more and have to take care of the cluster carefully, its reliability will be a good reward.
+
+I picked 8-CPU, 16GB VMs to set up the test environment. This test targets the performance of normal usage scenarios, so that choice is reasonable. The cluster is built on Google Kubernetes Engine (GKE), and every node is linked to the others through a VPC network. Because running the collector is a CPU-intensive task, the resource request of the collector deployment should be 8 CPUs, which means every collector instance occupies a whole VM node.
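+
+In practice the collector is deployed through a manifest. The following is a minimal, hypothetical sketch of this sizing decision, expressed with the Kubernetes Python client instead of YAML; the image name and labels are placeholders, not an official SkyWalking manifest. The point is simply that the CPU request claims a full 8-CPU node per collector instance:
+
+```python
+from kubernetes import client
+
+# Hypothetical collector container: request a whole 8-CPU / 16GB node per instance.
+collector = client.V1Container(
+    name="oap-collector",
+    image="skywalking-oap:6.0.0-beta",  # placeholder image name/tag
+    resources=client.V1ResourceRequirements(
+        requests={"cpu": "8", "memory": "16Gi"},
+    ),
+)
+
+deployment = client.V1Deployment(
+    api_version="apps/v1",
+    kind="Deployment",
+    metadata=client.V1ObjectMeta(name="oap-collector"),
+    spec=client.V1DeploymentSpec(
+        replicas=1,  # 3 for the standard cluster
+        selector=client.V1LabelSelector(match_labels={"app": "oap-collector"}),
+        template=client.V1PodTemplateSpec(
+            metadata=client.V1ObjectMeta(labels={"app": "oap-collector"}),
+            spec=client.V1PodSpec(containers=[collector]),
+        ),
+    ),
+)
+```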
+
+## Testing Process
+
+The number of mesh fragments received per second (MPS) depends on the following variables:
+
+ 1. Ingress queries per second (QPS)
+ 1. The topology of the microservice cluster
+ 1. Service mesh mode (proxy or sidecar)
+
+In this test, I use the Bookinfo app as the demo cluster.
+
+![](../.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image6.png)
+
+So every request touches at most 4 nodes. Since the sidecar mode is picked (every touched service reports two pieces of telemetry data per request), the MPS will be QPS * 4 * 2.
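+
+To make this arithmetic concrete, here is a minimal sketch of the formula in plain Python (the 4-hop count and the factor of 2 are assumptions taken from the Bookinfo topology and sidecar mode described above, not SkyWalking code):
+
+```python
+# MPS = QPS * hops * telemetry_per_hop, for the assumed Bookinfo/sidecar setup.
+def mesh_fragments_per_second(qps: int, hops: int = 4, telemetry_per_hop: int = 2) -> int:
+    """Mesh fragments generated per second by a given ingress QPS."""
+    return qps * hops * telemetry_per_hop
+
+print(mesh_fragments_per_second(1_000))  # 8000 fragments/s for 1k ingress QPS
+```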
+
+There are also some important metrics that should be explained:
+
+ * Client Query Latency: heatmap of GraphQL API query response times.
+ * Client Mesh Sender: mesh segments sent per second. The total line represents the total number of sends, and the error line is the number of failed sends.
+ * Mesh telemetry latency: heatmap of the service mesh receiver's data-handling time.
+ * Mesh telemetry received: mesh telemetry data received per second.
+
+### Mini Unit
+
+![](../.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image3.png)
+
+You can see that the collector processes up to **25k** telemetry records per second. The CPU usage is about 4 cores. Most of the query latency is less than 50ms. After logging in to the VM on which the collector instance runs, I found that the system load was reaching the limit (the max is 8).
+
+![](../.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image2.png)
+
+According to the previous formula, a single collector instance can process about **3k** QPS of Bookinfo traffic (25k / (4 * 2) ≈ 3k).
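+
+Going the other way, a small sketch of the inverse calculation (same assumed 4-hop, sidecar-mode Bookinfo topology) turns a measured receiver capacity back into supported ingress QPS:
+
+```python
+# QPS = MPS capacity / (hops * telemetry_per_hop), the inverse of the formula above.
+def supported_qps(mps_capacity: int, hops: int = 4, telemetry_per_hop: int = 2) -> int:
+    """Ingress QPS that a given mesh-receiving capacity can sustain."""
+    return mps_capacity // (hops * telemetry_per_hop)
+
+print(supported_qps(25_000))  # 3125 -> roughly the 3k QPS above
+print(supported_qps(80_000))  # 10000 -> the standard cluster result below
+```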
+
+### Standard Cluster
+
+![](../.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image4.png)
+
+Compared to the mini unit, the cluster's throughput increases linearly. Three instances provide a total of 80k per second of processing power. Query latency increases slightly, but it is still very small (less than 500ms). I also checked the system load of every collector instance; all of them reached the limit. So the cluster can process 10k QPS of Bookinfo telemetry traffic (80k / (4 * 2) = 10k).
+
+## Conclusion
+
+Let’s wrap up. There are some important takeaways from this test:
+ * QPS varies with the three variables listed above. The absolute numbers in this blog are not the point; users should pick proper values according to their own systems.
+ * The collector cluster's processing power scales out.
+ * The collector is a CPU-intensive application, so you should provide sufficient CPU resources to it.
+
+This blog gives people a general method for evaluating the throughput of the service mesh receiver. Users can apply it to design their Apache SkyWalking backend deployment architecture.
diff --git a/docs/blog/README.md b/docs/blog/README.md
index 960dae9..d2d6f9e 100755
--- a/docs/blog/README.md
+++ b/docs/blog/README.md
@@ -3,6 +3,11 @@ layout: LayoutBlog
 
 blog:
 
+- title: SkyWalking performance in Service Mesh scenario
+  name: 2019-01-25-mesh-loadtest
+  time: Hongtao Gao. Jan. 25th, 2019
+  short: Service mesh receiver performance test on Google Kubernetes Engine.
+
 - title: Understand distributed trace easier in the incoming 6-GA
   name: 2019-01-01-Understand-Trace
   time: Sheng Wu. Jan. 1st, 2019
