Re: [PR] feat: replace events library with shdict [apisix]
lchpersonal commented on PR #12353: URL: https://github.com/apache/apisix/pull/12353#issuecomment-3187808944 > Breakout from: #12263 Fixes # Currently nacos discovery uses event library and during testing it was found out that when using lua-resty-events, sometimes not all workers get the events. And inconsistencies emerge. Moreover the idiomatic way to share data between workers is through a shared dict. Therefore this PR migrates nacos from older events mechanism to newer shared dict way. Now only priviliged agent will fetch data from nacos and write to shdict while all workers will read from it. > > ### Checklist > * [ ] I have explained the need for this PR and the problem it solves > * [ ] I have explained the changes or the new features added to this PR > * [ ] I have added tests corresponding to this change > * [ ] I have updated the documentation to reflect this change > * [ ] I have verified that this change is backward compatible (If not, please discuss on the [APISIX mailing list](https://github.com/apache/apisix/tree/master#community) first) How to understand: Now only priviliged agent will fetch data from nacos and write to shdict while all workers will read from it ? In the current code, does each worker periodically fetch registration information from Nacos? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] feat: replace events library with shdict [apisix]
Revolyssup commented on PR #12353: URL: https://github.com/apache/apisix/pull/12353#issuecomment-2999367932 @bzp2010 I ran a benchmark to test whether the `shdict:get` brings too much performance degradation. But based on below results I don't think this should be an issue. ## Configuration used config.yaml ```yaml apisix: node_listen: 1984 deployment: role: data_plane role_data_plane: config_provider: yaml nginx_config: worker_processes: 8 discovery: nacos: host: - "http://nacos:[email protected]:8848"; prefix: "/nacos/v1/" fetch_interval: 1 weight: 1 timeout: connect: 2000 send: 2000 read: 5000 ``` apisix.yaml ```yaml routes: - uri: /hello upstream: service_name: APISIX-NACOS discovery_type: nacos type: roundrobin #END ``` Here are the results. ```bash ❯ wrk -c 10 -t 5 -d 30s -R 5 http://localhost:1984/hello Running 30s test @ http://localhost:1984/hello 5 threads and 10 connections Thread calibration: mean lat.: 2999.418ms, rate sampling interval: 10395ms Thread calibration: mean lat.: 2885.546ms, rate sampling interval: 10125ms Thread calibration: mean lat.: 2857.330ms, rate sampling interval: 10059ms Thread calibration: mean lat.: 2967.000ms, rate sampling interval: 10231ms Thread calibration: mean lat.: 2953.790ms, rate sampling interval: 10256ms Thread Stats Avg Stdev Max +/- Stdev Latency11.12s 3.06s 16.47s57.49% Req/Sec 4.52k82.05 4.63k60.00% 689405 requests in 30.00s, 110.45MB read Requests/sec: 22980.47 Transfer/sec: 3.68MB ``` ```bash ❯ wrk -c 10 -t 5 -d 30s -R 5 http://localhost:1984/hello Running 30s test @ http://localhost:1984/hello 5 threads and 10 connections Thread calibration: mean lat.: 3060.053ms, rate sampling interval: 10584ms Thread calibration: mean lat.: 3084.401ms, rate sampling interval: 10772ms Thread calibration: mean lat.: 3254.041ms, rate sampling interval: 11198ms Thread calibration: mean lat.: 3093.539ms, rate sampling interval: 10985ms Thread calibration: mean lat.: 3062.424ms, rate sampling interval: 10772ms Thread Stats Avg Stdev Max +/- Stdev Latency11.89s 3.36s 17.74s56.68% Req/Sec 4.13k39.11 4.18k60.00% 623009 requests in 30.00s, 99.81MB read Requests/sec: 20767.18 Transfer/sec: 3.33MB ``` ## Result The QPS dropped from 22980.47 to 20767.18 with ~2k difference. I think this should be acceptable given the issue with events library this PR solves. According to me, it's an okay tradeoff -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] feat: replace events library with shdict [apisix]
bzp2010 commented on code in PR #12353:
URL: https://github.com/apache/apisix/pull/12353#discussion_r2163363557
##
apisix/discovery/nacos/init.lua:
##
@@ -371,40 +348,18 @@ function _M.nodes(service_name, discovery_args)
discovery_args.namespace_id or default_namespace_id
local group_name = discovery_args
and discovery_args.group_name or default_group_name
-
-local logged = false
--- maximum waiting time: 5 seconds
-local waiting_time = 5
-local step = 0.1
-while not applications and waiting_time > 0 do
-if not logged then
-log.warn('wait init')
-logged = true
-end
-ngx.sleep(step)
-waiting_time = waiting_time - step
-end
-
-if not applications or not applications[namespace_id]
-or not applications[namespace_id][group_name]
-then
+local key = get_key(namespace_id, group_name, service_name)
+local value = nacos_dict:get(key)
Review Comment:
When we encounter performance issues, consider adding caching with TTL at
the worker level to further improve locality.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] feat: replace events library with shdict [apisix]
bzp2010 commented on code in PR #12353:
URL: https://github.com/apache/apisix/pull/12353#discussion_r2159883179
##
apisix/discovery/nacos/init.lua:
##
@@ -371,40 +348,18 @@ function _M.nodes(service_name, discovery_args)
discovery_args.namespace_id or default_namespace_id
local group_name = discovery_args
and discovery_args.group_name or default_group_name
-
-local logged = false
--- maximum waiting time: 5 seconds
-local waiting_time = 5
-local step = 0.1
-while not applications and waiting_time > 0 do
-if not logged then
-log.warn('wait init')
-logged = true
-end
-ngx.sleep(step)
-waiting_time = waiting_time - step
-end
-
-if not applications or not applications[namespace_id]
-or not applications[namespace_id][group_name]
-then
+local key = get_key(namespace_id, group_name, service_name)
+local value = nacos_dict:get(key)
Review Comment:
You're using `shdict:get` here to read a shared memory, and I'd like to
point out that it locks the shdict even when it reads the shared memory.
> ref:
https://github.com/openresty/lua-nginx-module/blob/master/src/ngx_http_lua_shdict.c#L1593-L1609
Each request will use this nodes function, which is obviously a very "hot"
function, which will limit the concurrency that can be handled by waiting for
locks, is this appropriate?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] feat: replace events library with shdict [apisix]
nic-6443 commented on code in PR #12353:
URL: https://github.com/apache/apisix/pull/12353#discussion_r2158602660
##
apisix/discovery/nacos/init.lua:
##
@@ -318,50 +310,35 @@ local function fetch_full_registry(premature)
goto CONTINUE
end
-if not up_apps[namespace_id] then
-up_apps[namespace_id] = {}
-end
-
-if not up_apps[namespace_id][group_name] then
-up_apps[namespace_id][group_name] = {}
-end
-
+local nodes = {}
+local key = get_key(namespace_id, group_name,
service_info.service_name)
+service_names[key] = true
for _, host in ipairs(data.hosts) do
-local nodes = up_apps[namespace_id]
-[group_name][service_info.service_name]
-if not nodes then
-nodes = {}
-up_apps[namespace_id]
-[group_name][service_info.service_name] = nodes
-end
-
local node = {
host = host.ip,
port = host.port,
weight = host.weight or default_weight,
}
-
-- docs:
https://github.com/yidongnan/grpc-spring-boot-starter/pull/496
if is_grpc(scheme) and host.metadata and host.metadata.gRPC_port
then
node.port = host.metadata.gRPC_port
end
core.table.insert(nodes, node)
end
-
+if #nodes > 0 then
+local content = core.json.encode(nodes)
+nacos_dict:set(key, content)
+end
::CONTINUE::
end
-local new_apps_md5sum = ngx.md5(core.json.encode(up_apps))
-local old_apps_md5sum = ngx.md5(core.json.encode(applications))
-if new_apps_md5sum == old_apps_md5sum then
-return
-end
-applications = up_apps
-local ok, err = events:post(events_list._source, events_list.updating,
-applications)
-if not ok then
-log.error("post_event failure with ", events_list._source,
- ", update application error: ", err)
+local old_curr_service_in_use = curr_service_in_use
+curr_service_in_use = service_names
+-- remove services that are not in use anymore
+for key, _ in pairs(old_curr_service_in_use) do
+if not service_names[key] then
+nacos_dict:delete(key)
+end
end
Review Comment:
```suggestion
-- remove services that are not in use anymore
for key, _ in pairs(curr_service_in_use) do
if not service_names[key] then
nacos_dict:delete(key)
end
end
curr_service_in_use = service_names
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] feat: replace events library with shdict [apisix]
nic-6443 commented on code in PR #12353:
URL: https://github.com/apache/apisix/pull/12353#discussion_r2158493058
##
apisix/discovery/nacos/init.lua:
##
@@ -417,7 +363,20 @@ end
function _M.dump_data()
-return {config = local_conf.discovery.nacos, services = applications or {}}
+local keys = nacos_dict:get_keys(0)
+local applications = {}
+for _, key in ipairs(keys) do
+local value = nacos_dict:get(key)
+if value then
+local nodes = core.json.decode(value)
+if nodes and #nodes > 0 then
Review Comment:
`dump_data` is for debug purpose, if value in shared dict for this key is
empty , we should return the real empty value for debugging instead hide it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] feat: replace events library with shdict [apisix]
nic-6443 commented on code in PR #12353:
URL: https://github.com/apache/apisix/pull/12353#discussion_r2158486595
##
apisix/discovery/nacos/init.lua:
##
@@ -417,7 +359,20 @@ end
function _M.dump_data()
-return {config = local_conf.discovery.nacos, services = applications or {}}
Review Comment:
make sense.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] feat: replace events library with shdict [apisix]
Revolyssup commented on code in PR #12353: URL: https://github.com/apache/apisix/pull/12353#discussion_r2158318034 ## apisix/discovery/nacos/init.lua: ## @@ -34,28 +34,21 @@ local str_find = core.string.find local log= core.log local default_weight -local applications +local nacos_dict = ngx.shared.nacos --key: namespace_id.group_name.service_name Review Comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] feat: replace events library with shdict [apisix]
membphis commented on code in PR #12353: URL: https://github.com/apache/apisix/pull/12353#discussion_r2157958128 ## apisix/discovery/nacos/init.lua: ## @@ -34,28 +34,21 @@ local str_find = core.string.find local log= core.log local default_weight -local applications +local nacos_dict = ngx.shared.nacos --key: namespace_id.group_name.service_name Review Comment: check if `nacos_dict` is nil -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] feat: replace events library with shdict [apisix]
Revolyssup commented on code in PR #12353:
URL: https://github.com/apache/apisix/pull/12353#discussion_r215746
##
t/discovery/nacos2.t:
##
@@ -148,18 +148,43 @@ routes:
type: roundrobin
#END
pipelined_requests eval
-[
-"GET /hello",
-"GET /hello",
-]
response_body_like eval
-[
-qr/server [1-2]/,
-qr/server [1-2]/,
-]
no_error_log
-[error, error]
+--- config
+location /t {
+content_by_lua_block {
+local http = require("resty.http")
+local uri = "http://127.0.0.1:"; .. ngx.var.server_port
+
+-- Wait for 2 seconds for APISIX initialization
Review Comment:
Test is modified to add this sleep because: Since retry mechanism is
removed, priviliged agent fills data in shdict might take some time so we want
to send the request after that.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] feat: replace events library with shdict [apisix]
Revolyssup commented on code in PR #12353:
URL: https://github.com/apache/apisix/pull/12353#discussion_r2157384474
##
apisix/discovery/nacos/init.lua:
##
@@ -417,7 +359,20 @@ end
function _M.dump_data()
-return {config = local_conf.discovery.nacos, services = applications or {}}
Review Comment:
config is removed from here as it's a security risk.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
