Re: [PR] feat: replace events library with shdict [apisix]

2025-08-14 Thread via GitHub


lchpersonal commented on PR #12353:
URL: https://github.com/apache/apisix/pull/12353#issuecomment-3187808944

   > Breakout from: #12263 Fixes # Currently nacos discovery uses event library 
and during testing it was found out that when using lua-resty-events, sometimes 
not all workers get the events. And inconsistencies emerge. Moreover the 
idiomatic way to share data between workers is through a shared dict. Therefore 
this PR migrates nacos from older events mechanism to newer shared dict way. 
Now only priviliged agent will fetch data from nacos and write to shdict while 
all workers will read from it.
   > 
   > ### Checklist
   > * [ ]  I have explained the need for this PR and the problem it solves
   > * [ ]  I have explained the changes or the new features added to this PR
   > * [ ]  I have added tests corresponding to this change
   > * [ ]  I have updated the documentation to reflect this change
   > * [ ]  I have verified that this change is backward compatible (If not, 
please discuss on the [APISIX mailing 
list](https://github.com/apache/apisix/tree/master#community) first)
   
   
   
   How to understand:   Now only priviliged agent will fetch data from nacos 
and write to shdict while all workers will read from it ?
   
   In the current code, does each worker periodically fetch registration 
information from Nacos?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat: replace events library with shdict [apisix]

2025-06-25 Thread via GitHub


Revolyssup commented on PR #12353:
URL: https://github.com/apache/apisix/pull/12353#issuecomment-2999367932

   @bzp2010 I ran a benchmark to test whether the `shdict:get` brings too much 
performance degradation. But based on below results I don't think this should 
be an issue.
   
   ## Configuration used
   config.yaml
   ```yaml
   apisix:
 node_listen: 1984
   deployment:
 role: data_plane
 role_data_plane:
   config_provider: yaml
   nginx_config:
 worker_processes: 8
   discovery:
 nacos:
 host:
   - "http://nacos:[email protected]:8848";
 prefix: "/nacos/v1/"
 fetch_interval: 1
 weight: 1
 timeout:
   connect: 2000
   send: 2000
   read: 5000
   ```
   
   apisix.yaml
   ```yaml
   routes:
 -
   uri: /hello
   upstream:
 service_name: APISIX-NACOS
 discovery_type: nacos
 type: roundrobin
   #END
   ```
   Here are the results.
   ```bash
   ❯ wrk -c 10 -t 5 -d 30s -R 5 http://localhost:1984/hello
   Running 30s test @ http://localhost:1984/hello
 5 threads and 10 connections
 Thread calibration: mean lat.: 2999.418ms, rate sampling interval: 10395ms
 Thread calibration: mean lat.: 2885.546ms, rate sampling interval: 10125ms
 Thread calibration: mean lat.: 2857.330ms, rate sampling interval: 10059ms
 Thread calibration: mean lat.: 2967.000ms, rate sampling interval: 10231ms
 Thread calibration: mean lat.: 2953.790ms, rate sampling interval: 10256ms
 Thread Stats   Avg  Stdev Max   +/- Stdev
   Latency11.12s 3.06s   16.47s57.49%
   Req/Sec 4.52k82.05 4.63k60.00%
 689405 requests in 30.00s, 110.45MB read
   Requests/sec:  22980.47
   Transfer/sec:  3.68MB
   
   ```
   
   
   ```bash
   ❯ wrk -c 10 -t 5 -d 30s -R 5 http://localhost:1984/hello
   Running 30s test @ http://localhost:1984/hello
 5 threads and 10 connections
 Thread calibration: mean lat.: 3060.053ms, rate sampling interval: 10584ms
 Thread calibration: mean lat.: 3084.401ms, rate sampling interval: 10772ms
 Thread calibration: mean lat.: 3254.041ms, rate sampling interval: 11198ms
 Thread calibration: mean lat.: 3093.539ms, rate sampling interval: 10985ms
 Thread calibration: mean lat.: 3062.424ms, rate sampling interval: 10772ms
 Thread Stats   Avg  Stdev Max   +/- Stdev
   Latency11.89s 3.36s   17.74s56.68%
   Req/Sec 4.13k39.11 4.18k60.00%
 623009 requests in 30.00s, 99.81MB read
   Requests/sec:  20767.18
   Transfer/sec:  3.33MB
   ```
   
   ## Result
   The QPS dropped from 22980.47 to 20767.18 with ~2k difference. I think this 
should be acceptable given the issue with events library this PR solves. 
According to me, it's an okay tradeoff


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat: replace events library with shdict [apisix]

2025-06-24 Thread via GitHub


bzp2010 commented on code in PR #12353:
URL: https://github.com/apache/apisix/pull/12353#discussion_r2163363557


##
apisix/discovery/nacos/init.lua:
##
@@ -371,40 +348,18 @@ function _M.nodes(service_name, discovery_args)
 discovery_args.namespace_id or default_namespace_id
 local group_name = discovery_args
 and discovery_args.group_name or default_group_name
-
-local logged = false
--- maximum waiting time: 5 seconds
-local waiting_time = 5
-local step = 0.1
-while not applications and waiting_time > 0 do
-if not logged then
-log.warn('wait init')
-logged = true
-end
-ngx.sleep(step)
-waiting_time = waiting_time - step
-end
-
-if not applications or not applications[namespace_id]
-or not applications[namespace_id][group_name]
-then
+local key = get_key(namespace_id, group_name, service_name)
+local value = nacos_dict:get(key)

Review Comment:
   When we encounter performance issues, consider adding caching with TTL at 
the worker level to further improve locality.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat: replace events library with shdict [apisix]

2025-06-20 Thread via GitHub


bzp2010 commented on code in PR #12353:
URL: https://github.com/apache/apisix/pull/12353#discussion_r2159883179


##
apisix/discovery/nacos/init.lua:
##
@@ -371,40 +348,18 @@ function _M.nodes(service_name, discovery_args)
 discovery_args.namespace_id or default_namespace_id
 local group_name = discovery_args
 and discovery_args.group_name or default_group_name
-
-local logged = false
--- maximum waiting time: 5 seconds
-local waiting_time = 5
-local step = 0.1
-while not applications and waiting_time > 0 do
-if not logged then
-log.warn('wait init')
-logged = true
-end
-ngx.sleep(step)
-waiting_time = waiting_time - step
-end
-
-if not applications or not applications[namespace_id]
-or not applications[namespace_id][group_name]
-then
+local key = get_key(namespace_id, group_name, service_name)
+local value = nacos_dict:get(key)

Review Comment:
   You're using `shdict:get` here to read a shared memory, and I'd like to 
point out that it locks the shdict even when it reads the shared memory.
   
   > ref: 
https://github.com/openresty/lua-nginx-module/blob/master/src/ngx_http_lua_shdict.c#L1593-L1609
   
   Each request will use this nodes function, which is obviously a very "hot" 
function, which will limit the concurrency that can be handled by waiting for 
locks, is this appropriate?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat: replace events library with shdict [apisix]

2025-06-20 Thread via GitHub


nic-6443 commented on code in PR #12353:
URL: https://github.com/apache/apisix/pull/12353#discussion_r2158602660


##
apisix/discovery/nacos/init.lua:
##
@@ -318,50 +310,35 @@ local function fetch_full_registry(premature)
 goto CONTINUE
 end
 
-if not up_apps[namespace_id] then
-up_apps[namespace_id] = {}
-end
-
-if not up_apps[namespace_id][group_name] then
-up_apps[namespace_id][group_name] = {}
-end
-
+local nodes = {}
+local key = get_key(namespace_id, group_name, 
service_info.service_name)
+service_names[key] = true
 for _, host in ipairs(data.hosts) do
-local nodes = up_apps[namespace_id]
-[group_name][service_info.service_name]
-if not nodes then
-nodes = {}
-up_apps[namespace_id]
-[group_name][service_info.service_name] = nodes
-end
-
 local node = {
 host = host.ip,
 port = host.port,
 weight = host.weight or default_weight,
 }
-
 -- docs: 
https://github.com/yidongnan/grpc-spring-boot-starter/pull/496
 if is_grpc(scheme) and host.metadata and host.metadata.gRPC_port 
then
 node.port = host.metadata.gRPC_port
 end
 
 core.table.insert(nodes, node)
 end
-
+if #nodes > 0 then
+local content = core.json.encode(nodes)
+nacos_dict:set(key, content)
+end
 ::CONTINUE::
 end
-local new_apps_md5sum = ngx.md5(core.json.encode(up_apps))
-local old_apps_md5sum = ngx.md5(core.json.encode(applications))
-if new_apps_md5sum == old_apps_md5sum then
-return
-end
-applications = up_apps
-local ok, err = events:post(events_list._source, events_list.updating,
-applications)
-if not ok then
-log.error("post_event failure with ", events_list._source,
-  ", update application error: ", err)
+local old_curr_service_in_use = curr_service_in_use
+curr_service_in_use = service_names
+-- remove services that are not in use anymore
+for key, _ in pairs(old_curr_service_in_use) do
+if not service_names[key] then
+nacos_dict:delete(key)
+end
 end

Review Comment:
   ```suggestion
   -- remove services that are not in use anymore
   for key, _ in pairs(curr_service_in_use) do
   if not service_names[key] then
   nacos_dict:delete(key)
   end
   end
   curr_service_in_use = service_names 
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat: replace events library with shdict [apisix]

2025-06-20 Thread via GitHub


nic-6443 commented on code in PR #12353:
URL: https://github.com/apache/apisix/pull/12353#discussion_r2158493058


##
apisix/discovery/nacos/init.lua:
##
@@ -417,7 +363,20 @@ end
 
 
 function _M.dump_data()
-return {config = local_conf.discovery.nacos, services = applications or {}}
+local keys = nacos_dict:get_keys(0)
+local applications = {}
+for _, key in ipairs(keys) do
+local value = nacos_dict:get(key)
+if value then
+local nodes = core.json.decode(value)
+if nodes and #nodes > 0 then

Review Comment:
   `dump_data` is for debug purpose, if value in shared dict for this key is 
empty , we should return the real empty value for debugging instead hide it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat: replace events library with shdict [apisix]

2025-06-20 Thread via GitHub


nic-6443 commented on code in PR #12353:
URL: https://github.com/apache/apisix/pull/12353#discussion_r2158486595


##
apisix/discovery/nacos/init.lua:
##
@@ -417,7 +359,20 @@ end
 
 
 function _M.dump_data()
-return {config = local_conf.discovery.nacos, services = applications or {}}

Review Comment:
   make sense.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat: replace events library with shdict [apisix]

2025-06-20 Thread via GitHub


Revolyssup commented on code in PR #12353:
URL: https://github.com/apache/apisix/pull/12353#discussion_r2158318034


##
apisix/discovery/nacos/init.lua:
##
@@ -34,28 +34,21 @@ local str_find   = core.string.find
 local log= core.log
 
 local default_weight
-local applications
+local nacos_dict = ngx.shared.nacos --key: namespace_id.group_name.service_name

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat: replace events library with shdict [apisix]

2025-06-19 Thread via GitHub


membphis commented on code in PR #12353:
URL: https://github.com/apache/apisix/pull/12353#discussion_r2157958128


##
apisix/discovery/nacos/init.lua:
##
@@ -34,28 +34,21 @@ local str_find   = core.string.find
 local log= core.log
 
 local default_weight
-local applications
+local nacos_dict = ngx.shared.nacos --key: namespace_id.group_name.service_name

Review Comment:
   check if `nacos_dict` is nil



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat: replace events library with shdict [apisix]

2025-06-19 Thread via GitHub


Revolyssup commented on code in PR #12353:
URL: https://github.com/apache/apisix/pull/12353#discussion_r215746


##
t/discovery/nacos2.t:
##
@@ -148,18 +148,43 @@ routes:
   type: roundrobin
 
 #END
 pipelined_requests eval
-[
-"GET /hello",
-"GET /hello",
-]
 response_body_like eval
-[
-qr/server [1-2]/,
-qr/server [1-2]/,
-]
 no_error_log
-[error, error]
+--- config
+location /t {
+content_by_lua_block {
+local http = require("resty.http")
+local uri = "http://127.0.0.1:"; .. ngx.var.server_port
+
+-- Wait for 2 seconds for APISIX initialization

Review Comment:
   Test is modified to add this sleep because: Since retry mechanism is 
removed, priviliged agent fills data in shdict might take some time so we want 
to send the request after that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat: replace events library with shdict [apisix]

2025-06-19 Thread via GitHub


Revolyssup commented on code in PR #12353:
URL: https://github.com/apache/apisix/pull/12353#discussion_r2157384474


##
apisix/discovery/nacos/init.lua:
##
@@ -417,7 +359,20 @@ end
 
 
 function _M.dump_data()
-return {config = local_conf.discovery.nacos, services = applications or {}}

Review Comment:
   config is removed from here as it's a security risk.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]