Description
Creating and disposing of a large number of locks with lua-resty-lock causes the internal memo table to behave incorrectly.
We injected some debugging code into lock.lua to report the "leaked" objects in the memo table:
--- /usr/local/openresty/lualib/resty/lock.lua 2022-07-27 21:12:38.000000000 -0700
+++ resty_lock.lua 2022-07-05 04:22:57.000000000 -0700
@@ -25,6 +25,37 @@
local memo = {}
if debug then _M.memo = memo end
+local l = ngx.now()
+local nkeys = require "table.nkeys"
+local function dd()
+ if ngx.now() - l <= 1 then
+ return
+ end
+ l = ngx.now()
+ local total = nkeys(memo)
+ local used = 0
+ for _, k in pairs(memo) do
+ if type(k) ~= "number" then
+ used = used + 1
+ end
+ end
+
+ local unused = 0
+ local ii = FREE_LIST_REF
+ while unused < total do
+ local p = memo[ii]
+ if type(p) == "number" then
+ unused = unused + 1
+ else
+ break
+ end
+ ii = p
+ end
+
+
+ ngx.log(ngx.ERR, " --------- keys: ", total, " used: ", used, " unused: ", unused, " leak: ", total - used - unused)
+end
+
local function ref_obj(key)
if key == nil then
@@ -36,8 +67,17 @@
else
ref = #memo + 1
+
+ if ref-1 < nkeys(memo) then
+ ngx.log(ngx.ERR, "########### table tail ", ref-1, " is smaller than nkeys(memo) ", nkeys(memo), "??????")
+ end
end
+ if type(key) == "number" then
+ ngx.log(ngx.ERR, "########### got a number ", key, " ", debug.traceback())
+ end
+
memo[ref] = key
+ dd()
-- print("ref key_id returned ", ref)
return ref
@@ -219,3 +259,4 @@
return _M
Create a test script:
local lock = require "resty_lock"
local t = 0
while true do
local locks = {}
t = t + 1
for i=1, 5000 do
locks[i] = lock:new("locks")
locks[i]:lock(t .. "a" .. i)
locks[i]:unlock(t .. "a" .. i)
end
ngx.sleep(0.1)
end
and a file to inject the shm directive:
$ cat lock_include.conf
lua_shared_dict locks 8m;
Then run it with resty --http-include lock_include.conf test.lua; you will observe that leak: keeps increasing, sometimes very slowly, sometimes very fast. In one case I was able to observe about 10k keys leaked per second:
$ resty --http-include lock_include.conf lock4.lua
2022/07/27 22:20:19 [error] 3067000#0: *2 [lua] resty_lock.lua:56: dd(): --------- keys: 22301 used: 10001 unused: 0 leak: 12300, context: ngx.timer
2022/07/27 22:20:20 [error] 3067000#0: *2 [lua] resty_lock.lua:56: dd(): --------- keys: 36093 used: 10001 unused: 2313 leak: 23779, context: ngx.timer
2022/07/27 22:20:21 [error] 3067000#0: *2 [lua] resty_lock.lua:56: dd(): --------- keys: 46141 used: 10001 unused: 0 leak: 36140, context: ngx.timer
(For convenience, I attached the code used above in an archive, lock_leak.tgz.)
In our production environment we also see this leak happen continuously until it hits the LuaJIT table size limit and throws a "table overflow" error.
We also investigated the unreleased keys in memo and found that all of them are refs to the shdict object created at https://github.com/openresty/lua-resty-lock/blob/master/lib/resty/lock.lua#L90. It seems that when ref_obj is called (https://github.com/openresty/lua-resty-lock/blob/master/lib/resty/lock.lua#L33-L40), storing the new key may trigger a memory allocation; LuaJIT may then run a GC step that calls unref_obj, which breaks the free-list links in the memo table.
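For reference, here is a simplified sketch of the ref/unref free-list pattern used by lock.lua (paraphrased from the lines linked above, not a verbatim copy), with comments marking where we believe the reentrancy happens:
-- Simplified sketch of the memo ref/unref pattern (paraphrased, not the
-- verbatim library source).
local FREE_LIST_REF = 0
local memo = {}

local function ref_obj(key)
    if key == nil then
        return -1
    end
    local ref = memo[FREE_LIST_REF]
    if ref and ref ~= 0 then
        -- reuse a slot from the free list
        memo[FREE_LIST_REF] = memo[ref]
    else
        -- grow the table: the assignment below may allocate (table resize),
        -- and a GC step triggered by that allocation can run a finalizer
        -- that calls unref_obj on this same memo table, so #memo and the
        -- free list no longer agree
        ref = #memo + 1
    end
    memo[ref] = key
    return ref
end

local function unref_obj(ref)
    if ref >= 0 then
        -- push the slot back onto the free list
        memo[ref] = memo[FREE_LIST_REF]
        memo[FREE_LIST_REF] = ref
    end
end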
We have come up with two approaches:
- Use the undocumented newproxy API to replace ref/unref (fix: replace ref/unref methods with newproxy, Kong/lua-resty-lock#1); we observe a ~20% performance drop on the lock/unlock API calls (a rough sketch of the idea follows this list).
- Reduce the number of refs to the shdict object (fix: de-duplicate shm references, Kong/lua-resty-lock#2); this has no performance penalty, but may not fix the root cause of the problem.
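For illustration, a minimal sketch of the newproxy idea (this is not the actual code from Kong/lua-resty-lock#1; attach_gc and free_cb are made-up names): instead of keeping an integer ref into the shared memo table and releasing it from a finalizer, each lock keeps a newproxy userdata whose __gc releases the resource directly, so the memo free list is never touched.
-- Sketch only: attach_gc/free_cb are illustrative names, not library APIs.
local function attach_gc(obj, free_cb)
    local proxy = newproxy(true)           -- userdata with its own metatable
    getmetatable(proxy).__gc = function()
        free_cb(obj)                       -- runs when the proxy is collected
    end
    return proxy                           -- keep this on the lock instance
end

-- usage sketch: inside new(), something like
--   self.anchor = attach_gc(dict, release_dict)
-- ties cleanup to the lock's lifetime without going through the memo table.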