Skip to content

memory leak under high GC pressure #31

@fffonion

Description

@fffonion

When using lua-resty-lock to create and dispose large amount of locks will cause
the internal memo table behave incorrectly.

We injected some debugging code in lock.lua and output the "leaked" objects in the memo table:

--- /usr/local/openresty/lualib/resty/lock.lua	2022-07-27 21:12:38.000000000 -0700
+++ resty_lock.lua	2022-07-05 04:22:57.000000000 -0700
@@ -25,6 +25,37 @@
 local memo = {}
 if debug then _M.memo = memo end

+local l = ngx.now()
+local nkeys = require "table.nkeys"
+local function dd()
+        if ngx.now() - l <= 1 then
+                return
+        end
+        l = ngx.now()
+        local total = nkeys(memo)
+        local used = 0
+        for _, k in pairs(memo) do
+                if type(k) ~= "number" then
+                        used = used+ 1
+                end
+        end
+
+        local unused = 0
+        local ii = FREE_LIST_REF
+        while unused < total do
+                local p = memo[ii]
+                if type(p) == "number" then
+                        unused = unused + 1
+                else
+                        break
+                end
+                ii = p
+        end
+
+
+        ngx.log(ngx.ERR, " --------- keys: ", total, " used: ", used, " unused: ", unused, " leak: ", total - used - unused)
+end
+

 local function ref_obj(key)
     if key == nil then
@@ -36,8 +67,17 @@

     else
         ref = #memo + 1
+
+        if ref-1 < nkeys(memo) then
+            ngx.log(ngx.ERR, "########### table tail ", ref-1, " is smaller than nkeys(memo) ", nkeys(memo), "??????")
+        end
     end
+    if type(key) == "number" then
+        ngx.log(ngx.ERR, "########### got a number ", key, " ", debug.traceback())
+    end
+
     memo[ref] = key
+    dd()

     -- print("ref key_id returned ", ref)
     return ref
@@ -219,3 +259,4 @@


 return _M
+

create a test script:

local lock = require "resty_lock"
local t = 0
while true do
  local locks = {}

  t = t + 1
  for i=1, 5000 do
    locks[i] = lock:new("locks")
    locks[i]:lock(t .. "a" .. i)
    locks[i]:unlock(t .. "a" .. i)
  end
  ngx.sleep(0.1)
end

and a file to inject shm directive:

$ cat lock_include.conf
lua_shared_dict locks                  8m;

Then run with resty --http-include lock_include.conf test.lua, you will observe leak: increases, sometimes very slow, sometime very fast; in one cases i'm able to observe 10k keys leaked per second:

$ resty --http-include lock_include.conf lock4.lua
2022/07/27 22:20:19 [error] 3067000#0: *2 [lua] resty_lock.lua:56: dd():  --------- keys: 22301 used: 10001 unused: 0 leak: 12300, context: ngx.timer
2022/07/27 22:20:20 [error] 3067000#0: *2 [lua] resty_lock.lua:56: dd():  --------- keys: 36093 used: 10001 unused: 2313 leak: 23779, context: ngx.timer
2022/07/27 22:20:21 [error] 3067000#0: *2 [lua] resty_lock.lua:56: dd():  --------- keys: 46141 used: 10001 unused: 0 leak: 36140, context: ngx.timer

(For convience I attached the codes used above in an archive lock_leak.tgz)

In production environment we also see this leak happens continously until it hits LuaJIT table limit and throws table overflow error.

We also investigated the un-released keys in memo and found all of them are the refs to the shdict object created at https://github.com/openresty/lua-resty-lock/blob/master/lib/resty/lock.lua#L90. It seems that when ref_obj is called (https://github.com/openresty/lua-resty-lock/blob/master/lib/resty/lock.lua#L33-L40) and memory allocation may happen on new key, and LuaJIT may jump into GC and thus calls unref_obj, then makes the link in the memo table broken.

We have came up with two approaches:

  1. to use the undocumented newproxy API to replace ref/unref (fix: replace ref/unref methods with newproxy Kong/lua-resty-lock#1), we observe 20% performance drop on the lock/unlock API calls
  2. to reduce the number of refs to shdict object (fix: de-duplicate shm references Kong/lua-resty-lock#2), this has no performance penalty, but may not fix the root cause of problem

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions