systemctl reload frr failing with more recent systemd #20430

@panlinux

Description

UPDATE: I originally thought this was caused by a change between 10.5.0 and 10.5.1, but I now get the same behavior with 10.5.0 on current Debian sid and Ubuntu resolute installs. I can't edit all the logs below, so please keep this in mind when reading them.

With 10.5.0 on Debian and Ubuntu installs, reload used to work:

ubuntu@sid:~$ sudo systemctl status frr
● frr.service - FRRouting
     Loaded: loaded (/usr/lib/systemd/system/frr.service; enabled; preset: enabled)
    Drop-In: /run/systemd/system/service.d
             └─zzz-lxc-service.conf
     Active: active (running) since Sat 2026-01-10 20:49:17 UTC; 1min 33s ago
 Invocation: 13e3f39acf0d4417a83a0e12a1982a6b
       Docs: https://frrouting.readthedocs.io/en/latest/setup.html
    Process: 933 ExecStart=/usr/lib/frr/frrinit.sh start (code=exited, status=0/SUCCESS)
    Process: 1020 ExecReload=/usr/lib/frr/frrinit.sh reload (code=exited, status=0/SUCCESS)
   Main PID: 1038 (watchfrr)
     Status: "FRR Operational"
      Tasks: 8 (limit: 17968)
     Memory: 24.5M (peak: 55.7M)
        CPU: 444ms
     CGroup: /system.slice/frr.service
             ├─ 952 /usr/lib/frr/mgmtd -d -F traditional -A 127.0.0.1
             ├─ 954 /usr/lib/frr/zebra -d -F traditional -A 127.0.0.1 -s 90000000
             ├─ 959 /usr/lib/frr/staticd -d -F traditional -A 127.0.0.1
             └─1038 /usr/lib/frr/watchfrr -d mgmtd zebra staticd

Jan 10 20:50:40 sid watchfrr[942]: [NG1AJ-FP2TQ] Terminating on signal
Jan 10 20:50:40 sid frrinit.sh[1020]: Stopped watchfrr.
Jan 10 20:50:40 sid frrinit.sh[1020]: Starting watchfrr with command: '  /usr/lib/frr/watchfrr  -d   mgmtd zebra staticd'.
Jan 10 20:50:40 sid watchfrr[1038]: [T83RR-8SM5G] watchfrr 10.5.0 starting: vty@0
Jan 10 20:50:40 sid watchfrr[1038]: [QDG3Y-BY5TN] mgmtd state -> up : connect succeeded
Jan 10 20:50:40 sid watchfrr[1038]: [QDG3Y-BY5TN] zebra state -> up : connect succeeded
Jan 10 20:50:40 sid watchfrr[1038]: [QDG3Y-BY5TN] staticd state -> up : connect succeeded
Jan 10 20:50:40 sid watchfrr[1038]: [KWE5Q-QNGFC] all daemons up, doing startup-complete notify
Jan 10 20:50:40 sid frrinit.sh[1020]: Started watchfrr.
Jan 10 20:50:40 sid systemd[1]: Reloaded frr.service - FRRouting.
ubuntu@sid:~$ 

Logs show:

Jan 10 20:50:40 sid systemd[1]: Reloading frr.service - FRRouting...
Jan 10 20:50:40 sid watchfrr[942]: [NG1AJ-FP2TQ] Terminating on signal
Jan 10 20:50:40 sid frrinit.sh[1020]: Stopped watchfrr.
Jan 10 20:50:40 sid frrinit.sh[1020]: Starting watchfrr with command: '  /usr/lib/frr/watchfrr  -d   mgmtd zebra staticd'.
Jan 10 20:50:40 sid watchfrr[1038]: [T83RR-8SM5G] watchfrr 10.5.0 starting: vty@0
Jan 10 20:50:40 sid watchfrr[1038]: [QDG3Y-BY5TN] mgmtd state -> up : connect succeeded
Jan 10 20:50:40 sid watchfrr[1038]: [QDG3Y-BY5TN] zebra state -> up : connect succeeded
Jan 10 20:50:40 sid watchfrr[1038]: [QDG3Y-BY5TN] staticd state -> up : connect succeeded
Jan 10 20:50:40 sid watchfrr[1038]: [KWE5Q-QNGFC] all daemons up, doing startup-complete notify
Jan 10 20:50:40 sid frrinit.sh[1020]: Started watchfrr.
Jan 10 20:50:40 sid systemd[1]: Reloaded frr.service - FRRouting.

With 10.5.1, however:

root@sid:~# systemctl reload frr
Job for frr.service canceled.

root@sid:~# systemctl status frr
● frr.service - FRRouting
     Loaded: loaded (/usr/lib/systemd/system/frr.service; enabled; preset: enabled)
    Drop-In: /run/systemd/system/service.d
             └─zzz-lxc-service.conf
     Active: deactivating (stop-sigterm) since Sat 2026-01-10 20:55:55 UTC; 38s ago
 Invocation: 8395288194124ec8b5d803c0085efbea
       Docs: https://frrouting.readthedocs.io/en/latest/setup.html
    Process: 912 ExecStart=/usr/lib/frr/frrinit.sh start (code=exited, status=0/SUCCESS)
    Process: 1076 ExecReload=/usr/lib/frr/frrinit.sh reload (code=exited, status=0/SUCCESS)
   Main PID: 1094 (watchfrr)
     Status: "FRR Operational"
      Tasks: 8 (limit: 17968)
     Memory: 24.4M (peak: 55.2M)
        CPU: 434ms
     CGroup: /system.slice/frr.service
             ├─ 931 /usr/lib/frr/mgmtd -d -F traditional -A 127.0.0.1
             ├─ 933 /usr/lib/frr/zebra -d -F traditional -A 127.0.0.1 -s 90000000
             ├─ 938 /usr/lib/frr/staticd -d -F traditional -A 127.0.0.1
             └─1094 /usr/lib/frr/watchfrr -d mgmtd zebra staticd

Jan 10 20:55:55 sid systemd[1]: Reloading frr.service - FRRouting...
Jan 10 20:55:55 sid watchfrr[921]: [NG1AJ-FP2TQ] Terminating on signal
Jan 10 20:55:55 sid frrinit.sh[1076]: Stopped watchfrr.
Jan 10 20:55:55 sid frrinit.sh[1076]: Starting watchfrr with command: '  /usr/lib/frr/watchfrr  -d   mgmtd zebra staticd'.
Jan 10 20:55:55 sid watchfrr[1094]: [T83RR-8SM5G] watchfrr 10.5.1 starting: vty@0
Jan 10 20:55:55 sid watchfrr[1094]: [QDG3Y-BY5TN] mgmtd state -> up : connect succeeded
Jan 10 20:55:55 sid watchfrr[1094]: [QDG3Y-BY5TN] zebra state -> up : connect succeeded
Jan 10 20:55:55 sid watchfrr[1094]: [QDG3Y-BY5TN] staticd state -> up : connect succeeded
Jan 10 20:55:55 sid watchfrr[1094]: [KWE5Q-QNGFC] all daemons up, doing startup-complete notify
Jan 10 20:55:55 sid frrinit.sh[1076]: Started watchfrr.

Notice the deactivating (stop-sigterm) state above.

The logs show that, two minutes later, systemd timed out in the stop-sigterm state and killed the service:

Jan 10 20:55:55 sid systemd[1]: Reloading frr.service - FRRouting...
Jan 10 20:55:55 sid watchfrr[921]: [NG1AJ-FP2TQ] Terminating on signal
Jan 10 20:55:55 sid frrinit.sh[1076]: Stopped watchfrr.
Jan 10 20:55:55 sid frrinit.sh[1076]: Starting watchfrr with command: '  /usr/lib/frr/watchfrr  -d   mgmtd zebra staticd'.
Jan 10 20:55:55 sid watchfrr[1094]: [T83RR-8SM5G] watchfrr 10.5.1 starting: vty@0
Jan 10 20:55:55 sid watchfrr[1094]: [QDG3Y-BY5TN] mgmtd state -> up : connect succeeded
Jan 10 20:55:55 sid watchfrr[1094]: [QDG3Y-BY5TN] zebra state -> up : connect succeeded
Jan 10 20:55:55 sid watchfrr[1094]: [QDG3Y-BY5TN] staticd state -> up : connect succeeded
Jan 10 20:55:55 sid watchfrr[1094]: [KWE5Q-QNGFC] all daemons up, doing startup-complete notify
Jan 10 20:55:55 sid frrinit.sh[1076]: Started watchfrr.

Jan 10 20:57:55 sid systemd[1]: frr.service: State 'stop-sigterm' timed out. Killing.
Jan 10 20:57:55 sid systemd[1]: frr.service: Killing process 1094 (watchfrr) with signal SIGKILL.
Jan 10 20:57:55 sid systemd[1]: frr.service: Killing process 931 (mgmtd) with signal SIGKILL.
Jan 10 20:57:55 sid systemd[1]: frr.service: Killing process 933 (zebra) with signal SIGKILL.
Jan 10 20:57:55 sid systemd[1]: frr.service: Killing process 938 (staticd) with signal SIGKILL.
Jan 10 20:57:55 sid systemd[1]: frr.service: Main process exited, code=killed, status=9/KILL
Jan 10 20:57:55 sid systemd[1]: frr.service: Failed with result 'timeout'.
Jan 10 20:57:55 sid systemd[1]: frr.service: Triggering OnFailure= dependencies.
Jan 10 20:57:55 sid systemd[1]: frr.service: Failed to enqueue OnFailure=heartbeat-failed@frr.service job, ignoring: Unit heartbeat-failed@frr.service not found.
Jan 10 20:58:00 sid systemd[1]: frr.service: Scheduled restart job, restart counter is at 1.
Jan 10 20:58:00 sid systemd[1]: Starting frr.service - FRRouting...
Jan 10 20:58:00 sid frrinit.sh[1164]: Starting watchfrr with command: '  /usr/lib/frr/watchfrr  -d   mgmtd zebra staticd'.
Jan 10 20:58:00 sid watchfrr[1173]: [T83RR-8SM5G] watchfrr 10.5.1 starting: vty@0
Jan 10 20:58:00 sid watchfrr[1173]: [ZCJ3S-SPH5S] mgmtd state -> down : initial connection attempt failed
Jan 10 20:58:00 sid watchfrr[1173]: [ZCJ3S-SPH5S] zebra state -> down : initial connection attempt failed
Jan 10 20:58:00 sid watchfrr[1173]: [ZCJ3S-SPH5S] staticd state -> down : initial connection attempt failed
Jan 10 20:58:00 sid watchfrr[1173]: [YFT0P-5Q5YX] Forked background command [pid 1174]: /usr/lib/frr/watchfrr.sh restart all
Jan 10 20:58:01 sid frrinit.sh[1187]: 2026/01/10 20:58:01 ZEBRA: [WVJCK-PPMGD][EC 4043309093] generic-netlink-cmd (NS 0) error: Operation not permitted, type=(35), seq=2, pid=1187
Jan 10 20:58:01 sid frrinit.sh[1187]: 2026/01/10 20:58:01 ZEBRA: [KGY44-D47GD][EC 4043309111] Disabling MPLS support (no kernel support)
Jan 10 20:58:01 sid frrinit.sh[1197]: [1197|mgmtd] sending configuration
Jan 10 20:58:01 sid frrinit.sh[1198]: [1198|zebra] sending configuration
Jan 10 20:58:01 sid frrinit.sh[1198]: [1198|zebra] done
Jan 10 20:58:01 sid frrinit.sh[1212]: [1212|watchfrr] sending configuration
Jan 10 20:58:01 sid frrinit.sh[1214]: [1214|staticd] sending configuration
Jan 10 20:58:01 sid frrinit.sh[1197]: [1197|mgmtd] done
Jan 10 20:58:01 sid watchfrr[1173]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Jan 10 20:58:01 sid frrinit.sh[1195]: Waiting for children to finish applying config...
Jan 10 20:58:01 sid staticd[1193]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Jan 10 20:58:01 sid frrinit.sh[1214]: [1214|staticd] done
Jan 10 20:58:01 sid frrinit.sh[1212]: [1212|watchfrr] done
Jan 10 20:58:01 sid watchfrr[1173]: [QDG3Y-BY5TN] mgmtd state -> up : connect succeeded
Jan 10 20:58:01 sid watchfrr[1173]: [QDG3Y-BY5TN] zebra state -> up : connect succeeded
Jan 10 20:58:01 sid watchfrr[1173]: [QDG3Y-BY5TN] staticd state -> up : connect succeeded
Jan 10 20:58:01 sid watchfrr[1173]: [KWE5Q-QNGFC] all daemons up, doing startup-complete notify
Jan 10 20:58:01 sid frrinit.sh[1164]: Started watchfrr.
Jan 10 20:58:01 sid systemd[1]: Started frr.service - FRRouting.

I haven't spotted anything obvious in the diff between 10.5.0 and 10.5.1, and definitely nothing in the reload script itself.

Version

Hello, this is FRRouting (version 10.5.1).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
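
Since the update above suggests the FRR version alone does not explain the change, it may help to record the FRR version next to the systemd version when comparing installs. The following is a minimal sketch that extracts the version number from a banner line like the one above. The function name frr_version_from_banner is hypothetical and not part of FRR.

```shell
#!/bin/sh
# Hypothetical helper (not part of FRR): print the version number found in
# a banner line of the form "Hello, this is FRRouting (version X.Y.Z)."
frr_version_from_banner() {
    # Keep only the text between "(version " and the closing parenthesis.
    printf '%s\n' "$1" | sed -n 's/.*(version \([^)]*\)).*/\1/p'
}
```

For example, frr_version_from_banner "Hello, this is FRRouting (version 10.5.1)." prints 10.5.1. The systemd version can be read from the first line of systemctl --version.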

How to reproduce

Install frr 10.5.1 on Debian sid or Ubuntu resolute and run systemctl reload frr. The command fails with exit status 1.
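
The reproduction can be scripted so that the reload exit status and the resulting unit state are captured together. This is a minimal sketch, assuming a systemd host with the frr unit installed. The function name reload_and_report is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reload a unit, then print the reload exit status
# together with the unit's ActiveState and SubState.
reload_and_report() {
    unit="${1:-frr}"
    rc=0
    # Capture the exit status instead of aborting on failure.
    systemctl reload "$unit" || rc=$?
    active=$(systemctl show "$unit" --property=ActiveState --value)
    sub=$(systemctl show "$unit" --property=SubState --value)
    echo "reload_exit=$rc active=$active sub=$sub"
    # Succeed only if the reload succeeded and the unit is still active.
    [ "$rc" -eq 0 ] && [ "$active" = "active" ]
}
```

Run as root, for example reload_and_report frr. In the failing case described in this report, the output would be expected to show a non-zero reload_exit and a deactivating / stop-sigterm state.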

Expected behavior

Reload should work as it has before.

Actual behavior

Reload fails, and after a 2-minute timeout systemd forcibly kills frr and then restarts it.
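
The timing can be checked against the timestamps in the logs above: the reload starts at 20:55:55, the kill happens at 20:57:55, and the restart is scheduled at 20:58:00. The following is a small sketch of that arithmetic. The function names to_seconds and elapsed are hypothetical, and both timestamps are assumed to fall on the same day.

```shell
#!/bin/sh
# Hypothetical helpers: compute the seconds between two HH:MM:SS timestamps
# taken from the same day of journal output.
to_seconds() {
    old_ifs=$IFS
    IFS=:
    # Split HH:MM:SS into the positional parameters $1 $2 $3.
    set -- $1
    IFS=$old_ifs
    # Strip one leading zero so values like 08 are not parsed as octal.
    echo $(( ${1#0} * 3600 + ${2#0} * 60 + ${3#0} ))
}

elapsed() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}
```

With the values from this report, elapsed 20:55:55 20:57:55 prints 120 and elapsed 20:57:55 20:58:00 prints 5. These could be compared with the TimeoutStopUSec and RestartUSec values reported by systemctl show frr.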

Additional context

No response

Checklist

  • I have searched the open issues for this bug.
  • I have not included sensitive information in this report.

Labels: triage (Needs further investigation)