The Heartbeat plugin exposes a HTTP endpoint where it can receive "heartbeats" from external
systems. The heartbeat.received check then makes sure that a heartbeat has been received
within a given timeframe.
This can be used to monitor batch jobs, such as cron tasks, or just to periodically indicate service status when the service is not directly monitorable.
The heartbeat plugin must be explicitly configured with the port, and optionally the path, it will listen on:
plugin heartbeat {
# Port is required
port = 8080
# Path is optional, if specified all requests must start with the specified path.
path = "/heartbeat/"
}
Tip: The heartbeat plugin does not support TLS connections. It is strongly recommended that you use a reverse proxy such as Nginx, Haproxy or Caddy to perform TLS termination.
Services sending heartbeats must use a 32-character hexadecimal identifier, which
is included in the path when sending a heartbeat. For example with a configured
path of /heartbeat/, a service will send a HTTP GET to a URL like:
https://example.com/heartbeat/bf7a7fa97f112bf949e6de4188d6a991
Requests do not need any specific parameters, headers or payloads.
Tip: You can generate a random ID using the command
openssl rand -hex 16
It is recommended that callers automatically retry if they do not get a 2xx status code
in response to the heartbeat, for example with curl:
$ curl --retry 10 --retry-all-errors https://example.com/heartbeat/bf7a7fa97f112bf949e6de4188d6a991check heartbeat.received "cron" {
id = "bf7a7fa97f112bf949e6de4188d6a991"
within = 1d2h
}
Checks to ensure that a heartbeat with the given ID has been received within the time period (in the example, 1 day 2 hour). Each check must have a unique heartbeat ID.
Tip: Ensure the
withinwindow is long enough to account for the scheduling time and any execution time the job may take. For example, a daily cron job that varies in duration between 10 and 30 minutes will have up to a 1d20m gap between heartbeats.Also be aware that daylight savings time changes may cause a ±1h change depending on your configured timezone(s).
You may wish to consider setting a custom interval, good_threshold and
failing_threshold for this check. For a heartbeat that should be executed daily a
recommended configuration is:
check heartbeat.received "cron" {
id = "58139f843d770c638420a327e530834d53b691f74c8449a86474b081e22b02f3"
within = 1d2h
interval = 30m
good_threshold = 1
failing_threshold = 1
}
This will trigger an alert between 26 and 26½ hours after the last heartbeat was received, and revert to a "good" status within half an hour of a subsequent heartbeat being received.