"A distributed system is a collection of autonomous compute nodes (sometimes unreliable) that appears to it's users as a single coherent reliable system"
The goal of Nomad-node-problem-detector (a.k.a NNPD) is to abstract these nodes problems from the user, so that user experience is more reliable, when using the Nomad orchestration system.
When a user submits a job ( job --> task_groups(N) --> tasks(N) ) each task in the job needs a task driver e.g. docker, java, QEMU, containerd etc to execute this task. In the current architecture, if a task driver e.g. docker is Unhealthy on a Nomad client node and one of the tasks in the job requires docker to execute, Nomad scheduler will not schedule this job on that particular Nomad client.
Question: What is the definition of a task driver being unhealthy?
Answer: A task driver executes a Fingerprint operation every X seconds (configurable within the task driver) and reports it's HealthState to the Nomad client. Nomad client reports this HealthState to the scheduler. Scheduler can then schedule jobs based on the health states of all the task drivers running on each nomad client nodes.
An example fingerprinting operation for docker task driver.
However,
- If I need to add a custom health check in docker task driver, I would have to modify the fingerprinting operation
here, add this new check and open a PR tohashicorp/nomadrepo. This custom health check could also be specific to my environment, so adding it to the upstreamhashicorp/nomadrepo might not be possible. NNPD decouples this from thehashicorp/nomadcodebase, and provides a framework for adding custom health checks easily. - Nomad clients could be running under
CPU,memoryordiskpressure at various times. NNPD constantly monitor the nodes for these situations and take the nodes out of the scheduling pool when they are underCPU,memoryordiskpressure. It also put the nodes back into the scheduling pool when the pressure is relieved. - The scheduler is only concerned with the task driver health state, when scheduling jobs. However there could be additional problems happening on the node. e.g. ntp service down, kernel issues, corrupted file systems. These can be integrated with NNPD, so nodes can be taken out of the scheduling pool if the node is unhealthy.
In a nutshell, NNPD provides a blackbox (a framework) where we can dump all our node problems, and if a node is running into one of these problems, NNPD will take the node out of the scheduling pool, so no new jobs gets scheduled on this faulty node, until the problem is fixed. In case of a transient issue, if the node recovers, NNPD will also move the node back to the scheduling pool, so new jobs can be scheduled on this node.
NOTE: NNPD as the name suggests Nomad-node-problem-detector is only concerned with the problems happening on the node. Problems external to the node e.g. docker registry down should not be added to NNPD, otherwise it might take all the nodes out of the scheduling pool.
NNPD is composed of two main components:
-
Detector: is responsible for scanning through the node health checks, and exposing the node health at
/v1/nodehealthHTTP endpoint. It also exposes a/v1/healthendpoint, which tells if thedetectoritself ishealthyorunhealthy.Detector relies on an external health check repo, which is used for defining the node health checks.
A sample health check repo is provided for reference: https://github.com/shishir-a412ed/nomad-health-checksNOTE: The sample
health check repodo not contain real health checks, but only provides a reference for defining your own health checks. -
Aggregator: is responsible for getting the node health (
/v1/nodehealth) for each node runningdetector. Based on the node health results, aggregator will mark the node aseligibleorineligiblefor scheduling.
NNPD is packaged as a single go binary, which can be run either in detector or aggregator mode.
$ git clone git@github.com:Roblox/nomad-node-problem-detector.git
$ cd nomad-node-problem-detector
$ make build (This will build your npd binary)
$ make install (This will install npd binary in your /usr/local/bin)
NOTE: The binary name is npd eventhough the application is called nnpd.
As mentioned in the Architecture section, detector relies on an external health check repo
for determining the node health (/v1/nodehealth). A separate github repository can be defined for your health checks. This sample repo can be used as a reference.
At the root of the health check repo, a master config (config.json) will be defined. It has two main fields:
- type: Directory name where the actual health check (
health_check) is located. - health_check: Name of the health check script file.
e.g. In the sample config.json, type docker and health_check docker_health_check.sh defines that docker_health_check.sh will be located under docker directory in the nomad health checks repo.
npdshould be installed on allNomadclient nodes.detectorshould be deployed beforeaggregator.
detector can be deployed either using artifactory based job or a docker prestart hook based job
NOTE: You only need to deploy detector using one of the modes, not both.
In either deployment mode (artifactory or docker prestart hook), detector first unpacks the health check repo
onto the Nomad client filesystem under Nomad allocation directory, so that the detector can scan (and execute)
these health checks and expose the node health (/v1/nodehealth) for the aggregator, followed by starting the detector daemon.
- Modify the artifactory
sourceto point to your health check repo. Please check Setup health check repo on how to setup your health check repo.
$ nomad job plan detector-artifact.nomad
$ nomad job run detector-artifact.nomad
$ nomad job status detector
How to deploy detector using docker prestart hook
Official aggregator docker image: shm32/npd-aggregator:1.1.0
You can find the aggregator nomad job spec here
$ nomad job plan aggregator.nomad
$ nomad job run aggregator.nomad
$ nomad job status aggregator
So, you were able to deploy detector and aggregator successfully. We have NNPD system up and running.
Question: How do I add a new health check, and do a rolling upgrade on detector?
- git clone <your_health_check_repo>
- Add your new health check in the locally cloned copy.
- Don't forget to update your master config (
config.json).
hint: Usenpd config generate --root-dir <dir>to update your master config. - Follow these
instructionsto upgrade yourdetectorusingdocker prestart hook mode.
You can enable a token based authentication for detector HTTP endpoints (/v1/health/ and /v1/nodehealth/) by starting the detector with --auth flag.
DETECTOR_HTTP_TOKEN=<your_token> environment variable must be set when deploying aggregator and detector jobs.
aggregator will use DETECTOR_HTTP_TOKEN to set the token in the authorization header when making the HTTP requests.
detector will use DETECTOR_HTTP_TOKEN for validating against the incoming token in the authorization header.
$ DETECTOR_HTTP_TOKEN=<your_token> npd detector --auth
The token is base64 encoded, so if you are trying things out using curl, you need to encode the token first before passing it in the authorization header.
$ echo -n <your_token> | base64
$ Note down your base64 encoded token.
$ curl -H "Authorization: Basic <base64_encoded_token>" http://localhost:8083/v1/nodehealth/
NOTE: In order to keep NNPD performant and lightweight, TLS is not support at this point.
Aggregator - Run npd in aggregator mode
npd aggregator --help for more info.
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| aggregation-cycle-time | string | no | 15s |
Time (in seconds) to wait between each aggregation cycle. |
| debug | bool | no | false | Enable debug logging. |
| detector-port | string | no | :8083 |
Detector HTTP server port |
| detector-datacenter | []string | no | N/A | List of datacenters where detector is running. If no datacenters are provided, aggregator will only reach out to nodes in $NOMAD_DC datacenter. |
| enforce-health-check | []string | no | N/A | Health checks in this list will be enforced i.e. node will be taken out of the scheduling pool if health-check fails. |
| nomad-server | string | no | http://localhost:4646 |
HTTP API address of a Nomad server or agent. |
| node-attribute | []string | no | N/A | Aggregator will filter nodes based on these attributes. E.g. if you set os.name=ubuntu, aggregator will only reach out to ubuntu nodes in the cluster. |
| threshold-percentage | int | no | 85 |
If the number of eligible nodes goes below the threshold, npd will stop marking nodes as ineligible. |
| prometheus-server-port | int | no | 3000 |
The port used to expose aggregator metrics in the prometheus format |
| prometheus-server-addr | string | no | 0.0.0.0 |
The address to bind the aggregator metrics exporter |
Detector - Run nomad node problem detector HTTP server
npd detector --help for more info.
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| detector-cycle-time | string | no | 3s |
Time (in seconds) to wait between each detector cycle. |
| port | string | no | :8083 |
Address to listen on for detector HTTP server. NOTE: If your detector is listening on a non-default port, don't forget to start your aggregator with --detector-port flag. This will inform aggregator which detector port to reach out to. |
| prometheus-metrics-path | string | no | /v1/metrics/ |
Set the path that is used by the metrics endpoint to expose detector metrics in the prometheus format. |
| auth | bool | no | false | If set to true, detector must set DETECTOR_HTTP_TOKEN=<your_token> as an environment variable when starting detector. |
| root-dir | string | no | /var/lib/nnpd |
Location of health checks. |
| cpu-limit | string | no | 85 |
CPU threshold in percentage. |
| memory-limit | string | no | 80 |
Memory threshold in percentage. |
| disk-limit | string | no | 90 |
Disk threshold in percentage. |
Config - Run config and health checks related commands.
npd config --help for more info.
There are two subcommands in npd config command:
- npd config generate - Generates the config.
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| root-dir | string | no | pwd - present working directory |
Location of health checks |
- npd config build - Copy your health checks into a docker image.
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| image | string | yes | N/A |
Fully qualified docker image name |
| root-dir | string | no | pwd - present working directory |
Location of health checks |
vagrant up will start a local vagrant VM nnpd, which has all the dependencies (e.g. nomad, golang) already installed, which are required to run the integration tests.
To run the tests locally in the vagrant VM.
$ vagrant up
$ vagrant ssh nnpd
$ sudo make test
make clean
This will delete your npd binary.
vagrant destroy
This will destroy your vagrant VM.
Copyright 2021 Roblox Corporation
Licensed under the Apache License, Version 2.0 (the "License"). For more information read the License.
