
Commit cfc19ac

h4x3rotab and claude committed
refactor(consul-postgres-ha): collapse platform plumbing to single sidecar image
Per-CVM container count drops from 7 → 3 on workers (sidecar + patroni + webdemo) and from 6 → 1 on coordinators (sidecar). The new sidecar image bundles bootstrap-secrets, mesh-conn, consul, and (workers only) envoy behind a tini-wrapped shell init that dispatches on ROLE; the old keepalive placeholder, the four-image lockstep, and the vestigial on-CVM signaling/coturn that had been documented as unused all drop.

CI matrix: 6 → 4 (sidecar, patroni, webdemo, signaling). The sidecar build uses the parent consul-postgres-ha/ as docker context so its multi-stage Dockerfile can pull bootstrap-secrets/ and mesh-conn/ Go sources from sibling subdirs.

cluster.tf: BOOTSTRAP_SECRETS_IMAGE, MESH_CONN_IMAGE, SIGNALING_IMAGE (coordinator) and the matching tfvars all collapse into SIDECAR_IMAGE.

Smoke-tested against a fresh terraform apply on dstack-pha-prod5 (2026-05-04). Soft-kill RTO 27s, hard-kill RTO 33s, cheap rejoin verified, disk-loss rejoin 26s, all within noise of the pre-Gap-2 baselines on the previous multi-container cluster.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent d204f50 commit cfc19ac
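
The ROLE dispatch described in the commit message can be pictured with a small sketch. This is not the commit's actual init script, which is not shown on this page; the function name `services_for_role` and the literal role values `worker` and `coordinator` are assumptions made for illustration. Only the list of bundled processes comes from the message itself.

```bash
# Hypothetical sketch of a ROLE-dispatching init: map a role to the
# processes the consolidated sidecar would start. The role names are
# assumed; the process list follows the commit message (envoy on
# workers only).
services_for_role() {
  case "$1" in
    worker)      echo "bootstrap-secrets mesh-conn consul envoy" ;;
    coordinator) echo "bootstrap-secrets mesh-conn consul" ;;
    *)           echo "unknown ROLE: $1" >&2; return 1 ;;
  esac
}
```

Calling `services_for_role "$ROLE"` at the top of an init script and failing fast on an unknown role is one way to keep a single image usable on both node types.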

17 files changed

Lines changed: 527 additions & 683 deletions


.github/workflows/consul-postgres-ha-publish.yml

Lines changed: 36 additions & 25 deletions
Original file line number | Diff line number | Diff line change
@@ -1,42 +1,49 @@
11
name: Publish consul-postgres-ha images
22

3-
# Builds and publishes the six container images the consul-postgres-ha
4-
# example needs (mesh-conn, bootstrap-secrets, signaling, webdemo,
5-
# sidecar, patroni). On push to main, images are
6-
# tagged with the commit SHA *and* `latest`, pushed to GHCR, and
7-
# attested with Sigstore-backed GitHub Build Provenance so consumers
8-
# can verify "this image came from this commit of this repo" without
9-
# us managing any keys. PRs build to verify but do not push or attest.
3+
# Builds and publishes the four container images the consul-postgres-ha
4+
# example needs (mesh-sidecar, patroni, webdemo, signaling). On push
5+
# to main, images are tagged with the commit SHA *and* `latest`,
6+
# pushed to GHCR, and attested with Sigstore-backed GitHub Build
7+
# Provenance so consumers can verify "this image came from this
8+
# commit of this repo" without us managing any keys. PRs build to
9+
# verify but do not push or attest.
1010
#
11-
# Why six images on one workflow: the example needs all of them in
12-
# lockstep — bumping mesh-conn alone but leaving the rest stale leads
13-
# to mixed-version clusters that are hard to reason about. One workflow
14-
# means one set of tags moves together.
11+
# Why one workflow for all four: the example needs them in lockstep —
12+
# bumping one but leaving the rest stale leads to mixed-version
13+
# clusters that are hard to reason about. One workflow means one set
14+
# of tags moves together.
15+
#
16+
# `mesh-sidecar` is the consolidated platform-plumbing image (formerly
17+
# four images: bootstrap-secrets, mesh-conn, the legacy keepalive, and
18+
# the old envoy-only sidecar). Its build context is the parent
19+
# consul-postgres-ha/ directory so its Dockerfile can pull the Go
20+
# sources from sibling subdirs. The other three images build from
21+
# their own subdirs.
1522
#
1623
# Verifying a published image (consumer side):
1724
#
1825
# gh attestation verify \
19-
# oci://ghcr.io/dstack-tee/dstack-examples/consul-postgres-ha-mesh-conn:latest \
26+
# oci://ghcr.io/dstack-tee/dstack-examples/consul-postgres-ha-mesh-sidecar:latest \
2027
# --repo Dstack-TEE/dstack-examples
2128

2229
on:
2330
push:
2431
branches: [main]
2532
paths:
26-
- 'consul-postgres-ha/mesh-conn/**'
2733
- 'consul-postgres-ha/bootstrap-secrets/**'
34+
- 'consul-postgres-ha/mesh-conn/**'
35+
- 'consul-postgres-ha/mesh-sidecar/**'
2836
- 'consul-postgres-ha/patroni/**'
2937
- 'consul-postgres-ha/webdemo/**'
30-
- 'consul-postgres-ha/sidecar/**'
3138
- 'consul-postgres-ha/signaling/**'
3239
- '.github/workflows/consul-postgres-ha-publish.yml'
3340
pull_request:
3441
paths:
35-
- 'consul-postgres-ha/mesh-conn/**'
3642
- 'consul-postgres-ha/bootstrap-secrets/**'
43+
- 'consul-postgres-ha/mesh-conn/**'
44+
- 'consul-postgres-ha/mesh-sidecar/**'
3745
- 'consul-postgres-ha/patroni/**'
3846
- 'consul-postgres-ha/webdemo/**'
39-
- 'consul-postgres-ha/sidecar/**'
4047
- 'consul-postgres-ha/signaling/**'
4148
- '.github/workflows/consul-postgres-ha-publish.yml'
4249
workflow_dispatch:
@@ -59,18 +66,18 @@ jobs:
5966
fail-fast: false
6067
matrix:
6168
include:
62-
- name: mesh-conn
63-
context: consul-postgres-ha/mesh-conn
64-
- name: bootstrap-secrets
65-
context: consul-postgres-ha/bootstrap-secrets
69+
# `mesh-sidecar` builds with the parent dir as context so
70+
# its Dockerfile can pull bootstrap-secrets/ and mesh-conn/
71+
# Go sources from siblings.
72+
- name: mesh-sidecar
73+
context: consul-postgres-ha
74+
dockerfile: consul-postgres-ha/mesh-sidecar/Dockerfile
6675
- name: patroni
6776
context: consul-postgres-ha/patroni
68-
- name: signaling
69-
context: consul-postgres-ha/signaling
7077
- name: webdemo
7178
context: consul-postgres-ha/webdemo
72-
- name: sidecar
73-
context: consul-postgres-ha/sidecar
79+
- name: signaling
80+
context: consul-postgres-ha/signaling
7481

7582
steps:
7683
- uses: actions/checkout@v4
@@ -90,7 +97,7 @@ jobs:
9097
id: meta
9198
uses: docker/metadata-action@v5
9299
with:
93-
# Image namespace lives one level under the repo so all six
100+
# Image namespace lives one level under the repo so all four
94101
# images sit side-by-side: ghcr.io/<owner>/<repo>/consul-postgres-ha-<name>
95102
images: ${{ env.REGISTRY }}/${{ github.repository }}/consul-postgres-ha-${{ matrix.name }}
96103
tags: |
@@ -103,6 +110,10 @@ jobs:
103110
uses: docker/build-push-action@v6
104111
with:
105112
context: ${{ matrix.context }}
113+
# Most images use the default Dockerfile in the context.
114+
# `mesh-sidecar` overrides this to point at
115+
# mesh-sidecar/Dockerfile while keeping the parent context.
116+
file: ${{ matrix.dockerfile || format('{0}/Dockerfile', matrix.context) }}
106117
platforms: linux/amd64
107118
push: ${{ github.event_name != 'pull_request' }}
108119
tags: ${{ steps.meta.outputs.tags }}
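
The `file:` expression in the hunk above falls back to `<context>/Dockerfile` unless the matrix entry sets `dockerfile`. A minimal shell sketch of the same fallback rule, useful for reproducing a matrix entry locally; the function name `resolve_dockerfile` is hypothetical and not part of the workflow:

```bash
# Sketch of the workflow's Dockerfile resolution: use the explicit
# override when one is given (and non-empty), otherwise default to
# <context>/Dockerfile. Runs in a subshell so variables stay local.
resolve_dockerfile() (
  context=$1
  dockerfile=${2:-}
  echo "${dockerfile:-${context}/Dockerfile}"
)
```

For example, `docker build -f "$(resolve_dockerfile consul-postgres-ha consul-postgres-ha/mesh-sidecar/Dockerfile)" consul-postgres-ha` mirrors the `mesh-sidecar` entry, while the other three entries pass only their context.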

consul-postgres-ha/FAILOVER.md

Lines changed: 40 additions & 30 deletions
@@ -31,7 +31,7 @@ PW=$(ssh ... root@${W1}-22.${GW} "cat /tmp/dstack-runtime/secrets/patroni-superu
3131

3232
```bash
3333
ssh ... root@${W1}-22.${GW} \
34-
"docker exec dstack-tester-1 sh -c 'curl -s http://127.0.0.1:18803/cluster' | jq"
34+
"docker exec dstack-sidecar-1 sh -c 'curl -s http://127.0.0.1:18803/cluster' | jq"
3535

3636
ssh ... root@${W1}-22.${GW} "PGPASSWORD='$PW' docker exec -e PGPASSWORD dstack-patroni-1 \
3737
psql -h 127.0.0.1 -p 18703 -U postgres -d postgres \
@@ -86,19 +86,23 @@ consistent recovery state reached at 0/...
8686
started streaming WAL from primary at 0/... on timeline 16
8787
```
8888

89-
## Measured timeline (run from 2026-05-03)
89+
## Measured timeline (run from 2026-05-04, single-sidecar layout)
9090

9191
```
92-
T_kill 05:02:28.028 docker stop dstack-patroni-1 on worker-3
93-
T_new_leader 05:02:49.994 worker-4 promoted (timeline 15 → 16) +22s
94-
T_first_write 05:02:52.313 INSERT succeeds on worker-4 +24s ← RTO
95-
T_restart_W3 05:03:39.704 docker start dstack-patroni-1
96-
T_W3_rejoined 05:04:10.377 worker-3 streaming, lag=0 +31s
92+
T_kill 17:31:26 docker stop dstack-patroni-1 on worker-5 (leader)
93+
T_new_leader 17:31:57 worker-4 promoted (timeline 2 → 3) +31s
94+
T_first_write 17:31:59 INSERT succeeds on worker-4 +33s ← RTO
9795
```
9896

99-
**RTO (Recovery Time Objective): ~24 seconds.** That's the wall time
97+
**RTO (Recovery Time Objective): ~33 seconds.** That's the wall time
10098
from leader process death to first successful write on the new leader,
101-
sitting comfortably inside the default Patroni `ttl=30`.
99+
sitting at the edge of the default Patroni `ttl=30`. The 2026-05-03
100+
multi-container baseline was 24s on a different cluster — the
101+
single-sidecar layout is within typical run-to-run variance for the
102+
`ttl=30 + promote-overhead` window. Cheap rejoin was confirmed in a
103+
prior round of this same run: a previously-killed leader (worker-3)
104+
came back as a streaming replica on the new timeline with lag=0
105+
within ~60s of `docker start dstack-patroni-1`.
102106

103107
## Tunables for the RTO/availability tradeoff
104108

@@ -124,8 +128,9 @@ the leader at once:
124128
ssh ... root@${LEADER}-22.${GW} "docker stop -t 0 \$(docker ps -q)"
125129
```
126130

127-
This kills patroni, postgres, mesh-conn, consul, sidecar, webdemo, and
128-
the keepalive — everything that produces signal for the rest of the
131+
This kills patroni, postgres, webdemo, and the consolidated sidecar
132+
(which itself runs bootstrap-secrets, mesh-conn, consul, and envoy
133+
inside it) — everything that produces signal for the rest of the
129134
cluster. Bring the host back via:
130135

131136
```bash
@@ -135,23 +140,29 @@ ssh ... root@${LEADER}-22.${GW} \
135140
```
136141

137142
`docker compose up -d` respects the dependency order
138-
(bootstrap-secrets → mesh-conn → consul → patroni).
143+
(sidecar's `service_healthy` gate fires once bootstrap-secrets has
144+
written `/run/instance/info.json`, then patroni and webdemo start).
139145

140-
### Measured timeline (run from 2026-05-03)
146+
### Measured timeline (run from 2026-05-04, single-sidecar layout)
141147

142148
```
143-
T_kill 07:26:42 docker stop -t 0 ALL 7 containers on worker-4
144-
T_new_leader 07:27:13 worker-3 promoted (timeline 16 → 17) +31s
145-
T_first_write 07:27:15 INSERT succeeds on worker-3 +33s ← RTO
146-
T_restart_W4 07:27:46 docker compose up -d on worker-4
147-
T_W4_rejoined 07:28:34 worker-4 streaming, lag=0 +48s after restart
149+
T_kill 17:33:29 docker stop -t 0 ALL containers on worker-4 (leader)
150+
T_new_leader 17:34:00 worker-3 promoted (timeline 3 → 4) +31s
151+
T_first_write 17:34:02 INSERT succeeds on worker-3 +33s ← RTO
152+
T_restart_W4 17:34:02 docker compose up -d on worker-4
148153
```
149154

150-
**Hard-kill RTO ≈ 33 seconds**, ~9 seconds longer than the soft-kill
151-
above. That extra cost is Consul gossip-failure detection: with
152-
soft-kill only the Patroni leader-key TTL expires, while with hard-kill
153-
the entire Consul agent is gone, so the surviving peers see *both*
154-
signals.
155+
**Hard-kill RTO ≈ 33 seconds**, identical to both the soft-kill above
156+
and the 2026-05-03 multi-container baseline. Consul gossip-failure
157+
detection (which sees worker-4's whole agent disappear, not just the
158+
Patroni lock) lines up with the Patroni leader-key TTL on this run,
159+
so neither signal extends the RTO.
160+
161+
The post-restart rejoin path on dstack-worker pairs is occasionally
162+
flaky (the documented `MESH_CONN_RELAY_ONLY=1` escape hatch in
163+
`compose/worker.yaml` is exactly this case — flip it on if your
164+
deployment hits a wedged ICE re-handshake). The mesh-conn binary
165+
behavior is unchanged by the single-sidecar consolidation.
155166

156167
### Things confirmed by the hard-kill that the soft-kill didn't exercise
157168

@@ -184,17 +195,16 @@ rm -rf /var/lib/docker/volumes/dstack_patroni-pgdata/_data/*
184195
docker start dstack-patroni-1
185196
```
186197

187-
### Measured timeline (run from 2026-05-03)
198+
### Measured timeline (run from 2026-05-04, single-sidecar layout)
188199

189200
```
190-
T_wipe 21:13:41 docker stop + rm -rf pgdata on worker-5
191-
T_restart 21:13:42 docker start
192-
T_basebackup 21:13:47 "trying to bootstrap from leader 'worker-4'"
193-
T_complete 21:13:54 "replica has been created using basebackup" +7s
194-
T_streaming 21:13:58 service registered, streaming WAL +16s total
201+
T_wipe 17:34:21 docker stop + rm -rf pgdata on worker-5
202+
T_restart 17:34:25 docker start
203+
T_complete 17:34:43 "replica has been created using basebackup" +18s
204+
T_streaming 17:35:43 streaming WAL on timeline 4, lag=0 +82s total
195205
```
196206

197-
5.2 MB pgdata transferred in ~7 seconds end-to-end. Note the dataset
207+
A few-MB pgdata transferred in ~18 seconds end-to-end. The dataset
198208
is small enough that handshake/startup overhead dominates — for a
199209
realistic throughput number, see the soft-kill section's pg_basebackup
200210
trace at ~25 MB/s sustained on the QUIC path.
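
The `+Ns` offsets in the measured timelines above are differences between wall-clock stamps. A small sketch for re-deriving them, assuming zero-padded `HH:MM:SS` stamps on the same day (no fractional seconds, no midnight rollover); the helper names `to_seconds` and `elapsed` are illustrative and do not come from the repository:

```bash
# Convert a zero-padded HH:MM:SS stamp to seconds since midnight.
# One leading zero is stripped from each field so values like "08"
# are not parsed as invalid octal.
to_seconds() (
  h=${1%%:*}; rest=${1#*:}
  m=${rest%%:*}; s=${rest#*:}
  echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
)

# Seconds elapsed between two stamps: elapsed <start> <end>
elapsed() {
  echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}
```

With the soft-kill stamps, `elapsed 17:31:26 17:31:59` gives the 33-second RTO; with the disk-loss stamps, `elapsed 17:34:21 17:35:43` gives the 82-second total.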

consul-postgres-ha/PUBLISHING.md

Lines changed: 31 additions & 19 deletions
@@ -1,10 +1,15 @@
11
# Stage 4 — image publishing & verification
22

3-
The stage-4 example needs six container images deployed in lockstep:
4-
`mesh-conn`, `bootstrap-secrets`, `signaling`, `webdemo`, `sidecar`,
5-
`patroni`. CI publishes them to GHCR with Sigstore-backed GitHub Build
6-
Provenance; consumers pin by tag (or, better, by digest) and verify
7-
provenance with `gh attestation verify`.
3+
The stage-4 example needs four container images deployed in lockstep:
4+
`mesh-sidecar`, `patroni`, `webdemo`, `signaling`. CI publishes them to
5+
GHCR with Sigstore-backed GitHub Build Provenance; consumers pin by
6+
tag (or, better, by digest) and verify provenance with
7+
`gh attestation verify`.
8+
9+
`mesh-sidecar` is the consolidated platform-plumbing image — a single
10+
container that runs bootstrap-secrets, mesh-conn, consul, and (on
11+
workers) envoy. It's the heaviest by a wide margin because it
12+
inherits from envoyproxy/envoy and bundles three more binaries on top.
813

914
This doc covers the three paths you'll actually use:
1015

@@ -15,10 +20,14 @@ This doc covers the three paths you'll actually use:
1520
## 1. CI publish — the steady-state
1621

1722
`.github/workflows/consul-postgres-ha-publish.yml` runs on push to `main`
18-
when any of the six image build contexts (or the workflow itself)
23+
when any of the four image build contexts (or the workflow itself)
1924
change, and on PRs touching the same paths. Each run:
2025

21-
- Builds all six images via a matrix job.
26+
- Builds all four images via a matrix job. The `mesh-sidecar` build
27+
uses `consul-postgres-ha/` as its docker context (instead of
28+
`consul-postgres-ha/mesh-sidecar/`) so its Dockerfile can pull
29+
`bootstrap-secrets/` and `mesh-conn/` Go sources from sibling
30+
directories.
2231
- On `main`, pushes to `ghcr.io/dstack-tee/dstack-examples/consul-postgres-ha-<name>` with two tags: the long-form commit SHA (`sha-<40-hex>`) and `latest`.
2332
- Generates a GitHub Build Provenance attestation per image via
2433
`actions/attest-build-provenance@v2`. The attestation is signed by
@@ -34,12 +43,12 @@ change, and on PRs touching the same paths. Each run:
3443
```bash
3544
# By tag (lower assurance — `latest` floats):
3645
gh attestation verify \
37-
oci://ghcr.io/dstack-tee/dstack-examples/consul-postgres-ha-mesh-conn:latest \
46+
oci://ghcr.io/dstack-tee/dstack-examples/consul-postgres-ha-mesh-sidecar:latest \
3847
--repo Dstack-TEE/dstack-examples
3948

4049
# By digest (preferred — pinned, won't drift):
4150
gh attestation verify \
42-
oci://ghcr.io/dstack-tee/dstack-examples/consul-postgres-ha-mesh-conn@sha256:<digest> \
51+
oci://ghcr.io/dstack-tee/dstack-examples/consul-postgres-ha-mesh-sidecar@sha256:<digest> \
4352
--repo Dstack-TEE/dstack-examples
4453
```
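
When scripting the by-digest check across all four images, it can help to build the reference string in one place. A minimal sketch, assuming the `ghcr.io/dstack-tee/dstack-examples/consul-postgres-ha-<name>` naming shown above; the helper `oci_ref` is hypothetical and only formats and sanity-checks the string, it does not contact the registry:

```bash
# Build an oci:// reference pinned by digest for one of the images.
# Accepts the digest with or without a "sha256:" prefix and rejects
# anything that is not 64 lowercase hex characters (for example a
# floating tag such as "latest").
oci_ref() (
  name=$1
  digest=${2#sha256:}
  case "$digest" in
    ""|*[!0-9a-f]*) echo "digest must be lowercase hex" >&2; exit 1 ;;
  esac
  [ "${#digest}" -eq 64 ] || { echo "digest must be 64 hex characters" >&2; exit 1; }
  echo "oci://ghcr.io/dstack-tee/dstack-examples/consul-postgres-ha-${name}@sha256:${digest}"
)
```

The result can be passed straight to the command shown above, for example `gh attestation verify "$(oci_ref mesh-sidecar "$DIGEST")" --repo Dstack-TEE/dstack-examples`, where `DIGEST` is whatever digest you have pinned.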
4554

@@ -54,20 +63,23 @@ of `latest` doesn't silently swap your cluster's bits.
5463

5564
## 2. Manual one-off publish — dev iteration
5665

57-
When iterating fast on `mesh-conn` (or any other component) you don't
58-
want to round-trip through CI for every byte. Two equivalent shortcuts:
66+
When iterating fast on the mesh-sidecar (or any other component) you
67+
don't want to round-trip through CI for every byte. Two equivalent
68+
shortcuts. Note that `mesh-sidecar` builds from the
69+
`consul-postgres-ha/` parent dir (it pulls Go sources from sibling
70+
subdirs); the rest build from their own subdir.
5971

6072
### a) `ttl.sh` (24h-disposable, no auth)
6173

6274
```bash
6375
TS=$(date +%s)
64-
TAG=ttl.sh/dstack-mesh-conn-${TS}:24h
65-
docker build -t $TAG consul-postgres-ha/mesh-conn
76+
TAG=ttl.sh/dstack-mesh-sidecar-${TS}:24h
77+
docker build -t $TAG -f consul-postgres-ha/mesh-sidecar/Dockerfile consul-postgres-ha
6678
docker push $TAG
6779
```
6880

6981
Then point the running cluster at it via `terraform.tfvars`'s
70-
`mesh_conn_image = ...` (and `terraform apply`), or hot-patch the
82+
`mesh_sidecar_image = ...` (and `terraform apply`), or hot-patch the
7183
running CVM (see §3). `ttl.sh` images expire 24h after push.
7284

7385
### b) Personal GHCR namespace (persistent, requires PAT)
@@ -76,8 +88,8 @@ If you want a longer-lived dev image without going through main:
7688

7789
```bash
7890
echo "$GITHUB_TOKEN" | docker login ghcr.io -u <your-user> --password-stdin
79-
TAG=ghcr.io/<your-user>/consul-postgres-ha-mesh-conn:dev-$(date +%s)
80-
docker build -t $TAG consul-postgres-ha/mesh-conn
91+
TAG=ghcr.io/<your-user>/consul-postgres-ha-mesh-sidecar:dev-$(date +%s)
92+
docker build -t $TAG -f consul-postgres-ha/mesh-sidecar/Dockerfile consul-postgres-ha
8193
docker push $TAG
8294
```
8395

@@ -99,17 +111,17 @@ Phala-Network/terraform-provider-phala#8).
99111
```bash
100112
GW=dstack-pha-prod5.phala.network
101113
APP_ID=<cvm-app-id>
102-
NEW=ttl.sh/dstack-mesh-conn-<ts>:24h
114+
NEW=ttl.sh/dstack-mesh-sidecar-<ts>:24h
103115
OLD=$(ssh ... root@${APP_ID}-22.${GW} \
104-
"docker inspect dstack-mesh-conn-1 --format '{{.Config.Image}}'")
116+
"docker inspect dstack-sidecar-1 --format '{{.Config.Image}}'")
105117

106118
ssh ... root@${APP_ID}-22.${GW} "
107119
docker pull $NEW
108120
docker tag $NEW $OLD
109121
cd /tapp && docker compose \
110122
--env-file /dstack/.host-shared/.decrypted-env \
111123
-p dstack -f /tapp/docker-compose.yaml \
112-
up -d --force-recreate mesh-conn
124+
up -d --force-recreate sidecar
113125
"
114126
```
115127

consul-postgres-ha/README.md

Lines changed: 7 additions & 8 deletions
@@ -36,9 +36,10 @@ Prerequisites:
3636

3737
- A Phala Cloud account with API credentials at `~/.phala-cloud/credentials.json`.
3838
- A Linux box with a public IP for the external coordinator (coturn + signaling).
39-
- The six container images either already published to GHCR (via the
40-
CI workflow on this repo's main branch) or pushed by you to a
41-
registry of your choice. See [`PUBLISHING.md`](PUBLISHING.md).
39+
- The four container images (`mesh-sidecar`, `patroni`, `webdemo`,
40+
`signaling`) either already published to GHCR (via the CI workflow
41+
on this repo's main branch) or pushed by you to a registry of your
42+
choice. See [`PUBLISHING.md`](PUBLISHING.md).
4243

4344
```bash
4445
cd consul-postgres-ha/cluster-example
@@ -72,11 +73,11 @@ consul-postgres-ha/
7273
├── compose/ coordinator.yaml + worker.yaml templates
7374
├── coordinator/ docker-compose for the external coordinator (coturn + signaling)
7475
75-
├── mesh-conn/ QUIC-over-pion/ICE overlay (~600 LoC Go)
76-
├── bootstrap-secrets/ init container — TEE-derives per-CVM secrets
76+
├── mesh-sidecar/ consolidated platform sidecar image (bootstrap-secrets + mesh-conn + consul + envoy)
77+
├── bootstrap-secrets/ Go source — TEE-derives per-CVM secrets (built into sidecar)
78+
├── mesh-conn/ Go source — QUIC-over-pion/ICE overlay (built into sidecar)
7779
├── patroni/ Patroni + Postgres image
7880
├── webdemo/ example workload sitting on the mesh
79-
├── sidecar/ Envoy bootstrapper for Consul Connect mTLS
8081
├── signaling/ HTTP /publish + /poll broker for ICE auth/candidate exchange
8182
└── quic-on-ice/ standalone smoke test for the QUIC-over-ICE transport
8283
```
@@ -113,8 +114,6 @@ and the Terraform structure as-is.
113114
in parallel hits
114115
[`phala-cloud#247`](https://github.com/Phala-Network/phala-cloud/issues/247)
115116
— use `-parallelism=1` for now (~5 min × N to bring-up).
116-
* Six container images per CVM is more platform plumbing than ideal.
117-
A consolidation pass to a single sidecar container is planned.
118117
* The mesh-conn admission story is **shared-secret based today**
119118
(TURN HMAC), not attestation-based. Adding TEE attestation as the
120119
admission credential is the next architectural step.

consul-postgres-ha/bootstrap-secrets/Dockerfile

Lines changed: 0 additions & 11 deletions
This file was deleted.
