Component: Phala Cloud + the official
Phala-Network/terraform-provider-phala
Use case: running stateful clusters (Consul, Postgres/Patroni,
etcd, Kafka, …) across dstack CVMs.
Background
dstack already has the right structural shape for stateful
replicas — an app can have multiple instances, each instance
is bound to its own disk, and the instance's identity is persisted
on that disk. That maps cleanly to GCP's Managed Instance Group
with a stateful_policy, or to Kubernetes' StatefulSet + PVC
template, or to AWS ASG built on top of EBS reattach + lifecycle
hooks.
What's missing today is the operator-facing control surface to
drive that model from Terraform, and the rollout policy for
multi-instance updates.
What works today (validated empirically)
We exercised the provider in a small shakedown (the same run also
surfaced the related storage_fs ForceNew bug), and confirmed:
- phala_app with replicas: N provisions N CVMs all sharing the
  same app_id (so a TEE-derived getKey() returns the same bytes
  on every replica — important for cluster-wide secrets).
- In-place compose / env updates preserve app_id and
  primary_cvm_id (~3m39s on a tdx.small).
- Replicas show up in Terraform state as cvm_ids = […].
So the basic "scale N stateful replicas under one app" already
works. The gaps below show up the moment you want fine-grained
operational control over the rollout and per-instance lifecycle.
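For reference, a minimal sketch of the shape we exercised; the
compose / env attribute names and values below are illustrative
stand-ins rather than our exact config:

resource "phala_app" "consul_servers" {
  name     = "consul-servers"
  replicas = 3   # -> 3 CVMs, all sharing one app_id

  # Shared by every replica, so a TEE-derived getKey() is identical
  # on all of them. Attribute names here are assumptions for the sketch.
  compose = file("${path.module}/docker-compose.yaml")
  env = {
    CONSUL_DATACENTER = "dc1"
  }
}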
Asks
1. Per-instance Terraform resource
Mirror GCP MIG's
google_compute_per_instance_config
or k8s' StatefulSet+PVC template:
resource "phala_app" "consul_servers" {
name = "consul-servers"
replicas = 3
# ... shared compose, env, etc.
}
resource "phala_app_instance" "consul_servers" {
for_each = { for i in range(3) : i => "consul-server-${i}" }
app_id = phala_app.consul_servers.app_id
ordinal = each.key # stable slot number
preserved_state {
disk = "data" # never delete, even on instance recreate
}
metadata = {
role = "server"
}
}
Use cases this unlocks:
- terraform apply -target=phala_app_instance.consul_servers[\"1\"]
  to upgrade exactly one replica.
- Per-instance overrides (e.g. AZ pinning, instance-specific
metadata).
- Per-instance state in Terraform — each replica has its own
terraform state entry instead of being collapsed into the
parent's cvm_ids list.
This is the structural primitive that lets external tooling
(rollout scripts, operators) gate updates on workload health.
2. update_policy block on phala_app
Mirror GCP MIG's
update_policy:
resource "phala_app" "consul_servers" {
name = "consul-servers"
replicas = 3
update_policy {
type = "PROACTIVE" # or "OPPORTUNISTIC"
minimal_action = "RESTART" # NONE / REFRESH / RESTART / REPLACE
max_unavailable_fixed = 0 # never reduce live capacity
max_surge_fixed = 1 # surge one extra during rollout
min_ready_seconds = 30 # green for at least 30s before next
}
}
Reads as "one out at a time, never two simultaneously, give it 30s
of green before moving on." Same shape as a k8s StatefulSet's
RollingUpdate strategy (partition) plus podManagementPolicy.
Without this, every Phala app update today is opaque — the operator
can't tell whether the platform will roll instances one at a time
or restart them all at once. For a Consul quorum, the difference
is "uptime" vs "split-brain incident."
3. Workload health-gate hooks (lifecycle hooks equivalent)
GCP/AWS both expose lifecycle hooks (PRE_TERMINATE /
POST_LAUNCH) so the cluster can pause a rollout while a
workload-specific drain runs. Examples:
- Consul:
consul operator raft transfer-leader before killing
the leader.
- Postgres+Patroni:
patronictl switchover --candidate ... before
draining the primary.
- etcd:
etcdctl member promote after a new joiner is in-sync.
A simple version in dstack would be: an HTTP endpoint or a script
the platform execs in the CVM before terminating it, with a
configurable timeout. Without this, the only safe "stateful rolling
update" today is to do it manually outside of Terraform.
4. auto_healing with custom health checks
If a CVM dies, today the operator notices via
/v1/agent/members going red and runs terraform apply to recreate it.
GCP MIG and k8s both reconcile automatically off a health check; the
same shape would fit here:
auto_healing_policies {
  health_check      = phala_health_check.consul_healthy.id
  initial_delay_sec = 60
}

resource "phala_health_check" "consul_healthy" {
  http {
    port         = 8500
    request_path = "/v1/health/state/passing"
    expect_200   = true
  }
  check_interval_sec  = 10
  unhealthy_threshold = 3
}
Combined with preserved_state, the recreated CVM re-attaches its
existing disk → the cluster heals without losing membership.
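Putting asks 1 and 4 together, a sketch of the end state (these are the
hypothetical blocks proposed above, not existing provider features):

resource "phala_app_instance" "consul_servers" {
  # ... app_id, ordinal as in ask 1
  preserved_state {
    disk = "data" # survives an auto-healing recreate, so the replica
                  # rejoins with its old identity and data
  }
}

resource "phala_app" "consul_servers" {
  # ... name, replicas, compose
  auto_healing_policies {
    health_check      = phala_health_check.consul_healthy.id
    initial_delay_sec = 60 # give the workload time to rejoin before re-checking
  }
}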
Why this matters
Right now, anyone running a stateful cluster on Phala Cloud has to
either:
- Build the orchestration outside of Phala (custom scripts that
call the API directly, drain workloads, then terraform apply -target=... per instance), or
- Run an in-cluster operator (Consul Operator, Patroni controller,
etc.) that wraps the platform and adds the missing primitives.
(1) is what we'll do for our experiment in the meantime; it works
but the orchestration logic ends up duplicated across every project.
(2) is the "operator pattern" but only really pays off in
Kubernetes-shaped environments.
Adding the four primitives above would let the standard cloud-style
HCL pattern (the GCP MIG one) work natively on Phala Cloud, which
seems like the right abstraction to converge on long-term.
Related
- storage_fs ForceNew bug found in the same shakedown.
- phala cvms list collapses replicas to one entry.
Both small but in the same vicinity (tooling for multi-replica apps).
Happy to chat further
We're prototyping a Consul service mesh across dstack CVMs (mesh-conn
overlay + ICE/yamux) and these would graduate it from "demo" to
"managed cluster." Glad to provide more detail / iterate on the API
shape if useful.