runtime: cache grown goroutine stacks to reduce stack-growth cost #80137
geekswaroop wants to merge 1 commit
A dead goroutine's stack is freed in gfput unless it is exactly
startingStackSize, and gfget re-frees a stack whose size no longer
matches. So a goroutine that grew its stack returns that memory as soon
as it dies, and the next goroutine to reuse the g must grow it again
from scratch.
In production we observe stack growth (copystack) costing ~3.9% of CPU
across the fleet, comparable to GC. The cost is dominated by short-lived
RPC goroutines that grow to 16-32 KiB and die before the next GC, so
their size is never reflected in startingStackSize and each one pays to
grow again.
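To make the repeated cost concrete, here is a rough sketch that counts how many growth steps a fresh goroutine needs to reach the sizes mentioned above. It assumes the runtime doubles the stack on each growth and that stacks start at 2 KiB. Both the helper name `growthSteps` and the starting size are illustrative assumptions, not values taken from this change.

```go
package main

import "fmt"

// growthSteps counts how many doublings take a stack from start bytes to at
// least target bytes. It is a model only: each step stands for one
// copystack call in the runtime.
func growthSteps(start, target uintptr) int {
	if start == 0 {
		return 0 // avoid looping forever on a zero starting size
	}
	steps := 0
	for size := start; size < target; size *= 2 {
		steps++
	}
	return steps
}

func main() {
	const start = 2 << 10 // assumed starting stack size (2 KiB)
	for _, target := range []uintptr{16 << 10, 32 << 10} {
		fmt.Printf("%d KiB: %d growth steps\n", target>>10, growthSteps(start, target))
	}
}
```

Under these assumptions every such goroutine pays three or four stack copies before it does useful work, and pays them again each time the g is reused with a fresh small stack.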
Instead, retain a dead goroutine's grown stack on the per-P free-g list
for reuse, freeing only stacks larger than a 128 KiB cap. Ephemeral
goroutines reuse an already-grown stack, while long-lived goroutines are
still shrunk by GC. The per-P free-g list is already capped at 64
entries, so the additional retained memory is bounded.
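The change in retention policy, and the memory bound it implies, can be modelled roughly as follows. This is a simplified sketch and not the runtime code: `keepOld`, `keepNew`, `maxCachedStack` and `maxFreeGPerP` are hypothetical names, the 2 KiB starting size is an assumption, and only the per-P list is modelled.

```go
package main

import "fmt"

const (
	startingStackSize = 2 << 10   // assumed value; the runtime tunes this
	maxCachedStack    = 128 << 10 // hypothetical name for the 128 KiB cap
	maxFreeGPerP      = 64        // hypothetical name for the per-P list limit
)

// keepOld models the existing rule: a dead goroutine's stack is kept only
// when it is exactly the starting size.
func keepOld(size uintptr) bool { return size == startingStackSize }

// keepNew models the proposed rule: any stack up to the cap is kept.
func keepNew(size uintptr) bool { return size <= maxCachedStack }

// worstCasePerP is the most stack memory one P's free list can hold in
// this model: every entry holds a stack at the cap.
func worstCasePerP() uintptr { return maxFreeGPerP * maxCachedStack }

func main() {
	for _, size := range []uintptr{2 << 10, 32 << 10, 128 << 10, 1 << 20} {
		fmt.Printf("%4d KiB  old=%-5v new=%v\n", size>>10, keepOld(size), keepNew(size))
	}
	fmt.Printf("worst case per P: %d MiB\n", worstCasePerP()>>20)
}
```

With the numbers from the description, 64 entries at 128 KiB each gives at most 8 MiB of retained stack per P in this model.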
BenchmarkIssue18138 spawns short-lived goroutines that grow their stacks
and then die. Goroutines whose stacks stay under the 128 KiB cap
(depth=64) reuse a grown stack instead of regrowing; those that exceed
the cap (depth=1000, ~1 MiB) are largely unaffected:
                         │   vanilla    │              stackcache              │
                         │    sec/op    │    sec/op     vs base                │
Issue18138/depth=64-48     11.012µ ± 2%    4.309µ ± 5%  -60.87% (p=0.000 n=10)
Issue18138/depth=1000-48    62.04µ ± 2%    57.30µ ± 3%   -7.65% (p=0.000 n=10)
geomean                     26.14µ         15.71µ       -39.89%
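For readers unfamiliar with this kind of workload, the following is a minimal standalone sketch of the pattern being measured: goroutines that recurse deep enough to grow their stacks and then exit. It is not the body of BenchmarkIssue18138. The names `recurse` and `spawnAndWait` and the roughly 1 KiB frame size are assumptions chosen so that depth=64 stays well under 128 KiB and depth=1000 lands near 1 MiB.

```go
package main

import (
	"fmt"
	"sync"
)

// recurse consumes roughly depth KiB of stack by keeping a 1 KiB array
// live in every frame. The noinline directive keeps each call a real frame.
//
//go:noinline
func recurse(depth int) int {
	var pad [1024]byte
	pad[depth%len(pad)] = byte(depth)
	if depth == 0 {
		return int(pad[0])
	}
	return recurse(depth-1) + int(pad[depth%len(pad)])
}

// spawnAndWait runs n short-lived goroutines one after another, each of
// which grows its stack to the given depth and then dies. It returns the
// number of goroutines that finished.
func spawnAndWait(n, depth int) int {
	finished := 0
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			recurse(depth)
		}()
		wg.Wait() // one at a time, so a dead g is available for reuse
		finished++
	}
	return finished
}

func main() {
	fmt.Println("depth=64:", spawnAndWait(1000, 64), "goroutines finished")
	fmt.Println("depth=1000:", spawnAndWait(1000, 1000), "goroutines finished")
}
```

The sketch only reproduces the spawn, grow, die pattern. It measures nothing by itself, and the figures above come from the real benchmark, not from this code.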
Updates golang#77893
This PR (HEAD: 9aa2106) has been imported to Gerrit for code review. Please visit Gerrit at https://go-review.googlesource.com/c/go/+/793780.
Message from Gopher Robot: Patch Set 1: Congratulations on opening your first change. Thank you for your contribution! Most changes in the Go project go through a few rounds of revision. During May-July and Nov-Jan the Go project is in a code freeze. Please don’t reply on this GitHub thread. Visit golang.org/cl/793780.
Message from Jorropo: Patch Set 1: (1 comment)
Message from Jorropo: Patch Set 1: Commit-Queue+1 (1 comment)
Message from golang-scoped@luci-project-accounts.iam.gserviceaccount.com: Patch Set 1: Dry run: CV is trying the patch. Bot data: {"action":"start","triggered_at":"2026-06-24T22:06:47Z","revision":"442d3d32d9c9502d4e10e26365ad56634cfbac4f"}
Message from Jorropo: Patch Set 1: -Commit-Queue (Performed by <GERRIT_ACCOUNT_60063> on behalf of <GERRIT_ACCOUNT_55763>)
Message from golang-scoped@luci-project-accounts.iam.gserviceaccount.com: Patch Set 1: This CL has passed the run
Message from golang-scoped@luci-project-accounts.iam.gserviceaccount.com: Patch Set 1: LUCI-TryBot-Result+1