
Commit 8ee959d

guide: add user journey memory tutorial (#4728)
* guide: add user journey memory docs
* guide: fix lint
* fix: br tag
* guide: apply review suggestions
1 parent 4385183 commit 8ee959d

16 files changed: +585 -0 lines changed
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
---
title: Diagnostics Guide
layout: docs.hbs
---

# Diagnostics Guide

These guides were created in the [Diagnostics Working Group](https://github.com/nodejs/diagnostics)
with the objective of providing guidance when diagnosing an issue
in a user application.
The documentation is organized by user journey: each journey is
a coherent set of step-by-step procedures that a user can follow
to determine the root cause of a reported issue.

This is the available set of diagnostics guides:

* [Memory](/en/docs/guides/diagnostics/memory)
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
---
title: Memory Diagnostics
layout: docs.hbs
---

# Memory

In this document you can learn how to debug memory-related issues.

* [Memory](#memory)
  * [My process runs out of memory](#my-process-runs-out-of-memory)
    * [Symptoms](#symptoms)
    * [Side Effects](#side-effects)
    * [Debugging](#debugging)
  * [My process utilizes memory inefficiently](#my-process-utilizes-memory-inefficiently)
    * [Symptoms](#symptoms-1)
    * [Side Effects](#side-effects-1)
    * [Debugging](#debugging-1)

## My process runs out of memory

Node.js _(JavaScript)_ is a garbage-collected language, so memory leaks are
possible through retainers: objects that are still referenced somewhere and
therefore cannot be collected. As Node.js applications are usually
multi-tenant, business critical, and long-running, providing an accessible and
efficient way of finding a memory leak is essential.

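For illustration only _(this server and its numbers are hypothetical, not part
of the guide)_, here is a minimal sketch of a leak caused by a retainer: a
module-level array keeps every request-related object reachable, so the
garbage collector can never free them.

```js
const http = require('http');

// Module-level retainer: everything pushed here stays reachable forever.
const requests = [];

http.createServer((req, res) => {
  // Each request grows the retained set a little more, so heap usage
  // keeps climbing until the process runs out of memory.
  requests.push({ url: req.url, receivedAt: Date.now(), payload: Buffer.alloc(10 * 1024) });
  res.end('ok');
}).listen(3000);
```
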
### Symptoms

The user observes continuously increasing memory usage _(it can be fast or slow,
over days or even weeks)_, then sees the process crash and get restarted by the
process manager. The process may be running slower than before, and the
restarts cause certain requests to fail _(the load balancer responds with 502)_.

### Side Effects

* Process restarts due to memory exhaustion, and requests are dropped on the
  floor
* Increased GC activity leads to higher CPU usage and slower response times
* GC blocking the event loop, causing slowness
* Increased memory swapping slows down the process (GC activity)
* There may not be enough available memory to get a Heap Snapshot

### Debugging

To debug a memory issue we need to be able to see how much space our specific
types of objects take, and which variables retain them and prevent them from
being garbage collected. For effective debugging we also need to know the
allocation pattern of our variables over time.

* [Using Heap Profiler](/en/docs/guides/diagnostics/memory/using-heap-profiler/)
* [Using Heap Snapshot](/en/docs/guides/diagnostics/memory/using-heap-snapshot/)
* [GC Traces](/en/docs/guides/diagnostics/memory/using-gc-traces)

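Before reaching for the tools above, a lightweight first check can be to log
memory usage from inside the process. The sketch below is only an illustration
_(the 10-second interval is an arbitrary choice)_, using `process.memoryUsage()`
to make a steadily growing heap visible:

```js
// Log memory usage periodically so a steadily growing heapUsed/rss can be
// spotted. unref() keeps this timer from holding the process open on its own.
const toMB = (bytes) => (bytes / 1024 / 1024).toFixed(1);

setInterval(() => {
  const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
  console.log(
    `rss=${toMB(rss)}MB heapTotal=${toMB(heapTotal)}MB ` +
      `heapUsed=${toMB(heapUsed)}MB external=${toMB(external)}MB`
  );
}, 10_000).unref();
```
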
## My process utilizes memory inefficiently

### Symptoms

The application uses an unexpected amount of memory and/or we observe elevated
garbage collector activity.

### Side Effects

* An elevated number of page faults
* Higher GC activity and CPU usage

### Debugging

To debug a memory issue we need to be able to see how much space our specific
types of objects take, and which variables retain them and prevent them from
being garbage collected. For effective debugging we also need to know the
allocation pattern of our variables over time.

* [Using Heap Profiler](/en/docs/guides/diagnostics/memory/using-heap-profiler/)
* [Using Heap Snapshot](/en/docs/guides/diagnostics/memory/using-heap-snapshot/)
* [GC Traces](/en/docs/guides/diagnostics/memory/using-gc-traces)
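
As a quick first look at where heap memory is going before using the tools
above, you can also ask V8 for its own per-space accounting. This is only a
sketch, not a required step:

```js
const v8 = require('v8');

// Print how much of each V8 heap space (new space, old space, code space, ...)
// is currently used. A constantly growing old space is a common sign of a leak
// or of inefficient memory usage.
for (const space of v8.getHeapSpaceStatistics()) {
  const usedMB = (space.space_used_size / 1024 / 1024).toFixed(1);
  const sizeMB = (space.space_size / 1024 / 1024).toFixed(1);
  console.log(`${space.space_name}: ${usedMB} MB used of ${sizeMB} MB`);
}
```
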
Lines changed: 216 additions & 0 deletions
@@ -0,0 +1,216 @@
---
title: Memory Diagnostics - Using GC Trace
layout: docs.hbs
---

# Tracing garbage collection

There's a lot to learn about how the garbage collector works, but if you learn
one thing it's that when GC is running, your code is not.

You may want to know how often and for how long garbage collection runs,
and what the outcome is.

## Running with garbage collection traces

You can see traces for garbage collection in the console output of your process
using the `--trace_gc` flag.

```console
$ node --trace_gc app.js
```

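If you want to keep the traces for later analysis, you can redirect the output
to a file _(the file name here is only an example)_:

```console
$ node --trace_gc app.js > trace_gc.log
```
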
You might want to avoid collecting traces for the entire lifetime of your
process running on a server. In that case, set the flag from within the process,
and switch it off once the need for tracing is over.

Here's how to print GC events to stdout for one minute.

```js
const v8 = require('v8');
// Turn GC tracing on at runtime, then switch it off again after one minute (60e3 ms).
v8.setFlagsFromString('--trace_gc');
setTimeout(() => { v8.setFlagsFromString('--notrace_gc'); }, 60e3);
```

### Examining a trace with `--trace_gc`

The obtained garbage collection traces look like the following lines.

```
[19278:0x5408db0] 44 ms: Scavenge 2.3 (3.0) -> 1.9 (4.0) MB, 1.2 / 0.0 ms (average mu = 1.000, current mu = 1.000) allocation failure

[23521:0x10268b000] 120 ms: Mark-sweep 100.7 (122.7) -> 100.6 (122.7) MB, 0.15 / 0.0 ms (average mu = 0.132, current mu = 0.137) deserialize GC in old space requested
```

This is how to interpret the trace data (for the second line):

<table>
  <tr>
    <th>Token value</th>
    <th>Interpretation</th>
  </tr>
  <tr>
    <td>23521</td>
    <td>PID of the running process</td>
  </tr>
  <tr>
    <td>0x10268b000</td>
    <td>Isolate (JS heap instance)</td>
  </tr>
  <tr>
    <td>120</td>
    <td>Time since the process start in ms</td>
  </tr>
  <tr>
    <td>Mark-sweep</td>
    <td>Type / Phase of GC</td>
  </tr>
  <tr>
    <td>100.7</td>
    <td>Heap used before GC in MB</td>
  </tr>
  <tr>
    <td>122.7</td>
    <td>Total heap before GC in MB</td>
  </tr>
  <tr>
    <td>100.6</td>
    <td>Heap used after GC in MB</td>
  </tr>
  <tr>
    <td>122.7</td>
    <td>Total heap after GC in MB</td>
  </tr>
  <tr>
    <td>0.15 / 0.0 <br/>
    (average mu = 0.132, current mu = 0.137)</td>
    <td>Time spent in GC in ms</td>
  </tr>
  <tr>
    <td>deserialize GC in old space requested</td>
    <td>Reason for GC</td>
  </tr>
</table>

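To get a rough summary of a longer trace without reading it line by line, you
can count the GC types in a saved log. This is just one possible approach; the
exact labels depend on the V8 version, so search for whatever appears in your
own traces (here, the labels from the sample above):

```console
$ node --trace_gc app.js > gc.log
$ grep -c "Scavenge" gc.log
$ grep -c "Mark-sweep" gc.log
```
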
## Using performance hooks to trace garbage collection

In Node.js, you can use [performance hooks][] to trace
garbage collection.

```js
const { PerformanceObserver } = require('perf_hooks');

// Create a performance observer
const obs = new PerformanceObserver((list) => {
  const entry = list.getEntries()[0];
  /*
  The entry is an instance of PerformanceEntry containing
  metrics of a single garbage collection.
  For example:
  PerformanceEntry {
    name: 'gc',
    entryType: 'gc',
    startTime: 2820.567669,
    duration: 1.315709,
    kind: 1
  }
  */
});

// Subscribe to notifications of GCs
obs.observe({ entryTypes: ['gc'] });

// Stop the subscription when it is no longer needed
obs.disconnect();
```

### Examining a trace with performance hooks

You can get GC statistics as a [PerformanceEntry][] from the callback of a
[PerformanceObserver][].

For example:

```ts
PerformanceEntry {
  name: 'gc',
  entryType: 'gc',
  startTime: 2820.567669,
  duration: 1.315709,
  kind: 1
}
```

<table>
  <tr>
    <th>Property</th>
    <th>Interpretation</th>
  </tr>
  <tr>
    <td>name</td>
    <td>The name of the performance entry.</td>
  </tr>
  <tr>
    <td>entryType</td>
    <td>The type of the performance entry.</td>
  </tr>
  <tr>
    <td>startTime</td>
    <td>The high resolution millisecond timestamp <br/> marking the starting time of the performance entry.</td>
  </tr>
  <tr>
    <td>duration</td>
    <td>The total number of milliseconds elapsed for this entry.</td>
  </tr>
  <tr>
    <td>kind</td>
    <td>The type of garbage collection operation that occurred.</td>
  </tr>
  <tr>
    <td>flags</td>
    <td>Additional information about the garbage collection operation.</td>
  </tr>
</table>

For more information, you can refer to
[the documentation about performance hooks][performance hooks].

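The numeric `kind` can be translated into a readable label using the constants
exported by `perf_hooks`. The sketch below is only an illustration; depending
on the Node.js version, the value is exposed as `entry.kind` or as
`entry.detail.kind`:

```js
const { PerformanceObserver, constants } = require('perf_hooks');

// Map the numeric GC kind to a human-readable label.
const gcKinds = {
  [constants.NODE_PERFORMANCE_GC_MINOR]: 'minor (scavenge)',
  [constants.NODE_PERFORMANCE_GC_MAJOR]: 'major (mark-sweep/compact)',
  [constants.NODE_PERFORMANCE_GC_INCREMENTAL]: 'incremental marking',
  [constants.NODE_PERFORMANCE_GC_WEAKCB]: 'weak callback processing',
};

const obs = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    // Newer Node.js versions expose the kind on entry.detail instead.
    const kind = entry.detail ? entry.detail.kind : entry.kind;
    console.log(`${gcKinds[kind] || kind}: ${entry.duration.toFixed(2)} ms`);
  }
});
obs.observe({ entryTypes: ['gc'] });
```
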
## Examples of diagnosing memory issues with the trace option

A. How to get context of bad allocations

1. Suppose we observe that the old space is continuously increasing.
2. Due to heavy GC, the heap maximum is not hit, but the process is slow.
3. Review the trace data and figure out how much the total heap is before and
   after the GC.
4. Reduce `--max-old-space-size` such that the total heap is closer to the
   limit _(see the example command after this list)_.
5. Allow the program to run until it hits the out-of-memory error.
6. The produced log then shows the failing context.

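For example _(the 100 MB limit is only a placeholder; choose a value just above
the total heap size seen in your traces)_:

```console
$ node --trace_gc --max-old-space-size=100 app.js
```
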
B. How to assert whether there is a memory leak when heap growth is observed

1. Suppose we observe that the old space is continuously increasing.
2. Due to heavy GC, the heap maximum is not hit, but the process is slow.
3. Review the trace data and figure out how much the total heap is before and
   after the GC.
4. Reduce `--max-old-space-size` such that the total heap is closer to the
   limit.
5. Allow the program to run and see whether it hits the out-of-memory error.
6. If it hits OOM, increment the heap size by ~10% or so and repeat a few times.
   If the same pattern is observed, it is indicative of a memory leak.
7. If there is no OOM, then freeze the heap size to that value - a packed heap
   reduces memory footprint and compaction latency.

C. How to assert whether too many GCs are happening or too many GCs are causing
   an overhead

1. Review the trace data, specifically the time between consecutive GCs.
2. Review the trace data, specifically the time spent in GC
   _(a sketch for measuring this follows the list)_.
3. If the time between two GCs is less than the time spent in GC, the
   application is severely starved (it spends most of its time in GC).
4. If the time between two GCs and the time spent in GC are both very high,
   the application can probably use a smaller heap.
5. If the time between two GCs is much greater than the time spent in GC, the
   application is relatively healthy.

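To put rough numbers on these points without eyeballing individual trace lines,
you can measure the share of wall-clock time spent in GC using the performance
hooks shown earlier. This is a minimal sketch with an arbitrary 10-second
reporting interval:

```js
const { PerformanceObserver, performance } = require('perf_hooks');

// Accumulate GC durations and periodically report the share of wall-clock
// time spent in GC. A share approaching 100% means the application is starving.
let gcTime = 0;
const obs = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) gcTime += entry.duration;
});
obs.observe({ entryTypes: ['gc'] });

let lastCheck = performance.now();
setInterval(() => {
  const now = performance.now();
  const share = (gcTime / (now - lastCheck)) * 100;
  console.log(`~${share.toFixed(1)}% of the last interval was spent in GC`);
  gcTime = 0;
  lastCheck = now;
}, 10_000).unref();
```
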
[performance hooks]: https://nodejs.org/api/perf_hooks.html
[PerformanceEntry]: https://nodejs.org/api/perf_hooks.html#perf_hooks_class_performanceentry
[PerformanceObserver]: https://nodejs.org/api/perf_hooks.html#perf_hooks_class_performanceobserver
