Commit 173c2a1
Recover stuck TIs when direct terminal-state API call fails (#66574)
* Recover stuck TIs when direct terminal-state API call fails
The supervisor's _handle_request for SucceedTask, RetryTask, DeferTask,
and RescheduleTask set _terminal_state BEFORE calling the matching
client.task_instances.{succeed,retry,defer,reschedule}() API. If that
API call raised (transient network blip, server 5xx, etc.),
_terminal_state was set on the supervisor but the server never saw
the transition. The supervisor's update_task_state_if_needed then
saw final_state in STATES_SENT_DIRECTLY and short-circuited the
recovery finish() call -- leaving the TaskInstance stuck RUNNING
on the server forever, blocking downstream dependencies and
triggering false alerts.
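A minimal, self-contained sketch reproduces the failure mode described above. `FlakyClient`, `BuggySupervisor`, its `handle_succeed` method and the simplified `final_state` property are hypothetical stand-ins that only reuse names from this message. They are not the real task-sdk classes, and the real client call is `client.task_instances.succeed(...)` with its own signature.

```python
from __future__ import annotations

# Hypothetical stand-ins; names mirror the commit message, not the real Airflow code.
STATES_SENT_DIRECTLY = {"success", "up_for_retry", "deferred", "up_for_reschedule"}


class FlakyClient:
    """Fake API client whose direct succeed() call always fails."""

    def __init__(self) -> None:
        self.server_state = "running"

    def succeed(self) -> None:
        raise ConnectionError("transient network blip")

    def finish(self, state: str) -> None:
        self.server_state = state


class BuggySupervisor:
    def __init__(self, client: FlakyClient) -> None:
        self.client = client
        self._terminal_state: str | None = None

    def handle_succeed(self) -> None:
        # Bug: the local state is recorded before the server acknowledges it.
        self._terminal_state = "success"
        self.client.succeed()

    @property
    def final_state(self) -> str:
        # Simplified: the real supervisor also derives state from the exit code.
        return self._terminal_state or "failed"

    def update_task_state_if_needed(self) -> None:
        # Short-circuit: assumes the direct API call already reached the server.
        if self.final_state in STATES_SENT_DIRECTLY:
            return
        self.client.finish(self.final_state)


client = FlakyClient()
sup = BuggySupervisor(client)
try:
    sup.handle_succeed()
except ConnectionError:
    pass  # the real supervisor's catch-all turns this into an ErrorResponse
sup.update_task_state_if_needed()
print(sup.final_state, client.server_state)  # success running
```

The supervisor believes the task succeeded while the fake server still reports it as running, which is exactly the stuck-RUNNING state.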
Two-part fix:
1. Make the direct API call FIRST. Only set _terminal_state and the
new _terminal_state_synced_to_server flag after the call returns
successfully. If the API raises, both stay unset and the exception
propagates to handle_requests, where the existing catch-all sends
an ErrorResponse to the task subprocess.
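Part 1 can be sketched as follows, assuming a hypothetical `FakeClient` and a heavily simplified `Supervisor`. The `handle_requests` catch-all here only imitates the behaviour described above (exception in, error reply out); the real reply is an `ErrorResponse` message, not a dict.

```python
from __future__ import annotations


class FakeClient:
    """Hypothetical stand-in for client.task_instances; fails on demand."""

    def __init__(self, fail: bool) -> None:
        self.fail = fail
        self.server_state = "running"

    def succeed(self) -> None:
        if self.fail:
            raise ConnectionError("server returned 503")
        self.server_state = "success"


class Supervisor:
    def __init__(self, client: FakeClient) -> None:
        self.client = client
        self._terminal_state: str | None = None
        self._terminal_state_synced_to_server = False

    def _handle_succeed(self) -> None:
        # Direct API call first: if it raises, the lines below never run,
        # so neither the state nor the synced flag is recorded.
        self.client.succeed()
        self._terminal_state = "success"
        self._terminal_state_synced_to_server = True

    def handle_requests(self) -> dict[str, str]:
        # Simplified catch-all: any exception becomes an error reply
        # for the task subprocess instead of crashing the supervisor.
        try:
            self._handle_succeed()
        except Exception as exc:
            return {"type": "ErrorResponse", "detail": str(exc)}
        return {"type": "OK"}


failing = Supervisor(FakeClient(fail=True))
reply = failing.handle_requests()
print(reply["type"], failing._terminal_state, failing._terminal_state_synced_to_server)
# ErrorResponse None False

ok = Supervisor(FakeClient(fail=False))
ok.handle_requests()
print(ok._terminal_state, ok._terminal_state_synced_to_server)
# success True
```

Ordering the assignments after the call means a raised exception leaves no half-recorded state behind, so nothing later mistakes a failed transition for a completed one.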
2. Have update_task_state_if_needed always call finish() when
_terminal_state_synced_to_server is False, regardless of what
final_state happens to return. The finish() API takes the state
value, so a SUCCESS / DEFERRED / etc. transition that originally
failed is re-attempted via finish() on subprocess exit.
Pre-existing semantics for states that have no direct API call (FAILED,
UP_FOR_RETRY without a RetryTask message, etc.) are preserved -- those
still land in the same finish() branch.
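Part 2 reduces to a single rule: skip finish() only when the direct call is known to have reached the server. A minimal sketch, written as a free function with a hypothetical `RecordingClient`; in the real code this logic lives on the supervisor and reads its own fields.

```python
from __future__ import annotations


class RecordingClient:
    """Hypothetical client that records finish() calls instead of sending them."""

    def __init__(self) -> None:
        self.finish_calls: list[str] = []

    def finish(self, state: str) -> None:
        self.finish_calls.append(state)


def update_task_state_if_needed(
    client: RecordingClient, final_state: str, synced_to_server: bool
) -> None:
    # The synced flag, not the state value, decides whether to skip finish().
    if synced_to_server:
        return
    client.finish(final_state)


client = RecordingClient()
update_task_state_if_needed(client, "success", synced_to_server=False)  # direct call failed: re-attempted
update_task_state_if_needed(client, "deferred", synced_to_server=True)  # already on server: skipped
update_task_state_if_needed(client, "failed", synced_to_server=False)   # no direct API: finish() as before
print(client.finish_calls)  # ['success', 'failed']
```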
Tests added:
- _terminal_state not set when succeed() raises.
- update_task_state_if_needed calls finish() when synced flag is
False, even with final_state == SUCCESS.
- update_task_state_if_needed skips finish() when synced flag is
True (preserves the existing happy-path optimisation).
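The three test cases above can be sketched with `unittest.mock` against a hypothetical, simplified `Supervisor`. These are plain functions (runnable with or without pytest) and are not the tests added in this commit; the real tests exercise the actual supervisor.

```python
from __future__ import annotations

from unittest import mock


class Supervisor:
    """Hypothetical, heavily simplified supervisor used only for these tests."""

    def __init__(self, client) -> None:
        self.client = client
        self._terminal_state: str | None = None
        self._terminal_state_synced_to_server = False

    def handle_succeed(self) -> None:
        self.client.succeed()
        self._terminal_state = "success"
        self._terminal_state_synced_to_server = True

    def update_task_state_if_needed(self, final_state: str) -> None:
        if not self._terminal_state_synced_to_server:
            self.client.finish(final_state)


def test_terminal_state_not_set_when_succeed_raises() -> None:
    client = mock.Mock()
    client.succeed.side_effect = ConnectionError("boom")  # direct call fails
    sup = Supervisor(client)
    try:
        sup.handle_succeed()
    except ConnectionError:
        pass
    else:
        raise AssertionError("expected ConnectionError")
    assert sup._terminal_state is None


def test_finish_called_when_not_synced() -> None:
    client = mock.Mock()
    sup = Supervisor(client)
    sup.update_task_state_if_needed("success")
    client.finish.assert_called_once_with("success")


def test_finish_skipped_when_synced() -> None:
    client = mock.Mock()
    sup = Supervisor(client)
    sup.handle_succeed()  # succeeds, so the synced flag is set
    sup.update_task_state_if_needed("success")
    client.finish.assert_not_called()


for test in (
    test_terminal_state_not_set_when_succeed_raises,
    test_finish_called_when_not_synced,
    test_finish_skipped_when_synced,
):
    test()
print("all passed")
```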
Reported by the L3 ASVS sweep at apache/tooling-agents#24 (FINDING-007).
* Refactor terminal-state dispatch and parametrize tests across all 4 states
Address review feedback on #66574:
- Extract `_send_terminal_state_msg` helper so the per-msg-type dispatch
for succeed / retry / defer / reschedule lives in one place. Both
`_handle_request` and `_replay_pending_terminal_state_msg` now go
through it instead of duplicating the four-branch isinstance chain.
- Parametrize the two recovery tests over all four terminal-state
message types (previously only SucceedTask and DeferTask), adding
UP_FOR_RETRY and UP_FOR_RESCHEDULE coverage.
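A single dispatch helper with a parametrized check over all four message types might look like the sketch below. The message classes are empty dataclass stand-ins for the real pydantic models, `FakeTaskInstances` and its no-argument methods are hypothetical, and having the helper return the resulting state (so the caller records it only after the call succeeds) is an assumption of this sketch, not something the commit message states.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


# Hypothetical stand-ins for the four terminal-state message types.
@dataclass
class SucceedTask: ...


@dataclass
class RetryTask: ...


@dataclass
class DeferTask: ...


@dataclass
class RescheduleTask: ...


TerminalStateMsg = Union[SucceedTask, RetryTask, DeferTask, RescheduleTask]


class FakeTaskInstances:
    """Records which direct API method was called (hypothetical signatures)."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def succeed(self) -> None:
        self.calls.append("succeed")

    def retry(self) -> None:
        self.calls.append("retry")

    def defer(self) -> None:
        self.calls.append("defer")

    def reschedule(self) -> None:
        self.calls.append("reschedule")


def send_terminal_state_msg(api: FakeTaskInstances, msg: TerminalStateMsg) -> str:
    """One place that maps a message type to its API call and resulting state."""
    if isinstance(msg, SucceedTask):
        api.succeed()
        return "success"
    if isinstance(msg, RetryTask):
        api.retry()
        return "up_for_retry"
    if isinstance(msg, DeferTask):
        api.defer()
        return "deferred"
    if isinstance(msg, RescheduleTask):
        api.reschedule()
        return "up_for_reschedule"
    raise TypeError(f"not a terminal-state message: {msg!r}")


# Plain-loop equivalent of a pytest.mark.parametrize over the four types.
cases = [
    (SucceedTask(), "succeed", "success"),
    (RetryTask(), "retry", "up_for_retry"),
    (DeferTask(), "defer", "deferred"),
    (RescheduleTask(), "reschedule", "up_for_reschedule"),
]
for msg, expected_call, expected_state in cases:
    api = FakeTaskInstances()
    assert send_terminal_state_msg(api, msg) == expected_state
    assert api.calls == [expected_call]
print("4 cases passed")
```

Both the live request path and the replay path can call the same helper, so a fifth message type would need only one new branch.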
* Narrow _pending_terminal_state_msg type to satisfy mypy
The field was annotated as BaseModel | None, but _send_terminal_state_msg
expects SucceedTask | RetryTask | DeferTask | RescheduleTask. mypy
couldn't prove the narrowing at the _replay_pending_terminal_state_msg
call site. Tighten the field type to the exact union the setter assigns
and the consumer accepts.
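The narrowing can be illustrated with a hypothetical, simplified supervisor. With the field typed as `Optional[TerminalStateMsg]`, the `is None` check narrows `msg` to exactly the union the helper accepts. With `BaseModel | None`, mypy would only narrow to `BaseModel` and report an argument-type error. The dataclass stand-ins and the trivial `send_terminal_state_msg` body are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


# Hypothetical stand-ins for the real message models.
@dataclass
class SucceedTask: ...


@dataclass
class RetryTask: ...


@dataclass
class DeferTask: ...


@dataclass
class RescheduleTask: ...


TerminalStateMsg = Union[SucceedTask, RetryTask, DeferTask, RescheduleTask]


def send_terminal_state_msg(msg: TerminalStateMsg) -> str:
    # Placeholder body: just reports which message type was dispatched.
    return type(msg).__name__


class Supervisor:
    def __init__(self) -> None:
        # The exact union the setter assigns and the consumer accepts,
        # instead of a broad base-class type mypy cannot narrow from.
        self._pending_terminal_state_msg: Optional[TerminalStateMsg] = None

    def _replay_pending_terminal_state_msg(self) -> Optional[str]:
        msg = self._pending_terminal_state_msg
        if msg is None:
            return None
        # mypy now sees msg as TerminalStateMsg, matching the helper's parameter.
        return send_terminal_state_msg(msg)


sup = Supervisor()
sup._pending_terminal_state_msg = DeferTask()
print(sup._replay_pending_terminal_state_msg())  # DeferTask
```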
---------
Co-authored-by: vatsrahul1001 <rah.sharma11@gmail.com>
Co-authored-by: Rahul Vats <43964496+vatsrahul1001@users.noreply.github.com>

1 parent: f9faf65
2 files changed (219 additions, 24 deletions) under:
- task-sdk
  - src/airflow/sdk/execution_time
  - tests/task_sdk/execution_time
First file, diff hunks (88 additions, 24 deletions):
- @@ -1152,6 +1152,18 @@ (+12)
- @@ -1269,10 +1281,23 @@ (-4, +17)
- @@ -1281,6 +1306,58 @@ (+52)
- @@ -1452,31 +1529,20 @@ (-16, +5)
- @@ -1512,12 +1578,10 @@ (-4, +2)
Second file, diff hunks (131 additions, 0 deletions):
- @@ -3079,6 +3079,137 @@ (+131)