-
-
Notifications
You must be signed in to change notification settings - Fork 33.8k
gh-116968: Reimplement Tier 2 counters #117144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 20 commits
e7e819b
8c74dfa
79036ff
54c1f8e
015bb00
1730295
95f93b7
7df0f10
925cae7
f0c7fb0
8f79a60
149e9c4
1d76112
a5ffe02
ce7726c
e2c39f2
8d22790
cd8264e
d72c2ef
f6bf194
b991843
9b9f26a
354cd81
c1de44f
1fe27c9
225ea17
63d8bc7
8aa2c75
7bd4daa
3f1c58a
8ce8068
7bb5618
63e286c
7f64392
8eee1b4
42c1f26
a80cd0a
545c60e
6c0bb30
a7c9b6d
3fee35f
df6f34c
72f6b0d
dcee362
f38d922
ef6366b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,117 @@ | ||||||
|
|
||||||
| #ifndef Py_INTERNAL_BACKOFF_H | ||||||
| #define Py_INTERNAL_BACKOFF_H | ||||||
| #ifdef __cplusplus | ||||||
| extern "C" { | ||||||
| #endif | ||||||
|
|
||||||
| #ifndef Py_BUILD_CORE | ||||||
| # error "this header requires Py_BUILD_CORE define" | ||||||
| #endif | ||||||
|
|
||||||
| #include <assert.h> | ||||||
| #include <stdbool.h> | ||||||
| #include <stdint.h> | ||||||
|
|
||||||
| /* 16-bit countdown counters using exponential backoff. | ||||||
| These are used by the adaptive specializer to count down until | ||||||
| it is time to specialize an instruction. If specialization fails | ||||||
| the counter is reset using exponential backoff. | ||||||
| Another use is for the Tier 2 optimizer to decide when to create | ||||||
| a new Tier 2 trace (executor). Again, exponential backoff is used. | ||||||
| The 16-bit counter is structured as a 12-bit unsigned 'value' | ||||||
| and a 4-bit 'backoff' field. When resetting the counter, the | ||||||
| backoff field is incremented (until it reaches a limit) and the | ||||||
| value is set to a bit mask representing the value 2**backoff - 1. | ||||||
| The maximum backoff is 12 (the number of value bits). | ||||||
| There is an exceptional value which must not be updated, 0xFFFF. | ||||||
| */ | ||||||
|
|
||||||
| typedef struct { | ||||||
| union { | ||||||
| uint16_t counter; | ||||||
| struct { | ||||||
| uint16_t value : 12; | ||||||
| uint16_t backoff : 4; | ||||||
| }; | ||||||
| }; | ||||||
| } backoff_counter_t; | ||||||
|
|
||||||
| static_assert(sizeof(backoff_counter_t) == 2, "backoff counter size should be 2 bytes"); | ||||||
|
||||||
| static_assert(sizeof(backoff_counter_t) == 2, "backoff counter size should be 2 bytes"); | |
| static_assert(sizeof(backoff_counter_t) == sizeof(_Py_CODEUNIT), "backoff counter size should be the same size as a code unit"); |
gvanrossum marked this conversation as resolved.
Show resolved
Hide resolved
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This used to restart the counter, increasing the backoff, rather than resetting it?
Maybe restart_counter_and_backoff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it increments backoff and restarts the counter at 2**(backoff). I renamed it to restart_backoff_counter. (All the APIs here have backoff_counter in their name, to distinguish them from other forms of counters.)
gvanrossum marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
gvanrossum marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid implementation details, maybe just counter_triggers?
Or merge into the "advance" which could return a bool indicating whether it has reached the threshold:
bool backoff_counter_advance(backoff_counter_t *counter)| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -448,18 +448,14 @@ write_location_entry_start(uint8_t *ptr, int code, int length) | |||||
|
|
||||||
| /** Counters | ||||||
| * The first 16-bit value in each inline cache is a counter. | ||||||
| * When counting misses, the counter is treated as a simple unsigned value. | ||||||
| * | ||||||
| * When counting executions until the next specialization attempt, | ||||||
| * exponential backoff is used to reduce the number of specialization failures. | ||||||
| * The high 12 bits store the counter, the low 4 bits store the backoff exponent. | ||||||
| * On a specialization failure, the backoff exponent is incremented and the | ||||||
| * counter set to (2**backoff - 1). | ||||||
| * Backoff == 6 -> starting counter == 63, backoff == 10 -> starting counter == 1023. | ||||||
| * See pycore_backoff.h for more details. | ||||||
| * On a specialization failure, the backoff counter is reset. | ||||||
|
||||||
| * On a specialization failure, the backoff counter is reset. | |
| * On a specialization failure, the backoff counter is restarted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe drop the adaptive_counter functions and replace them with the backoff_counter equivalents?
To do that cleanly we will probably need to change _Py_CODEUNIT to:
typedef union {
uint16_t cache;
struct {
uint8_t code;
uint8_t arg;
} op;
backoff_counter_t counter;
} _Py_CODEUNIT;There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to keep them and not expose the backoff counter structure to other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could dispense with adaptive_counter_bits(), using make_backoff_counter() directly below, but I would oppose getting rid of the cooldown() and warmup() helpers, because they are used in several/many places.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this confusing. Is there a way to do this with a single call, rather than two calls and a field read?
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -8,6 +8,7 @@ | |
|
|
||
| #include "Python.h" | ||
| #include "pycore_abstract.h" // _PyIndex_Check() | ||
| #include "pycore_backoff.h" | ||
| #include "pycore_code.h" | ||
| #include "pycore_emscripten_signal.h" // _Py_CHECK_EMSCRIPTEN_SIGNALS | ||
| #include "pycore_function.h" | ||
|
|
@@ -2348,41 +2349,35 @@ dummy_func( | |
| JUMPBY(-oparg); | ||
| #if ENABLE_SPECIALIZATION | ||
| uint16_t counter = this_instr[1].cache; | ||
| this_instr[1].cache = counter + (1 << OPTIMIZER_BITS_IN_COUNTER); | ||
| /* We are using unsigned values, but we really want signed values, so | ||
| * do the 2s complement adjustment manually */ | ||
| uint32_t offset_counter = counter ^ (1 << 15); | ||
| uint32_t threshold = tstate->interp->optimizer_backedge_threshold; | ||
| assert((threshold & OPTIMIZER_BITS_MASK) == 0); | ||
| // Use '>=' not '>' so that the optimizer/backoff bits do not effect the result. | ||
| // Double-check that the opcode isn't instrumented or something: | ||
| if (offset_counter >= threshold && this_instr->op.code == JUMP_BACKWARD) { | ||
| _Py_CODEUNIT *start = this_instr; | ||
| /* Back up over EXTENDED_ARGs so optimizer sees the whole instruction */ | ||
| while (oparg > 255) { | ||
| oparg >>= 8; | ||
| start--; | ||
| } | ||
| _PyExecutorObject *executor; | ||
| int optimized = _PyOptimizer_Optimize(frame, start, stack_pointer, &executor); | ||
| ERROR_IF(optimized < 0, error); | ||
| if (optimized) { | ||
| assert(tstate->previous_executor == NULL); | ||
| tstate->previous_executor = Py_None; | ||
| GOTO_TIER_TWO(executor); | ||
| if (ADAPTIVE_COUNTER_IS_ZERO(counter)) { | ||
|
||
| if (counter == 0) { | ||
| // Dynamically initialize the counter | ||
| counter = tstate->interp->optimizer_backedge_threshold; | ||
| this_instr[1].cache = counter; | ||
| } | ||
| else { | ||
| int backoff = this_instr[1].cache & OPTIMIZER_BITS_MASK; | ||
| backoff++; | ||
| if (backoff < MIN_TIER2_BACKOFF) { | ||
| backoff = MIN_TIER2_BACKOFF; | ||
| if (ADAPTIVE_COUNTER_IS_ZERO(counter) && this_instr->op.code == JUMP_BACKWARD) { | ||
| _Py_CODEUNIT *start = this_instr; | ||
| /* Back up over EXTENDED_ARGs so optimizer sees the whole instruction */ | ||
| while (oparg > 255) { | ||
| oparg >>= 8; | ||
| start--; | ||
| } | ||
| _PyExecutorObject *executor; | ||
| int optimized = _PyOptimizer_Optimize(frame, start, stack_pointer, &executor); | ||
| ERROR_IF(optimized < 0, error); | ||
| if (optimized) { | ||
| assert(tstate->previous_executor == NULL); | ||
| tstate->previous_executor = Py_None; | ||
| GOTO_TIER_TWO(executor); | ||
| } | ||
| else if (backoff > MAX_TIER2_BACKOFF) { | ||
| backoff = MAX_TIER2_BACKOFF; | ||
| else { | ||
| this_instr[1].cache = adaptive_counter_backoff(counter); | ||
| } | ||
| this_instr[1].cache = ((UINT16_MAX << OPTIMIZER_BITS_IN_COUNTER) << backoff) | backoff; | ||
| } | ||
| } | ||
| else { | ||
| DECREMENT_ADAPTIVE_COUNTER(this_instr[1].cache); | ||
| } | ||
| #endif /* ENABLE_SPECIALIZATION */ | ||
| } | ||
|
|
||
|
|
@@ -3973,7 +3968,12 @@ dummy_func( | |
| ERROR_IF(next_opcode < 0, error); | ||
| next_instr = this_instr; | ||
| if (_PyOpcode_Caches[next_opcode]) { | ||
| INCREMENT_ADAPTIVE_COUNTER(this_instr[1].cache); | ||
| if (next_opcode != POP_JUMP_IF_FALSE && | ||
| next_opcode != POP_JUMP_IF_TRUE && | ||
| next_opcode != POP_JUMP_IF_NOT_NONE && | ||
| next_opcode != POP_JUMP_IF_NONE) { | ||
| INCREMENT_ADAPTIVE_COUNTER(next_instr[1].cache); | ||
| } | ||
gvanrossum marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| } | ||
| assert(next_opcode > 0 && next_opcode < 256); | ||
| opcode = next_opcode; | ||
|
|
@@ -4159,34 +4159,36 @@ dummy_func( | |
| tier2 op(_COLD_EXIT, (--)) { | ||
| _PyExecutorObject *previous = (_PyExecutorObject *)tstate->previous_executor; | ||
| _PyExitData *exit = &previous->exits[oparg]; | ||
| exit->temperature++; | ||
| PyCodeObject *code = _PyFrame_GetCode(frame); | ||
| _Py_CODEUNIT *target = _PyCode_CODE(code) + exit->target; | ||
| if (exit->temperature < (int32_t)tstate->interp->optimizer_side_threshold) { | ||
| GOTO_TIER_ONE(target); | ||
| } | ||
| _PyExecutorObject *executor; | ||
| if (target->op.code == ENTER_EXECUTOR) { | ||
| executor = code->co_executors->executors[target->op.arg]; | ||
| Py_INCREF(executor); | ||
| } else { | ||
| int optimized = _PyOptimizer_Optimize(frame, target, stack_pointer, &executor); | ||
| if (optimized <= 0) { | ||
| int32_t new_temp = -1 * tstate->interp->optimizer_side_threshold; | ||
| exit->temperature = (new_temp < INT16_MIN) ? INT16_MIN : new_temp; | ||
| if (optimized < 0) { | ||
| Py_DECREF(previous); | ||
| tstate->previous_executor = Py_None; | ||
| GOTO_UNWIND(); | ||
| #if ENABLE_SPECIALIZATION | ||
gvanrossum marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
gvanrossum marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| if (ADAPTIVE_COUNTER_IS_ZERO(exit->temperature)) { | ||
| _PyExecutorObject *executor; | ||
| if (target->op.code == ENTER_EXECUTOR) { | ||
| executor = code->co_executors->executors[target->op.arg]; | ||
| Py_INCREF(executor); | ||
| } | ||
| else { | ||
| int optimized = _PyOptimizer_Optimize(frame, target, stack_pointer, &executor); | ||
| if (optimized <= 0) { | ||
| exit->temperature = adaptive_counter_backoff(exit->temperature); | ||
| if (optimized < 0) { | ||
| Py_DECREF(previous); | ||
| tstate->previous_executor = Py_None; | ||
| GOTO_UNWIND(); | ||
| } | ||
| GOTO_TIER_ONE(target); | ||
| } | ||
| GOTO_TIER_ONE(target); | ||
| } | ||
| /* We need two references. One to store in exit->executor and | ||
| * one to keep the executor alive when executing. */ | ||
| Py_INCREF(executor); | ||
| exit->executor = executor; | ||
| GOTO_TIER_TWO(executor); | ||
| } | ||
| /* We need two references. One to store in exit->executor and | ||
| * one to keep the executor alive when executing. */ | ||
| Py_INCREF(executor); | ||
| exit->executor = executor; | ||
| GOTO_TIER_TWO(executor); | ||
| DECREMENT_ADAPTIVE_COUNTER(exit->temperature); | ||
| #endif | ||
| GOTO_TIER_ONE(target); | ||
| } | ||
|
|
||
| tier2 op(_START_EXECUTOR, (executor/4 --)) { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is looks like this is just used for
is_unreachable_backoff_counter. Maybe havebackoff == 15 && value == MAXindicate that it is unreachable?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also used to get the full 16-bit value as an int. And I like that the unreachable value cannot be constructed from a (value, backoff) pair because the backoff needs to be <= 12.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I had to break down and remove the
assert(backoff <= 12)because there were corner cases where the "unreachable" value would be put back in line this. But I still feel that.counteris useful -- it is how abackoff_counter_tvalue is converted to a plainuint16_tto store in certain (more) public spots.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think that dropping the
counterhere and allowingbackoff_counter_tin more "public spots" would make the code considerably simpler.