Commit de91dc7

egraphs: don't let rematerialization override LICM.
This reworks the way that remat and LICM interact during aegraph elaboration. In principle, both happen during the same single-pass "code placement" algorithm: we decide where to place pure instructions (those that are eligible for movement), and remat pushes them one way while LICM pushes them the other.

The interaction is a little more subtle than simple heuristic priority, though -- it's really a decision-ordering issue. A remat'd value wants to sink as deep into the loop nest as it can (to the use's block), but we don't know *where* the uses go until we process them (and make LICM-related choices), and we process uses after defs during elaboration. Or more precisely, we have some work at the use before recursively processing the def, and some work after the recursion returns; and the LICM decision happens after the recursion returns, because LICM wants to know where the defs are to know how high we can hoist. (The recursion is itself unrolled into a state machine on an explicit stack, so that's a little hard to see, but that's what is happening in principle.)

The solution here is to make remat a separate just-in-time decision, once we have arg values. Just before we plug the final arg values into the elaborated instruction, we ask: is this a remat'd value, and if so, do we have a copy of the computation in this block yet? If not, we make one. This has to happen in two places (the main elab loop and the toplevel driver from the skeleton).

The one downside of this solution is that it doesn't handle *recursive* rematerialization by default. This means that if we, for example, decide to remat single-constant-arg adds (as we actually do in our current rules), we won't then also recursively remat the constant arg to those adds. This can be seen in the `licm.clif` test case.

This doesn't seem to be a dealbreaker to me because most such cases will be able to fold the constants anyway. (They happen mostly because of pointer pre-computations: a loop over structs in Wasm computes `heap_base + p + offset`, and naive LICM pulls a `heap_base + offset` out of the loop for every struct field accessed in the loop, with horrible register pressure resulting; that's why we have that remat rule. Most such offsets are pretty small.)

Fixes #7283.
1 parent 8d19280 commit de91dc7
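The per-block memoization of remat copies described above can be sketched in isolation. The following is a hypothetical, self-contained model -- plain integer ids stand in for Cranelift's `Block`/`Value` entities, and a fresh-value counter stands in for cloning the defining instruction -- not the actual `maybe_remat_arg` code:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for Cranelift's entity ids.
type Block = u32;
type Value = u32;

/// Per-(block, value) memoization of rematerialized copies: a block
/// that uses a remat-flagged value defined elsewhere gets at most one
/// local clone of the defining instruction.
struct RematDemo {
    remat_values: HashSet<Value>,
    remat_copies: HashMap<(Block, Value), Value>,
    next_value: Value, // stand-in for "clone the defining inst"
}

impl RematDemo {
    /// Rewrite `arg` to a block-local copy if it is flagged for remat
    /// and defined in a different block. Returns true on rewrite.
    fn maybe_remat_arg(&mut self, use_block: Block, def_block: Block, arg: &mut Value) -> bool {
        if def_block != use_block && self.remat_values.contains(arg) {
            let next = &mut self.next_value;
            let copy = *self
                .remat_copies
                .entry((use_block, *arg))
                .or_insert_with(|| {
                    // First use in this block: make a fresh copy.
                    let v = *next;
                    *next += 1;
                    v
                });
            *arg = copy;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut demo = RematDemo {
        remat_values: HashSet::from([10]),
        remat_copies: HashMap::new(),
        next_value: 100,
    };
    // Value 10 is defined in block 0 and flagged for remat.
    let mut a = 10;
    assert!(demo.maybe_remat_arg(1, 0, &mut a)); // cloned into block 1
    let mut b = 10;
    assert!(demo.maybe_remat_arg(1, 0, &mut b));
    assert_eq!(a, b); // second use in block 1 reuses the same copy
    let mut c = 10;
    assert!(demo.maybe_remat_arg(2, 0, &mut c));
    assert_ne!(a, c); // a different block gets its own copy
    let mut d = 10;
    assert!(!demo.maybe_remat_arg(0, 0, &mut d)); // same block: no rewrite
}
```

Keying the map on `(block, value)` is what makes the scheme just-in-time but not wasteful: repeated uses inside one loop body share a single local copy, while each distinct block along the loop nest can still get its own.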

7 files changed

Lines changed: 162 additions & 91 deletions

cranelift/codegen/src/egraph.rs

Lines changed: 1 addition & 1 deletion
@@ -670,7 +670,7 @@ pub(crate) struct Stats {
     pub(crate) elaborate_visit_node: u64,
     pub(crate) elaborate_memoize_hit: u64,
     pub(crate) elaborate_memoize_miss: u64,
-    pub(crate) elaborate_memoize_miss_remat: u64,
+    pub(crate) elaborate_remat: u64,
     pub(crate) elaborate_licm_hoist: u64,
     pub(crate) elaborate_func: u64,
     pub(crate) elaborate_func_pre_insts: u64,

cranelift/codegen/src/egraph/elaborate.rs

Lines changed: 101 additions & 52 deletions
@@ -5,7 +5,8 @@ use super::cost::{pure_op_cost, Cost};
 use super::domtree::DomTreeWithChildren;
 use super::Stats;
 use crate::dominator_tree::DominatorTree;
-use crate::fx::FxHashSet;
+use crate::fx::{FxHashMap, FxHashSet};
+use crate::hash_map::Entry as HashEntry;
 use crate::ir::{Block, Function, Inst, Value, ValueDef};
 use crate::loop_analysis::{Loop, LoopAnalysis, LoopLevel};
 use crate::scoped_hash_map::ScopedHashMap;
@@ -56,6 +57,8 @@ pub(crate) struct Elaborator<'a> {
     elab_result_stack: Vec<ElaboratedValue>,
     /// Explicitly-unrolled block elaboration stack.
     block_stack: Vec<BlockStackEntry>,
+    /// Copies of values that have been rematerialized.
+    remat_copies: FxHashMap<(Block, Value), Value>,
     /// Stats for various events during egraph processing, to help
     /// with optimization of this infrastructure.
     stats: &'a mut Stats,
@@ -95,7 +98,6 @@ enum ElabStackEntry {
         inst: Inst,
         result_idx: usize,
         num_args: usize,
-        remat: bool,
         before: Inst,
     },
 }
@@ -134,6 +136,7 @@ impl<'a> Elaborator<'a> {
             elab_stack: vec![],
             elab_result_stack: vec![],
             block_stack: vec![],
+            remat_copies: FxHashMap::default(),
             stats,
         }
     }
@@ -258,8 +261,48 @@ impl<'a> Elaborator<'a> {
         self.elab_result_stack.pop().unwrap()
     }
 
+    /// Possibly rematerialize the instruction producing the value in
+    /// `arg` and rewrite `arg` to refer to it, if needed. Returns
+    /// `true` if a rewrite occurred.
+    fn maybe_remat_arg(
+        remat_values: &FxHashSet<Value>,
+        func: &mut Function,
+        remat_copies: &mut FxHashMap<(Block, Value), Value>,
+        insert_block: Block,
+        before: Inst,
+        arg: &mut ElaboratedValue,
+        stats: &mut Stats,
+    ) -> bool {
+        // TODO: we may want to consider recursive rematerialization
+        // as well. We could process the arguments of the
+        // rematerialized instruction up to a certain depth. This
+        // would affect, e.g., adds-with-one-constant-arg, which are
+        // currently rematerialized. Right now we don't do this, to
+        // avoid the need for another fixpoint loop here.
+        if arg.in_block != insert_block && remat_values.contains(&arg.value) {
+            let new_value = match remat_copies.entry((insert_block, arg.value)) {
+                HashEntry::Occupied(o) => *o.get(),
+                HashEntry::Vacant(v) => {
+                    let inst = func.dfg.value_def(arg.value).inst().unwrap();
+                    debug_assert_eq!(func.dfg.inst_results(inst).len(), 1);
+                    let new_inst = func.dfg.clone_inst(inst);
+                    func.layout.insert_inst(new_inst, before);
+                    let new_result = func.dfg.inst_results(new_inst)[0];
+                    *v.insert(new_result)
+                }
+            };
+            trace!("rematerialized {} as {}", arg.value, new_value);
+            arg.value = new_value;
+            stats.elaborate_remat += 1;
+            true
+        } else {
+            false
+        }
+    }
+
     fn process_elab_stack(&mut self) {
         while let Some(entry) = self.elab_stack.last() {
+            trace!("elab_stack: {:?}", entry);
             match entry {
                 &ElabStackEntry::Start { value, before } => {
                     // We always replace the Start entry, so pop it now.
@@ -283,39 +326,17 @@ impl<'a> Elaborator<'a> {
                     // eclass.
                     trace!("looking up best value for {}", value);
                     let (_, best_value) = self.value_to_best_value[value];
-                    debug_assert_ne!(best_value, Value::reserved_value());
                     trace!("elaborate: value {} -> best {}", value, best_value);
+                    debug_assert_ne!(best_value, Value::reserved_value());
+
+                    if let Some(elab_val) = self.value_to_elaborated_value.get(&canonical_value) {
+                        // Value is available; use it.
+                        trace!("elaborate: value {} -> {:?}", value, elab_val);
+                        self.stats.elaborate_memoize_hit += 1;
+                        self.elab_result_stack.push(*elab_val);
+                        continue;
+                    }
 
-                    let remat = if let Some(elab_val) =
-                        self.value_to_elaborated_value.get(&canonical_value)
-                    {
-                        // Value is available. Look at the defined
-                        // block, and determine whether this node kind
-                        // allows rematerialization if the value comes
-                        // from another block. If so, ignore the hit
-                        // and recompute below.
-                        let remat = elab_val.in_block != self.cur_block
-                            && self.remat_values.contains(&best_value);
-                        if !remat {
-                            trace!("elaborate: value {} -> {:?}", value, elab_val);
-                            self.stats.elaborate_memoize_hit += 1;
-                            self.elab_result_stack.push(*elab_val);
-                            continue;
-                        }
-                        trace!("elaborate: value {} -> remat", canonical_value);
-                        self.stats.elaborate_memoize_miss_remat += 1;
-                        // The op is pure at this point, so it is always valid to
-                        // remove from this map.
-                        self.value_to_elaborated_value.remove(&canonical_value);
-                        true
-                    } else {
-                        // Value not available; but still look up
-                        // whether it's been flagged for remat because
-                        // this affects placement.
-                        let remat = self.remat_values.contains(&best_value);
-                        trace!(" -> not present in map; remat = {}", remat);
-                        remat
-                    };
                     self.stats.elaborate_memoize_miss += 1;
 
                     // Now resolve the value to its definition to see
@@ -363,7 +384,6 @@ impl<'a> Elaborator<'a> {
                         inst,
                         result_idx,
                         num_args,
-                        remat,
                         before,
                     });
 
@@ -380,25 +400,23 @@ impl<'a> Elaborator<'a> {
                     inst,
                     result_idx,
                     num_args,
-                    remat,
                     before,
                 } => {
                     self.elab_stack.pop();
 
                     trace!(
-                        "PendingInst: {} result {} args {} remat {} before {}",
+                        "PendingInst: {} result {} args {} before {}",
                         inst,
                         result_idx,
                         num_args,
-                        remat,
                         before
                     );
 
                     // We should have all args resolved at this
                     // point. Grab them and drain them out, removing
                     // them.
                     let arg_idx = self.elab_result_stack.len() - num_args;
-                    let arg_values = &self.elab_result_stack[arg_idx..];
+                    let arg_values = &mut self.elab_result_stack[arg_idx..];
 
                     // Compute max loop depth.
                     //
@@ -444,16 +462,15 @@ impl<'a> Elaborator<'a> {
 
                     // We know that this is a pure inst, because
                     // non-pure roots have already been placed in the
-                    // value-to-elab'd-value map and are never subject
-                    // to remat, so they will not reach this stage of
-                    // processing.
+                    // value-to-elab'd-value map, so they will not
+                    // reach this stage of processing.
                     //
                     // We now must determine the location at which we
                     // place the instruction. This is the current
                     // block *unless* we hoist above a loop when all
                     // args are loop-invariant (and this op is pure).
                     let (scope_depth, before, insert_block) =
-                        if loop_hoist_level == self.loop_stack.len() || remat {
+                        if loop_hoist_level == self.loop_stack.len() {
                             // Depends on some value at the current
                             // loop depth, or remat forces it here:
                             // place it at the current location.
@@ -486,16 +503,39 @@ impl<'a> Elaborator<'a> {
                        insert_block
                    );
 
-                    // Now we need to place `inst` at the computed
-                    // location (just before `before`). Note that
-                    // `inst` may already have been placed somewhere
-                    // else, because a pure node may be elaborated at
-                    // more than one place. In this case, we need to
-                    // duplicate the instruction (and return the
-                    // `Value`s for that duplicated instance
-                    // instead).
+                    // Now that we have the location for the
+                    // instruction, check if any of its args are remat
+                    // values. If so, and if we don't have a copy of
+                    // the rematerializing instruction for this block
+                    // yet, create one.
+                    let mut remat_arg = false;
+                    for arg_value in arg_values.iter_mut() {
+                        if Self::maybe_remat_arg(
+                            &self.remat_values,
+                            &mut self.func,
+                            &mut self.remat_copies,
+                            insert_block,
+                            before,
+                            arg_value,
+                            &mut self.stats,
+                        ) {
+                            remat_arg = true;
+                        }
+                    }
+
+                    // Now we need to place `inst` at the computed
+                    // location (just before `before`). Note that
+                    // `inst` may already have been placed somewhere
+                    // else, because a pure node may be elaborated at
+                    // more than one place. In this case, we need to
+                    // duplicate the instruction (and return the
+                    // `Value`s for that duplicated instance instead).
+                    //
+                    // Also clone if we rematerialized, because we
+                    // don't want to rewrite the args in the original
+                    // copy.
                     trace!("need inst {} before {}", inst, before);
-                    let inst = if self.func.layout.inst_block(inst).is_some() {
+                    let inst = if self.func.layout.inst_block(inst).is_some() || remat_arg {
                         // Clone the inst!
                         let new_inst = self.func.dfg.clone_inst(inst);
                         trace!(
@@ -612,7 +652,16 @@ impl<'a> Elaborator<'a> {
             // Elaborate the arg, placing any newly-inserted insts
             // before `before`. Get the updated value, which may
             // be different than the original.
-            let new_arg = self.elaborate_eclass_use(*arg, before);
+            let mut new_arg = self.elaborate_eclass_use(*arg, before);
+            Self::maybe_remat_arg(
+                &self.remat_values,
+                &mut self.func,
+                &mut self.remat_copies,
+                block,
+                inst,
+                &mut new_arg,
+                &mut self.stats,
+            );
             trace!(" -> rewrote arg to {:?}", new_arg);
             *arg = new_arg.value;
         }

cranelift/codegen/src/fx.rs

Lines changed: 0 additions & 1 deletion
@@ -14,7 +14,6 @@ use super::{HashMap, HashSet};
 use core::default::Default;
 use core::hash::{BuildHasherDefault, Hash, Hasher};
 use core::ops::BitXor;
-
 pub type FxHashMap<K, V> = HashMap<K, V, BuildHasherDefault<FxHasher>>;
 pub type FxHashSet<V> = HashSet<V, BuildHasherDefault<FxHasher>>;

cranelift/codegen/src/scoped_hash_map.rs

Lines changed: 0 additions & 13 deletions
@@ -188,19 +188,6 @@
             .checked_sub(1)
             .expect("generation_by_depth cannot be empty")
     }
-
-    /// Remote an entry.
-    pub fn remove(&mut self, key: &K) -> Option<V> {
-        self.map.remove(key).and_then(|val| {
-            let entry_generation = val.generation;
-            let entry_depth = val.level as usize;
-            if self.generation_by_depth.get(entry_depth).cloned() == Some(entry_generation) {
-                Some(val.value)
-            } else {
-                None
-            }
-        })
-    }
 }
 
 #[cfg(test)]
