# Fix binary emitting of br_if with a refined value by emitting a cast (#6510)
Changes from 100 commits
**src/wasm/wasm-stack.cpp**

```cpp
@@ -65,6 +65,60 @@ void BinaryInstWriter::visitLoop(Loop* curr) {
void BinaryInstWriter::visitBreak(Break* curr) {
  o << int8_t(curr->condition ? BinaryConsts::BrIf : BinaryConsts::Br)
    << U32LEB(getBreakIndex(curr->name));

  // See comment on |brIfsNeedingHandling| for the extra casts we need to emit
  // here for certain br_ifs.
  auto iter = brIfsNeedingHandling.find(curr);
  if (iter != brIfsNeedingHandling.end()) {
    auto unrefinedType = iter->second;
    auto type = curr->type;
    assert(type.size() == unrefinedType.size());
    assert(curr->type.hasRef());

    auto emitCast = [&](Type to, Type from) {
      // Shim a tiny bit of IR, just enough to get visitRefCast to see what we
      // are casting, and to emit the proper thing.
      LocalGet get;
      get.type = from;
      RefCast cast;
      cast.type = to;
      cast.ref = &get;
      visitRefCast(&cast);
    };

    if (!type.isTuple()) {
      // Simple: just emit a cast, and then the type matches Binaryen IR's.
      emitCast(type, unrefinedType);
    } else {
      // Tuples are trickier to handle, and we need to use scratch locals.
      // Stash all the values on the stack to those locals, then reload them,
      // casting as we go.
      //
      // We must track how many scratch locals we've used from each type as we
      // go, as a type might appear multiple times in the tuple. We allocated
      // enough for each, in a contiguous range, so we just increment as we go.
      std::unordered_map<Type, Index> scratchTypeUses;
      for (Index i = 0; i < unrefinedType.size(); i++) {
        auto t = unrefinedType[unrefinedType.size() - i - 1];
        assert(scratchLocals.find(t) != scratchLocals.end());
        auto localIndex = scratchLocals[t] + scratchTypeUses[t];
        scratchTypeUses[t]++;
```
> **Member:** This avoids having to do the lookup twice. *(Suggested change.)*
>
> **Member (author):** I worry it's a little less readable that way... and I'd hope the optimizer avoids a double lookup? What do you think?
>
> **Member:** A quick test on godbolt shows that even in the simplest possible case, the optimizer is not able to figure out how to avoid the two lookups, unfortunately. I'd go with the single-lookup solution, but I agree it's probably not that important.
>
> **Member (author):** Interesting that it can't be optimized... I wonder why. I'll modify this, then. It seems like there is no other aliasing operation that can interfere, but I guess I was overly optimistic.
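The thread above is about avoiding a second hash lookup on the counter map. A generic sketch of the two patterns being compared (standalone illustrative code, not the PR's final version): `operator[]` returns a reference, so using that reference in place does the read and the increment with a single lookup.

```cpp
#include <cassert>
#include <unordered_map>

// Read the current per-key counter and bump it, with two map lookups.
int bumpTwoLookups(std::unordered_map<int, int>& m, int key) {
  int old = m[key]; // lookup #1 (default-constructs 0 if the key is absent)
  m[key]++;         // lookup #2
  return old;
}

// The same operation with a single lookup: post-increment through the
// reference that operator[] returns.
int bumpOneLookup(std::unordered_map<int, int>& m, int key) {
  return m[key]++; // one lookup; the reference is used in place
}
```

Both functions are observationally identical; the single-lookup form is what the review suggested, trading a little readability for one fewer hash computation.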
```cpp
        o << int8_t(BinaryConsts::LocalSet) << U32LEB(localIndex);
      }
      for (Index i = 0; i < unrefinedType.size(); i++) {
        auto t = unrefinedType[i];
        scratchTypeUses[t]--;
        auto localIndex = scratchLocals[t] + scratchTypeUses[t];
```
> **Member:** *(Suggested change.)*
```cpp
        o << int8_t(BinaryConsts::LocalGet) << U32LEB(localIndex);
        if (t.isRef()) {
          // Note that we cast all types here, when perhaps only some of the
          // tuple's lanes need that. This is simpler.
          emitCast(type[i], t);
        }
      }
    }
  }
}

void BinaryInstWriter::visitSwitch(Switch* curr) {
```
```cpp
@@ -2707,11 +2761,116 @@ InsertOrderedMap<Type, Index> BinaryInstWriter::countScratchLocals() {
        count = std::max(count, 1u);
      }
    }

    // As mentioned in BinaryInstWriter::visitBreak, the type of a br_if with
    // a value may be more refined in Binaryen IR than in the wasm spec, as we
    // give it the type of the value, while the spec gives it the type of the
    // block it targets. To avoid problems we must handle the case where a
    // br_if has a value, the value is more refined than the target, and the
    // value is not dropped (the last condition is very rare in real-world
    // wasm, making all of this a quite unusual situation). First, detect such
    // situations by seeing if we have br_ifs that return reference types at
    // all. We do so by counting them, and as we go we ignore ones that are
    // dropped, since a dropped value is not a problem for us.
    //
    // Note that we do not check all the conditions here, such as whether the
    // type matches the break target, or whether the parent is a cast; we
    // leave those for a more expensive analysis later, which we only run if
    // we see something suspicious here.
    Index numDangerousBrIfs = 0;

    void visitBreak(Break* curr) {
      if (curr->type.hasRef()) {
        numDangerousBrIfs++;
      }
    }

    void visitDrop(Drop* curr) {
      if (curr->value->is<Break>() && curr->value->type.hasRef()) {
        // The value is exactly a br_if of a ref, which we just visited
        // before us. Undo the ++ from there, as it can be ignored.
        assert(numDangerousBrIfs > 0);
        numDangerousBrIfs--;
      }
    }
  } finder(*this);
  finder.walk(func->body);

  if (!finder.numDangerousBrIfs || !parent.getModule()->features.hasGC()) {
    // Nothing more to do: either no such br_ifs, or GC is not enabled.
    //
    // The explicit check for GC is here because if only reference types are
    // enabled then we may still seem to need a fixup here, e.g. if a ref.func
    // is br_if'd to a block of type funcref. But that only appears that way
    // because in Binaryen IR we allow non-nullable types even without GC (and
    // if GC is not enabled then we always emit nullable types in the binary).
    // That is, even if we see a type difference without GC, it will vanish in
    // the binary format; there is never a need to add any ref.casts without
    // GC being enabled.
    return std::move(finder.scratches);
  }

  // There are dangerous-looking br_ifs, so we must do the harder work to
  // actually investigate them in detail, including tracking block types. By
  // being fully precise here, we'll only emit casts when absolutely
  // necessary, which avoids repeated roundtrips adding more and more code.
  struct RefinementScanner : public ExpressionStackWalker<RefinementScanner> {
    BinaryInstWriter& writer;
    ScratchLocalFinder& finder;

    RefinementScanner(BinaryInstWriter& writer, ScratchLocalFinder& finder)
      : writer(writer), finder(finder) {}

    void visitBreak(Break* curr) {
      // See if this is one of the dangerous br_ifs we must handle.
      if (!curr->type.hasRef()) {
        // Not even a reference.
        return;
      }
      auto* parent = getParent();
      if (parent) {
        if (parent->is<Drop>()) {
          // It is dropped anyhow.
          return;
        }
        if (auto* cast = parent->dynCast<RefCast>()) {
          if (Type::isSubType(cast->type, curr->type)) {
            // It is cast to the same type or a better one. In particular
            // this handles the case of repeated roundtripping: after the
            // first roundtrip we emit a cast that we'll identify here, and
            // not emit an additional one.
            return;
          }
        }
      }
      auto* breakTarget = findBreakTarget(curr->name);
      auto unrefinedType = breakTarget->type;
      if (unrefinedType == curr->type) {
        // It has the proper type anyhow.
        return;
      }
```
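The cheap first pass above counts "dangerous" br_ifs by bumping a counter at each br_if whose type contains a reference, then undoing the bump when the walk next visits a `Drop` whose child is that br_if; post-order traversal guarantees the child was visited immediately before its parent. A tiny standalone model of that counting trick over a flattened post-order event stream (hypothetical event names, not Binaryen API):

```cpp
#include <cassert>
#include <vector>

// Post-order event kinds for the nodes we care about.
enum class Kind { BrIfRef, BrIfNonRef, DropOfBrIfRef, Other };

// Count potentially dangerous br_ifs: bump on a br_if whose type has a
// reference, and undo the bump when the next-visited parent is a Drop of
// that br_if, since a dropped value needs no fixup.
int countDangerous(const std::vector<Kind>& postorder) {
  int n = 0;
  for (Kind k : postorder) {
    if (k == Kind::BrIfRef) {
      n++;
    } else if (k == Kind::DropOfBrIfRef) {
      n--; // the preceding br_if's value is dropped; it is harmless
    }
  }
  return n;
}
```

Only when this count is nonzero (and GC is enabled) does the writer pay for the precise `RefinementScanner` pass, which keeps the common case fast.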
> **Comment on lines +2801 to +2804**
>
> **Member:** After #6579 there will be no need to know the source type when emitting a cast, so this check will be the only remaining use for …
>
> **Member (author):** The overhead is annoying, I'm afraid, and I'd really like to not regress the by-far common case of not needing the extra work. Some data, with this patch:
>
> ```diff
> diff --git a/src/wasm/wasm-stack.cpp b/src/wasm/wasm-stack.cpp
> index 53a25d52b..1a83f56cb 100644
> --- a/src/wasm/wasm-stack.cpp
> +++ b/src/wasm/wasm-stack.cpp
> @@ -2681,7 +2681,7 @@ void BinaryInstWriter::noteLocalType(Type type, Index count) {
>  }
>
>  InsertOrderedMap<Type, Index> BinaryInstWriter::countScratchLocals() {
> -  struct ScratchLocalFinder : PostWalker<ScratchLocalFinder> {
> +  struct ScratchLocalFinder : ExpressionStackWalker<ScratchLocalFinder> {
>      BinaryInstWriter& parent;
>      InsertOrderedMap<Type, Index> scratches;
> ```
>
> That's about 5% more instructions and walltime. Note that that measures load+save, so the impact on save is around double that.
>
> **Member:** Oof, that's way too much slowdown :(
```cpp
      // Mark the br_if as needing handling, and add the type to the set of
      // types we need scratch tuple locals for (if relevant).
      writer.brIfsNeedingHandling[curr] = unrefinedType;

      if (unrefinedType.isTuple()) {
        // We must allocate enough scratch locals for this tuple. Note that
        // we may need more than one per type in the tuple, if a type appears
        // more than once, so we count their appearances.
        InsertOrderedMap<Type, Index> scratchTypeUses;
        for (auto t : unrefinedType) {
          scratchTypeUses[t]++;
        }
        for (auto& [type, uses] : scratchTypeUses) {
          auto& count = finder.scratches[type];
          count = std::max(count, uses);
        }
      }
    }
  } refinementScanner(*this, finder);
  refinementScanner.walk(func->body);

  return std::move(finder.scratches);
}
```
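The scratch-local allocation at the end of the hunk merges the needs of all dangerous br_ifs: each tuple needs as many locals of a type as that type appears in it, and across br_ifs the counts are combined with `max` because the locals are reused, not accumulated. A self-contained sketch of that merge, using `std::map` as a stand-in for Binaryen's `InsertOrderedMap` (an assumption made here for portability):

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Merge the scratch-local needs of several tuple types (types modeled as
// ints). Within a tuple, count appearances per type; across tuples, take
// the per-type maximum, since scratch locals are shared between br_ifs.
std::map<int, int>
mergeScratchNeeds(const std::vector<std::vector<int>>& tuples) {
  std::map<int, int> scratches;
  for (const auto& tuple : tuples) {
    std::map<int, int> uses;
    for (int t : tuple) {
      uses[t]++;
    }
    for (const auto& [t, n] : uses) {
      scratches[t] = std::max(scratches[t], n);
    }
  }
  return scratches;
}
```

So two br_ifs whose tuples each contain type `A` twice still only require two type-`A` scratch locals in total, not four.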
> **Member:** Since the stringview PR has now landed, there's no need to supply the `from` type anymore, so this can be simplified a bit. I guess we still need the `unrefinedType` in the `brIfsNeedingHandling` map for the scratch locals, though.
>
> **Member (author):** Yeah, `from` can go away, but we do still need to know the unrefined type, because the scratch locals are of that type.