Remove orphaned widgets from PerseusRenderers when parsing#3327
Remove orphaned widgets from PerseusRenderers when parsing#3327benchristel wants to merge 5 commits intomainfrom
Conversation
…nderers when parsing
…haned" (i.e. not referenced in the content string) are now removed during parsing.
🗄️ Schema Change: No Changes ✅ |
|
Size Change: +343 B (+0.07%) Total Size: 487 kB
ℹ️ View Unchanged
|
🛠️ Item Splitting: Changes Detected
|
npm Snapshot: PublishedGood news!! We've packaged up the latest commit from this PR (9e70862) and published it to npm. You Example: pnpm add @khanacademy/perseus@PR3327If you are working in Khan Academy's frontend, you can run the below command. ./dev/tools/bump_perseus_version.ts -t PR3327If you are working in Khan Academy's webapp, you can run the below command. ./dev/tools/bump_perseus_version.js -t PR3327 |
There was a problem hiding this comment.
LGTM — small, well-tested refactor using established parser patterns.
Extended reasoning...
Overview
This PR adds a removeOrphanedWidgets post-processing step to the parsePerseusRenderer parser pipeline. It touches three files: a changeset, a new test file, and the parser implementation. The refactor extracts the existing parser into a named parseRenderer constant and pipes it through convert(removeOrphanedWidgets) using the existing pipeParsers infrastructure.
Security risks
None. This is a pure data-transformation change in a JSON parsing layer with no network, auth, or user-input boundary implications.
Level of scrutiny
Low scrutiny is appropriate. The change is small (under 50 lines of new logic), uses well-established patterns (pipeParsers, convert, defaulted), has two focused tests covering the keep/remove cases, and no CODEOWNERS rules apply to these files.
Other factors
The bug report (introducing explicit undefined values for content referencing non-existent widget IDs) is a valid nit for malformed input, but it does not represent a regression — the same malformed data would already cause undefined-access issues before this PR. The inline comment will surface this to the author for consideration. The test plan in the PR description appropriately calls out manual verification of editor cut/copy/paste and undo behavior.
| for (const id of referencedWidgetIds) { | ||
| referencedWidgets[id] = renderer.widgets[id]; | ||
| } |
There was a problem hiding this comment.
🟡 Nit: removeOrphanedWidgets (line 40) sets referencedWidgets[id] = renderer.widgets[id] without checking that the widget exists, which introduces explicit undefined values into the widgets map when content references a non-existent widget ID. This is subtly worse than a missing key: an explicit undefined value will appear in Object.keys()/Object.entries() iterations, potentially causing TypeErrors downstream. Adding if (renderer.widgets[id] != null) before the assignment would fix it.
Extended reasoning...
What the bug is
getWidgetIdsFromContent extracts widget IDs purely from content via regex matching (see widget-id-utils.ts), without verifying that those IDs exist in the widgets map. In removeOrphanedWidgets, line 40 unconditionally does:
referencedWidgets[id] = renderer.widgets[id];If renderer.widgets does not have a key matching id, this sets referencedWidgets[id] = undefined — an explicit property with value undefined.
Step-by-step proof
Consider malformed data where content = "[[☃ radio 1]]" but widgets = {} (radio 1 is missing from the map):
getWidgetIdsFromContent("[[☃ radio 1]]")returns["radio 1"]- The loop iterates with
id = "radio 1" renderer.widgets["radio 1"]evaluates toundefined(key not present)referencedWidgets["radio 1"] = undefined— this creates the key"radio 1"with valueundefinedObject.keys(referencedWidgets)now returns["radio 1"]- Downstream code iterating over widgets (e.g.,
Object.entries(widgets)) will encounter["radio 1", undefined] - Any access like
widget.typeorwidget.optionson the undefined value throws a TypeError
Why existing code does not prevent it
The function is newly introduced in this PR and has no guard against missing widgets. The tests only cover the case where content is empty (orphan removed) and where content references an existing widget (widget kept). There is no test for content referencing a non-existent widget.
Impact
This requires malformed input data (content referencing a widget that does not exist in the widgets map), which should be rare in well-formed Perseus data. Before this PR, the same malformed data would also cause issues (a direct lookup of a missing widget also returns undefined), so this does not introduce a wholly new failure mode — it just makes the failure slightly more insidious by adding phantom keys to the map that will be iterated over.
Fix
Add a simple null check before the assignment:
for (const id of referencedWidgetIds) {
if (renderer.widgets[id] != null) {
referencedWidgets[id] = renderer.widgets[id];
}
}This ensures only widgets that actually exist in the map are copied over, and avoids introducing phantom undefined entries.
|
@jeremywiebe I found a potential problem with this. I think the widget snowman syntax can be replaced by CrowdIn placeholder strings for just-in-place translation. This means that when we parse Perseus content in the translation editors, all the widgets will be removed. We might have to make orphaned widget removal optional. We can do that by passing an option down via the |
|
Doh, I had two tabs open and submitted my review without realizing that it was on the one where you had this Crowdin question. Feel free to re-request, if needed, once you settle that question. Sorry for jumping the gun here. :) |
Perseus renderers sometimes contain entries in the
widgetsmap thatare not referenced in the
contentstring. I'm not sure if thishappens due to a bug in the editors, or because people are manually
editing the JSON.
In any case, these unused widgets are confusing to humans and LLMs
alike. This PR removes them during parsing.
Issue: LEMS-3983
Test plan:
Verify there are no regressions in the editor related to: