Skip to content

Remove orphaned widgets from PerseusRenderers when parsing#3327

Open
benchristel wants to merge 5 commits intomainfrom
benc/remove-orphaned-widgets
Open

Remove orphaned widgets from PerseusRenderers when parsing#3327
benchristel wants to merge 5 commits intomainfrom
benc/remove-orphaned-widgets

Conversation

@benchristel
Copy link
Member

@benchristel benchristel commented Mar 11, 2026

Perseus renderers sometimes contain entries in the widgets map that
are not referenced in the content string. I'm not sure if this
happens due to a bug in the editors, or because people are manually
editing the JSON.

In any case, these unused widgets are confusing to humans and LLMs
alike. This PR removes them during parsing.

Issue: LEMS-3983

Test plan:

Verify there are no regressions in the editor related to:

  • cut/copy and paste of content including widgets
  • undoing (with ctrl+Z) a change that deletes or adds a widget

@github-actions
Copy link
Contributor

github-actions bot commented Mar 11, 2026

🗄️ Schema Change: No Changes ✅

@github-actions
Copy link
Contributor

github-actions bot commented Mar 11, 2026

Size Change: +343 B (+0.07%)

Total Size: 487 kB

Filename Size Change
packages/perseus-core/dist/es/index.item-splitting.js 12.1 kB +231 B (+1.95%)
packages/perseus-core/dist/es/index.js 25 kB +112 B (+0.45%)
ℹ️ View Unchanged
Filename Size
packages/kas/dist/es/index.js 20.8 kB
packages/keypad-context/dist/es/index.js 1 kB
packages/kmath/dist/es/index.js 5.96 kB
packages/math-input/dist/es/index.js 98.5 kB
packages/math-input/dist/es/strings.js 1.61 kB
packages/perseus-editor/dist/es/index.js 100 kB
packages/perseus-linter/dist/es/index.js 8.82 kB
packages/perseus-score/dist/es/index.js 9.26 kB
packages/perseus-utils/dist/es/index.js 403 B
packages/perseus/dist/es/index.js 187 kB
packages/perseus/dist/es/strings.js 7.47 kB
packages/pure-markdown/dist/es/index.js 1.39 kB
packages/simple-markdown/dist/es/index.js 6.71 kB

compressed-size-action

@github-actions
Copy link
Contributor

github-actions bot commented Mar 11, 2026

🛠️ Item Splitting: Changes Detected ⚠️

Usually this means you need to update the Go parser
that Content Platform maintains!!!

See this list of post-mortems for more information.

This PR contains critical changes to Perseus. Please review
the changes and note that you may need to coordinate
deployment of these changes with other teams at Khan Academy.

diff --unified /home/runner/work/_temp/branch-compare/base/index.item-splitting.js /home/runner/work/_temp/branch-compare/pr/index.item-splitting.js
--- /home/runner/work/_temp/branch-compare/base/index.item-splitting.js	2026-03-11 17:57:56.676958246 +0000
+++ /home/runner/work/_temp/branch-compare/pr/index.item-splitting.js	2026-03-11 17:57:32.427502485 +0000
@@ -46,6 +46,10 @@
 
 function union(parseBranch){return new UnionBuilder(parseBranch)}class UnionBuilder{or(newBranch){return new UnionBuilder(either(this.parser,newBranch))}constructor(parser){this.parser=parser;}}function either(parseA,parseB){return (rawValue,ctx)=>{const resultA=parseA(rawValue,ctx);if(isSuccess(resultA)){return resultA}return parseB(rawValue,ctx)}}
 
+function getWidgetRegex(){return /\[\[☃ ([A-Za-z0-9- ]+)\]\]/g}function getWidgetIdsFromContent(content){const widgets=[];const localWidgetRegex=getWidgetRegex();let match=localWidgetRegex.exec(content);while(match!==null){widgets.push(match[1]);match=localWidgetRegex.exec(content);}return widgets}
+
+function convert(f){return (rawValue,ctx)=>ctx.success(f(rawValue))}
+
 const parsePerseusImageDetail=object({width:number,height:number});
 
 const parseImages=defaulted(record(string,parsePerseusImageDetail),()=>({}));
@@ -66,8 +70,6 @@
 
 function deriveExtraKeys(widgetOptions){if(widgetOptions.extraKeys){return widgetOptions.extraKeys}const defaultKeys=["PI"];if(widgetOptions.answerForms==null){return defaultKeys}const uniqueExtraVariables={};const uniqueExtraConstants={};for(const answerForm of widgetOptions.answerForms){const maybeExpr=KAS.parse(answerForm.value,widgetOptions);if(maybeExpr.parsed){const expr=maybeExpr.expr;const isGreek=symbol=>symbol==="pi"||symbol==="theta";const toKey=symbol=>isGreek(symbol)?symbol.toUpperCase():symbol;const isKey=key=>KeypadKeys.includes(key);for(const variable of expr.getVars()){const maybeKey=toKey(variable);if(isKey(maybeKey)){uniqueExtraVariables[maybeKey]=true;}}for(const constant of expr.getConsts()){const maybeKey=toKey(constant);if(isKey(maybeKey)){uniqueExtraConstants[maybeKey]=true;}}}}const extraVariables=Object.keys(uniqueExtraVariables).sort();const extraConstants=Object.keys(uniqueExtraConstants).sort();const extraKeys=[...extraVariables,...extraConstants];if(!extraKeys.length){return defaultKeys}return extraKeys}
 
-function convert(f){return (rawValue,ctx)=>ctx.success(f(rawValue))}
-
 const parseLegacyButtonSet=enumeration("basic","basic+div","trig","prealgebra","logarithms","basic relations","advanced relations","scientific");const parseLegacyButtonSets=defaulted(array(parseLegacyButtonSet),()=>["basic","trig","prealgebra","logarithms"]);
 
 function versionedWidgetOptions(latestMajorVersion,parseLatest){return new VersionedWidgetOptionsParserBuilder(latestMajorVersion,parseLatest,latest=>latest,(raw,ctx)=>ctx.failure("widget options with a known version number",raw))}class VersionedWidgetOptionsParserBuilder{withMigrationFrom(majorVersion,parseOldVersion,migrateToNextVersion){const parseOtherVersions=this.parser;const migrateToLatest=old=>this.migrateToLatest(migrateToNextVersion(old));return new VersionedWidgetOptionsParserBuilder(majorVersion,parseOldVersion,migrateToLatest,parseOtherVersions)}constructor(majorVersion,parseThisVersion,migrateToLatest,parseOtherVersions){this.migrateToLatest=migrateToLatest;this.parseOtherVersions=parseOtherVersions;const parseThisVersionAndMigrateToLatest=pipeParsers(parseThisVersion).then(convert(this.migrateToLatest)).parser;this.parser=(raw,ctx)=>{if(!isPlainObject(raw)){return ctx.failure("object",raw)}const versionParseResult=parseVersionedObject(raw,ctx);if(isFailure(versionParseResult)){return versionParseResult}if(versionParseResult.value.version.major!==majorVersion){return this.parseOtherVersions(raw,ctx)}return parseThisVersionAndMigrateToLatest(raw,ctx)};}}const parseVersionedObject=object({version:defaulted(object({major:number,minor:number}),()=>({major:0,minor:0}))});
@@ -116,7 +118,7 @@
 
 const parseMathFormat=enumeration("integer","mixed","improper","proper","decimal","percent","pi");const parseSimplify=pipeParsers(union(constant(null)).or(constant(undefined)).or(boolean).or(constant("true")).or(constant("required")).or(constant("correct")).or(constant("enforced")).or(constant("optional")).or(constant("accepted")).parser).then(convert(deprecatedSimplifyValuesToRequired)).parser;function deprecatedSimplifyValuesToRequired(simplify){switch(simplify){case "enforced":case "required":case "optional":return simplify;default:return "required"}}const parseNumericInputWidget=parseWidget(constant("numeric-input"),object({answers:array(object({message:defaulted(string,()=>""),value:optional(nullable(number)),status:string,answerForms:defaulted(optional(array(parseMathFormat)),()=>undefined),strict:defaulted(boolean,()=>false),maxError:optional(nullable(number)),simplify:parseSimplify})),labelText:optional(string),size:string,coefficient:defaulted(boolean,()=>false),rightAlign:optional(boolean),static:defaulted(boolean,()=>false),answerForms:optional(array(object({name:parseMathFormat,simplify:parseSimplify})))}));
 
-function parseRenderer(rawValue,ctx){return parsePerseusRenderer(rawValue,ctx)}const largeToAuto=(height,ctx)=>{if(height==="large"){return ctx.success("auto")}return ctx.success(height)};const parseOrdererWidget=parseWidget(constant("orderer"),object({options:defaulted(array(parseRenderer),()=>[]),correctOptions:defaulted(array(parseRenderer),()=>[]),otherOptions:defaulted(array(parseRenderer),()=>[]),height:pipeParsers(enumeration("normal","auto","large")).then(largeToAuto).parser,layout:defaulted(enumeration("horizontal","vertical"),()=>"horizontal")}));
+function parseRenderer$1(rawValue,ctx){return parsePerseusRenderer(rawValue,ctx)}const largeToAuto=(height,ctx)=>{if(height==="large"){return ctx.success("auto")}return ctx.success(height)};const parseOrdererWidget=parseWidget(constant("orderer"),object({options:defaulted(array(parseRenderer$1),()=>[]),correctOptions:defaulted(array(parseRenderer$1),()=>[]),otherOptions:defaulted(array(parseRenderer$1),()=>[]),height:pipeParsers(enumeration("normal","auto","large")).then(largeToAuto).parser,layout:defaulted(enumeration("horizontal","vertical"),()=>"horizontal")}));
 
 const parsePhetSimulationWidget=parseWidget(constant("phet-simulation"),object({url:string,description:string}));
 
@@ -138,7 +140,7 @@
 
 const parseWidgetsMap=(rawValue,ctx)=>{if(!isPlainObject(rawValue)){return ctx.failure("PerseusWidgetsMap",rawValue)}const widgetsMap={};for(const key of Object.keys(rawValue)){const entryResult=parseWidgetsMapEntry([key,rawValue[key]],widgetsMap,ctx.forSubtree(key));if(isFailure(entryResult)){return entryResult}}return ctx.success(widgetsMap)};const parseWidgetsMapEntry=([id,widget],widgetMap,ctx)=>{const idComponentsResult=parseWidgetIdComponents(id.split(" "),ctx.forSubtree("(widget ID)"));if(isFailure(idComponentsResult)){return idComponentsResult}const[type,n]=idComponentsResult.value;function parseAndAssign(key,parse){const widgetResult=parse(widget,ctx);if(isFailure(widgetResult)){return widgetResult}widgetMap[key]=widgetResult.value;return ctx.success(undefined)}switch(type){case "categorizer":return parseAndAssign(`categorizer ${n}`,parseCategorizerWidget);case "cs-program":return parseAndAssign(`cs-program ${n}`,parseCSProgramWidget);case "definition":return parseAndAssign(`definition ${n}`,parseDefinitionWidget);case "dropdown":return parseAndAssign(`dropdown ${n}`,parseDropdownWidget);case "explanation":return parseAndAssign(`explanation ${n}`,parseExplanationWidget);case "expression":return parseAndAssign(`expression ${n}`,parseExpressionWidget);case "free-response":return parseAndAssign(`free-response ${n}`,parseFreeResponseWidget);case "grapher":return parseAndAssign(`grapher ${n}`,parseGrapherWidget);case "group":return parseAndAssign(`group ${n}`,parseGroupWidget);case "graded-group":return parseAndAssign(`graded-group ${n}`,parseGradedGroupWidget);case "graded-group-set":return parseAndAssign(`graded-group-set ${n}`,parseGradedGroupSetWidget);case "iframe":return parseAndAssign(`iframe ${n}`,parseIframeWidget);case "image":return parseAndAssign(`image ${n}`,parseImageWidget);case "input-number":return parseAndAssign(`input-number ${n}`,parseInputNumberWidget);case "interaction":return parseAndAssign(`interaction ${n}`,parseInteractionWidget);case "interactive-graph":return parseAndAssign(`interactive-graph ${n}`,parseInteractiveGraphWidget);case "label-image":return parseAndAssign(`label-image ${n}`,parseLabelImageWidget);case "matcher":return parseAndAssign(`matcher ${n}`,parseMatcherWidget);case "matrix":return parseAndAssign(`matrix ${n}`,parseMatrixWidget);case "measurer":return parseAndAssign(`measurer ${n}`,parseMeasurerWidget);case "molecule-renderer":return parseAndAssign(`molecule-renderer ${n}`,parseMoleculeRendererWidget);case "number-line":return parseAndAssign(`number-line ${n}`,parseNumberLineWidget);case "numeric-input":return parseAndAssign(`numeric-input ${n}`,parseNumericInputWidget);case "orderer":return parseAndAssign(`orderer ${n}`,parseOrdererWidget);case "phet-simulation":return parseAndAssign(`phet-simulation ${n}`,parsePhetSimulationWidget);case "plotter":return parseAndAssign(`plotter ${n}`,parsePlotterWidget);case "python-program":return parseAndAssign(`python-program ${n}`,parsePythonProgramWidget);case "radio":return parseAndAssign(`radio ${n}`,parseRadioWidget);case "sorter":return parseAndAssign(`sorter ${n}`,parseSorterWidget);case "table":return parseAndAssign(`table ${n}`,parseTableWidget);case "video":return parseAndAssign(`video ${n}`,parseVideoWidget);case "sequence":return parseAndAssign(`sequence ${n}`,parseDeprecatedWidget);case "lights-puzzle":return parseAndAssign(`lights-puzzle ${n}`,parseDeprecatedWidget);case "simulator":return parseAndAssign(`simulator ${n}`,parseDeprecatedWidget);case "transformer":return parseAndAssign(`transformer ${n}`,parseDeprecatedWidget);case "passage":return parseAndAssign(`passage ${n}`,parseDeprecatedWidget);case "passage-ref":return parseAndAssign(`passage-ref ${n}`,parseDeprecatedWidget);case "passage-ref-target":return parseAndAssign(`passage-ref-target ${n}`,parseDeprecatedWidget);default:return parseAndAssign(`${type} ${n}`,parseWidget(constant(type),any))}};const parseDeprecatedWidget=parseWidget((_,ctx)=>ctx.success("deprecated-standin"),looseObject({}));
 
-const parsePerseusRenderer=defaulted(object({content:defaulted(string,()=>""),widgets:defaulted((rawVal,ctx)=>parseWidgetsMap(rawVal,ctx),()=>({})),images:parseImages,metadata:any}),()=>({content:"",widgets:{},images:{}}));
+const parseRenderer=object({content:defaulted(string,()=>""),widgets:defaulted((rawVal,ctx)=>parseWidgetsMap(rawVal,ctx),()=>({})),images:parseImages,metadata:any});const createDefaultRenderer=()=>({content:"",widgets:{},images:{}});function removeOrphanedWidgets(renderer){const referencedWidgetIds=getWidgetIdsFromContent(renderer.content);const referencedWidgets={};for(const id of referencedWidgetIds){referencedWidgets[id]=renderer.widgets[id];}return {...renderer,widgets:referencedWidgets}}const parsePerseusRenderer=pipeParsers(defaulted(parseRenderer,createDefaultRenderer)).then(convert(removeOrphanedWidgets)).parser;
 
 union(parsePerseusRenderer).or(array(parsePerseusRenderer)).parser;
 

@github-actions
Copy link
Contributor

npm Snapshot: Published

Good news!! We've packaged up the latest commit from this PR (9e70862) and published it to npm. You
can install it using the tag PR3327.

Example:

pnpm add @khanacademy/perseus@PR3327

If you are working in Khan Academy's frontend, you can run the below command.

./dev/tools/bump_perseus_version.ts -t PR3327

If you are working in Khan Academy's webapp, you can run the below command.

./dev/tools/bump_perseus_version.js -t PR3327

Copy link

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — small, well-tested refactor using established parser patterns.

Extended reasoning...

Overview

This PR adds a removeOrphanedWidgets post-processing step to the parsePerseusRenderer parser pipeline. It touches three files: a changeset, a new test file, and the parser implementation. The refactor extracts the existing parser into a named parseRenderer constant and pipes it through convert(removeOrphanedWidgets) using the existing pipeParsers infrastructure.

Security risks

None. This is a pure data-transformation change in a JSON parsing layer with no network, auth, or user-input boundary implications.

Level of scrutiny

Low scrutiny is appropriate. The change is small (under 50 lines of new logic), uses well-established patterns (pipeParsers, convert, defaulted), has two focused tests covering the keep/remove cases, and no CODEOWNERS rules apply to these files.

Other factors

The bug report (introducing explicit undefined values for content referencing non-existent widget IDs) is a valid nit for malformed input, but it does not represent a regression — the same malformed data would already cause undefined-access issues before this PR. The inline comment will surface this to the author for consideration. The test plan in the PR description appropriately calls out manual verification of editor cut/copy/paste and undo behavior.

Comment on lines +39 to +41
for (const id of referencedWidgetIds) {
referencedWidgets[id] = renderer.widgets[id];
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Nit: removeOrphanedWidgets (line 40) sets referencedWidgets[id] = renderer.widgets[id] without checking that the widget exists, which introduces explicit undefined values into the widgets map when content references a non-existent widget ID. This is subtly worse than a missing key: an explicit undefined value will appear in Object.keys()/Object.entries() iterations, potentially causing TypeErrors downstream. Adding if (renderer.widgets[id] != null) before the assignment would fix it.

Extended reasoning...

What the bug is

getWidgetIdsFromContent extracts widget IDs purely from content via regex matching (see widget-id-utils.ts), without verifying that those IDs exist in the widgets map. In removeOrphanedWidgets, line 40 unconditionally does:

referencedWidgets[id] = renderer.widgets[id];

If renderer.widgets does not have a key matching id, this sets referencedWidgets[id] = undefined — an explicit property with value undefined.

Step-by-step proof

Consider malformed data where content = "[[☃ radio 1]]" but widgets = {} (radio 1 is missing from the map):

  1. getWidgetIdsFromContent("[[☃ radio 1]]") returns ["radio 1"]
  2. The loop iterates with id = "radio 1"
  3. renderer.widgets["radio 1"] evaluates to undefined (key not present)
  4. referencedWidgets["radio 1"] = undefined — this creates the key "radio 1" with value undefined
  5. Object.keys(referencedWidgets) now returns ["radio 1"]
  6. Downstream code iterating over widgets (e.g., Object.entries(widgets)) will encounter ["radio 1", undefined]
  7. Any access like widget.type or widget.options on the undefined value throws a TypeError

Why existing code does not prevent it

The function is newly introduced in this PR and has no guard against missing widgets. The tests only cover the case where content is empty (orphan removed) and where content references an existing widget (widget kept). There is no test for content referencing a non-existent widget.

Impact

This requires malformed input data (content referencing a widget that does not exist in the widgets map), which should be rare in well-formed Perseus data. Before this PR, the same malformed data would also cause issues (a direct lookup of a missing widget also returns undefined), so this does not introduce a wholly new failure mode — it just makes the failure slightly more insidious by adding phantom keys to the map that will be iterated over.

Fix

Add a simple null check before the assignment:

for (const id of referencedWidgetIds) {
    if (renderer.widgets[id] != null) {
        referencedWidgets[id] = renderer.widgets[id];
    }
}

This ensures only widgets that actually exist in the map are copied over, and avoids introducing phantom undefined entries.

@benchristel
Copy link
Member Author

@jeremywiebe I found a potential problem with this. I think the widget snowman syntax can be replaced by CrowdIn placeholder strings for just-in-place translation.

This means that when we parse Perseus content in the translation editors, all the widgets will be removed.

We might have to make orphaned widget removal optional. We can do that by passing an option down via the ParseContext. That would enable us to only remove orphaned widgets during publish.

@jeremywiebe
Copy link
Collaborator

Doh, I had two tabs open and submitted my review without realizing that it was on the one where you had this Crowdin question. Feel free to re-request, if needed, once you settle that question. Sorry for jumping the gun here. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants