Summary
`Sensei_Export_Task::run()` opens its CSV file in append mode (`'a'`) on every batch invocation and writes whatever `$this->query->posts` returned, with no truncate, no offset-based seek, and no dedupe. If the same task is invoked more than once with `completed_posts === 0` (e.g. a scheduler re-entry, a stuck `OPTION_RUNNING_JOB` transient that expires after 120s while a job is still active, or two concurrent triggers that both load state before either persists), each invocation appends the same rows.
The persisted state ends up correct (`completed_posts === total_posts`), so the job reports as 100% complete, but the on-disk CSV can contain a multiple of the real row count.
Reproduction (observed)
On a site with 30 questions, an export job ended in state `question.completed-posts: 30` (correct) but produced a Questions CSV with 15,090 data rows = 503 × 30. The Courses CSV from the same job was correct (21 rows). A subsequent export of the same content under quieter conditions produced the correct 30 rows, so the bug is timing/concurrency-dependent rather than data-dependent.
Job option (relevant excerpt):
```json
{"s":{"content_types":["course","question"],"course":{"completed-posts":21},"question":{"completed-posts":30}},"c":true,"p":100,...}
```
File: 13.1 MB CSV, 15,090 unique-by-row but only 30 unique IDs.
Suggested fix
Minimal hardening in `includes/data-port/export-tasks/class-sensei-export-task.php`: when `completed_posts === 0` at the start of `run()`, open the CSV in `'w'` mode (truncate) before appending the first batch's rows. That way a re-entrant first batch cannot accumulate prior writes. The header row would also need to be re-written in this branch.
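A minimal sketch of the "truncate when starting from 0" idea, written as a standalone function rather than as a patch to the real class. The function name `sketch_write_batch()` and its parameters are hypothetical and only illustrate the mode switch; the actual `run()` method will have a different shape.

```php
<?php
/**
 * Hypothetical helper, not part of Sensei: writes one batch of rows to an
 * export CSV and returns the new completed-posts count.
 *
 * When $completed_posts is 0 the file is opened in 'w' mode, which truncates
 * any rows left behind by an earlier first-batch run, and the header is
 * written again. Later batches open in 'a' mode and only append.
 */
function sketch_write_batch( string $path, array $header, array $rows, int $completed_posts ): int {
	$is_first_batch = ( 0 === $completed_posts );
	$handle         = fopen( $path, $is_first_batch ? 'w' : 'a' );

	if ( false === $handle ) {
		throw new RuntimeException( "Cannot open {$path}" );
	}

	if ( $is_first_batch ) {
		// The truncate removed the old header too, so write it again.
		fputcsv( $handle, $header, ',', '"', '\\' );
	}

	foreach ( $rows as $row ) {
		fputcsv( $handle, $row, ',', '"', '\\' );
	}

	fclose( $handle );

	return $completed_posts + count( $rows );
}
```

With this shape, calling the function twice with `$completed_posts` equal to 0 leaves one header and one copy of the batch on disk instead of two. It does not protect against two processes writing at the same moment; that would still need a lock.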
A more thorough fix would seek to the byte offset corresponding to `completed_posts` rows (or store the file's row count separately) and truncate beyond it before appending — but the simple "truncate when starting from 0" pattern is enough to prevent the observed multiplication.
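The offset-based variant could look roughly like the sketch below. It assumes the job state stores the byte offset reached after the last successfully persisted batch; that stored offset, and the function name `sketch_write_batch_at_offset()`, are hypothetical and do not exist in the current job state shown above.

```php
<?php
/**
 * Hypothetical helper, not part of Sensei: writes one batch starting at a
 * previously persisted byte offset and returns the new offset to persist.
 *
 * Anything on disk beyond $offset is treated as output from a batch whose
 * state was never saved, and is discarded before writing. For the first
 * batch ($offset = 0) the caller is assumed to pass the header as the
 * first row.
 */
function sketch_write_batch_at_offset( string $path, array $rows, int $offset ): int {
	// 'c+' opens for read/write without truncating and creates the file if missing.
	$handle = fopen( $path, 'c+' );

	if ( false === $handle ) {
		throw new RuntimeException( "Cannot open {$path}" );
	}

	// Drop rows written past the last persisted position, then continue from there.
	ftruncate( $handle, $offset );
	fseek( $handle, $offset );

	foreach ( $rows as $row ) {
		fputcsv( $handle, $row, ',', '"', '\\' );
	}

	$new_offset = ftell( $handle );
	fclose( $handle );

	return $new_offset;
}
```

Re-running any batch with the same stored offset rewrites the same region of the file, so the result is the same however many times that batch is repeated. As with the simpler approach, two writers running at the same instant would still need a lock around the truncate-and-write step.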
Notes