Exporter can append duplicate rows to CSV under re-entrant batch runs #7969

@donnapep

Summary

`Sensei_Export_Task::run()` opens its CSV file in append mode (`'a'`) on every batch invocation and writes whatever `$this->query->posts` returned, with no truncate, no offset-based seek, and no dedupe. If the same task is invoked more than once with `completed_posts === 0`, each invocation appends the same rows. This can happen on a scheduler re-entry, when a stuck `OPTION_RUNNING_JOB` transient expires after 120s while a job is still active, or when two concurrent triggers both load state before either persists.

The persisted state ends up correct (completed_posts === total_posts), so the job reports as 100% complete — but the on-disk CSV can be a multiple of the real row count.
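To illustrate the mechanism in isolation, here is a minimal sketch that does not use any Sensei code. `run_batch_append()` is a hypothetical stand-in for one batch invocation, and the temp file and sample rows are assumptions made up for the demonstration. Two invocations that both believe they are the first batch leave twice the rows on disk while the set of IDs stays the same.

```php
<?php
// Hypothetical stand-in for a single batch invocation; not Sensei's actual code.
function run_batch_append( string $path, array $rows ): void {
    $handle = fopen( $path, 'a' ); // Append mode: existing contents are kept.
    if ( false === $handle ) {
        throw new RuntimeException( "Cannot open $path" );
    }
    foreach ( $rows as $row ) {
        fputcsv( $handle, $row, ',', '"', '\\' );
    }
    fclose( $handle );
}

$path = tempnam( sys_get_temp_dir(), 'export' );
$rows = array( array( 1, 'Q1' ), array( 2, 'Q2' ), array( 3, 'Q3' ) );

// Two invocations that both loaded state with completed_posts === 0.
run_batch_append( $path, $rows );
run_batch_append( $path, $rows );

$lines = file( $path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES );
$ids   = array();
foreach ( $lines as $line ) {
    $ids[] = str_getcsv( $line, ',', '"', '\\' )[0];
}
$line_count = count( $lines );              // Rows on disk.
$unique_ids = count( array_unique( $ids ) ); // Distinct IDs among them.
unlink( $path );

echo "rows: $line_count, unique IDs: $unique_ids\n";
```

The row count doubles while the unique ID count does not, which matches the shape of the observed file (many rows, few unique IDs).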

Reproduction (observed)

On a site with 30 questions, an export job ended in state question.completed-posts: 30 (correct) but produced a Questions CSV with 15,090 data rows = 503 × 30. The Courses CSV from the same job was correct (21 rows). A subsequent export of the same content under quieter conditions produced the correct 30 rows, so the bug is timing/concurrency-dependent rather than data-dependent.

Job option (relevant excerpt):
```json
{"s":{"content_types":["course","question"],"course":{"completed-posts":21},"question":{"completed-posts":30}},"c":true,"p":100,...}
```
File: 13.1 MB CSV, 15,090 unique-by-row but only 30 unique IDs.

Suggested fix

Minimal hardening in `includes/data-port/export-tasks/class-sensei-export-task.php`: when `completed_posts === 0` at the start of `run()`, open the CSV in `'w'` mode (truncate) before appending the first batch's rows. That way a re-entrant first batch cannot accumulate prior writes. The header row would also need to be re-written in this branch.
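A minimal sketch of the "truncate when starting from 0" idea, under stated assumptions: `write_batch()`, its parameters, and the header values are hypothetical and are not taken from `class-sensei-export-task.php`. The only point is the choice of `fopen()` mode based on `completed_posts`.

```php
<?php
// Hypothetical helper; names and signature are illustrative, not Sensei's API.
function write_batch( string $path, array $header, array $rows, int $completed_posts ): void {
    $is_first_batch = ( 0 === $completed_posts );

    // 'w' truncates the file, 'a' appends to it.
    $handle = fopen( $path, $is_first_batch ? 'w' : 'a' );
    if ( false === $handle ) {
        throw new RuntimeException( "Cannot open $path" );
    }

    // Truncating also drops the header, so it is written again here.
    if ( $is_first_batch ) {
        fputcsv( $handle, $header, ',', '"', '\\' );
    }
    foreach ( $rows as $row ) {
        fputcsv( $handle, $row, ',', '"', '\\' );
    }
    fclose( $handle );
}

$path   = tempnam( sys_get_temp_dir(), 'export' );
$header = array( 'ID', 'Title' );

// A re-entrant first batch: invoked twice with completed_posts === 0.
write_batch( $path, $header, array( array( 1, 'Q1' ), array( 2, 'Q2' ) ), 0 );
write_batch( $path, $header, array( array( 1, 'Q1' ), array( 2, 'Q2' ) ), 0 );
// A later batch appends as before.
write_batch( $path, $header, array( array( 3, 'Q3' ) ), 2 );

$lines = file( $path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES );
unlink( $path );
```

This only covers repeated first batches that run one after another. Two writers that are active at the same moment could still interleave, and a re-entered later batch (`completed_posts > 0`) would still append duplicates.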

A more thorough fix would seek to the byte offset corresponding to `completed_posts` rows (or store the file's row count separately) and truncate beyond it before appending — but the simple "truncate when starting from 0" pattern is enough to prevent the observed multiplication.
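The offset-based variant could look roughly like the sketch below. It assumes the job state stores the byte length of the file after the last committed batch, which is a variation on the row-count idea above and not something the plugin does today. `write_batch_at_offset()` is a hypothetical name.

```php
<?php
// Hypothetical helper: $committed_bytes stands in for a value persisted in job state.
function write_batch_at_offset( string $path, array $rows, int $committed_bytes ): int {
    // 'c+' opens for reading and writing without truncating, creating the file if needed.
    $handle = fopen( $path, 'c+' );
    if ( false === $handle ) {
        throw new RuntimeException( "Cannot open $path" );
    }

    // Discard anything written after the last committed batch, then resume there.
    ftruncate( $handle, $committed_bytes );
    fseek( $handle, $committed_bytes );

    foreach ( $rows as $row ) {
        fputcsv( $handle, $row, ',', '"', '\\' );
    }
    $new_offset = ftell( $handle ); // Persist this together with completed_posts.
    fclose( $handle );

    return $new_offset;
}

$path   = tempnam( sys_get_temp_dir(), 'export' );
$offset = write_batch_at_offset( $path, array( array( 1, 'Q1' ), array( 2, 'Q2' ) ), 0 );

// The second batch is re-entered with the same persisted offset.
write_batch_at_offset( $path, array( array( 3, 'Q3' ) ), $offset );
$final_offset = write_batch_at_offset( $path, array( array( 3, 'Q3' ) ), $offset );

$lines = file( $path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES );
unlink( $path );
```

Because every invocation truncates back to the committed offset before writing, a repeated batch overwrites its earlier output instead of adding to it. As with the simpler pattern, this does not by itself protect against two writers holding the file open at the same time.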
