Fix race condition in download queue when concurrent jobs share destination directory #104

Merged
lstein merged 3 commits into lstein/bugfix/test_download_queue-failure from copilot/fix-race-condition-test-download-queue
Feb 28, 2026


Copilot AI commented Feb 28, 2026

Summary

When two download jobs target the same destination directory simultaneously, a TOCTOU (time-of-check to time-of-use) race between glob("*.downloading") and the subsequent .stat() call could raise FileNotFoundError if a concurrent job completed and renamed its .downloading file in between. This surfaced as an intermittent failure in test_errors, where the error recorded for the "broken" job was FileNotFoundError instead of the expected HTTPError(NOT FOUND).

Fix: In _do_download, wrap the candidates[0].stat().st_size call in a try-except FileNotFoundError. If the file disappears between glob and stat, reset job.download_path = None and leave resume_from = 0 so the job proceeds as a fresh download.

# Before
resume_from = candidates[0].stat().st_size  # crashes if file renamed by concurrent job

# After
try:
    resume_from = candidates[0].stat().st_size
except FileNotFoundError:
    # .downloading file renamed/deleted between glob and stat; skip resume
    job.download_path = None
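
The interleaving described above can be forced deterministically, which may help when reasoning about the fix. The following is a minimal sketch, not the project's code: resume_offset is a hypothetical helper that mirrors only the guarded stat() call, and the file name missing.txt is borrowed from the stack trace purely for illustration.

```python
import tempfile
from pathlib import Path
from typing import Sequence


def resume_offset(candidates: Sequence[Path]) -> int:
    """Return the size of the first partial file, or 0 if it has vanished.

    Hypothetical helper for illustration; not the project's _do_download.
    """
    resume_from = 0
    if not candidates:
        return resume_from
    try:
        resume_from = candidates[0].stat().st_size
    except FileNotFoundError:
        # File was renamed or deleted after the glob: start from scratch.
        pass
    return resume_from


with tempfile.TemporaryDirectory() as tmp:
    dest = Path(tmp)
    partial = dest / "missing.txt.downloading"
    partial.write_bytes(b"x" * 22)

    candidates = list(dest.glob("*.downloading"))  # the "check"
    size_before = resume_offset(candidates)        # file still present
    partial.rename(dest / "missing.txt")           # concurrent job finishes
    size_after = resume_offset(candidates)         # the "use", after the rename
```

Here the rename stands in for the concurrent job completing between glob and stat. With the guard in place, the stale candidate yields an offset of 0 rather than an exception.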

Related Issues / Discussions

QA Instructions

Run tests/app/services/download/test_download_queue.py::test_errors repeatedly — it previously failed intermittently due to this race.
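
One possible way to repeat the test is a small driver script. This is a sketch under assumptions: run_until_failure is a hypothetical helper, the default of 50 runs is arbitrary, and pytest is assumed to be installed in the active environment.

```python
import subprocess
import sys
from typing import List


def run_until_failure(cmd: List[str], max_runs: int = 50) -> int:
    """Run cmd up to max_runs times.

    Returns the 1-based index of the first failing run, or 0 if every
    run exited with status 0. Hypothetical helper for illustration.
    """
    for i in range(1, max_runs + 1):
        if subprocess.run(cmd).returncode != 0:
            return i
    return 0


# Example invocation (assumes pytest is installed and the repo root is the cwd):
#   test_id = "tests/app/services/download/test_download_queue.py::test_errors"
#   run_until_failure([sys.executable, "-m", "pytest", "-q", test_id])
```

A result of 0 only means the race did not trigger in those runs; it does not prove the race is gone.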

Merge Plan

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • ❗Changes to a redux slice have a corresponding migration
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)
Original prompt

This section details the original issue you should resolve

<issue_title>[bug]: Race condition in test_download_queue.py</issue_title>
<issue_description>### Is there an existing issue for this problem?

  • I have searched the existing issues

Install method

Invoke's Launcher

Operating system

Linux

GPU vendor

Nvidia (CUDA)

GPU model

No response

GPU VRAM

No response

Version number

main branch

Browser

No response

System Information

No response

What happened

I am seeing random failures of the unit test tests/app/services/download/test_download_queue.py from what appears to be a race condition. Here is a typical stack trace:

=================================== FAILURES ===================================
_________________________________ test_errors __________________________________

tmp_path = PosixPath('/tmp/pytest-of-runner/pytest-0/test_errors0')
mm2_session = <requests_testadapter.TestSession object at 0x7f873fcc87d0>

    @pytest.mark.timeout(timeout=10, method="thread")
    def test_errors(tmp_path: Path, mm2_session: Session) -> None:
        queue = DownloadQueueService(
            requests_session=mm2_session,
        )
        queue.start()
    
        for bad_url in ["http://www.civitai.com/models/broken", "http://www.civitai.com/models/missing"]:
            queue.download(AnyHttpUrl(bad_url), dest=tmp_path)
    
        queue.join()
        jobs = queue.list_jobs()
        print(jobs)
        assert len(jobs) == 2
        jobs_dict = {str(x.source): x for x in jobs}
        assert jobs_dict["http://www.civitai.com/models/broken"].status == DownloadJobStatus.ERROR
>       assert jobs_dict["http://www.civitai.com/models/broken"].error_type == "HTTPError(NOT FOUND)"
E       assert "FileNotFound...downloading')" == 'HTTPError(NOT FOUND)'
E         
E         - HTTPError(NOT FOUND)
E         + FileNotFoundError([Errno 2] No such file or directory: '/tmp/pytest-of-runner/pytest-0/test_errors0/missing.txt.downloading')

tests/app/services/download/test_download_queue.py:78: AssertionError
----------------------------- Captured stdout call -----------------------------
[DownloadJob(id=0, dest=PosixPath('/tmp/pytest-of-runner/pytest-0/test_errors0'), download_path=PosixPath('/tmp/pytest-of-runner/pytest-0/test_errors0/missing.txt'), status=<DownloadJobStatus.ERROR: 'error'>, bytes=0, total_bytes=0, error_type="FileNotFoundError([Errno 2] No such file or directory: '/tmp/pytest-of-runner/pytest-0/test_errors0/missing.txt.downloading')", error='Traceback (most recent call last):\n  File "/home/runner/work/InvokeAI/InvokeAI/invokeai/app/services/download/download_default.py", line 316, in _download_next_item\n    self._do_download(job)\n  File "/home/runner/work/InvokeAI/InvokeAI/invokeai/app/services/download/download_default.py", line 391, in _do_download\n    resume_from = candidates[0].stat().st_size\n                  ^^^^^^^^^^^^^^^^^^^^\n  File "/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/pathlib.py", line 1013, in stat\n    return os.stat(self, follow_symlinks=follow_symlinks)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nFileNotFoundError: [Errno 2] No such file or directory: \'/tmp/pytest-of-runner/pytest-0/test_errors0/missing.txt.downloading\'\n', source=AnyHttpUrl('http://www.civitai.com/models/broken'), access_token=None, priority=10, job_started='2026-02-28T16:13:25.053048+00:00', job_ended='2026-02-28T16:13:25.083713+00:00', content_type=None, canonical_url=None, etag=None, last_modified=None, final_url=None, expected_total_bytes=None, resume_required=False, resume_message=None, resume_from_scratch=False), DownloadJob(id=1, dest=PosixPath('/tmp/pytest-of-runner/pytest-0/test_errors0'), download_path=PosixPath('/tmp/pytest-of-runner/pytest-0/test_errors0/missing.txt'), status=<DownloadJobStatus.COMPLETED: 'completed'>, bytes=22, total_bytes=0, error_type=None, error=None, source=AnyHttpUrl('http://www.civitai.com/models/missing'), access_token=None, priority=10, job_started='2026-02-28T16:13:25.052943+00:00', job_ended='2026-02-28T16:13:25.054633+00:00', content_type=None, canonical_url=None, 
etag=None, last_modified=None, final_url='http://www.civitai.com/models/missing', expected_total_bytes=0, resume_required=False, resume_message=None, resume_from_scratch=False)]
----------------------------- Captured stderr call -----------------------------
[2026-02-28 16:13:25,083]::[DownloadQueueService]::ERROR --> http://www.civitai.com/models/broken: ['Traceback (most recent call last):\n', '  File "/home/runner/work/InvokeAI/InvokeAI/invokeai/app/services/download/download_default.py", line 316, in _download_next_item\n    self._do_download(job)\n', '  File "/home/runner/work/InvokeAI/InvokeAI/invokeai/app/services/download/download_default.py", line 391, in _do_download\n    resume_from = candidates[0].stat().st_size\n                  ^^^^^^^^^^^^^^^^^^^^\n', '  File "/opt/hostedtoolcac...

</details>



<!-- START COPILOT CODING AGENT SUFFIX -->

- Fixes lstein/InvokeAI#103


Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix race condition in test_download_queue.py" to "Fix race condition in download queue when concurrent jobs share destination directory" Feb 28, 2026
@lstein lstein marked this pull request as ready for review February 28, 2026 16:30
@lstein lstein self-requested a review as a code owner February 28, 2026 16:30
Owner

@lstein lstein left a comment


Seems to work, but hard to know given that it is a race condition.

@lstein lstein merged commit c2f4ad3 into lstein/bugfix/test_download_queue-failure Feb 28, 2026
13 checks passed
@lstein lstein deleted the copilot/fix-race-condition-test-download-queue branch February 28, 2026 17:22
lstein added a commit that referenced this pull request Mar 6, 2026
…nation directory (#104) (invoke-ai#8931)

* Initial plan

* Fix race condition in _do_download when scanning for .downloading files



* chore(backend): update copyright

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: lstein <111189+lstein@users.noreply.github.com>