Conversation
Pull request overview
This PR aims to make concurrent save() operations on the same dataset isolated and effectively atomic to prevent version-table corruption and duplicate inserts under concurrency.
Changes:
- Implemented a staging-table "write, then claim version, then rename" flow in `DatasetQuery.save()`.
- Added UUID-based conflict detection to `metastore.create_dataset_version()` and a retry loop for auto-versioned dataset creation in `Catalog.create_dataset()`.
- Updated functional tests to reflect the new `create_dataset_version()` return type and the fact that `create_dataset()` no longer creates a rows table.
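The staging-table flow can be sketched as follows. This is a hedged illustration using `sqlite3`; names such as `staged_save` and `claim_version` are hypothetical, not the actual `DatasetQuery.save()` implementation:

```python
import sqlite3
import uuid

def staged_save(conn, final_name, rows, claim_version):
    temp_name = f"tmp_{uuid.uuid4().hex[:8]}"
    # 1) write rows into a uniquely named temp table
    conn.execute(f'CREATE TABLE "{temp_name}" (val INTEGER)')
    conn.executemany(f'INSERT INTO "{temp_name}" (val) VALUES (?)',
                     [(r,) for r in rows])
    try:
        # 2) claim the version in metadata; may raise on conflict
        claim_version()
    except Exception:
        conn.execute(f'DROP TABLE "{temp_name}"')  # nothing leaks on failure
        raise
    # 3) only now does the table appear under its final name
    conn.execute(f'ALTER TABLE "{temp_name}" RENAME TO "{final_name}"')

conn = sqlite3.connect(":memory:")
staged_save(conn, "dataset_v1", [1, 2, 3], claim_version=lambda: None)
print(conn.execute("SELECT COUNT(*) FROM dataset_v1").fetchone()[0])  # → 3
```

The point of the pattern is that a concurrent reader never observes a partially written table under the final name: the rename is the commit step.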
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/test_atomicity.py | Removes create_rows usage to align with new dataset creation semantics. |
| tests/func/test_metastore.py | Updates expectations for create_dataset_version() returning (ds, version_created). |
| tests/func/test_datasets.py | Refactors fixtures/tests away from create_rows and adds coverage for “no rows table created”. |
| src/datachain/query/dataset.py | Implements staging-table save: temp table insert → claim version metadata → rename → finalize. |
| src/datachain/lib/dc/records.py | Creates rows table explicitly since create_dataset() is now metadata-only. |
| src/datachain/data_storage/metastore.py | Returns (dataset, version_created) and uses UUID equality as conflict/ownership detection. |
| src/datachain/catalog/catalog.py | Removes create_rows behavior, adds version-claim retry logic, and changes create_dataset_version() to metadata-only + tuple return. |
```python
chain = dc.read_values(val=[1, 2, 3, 4], session=test_session).save(name)
catalog = test_session.catalog
return catalog.update_dataset(
    chain.dataset, {"description": "test dataset", "attrs": ["test", "dataset"]}
```
`Catalog.update_dataset` takes updates via keyword arguments; passing a dict as the second positional argument binds it to `conn`, so no fields are updated. Unpack the dict (`**{...}`) or pass `description=` / `attrs=` directly so the fixture actually updates the dataset metadata.
Suggested change:

```diff
-    chain.dataset, {"description": "test dataset", "attrs": ["test", "dataset"]}
+    chain.dataset,
+    description="test dataset",
+    attrs=["test", "dataset"],
```
```python
# Create the rows table (create_dataset only creates metadata).
table_name = warehouse.dataset_table_name(dsr, dsr.latest_version)
warehouse.create_dataset_rows_table(table_name, columns=columns)
```
After `create_dataset()` was changed to create metadata only, this function now creates and populates the rows table but never calls `catalog.update_dataset_version_with_warehouse_info(...)`. That leaves the version marked COMPLETE with `num_objects`/`size`/`preview` potentially unset; update the warehouse info before setting status COMPLETE (or explicitly document that `read_records` skips these fields).
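A minimal sketch of the ordering this comment asks for: attach the warehouse-derived stats to the version before flipping its status to COMPLETE. `finalize_version` and the dict fields are illustrative, not the real catalog API:

```python
COMPLETE = "COMPLETE"

def finalize_version(version: dict, num_objects: int, size: int) -> dict:
    # 1) update warehouse-derived info first ...
    version.update(num_objects=num_objects, size=size)
    # 2) ... then mark COMPLETE, so a COMPLETE version never has unset stats
    version["status"] = COMPLETE
    return version

v = finalize_version({"status": "CREATED"}, num_objects=4, size=128)
print(v["status"], v["num_objects"])  # COMPLETE 4
```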
This comment makes sense to me, can you check? Also, why did we move creating the warehouse table outside the catalog? Now we need to do all this stuff (create the table, update stats, and maybe other things) in multiple places.
I'll check; this is still kind of a draft and I'm iterating on it.
In short, the idea is to never create the warehouse table under its actual name initially: create a temporary table and rename it at the end. The temporary table is created in a different place.
This read_records path is unique (and probably should be redesigned).
I'm still not sure whether this design is right overall ...
Deploying datachain with:

- Latest commit: 68019c1
- Status: ✅ Deploy successful!
- Preview URL: https://a0a31138.datachain-2g6.pages.dev
- Branch Preview URL: https://fix-save-race-condition.datachain-2g6.pages.dev
```python
try:
    self.catalog.warehouse.rename_table(temp_table, final_table_name)
except Exception:
    with contextlib.suppress(Exception):
```
Should we cleanup dataset metadata or mark it as failed here?
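One way to implement this suggestion, sketched under assumptions: `drop_table` and `mark_version_failed` are hypothetical helper names, not the actual DataChain API.

```python
import contextlib

def finalize(warehouse, metastore, temp_table, final_table, version_id):
    try:
        warehouse.rename_table(temp_table, final_table)
    except Exception:
        with contextlib.suppress(Exception):
            warehouse.drop_table(temp_table)       # best-effort cleanup
        metastore.mark_version_failed(version_id)  # or delete the metadata row
        raise
```

This keeps the invariant that claimed version metadata either points at a real table or is explicitly marked failed.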
Codecov Report: ❌ Patch coverage is
Concurrent `save()` calls on the same dataset can corrupt data by double-inserting rows into the same version table.

UUID-based conflict detection in `create_dataset_version()` (works across SQLite, PostgreSQL, Studio API), a retry loop for auto-versioned saves, and a staging-table pattern in `save()`: write to a temp table first, claim version metadata, then rename.
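The conflict-detection-plus-retry idea can be sketched as follows. This is an illustration, not the real metastore: `versions` stands in for the metastore's version table, and all names are hypothetical.

```python
import uuid

versions: dict = {}  # (dataset, version) -> owner uuid

def create_dataset_version(dataset, version, job_uuid):
    """Return True if this writer created (or already owns) the version."""
    existing = versions.setdefault((dataset, version), job_uuid)
    return existing == job_uuid  # a different UUID means another writer won

def save_with_retry(dataset, max_retries=3):
    """Auto-versioned save: on conflict, retry with the next version number."""
    job_uuid = str(uuid.uuid4())
    version = max((v for d, v in versions if d == dataset), default=0) + 1
    for _ in range(max_retries):
        if create_dataset_version(dataset, version, job_uuid):
            return version
        version += 1  # another writer claimed it; try the next one
    raise RuntimeError("could not claim a dataset version")

versions[("ds", 1)] = "other-writer"  # simulate a concurrent writer
print(save_with_retry("ds"))  # → 2
```

Comparing the stored UUID to the writer's own makes the claim idempotent: a retry by the same job sees its own UUID and proceeds, while a different job sees a conflict.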