
fix(q_dev): prevent data duplication in user_report and user_data tables#8737

Merged
abeizn merged 2 commits into main from fix/q-dev-dedup-user-tables
Feb 28, 2026

@warren830 (Contributor)

Summary

  • Replace the auto-increment ID (common.Model) with composite primary keys (the models now embed common.NoPKModel) on the _tool_q_dev_user_report and _tool_q_dev_user_data tables so that rows can be deduplicated
  • Switch db.Create() to db.CreateOrUpdate() in s3_data_extractor so re-extracted data updates existing rows instead of inserting duplicates
  • Add a migration that drops and rebuilds both tables with the new PKs and resets the s3_file_meta.processed flag to trigger a clean re-extraction
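Why the switch to composite keys matters can be sketched in memory. This is a minimal illustration, not DevLake's code: it assumes `CreateOrUpdate` behaves as an upsert on the table's primary key, and the `reportKey`, `row`, `autoIncTable`, and `compositePKTable` types, the field types, the `Credits` column, and the client-type values are all hypothetical stand-ins built from the user_report key columns listed in the commit message.

```go
package main

import "fmt"

// reportKey mirrors the composite primary key listed for
// _tool_q_dev_user_report. Field types are assumptions.
type reportKey struct {
	ConnectionID uint64
	ScopeID      string
	UserID       string
	Date         string
	ClientType   string
}

// row is a hypothetical, trimmed-down user_report record; Credits is an
// illustrative payload column, not necessarily a real one.
type row struct {
	Key     reportKey
	Credits float64
}

// autoIncTable models the old schema: each Create assigns a fresh ID,
// so writing the same record twice stores it twice.
type autoIncTable struct {
	nextID int
	rows   map[int]row
}

func (t *autoIncTable) Create(r row) {
	t.nextID++
	t.rows[t.nextID] = r
}

// compositePKTable models the new schema: the composite key is the PK,
// so CreateOrUpdate replaces an existing row rather than adding one.
type compositePKTable struct {
	rows map[reportKey]row
}

func (t *compositePKTable) CreateOrUpdate(r row) {
	t.rows[r.Key] = r
}

// extract simulates one extractor pass that writes every row in batch.
func extract(batch []row, write func(row)) {
	for _, r := range batch {
		write(r)
	}
}

// total sums Credits over the stored rows, standing in for a sum-based
// dashboard query (illustrative only).
func total[K comparable](rows map[K]row) float64 {
	sum := 0.0
	for _, r := range rows {
		sum += r.Credits
	}
	return sum
}

func main() {
	batch := []row{
		{reportKey{1, "scope-a", "u1", "2026-02-28", "client-x"}, 10},
		{reportKey{1, "scope-a", "u1", "2026-02-28", "client-y"}, 5},
	}
	oldT := &autoIncTable{rows: map[int]row{}}
	newT := &compositePKTable{rows: map[reportKey]row{}}

	// Two passes over the same data, as happens on a re-extraction.
	for pass := 0; pass < 2; pass++ {
		extract(batch, oldT.Create)
		extract(batch, newT.CreateOrUpdate)
	}

	fmt.Println(len(oldT.rows), total(oldT.rows)) // 4 30
	fmt.Println(len(newT.rows), total(newT.rows)) // 2 15
}
```

Under the auto-increment schema every pass appends rows, so any summed metric grows with each re-extraction; keyed on the composite PK, a second pass overwrites the same two rows and the total stays stable.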

Test plan

  • go build ./plugins/q_dev/... passes
  • go test ./plugins/q_dev/... passes
  • Verify the migration runs cleanly in the dev environment
  • Verify that SELECT ... GROUP BY ... HAVING COUNT(*) > 1 returns no rows after re-extraction
  • Verify the Grafana dashboard "Total Credits Used" shows the correct value
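The duplicate check in the test plan can also be expressed over rows already loaded into Go. The sketch below is an in-memory equivalent of grouping by key and keeping groups with COUNT(*) > 1, assuming the grouping columns are the four user_data key columns from the commit message; the `userDataKey` type, its field types, and the `duplicateKeys` helper are hypothetical.

```go
package main

import "fmt"

// userDataKey holds the composite key listed for _tool_q_dev_user_data:
// (connection_id, scope_id, user_id, date). Field types are assumptions.
type userDataKey struct {
	ConnectionID uint64
	ScopeID      string
	UserID       string
	Date         string
}

// duplicateKeys counts rows per key and returns only keys seen more than
// once, mirroring GROUP BY <key> HAVING COUNT(*) > 1.
func duplicateKeys(rows []userDataKey) map[userDataKey]int {
	counts := make(map[userDataKey]int)
	for _, k := range rows {
		counts[k]++
	}
	dups := make(map[userDataKey]int)
	for k, n := range counts {
		if n > 1 {
			dups[k] = n
		}
	}
	return dups
}

func main() {
	rows := []userDataKey{
		{1, "scope-a", "u1", "2026-02-27"},
		{1, "scope-a", "u1", "2026-02-28"},
		{1, "scope-a", "u1", "2026-02-28"}, // written again by a second extraction
	}
	for k, n := range duplicateKeys(rows) {
		fmt.Printf("%+v appears %d times\n", k, n)
	}
}
```

An empty result is the expected outcome after this change, since the database itself now rejects a second row with the same key.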

Replace auto-increment ID with composite primary keys so that
CreateOrUpdate can properly deduplicate rows on re-extraction.

- user_report PK: (connection_id, scope_id, user_id, date, client_type)
- user_data PK: (connection_id, scope_id, user_id, date)
- Switch db.Create() to db.CreateOrUpdate() in s3_data_extractor
- Migration drops old tables, rebuilds with new PKs, resets s3_file_meta
  processed flag to trigger re-extraction
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. component/plugins This issue or PR relates to plugins pr-type/bug-fix This PR fixes a bug labels Feb 28, 2026
@abeizn (Contributor) left a comment

lgtm

@abeizn abeizn merged commit 19b853a into main Feb 28, 2026
10 checks passed
@abeizn abeizn deleted the fix/q-dev-dedup-user-tables branch February 28, 2026 11:32
la-tamas pushed a commit to archfz/incubator-devlake that referenced this pull request Mar 26, 2026
…les (apache#8737)

* fix(q_dev): prevent data duplication in user_report and user_data tables

Replace auto-increment ID with composite primary keys so that
CreateOrUpdate can properly deduplicate rows on re-extraction.

- user_report PK: (connection_id, scope_id, user_id, date, client_type)
- user_data PK: (connection_id, scope_id, user_id, date)
- Switch db.Create() to db.CreateOrUpdate() in s3_data_extractor
- Migration drops old tables, rebuilds with new PKs, resets s3_file_meta
  processed flag to trigger re-extraction

* fix(q_dev): gofmt archived user_data_v2 model
la-tamas pushed a commit to archfz/incubator-devlake that referenced this pull request Apr 9, 2026