Refactor: Deduplicate schemas using hash-based storage #4319
Merged
kddejong merged 2 commits into aws-cloudformation:main on Dec 10, 2025
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage diff (main vs. #4319):

| Metric   | main   | #4319  | +/-    |
|----------|--------|--------|--------|
| Coverage | 94.35% | 93.48% | -0.88% |
| Files    | 416    | 417    | +1     |
| Lines    | 14059  | 14130  | +71    |
| Branches | 2787   | 2816   | +29    |
| Hits     | 13266  | 13210  | -56    |
| Misses   | 445    | 573    | +128   |
| Partials | 348    | 347    | -1     |
This commit refactors the schema management system to eliminate duplicate storage of identical resource schemas across regions by implementing a hash-based deduplication strategy.

Changes:
- Modified schema generation to store unique schemas in the resources/ folder
- Updated provider files to reference schemas by hash instead of embedding them
- Added hash checking before schema loading to reduce redundant I/O
- Fixed Custom:: resource type normalization for proper hash lookup
- Added type annotations to resolve mypy errors
- Updated ruff noqa comments for consistency with project standards

Storage Impact:
- Reduced schema storage from 25MB to 15.9MB (~36% reduction)
- Provider files: 25MB → 2.9MB (now contain hash references)
- Resources folder: 8KB → 13MB (deduplicated schema storage)

Performance:
- Schema loading optimized by checking the hash before reading file content
- Eliminates redundant schema loads when the same schema exists across regions
- Test suite runtime impact: ~4-5 seconds slower (within acceptable range)

Technical Details:
- Schemas with identical content are now stored once under a hash-based filename
- Provider files map resource types to schema hashes for each region
- Custom:: resources are normalized to AWS::CloudFormation::CustomResource
- Maintains backward compatibility with existing schema lookup APIs
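As a rough illustration of the storage scheme described in this commit message, the sketch below writes each unique schema once under a hash-based filename and records a per-region map from resource type to hash. It is not the code from this PR: the function names `schema_hash` and `store_region`, the use of SHA-256 over canonical JSON, and the `<region>.json` mapping file are all assumptions made for the example.

```python
import hashlib
import json
from pathlib import Path


def schema_hash(schema: dict) -> str:
    """Hash the canonical JSON form so identical content gives one digest.

    SHA-256 is an assumption; the PR does not say which hash is used.
    """
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def store_region(region: str, schemas: dict[str, dict], root: Path) -> dict[str, str]:
    """Store each unique schema once and return {resource type: hash}.

    Hypothetical layout: root/resources/<hash>.json holds schema content,
    root/<region>.json holds the per-region mapping.
    """
    resources = root / "resources"
    resources.mkdir(parents=True, exist_ok=True)

    mapping: dict[str, str] = {}
    for resource_type, schema in schemas.items():
        digest = schema_hash(schema)
        target = resources / f"{digest}.json"
        # Skip the write when another region already stored this content.
        if not target.exists():
            target.write_text(json.dumps(schema, sort_keys=True), encoding="utf-8")
        mapping[resource_type] = digest

    (root / f"{region}.json").write_text(json.dumps(mapping, indent=1), encoding="utf-8")
    return mapping
```

Calling `store_region` for two regions that share a schema leaves a single file in `resources/`, with both region maps pointing at the same hash. That is the effect the Storage Impact figures describe: the provider files shrink while the resources folder grows.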
06c41fd to 3ea1a6a
- Replace region-specific schema files with hash-based storage system
- Schemas now stored in resources/ directory with hash-based filenames
- Region files map resource types to schema hashes for deduplication
- Update patch_schemas() method to work with new storage structure
- Remove obsolete _patch_region_schemas() method
- Clean up redundant module.json creation in _update_provider_schema
- Maintain patching functionality in --update-specs command
- Fix related test failures for new storage system

This reduces storage from ~56 duplicated files per region to shared hash-based files, significantly reducing repository size while maintaining all existing functionality.
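On the read side, the region-to-hash mapping lets a loader check whether a hash has already been loaded before touching the file, and the Custom:: normalization from the first commit ensures custom resources resolve to a single entry. The following is a minimal sketch under those assumptions. `SchemaStore`, `normalize_type`, and the `resources/<hash>.json` layout are hypothetical names for illustration, not cfn-lint's actual API.

```python
import json
from pathlib import Path

CUSTOM_RESOURCE_TYPE = "AWS::CloudFormation::CustomResource"


def normalize_type(resource_type: str) -> str:
    """Map any Custom::* type to the shared custom resource type."""
    if resource_type.startswith("Custom::"):
        return CUSTOM_RESOURCE_TYPE
    return resource_type


class SchemaStore:
    """Hypothetical loader that reads each unique schema file at most once."""

    def __init__(self, root: Path, region_maps: dict[str, dict[str, str]]):
        self.root = root
        self.region_maps = region_maps  # {region: {resource type: hash}}
        self._by_hash: dict[str, dict] = {}
        self.file_reads = 0  # exposed only to make the caching observable

    def get(self, region: str, resource_type: str) -> dict:
        digest = self.region_maps[region][normalize_type(resource_type)]
        # Check the hash first; only read the file on a cache miss.
        if digest not in self._by_hash:
            path = self.root / "resources" / f"{digest}.json"
            self._by_hash[digest] = json.loads(path.read_text(encoding="utf-8"))
            self.file_reads += 1
        return self._by_hash[digest]
```

With this shape, requesting the same resource type in two regions that share a hash returns the same loaded object and performs one file read, which is the redundant I/O the commit message says is avoided.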
0187e60 to cd00e12
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.