ZFIN-10166: Marker assembly update fix #1743
Open
rtaylorzfin wants to merge 2 commits into ZFIN:main
The temp_new_gene query was filtering on sequence_feature_chromosome_location_generated (sfclg, the location records) instead of marker_assembly, so genes with existing locations but missing assembly records were skipped, and a second cleanup pass was needed to catch them. Fix the upstream filter to check marker_assembly directly, simplify the regex patterns, and remove the now-unnecessary second pass.
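The difference between the two filters can be sketched with a small in-memory SQLite database. This is only an illustration under stated assumptions: the `gene` and `location` tables, the `gene_id` column, and the helper names are hypothetical stand-ins (with `location` playing the role of the sfclg location records), not the actual ZFIN schema or the query in markerAssemblyUpdate.sql.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a toy database; every table and column here is hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gene (gene_id TEXT PRIMARY KEY);
        CREATE TABLE location (gene_id TEXT);         -- stand-in for sfclg
        CREATE TABLE marker_assembly (gene_id TEXT);  -- stand-in for assembly records

        INSERT INTO gene VALUES ('G1'), ('G2'), ('G3');
        -- G1 and G2 already have location records.
        INSERT INTO location VALUES ('G1'), ('G2');
        -- Only G1 has an assembly record.
        INSERT INTO marker_assembly VALUES ('G1');
        """
    )
    return conn


def genes_missing_from(conn: sqlite3.Connection, table: str) -> list[str]:
    """Return gene IDs that have no row in the given table (an anti-join)."""
    # The table name is checked against a fixed set before it is interpolated.
    if table not in ("location", "marker_assembly"):
        raise ValueError(f"unexpected table: {table}")
    query = f"""
        SELECT g.gene_id
        FROM gene g
        WHERE NOT EXISTS (
            SELECT 1 FROM {table} t WHERE t.gene_id = g.gene_id
        )
        ORDER BY g.gene_id
    """
    return [row[0] for row in conn.execute(query)]
```

With these demo rows, filtering on `location` selects only G3, while filtering on `marker_assembly` selects G2 and G3. G2 is the kind of gene the description says was skipped: it has a location but no assembly record.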
Improvements to GeneID matching and data consistency (markerAssemblyUpdate.sql):

Regular Expression Updates for GeneID Matching:
- Uses regexp_like(..., 'GeneID:' || accession || '(,|$)'), so a GeneID matches only when it is followed by a comma or the end of the string, and the unnecessary wildcard patterns are removed.
- Uses regexp_like(gff_attributes, 'GeneID:' || db.dblink_acc_num || '(,|;|$)'), which makes the match more specific and handles both comma and semicolon delimiters.

Table Usage and Data Consistency:
- Uses the marker_assembly table instead of sequence_feature_chromosome_location_generated, ensuring consistency with the intended data model and preventing duplicate entries.
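The effect of the delimiter group in the GeneID pattern can be checked outside the database with Python's `re` module. This is a rough sketch, not the SQL from the PR: the `has_gene_id` helper and the sample attribute strings are made up for illustration, and it assumes that this simple pattern behaves the same in Python as in the database's `regexp_like`.

```python
import re


def has_gene_id(gff_attributes: str, accession: str) -> bool:
    """Check whether the attributes contain exactly this GeneID.

    Mirrors the pattern 'GeneID:' || accession || '(,|;|$)': the accession
    must be followed by a comma, a semicolon, or the end of the string.
    """
    # re.escape is a precaution for this sketch; the SQL concatenates the
    # accession as-is.
    pattern = "GeneID:" + re.escape(accession) + "(,|;|$)"
    return re.search(pattern, gff_attributes) is not None


# Hypothetical attribute strings, for illustration only.
followed_by_comma = "ID=gene-abc;Dbxref=GeneID:123,ZFIN:ZDB-GENE-1;Name=abc"
longer_accession = "ID=gene-xyz;Dbxref=GeneID:1234,ZFIN:ZDB-GENE-2;Name=xyz"
```

Here `has_gene_id(followed_by_comma, "123")` is true, while `has_gene_id(longer_accession, "123")` is false, because the trailing delimiter group stops 123 from matching as a prefix of 1234.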