A probabilistic graphical model for estimating selection coefficient of nonsynonymous variants from human population sequence data
Zhao et al. 2025, https://www.nature.com/articles/s41467-025-59937-2
- Estimated S_gene for all human protein-coding genes
- Estimated selection coefficients (MisFit_S) for all possible missense variants in the human genome caused by SNVs
Download at: https://doi.org/10.5281/zenodo.15230898
- pop_model: simulates variants and constructs the PIG model
- model_PTV: uses only PTVs; independent of the other models
- model_mis: used to find the priors of d and s_gene, and then to initialize s_gene before MisFit training
- model_basic: population data, with or without genes
- model_logit: population data + gene + ESM zero-shot scores as d
- model_TF: the full MisFit model
- *_analysis: used to combine data for different analyses
Note: the model_selection values output directly by the model may need to be transformed with a sigmoid function to obtain MisFit_S on the original scale.
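A minimal sketch of that transform, assuming the raw model_selection values are on the logit scale so that the standard logistic sigmoid, 1 / (1 + exp(-x)), maps them into (0, 1). The function names `sigmoid` and `to_misfit_s` are hypothetical and are not part of the MisFit code.

```python
import math
from typing import Iterable, List


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid, 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use exp(x) / (1 + exp(x)) so exp never overflows.
    z = math.exp(x)
    return z / (1.0 + z)


def to_misfit_s(model_selection: Iterable[float]) -> List[float]:
    """Map raw model_selection values (assumed logit scale) to MisFit_S in [0, 1]."""
    return [sigmoid(v) for v in model_selection]


if __name__ == "__main__":
    # Hypothetical raw values, for illustration only.
    print(to_misfit_s([-2.0, 0.0, 2.0]))
```

The two-branch form avoids overflow in `math.exp` for large negative inputs; if the values are held in a NumPy array or PyTorch tensor, an elementwise vectorized sigmoid does the same job.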
To be updated:
- deep mutational scan GMM
- variant annotations