
BiomedSciAI/genet


GENET

bioRxiv

GENET is a computational workflow for processing, analyzing, and visualizing biomedical entities and their relations in scientific literature. It integrates multiple components that support trait-gene association discovery, literature mining, knowledge graph construction, and interactive visualization.

Overview

This repository includes the following modules:

  • Snp2TraitNet – Predicts associations between SNPs and traits using a dual-encoder architecture.
  • LitMiner – Extracts biomedical entities and relations from literature using in-context learning.
  • Emb2KG – Generates embedding representations of biomedical entities and relations and converts them into structured knowledge graphs.
  • GENETViz – Provides interactive visualizations for exploring biomedical entity networks.

Getting Started

To get started, install all four modules (snp2traitnet, litminer, emb2kg, and genetviz), then download the model weights and data files and place them where the modules expect them. You can do all of this by running the following commands:

git clone https://github.com/BiomedSciAI/genet.git
cd genet
chmod +x install.sh
./install.sh
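After install.sh finishes, a quick way to confirm the setup is to check that the command-line entry points used throughout this README resolve on your PATH. A minimal Python sketch; `check_commands` is a hypothetical helper, and listing `ollama` assumes you plan to use local inference.

```python
import shutil

# Entry points invoked later in this README; `ollama` is only needed for local inference.
REQUIRED_COMMANDS = ["run_snp2trait", "run_litminer", "run_emb2kg", "run_genet", "ollama"]


def check_commands(commands):
    """Map each command name to True if shutil.which finds it on PATH."""
    return {cmd: shutil.which(cmd) is not None for cmd in commands}


if __name__ == "__main__":
    for cmd, found in check_commands(REQUIRED_COMMANDS).items():
        print(f"{cmd:15} {'ok' if found else 'MISSING'}")
```

Run it from the same shell you will use for the commands below, since PATH can differ between shells and environments.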

Install Ollama and Pull Model

To enable local inference via Ollama, follow these steps:

  1. Download and install Ollama from https://ollama.com
  2. Pull the model you want to use (this example uses granite4:tiny-h):
ollama pull granite4:tiny-h
  3. Create a .env file in litminer (litminer/.env) and set the model:
OLLAMA_MODEL=granite4:tiny-h
  4. Create apikey.js in GENETViz (genetviz/GENETViz/static/apikey/apikey.js) and set the same model:
const API_CONFIG = {
    'ollama':
    {
        'API_END_POINT': 'http://localhost:11434/api/generate',
        'MODEL': 'granite4:tiny-h'
    }
}
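To check that the model is reachable before running LitMiner or GENETViz, you can send one request to the endpoint configured above. A minimal Python sketch, assuming litminer/.env contains plain KEY=VALUE lines; `read_env`, `build_generate_request`, and `generate` are hypothetical helpers, while the endpoint and the `model`, `prompt`, and `stream` fields belong to Ollama's REST API.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def read_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split once: values may contain '='
        env[key.strip()] = value.strip()
    return env


def build_generate_request(model, prompt):
    """Build a POST request for Ollama's non-streaming generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, timeout=120):
    """Send the request and return the generated text (needs a running Ollama server)."""
    with urllib.request.urlopen(build_generate_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

For example, `generate(read_env(open("litminer/.env").read())["OLLAMA_MODEL"], "Name one gene linked to Marfan syndrome.")` returns the model's reply once the Ollama server is running. Setting `stream` to False makes Ollama return a single JSON object instead of a sequence of partial responses.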

Run Snp2TraitNet

Snp2TraitNet enables the discovery of associations between traits, genes, and SNPs using a dual-encoder model trained on curated biomedical datasets.

Trait to Genes: discover genes from a disease or trait name (e.g., Alzheimer's disease).

Identify genes potentially associated with a given disease or trait. For example, to discover genes linked to Marfan syndrome, run:

run_snp2trait --mode trait2gene --keyword "marfan syndrome" --ckpt_path snp2traitnet/Snp2TraitNet/output/snp2trait/checkpoints/snp2trait-checkpoint.ckpt --data_path snp2traitnet/Snp2TraitNet/datasets/snp2trait.csv --output_path snp2trait.txt

Note: In a shell, wrap a keyword that contains an apostrophe in double quotes (e.g., --keyword "alzheimer's disease"), and escape any double quotes inside it as \".
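Quoting problems go away entirely if you launch the CLI from Python with an argument list, because no shell ever parses the keyword. A minimal sketch; the flags and paths are copied from the commands in this section, while `snp2trait_args` and `query_snp2trait` are hypothetical names.

```python
import shlex
import subprocess

CKPT = "snp2traitnet/Snp2TraitNet/output/snp2trait/checkpoints/snp2trait-checkpoint.ckpt"
DATA = "snp2traitnet/Snp2TraitNet/datasets/snp2trait.csv"
MODES = {"trait2gene", "gene2trait", "snp2trait"}


def snp2trait_args(mode, keyword, output_path="snp2trait.txt"):
    """Return the argv list for one run_snp2trait query."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODES)}")
    return [
        "run_snp2trait", "--mode", mode, "--keyword", keyword,
        "--ckpt_path", CKPT, "--data_path", DATA, "--output_path", output_path,
    ]


def query_snp2trait(mode, keyword, output_path="snp2trait.txt"):
    """Run one query; the keyword reaches the program verbatim, quotes included."""
    subprocess.run(snp2trait_args(mode, keyword, output_path), check=True)
```

To see the equivalent shell command with correct quoting, print `shlex.join(snp2trait_args("trait2gene", "alzheimer's disease"))` (Python 3.8 or later).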

Gene to Traits: find diseases or traits from a gene name (e.g., PCSK9).

Retrieve diseases or traits associated with a specific gene. For example, to find traits linked to PCSK9:

run_snp2trait --mode gene2trait --keyword PCSK9 --ckpt_path snp2traitnet/Snp2TraitNet/output/snp2trait/checkpoints/snp2trait-checkpoint.ckpt --data_path snp2traitnet/Snp2TraitNet/datasets/snp2trait.csv --output_path snp2trait.txt

SNP to Traits: find diseases or traits from a SNP identifier (RS ID, e.g., rs362307).

Discover traits associated with a specific SNP (RS ID). For example, to query rs362307:

run_snp2trait --mode snp2trait --keyword rs362307 --ckpt_path snp2traitnet/Snp2TraitNet/output/snp2trait/checkpoints/snp2trait-checkpoint.ckpt --data_path snp2traitnet/Snp2TraitNet/datasets/snp2trait.csv --output_path snp2trait.txt
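The three modes differ only in the kind of keyword they expect. If you script many queries, a helper can pick the mode from the keyword's shape. `guess_mode` and its heuristics are hypothetical and not part of GENET: it assumes RS IDs are "rs" followed by digits and that gene symbols are short uppercase alphanumeric strings such as PCSK9.

```python
import re

RS_ID = re.compile(r"rs\d+", re.IGNORECASE)      # e.g. rs362307
GENE_SYMBOL = re.compile(r"[A-Z][A-Z0-9-]{1,14}")  # e.g. PCSK9, APOE


def guess_mode(keyword):
    """Return the run_snp2trait --mode value that best fits `keyword`."""
    keyword = keyword.strip()
    if RS_ID.fullmatch(keyword):
        return "snp2trait"
    if GENE_SYMBOL.fullmatch(keyword):
        return "gene2trait"
    return "trait2gene"  # anything else is treated as a disease/trait name
```

The gene heuristic misfires on uppercase trait abbreviations such as ALS, so pass the mode explicitly when a keyword is ambiguous.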

Run LitMiner

LitMiner performs literature-based mining to extract biomedical entities and relationships from PubMed abstracts using keyword-driven queries and LLM-based inference.

Keyword-Based Extraction

To search for articles and extract entities and relations using a specific keyword (e.g., Alkaptonuria):

nohup run_litminer --query "Alkaptonuria" --retmax 100 --backend ollama > output.log 2>&1 &
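The --retmax flag caps how many articles are retrieved per query. Its name matches the `retmax` parameter of NCBI's E-utilities `esearch` endpoint, which returns at most that many PubMed IDs; that LitMiner searches PubMed this way is an assumption, not something this README states. The sketch shows the equivalent search for the Alkaptonuria example; `esearch_url` and `search_pmids` are hypothetical names, while the endpoint and parameters are NCBI's.

```python
import json
import urllib.parse
import urllib.request

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(query, retmax=100):
    """Build an esearch URL that asks PubMed for at most `retmax` matching IDs."""
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urllib.parse.urlencode(params)}"


def search_pmids(query, retmax=100):
    """Return up to `retmax` PubMed IDs for `query` (requires network access)."""
    with urllib.request.urlopen(esearch_url(query, retmax), timeout=30) as resp:
        return json.load(resp)["esearchresult"]["idlist"]
```

NCBI rate-limits clients that do not send an API key, so keep direct calls like `search_pmids("Alkaptonuria", 100)` infrequent.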

Pipeline-Based Extraction

To use the Snp2TraitNet output file (snp2trait.txt) as the query input for literature mining:

nohup run_litminer --query snp2trait.txt --retmax 100 --backend ollama > output.log 2>&1 &
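Snp2TraitNet and LitMiner can be chained in one script: the first step writes snp2trait.txt, and the second reads it as the query. A minimal sketch that runs the steps in order and stops at the first failure; the flags and paths come from this README, while `pipeline_steps` and `run_pipeline` are hypothetical names. Unlike the `nohup ... &` commands above, it runs in the foreground, so wrap it in nohup yourself if it must survive logout.

```python
import subprocess

CKPT = "snp2traitnet/Snp2TraitNet/output/snp2trait/checkpoints/snp2trait-checkpoint.ckpt"
DATA = "snp2traitnet/Snp2TraitNet/datasets/snp2trait.csv"


def pipeline_steps(mode, keyword, retmax=100, backend="ollama", hits_file="snp2trait.txt"):
    """Return the argv lists for Snp2TraitNet followed by LitMiner."""
    return [
        ["run_snp2trait", "--mode", mode, "--keyword", keyword,
         "--ckpt_path", CKPT, "--data_path", DATA, "--output_path", hits_file],
        ["run_litminer", "--query", hits_file, "--retmax", str(retmax), "--backend", backend],
    ]


def run_pipeline(steps, log_path="output.log"):
    """Run each step in order, appending stdout and stderr to one log file."""
    with open(log_path, "a") as log:
        for argv in steps:
            # check=True raises CalledProcessError and stops the pipeline on failure.
            subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT, check=True)
```

For example, `run_pipeline(pipeline_steps("trait2gene", "marfan syndrome"))` reproduces the Marfan syndrome query and then mines the literature for its results.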

Run Emb2KG

Emb2KG transforms extracted entities and relations into structured knowledge graphs, generates embeddings, and performs clustering analysis.

run_emb2kg --input_path litminer/LitMiner/output/msx --output_path genetviz/GENETViz/static/data --n_clusters 5
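This README does not say which clustering algorithm Emb2KG applies for --n_clusters. As an illustration only, the sketch below is plain k-means (Lloyd's algorithm) over embedding vectors, the kind of grouping a fixed cluster count such as 5 usually implies; treating Emb2KG's method as k-means is an assumption. It initializes with the first k vectors to stay deterministic, whereas production libraries typically use k-means++ seeding.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, n_clusters, n_iter=100):
    """Cluster `vectors` (lists of floats) and return one label per vector."""
    if not 0 < n_clusters <= len(vectors):
        raise ValueError("n_clusters must be between 1 and the number of vectors")
    # Deterministic start: the first n_clusters vectors become the centroids.
    centroids = [list(v) for v in vectors[:n_clusters]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(n_clusters), key=lambda k: squared_distance(v, centroids[k]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(n_clusters):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Each label is a cluster index from 0 to n_clusters - 1, which a visualization can map to a node color.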

Run GENET Visualization

run_genet

Then open your browser and navigate to http://127.0.0.1:11748
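run_genet may take a moment before it starts serving. If you script the launch, a small helper can wait until the port accepts connections and then open the page. `port_open` and `wait_and_open` are hypothetical, and a TCP check only shows that something is listening on port 11748, not that GENETViz has finished loading.

```python
import socket
import time
import webbrowser

GENET_HOST, GENET_PORT = "127.0.0.1", 11748


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_and_open(host=GENET_HOST, port=GENET_PORT, attempts=30, delay=1.0):
    """Poll until the server accepts connections, then open it in the default browser."""
    for _ in range(attempts):
        if port_open(host, port):
            webbrowser.open(f"http://{host}:{port}")
            return True
        time.sleep(delay)
    return False
```

Start run_genet in another terminal (or in the background), then call `wait_and_open()`; it gives up and returns False after about 30 seconds.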

Citations

If you find our work useful for your research, please cite:

@misc{kwon2025genet,
    title={GENET: AI-Powered Interactive Visualization Workflows to Explore Biomedical Entity Networks},
    author={Bum Chul Kwon and Natasha Mulligan and Joao Bettencourt-Silva and Ta-Hsin Li and Bharath Dandala and Feng Lin and Ching-Huei Tsou and Pablo Meyer},
    year={2025},
    doi = {10.64898/2025.12.12.694029},
    eprint = {https://www.biorxiv.org/content/early/2025/12/16/2025.12.12.694029.full.pdf},
    journal = {bioRxiv}
}
