Retrieve Protein Domains

Retrieves protein domains from UniProt for a list of Ensembl transcript IDs.

Given a list of Ensembl transcript IDs (ENST IDs), we:

1. Retrieve the corresponding protein ID, gene name and ID, UniProt ID, and UniProt URL.
2. Retrieve the protein domains from UniProt.
3. Generate an Excel or CSV file containing the IDs and protein domains.
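
The domain-retrieval step can be sketched as follows. This is a minimal illustration, not the code in this repository. It assumes a UniProtKB entry in the JSON form returned by the UniProt REST API, where each item in `features` carries a `type`, a `description`, and a `location` with `start` and `end` values. The function name `extract_domains` and the output keys are hypothetical.

```python
from typing import Any


def extract_domains(entry: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect the 'Domain' features of a UniProtKB JSON entry (assumed layout)."""
    domains = []
    for feature in entry.get("features", []):
        if feature.get("type") != "Domain":
            continue  # skip other feature types such as 'Chain' or 'Region'
        location = feature.get("location", {})
        domains.append(
            {
                "name": feature.get("description", ""),
                "start": location.get("start", {}).get("value"),
                "end": location.get("end", {}).get("value"),
            }
        )
    return domains
```

Returning plain dictionaries keeps the result easy to load into a pandas DataFrame before writing the output file.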

Configuration File

./config/config.toml

Input

A text file or a CSV file containing the Ensembl transcript IDs. The file name is defined in ./config/config.toml. In a text file, each transcript ID is listed on a separate line. For a CSV file, the name of the column holding the transcript IDs and the CSV delimiter are also defined in ./config/config.toml.

An example of an input text file:

ENST00000288135
ENST00000302278
ENST00000559488
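
Reading either input format could look roughly like the sketch below. It uses only the standard library and is not the loader used by main.py. The function name `read_transcript_ids` and its parameters are hypothetical stand-ins for the values set in the configuration file.

```python
import csv
from pathlib import Path


def read_transcript_ids(path, column=None, delimiter=","):
    """Return transcript IDs from a text file (column=None) or a CSV column."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        if column is None:
            # Text file: one ID per line; skip blank lines and trim whitespace.
            return [line.strip() for line in handle if line.strip()]
        # CSV file: read the named column using the given delimiter.
        reader = csv.DictReader(handle, delimiter=delimiter)
        return [row[column].strip() for row in reader if row.get(column)]
```

For a text file, call `read_transcript_ids(path)`. For a CSV file, pass the column name and, if needed, the delimiter.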

Output

An Excel or CSV file containing the corresponding protein domains (file name defined in ./config/config.toml).

Execution Flow

1. Set the configuration parameters in ./config/config.toml
2. Run ./main.py

Requirements

Python >= 3.11
pandas
XlsxWriter
requests
toml

Additional Stand Alone Features

Converting Ensembl ID to UniProt ID.

# Example
import Utils.uniprot_utils as uput
uniprot_id: str = uput.ensembl_id2uniprot_id('ENST00000559488')

Converting UniProt ID to Ensembl ID.

# Example
import Utils.uniprot_utils as uput
ensembl_id: str = uput.uniprot_id2ensembl_id('P05106')

Retrieving the AA sequence.

# Example
import Utils.uniprot_utils as uput
aa_seq: str = uput.AA_seq('P05106')

Retrieving cross-reference information.

# Example
import pandas as pd
import Utils.uniprot_utils as uput
df_cross_reference: pd.DataFrame = uput.get_CrossReferences_databases_info('P05106')

Retrieving all protein data.

# Example
import Utils.uniprot_utils as uput
protein_data: dict = uput.lookup_protein_data('P05106')
