yoramzarai/ProteinDmn

Retrieve Protein Domains

Retrieves protein domains from UniProt for a list of Ensembl transcript IDs.

Given a list of Ensembl transcript IDs (i.e. ENST IDs), we:

  1. Retrieve the corresponding protein ID, gene name and ID, UniProt ID, and UniProt URL.
  2. Retrieve the protein domains from UniProt.
  3. Generate an Excel or CSV file containing the IDs and protein domains.
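Step 2 can be pictured with a minimal sketch. It assumes the UniProt entry has already been fetched as JSON and that domain annotations sit in a `features` list whose items carry `type`, `description`, and `location` fields. The function name `extract_domains` and the sample record are illustrative only; they are not taken from this repository, which may extract domains differently.

```python
from typing import Any


def extract_domains(entry: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect domain features from a UniProt-style JSON entry (assumed layout)."""
    domains = []
    for feature in entry.get("features", []):
        if feature.get("type") != "Domain":
            continue  # skip non-domain annotations such as regions or sites
        location = feature.get("location", {})
        domains.append(
            {
                "name": feature.get("description", ""),
                "start": location.get("start", {}).get("value"),
                "end": location.get("end", {}).get("value"),
            }
        )
    return domains


# Hand-made record for illustration only; this is not real UniProt data.
sample_entry = {
    "features": [
        {"type": "Domain", "description": "Example domain",
         "location": {"start": {"value": 10}, "end": {"value": 50}}},
        {"type": "Region", "description": "Example region",
         "location": {"start": {"value": 60}, "end": {"value": 70}}},
    ]
}
domains = extract_domains(sample_entry)
```

Only the feature marked as a domain is kept; each result holds the domain name and its start and end positions in the protein sequence.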

Configuration File

./config/config.toml

Input

A text file or a CSV file containing the Ensembl transcript IDs (the file name is defined in ./config/config.toml). In a text file, each transcript ID is listed on a separate line. For a CSV file, the name of the column holding the transcript IDs and the CSV delimiter are both defined in ./config/config.toml.

An example of an input text file:

ENST00000288135
ENST00000302278
ENST00000559488
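Reading both input formats could look roughly like the sketch below. The setting names `csv_column` and `csv_delimiter` are hypothetical stand-ins for whatever keys ./config/config.toml actually uses, and the helper functions are not part of this repository. In-memory strings replace real files so the snippet runs on its own.

```python
import csv
import io

# Hypothetical settings; the real key names are defined in ./config/config.toml.
settings = {"csv_column": "transcript_id", "csv_delimiter": ";"}


def read_ids_from_text(handle) -> list[str]:
    """One transcript ID per line; blank lines are ignored."""
    return [line.strip() for line in handle if line.strip()]


def read_ids_from_csv(handle, column: str, delimiter: str) -> list[str]:
    """Read transcript IDs from the named column of a delimited file."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return [row[column].strip() for row in reader if row[column].strip()]


# Text input: the three IDs from the example above.
text_ids = read_ids_from_text(
    io.StringIO("ENST00000288135\nENST00000302278\nENST00000559488\n")
)

# CSV input: the same kind of IDs in a named column with a custom delimiter.
csv_ids = read_ids_from_csv(
    io.StringIO("transcript_id;note\nENST00000288135;a\nENST00000302278;b\n"),
    column=settings["csv_column"],
    delimiter=settings["csv_delimiter"],
)
```

With real files, the `io.StringIO` objects would be replaced by handles from `open(path, newline="")`.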

Output

An Excel or CSV file containing the corresponding protein domains (the file name is defined in ./config/config.toml).
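As a rough picture of the CSV variant, the sketch below writes one row per transcript using only the standard library. The column names are hypothetical, loosely following the fields listed in the overview, and the row values are placeholders rather than real results. The actual output layout is determined by the repository code.

```python
import csv
import io

# Hypothetical column names; the real output may use different headers.
COLUMNS = ["transcript_id", "protein_id", "gene_name", "gene_id",
           "uniprot_id", "uniprot_url", "domains"]


def write_rows(handle, rows: list[dict[str, str]]) -> None:
    """Write a header line followed by one line per transcript."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Placeholder values only; nothing here is retrieved from Ensembl or UniProt.
rows = [{
    "transcript_id": "ENST00000559488",
    "protein_id": "<protein id>",
    "gene_name": "<gene name>",
    "gene_id": "<gene id>",
    "uniprot_id": "<uniprot id>",
    "uniprot_url": "<uniprot url>",
    "domains": "<domain A>; <domain B>",
}]

buffer = io.StringIO()
write_rows(buffer, rows)
csv_text = buffer.getvalue()
```

An Excel file would presumably be produced through pandas and XlsxWriter, both listed under Requirements; that path is not shown here.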

Execution Flow

  1. Set the configuration parameters in ./config/config.toml
  2. Run ./main.py

Requirements

  1. Python >= 3.11
  2. pandas
  3. XlsxWriter
  4. requests
  5. toml
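A quick way to check these requirements before running ./main.py is sketched below. The helper `missing_modules` is illustrative and not part of this repository. Note that the XlsxWriter package is imported under the name `xlsxwriter`.

```python
import importlib.util
import sys

# Import names of the packages listed above.
REQUIRED_MODULES = ["pandas", "xlsxwriter", "requests", "toml"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


python_ok = sys.version_info >= (3, 11)
missing = missing_modules(REQUIRED_MODULES)

if not python_ok:
    print("Python 3.11 or newer is required.")
if missing:
    print("Missing packages:", ", ".join(missing))
```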

Additional Standalone Features

Converting Ensembl ID to UniProt ID.

# Example
import Utils.uniprot_utils as uput
uniprot_id: str = uput.ensembl_id2uniprot_id('ENST00000559488')
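Before calling the conversion, it may help to reject strings that do not look like Ensembl transcript IDs. The sketch below assumes the usual stable-ID shape: `ENS`, an optional species prefix, `T`, eleven digits, and an optional version suffix. The helper `is_transcript_id` is hypothetical and not part of `Utils.uniprot_utils`.

```python
import re

# Assumed pattern: "ENS" + optional species letters + "T" + 11 digits + optional ".version".
TRANSCRIPT_ID = re.compile(r"ENS[A-Z]*T\d{11}(\.\d+)?")


def is_transcript_id(value: str) -> bool:
    """Return True if the string has the shape of an Ensembl transcript ID."""
    return TRANSCRIPT_ID.fullmatch(value.strip()) is not None


candidates = ["ENST00000559488", "ENST00000559488.5", "P05106"]
valid = [c for c in candidates if is_transcript_id(c)]
```

This only checks the format; it does not confirm that the ID exists in Ensembl.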

Converting UniProt ID to Ensembl ID.

# Example
import Utils.uniprot_utils as uput
ensembl_id: str = uput.uniprot_id2ensembl_id('P05106')

Retrieving the AA sequence.

# Example
import Utils.uniprot_utils as uput
aa_seq: str = uput.AA_seq('P05106')
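Once the sequence is available as a string, it can be written in FASTA format. The sketch below is one way to do that; `to_fasta` is a hypothetical helper, and the sequence is a made-up placeholder, not the sequence of any real protein.

```python
def to_fasta(identifier: str, sequence: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record with fixed-width lines."""
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + identifier + "\n" + "\n".join(chunks) + "\n"


# Placeholder sequence built from the 20 standard amino-acid letters.
placeholder_seq = "ACDEFGHIKLMNPQRSTVWY" * 4
fasta = to_fasta("example", placeholder_seq)
```

In practice, `placeholder_seq` would be replaced by the string returned from `uput.AA_seq`.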

Retrieving cross-reference information.

# Example
import pandas as pd
import Utils.uniprot_utils as uput
df_cross_reference: pd.DataFrame = uput.get_CrossReferences_databases_info('P05106')

Retrieving all protein data.

# Example
import Utils.uniprot_utils as uput
protein_data: dict = uput.lookup_protein_data('P05106')

