(This entry was updated 19 Sep 2018 to reflect recent feature updates)
pyHam (‘python HOG analysis method’) makes it possible to extract useful information from HOGs encoded in standard OrthoXML format. It is available both as a Python library and as a set of command-line scripts. Input HOGs in OrthoXML format are available from multiple bioinformatics resources, including OMA, Ensembl and HieranoiDB.
This post is a brief primer on pyHam, with an emphasis on what it can do for you.
How to get pyHam?
pyHam is available as a Python package on PyPI and is compatible with both Python 2 and Python 3. You can install it with pip using the following command:
pip install pyham
What are Hierarchical Orthologous Groups (HOGs)?
If you don’t know what HOGs are and are eager to change this, we have an explanatory video about them just for you:
You can learn more about this in our previous blog post.
Where to find HOGs?
HOGs inferred on public genomes can be downloaded from the OMA orthology database. Other databases, such as EggNOG, OrthoDB or HieranoiDB, also infer HOGs, but not all of them offer HOGs in OrthoXML format. You can check which databases serve HOGs as OrthoXML here. If you want to infer HOGs from your own genomes, you can use the OMA standalone software.
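To give a feel for what an OrthoXML file contains, here is a minimal sketch that reads a tiny hand-written document with Python's standard library rather than with pyHam. The gene identifiers, the `toy` database name and the single group are made up for illustration, and the helper `summarise` is hypothetical; real files are far larger, and pyHam does this parsing for you.

```python
import xml.etree.ElementTree as ET

OX = "{http://orthoXML.org/2011/}"          # OrthoXML namespace in Clark notation
NS = {"ox": "http://orthoXML.org/2011/"}    # same namespace, for path queries

# Toy OrthoXML document (hypothetical genes and database name).
TOY_ORTHOXML = """<orthoXML xmlns="http://orthoXML.org/2011/" version="0.3"
          origin="toy" originVersion="1">
  <species name="HUMAN" NCBITaxId="9606">
    <database name="toy" version="1">
      <genes>
        <gene id="1" protId="HUMAN_A1"/>
        <gene id="2" protId="HUMAN_A2"/>
      </genes>
    </database>
  </species>
  <species name="RAT" NCBITaxId="10116">
    <database name="toy" version="1">
      <genes>
        <gene id="3" protId="RAT_A"/>
      </genes>
    </database>
  </species>
  <groups>
    <orthologGroup id="1">
      <property name="TaxRange" value="Euarchontoglires"/>
      <geneRef id="3"/>
      <paralogGroup>
        <geneRef id="1"/>
        <geneRef id="2"/>
      </paralogGroup>
    </orthologGroup>
  </groups>
</orthoXML>
"""


def summarise(xml_text):
    """Return (group id, taxonomic range, member genes) for each top-level group."""
    root = ET.fromstring(xml_text)

    # Map each internal gene id to (species name, protein id).
    genes = {}
    for species in root.findall("ox:species", NS):
        for gene in species.iter(OX + "gene"):
            genes[gene.get("id")] = (species.get("name"), gene.get("protId"))

    summary = []
    for group in root.findall("ox:groups/ox:orthologGroup", NS):
        tax = group.find("ox:property[@name='TaxRange']", NS)
        tax_range = tax.get("value") if tax is not None else None
        # iter() walks nested ortholog/paralog groups in document order.
        members = [genes[ref.get("id")] for ref in group.iter(OX + "geneRef")]
        summary.append((group.get("id"), tax_range, members))
    return summary


for group_id, tax_range, members in summarise(TOY_ORTHOXML):
    print(group_id, tax_range, members)
```

The nesting of `orthologGroup` and `paralogGroup` elements is what encodes the hierarchy: in this toy group, the two human genes sit in a paralog group because they arose by duplication.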
To make it easier to use pyHam on a single gene family, we provide the option to let pyHam fetch the required data directly from a compatible database (for now, only OMA supports this feature). You simply give the ID of a gene inside the gene family (HOG) of interest, along with the name of the compatible database to fetch the data from, and pyHam will do the rest.
For example, if you are interested in the P53 gene in rat (P53 rat gene page in OMA), you simply run the following Python code to set up your pyHam session:
import pyham

my_gene_query = 'P53_RAT'
database_to_query = 'oma'
pyham_analysis = pyham.Ham(query_database=my_gene_query, use_data_from=database_to_query)
How does pyHam help you investigate HOGs?
The main features of pyHam are: (i) given a clade of interest, extract all the relevant HOGs, each of which ideally corresponds to a distinct ancestral gene in the last common ancestor of the clade; (ii) given a branch on the species tree, report the HOGs that duplicated on the branch, got lost on the branch, first appeared on that branch, or were simply retained; (iii) repeat the previous point along the entire species tree, and plot an overview of the gene evolution dynamics along the tree; and (iv) given a set of nested HOGs for a specific gene family of interest, generate a local iHam web page to visualize its evolutionary history.
What is the number of genes in a particular ancestral genome? (i)
In pyHam, each ancestral genome is attached to one specific internal node of the input species tree and is denoted by the name of that taxon. Its ancestral genes are then inferred by fetching all the HOGs defined at that level.
# Get the ancestral genome by name
rodents_genome = ham_analysis.get_ancestral_genome_by_name("Rodents")
# Get the related ancestral genes (HOGs)
rodents_ancestral_genes = rodents_genome.genes
# Get the number of ancestral genes at the level of Rodents
print(len(rodents_ancestral_genes))
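The idea of "fetching all the HOGs at the same level" can be illustrated with a toy model that does not use pyHam at all. In this sketch, `ToyHOG` and `ancestral_genes_at` are hypothetical names, and the families are invented; it only shows why the number of ancestral genes can differ from one level to the next.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHOG:
    """Toy stand-in for a HOG: one ancestral gene at one taxonomic level."""
    hog_id: str
    level: str
    children: list = field(default_factory=list)  # sub-HOGs at more recent levels


def ancestral_genes_at(families, level):
    """Collect the ids of every HOG defined at `level`, across all families."""
    found = []
    stack = list(families)
    while stack:
        hog = stack.pop()
        if hog.level == level:
            found.append(hog.hog_id)
        stack.extend(hog.children)
    return sorted(found)


# Two invented gene families rooted at Mammalia.
families = [
    ToyHOG("fam1", "Mammalia", [
        ToyHOG("fam1.a", "Rodents"),
        ToyHOG("fam1.b", "Rodents"),  # fam1 duplicated before Rodents
    ]),
    ToyHOG("fam2", "Mammalia", [ToyHOG("fam2.a", "Rodents")]),
]

print(len(ancestral_genes_at(families, "Mammalia")))  # 2
print(len(ancestral_genes_at(families, "Rodents")))   # 3
```

Because one family duplicated on the way to Rodents, the toy Rodents ancestor carries three genes while the Mammalia ancestor carries two.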
How can I figure out the evolutionary history of genes in a given genome? (ii)
pyHam provides a feature to trace HOGs/genes along a branch that spans one or multiple taxonomic ranges, and to report the HOGs that duplicated on this branch, were lost on it, first appeared on it, or were simply retained. The ‘vertical map’ (see further information on maps here) allows you to retrieve all genes and their evolutionary history between two taxonomic levels (i.e. which genes have been duplicated, which genes have been lost, etc.).
# Get the genomes of interest
human = ham_analysis.get_extant_genome_by_name("HUMAN")
vertebrates = ham_analysis.get_ancestral_genome_by_name("Vertebrata")
# Instantiate the gene mapping (the order of the two genomes doesn't matter)
vertical_human_vertebrates = ham_analysis.compare_genomes_vertically(human, vertebrates)
# The identical genes (that stayed single copy):
# one HOG at vertebrates -> one descendant gene in human
vertical_human_vertebrates.get_retained()
# The duplicated genes:
# one HOG at vertebrates -> list of its descendant genes in human
vertical_human_vertebrates.get_duplicated()
# The gained genes (that emerged in between):
# list of genes that appeared after the vertebrates taxon
vertical_human_vertebrates.get_gained()
# The lost genes (that were lost in between):
# list of HOGs at vertebrates that were lost before the human taxon
vertical_human_vertebrates.get_lost()
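The four categories can be pictured with a small stand-alone sketch. It is not pyHam's implementation: the function `classify_vertically`, the HOG names and the gene names are all hypothetical, and the input is simply assumed to be a mapping from each ancestral HOG to its descendant genes in the more recent genome.

```python
def classify_vertically(hog_to_descendants, descendant_genome):
    """Sort ancestral HOGs and descendant genes into the four categories.

    hog_to_descendants: dict mapping an ancestral HOG id to the list of its
        descendant genes in the more recent genome (empty list if none).
    descendant_genome: list of all genes in the more recent genome.
    """
    retained, duplicated, lost = {}, {}, []
    accounted_for = set()
    for hog, genes in hog_to_descendants.items():
        accounted_for.update(genes)
        if not genes:
            lost.append(hog)                # no descendant left
        elif len(genes) == 1:
            retained[hog] = genes[0]        # stayed single copy
        else:
            duplicated[hog] = list(genes)   # several descendants
    # Genes with no ancestor at the older level emerged in between.
    gained = sorted(set(descendant_genome) - accounted_for)
    return retained, duplicated, gained, sorted(lost)


# Invented example: three ancestral HOGs, four genes in the descendant genome.
mapping = {"HOG_A": ["h1"], "HOG_B": ["h2", "h3"], "HOG_C": []}
human_genes = ["h1", "h2", "h3", "h4"]

retained, duplicated, gained, lost = classify_vertically(mapping, human_genes)
print(retained)    # {'HOG_A': 'h1'}
print(duplicated)  # {'HOG_B': ['h2', 'h3']}
print(gained)      # ['h4']
print(lost)        # ['HOG_C']
```

The shapes of the four results follow the comments in the pyHam snippet above: one gene per retained HOG, a list of genes per duplicated HOG, and plain lists for gained genes and lost HOGs.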
How can I get an overview of the gene evolution dynamics that occurred along the tree in my genomic setup? (iii)
pyHam includes treeProfile (an extension of the Phylo.io tool), a tool to visualise a species tree annotated with evolutionary events (gene duplications, losses, gains) mapped to their related taxonomic range. The aim is to provide a minimalist and intuitive way to visualise the number of evolutionary events that occurred on each branch or the number of ancestral genes along the species tree.
# Create a local tree profile web page
treeprofile = ham_analysis.create_tree_profile(outfile="treeprofile_example.html")
As you can see in the figure above, the tree profile shows the reference species used to perform the pyHam analysis. Each internal node is displayed with a histogram of the phylogenetic events (number of genes duplicated, lost, gained, or retained) that occurred on the corresponding branch. The tree profile displays either the number of genes resulting from phylogenetic events or the number of phylogenetic events themselves; you can switch between the two by opening the settings panel (histogram icon at the top right) and selecting ‘genes’ or ‘events’.
How can I visualise the evolutionary history of a gene family (HOG)? (iv)
pyHam embeds iHam, an interactive tool to visualise gene family evolutionary history. It provides a way to trace the evolution of genes in terms of duplications and losses, from ancient ancestors to modern day species.
# Select a HOG
hog_of_interest = pyham_analysis.get_hog_by_id(2)
# Create and export the HOG visualisation as .html
output_filename = "hogvis_example.html"
pyham_analysis.create_hog_visualisation(hog=hog_of_interest, outfile=output_filename)
Then, simply double-click the .html file to open it in your default web browser. Below is an example of what you should see. A brief video tutorial on iHam is available at this URL.
iHam is composed of two panels: a species tree that lets you select the taxonomic range of interest, and a gene panel where each grey square represents an extant gene and each row a species.
We can see, for example, that at the level of mammals (click on the related node and select ‘Freeze at this node’) all genes of this gene family descend from a single common ancestral gene.
Now, if we look at the level of Euarchontoglires (repeat the same procedure as for mammals to freeze the visualisation at this level), we observe that the genes are now split by a vertical line. This vertical line separates two groups of genes, each descending from its own single ancestral gene. This is the result of a duplication between Mammals and Euarchontoglires.
This small example demonstrates how simple and useful iHam is for identifying evolutionary events that occurred in gene families (e.g. when a duplication occurred, which species have lost genes, or how big gene families evolved).
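What "freezing" at a level does to the gene panel can be mimicked with a few lines of plain Python. This is only a sketch under stated assumptions: the nested dictionaries, the gene names and the helper functions `extant_genes` and `groups_at` are invented for illustration and are not part of iHam or pyHam, and the sketch only returns genes that belong to the frozen clade.

```python
def extant_genes(hog):
    """List all extant genes below a toy HOG, including those in sub-HOGs."""
    genes = list(hog.get("genes", []))
    for sub in hog.get("subhogs", []):
        genes.extend(extant_genes(sub))
    return genes


def groups_at(hog, level):
    """Return one sorted gene group per ancestral gene found at `level`."""
    if hog["level"] == level:
        return [sorted(extant_genes(hog))]
    groups = []
    for sub in hog.get("subhogs", []):
        groups.extend(groups_at(sub, level))
    return groups


# Invented family: one gene at Mammalia, duplicated before Euarchontoglires.
family = {
    "level": "Mammalia",
    "subhogs": [
        {"level": "Euarchontoglires", "genes": ["HUMAN_1", "MOUSE_1", "RAT_1"]},
        {"level": "Euarchontoglires", "genes": ["HUMAN_2", "MOUSE_2"]},
        {"level": "Laurasiatheria", "genes": ["CANLF_1"]},
    ],
}

print(groups_at(family, "Mammalia"))          # one group containing all six genes
print(groups_at(family, "Euarchontoglires"))  # two groups, one per ancestral gene
```

Each returned group corresponds to one block of genes in the panel, and the boundary between two groups plays the role of the vertical line.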