This post is a brief primer on pyham, with an emphasis on what it can do for you.
How to get pyham?
Pyham is available as a Python package on PyPI and is compatible with both Python 2 and Python 3. You can install it with pip using the following bash command:
pip install pyham
You can check the official pyham website for further information about how to use pyham, the documentation and other related resources.
What are Hierarchical Orthologous Groups (HOGs)?
If you don’t know what HOGs are and you are eager to change this, we have an explanatory video about them just for you:
You can learn more about this in our previous blog post.
Where to find HOGs?
HOGs inferred on public genomes can be downloaded from the OMA orthology database. Other databases, such as EggNOG, OrthoDB or HieranoiDB, also infer HOGs, but not all of them offer HOGs in orthoXML format. If you want to infer HOGs on your own genomes, you can use the OMA standalone software.
How does pyham help you investigate HOGs?
As input, pyham takes an orthoXML file containing HOGs and the related species tree. Pyham creates gene and genome objects based on the information extracted from the input files and provides an API to work directly on those phylogenetic objects (easy queries based on name or phylogenetic relations). The input species tree serves as a guide to define evolutionary relationships between genes and genomes.
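To give a feel for what such an input looks like, here is a minimal sketch that reads a tiny orthoXML document with the Python standard library only. The document itself is invented for illustration (toy gene names, one HOG), and `summarise` is a hypothetical helper; pyham does its own, much richer parsing, so treat this as a peek at the file format rather than as pyham's implementation.

```python
import xml.etree.ElementTree as ET

# Toy orthoXML document, invented for illustration (not real data).
ORTHOXML = """<orthoXML xmlns="http://orthoXML.org/2011/" version="0.3" origin="toy" originVersion="1">
  <species name="Human" NCBITaxId="9606">
    <database name="toy" version="1">
      <genes>
        <gene id="1" protId="HUMAN_A1"/>
        <gene id="2" protId="HUMAN_A2"/>
      </genes>
    </database>
  </species>
  <species name="Mouse" NCBITaxId="10090">
    <database name="toy" version="1">
      <genes>
        <gene id="3" protId="MOUSE_A"/>
      </genes>
    </database>
  </species>
  <groups>
    <orthologGroup id="1">
      <property name="TaxRange" value="Euarchontoglires"/>
      <geneRef id="3"/>
      <paralogGroup>
        <geneRef id="1"/>
        <geneRef id="2"/>
      </paralogGroup>
    </orthologGroup>
  </groups>
</orthoXML>
"""

NS = {"ox": "http://orthoXML.org/2011/"}


def summarise(xml_text):
    """Return (genes per species, taxonomic range of each top-level HOG)."""
    root = ET.fromstring(xml_text)
    genes_per_species = {
        species.get("name"): len(species.findall(".//ox:gene", NS))
        for species in root.findall("ox:species", NS)
    }
    top_level_hogs = root.findall("ox:groups/ox:orthologGroup", NS)
    tax_ranges = [
        hog.find("ox:property[@name='TaxRange']", NS).get("value")
        for hog in top_level_hogs
    ]
    return genes_per_species, tax_ranges


genes_per_species, tax_ranges = summarise(ORTHOXML)
print(genes_per_species)
print(tax_ranges)
```

In this toy file the two human genes sit in a paralogGroup nested inside the orthologGroup, which is how a duplication in the human lineage is encoded.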
How can I figure out the evolutionary history of genes in a given genome?
Pyham provides a mapper object for HOGs/genes across multiple taxonomic ranges. Remember that each HOG at a given taxonomic level corresponds to one gene in that particular ancestral genome. The idea of pyham is to map the HOGs of an ancestral genome to the HOGs/genes of its descendant genomes. The vertical mapper object allows you to retrieve all genes and their evolutionary history between the two taxonomic levels (i.e. which genes have been duplicated, which genes have been lost, etc.).
compare_human_mammals = pyham_analysis.compare_genomes_vertically("Human", "Mammals")

# Mammals HOGs with their single copy human descendant genes
compare_human_mammals.get_identical()

# Mammals HOGs that have been lost in between the two levels
compare_human_mammals.get_lost()

# Human genes that have been "gained" in between the two levels
compare_human_mammals.get_gained()

# Mammals HOGs with their multiple copy human descendant genes
compare_human_mammals.get_duplicated()
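The four categories above can be illustrated with a small stand-alone sketch. It does not use pyham and is not how pyham implements its mapper; `classify_vertically` and the toy identifiers are hypothetical, and the only assumption is that we already know which descendant genes each ancestral HOG maps to.

```python
def classify_vertically(ancestral_to_descendants, descendant_genes):
    """Sort ancestral HOGs and descendant genes into the four categories.

    ancestral_to_descendants: dict mapping an ancestral HOG id to the list
        of descendant gene ids it maps to (empty list if none survive).
    descendant_genes: all gene ids present in the descendant genome.
    """
    result = {"identical": {}, "duplicated": {}, "lost": [], "gained": []}
    mapped = set()
    for hog, genes in ancestral_to_descendants.items():
        mapped.update(genes)
        if len(genes) == 0:
            result["lost"].append(hog)                # no descendant left
        elif len(genes) == 1:
            result["identical"][hog] = genes[0]       # single copy kept
        else:
            result["duplicated"][hog] = list(genes)   # several copies
    # Descendant genes without any ancestral HOG count as "gained".
    result["gained"] = sorted(g for g in descendant_genes if g not in mapped)
    return result


# Toy data, invented for illustration.
toy_mapping = {"HOG_A": ["h1"], "HOG_B": ["h2", "h3"], "HOG_C": []}
toy_genome = ["h1", "h2", "h3", "h4"]
print(classify_vertically(toy_mapping, toy_genome))
```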
What are the genes in an extant genome that have been ancestrally duplicated?
We can build on the mapper object described previously. In this case we compare the genome of interest with its ancestral parent and retrieve the duplicated genes that are specific to this branch. For example, we can find the genes in human that were duplicated sometime between the speciation of tetrapods and the speciation of mammals.
compare_mamm_tetra = pyham_analysis.compare_genomes_vertically("Mammals", "Tetrapods")

# HOGs that duplicated between Tetrapods and Mammals
mammals_specific_dupl_hogs = compare_mamm_tetra.get_duplicated()

# Keep only the human genes that descend from those HOGs
human_genes_duplicated_before_mammals_speciation = []
for hog in mammals_specific_dupl_hogs:
    for gene in hog.get_descendant_genes():
        if gene.genome.name == "Human":
            human_genes_duplicated_before_mammals_speciation.append(gene)
What is the number of genes in a particular ancestral genome?
Ancestral genome objects act as a proxy to fetch all HOGs at a specific taxon.
# return an ancestral genome object
mammals_genome = pyham_analysis.get_ancestral_genome_by_name("Mammals")

# count the HOGs (i.e. ancestral genes) in this ancestral genome
number_ancestral_genes_mammals = len(mammals_genome.genes)
How can I visualise the evolutionary history of a gene family (HOG)?
Pyham embeds HogVis, an interactive tool to visualise gene family evolutionary history. It provides a way to trace the evolution of genes in terms of duplications and losses, from ancient ancestors to modern day species.
# Select a HOG
hog_of_interest = pyham_analysis.get_hog_by_id(2)

# create and export the HogVis as .html
output_filename = "hogvis_example.html"
pyham_analysis.create_hog_visualisation(hog=hog_of_interest, outfile=output_filename)
As you can see in the figure below, HogVis is composed of two panels: a species tree that allows you to select the taxonomic range of interest, and a gene panel where each grey square represents an extant gene and each row a species.
We can see for example in the figure above that at the level of Mammals all genes of this gene family descend from a single common ancestral gene.
If we look at the level of Euarchontoglires, we observe that the genes are now split by a vertical line. This vertical line separates two groups of genes, each of which descends from a single ancestral gene. This is the result of a duplication between Mammals and Euarchontoglires.
With a quick look we can easily identify when a duplication occurred, which species have lost genes or how big gene families evolved.
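The idea of a family being one group at one level and two groups at a deeper level can be sketched with a toy nested structure. This is only an illustration: the `Node` class, `groups_at_level` and the gene names are hypothetical and are not part of pyham or HogVis.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str                      # "speciation", "duplication" or "gene"
    label: str = ""                # taxon name (speciation) or gene id (gene)
    children: List["Node"] = field(default_factory=list)


def leaves(node):
    """Collect the extant gene ids below a node."""
    if node.kind == "gene":
        return [node.label]
    found = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def groups_at_level(node, level):
    """Return one list of extant genes per ancestral gene at `level`."""
    if node.kind == "speciation" and node.label == level:
        return [leaves(node)]
    groups = []
    for child in node.children:
        groups.extend(groups_at_level(child, level))
    return groups


# Toy family: one gene in the mammalian ancestor, duplicated before
# the Euarchontoglires speciation.
family = Node("speciation", "Mammals", [
    Node("gene", "OPOSSUM_1"),
    Node("duplication", "", [
        Node("speciation", "Euarchontoglires",
             [Node("gene", "HUMAN_1"), Node("gene", "MOUSE_1")]),
        Node("speciation", "Euarchontoglires",
             [Node("gene", "HUMAN_2"), Node("gene", "MOUSE_2")]),
    ]),
])

print(len(groups_at_level(family, "Mammals")))           # one ancestral gene
print(len(groups_at_level(family, "Euarchontoglires")))  # two ancestral genes
```

The two groups returned for Euarchontoglires correspond to the two sides of the vertical line in the gene panel.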
How can I visually represent the different evolutionary events that occurred in my genomic setup?
Pyham includes treeprofile, a tool to visualise an annotated species tree with evolutionary events (gene duplications, losses, gains) mapped to their related taxonomic range. The aim is to provide a minimalist and intuitive way to visualise the number of evolutionary events that occurred on each branch.
# create and export the treeprofile as .png (.svg, .pdf also available)
treeprofile = pyham_analysis.create_tree_profile(outfile="example.png")
As you can see in the figure above, the treeprofile is built on the reference species tree used to perform the pyham analysis. Each internal node is displayed with its related histogram of phylogenetic events (number of genes duplicated, lost, gained, or not changed) that occurred on each branch.
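To make the notion of a per-branch histogram concrete, here is a small text-only sketch that tallies events per branch. It is independent of pyham: `tally_events`, `render` and the event list are hypothetical, and the real treeprofile draws a proper figure rather than text bars.

```python
from collections import Counter, defaultdict


def tally_events(events):
    """Count events per branch from (branch, event_kind) pairs."""
    profile = defaultdict(Counter)
    for branch, kind in events:
        profile[branch][kind] += 1
    return profile


def render(profile, kinds=("duplicated", "lost", "gained", "identical")):
    """Render the tallies as a plain-text histogram, one block per branch."""
    lines = []
    for branch in sorted(profile):
        lines.append(branch)
        for kind in kinds:
            count = profile[branch][kind]
            lines.append("  {:<10} {} {}".format(kind, "#" * count, count))
    return "\n".join(lines)


# Toy events, invented for illustration.
toy_events = [
    ("Mammals", "duplicated"),
    ("Mammals", "duplicated"),
    ("Mammals", "lost"),
    ("Euarchontoglires", "gained"),
    ("Euarchontoglires", "identical"),
]
print(render(tally_events(toy_events)))
```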
Can I have a one-page summary of this blog post for reference?
Of course you can! We have prepared a PDF version of this blog post that you can download here.