pyHam: a python package to visualize and process hierarchical orthologous groups (HOGs)

• Author: Clement Train •

(This entry was updated 19 Sep 2018 to reflect recent feature updates)

pyHam (‘python HOG analysis method’) makes it possible to extract useful information from HOGs encoded in standard OrthoXML format. It is available both as a python library and as a set of command-line scripts. Input HOGs in OrthoXML format are available from multiple bioinformatics resources, including OMA, Ensembl and HieranoidDB.

This post is a brief primer on pyHam, with an emphasis on what it can do for you.

How to get pyHam?

pyHam is available as a Python package on the PyPI server and is compatible with both Python 2 and Python 3. You can easily install it via pip using the following bash command:

pip install pyham

You can check the official pyham website for further information about how to use pyham, documentation and the source code.

What are Hierarchical Orthologous Groups (HOGs)?

If you don’t know what HOGs are and are eager to change this, we have an explanatory video about them just for you:


You can learn more about this in our previous blog post.

Where to find HOGs?

HOGs inferred on public genomes can be downloaded from the OMA orthology database. Other databases, such as EggNOG, OrthoDB or HieranoiDB, also infer HOGs, but not all of these databases offer them in OrthoXML format. You can check which databases serve HOGs as OrthoXML here. If you want to infer HOGs from your own genomes, you can use the OMA standalone software.
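To give a feel for what an OrthoXML file contains, here is a minimal sketch that parses a tiny hand-written document with Python's standard library and lists the genes referenced by each top-level group. The gene names, group id and the helper `summarize_orthoxml` are invented for illustration; pyHam does this parsing for you, and far more thoroughly.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written OrthoXML document (invented data, for illustration only).
ORTHOXML = """<orthoXML xmlns="http://orthoXML.org/2011/" version="0.3" origin="example" originVersion="1">
  <species name="HUMAN" NCBITaxId="9606">
    <database name="example" version="1">
      <genes>
        <gene id="1" protId="HUMAN_A"/>
        <gene id="2" protId="HUMAN_B"/>
      </genes>
    </database>
  </species>
  <species name="MOUSE" NCBITaxId="10090">
    <database name="example" version="1">
      <genes>
        <gene id="3" protId="MOUSE_A"/>
      </genes>
    </database>
  </species>
  <groups>
    <orthologGroup id="HOG1">
      <geneRef id="3"/>
      <paralogGroup>
        <geneRef id="1"/>
        <geneRef id="2"/>
      </paralogGroup>
    </orthologGroup>
  </groups>
</orthoXML>"""

# OrthoXML elements live in this XML namespace.
NS = {"ox": "http://orthoXML.org/2011/"}


def summarize_orthoxml(xml_text):
    """Map each top-level orthologGroup id to the gene ids it references."""
    root = ET.fromstring(xml_text)
    summary = {}
    for hog in root.findall("ox:groups/ox:orthologGroup", NS):
        # iter() walks nested ortholog/paralog groups in document order.
        gene_ids = [ref.get("id") for ref in hog.iter("{%s}geneRef" % NS["ox"])]
        summary[hog.get("id")] = gene_ids
    return summary


print(summarize_orthoxml(ORTHOXML))
```

The nesting of orthologGroup and paralogGroup elements is what encodes the hierarchy: here the two human genes are paralogs of each other, and both are grouped with the single mouse gene.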

To facilitate the use of pyHam on a single gene family, we provide the option to let pyHam fetch the required data directly from a compatible database (for now, only OMA is available for this feature). The user simply has to give the ID of a gene inside the gene family (HOG) of interest, along with the name of the compatible database from which to get the data, and pyHam will do the rest.

For example, if you are interested in the P53 gene in rat (P53 rat gene page in OMA), you simply have to run the following Python code to set up your pyHam session:

import pyham

# Gene of interest and the database to fetch its gene family (HOG) from
my_gene_query = 'P53_RAT'
database_to_query = 'oma'
pyham_analysis = pyham.Ham(query_database=my_gene_query, use_data_from=database_to_query)

How does pyham help you investigate on HOGs?

The main features of pyHam are: (i) given a clade of interest, extract all the relevant HOGs, each of which ideally corresponds to a distinct ancestral gene in the last common ancestor of the clade; (ii) given a branch on the species tree, report the HOGs that duplicated on the branch, got lost on the branch, first appeared on that branch, or were simply retained; (iii) repeat the previous point along the entire species tree, and plot an overview of the gene evolution dynamics along the tree; and (iv) given a set of nested HOGs for a specific gene family of interest, generate a local iHam web page to visualize its evolutionary history.

What is the number of genes in a particular ancestral genome? (i)

In pyHam, each ancestral genome is attached to one specific internal node of the input species tree and is denoted by the name of this taxon. Ancestral genes are then inferred by fetching all the HOGs at the same level.

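# `ham_analysis` is assumed to be a pyham.Ham session already loaded
# with a species tree and the corresponding OrthoXML file.

# Get the ancestral genome by name
rodents_genome = ham_analysis.get_ancestral_genome_by_name("Rodents")

# Get the related ancestral genes (HOGs)
rodents_ancestral_genes = rodents_genome.genes

# Get the number of ancestral genes at level of Rodents
number_of_rodents_ancestral_genes = len(rodents_ancestral_genes)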

How can I figure out the evolutionary history of genes in a given genome? (ii)

pyHam provides a feature to trace HOGs/genes along a branch that spans one or multiple taxonomic ranges, and to report the HOGs that duplicated on this branch, got lost on this branch, first appeared on this branch, or were simply retained. The ‘vertical map’ (see further information on maps here) allows retrieval of all genes and their evolutionary history between the two taxonomic levels (i.e. which genes have been duplicated, which genes have been lost, etc.).

# Get the genomes of interest
human = ham_analysis.get_extant_genome_by_name("HUMAN")
vertebrates = ham_analysis.get_ancestral_genome_by_name("Vertebrata")

# Instantiate the gene mapping!
vertical_human_vertebrates = ham_analysis.compare_genomes_vertically(human, vertebrates) # The order doesn't matter!

# The identical genes (that stay single copies)
# one HOG at vertebrates -> one descendant gene in human
retained = vertical_human_vertebrates.get_retained()

# The duplicated genes (that have duplicated)
# one HOG at vertebrates -> list of its descendant genes in human
duplicated = vertical_human_vertebrates.get_duplicated()

# The gained genes (that emerged in between)
# list of genes that appeared after the vertebrates taxon
gained = vertical_human_vertebrates.get_gained()

# The lost genes (that have been lost in between)
# HOGs at vertebrates that have been lost before the human taxon
lost = vertical_human_vertebrates.get_lost()
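To illustrate the four categories independently of pyHam, here is a simplified, self-contained sketch. It assumes we already know, for each ancestral gene, which extant genes descend from it; the function `classify_vertical` and the toy data are hypothetical and only approximate what pyHam infers from the nested HOG structure (for instance, a duplication followed by a loss would look 'retained' here).

```python
def classify_vertical(ancestral_to_descendants, extant_genes):
    """Classify ancestral genes by their number of descendants in an extant genome.

    ancestral_to_descendants: dict mapping an ancestral gene (HOG) id to the
        list of extant gene ids that descend from it.
    extant_genes: all gene ids of the extant genome.
    """
    result = {"retained": {}, "duplicated": {}, "lost": [], "gained": []}
    with_ancestor = set()
    for ancestral, descendants in ancestral_to_descendants.items():
        with_ancestor.update(descendants)
        if len(descendants) == 0:
            result["lost"].append(ancestral)          # no descendant left
        elif len(descendants) == 1:
            result["retained"][ancestral] = descendants[0]  # stayed single copy
        else:
            result["duplicated"][ancestral] = list(descendants)
    # Extant genes without any ancestor at the older level emerged in between.
    result["gained"] = [g for g in extant_genes if g not in with_ancestor]
    return result


# Toy data: three ancestral genes and four extant genes.
toy_ancestors = {"HOG_A": ["h1"], "HOG_B": ["h2", "h3"], "HOG_C": []}
toy_extant = ["h1", "h2", "h3", "h4"]
print(classify_vertical(toy_ancestors, toy_extant))
```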

How can I get an overview of the gene evolution dynamics that occurred along the tree in my genomic setup? (iii)

pyHam includes treeProfile, an extension for visualising a species tree annotated with evolutionary events (gene duplications, losses and gains) mapped to their related taxonomic ranges. The aim is to provide a minimalist and intuitive way to visualise the number of evolutionary events that occurred on each branch, or the number of ancestral genes along the species tree.

# create a local treeprofile web page
treeprofile = ham_analysis.create_tree_profile(outfile="treeprofile_example.html")

As you can see in the figure above, the tree profile is composed of the reference species used to perform the pyHam analysis. Each internal node is displayed with a histogram of the phylogenetic events (number of genes duplicated, lost, gained, or retained) that occurred on the branch leading to it. The tree profile displays either the number of genes resulting from phylogenetic events or the number of phylogenetic events themselves; you can switch between the two by opening the settings panel (histogram icon at the top right) and selecting ‘genes’ or ‘events’.

How can I visualise the evolutionary history of a gene family (HOG)? (iv)

pyHam embeds iHam, an interactive tool to visualise gene family evolutionary history. It provides a way to trace the evolution of genes in terms of duplications and losses, from ancient ancestors to modern day species.

# Select a HOG
hog_of_interest = pyham_analysis.get_hog_by_id(2)

# Create and export the HOG visualisation as .html
output_filename = "hogvis_example.html"
pyham_analysis.create_iHam(hog=hog_of_interest, outfile=output_filename)

Then, you simply have to double-click on the .html file to open it in your default web browser. We provide an example below of what you should see. A brief video tutorial on iHam is available at this URL.

iHam is composed of two panels: a species tree that allows you to select the taxonomic range of interest, and a genes panel where each grey square represents an extant gene and each row a species.

We can see, for example, that at the level of mammals (click on the related node and select ‘Freeze at this node’) all genes of this gene family descend from a single common ancestral gene.

Now, if we look at the level of Euarchontoglires (repeat the same procedure as for mammals to freeze the visualisation at this level), we observe that the genes are now split by a vertical line. This vertical line separates two groups of genes, each descending from a single ancestral gene. This is the result of a duplication between Mammals and Euarchontoglires.

This small example demonstrates how simple it is to use iHam to identify evolutionary events that occurred in gene families (e.g. when a duplication occurred, which species have lost genes, or how big gene families evolved).

To be informed of future posts, sign up to the low-volume blog mailing-list, subscribe to the blog's RSS feed, or follow us on Twitter.

Sex, alcohol, and structural variants in fission yeast

• Authors: Fritz Sedlazeck, Dan Jeffares & Christophe Dessimoz •

Our latest study just came out (Jeffares et al., Nature Comm 2017). In it, we carefully catalogued high-confidence structural variants among all known strains of the fission yeast population, and assessed their impact on spore viability, winemaking and other traits. This post gives a summary and the story behind the paper.

Structural variants (SVs) measure genetic variation beyond single nucleotide changes …

Next generation sequencing is enabling the study of genomic diversity on unprecedented levels. While most of this research has focused on single base pair differences (single nucleotide polymorphisms, SNPs), larger genomic differences (called structural variations, SVs) can also have an impact on the evolution of an organism, on traits and on diseases. SVs are usually loosely defined as events that are at least 50 base pair long. They are often classified in five subtypes: deletions, duplications, new sequence insertions, inversions and translocations.

Over the recent years the impact of SVs has been characterized in many organisms. For example, SVs play a role in cancer, when duplications often lead to multiple copies of important oncogenes. Furthermore, SVs are known to play a role in other human disorders such as autism, obesity, etc.

… but calling structural variants remains challenging

In principle, identifying SVs seems trivial: just map paired-end reads to a reference genome, look for any abnormally spaced pairs or split reads (i.e. reads with parts mapping to different regions), and—boom—structural variants!
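As a rough illustration of the 'abnormally spaced pairs' idea, the sketch below flags read pairs whose mates map to different chromosomes or whose spacing deviates strongly from the expected insert size. The tuple layout, the default insert size parameters and the function name `flag_discordant_pairs` are assumptions made for this example; real SV callers work on alignment files and use much more evidence.

```python
def flag_discordant_pairs(pairs, expected_insert=300, insert_sd=30, n_sd=3):
    """Return (read_name, reason) for pairs that hint at a structural variant.

    pairs: iterable of (read_name, chrom1, pos1, chrom2, pos2) tuples.
    expected_insert, insert_sd: assumed insert size distribution of the library.
    n_sd: how many standard deviations away counts as 'abnormal'.
    """
    flagged = []
    for name, chrom1, pos1, chrom2, pos2 in pairs:
        if chrom1 != chrom2:
            flagged.append((name, "different chromosomes"))
            continue
        distance = abs(pos2 - pos1)
        if abs(distance - expected_insert) > n_sd * insert_sd:
            flagged.append((name, "abnormal spacing"))
    return flagged


# Toy read pairs on chromosomes I and II.
toy_pairs = [
    ("r1", "I", 1000, "I", 1310),   # spacing close to 300: looks normal
    ("r2", "I", 5000, "I", 9000),   # mates 4 kb apart: possible deletion
    ("r3", "I", 200, "II", 450),    # mates on different chromosomes
]
print(flag_discordant_pairs(toy_pairs))
```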

In practice, things are much harder. This is partly due to the frustrating tendency of SVs to occur in or near repetitive regions, where short-read sequencing struggles to disambiguate the reads, or in highly variable regions of the genome such as the chromosome ends, which tend to be the tinkering workshop of the genome.

As a result, a large proportion of SVs—typically at least 30-40%—remain undetected. As for false discovery rates (proportion of wrongly inferred SVs), they are mostly not well known because validating SVs on real data is very laborious.

Fission yeast: a compelling model to study structural variants

Schizosaccharomyces pombe is especially well suited to studying structural variants because:

  1. The genome is small, well-annotated and simple (few repeats, haploid).
  2. We had 40x or more coverage over 161 genomes covering the worldwide known population of S. pombe.
  3. We had more than 220 accurate trait measurements for these strains at hand. Since the traits are measured under strictly controlled conditions, they contain little (if any) environmental variance—in stark contrast to human traits.

SURVIVOR makes the most out of (imperfect) SV callers

To infer accurate SV calls, we introduced SURVIVOR, a consensus method that reduces the false discovery rate while maintaining high sensitivity. Using simulated data, we observed that consensus calls obtained from two to three different SV callers could recover most SVs while keeping the false discovery rate in check. For example, SURVIVOR performed second best with a 70% sensitivity (best was Delly: 75%), while the false discovery rate was significantly reduced to 1% (Delly: 13%). (But remember that these figures are based on simulation; performance on real data is likely worse.) Furthermore, we equipped SURVIVOR with methods to simulate data sets and evaluate callers, merge data from different samples, compute regions of poor mappability (as a BED file), etc. SURVIVOR is written in C++, so it’s fast enough to run on large genomes as well. Since then, we have been running it on multiple human data sets, which takes only a few minutes on a laptop. SURVIVOR is available on GitHub.
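The consensus idea can be sketched in a few lines of Python. This is not SURVIVOR's actual algorithm, only a greedy toy version under simple assumptions: two calls are treated as the same SV if they have the same chromosome and type and both breakpoints lie within `max_distance` of the first call in the cluster, and a cluster is kept if at least `min_support` callers contributed to it. All names, thresholds and data are hypothetical.

```python
def consensus_calls(calls_by_caller, max_distance=1000, min_support=2):
    """Keep SV calls supported by at least `min_support` different callers.

    calls_by_caller: dict mapping caller name to a list of
        (chrom, start, end, sv_type) tuples.
    """
    # Sort so that calls on the same chromosome and of the same type are adjacent.
    all_calls = sorted(
        (chrom, sv_type, start, end, caller)
        for caller, calls in calls_by_caller.items()
        for chrom, start, end, sv_type in calls
    )
    clusters = []
    for chrom, sv_type, start, end, caller in all_calls:
        last = clusters[-1] if clusters else None
        if (last is not None
                and last["chrom"] == chrom
                and last["type"] == sv_type
                and abs(start - last["start"]) <= max_distance
                and abs(end - last["end"]) <= max_distance):
            last["callers"].add(caller)   # same event seen by another caller
        else:
            clusters.append({"chrom": chrom, "type": sv_type,
                             "start": start, "end": end, "callers": {caller}})
    return [
        (c["chrom"], c["start"], c["end"], c["type"], sorted(c["callers"]))
        for c in clusters
        if len(c["callers"]) >= min_support
    ]


toy_calls = {
    "caller_a": [("I", 10000, 12000, "DEL"), ("II", 50000, 50500, "DUP")],
    "caller_b": [("I", 10050, 11980, "DEL")],
    "caller_c": [("I", 10020, 12010, "DEL"), ("III", 700, 900, "INV")],
}
print(consensus_calls(toy_calls))
```

In this toy data, only the deletion on chromosome I is reported by more than one caller, so it is the only call that survives; the calls seen by a single caller are dropped, which is how a consensus trades a little sensitivity for a lower false discovery rate.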

SVs: now you see me, now you don’t

We applied SURVIVOR to our 161 genomic data sets, and then manually vetted all our calls to obtain a trustworthy set of SVs. We then discovered something suspicious. Some groups of strains that were very closely related (essentially clonal, differing by <150 SNPs) had different numbers of duplications, or different numbers of copies in duplications (1x, 2x, even 6x). This observation was also validated with lab experiments.

Interestingly, we identified 15 duplications that were shared between the more diverse non-clonal strains (so these must have been shared during evolution) but could not be explained by the tree inferred from SNPs (Figure 1). To confirm this, we compared the local phylogeny of SNPs in 20 kb windows upstream and downstream of the duplications with the variance in copy numbers. Oddly, the copy number variance was not highly correlated with the SNP tree. This led to the conclusion that some SVs are transient and are thus gained or lost faster than SNPs.


Figure 1: Duplications happen within near-clonal populations. Phylogenetic tree of the strains reconstructed from SNP data, with coloured dots indicating strains with identical SVs. Eight pairs of very close strains nonetheless show structural variation.


Though this transience came as a surprise, there is actually supporting evidence from laboratory experiments carried out by Tony Carr back in 1989 showing that duplications can occur frequently in laboratory-reared S. pombe, and can revert (Carr et al. 1989). The high turnover raises the possibility that SVs could be an important source of environmental adaptation.

SVs affect spore viability and are associated with several traits

We then investigated the phenotypic impact of these SVs, using the 220 trait measurements from previous publications. We observed an inverse correlation between rearrangement distance and spore viability, confirming reports in other species that SVs can contribute to reproductive isolation. We also found a link between copy number variation and two traits relevant to winemaking (malic acid accumulation and glucose+fructose utilisation) (Benito et al. PLOS ONE 2016).


Structural variants, reproductive isolation, and wine. A) Making crosses between fission yeast strains often results in low offspring survival. The theory is that rearrangements (inversions and translocations) cause errors during meiosis, so we might expect them to affect offspring viability. If we compare offspring viability from crosses with the number of rearrangements by which the parents differ, there is a correlation, and a ‘forbidden triangle’ in the top right of the plot (it seems impossible to produce high-viability spores when parents have many unshared rearrangements). B) SVs also affect traits. For more than 200 traits (vertical bars), we used LDAK to estimate the proportion of the narrow sense heritability that was caused by copy number variants (red), rearrangements (black) and SNPs (grey). Some traits are very strongly affected by copy number variants, such as the winemaking traits (wine-coloured bars along the x-axis). C) Fission yeast wine tasting at UCL: how much of the taste is due to structural variants? (Jürg Bähler at right.)


We used the estimation of narrow sense heritability from Doug Speed’s LDAK program. Narrow sense heritability estimates how much of a difference in a trait between individuals can be explained by adding up all the tiny effects of the genomic differences (in our case SNPs; deletions and duplications; inversions and translocations and all combined). Overall, we found the heritability was better explained when combining the SNP data as well as the SVs data. In 45 traits SVs explained 25% or more of the trait variability. Five traits that were explained by over 90% heritability using SNPs and SVs came from different growth conditions in liquid medium. This may highlight again the influence of environmental conditions on the genomic structure. For 74 traits (~30% of those we analyzed) SVs explain more of the trait than the SNPs. These high SV-affected traits include malic acid, acetic acid and glucose/fructose contents of wine, key components of taste.

A collaborative effort

On a personal note, the paper concludes a wonderful team effort over two and a half years.

The project started as a summer project for Clemency Jolly, who had then just completed her 3rd undergraduate year at UCL, in the Dessimoz and Bähler labs. Dan Jeffares and the rest of the Bähler lab had just published their 161 fission yeast genomes, with an in-depth analysis of the association between SNPs and quantitative traits (Jeffares et al., Nature Genetics 2015). Studying SVs was the logical next step, but given the challenging nature of reliable SV calling, we also recruited to the team Fritz Sedlazeck, collaborator and expert in tool development for NGS data analysis then based in Mike Schatz’s lab at Cold Spring Harbor Laboratory.

At the end of the summer, it was clear that we were onto something, but there was still a lot to be done. Clemency turned the work into her Master’s project, with Dan and Fritz redoubling their efforts until Clemency’s graduation in summer 2015. It took another year of intense work led by Dan and Fritz to verify the calls, perform the GWAS and heritability analyses, and publish the work. Since then, Clemency has started her PhD at the Crick Institute, Fritz has moved to Johns Hopkins University, and Dan has started his own lab at the University of York.



Jeffares, D., Jolly, C., Hoti, M., Speed, D., Shaw, L., Rallis, C., Balloux, F., Dessimoz, C., Bähler, J., & Sedlazeck, F. (2017). Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast Nature Communications, 8 DOI: 10.1038/ncomms14061

Carr AM, MacNeill SA, Hayles J, & Nurse P (1989). Molecular cloning and sequence analysis of mutant alleles of the fission yeast cdc2 protein kinase gene: implications for cdc2+ protein structure and function. Molecular & general genetics : MGG, 218 (1), 41-9 PMID: 2674650

Jeffares, D., Rallis, C., Rieux, A., Speed, D., Převorovský, M., Mourier, T., Marsellach, F., Iqbal, Z., Lau, W., Cheng, T., Pracana, R., Mülleder, M., Lawson, J., Chessel, A., Bala, S., Hellenthal, G., O’Fallon, B., Keane, T., Simpson, J., Bischof, L., Tomiczek, B., Bitton, D., Sideri, T., Codlin, S., Hellberg, J., van Trigt, L., Jeffery, L., Li, J., Atkinson, S., Thodberg, M., Febrer, M., McLay, K., Drou, N., Brown, W., Hayles, J., Salas, R., Ralser, M., Maniatis, N., Balding, D., Balloux, F., Durbin, R., & Bähler, J. (2015). The genomic and phenotypic diversity of Schizosaccharomyces pombe Nature Genetics, 47 (3), 235-241 DOI: 10.1038/ng.3215

Benito, A., Jeffares, D., Palomero, F., Calderón, F., Bai, F., Bähler, J., & Benito, S. (2016). Selected Schizosaccharomyces pombe Strains Have Characteristics That Are Beneficial for Winemaking PLOS ONE, 11 (3) DOI: 10.1371/journal.pone.0151102


How to access scientific papers for free?

• Author: Christophe Dessimoz •

You have a reference to a research paper of interest.

Perhaps this one:

Iantorno S et al, Who watches the watchmen? an appraisal of benchmarks for multiple sequence alignment, Multiple Sequence Alignment Methods (D Russell, Editor), Methods in Molecular Biology, 2014, Springer Humana, Vol. 1079. doi:10.1007/978-1-62703-646-7_4

How can you retrieve the full article?

Gold open access

If the paper is published as “gold” open access, you can download the PDF from the publisher’s website. In such cases, it’s easiest to paste the title and author names in Google and look for the first hit in the journal.

Alternatively, if you know the Digital Object Identifier (DOI) of the article, prepending https://doi.org/ to it should lead you directly to the article. In this case, the DOI is included at the end of the citation. (If you don’t know the DOI, you can find it on the publisher’s website or in a database such as PubMed.) We get: https://doi.org/10.1007/978-1-62703-646-7_4

Unfortunately, we see that this particular paper is behind a paywall at the publisher.
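If you handle many references, the DOI-to-link step is easy to script. Here is a minimal Python sketch that turns a DOI, with or without a leading "doi:" prefix, into a URL on the doi.org resolver; the helper name `doi_to_url` is made up for this example.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi):
    """Build a resolver URL from a DOI, tolerating a leading 'doi:' prefix."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Percent-encode unusual characters but keep the slash that separates
    # the DOI prefix from its suffix.
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_url("doi:10.1007/978-1-62703-646-7_4"))
```

Whether the resulting page gives you the full text still depends on the publisher, of course.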

Green open access

There is, however, still a chance that it might be deposited in a preprint server or in an institutional repository. This is referred to as “green” open access. One way to find out is to look at the article record in Google Scholar and look for a link in the right margin:

screenshot of google scholar with link in the right margin circled

In this case, the paper is thus available on

If you know the DOI, an even quicker way of looking for a deposited version is to use the redirection tool, which works analogously to the DOI resolver but redirects to a green open access version whenever possible:

Here, a free version of the article deposited in the UCL institutional repository is found.

If you use Chrome or Firefox, you can also use the Unpaywall browser extension to automatically get a link to green open access alternatives as you land on paywalled articles.

On the author’s homepage

Sometimes, the paper is available on the homepage of one of the authors. In this case, a link to the preprint is provided on the homepage of the corresponding author (item #36).


Instead of an institutional homepage, some authors self-archive their articles on ResearchGate. In the case of our paper, the full-text version is indeed directly available.

And otherwise, if one of the authors is active on ResearchGate, it’s also possible to send a full-text request at the click of a button.

Pirated version off Sci-Hub

Sci-Hub serves bootleg copies of pay-walled articles. This is illegal, so I only mention it for educational purposes. This works most reliably using, again, DOIs:

If you, purely hypothetically of course, pasted that URL in your browser, you would or would not get a PDF of the entire book in which the referenced article appears.


It’s also possible to request full-text articles via Twitter. As described in Wikipedia, this works by tweeting the article title, its DOI, an email address (to indicate to whom the article should be sent), and the hashtag #icanhazpdf. Someone with access to the article might send a copy via email. Once the article is received, the tweet is deleted. Again, I mention this for educational purposes only; don’t break the law.

cat asking whether it can haz pdf because cat is purr

Image credit: Field of Science wrote an interesting post on #icanhazpdf a few years ago.

Email the corresponding author

Finally, you can always ask the corresponding author by email for a copy of their article. They will happily oblige.


[Update (19 Mar 2017): added mention of Unpaywall to seamlessly retrieve green open access]


Life as an academic: my 2016 in numbers

• Author: Christophe Dessimoz •

Life as an academic is varied and busy. Students sometimes believe that all we do is teach. In fact, we do quite a few other things. Here’s my 2016 in numbers.

  • number of papers published: 10
  • number of paper rejections: 7
  • number of books edited: 1
  • number of grant proposals submitted: 8
  • number of research contracts negotiated with the industry: 2
  • number of blog posts: 5
  • number of tweets: 474 (66% were retweets)
  • number of YouTube videos: 1
  • number of papers reviewed: 24
  • number of papers edited: 3
  • number of grants reviewed: 3
  • number of PhD theses examined: 2
  • number of emails received (excluding spam and mailing-lists): 12,695
  • number of emails written: 4,377 (!)
  • number of minutes videoconferencing on GoToMeeting: 13,236 (!!)
  • number of Geneva-London-Geneva roundtrips: 12
  • number of meetings with >50 attendees co-organised: 6
  • number of seminars hosted: 4
  • number of conferences attended: 3
  • number of talks given: 11
  • number of semester-long courses organised: 2
  • number of hours lectured: 32
  • number of 2000-word student papers marked: 47
  • number of summer students supervised: 4
  • number of overnight retreats attended: 4
  • number of work Christmas dinners attended: 3
  • number of annual reports written: 3 (this does not count)
  • number of Tête de Moine eaten at lab celebrations: 4
  • number of times moved home: 0 (noteworthy since we moved 5 times in the preceding 5 years…)

I wish you, Dear Reader, all the best in 2017!


What are Hierarchical Orthologous Groups (HOGs)?

• Author: Christophe Dessimoz •

One central concept in the OMA project and other work we do to infer relationships between genes is that of Hierarchical Orthologous Groups, or “HOGs” for the initiated.

We’ve written several papers on aspects pertaining to HOGs (how to infer them, how to evaluate them, how they are increasingly being adopted by orthology resources, etc.), but there is still a great deal of confusion as to what HOGs are and why they matter.

Natasha Glover, talented postdoc in the lab, has produced a brief video to introduce HOGs and convey why we are mad about them!




Altenhoff, A., Gil, M., Gonnet, G., & Dessimoz, C. (2013). Inferring Hierarchical Orthologous Groups from Orthologous Gene Pairs PLoS ONE, 8 (1) DOI: 10.1371/journal.pone.0053786

Boeckmann, B., Robinson-Rechavi, M., Xenarios, I., & Dessimoz, C. (2011). Conceptual framework and pilot study to benchmark phylogenomic databases based on reference gene trees Briefings in Bioinformatics, 12 (5), 423-435 DOI: 10.1093/bib/bbr034

Sonnhammer, E., Gabaldon, T., Sousa da Silva, A., Martin, M., Robinson-Rechavi, M., Boeckmann, B., Thomas, P., Dessimoz, C., & Quest for Orthologs consortium (2014). Big data and other challenges in the quest for orthologs Bioinformatics, 30 (21), 2993-2998 DOI: 10.1093/bioinformatics/btu492


Interview with Tunca Doğan, OMA Visiting Fellow 2016

• Author: Tunca Doğan •

Note: the “Life in the Lab” series features interviews of interns and visitors. This post is by our second 2016 OMA Visiting Fellow Tunca Doğan, who spent a month with us earlier this year. You can follow Tunca on Twitter at @tuncadogan. —Christophe


Please introduce yourself in a few sentences.

My name is Tunca Doğan. I received my PhD in 2013 with a thesis in the fields of bioinformatics and computational biology, in which we developed methods for clustering protein sequences using unsupervised machine learning techniques (Dogan and Karacali, 2013). I’ve since been working as a post-doctoral fellow at EMBL-EBI, UK, in the Protein Function Development team (UniProt Database) led by Dr Maria Martin. Here I’m developing new tools and methods for the automated functional annotation of protein records in UniProtKB using a variety of features, including domain architectures (Dogan et al., 2016). I’m also conducting research in the field of computational drug discovery. As of 2016, I’m also affiliated with the Department of Health Informatics, METU, Turkey, both as a senior research fellow and a faculty candidate.

Why did you choose to apply to the OMA visiting fellowship programme?

The team behind OMA is world-leading in the field of phylogenomics, and they have authored many highly cited publications in this area. Moreover, OMA is considered to be one of the most reliable and comprehensive resources offering phylogenomic information on various species. I applied to this programme in order to develop my knowledge of phylogenomic research, particularly of how OMA is produced. My specific research aim was to investigate if and how the information in OMA can be used to increase the coverage and quality of the automated functional annotation of proteins in the UniProt database.

Discussions on UNIL campus with Leonardo de Oliveira Martins, Surag Nair, Clément Train, David Dylus and Tunca Doğan (from l. to r.)

What project did you work on during your visit?

The project I worked on had two sides: 1) investigating novel ways of quality checking of the data produced in the OMA pipeline (especially HOGs) using the Domain Architecture Alignment and Classification (DAAC) method I previously developed in UniProt; 2) investigating the use of OMA groups and HOGs to propagate the functional annotation between the (homologous) member proteins of the same clusters/classes.

Was there any highlight or low point you’d like to share?

It was a great experience for me both professionally and socially. I learnt a great deal in just one month, and we are continuing our collaboration on the abovementioned project. Everyone I met in the group (Christophe, Adrian, David, Leonardo and Clement) was so knowledgeable, helpful and friendly that I had a great time during my stay. It was a great pleasure to meet and to work with them all…

The UNIL/EPFL campus is just beautiful, on the shores of Lake Geneva. The campus is also well equipped for all possible needs. This was also my first time in Switzerland and I was enchanted by the beauty of this country… The only downside for a foreign visitor could be the high cost of living in Switzerland, which was nevertheless manageable with a little prior investigation and planning.

Do you have any practical tip for future OMA visiting fellows?

I definitely recommend that any researcher (at PhD or post-doc level) who has an interest in phylogenomics apply to this programme. You’ll learn a great deal and have a good time at the same time. Also (for the foreigners), do not forget to travel around this beautiful country in your spare time…


Editor’s note: If you are interested in the OMA visiting fellowship programme, consult this page.


Doğan T, & Karaçalı B (2013). Automatic identification of highly conserved family regions and relationships in genome wide datasets including remote protein sequences. PloS one, 8 (9) PMID: 24069417

Doğan T, MacDougall A, Saidi R, Poggioli D, Bateman A, O’Donovan C, & Martin MJ (2016). UniProt-DAAC: domain architecture alignment and classification, a new method for automatic functional annotation in UniProtKB. Bioinformatics (Oxford, England), 32 (15), 2264-71 PMID: 27153729


Interview with Rosa Fernández, 2016 OMA Visiting Fellow

• Author: Rosa Fernández García •

Note: We are rebooting our “Life in the Lab” series, which features interviews of interns and visitors. This post is by our inaugural OMA Visiting Fellow Rosa Fernández García, who spent a month with us earlier this year. You can follow Rosa on Twitter at @Rosamygale. —Christophe


Please introduce yourself and your research interests.

I received my bachelor’s degree in Biology (major in Zoology) at Complutense University in Madrid, Spain. I got my master’s and PhD at the same university with a thesis about the phylogeny and phylogeography of cosmopolitan earthworms. After that, I moved to the lab of Prof. Gonzalo Giribet at Harvard University, where I was a postdoc for 3 years and a Research Associate for another year. In January 2017, I’ll move to Barcelona to work as a Research Fellow in the lab of Dr. Toni Gabaldón at the Center for Genomic Regulation.

My research addresses fundamental questions about evolution in invertebrates: in other words, I am fascinated by how, when and where biodiversity took its form, and why it is maintained. My main two animal groups of interest are terrestrial annelids (oligochaetes) and (pan)arthropods, particularly the earliest branching lineages and most scientifically neglected groups (chelicerates and myriapods).

How did biodiversity take its shape? Resolving the tree of life. Macroevolutionary patterns are generally what we see when we look at the large-scale history of life. They encompass the grandest trends and transformations in evolution, such as the origin of bilateral animals or the radiation of arthropods. In order to understand how lineages are related to each other, I study macroevolutionary patterns in several groups of invertebrates through phylogenetics and phylogenomics. I currently lead a fruitful line of research dealing with the phylogenomics of myriapods and chelicerates, having optimized protocols to successfully sequence single individuals of the rarest and smallest arthropods. We are getting closer to resolving the Arthropod Tree of Life!

Artist’s rendition of the Arthropod tree of life

When and where? I try to understand the mode and tempo of animal diversification patterns through the integration of phylogeography, biogeography and paleogeography.

Why? Comparative transcriptomics and genomics is a very powerful tool to shed light on very interesting evolutionary questions, such as arthropod terrestrialization - one of my favorite new lines of research.

Why did you choose to apply to the OMA visiting fellowship programme?

Orthology inference is one of the key steps in phylogenomics. I had been using OMA for a few years and I wanted to learn how I could use it more efficiently in my ongoing projects.

What project did you work on during your visit?

My project focused on optimizing OMA runs for some big and challenging data sets that I was having problems with. Also, I was interested in learning how I could exploit hierarchical orthogroups for comparative genomics studies in arthropods.

Rosa and David Dylus at coffee break (photo by Arthur Dessimoz)

Was there any highlight or low point you’d like to share?

It was a great experience to be in the Dessimoz lab for a month. As a systematist with a relatively limited bioinformatics background, it was absolutely great to exchange ideas with computer scientists interested in the same scientific problems but with a completely different perspective than mine. It was a very enriching experience.

Do you have any practical tip for future OMA visiting fellows?

One month was not enough for me, so try to stay longer if your project is ambitious. And ask Christophe to bring a Tête de Moine cheese on your last day, it’s delicious!


Editor’s note: If you are interested in the OMA visiting fellowship programme, consult this page.


Fernández R, Laumer CE, Vahtera V, Libro S, Kaluziak S, Sharma PP, Pérez-Porro AR, Edgecombe GD, & Giribet G (2014). Evaluating topological conflict in centipede phylogeny using transcriptomic data sets. Molecular biology and evolution, 31 (6), 1500-13 PMID: 24674821

Fernández, R., Hormiga, G., & Giribet, G. (2014). Phylogenomic Analysis of Spiders Reveals Nonmonophyly of Orb Weavers Current Biology, 24 (15), 1772-1777 DOI: 10.1016/j.cub.2014.06.035

Fernández R, & Giribet G (2015). Unnoticed in the tropics: phylogenomic resolution of the poorly known arachnid order Ricinulei (Arachnida). Royal Society open science, 2 (6) PMID: 26543583

Novo M, Fernández R, Andrade SC, Marchán DF, Cunha L, & Díaz Cosín DJ (2016). Phylogenomic analyses of a Mediterranean earthworm family (Annelida: Hormogastridae). Molecular phylogenetics and evolution, 94 (Pt B), 473-8 PMID: 26522608

Fernández R, Edgecombe GD, & Giribet G (2016). Exploring Phylogenetic Relationships within Myriapoda and the Effects of Matrix Composition and Occupancy on Phylogenomic Reconstruction. Systematic biology, 65 (5), 871-89 PMID: 27162151

Sharma, P., Fernandez, R., Esposito, L., Gonzalez-Santillan, E., & Monod, L. (2015). Phylogenomic resolution of scorpions reveals multilevel discordance with morphological phylogenetic signal Proceedings of the Royal Society B: Biological Sciences, 282 (1804), 20142953-20142953 DOI: 10.1098/rspb.2014.2953

Rosa Fernandez, Prashant Sharma, Ana LM Tourinho, & Gonzalo Giribet (2016). The Opiliones Tree of Life: shedding light on harvestmen relationships through transcriptomics BioRxiv DOI: 10.1101/077594

Share or comment:

To be informed of future posts, sign up to the low-volume blog mailing-list, subscribe to the blog's RSS feed, or follow us on Twitter.

Over 10 computational biology PhD and postdoc positions

• Author: Christophe Dessimoz •

Our lab has several open positions, and so do collaborators and colleagues across Switzerland, the UK, and Europe.

Please help us spread the word by forwarding this post. If you have computational biology jobs to announce, let me know and I will gladly add a link.

Bioinformatic openings associated with our lab

Bioinformatic openings with colleagues

PhD fellowship programmes

Edits: added openings with Ziheng Yang (on 23 Nov) and with Klaas Vandepoele (on 24 Nov)

Getting Published (the story behind the paper)

• Author: Natasha Glover •

Our paper “A Pragmatic Approach to Getting Published: 35 Tips for Early Career Researchers” just came out in Frontiers in Plant Science. This is the story behind the paper.

For my second postdoc, I was the fortunate recipient of a PLANT FELLOWS scholarship. PLANT FELLOWS is an international program that provides research grants to postdocs in the field of plant science. The fellows are based at many different host institutions throughout Europe. I myself am working at Bayer Crop Science in Gent, Belgium, in collaboration with the Dessimoz lab in London and Lausanne. Part of the PLANT FELLOWS mission is to provide training, mentoring, and networking to the postdocs: skills essential for career advancement.

Last year, the annual PF meeting was held in Männedorf, Switzerland from September 28 to October 1 2015. Training workshops took place at the Boldern Hotel, surrounded by meadows and with a nice view of Lake Zürich.


Group picture from the 3rd annual PLANT FELLOWS meeting

The meeting consisted of several days of training and workshops. For one of the days, I chose to participate in the workshop “Advanced Strategies for Dealing with the Publication Process.” I was especially keen on learning more about this particular subject. As a postdoc still trying to navigate the publication waters, I was looking for all the advice I could get. We’ve all heard the saying before: publish or perish. Publishing papers in your postdoc years is so important for an academic career.

There were about 15 postdocs in this day-long workshop. The facilitator, Philipp Mayer, came with a bunch of photocopied book chapters, articles, and USB keys full of pdfs for each of us to use on our laptops. The objective of the workshop was to, as a group, write a small paper about advanced publication strategies using the literature we were provided with. Our plan of attack was to pool our collective postdoc experience and come up with a list of our most useful recommendations on how to get a scientific paper published.

After feverishly reading websites, book chapters and papers, at the end of the day we came up with a draft: an introduction, our recommendations broken into 3 main sections, and a conclusion. We had a respectable number of references. But what would be the fate of our paper? About a third of the class was apathetic, a third thought we should aim for a blog post, and another third thought we should try for a “real” scientific journal. I had really enjoyed the workshop so I lobbied for publishing it in a real journal. I liked the experience of learning about a topic, working collaboratively with my peers, and then passing on the information for others to benefit.

I volunteered to take charge of the paper, edit it, and submit it to journals in hopes of getting it published. At the end of the day I left with a draft of the paper, many references, the contact information of all the attendees, and the full support of the facilitator (Philipp) for any future help that I might need. I looked at it as an opportunity to take a leadership role in publishing a paper, from start to finish. And more importantly, it was a chance to put our own advice into practice.

Upon returning to Belgium, I quickly found out that one of the sentences we had written in the paper rang true: It is a common misconception among early career researchers that the presentation of the work in a manuscript is the last stage of a project. There is a long and complicated process associated with submission, review, and revision that must be taken into account. During the next month, I reread the paper, finished writing short sections, added references, edited, and got feedback from the coauthors. We agreed on the author order, and shared the document using Authorea. Philipp and I went back and forth with several rounds of editing.

Attempt #1

We decided to submit our manuscript to eLife, which is a prestigious peer-reviewed open access journal with a favorable policy toward early career researchers. I wrote a cover letter to the editor describing our paper and asking if the topic was suitable to be considered for eLife.

Within a few days, the editor read the manuscript but informed me that he was unable to send it out for review because it wasn’t “fresh” enough, meaning most of what we said had already been discussed many times in the scientific community. Despite the sting of having a paper rejected directly from the editor, I decided to take the advice we had written in the paper: Remove your personal feelings from the peer review process. Time to find the next journal.

During the following month and a half, the manuscript was pushed to the bottom of my To Do list, as other projects and tasks got my attention. Christmas holidays came and went, and admittedly this paper was the last thing on my mind.

Attempt #2

In January, I sent a presubmission inquiry to PLOS Biology. The PLOS Biology editor wrote back within a few days to inform me that although they appreciated the attention to an important problem, they could not encourage us to submit because it didn’t present “novel strategies for increasing access to research, improving the quality of research results, or fixing flawed measures of impact.” Since this was the second time I had heard this same exact criticism, I realized it was time to take more advice from the paper: It is critical to highlight the novelty and importance in the article and cover letter. We were going to have to add something to the paper to make it more novel.

Attempt #3

Shortly after, I contacted the Frontiers in Plant Science (FiPS) Editorial Office with a new and improved cover letter. FiPS is an open access online journal publishing many different peer reviewed articles: research, reviews, commentaries, and perspectives, among others. The editor and I discussed morphing the paper into something that would be more plant related, given the plant science background of all the coauthors. Over the next month, it was back to editing the paper. I proposed edits that would make our tips more plant-specific. We added advice about industry-academia collaborations, and more information about plant science journals. Philipp, the coauthors, and I went back and forth several times with rounds of edits, adding more references and polishing more details. I submitted the final version of the paper to Frontiers in Plant Science on March 15.

The experience of the collaborative peer review by FiPS was a pleasant and efficient one. Their website says “Frontiers reviews are standardized, rigorous, fair, constructive, efficient and transparent.” I enthusiastically agree. Within two weeks, we had received comments from the reviewers. There were some major points that needed to be addressed before Frontiers could offer publication. However, the points were all very relevant and only helped to make the paper stronger. During the process of the interactive review, I took more guidance from the paper: Go point by point through the reviewer comments and either make the suggested change or politely explain and clarify the misunderstanding.

April 21st: Acceptance achieved! Approximately 5 weeks after submitting the article, it was accepted and the provisional version of the manuscript was published online. This is an extremely fast turnaround time, in part due to the responsiveness of the editor, quick but in-depth peer review, and the interactive, transparent review discussion.

What I learned

This collaboration with the PLANT FELLOWS postdocs resulted in a paper I can say I’m proud of. I learned many things about the publication process—not only through a literature review, but by actually experiencing the process first hand. Here are some of the main things that stuck with me:

  • There is a certain creative power in bringing people together in a beautiful location to brainstorm and produce an outcome within a short period of time. However, it is necessary for someone to take the reins and commit to the follow-through in order to get to a finished product. I think things like hackathons or other collaborative group efforts could lead to fruitful outcomes.
  • I learned how to coordinate a small project. This was a great collaborative effort, which gave me an opportunity to practice the recommendations we wrote about in the paper. I discovered firsthand the importance of the initial contact with the editor. As soon as we reworked the paper to approach the topic from a plant-specific standpoint, this added novelty to the paper. We were able to highlight this novelty in the cover letter.
  • Don’t give up. Many times I got distracted or discouraged and considered simply publishing the manuscript on our blog, but I’m glad in the end we found a home for it at FiPS. Perseverance is key.


Glover, N., Antoniadi, I., George, G., Götzenberger, L., Gutzat, R., Koorem, K., Liancourt, P., Rutowicz, K., Saharan, K., You, W., & Mayer, P. (2016). A Pragmatic Approach to Getting Published: 35 Tips for Early Career Researchers Frontiers in Plant Science, 7 DOI: 10.3389/fpls.2016.00610

Phylo.io: a new interactive way of visualising and comparing trees

• Author: David Dylus •

The paper introducing our new tree visualisation tool was just published in MBE.

Yet another tool to display trees, you might say, and indeed, so it is. But for all the tools that have been developed over the years, there are very few that scale to large trees, make it easy to compare trees side-by-side, and simply run in a browser on any computer or mobile device.

To fill this gap, we created Phylo.io.

Story behind the paper

The project started as a student summer internship project, with the aim of producing a tree visualiser that facilitates comparison of trees built on the same set of leaves. After reading the project description, Oscar Robinson, a brilliant student from the Computer Science department at UCL, decided to work on this project during a three month internship. He saw a chance to apply his experience in the development of web tools and to develop his knowledge in the field of data visualisation, one of his major interests.

Once Oscar started developing Phylo.io, he realised that only a few tools existed for visual comparison of two trees, and that they either relied on old technology or were cumbersome to use. This in particular motivated him to develop our tool into a fully fledged online resource that is easy to use, platform independent and based on the newest JavaScript libraries (e.g. D3). Within three months, he managed to produce a prototype of the tool. However, due to the short length of the internship, some details still needed a bit of attention.

Luckily for me, I started my postdoc in the Dessimoz Lab around that time. As I was a novice in a computational lab, Christophe proposed that I take over the project and bring it to completion as a way to kickstart my postdoc. Although my computational background at that time did not include any experience in JavaScript programming, I accepted the challenge and was eager to start learning the material. My steep initial learning curve was eased by the help of two brilliant PhD students, Alex Vesztrocy and Clément Train. Once I acquired some basic understanding, I was able to resolve bugs and add some key missing functionalities such as automatic tree rerooting or persistent storage and sharing functionality.

What is Phylo.io and what can it do?

Phylo.io is a web tool that works in any modern browser. All computations are performed client-side, so the only restriction on performance is the machine it is running on. Trees can be input in Newick and Extended Newick format. Phylo.io offers many features that other tree viewers have: branches can be swapped, the rooting can be changed, and the thickness, font and other parameters are adaptable. Many of these operations can be performed directly by clicking on a branch or a node in the tree. Importantly, it features an automatic subtree collapsing function: this facilitates the visualisation of large trees and hence the analysis of splits that are deep in the tree.

Next to basic tree visualisation/manipulation, it features a compare mode. This mode makes it possible to compare two trees computed using different tools or different models. Similarities and differences are highlighted using a colour scheme directly on the individual branches, making it clear where the two topologies actually differ. Additionally, since different tools output trees with very different rootings and leaf orders, Phylo.io has a function to root one of the trees according to the other one and adapt the order of the leaves according to a fixed tree.
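To make the idea of comparing two topologies concrete, here is a minimal Python sketch. It is not the algorithm used by the tool; it simply assumes that two rooted trees on the same leaves can be compared by listing the clades (sets of leaves below an internal node) that they share and those unique to each tree. The function names `parse_newick`, `clades` and `compare` are made up for this illustration, and the small parser only handles plain Newick without quoted labels or spaces in names.

```python
def parse_newick(text):
    """Parse a plain Newick string into nested tuples of leaf names.

    Branch lengths and internal node labels are read and discarded.
    """
    text = "".join(text.split()).rstrip(";")  # drop whitespace and final ";"
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        # keep only the name, dropping an optional ":branch_length"
        return text[start:pos].split(":")[0]

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            pos += 1  # skip ")"
            read_label()  # discard internal label / branch length
            return tuple(children)
        return read_label()

    return parse_node()


def clades(tree):
    """Return the leaf set (as a frozenset) below every internal node."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    leaves(tree)
    return found


def compare(newick_a, newick_b):
    """Return (shared clades, clades only in A, clades only in B)."""
    a = clades(parse_newick(newick_a))
    b = clades(parse_newick(newick_b))
    return a & b, a - b, b - a


shared, only_a, only_b = compare("(((A,B),C),D);", "(((A,B),D),C);")
print("shared:", sorted(sorted(c) for c in shared))
print("only in first tree:", sorted(sorted(c) for c in only_a))
print("only in second tree:", sorted(sorted(c) for c in only_b))
```

Because this sketch compares rooted clades, the result depends on where each tree is rooted, which is one reason why rerooting one tree according to the other before comparing is useful.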

How do you use Phylo.io?

To save you time, here is a one-minute screencast highlighting some of the key features of Phylo.io.

You can find more info in the Manual.



Robinson, O., Dylus, D., & Dessimoz, C. (2016). Phylo.io: interactive viewing and comparison of large phylogenetic trees on the web Molecular Biology and Evolution DOI: 10.1093/molbev/msw080

A web service to facilitate orthology benchmarking

• Author: Christophe Dessimoz •

Our latest paper, “Standardized Benchmarking in the Quest for Orthologs”, just came out in Nature Methods. This is a brief overview and story behind the paper.

Orthology benchmarking

Orthology, which formalises the concept of “same” genes in different species, is a foundation of genomics. Last year alone, more than 13,000 scientific papers were published with the keyword “ortholog”. To satisfy this enormous demand, many methods and resources for ortholog inference have been proposed. Yet because orthology is defined from the evolutionary history of genes, which usually cannot be known with certainty, benchmarking orthology is hard. There are also practical challenges in comparing complex computational pipelines, many of which are not available as standalone software.

Identifier mapping: the bane of bioinformatics

Back in 2009, Adrian Altenhoff and I published a paper on ortholog benchmarking in PLOS Computational Biology. At the time, this was the first benchmark study with phylogeny-based tests. It also investigated an unprecedented number of methods. One of the most challenging aspects of this work, and by far the most tedious, was to compare inferences performed by different methods on only partly overlapping sets of genomes, often with inconsistent identifiers and releases. This lends credence to the cynics’ view that “bioinformatics is ninety percent identifier mapping…”

Enter the Quest for Orthologs consortium

Around that time, Eric Sonnhammer and Albert Vilella organised the first Quest for Orthologs (QfO) meeting at the beautiful Genome Campus in Hinxton, UK—the first of a series of collaborative meetings. We have published detailed reports on these meetings (2009, 2011, 2013; stay tuned for the 2015 meeting report…).

Out of these interactions, the Quest for Orthologs consortium was born, with the mission to benchmark, improve and standardise orthology predictions through collaboration, the use of shared reference datasets, and evaluation of emerging new methods.

The consortium is open to all interested parties and now includes over 60 researchers from 40 institutions worldwide, with representatives from many resources, such as UniProt, Ensembl, NCBI COGs, PANTHER, Inparanoid, PhylomeDB, EggNOG, PLAZA, OrthoDB and our own OMA resource.

The orthology benchmark service and other contributions of the paper

The consortium is organised in working groups. One of them is the benchmarking working group, in which Adrian and I have been very involved. This new paper presents several key outcomes of the benchmarking working group.

First and foremost, we present a publicly available, automated, web-based benchmark service. The service lets method developers evaluate predictions performed on the 2011 QfO reference proteome set of 66 species. Within a few hours after submitting their predictions, they obtain detailed feedback on the performance of their method on various benchmarks compared with other methods. Optionally, they can make the results publicly available.

Conceptual overview of the benchmark service (Fig 1 of the paper; click to enlarge)

Second, we discuss the performance of 14 orthology methods on a battery of 20 different tests on a common dataset across all of life.

Third, one of the benchmarks, the generalised species discordance test, is new and provides a way for testing pairwise orthology based on trusted species trees of arbitrary size and shape.


For developers of orthology prediction methods, this work sets minimum standards in orthology benchmarking. Methodological innovations should be reflected in competitive performance in at least a subset of the benchmarks (we recognise that different applications entail different trade-offs). Publication of new or updated methods in journals should ideally be accompanied by publication of the associated results in the orthology benchmark service.

For end-users of orthology predictions, the benchmark service provides the most comprehensive survey of methods to date. And because it can process new submissions automatically and continuously, it holds the promise of remaining current and relevant over time. The benchmark service thus enables users to gauge the quality of the orthology calls upon which they depend, and to identify the methods most appropriate to the problem at hand.



Altenhoff, A., Boeckmann, B., Capella-Gutierrez, S., Dalquen, D., DeLuca, T., Forslund, K., Huerta-Cepas, J., Linard, B., Pereira, C., Pryszcz, L., Schreiber, F., da Silva, A., Szklarczyk, D., Train, C., Bork, P., Lecompte, O., von Mering, C., Xenarios, I., Sjölander, K., Jensen, L., Martin, M., Muffato, M., Quest for Orthologs consortium, Gabaldón, T., Lewis, S., Thomas, P., Sonnhammer, E., & Dessimoz, C. (2016). Standardized benchmarking in the quest for orthologs Nature Methods DOI: 10.1038/nmeth.3830

Altenhoff, A., & Dessimoz, C. (2009). Phylogenetic and Functional Assessment of Orthologs Inference Projects and Methods PLoS Computational Biology, 5 (1) DOI: 10.1371/journal.pcbi.1000262

Gabaldón, T., Dessimoz, C., Huxley-Jones, J., Vilella, A., Sonnhammer, E., & Lewis, S. (2009). Joining forces in the quest for orthologs Genome Biology, 10 (9) DOI: 10.1186/gb-2009-10-9-403

Dessimoz, C., Gabaldon, T., Roos, D., Sonnhammer, E., Herrero, J., & Quest for Orthologs Consortium (2012). Toward community standards in the quest for orthologs Bioinformatics, 28 (6), 900-904 DOI: 10.1093/bioinformatics/bts050

Sonnhammer, E., Gabaldon, T., Sousa da Silva, A., Martin, M., Robinson-Rechavi, M., Boeckmann, B., Thomas, P., Dessimoz, C., & Quest for Orthologs Consortium. (2014). Big data and other challenges in the quest for orthologs Bioinformatics, 30 (21), 2993-2998 DOI: 10.1093/bioinformatics/btu492

Thoughts on pre- vs. post-publication peer-review

• Author: Christophe Dessimoz •

A few months ago, we published a paper that spent four years in peer-review (story behind the paper). Because of this, I feel entitled to an opinion on the pre- vs post-publication review debate.

Background on preprints and their effect on peer-review

If you have been living under a rock, or if you are not on Twitter, you may not have noticed that preprints are becoming more widely accepted in biology—supported by initiatives such as Haldane’s Sieve and bioRxiv. This is particularly true in population genetics, evolutionary biology, bioinformatics, and genomics. Typically, a manuscript is made available as preprint just as it is submitted to a scientific journal, and therefore prior to peer-review. I am saying “made available” instead of “published” because although preprints can be read by anybody, the general view is that the canonical publication event lies with the journal, post peer-review. Because of this, many traditional journals tolerate this practice: peer-review technically remains “pre-publication” and the journals get to keep their gatekeeping function.

The key benefit of preprints is that they accelerate scientific communication. Indeed, peer-review can be long and frustrating for authors. Reviewers sometimes misjudge the importance of papers or request unreasonable amounts of additional work. The ability to bypass peer-review can thus be liberating for authors. If we instead recognised preprints as the canonical publication event, so goes the idea, peer-review would be relegated to a secondary role and journals would lose their gatekeeping function. This is the “post-publication” peer-review model.

For more background info, here are a few pointers:

Advantages of pre- and post-publication peer-review

What did our recent experience teach us? Spending four years in various stages of peer-review is a huge strain on the authors, reviewers, and editors. On the positive side, the final paper was more complete (some of the methods tested were published after our first submission!). Undoubtedly, it became a clearer and more solid paper. However, as I pointed out in my post on the paper, our main conclusions did not change. They could have been brought to everyone’s attention four years earlier.

So should pre-publication peer-review be abolished? In this particular case, it’s debatable. If we had known what awaited us, we would have released the manuscript as a preprint (e.g. on arXiv, bioRxiv, or PeerJ PrePrint), something we have done with subsequent pieces of work.

However, in general, I still think that pre-publication peer-review has many merits. First, thankfully this experience was extreme; on average things are much faster: 2-4 months including one revision cycle is quite typical in my experience. With some journals, this can be even faster (Bioinformatics, PeerJ or MBE spring to mind). Second, pre-publication peer-review can identify flaws or interesting points overlooked by the authors, to the point that in some cases (large multi-author studies!), peer-reviewers almost certainly wind up contributing more to the paper than some of the co-authors. Furthermore, while reviewers do not always agree in their comments, when they do, the authors had better pay attention.

That being said, unorthodox or controversial results can be extremely difficult to publish under the pre-publication peer-review model, particularly if some of the reviewers have vested interests in the status quo.

Best practices in the age of pre- and post-publication peer-review

So what model am I arguing for? I think the emerging combination of preprints and journals can give us the best of the two worlds. Preprints ensure that advances can be quickly, broadly, and unimpededly disseminated. Journals add a layer of quality control and differentiation, and even glamour if so they choose. Importantly, this new paradigm shifts power from the publishers back to us, the researchers. And you all remember what comes with great powers, right?

As peer-reviewers, it is our job to identify specific issues with the work, and bring them to the authors and the editor, but ultimately, we should remember that the work we review is not our work. If the authors choose to ignore points we consider important, it may be more constructive and rewarding to write a rebuttal paper anyway. Post-publication peer-review as it were!

As editors, we should pay attention to potential conflicts of interests, focus on a limited set of key points that need addressing, and remember that every additional round of revisions costs precious time and resources. The additional delay could result in wasteful duplication of the work by others, or missed opportunities to build upon the findings. Thus we have a moral obligation to balance pre- and post-publication peer-review. Too often, editors lazily or cowardly repeatedly forward all reviewer comments back and forth without taking a stance, with little consideration of the burden this incurs to the authors and the rest of the community.

As authors, one simple but powerful thing we can do is to more openly acknowledge the shortcomings of our work and candidly disclose unresolved issues. In case of fundamental disagreement with a peer-reviewer, the impasse may be overcome by including an account of the disagreement as part of the paper. In fact, this is precisely what we ended up doing in our paper. In the discussion section, we wrote:

And sixth, we disclose that in spite of the several lines of evidence and numerous controls provided in this study, one anonymous referee remained skeptical of our conclusions. His/her arguments were: (i) instead of using default parameters or globally optimized ones, filtering parameters should be adjusted for each data set; (ii) the observations that, in some cases, phylogenies reconstructed using a least-squares distance method were more accurate than phylogenies reconstructed using a maximum likelihood method (Supplementary Figs. 7–10, available on Dryad), and that ClustalW performed “surprisingly well” compared with other aligners, are indicative that the data sets used for the species discordance test are flawed; (iii) the parsimony criterion underlying the minimum duplication test and the Ensembl analyses is questionable.

Indeed, not every issue can be resolved during peer-review. At some point, the debate should happen in the open. Any one single paper is rarely the “last word” on a question anyway. And as our editor admitted, a bit of controversy is good for the journal.



Tan G, Muffato M, Ledergerber C, Herrero J, Goldman N, Gil M, & Dessimoz C (2015). Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference. Systematic biology, 64 (5), 778-91 PMID: 26031838

What is homoeology? (story behind the paper)

• Author: Natasha Glover •

We know homologs are genes related by common ancestry. But throw complex evolutionary events into the mix and things can get a little dicey. Under the umbrella of homologs exist many different categories: orthologs, paralogs, ohnologs, xenologs, co-orthologs, in-paralogs, out-paralogs, paleologs, among others. All of these -log terms have a specific meaning (see my previous blog post on orthology and paralogy), but now we will focus on one in particular: homoeologs.

But before we get into the definition, let’s start at the beginning. When I started as a postdoc at Bayer CropScience working with Henning Redestig in collaboration with Christophe Dessimoz at University College London, I was tasked with evaluating homoeolog predictions using the OMA algorithm.

What are homoeologs?

From my previous experience, I knew homoeologs as roughly “corresponding” genes between subgenomes of a polyploid organism. For example, the wheat genome is an allohexaploid, with 3 diploid subgenomes named A, B, and D. Given a gene on chromosome 3B, you will most likely find a nearly identical copy on chromosomes 3A and 3D, in roughly the same position. These corresponding copies across subgenomes are known as homoeologs. But this definition left something to be desired— it didn’t tell me anything about the evolutionary relationship between the homoeologs. Worse, it was ambiguous in that it required discretionary similarity thresholds in terms of sequence and positional conservation. How could we test for performance if there was no unambiguous definition of the target?

Time to hit the books

Like many researchers starting a new project, I went to the scientific literature to get more information. After many hours spent on Google Scholar, I found myself with more questions than answers. Firstly, what were the evolutionary events that give rise to homoeologs? How do they fit in with the other “-log” terms? Can they be found only in a certain type of polyploid, but not another? How do things like gene duplication and movement affect our understanding of what a homoeolog is? And finally, after seeing the word written as homoeolog, homeolog, and homoeologue, how do you even spell it?

There are some excellent review papers out there on polyploidy which shed light on the biological consequences of homoeology. This, this, or this for example. However, when searching the whole of the literature, I found many inconsistent, vague, or even incorrect usages of the term homoeolog. Sometimes people defined homoeologs on the basis of their chromosome pairing patterns. Other times homoeologs were used to describe corresponding genes from different, although closely related species. Many papers said homoeologs were necessarily syntenic. Others don’t define the term at all.

Getting on the same page

These imprecise or incorrect definitions can lead to confusion. In recent years, advances in technology have afforded us the opportunity to sequence many new genomes, including polyploids. All these new techniques have exploded the amount of data and brought about collaborations between geneticists, molecular biologists, plant breeders, bioinformaticians, phylogeneticists, and statisticians. Therefore we think it’s important to have a precise and evolutionarily meaningful definition of homoeology as a reference point.

What we learned

Thus we went back to the earliest usage of the term we could find and synthesized the literature to date. We define homoeologs as “pairs of genes or chromosomes in the same species that originated by speciation and were brought back together in the same genome by allopolyploidization”. For recent hybrids, as long as there was no rearrangement across subgenomes, homoeologs can be thought of as orthologs between these subgenomes. Here’s how they fit in with other common homologs:

Different types of homologs
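
The definition above can be read as a small decision rule. The Python sketch below is only an illustration under simplifying assumptions: the function name and its two arguments are hypothetical, it covers just three categories, and it assumes that the only way two speciation-derived copies end up in the same genome is allopolyploidization (so it ignores cases such as xenologs).

```python
def classify_homologs(origin_event, same_species_now):
    """Classify a pair of homologous genes (simplified, illustrative rule).

    origin_event: 'speciation' or 'duplication', the event at which the
        two genes started to diverge.
    same_species_now: True if both genes are found in the same species today.
    """
    if origin_event == "duplication":
        return "paralogs"
    if origin_event == "speciation":
        # Assumption: copies that diverged by speciation but share a genome
        # today were reunited by allopolyploidization.
        return "homoeologs" if same_species_now else "orthologs"
    raise ValueError("origin_event must be 'speciation' or 'duplication'")


# Hypothetical example: copies of a wheat gene on chromosomes 3A and 3B
wheat_pair = classify_homologs("speciation", same_species_now=True)
print(wheat_pair)
```

Note that neither synteny nor a one-to-one relationship appears in the rule, which matches the point that homoeologs need not be syntenic or one-to-one.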

We realized that homoeologs are not necessarily one-to-one or syntenic. Depending on the particular patterns of gene duplication and rearrangement in a given species, we may see homoeologs in a 1:many relationship or across non-corresponding chromosomes.

We also reviewed homoeolog inference techniques, starting from low-throughput lab techniques to evolution-based computational methods. Orthology prediction is a booming area of active research, so many orthology inference methods can be applied to homoeology prediction.

Last but not least, we learned that even though homoeolog has alternatively been spelled “homeolog” (no extra o), homoeolog is the clear winner in terms of popularity. The “homoeo-” spelling has been used more than twice as often in the literature. Fortunately, however, both are pronounced the same (“ho-mee-o-log”).

Check out the review paper in Trends in Plant Science (open access!). We hope this paper can serve as a jumping-off point for those interested in tackling homoeology, especially for those new to the field.


Glover, N., Redestig, H., & Dessimoz, C. (2016). Homoeologs: What Are They and How Do We Infer Them? Trends in Plant Science DOI: 10.1016/j.tplants.2016.02.005

New paper “Clustering Genes of Common Evolutionary History”

• Author: Kevin Gori •

This is how molecular systematics has worked since the sixties: you take some identifiable feature (e.g. a gene or a protein) common to a group of species and take some measurements of it (e.g. sequencing the DNA). By comparing the results of these measurements you can estimate the evolutionary tree that links the species. Shortly after people started doing this they realised there was a problem: when analyses are based on different genes they often estimate different (incongruent) evolutionary trees. As technology has become more capable, researchers have begun using more and more genes, so this problem of incongruent trees has moved to the foreground.

There have been lots of good ideas about what to do about this problem, and this paper is our contribution. We tried to tackle incongruence by designing a method that groups genes together based on how similar their estimated trees are, without any assumption as to how any incongruence came about.

If all the genes more or less agree on the evolutionary tree, then you get one large group; if some disagree, then they are placed in their own groups. The most interesting case is if several genes disagree in the same way, because then you have an effect to try to explain, and you may have discovered something.
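
To make the grouping idea concrete, here is a minimal Python sketch. It is not the treeCl implementation: the nested-tuple tree encoding, the clade-based distance (a simplified, rooted relative of the Robinson-Foulds distance), the function names, the threshold-based grouping, and the toy gene trees are all assumptions made for illustration.

```python
from itertools import combinations


def clades(tree):
    """Return the non-trivial clades of a rooted tree given as nested tuples.

    Leaves are strings; each clade is a frozenset of leaf names.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree
    return found


def clade_distance(tree_a, tree_b):
    """Number of clades present in one tree but not in the other."""
    return len(clades(tree_a) ^ clades(tree_b))


def group_trees(trees, max_distance=0):
    """Group gene names whose trees are within max_distance of each other.

    Uses single-linkage grouping via a small union-find structure.
    """
    names = list(trees)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if clade_distance(trees[a], trees[b]) <= max_distance:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Toy data: genes 1-2 support ((A,B),C); genes 3-4 support ((A,C),B)
gene_trees = {
    "gene1": ((("A", "B"), "C"), "D"),
    "gene2": (("C", ("B", "A")), "D"),
    "gene3": ((("A", "C"), "B"), "D"),
    "gene4": ((("C", "A"), "B"), "D"),
}
groups = group_trees(gene_trees)
print(groups)
```

In this toy case two genes disagree with the others in the same way, so they form their own group, which is the "most interesting case" described above.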

TreeCl conceptual diagram

We did lots of simulation to test and refine our method, both in its ability to recognise different incongruent groups, and to estimate how many groups are present. Then, armed with a method that works well on simulation, we tested it on some real data, from yeasts, and from flies.

Our findings were that for the yeast data our method worked really well, and identified 3 distinct groups of genes. The majority of genes were a good fit to the widely accepted tree for the species we looked at. The other two groups showed some major differences, mostly involving two of the species. We had a close look at the data, and concluded that there were some wrong annotations in the data that had introduced sequences that didn’t belong there. This was not the biological result we were looking for, but nonetheless useful.

The fly data were trickier, as they come from a genus where we aren’t sure how many separate species there are. We produced trees that show better species-level resolution than the most recent molecular studies. We also showed high levels of incongruence in the order in which the species appear, which can often be the case when species have diverged rapidly, due to a process called incomplete lineage sorting.

So whether the goal is to identify artifacts or genuine incongruence among your loci, we think that process-agnostic topology partitioning should become a routine step in phylogenetic analyses. To facilitate this process, we’ve released our code in a new open source software called “treeCl”, available at


Gori K, Suchan T, Alvarez N, Goldman N, & Dessimoz C (2016). Clustering genes of common evolutionary history. Molecular biology and evolution PMID: 26893301

Lausanne, Switzerland

• Author: Christophe Dessimoz •

The new academic year brings a big change to our lab. I am moving to the University of Lausanne, Switzerland, on a professorship grant from the Swiss National Science Foundation. The generous funding will enable us to expand our activities on computational methods dealing with mixtures of phylogenetic histories. Lausanne is a hub for life sciences and bioinformatics so we will feel right at home there; indeed, we have already been collaborating with several groups there. I join the Center for Integrative Genomics and the Department of Ecology and Evolution. I also look forward to rejoining the Swiss Institute of Bioinformatics. At a personal level, this marks a return to a region in which I grew up, after 16 years in exile.

However, I keep a joint appointment at UCL, where part of the lab remains. I’ll be flying back regularly and will keep some of my teaching activities. UCL is a very special place, one which would be too hard for me to leave entirely. For all the cynicism we hear about universities-as-businesses, the overriding priority at UCL clearly remains outstanding scholarship. My two departments (Genetics, Evolution and Environment; Computer Science) are both highly collegial and supportive. Compared to the previous institutions I have worked for, the organisational culture at UCL is very much bottom-up. The pervasive chaos is perceived as a shortcoming by some, but it’s actually a huge competitive advantage, one that leaves ample room for initiative and flexibility. One colleague once told me that I could build a nuclear reactor in my lab and no one would ask a question, provided I secure the funding for it of course…

So how are we going to manage working in two different sites? Well, the situation is not new. We have had a distributed lab for several years and have developed a system for remote collaboration. Currently, we have lab members primarily based in London, Zurich, Ghent, and Cambridge. Our weekly lab meeting and monthly journal club are done via videoconference (with GoToMeeting). I try to have at least fortnightly 1:1 meetings with all remote members. During the day, the lab stays in touch via instant messaging (using HipChat). We have shared code (git) and data (sshfs) repositories. We tend to write collaborative papers using Google Docs (with Paperpile as reference manager). Importantly, we have a lab retreat every four months where we meet in person, reflect on our work, and have fun. We supplement this with collaborative visits as needed. The system is not perfect—please share your experience if you’ve found other good ways of collaborating remotely—but overall it’s working quite well.

Filtering sequence alignment reduces the quality of single-gene trees

• Author: Christophe Dessimoz •

The recent publication of our paper “Current methods for automated filtering of multiple sequence alignments frequently worsen single-gene phylogenetic inference” in Systematic Biology is the conclusion of 5 years of work, most of which was spent in peer-review. I will discuss the issue of pre- vs. post-publication in a separate post (update: available here). For now, I summarise our main results, try to provide an intuition for them, and give the story behind the paper.

Does automatic alignment filtering lead to better trees?

One major use of multiple sequence alignments is for tree inference. Because aligners make mistakes, many practitioners like to mask the uncertain parts of the alignment. This is done by hand or using automated tools such as Gblocks, TrimAl, or Guidance.
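
As a rough illustration of what masking an alignment means in practice, the sketch below drops every column whose gap fraction exceeds a threshold. This is a deliberately simple stand-in, not how Gblocks, TrimAl, or Guidance actually decide what to remove; the function name, the 0.5 threshold, and the toy sequences are assumptions made for the example.

```python
def filter_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove alignment columns with too many gaps.

    alignment: dict mapping sequence names to aligned strings of equal
        length, with '-' marking gaps.
    Returns a new dict with the same names and the retained columns only.
    """
    sequences = list(alignment.values())
    if not sequences:
        return {}
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")

    # Keep a column only if its share of gap characters is small enough
    kept_columns = [
        i
        for i in range(length)
        if sum(seq[i] == "-" for seq in sequences) / len(sequences)
        <= max_gap_fraction
    ]
    return {
        name: "".join(seq[i] for i in kept_columns)
        for name, seq in alignment.items()
    }


# Toy alignment (hypothetical sequences)
alignment = {
    "seq1": "MK-LV--A",
    "seq2": "MKTLV-GA",
    "seq3": "MR-LI--A",
}
filtered = filter_gappy_columns(alignment)
print(filtered)
```

Here three of the eight columns are removed. Even in this tiny case the filtered alignment is shorter, so whatever signal the dropped columns carried is no longer available to tree inference.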

The aim of our study was to compare different automated filtering methods and assess under which conditions filtering might be beneficial. We compared ten different approaches on several datasets covering hundreds of genomes across different domains of life (including nearly all of the Ensembl database), as well as simulated data. We used several criteria to assess the impact of filtering on tree inference (comparing the congruence of resulting trees with undisputed species trees and counting the number of gene duplications implied). We sliced the data in many different ways (sequence length, divergence, “gappyness”).

The more we filter alignments, the worse trees become.

In all datasets, tests, and conditions we tried, we could hardly find any situation in which filtering methods led to better trees; in many instances, the trees got worse:

alignment filtering

Overall, the more alignments get filtered (x-axis in figure), the worse the trees become! This holds across different datasets and filtering methods. Furthermore, under default parameters, most methods filter far too many columns.

The results were rather unexpected, and potentially controversial, so we went to great lengths to ensure that they were not spurious. This included many control analyses and replication of the results on different datasets, and using different criteria of tree quality. We also used simulated data, for which the correct tree is known with certainty.

What could explain this surprising result?

It appears that tree inference is more robust to alignment errors than we might think. One hypothesis for this might be that while alignment errors introduce mostly random (unbiased) noise, correct columns (or partly correct ones) contain crucial phylogenetic signal that can help discriminate between the true and alternative topologies.

But why could this be the case? We are not sure, but here is an idea: aligners tend to have most difficulty with highly distant sequences, because there are many evolutionary scenarios that could have resulted in the same sequences. At the limit, if the distance is very large (e.g. sites have undergone multiple substitutions on average), all alignments become equally likely, and it becomes impossible to align the sequences. Also, the variance of the distance estimate explodes. But relative to this enormous variance, the bias introduced by alignment errors becomes negligible.

I stress that we don’t prove this in the paper and this is merely a conjecture (some might call this post hoc rationalisation).

So is filtering an inherently bad idea?

Although alignment filtering does not improve tree accuracy, we can’t say that it is inherently a bad idea. Moderate amounts of filtering did not seem to have much impact—positive or negative—but can save some computation time.

Also, if we consider the accuracy of the alignments themselves, which we did in simulations (such that we know the true alignment), filtering does decrease the proportion of erroneous sites in the alignments (though, of course, these alignments get shorter!). Thus for applications more sensitive to alignment errors than tree inference, such as detection of sites under positive selection, it is conceivable that filtering might, in some circumstances, help. However, the literature on the topic is rather ambivalent (see here, here, here, and here).

Why it took us so long: a brief chronology of the project

The project started in summer 2010 as a 3-week rotation project by Ge Tan, who was a talented MSc student at ETH Zurich at the time (he is now a PhD student at Imperial College London, in Boris Lenhard’s group). The project took a few months more than originally foreseen to complete, but early on the results were already apparent. In his report, Ge concluded:

“In summary, the filtering methods do not help much in most cases.”

After a few follow-up analyses to complete the study, we submitted a first manuscript to MBE in Autumn 2011. This first submission was rejected after peer-review due to insufficient controls (e.g. lack of DNA alignments, no control for sequence length, proportion of gaps, etc.). The editor stated:

“Because the work is premature to reach the conclusion, I cannot help rejecting the paper at this stage”.

Meanwhile, having just moved to EMBL-EBI near Cambridge UK, I gave a seminar on the work. Puzzled by my conclusions, Matthieu Muffato and Javier Herrero from the Ensembl Compara team set out to replicate our results on the Ensembl Compara pipeline. They saw the same systematic worsening of their trees after alignment filtering.

We joined forces and combined our results in a revised manuscript, alongside additional controls requested by the reviewers from our original submission. The additional controls necessitated several additional months of computations but all confirmed our initial observations. We resubmitted the manuscript to MBE in late 2012 alongside a 10-page cover letter detailing the improvements.

Once again, the paper was rejected. Basically, the editor and one referee did not believe in the conclusions and no amount of controls was going to convince them of the contrary. We appealed. The editor-in-chief rejected the appeal but now the reason was rather different:

“[Members of the Board of Editors] were not convinced that the finding that the automated filtering of multiple alignment does not improve the phylogenetic inference on average for a single-gene data set was sufficiently high impact for MBE.”

We moved on and submitted our work to Systematic Biology. Things worked out better there, but it nevertheless took another two years and three resubmissions—addressing a total of 147 major and minor points (total length of rebuttal letters: 43 pages)—before the work got accepted. Two of the four peer-reviewers went so far as to reanalyse our data as part of their report—one conceding that our results were correct and the other one holding out until the bitter end.

Why no preprint?

Some of the problems with this slow publication process could have been mitigated if we had submitted the paper as a preprint. In hindsight, it’s obvious that we should have done so. Initially, however, I did not anticipate that it would take so long. And with each resubmission, the paper got stronger, so I thought the whole time that it was just about to be accepted… I surely also fell for the Sunk Cost Fallacy.

Other perils of long-term projects

I’ll finish with a few amusing anecdotes highlighting the perils of papers requiring many cycles of resubmissions:

  • More than once, we had to redo analyses with new filtering methods that got published after we started the project.
  • At some point, one referee asked why we were using such an outdated version of TCoffee (went from version 5 to version 10 during the project!).
  • The editor-in-chief of MBE changed, and with it some of the editorial policy and the manuscript format (the paper had to be restructured with the method section at the end).


Tan, G., Muffato, M., Ledergerber, C., Herrero, J., Goldman, N., Gil, M., & Dessimoz, C. (2015). Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference. Systematic Biology, 64 (5), 778-791 DOI: 10.1093/sysbio/syv033

If you enjoyed this post, you might want to check the other entries of our series “story behind the paper”.

Quest for Orthologs 4

• Authors: Ed Chalstrey, Jan Koch, Clement Train & Lucas Wittwer •

On 25-27 May 2015, the lab attended the 4th international ‘Quest for Orthologs’ conference, held at the Center for Genomic Regulation (CRG) in Barcelona, Spain. The following blog entry summarises the conference experiences of Ed Chalstrey, Jan Koch, Clement Train, and Lucas Wittwer, who are interns and master’s students in the Dessimoz lab.


Quest for Orthologs (QfO) is a meeting of groups working on orthology detection and phylogenomic databases, with an aim to improve and standardise orthology predictions. This meeting was part of a series of conferences beginning in 2009, which have successfully brought together a community of researchers with shared goals. These goals included collaboration on benchmarking and the sharing of reference datasets.

As short project students in the group, QfO gave an excellent opportunity for those of us based at UCL to meet some of our colleagues from ETH (in Zurich) and Bayer CropScience (in Ghent) in person for the first time and to make contact with other scientists working in the field of ortholog prediction.

QFO picture 1

QFO picture 2

As young scientists, some of the most important questions we face are: Will I be able to explain my project to established scientists and discuss it with them? Will I be able to understand the work of other scientists, even if their research topic falls outside my area of expertise? How can I have new ideas and be inspired to contribute to an area of research I’m new to? For us, most of whom had not attended a conference before, QfO was the perfect place to begin answering these questions.

The conference involved talks from each of the research groups and a poster session for students to display their contributions. Each of the postdocs and PhD students in the Dessimoz lab gave a short talk to introduce their posters, as did one of us (Clement).

Clement: “The talk and the poster were great practice for us to improve our communication skills by presenting to an audience composed of experts in related topics. This enabled us to adapt our talks depending on the kind of people we had in front of us and to exchange ideas with other people in constructive conversations. Also, attending talks on the many fields related to our work (orthology) was an amazing experience as interns, both in discovering new things and in helping us in our own projects with new ideas and other ways of thinking.”

QfO was a great opportunity for us to meet scientists from all over the world who have worked in the field for many years. We were able to benefit from their experience and from the advice they gave us after talking with them about our own research projects, gaining a different perspective to that of our usual supervisors and colleagues. One of Jan’s discussions with two researchers from Switzerland may even lead to a future collaboration; they were interested in DLIGHT, a program that was developed by our group.

As well as discussing our current work, the conference also gave us the chance to think about future work opportunities and network with established scientists. One of the highlights for us was meeting Eugene Koonin and Sergei Mekhedov from the NCBI at the conference dinner. We had an amusing chat (about topics not necessarily related to orthology!) and an enjoyable evening. They even invited us to visit them at the NCBI!

All in all, we greatly benefited from our participation in the QfO conference.

Topic Pages: a bridge between academia and Wikipedia

• Author: Christophe Dessimoz •

We have published our latest review, on methods to infer horizontal gene transfer, in Wikipedia. It was peer-reviewed and it simultaneously appeared in PLOS Computational Biology. After our review on approximate Bayesian computations published two years ago, this is our second contribution using an exciting new format called “Topic Page”. In this post, I reflect on our motivation and experience as Topic Page authors.

The difficult relationship between academia and Wikipedia

Academia has mixed feelings about Wikipedia. Although many academics—and certainly many students—consult Wikipedia frequently, I’d venture to say that most remain reluctant to cite Wikipedia or admit relying on it otherwise.

As for academics contributing to Wikipedia, things are even worse. As a result, the quality of Wikipedia articles on scientific topics is often quite poor.

I think there are two main reasons for this reluctance to contribute. First, the lack of clear authorship and therefore credit makes it difficult for scientists to get recognition for contributing to Wikipedia. Given the intense competitiveness of contemporary science, this is more than a vanity issue; recognition is tightly coupled with funding and job success—i.e. survival in the profession. But perhaps just as importantly, many scientists are unfamiliar with Wikipedia’s conventions and practices and are thus (rightly!) concerned that their contributions might be “diluted” by further edits by others or even flat out turned down. I know of several disgruntled people who have given up on editing Wikipedia because of such bad experiences.

Simson Garfinkel provides an illuminating account of this sort of tension in this article:

“I have attempted to retire from directing films in the alternative universe that is the Wikipedia a number of times, but somebody always overrules me,” Lanier wrote. “Every time my Wikipedia entry is corrected, within a day I’m turned into a film director again.”

Since Lanier’s attempted edits to his own Wikipedia entry were based on firsthand knowledge of his own career, he was in direct violation of Wikipedia’s three core policies. He has a point of view; he was writing on the basis of his own original research; and what he wrote couldn’t be verified by following a link to some kind of legitimate, authoritative, and verifiable publication.

For the tertiary source Wikipedia aims to be, these core policies are entirely reasonable, but it’s easy to imagine situations where they might frustrate some contributors.

Wikipedia’s tremendous impact

It is, however, worth considering the upsides of contributing to Wikipedia. For all the obsessions many of us have about publishing articles in generalist journals with broad readership, the lack of interest in Wikipedia feels like a missed opportunity. Consider the Wikipedia page on Phylogenetics. It was consulted over 50,000 times in the last 3 months alone. This is over twice the median number of views of papers published in, say, the 17 Oct 2013 issue of Nature, accumulated in a quarter of the time (as it happens, this particular issue has “impact metrics” as its cover story…).

PLOS Topic Pages

Fortunately, the good folks at PLOS Computational Biology have worked out a great solution to this conundrum: the “Topic Page”. In short, authors contribute Wikipedia-style articles on topics not or only poorly covered in Wikipedia. These get peer-reviewed and published in the journal with attribution, a DOI, and all the bells and whistles that come with journal articles. But in addition, the page gets incorporated into Wikipedia, where it starts a new life.

This format solves the problem discussed above. Authors get credit for their work in a way that fits well with existing structures. There is a permanent record of the contribution, indexed in scholarly databases such as PubMed, Google Scholar, etc. The contribution benefits from additional feedback from the peer-review and editorial process. And, perhaps most importantly, the authors can relinquish control over their work, for better or worse, with the reassurance that an unadulterated version of their work will remain available no matter what.

Our experience publishing Topic Pages

So far, we have published two Topic Pages: one on Approximate Bayesian Computations (paper, peer-reviews, Wikipedia page) and one on inferring horizontal gene transfer (paper, peer-reviews, Wikipedia page).

We’ve been pleasantly surprised by the excellent reception of the ABC article. It has been viewed over 26,000 times on the PLOS site alone. It’s also consulted a few thousand times every month on Wikipedia.

Remarkably, since the article was publicly accessible while we were drafting it on the PLOS Topic Page wiki, it had already accumulated over 10,000 views even before publication (see counter at bottom of this page). It was also picked up by a prominent statistician’s blog.

But just as importantly, the editorial process itself was great. Editing the manuscript on the PLOS Topic Page wiki provided a natural environment for collaborative writing. The wiki-based, open peer-review process yielded constructive and timely reviews (we could start addressing referee reports as they rolled in!). Our editor Daniel Mietchen was helpfully hands-on and did a substantial number of edits directly on the manuscript itself.

The only caveat I can think of is that the neutral, factual, impersonal, timeless style of Wikipedia articles is quite different from the type of review articles I am otherwise used to. This is definitely not the right outlet for opinion-type pieces!

On the other hand, this format is great for student work. In fact, both of our Topic Pages started as student assignments in my course Reviews in Computational Biology. That being said, although the course gave the initial impetus, in both cases extensive additional work (and the involvement of additional co-authors) was required to get them published.

What happened since?

As it’s been two years since we published the ABC review, we can start to discern some outcomes.

The Wikipedia version underwent 46 changes, all minor modifications or additions (typo corrections, additional links and “Wikifications”, additional entries in the list of relevant software packages, attempts to sneak in one’s own contributions, … the usual stuff).

It is also gratifying to see our work appearing as first hit in Google. Since the publication in early 2013, it’s already been cited over 50 times.

Way of the future

In conclusion, the Topic Page is a great format. I am surprised that only eight Topic Pages have been published thus far, but perhaps there is still a lack of awareness about the format. I hope that this blog post will inspire some readers to improve Wikipedia by contributing a Topic Page. We are certainly thinking of our Topic Page number three…


Ravenhall M, Škunca N, Lassalle F, & Dessimoz C (2015). Inferring horizontal gene transfer. PLoS computational biology, 11 (5) PMID: 26020646 doi:10.1371/journal.pcbi.1004095 link to Wikipedia page.

Sunnåker M, Busetto AG, Numminen E, Corander J, Foll M, & Dessimoz C (2013). Approximate Bayesian computation. PLoS computational biology, 9 (1) PMID: 23341757 doi:10.1371/journal.pcbi.1002803 link to Wikipedia page

If you enjoyed this post, you might want to check the other entries of our series “story behind the paper”.

OMA Visiting Fellowship

• Author: Christophe Dessimoz •

Are you working on a project that could benefit from collaboration with the OMA team? Join us as OMA Visiting Fellow:

A slew of computational biology PhD positions across Europe (updated)

• Author: Christophe Dessimoz •

It’s hiring season and there are many interesting opportunities for prospective PhD students. Here is a list of some of the opportunities that were brought to my attention. Please get in touch if you have any other pointers.

UCL (including our lab):

Rest of UK:




The Dessimoz Lab blog is licensed under a Creative Commons Attribution 4.0 International License.
Last modified on September 19th, 2018.