Phylogenetic Software Development Tutorial (version 2)

This tutorial teaches you how to create C++ software that performs Bayesian phylogenetic analyses. It was written primarily to help students in my laboratory develop software that they can later modify for their own purposes, but I hope it is more broadly useful.

This is the second version of the tutorial. It is being developed throughout Fall 2017, so you may find that pages change weekly to reflect corrections and additions made as we work out problems and add features. The first version is still available here, but I strongly recommend using version 2, which covers the same material as version 1 but is less complicated.

Like the first version, this second version reads tree files and sequence data files, implements Bayesian MCMC, and shows how to use BEAGLE-LIB to compute the phylogenetic likelihood. It differs substantially from the first version in several ways:

  • It does away with most of the templates that made the first version difficult to read and understand.
  • It uses C++11 features such as built-in shared pointers, regular expressions, and more concise (range-based) for loops, which reduce reliance on the Boost C++ libraries.
  • It uses the Nexus Class Library (NCL) to read data files. The original tutorial used regular expressions to parse data files, but that approach assumes a regularity in how data files are put together that doesn’t really exist in the real world. Mark Holder has turned the NCL into a really amazing library that, in turn, will allow us to write programs that are quite robust in terms of data file input formats.
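
For readers who have not yet seen the C++11 features mentioned above, here is a minimal sketch that uses all three (std::shared_ptr, std::regex, and a range-based for loop) without Boost. The names ToyNode, extractTips, and sumEdgeLengths are hypothetical and exist only for this illustration; they are not classes or functions from the tutorial, and the regular expression is just a way to exercise std::regex on a toy string, not a description of how the tutorial reads data files (that job belongs to NCL).

```cpp
#include <cassert>
#include <memory>
#include <regex>
#include <string>
#include <vector>

// Hypothetical toy type for illustration only (not the tutorial's node class).
struct ToyNode {
    std::string name;
    double      edge_length;
};

// Finds every "name:length" pair in a simple Newick-like string and
// stores each one in a node managed by a std::shared_ptr.
std::vector<std::shared_ptr<ToyNode>> extractTips(const std::string & newick) {
    std::vector<std::shared_ptr<ToyNode>> tips;

    // A name (letter or underscore first), a colon, then a decimal number.
    std::regex tip_pattern("([A-Za-z_][A-Za-z0-9_]*):([0-9]*\\.?[0-9]+)");

    auto first = std::sregex_iterator(newick.begin(), newick.end(), tip_pattern);
    auto last  = std::sregex_iterator();
    for (auto it = first; it != last; ++it) {
        auto nd = std::make_shared<ToyNode>();   // no explicit new/delete needed
        nd->name        = (*it)[1].str();
        nd->edge_length = std::stod((*it)[2].str());
        tips.push_back(nd);
    }
    return tips;
}

// Sums the stored edge lengths using a range-based for loop.
double sumEdgeLengths(const std::vector<std::shared_ptr<ToyNode>> & tips) {
    double total = 0.0;
    for (const auto & nd : tips) {
        assert(nd);   // every element is expected to hold a valid node
        total += nd->edge_length;
    }
    return total;
}
```

Calling extractTips on a string such as "(A:0.1,(B:0.2,C:0.3):0.4);" should pick up the three named tips and skip the unnamed internal edge, because the pattern requires a name before the colon. The nodes are freed automatically when the last shared pointer to each one goes out of scope.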

Before diving in…

All tutorials must make some assumptions about the student’s background. This tutorial is designed for those with some background in C++ programming and thus does not explain in detail how C++ works. If you have never written anything in C++, you will still end up with a working program and may, given sufficient motivation, find it a useful vehicle for learning C++. The tutorial uses features of C++11.

This tutorial will be most useful to a biologist who has some prior experience using Bayesian phylogenetics software such as MrBayes/RevBayes or BEAST and who is interested in developing new phylogenetic methods/models not yet implemented in these programs.

License

The software that you will create falls under the permissive open-source MIT License.

Funding

This tutorial was developed as a broader impact project associated with National Science Foundation grant DEB-1354146 (Estimating the Bayesian phylogenetic information content of systematic data, PI Paul O. Lewis).

Get started!

The menu in the sidebar on the right will appear on every page of the tutorial and list each step in order. Use the sidebar menu to navigate through the tutorial, starting with Create a C++ project, which will help you set up a development environment on either Windows or Mac.