The software offered below was written by Paul O. Lewis unless otherwise indicated. Most are useful for teaching concepts in statistics or phylogenetics, and all are free for downloading.
Note that unless otherwise indicated these programs are offered AS IS with absolutely NO WARRANTY of any kind. In fact, I find it very hard to make time to keep most of these in a working state given the pace at which Java, iOS, and the various operating systems evolve. Please feel free to write to me (firstname.lastname@example.org) if you find something is broken, and I’ll do my best to fix it as soon as I can.
This is a free app for iPad/iPhone that illustrates the basic principles of Markov chain Monte Carlo using the metaphor of a robot following simple rules to walk around on a landscape and, in the process, learning about the topography of the landscape. For the iOS version, P. Lewis acknowledges the support provided by NSF grant DEB-1036448 (GrAToL).
Note: an older Windows version of this app is available below or from the MCMC Robot home page.
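For readers who want to see the robot metaphor in code, here is a minimal sketch. It assumes the "simple rules" are those of the Metropolis algorithm with symmetric Gaussian steps, and it uses a single round hill as the landscape; the function names, step size, and landscape are illustrative choices, not taken from the app itself.

```python
import math
import random


def landscape_height(x, y):
    """Height of the landscape at (x, y): one round hill centered at the
    origin (an assumed landscape, proportional to a standard bivariate normal)."""
    return math.exp(-0.5 * (x * x + y * y))


def robot_walk(n_steps, step_size=1.0, seed=1):
    """Metropolis walk. Uphill steps are always taken; downhill steps are
    taken with probability equal to (new height) / (current height)."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    samples = []
    for _ in range(n_steps):
        # Propose a step in a random direction (symmetric proposal).
        new_x = x + rng.gauss(0.0, step_size)
        new_y = y + rng.gauss(0.0, step_size)
        ratio = landscape_height(new_x, new_y) / landscape_height(x, y)
        if rng.random() < ratio:
            x, y = new_x, new_y
        # Record where the robot is standing, whether or not it moved.
        samples.append((x, y))
    return samples


if __name__ == "__main__":
    visited = robot_walk(20000)
    mean_x = sum(p[0] for p in visited) / len(visited)
    print("mean x position:", round(mean_x, 3))
```

The places where the robot spends its time approximate the shape of the hill: under these assumptions the recorded x and y coordinates should each have mean near 0 and variance near 1.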
This is a tutorial showing how to create a functioning Bayesian phylogenetics application in C++. It is designed for graduate students who need a base program that they understand and which can be easily modified to implement new models or methods.
This updated version of the tutorial adds data partitioning, invariable sites models, and a codon model to the repertoire of the program created. This version first went online 31-Oct-2019.
Program by Paul O. Lewis that estimates topological information content from a tree file (or list of tree files) representing a sample from the posterior distribution generated by a Bayesian phylogenetic analysis. The analysis performed by Galax is described in the following manuscript:
Lewis, P. O., M.-H. Chen, L. Kuo, L. A. Lewis, K. Fucikova, S. Neupane, Y.-B. Wang, D. Shi. Estimating Bayesian phylogenetic information content. Accepted in Systematic Biology. Download the advance access version.
Important! The methods outlined in the above paper and implemented in the Galax software are most useful for problems involving fewer than 12 taxa. Please give Table 2 in the paper your full attention before using the software on your own data, especially if you suspect information content is low. We are working on a more general solution that will more accurately measure information content.
Download Galax v1.0.0
Program by Paul O. Lewis, Mark T. Holder and David L. Swofford that performs Bayesian phylogenetic analyses. It specializes in marginal likelihood estimation and model selection, allows data partitioning, and can explore a tree space that includes unresolved (polytomous) tree topologies. Phycas is free and open-source; it is written primarily in C++ but has a Python 2.x interface. Versions are available for Windows and MacOS, and it can be compiled for Linux.
Program by Kent E. Holsinger and Paul O. Lewis that implements the Bayesian method described in Holsinger (1999) for estimating F-statistics from co-dominant marker data and the method described in Holsinger et al. (2002) for estimating F-statistics from dominant marker data. It also includes routines to allow posterior comparisons as described in Holsinger and Wallace (2004). Hickory is free and open-source, written in C++, using the wxWindows library for cross-platform compatibility.
Holsinger, K. E. 1999. Analysis of genetic diversity in geographically structured populations: a Bayesian perspective. Hereditas 130:245-255.
Holsinger, K. E., P. O. Lewis, and D. K. Dey. 2002. A Bayesian approach to inferring population structure from dominant markers. Molecular Ecology 11(7):1157-1164. [pdf]
Holsinger, K. E., and L. E. Wallace. 2004. Bayesian approaches for the analysis of population structure: an example from Platanthera leucophaea (Orchidaceae). Molecular Ecology 13:887-894. [pdf]
Program by Paul O. Lewis and Dmitri Zaykin designed to accompany the book “Genetic Data Analysis” by Bruce S. Weir (1996, Sinauer Associates). Computes linkage and Hardy-Weinberg disequilibrium, some genetic distances, and provides method-of-moments estimators for hierarchical F-statistics. On 11 January 2008 I changed the download format from a self-extracting zip archive to a simple zip archive. Let me know if this causes problems. The new zip file contains an additional example data file (fbi99.nex) included at the request of Bruce Weir to accompany his forthcoming review paper.
GDA has a graphical user interface (GUI) that works under Windows only, but Chris Basten has compiled a command-line-only version of GDA that runs under Mac OS 10.2.8 and 10.3 (Jaguar and Panther). This version can be downloaded here. After downloading, you should open a terminal window, navigate to the folder containing the file, and type “chmod +x gda1.1” to make GDA executable.
Weir, B. S. 1996. Genetic Data Analysis. 2nd ed. Sinauer Associates, Sunderland, Massachusetts. 376 pages.
Source Code Libraries
NCL is a C++ class library for reading data files formatted in the NEXUS file format common to several phylogenetic analysis software applications. The original version was written by me in the 1990s, but the current version has been nearly completely rewritten (and improved tremendously in the process) by Mark T. Holder. Mark’s version is now used in several phylogenetic analysis programs, including Garli, RevBayes and Phycas.
Lewis, P. O. 2003. NCL: a C++ class library for interpreting data files in NEXUS format. Bioinformatics 19 (17): 2330-2331.
Although I played a very small part in developing this fantastic resource, I include it here to advertise its availability. This library allows one to write software for maximum likelihood or Bayesian phylogenetics in either C or C++ without needing to write code to compute the likelihood! Better yet, it allows your software to make use of Graphics Processing Units (GPUs), if available, to parallelize the computation of the likelihood.
Ayres, D. L., A. Darling, D. J. Zwickl, P. Beerli, M. T. Holder, P. O. Lewis, J. P. Huelsenbeck, F. Ronquist, D. L. Swofford, M. P. Cummings, A. Rambaut and M. A. Suchard. 2012. BEAGLE: an application programming interface and high-performance computing library for statistical phylogenetics. Systematic Biology 61(1):170–173. [pdf]
These Java applets are now quite old, having been written back in the days when there were only one or two books on Java in the local (now extinct) Borders book store. Some of them (e.g. Osmosis and Diffusion) are quite silly, so don’t try to make more of these than is intended.
Illustrates the fact that sums of just about anything are approximately normally distributed (for those of us who really want to believe the Central Limit Theorem, but who need to see it to believe it)
Animates the long branch attraction example in Joe Felsenstein’s classic 1978 paper entitled “Cases in which parsimony or compatibility methods will be positively misleading”
Illustrates the Brownian motion model used by Joe Felsenstein in his classic 1985 paper “Phylogenies and the comparative method” (the paper that introduced his Phylogenetically Independent Contrasts)
Illustrates concept of a semipermeable membrane (smaller particles diffuse across barrier but larger ones cannot)
Simulates crossing over between homologous chromatids and is designed to aid in understanding the notion of the recombination fraction and its use in constructing linkage maps
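The Central Limit Theorem idea behind the first applet in this list can also be checked with a few lines of Python. This is only a sketch and is not the applet's code: it assumes the terms being summed are Uniform(0, 1) draws, and the function name and default seed are made up for illustration.

```python
import random
import statistics


def standardized_sums(n_terms, n_sums, seed=1):
    """Return n_sums sums of n_terms Uniform(0,1) draws, each standardized
    by subtracting the expected sum and dividing by its standard deviation."""
    rng = random.Random(seed)
    # A single Uniform(0,1) draw has mean 1/2 and variance 1/12.
    expected = n_terms * 0.5
    sd = (n_terms / 12.0) ** 0.5
    return [
        (sum(rng.random() for _ in range(n_terms)) - expected) / sd
        for _ in range(n_sums)
    ]


if __name__ == "__main__":
    z = standardized_sums(n_terms=30, n_sums=20000)
    within_one_sd = sum(1 for v in z if abs(v) < 1.0) / len(z)
    print("mean:", round(statistics.mean(z), 3))
    print("sd:", round(statistics.stdev(z), 3))
    print("fraction within 1 sd:", round(within_one_sd, 3))
```

If the sums are close to normal, the standardized values should have mean near 0, standard deviation near 1, and roughly 68% of them should fall within one standard deviation of zero.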
Microsoft® Windows® Applications
These are also quite old, but still useful if you use Windows.
Teaching application useful for illustrating the adaptive rejection sampling approach used for Gibbs sampling in Bayesian statistics.
Chi-square significance probability calculator.
Illustrates Bayesian concepts of prior and posterior densities for a simple coin flipping example.
Graphically illustrates the concept of inbreeding coefficient.
Illustrates the basic principles of Markov chain Monte Carlo simulation, using arbitrary, user-defined landscapes composed of one or more bivariate normal densities. Note: this is the Windows version; for the newer iPad/iPhone version, see the MCMC Robot web site.
A calculator useful in demonstrating the estimation of evolutionary distances using the method of maximum likelihood under four simple substitution models.
Graphically illustrates the concept of population differentiation due to isolation and genetic drift.
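The drift concept in the last item above can be sketched in Python as follows. This is not the Windows program's code: it assumes a simple Wright-Fisher model (isolated diploid populations of constant size, one locus with two alleles, no mutation, migration, or selection), and all function names and parameter values are hypothetical.

```python
import random


def drift_trajectory(pop_size, p0, generations, seed=None):
    """Allele frequency over time in one isolated diploid population
    under the (assumed) Wright-Fisher model."""
    rng = random.Random(seed)
    n_copies = 2 * pop_size  # diploid: two gene copies per individual
    p = p0
    freqs = [p]
    for _ in range(generations):
        # Each gene copy in the next generation is drawn at random
        # from the current allele frequency (binomial sampling).
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        freqs.append(p)
    return freqs


def final_frequencies(n_pops, pop_size, p0, generations, seed=1):
    """Final allele frequency in each of n_pops independent populations
    that all start at the same frequency p0."""
    return [
        drift_trajectory(pop_size, p0, generations, seed=seed + i)[-1]
        for i in range(n_pops)
    ]


if __name__ == "__main__":
    finals = final_frequencies(n_pops=50, pop_size=10, p0=0.5, generations=200)
    print("populations that lost the allele:", sum(1 for f in finals if f == 0.0))
    print("populations that fixed the allele:", sum(1 for f in finals if f == 1.0))
```

Although every population starts at the same allele frequency, drift alone pushes them apart: in small populations run for many generations, most end up fixed for one allele or the other, which is the differentiation the program illustrates graphically.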