Model Selection

[Figure: Plot of distributions along path from prior to posterior]

In collaboration with Ming-Hui Chen and Lynn Kuo in the UConn Statistics department, my lab has worked on the problem of estimating the marginal likelihood, a quantity used to compare models. The marginal likelihood measures the average fit of a model to a data set; the model with the best average fit is deemed best. Estimating the marginal likelihood accurately is not easy, however. The method we developed is called the stepping-stone method.

The figure on the left shows a simple example involving only two sequences and a model with two parameters: the edge length (ν) and the transition/transversion rate ratio (κ). The stepping-stone method uses a series of probability distributions ranging from the posterior distribution to a reference distribution (here, the prior distribution). The prior distribution in a Bayesian analysis represents our belief in the values of the model parameters without reference to the sequences we’ve observed, while the posterior represents our belief after the sequences have been taken into account. The marginal likelihood is related to both. It is a weighted average of the likelihood (the probability of the data given the model), with the prior distribution providing the weights. It is also the constant used to normalize the posterior distribution (i.e., to scale it so that it is a proper probability distribution). The stepping-stone method uses samples from both the prior and the posterior, as well as from several distributions that lie between them, to provide an accurate estimate of the marginal likelihood. See the Xie et al. (2011) and Fan et al. (2011) papers for a more complete explanation.
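To make the idea concrete, here is a minimal Python sketch of a stepping-stone style estimate. It is not the phylogenetic implementation: it assumes a toy model (normally distributed data with known standard deviation and a normal prior on the mean) chosen because every intermediate distribution can be sampled directly and the exact marginal likelihood is known for comparison. The function names, the data values, and the power schedule (β values spaced as (k/K)^(1/0.3), which places more of them near the prior) are all assumptions made for illustration. Each intermediate distribution is proportional to likelihood^β × prior, so β = 0 gives the prior and β = 1 gives the posterior. In this sketch, each ratio is estimated from samples drawn at the lower β of the pair.

```python
import math
import random


def log_likelihood(mu, data, sigma):
    """Log probability of the data given mean mu and known sd sigma."""
    n = len(data)
    ss = sum((y - mu) ** 2 for y in data)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - ss / (2 * sigma ** 2)


def sample_power_posterior(beta, data, sigma, m0, s0, rng):
    """Draw mu from the distribution proportional to likelihood^beta * prior.

    For this conjugate toy model the distribution is normal, so no MCMC is
    needed (a real phylogenetic analysis would use MCMC here).
    """
    precision = 1 / s0 ** 2 + beta * len(data) / sigma ** 2
    mean = (m0 / s0 ** 2 + beta * sum(data) / sigma ** 2) / precision
    return rng.gauss(mean, math.sqrt(1 / precision))


def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))


def stepping_stone(data, sigma, m0, s0, n_stones=20, n_samples=2000,
                   alpha=0.3, seed=1):
    """Estimate the log marginal likelihood as a sum of log ratios."""
    rng = random.Random(seed)
    # Assumed schedule: beta runs from 0 (prior) to 1 (posterior).
    betas = [(k / n_stones) ** (1 / alpha) for k in range(n_stones + 1)]
    log_ml = 0.0
    for b_prev, b_next in zip(betas[:-1], betas[1:]):
        terms = []
        for _ in range(n_samples):
            mu = sample_power_posterior(b_prev, data, sigma, m0, s0, rng)
            terms.append((b_next - b_prev) * log_likelihood(mu, data, sigma))
        # Average of likelihood^(b_next - b_prev) over draws at b_prev
        # estimates the ratio of normalizing constants for this step.
        log_ml += log_mean_exp(terms)
    return log_ml


def exact_log_marginal(data, sigma, m0, s0):
    """Closed-form log marginal likelihood for the toy model."""
    n = len(data)
    dev = [y - m0 for y in data]
    total_var = sigma ** 2 + n * s0 ** 2
    log_det = (n - 1) * math.log(sigma ** 2) + math.log(total_var)
    quad = (sum(d * d for d in dev)
            - s0 ** 2 / total_var * sum(dev) ** 2) / sigma ** 2
    return -0.5 * (n * math.log(2 * math.pi) + log_det + quad)


if __name__ == "__main__":
    toy_data = [1.2, 0.8, 1.9, 1.4, 2.1, 0.7, 1.6, 1.1, 1.8, 1.3]  # made up
    print("stepping-stone:", stepping_stone(toy_data, 1.0, 0.0, 10.0))
    print("exact:         ", exact_log_marginal(toy_data, 1.0, 0.0, 10.0))
```

Running the script prints the estimate next to the exact value, which should agree closely for this easy problem. Splitting the path into many small steps is the point of the design: each individual ratio compares two similar distributions, so it can be estimated far more reliably than a single jump from prior to posterior.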

References:

Xie, W., P. O. Lewis, Y. Fan, L. Kuo, and M.-H. Chen. 2011. Improving marginal likelihood estimation for Bayesian phylogenetic model selection. Systematic Biology 60:150-160.

Fan, Y., R. Wu, M.-H. Chen, L. Kuo, and P. O. Lewis. 2011. Choosing among partition models in Bayesian phylogenetics. Molecular Biology and Evolution 28:523-532.