Bayesian Star Tree Paradox

If sequence data are simulated on a 4-taxon star tree (such as the one shown on the left) and analyzed with standard software for Bayesian phylogenetic inference, one of the 3 possible fully-resolved trees is often supported very strongly. This is paradoxical because most people expect the three possible resolutions to be equally supported in this case, yet that outcome is seen only when the sequence length is tiny (e.g. 1 site). Uncertainty instead appears to manifest as an inability to predict, from dataset to dataset, which of the 3 possible fully-resolved tree topologies will be favored. This behavior is troubling, and several researchers have pointed out possible examples of it. Many more potential examples can be found in the literature by looking for high posterior probabilities combined with low bootstrap support and tiny internal edges.
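The simulation setup described above can be illustrated with a minimal sketch. It assumes the Jukes-Cantor (JC69) substitution model and four equal-length edges radiating from a central node; neither choice is stated in the text, and the function names and taxon labels 1-4 are hypothetical. The sketch does not perform Bayesian inference. It only generates star-tree data and counts the site patterns that would favor each of the 3 resolutions, so that the dataset-to-dataset variation in those counts can be inspected.

```python
import math
import random


def simulate_star_tree(n_sites, branch_length, rng):
    """Simulate 4-taxon sites on a star tree under JC69 (an assumed model).

    branch_length is the expected number of substitutions per site on each
    of the four edges. Returns a list of 4-character strings, one per site.
    """
    # JC69 probability that a leaf has the same state as the central node.
    p_same = 0.25 + 0.75 * math.exp(-4.0 * branch_length / 3.0)
    sites = []
    for _ in range(n_sites):
        center = rng.choice("ACGT")
        site = []
        for _ in range(4):
            if rng.random() < p_same:
                site.append(center)
            else:
                # Each of the three other states is equally likely under JC69.
                site.append(rng.choice([b for b in "ACGT" if b != center]))
        sites.append("".join(site))
    return sites


def count_informative_patterns(sites):
    """Count two-state patterns that group the taxa as 12|34, 13|24 or 14|23."""
    counts = {"12|34": 0, "13|24": 0, "14|23": 0}
    for t1, t2, t3, t4 in sites:
        if t1 == t2 and t3 == t4 and t1 != t3:
            counts["12|34"] += 1
        elif t1 == t3 and t2 == t4 and t1 != t2:
            counts["13|24"] += 1
        elif t1 == t4 and t2 == t3 and t1 != t2:
            counts["14|23"] += 1
    return counts


if __name__ == "__main__":
    # Different seeds stand in for different simulated datasets.
    for seed in range(5):
        data = simulate_star_tree(10000, 0.1, random.Random(seed))
        print(seed, count_informative_patterns(data))
```

Because the true tree is a star, all three pattern counts have the same expectation; which one happens to be largest in a given dataset is a matter of sampling noise.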

We argue that the central problem here is the non-identifiability of the tree topology, and propose a solution using reversible-jump MCMC (rjMCMC). Our rjMCMC sampler visits not only fully-resolved tree topologies but also topologies containing hard polytomies. This effectively places a point mass of prior probability on polytomies, providing an alternative in situations in which a fully-resolved topology is not a reasonable option. The analysis can be made as conservative as desired by modifying the prior distribution assumed for topologies, but in our (albeit limited) experience it does not appear easy to destroy support for real edges by using a prior that strongly supports polytomous topologies.
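To make the idea of a topology prior that can favor polytomies concrete, here is a small sketch for the 4-taxon case, where the candidate topologies are the star tree and its 3 resolutions. It assumes a hypothetical weighting parameter `c`: the star tree gets prior weight `c` relative to a weight of 1 for each fully-resolved tree. This parameterization and the function name are illustrative assumptions, not necessarily the prior used in the reference below.

```python
from fractions import Fraction


def four_taxon_topology_prior(c):
    """Prior probabilities for the star tree and the 3 resolved 4-taxon trees.

    c is a hypothetical relative weight: the star tree is c times as
    probable a priori as any single fully-resolved tree.
    """
    c = Fraction(c)
    if c < 0:
        raise ValueError("c must be non-negative")
    total = c + 3  # one star tree plus three resolved trees of weight 1
    resolved = Fraction(1) / total
    return {
        "star": c / total,
        "12|34": resolved,
        "13|24": resolved,
        "14|23": resolved,
    }


if __name__ == "__main__":
    for c in (0, 1, 3, 100):
        prior = four_taxon_topology_prior(c)
        print(c, {name: str(p) for name, p in prior.items()})
```

Under this assumed scheme, `c = 0` recovers a prior that allows only fully-resolved trees, `c = 1` is flat over all four topologies, and larger values of `c` make the analysis more conservative by shifting prior mass toward the polytomy.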

Reference: Lewis, P. O., Holder, M. T., and Holsinger, K. E. 2005. Polytomies and Bayesian phylogenetic inference. Systematic Biology 54(2): 241-253.