Author: pol02003

Paul O. Lewis

Paul is an Associate Professor in the Ecology and Evolutionary Biology Department. He obtained his B.S. in Biology and Mathematics from Georgetown College in 1982, M.S. in Biology from Memphis State University in 1984, and Ph.D. in Plant Biology from Ohio State University in 1991. He was a postdoc at North Carolina State University with Bruce Weir, and at the Laboratory of Molecular Systematics, Smithsonian Institution, with David Swofford (now at Duke University). Paul came to UConn after 3 years as an Assistant Professor in the Biology Department at the University of New Mexico.

Tel: (860) 486-2069
Fax: (860) 486-6364
E-mail: paul.lewis@uconn.edu

Suman Neupane

Suman started working toward his Ph.D. here in January 2013, after completing much of his dissertation research at Old Dominion University. He studies flowering plants related to the genus Hedyotis (family Rubiaceae, the coffee family), which people in the eastern United States will recognize as the small blue flowers known as bluets that carpet roadsides and lawns in early spring. Hedyotis has a wide global distribution, however. In addition to resolving the phylogeny and biogeography of this group, Suman is interested in the tendency of Hedyotis to evolve secondary woodiness on tropical montane islands, which mimics the evolution of secondary woodiness common in oceanic island systems.

E-mail: suman.neupane@uconn.edu

Model Selection

Plot of distributions along the path from prior to posterior

In collaboration with Ming-Hui Chen and Lynn Kuo of the UConn Department of Statistics, my lab has worked on the problem of estimating the marginal likelihood, a quantity used to compare models. The marginal likelihood measures the average fit of a model to a data set; the model with the best average fit (the highest marginal likelihood) is deemed best. Estimating the marginal likelihood accurately is not easy, however. The method we developed is called the stepping-stone method.
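For readers who prefer symbols, here is the standard definition of the marginal likelihood and a summary of the stepping-stone identity. The notation is ours (data D, model M, parameters θ, and "power posterior" distributions p_β indexed by 0 = β_0 < β_1 < ... < β_K = 1, with the prior as the reference distribution); see Xie et al. (2011) for the estimator as actually published.

```latex
\begin{aligned}
p(D \mid M) &= \int p(D \mid \theta, M)\, p(\theta \mid M)\, d\theta,
\qquad
p(\theta \mid D, M) = \frac{p(D \mid \theta, M)\, p(\theta \mid M)}{p(D \mid M)} \\[4pt]
p_\beta(\theta) &= \frac{p(D \mid \theta, M)^{\beta}\, p(\theta \mid M)}{c_\beta},
\qquad
c_\beta = \int p(D \mid \theta, M)^{\beta}\, p(\theta \mid M)\, d\theta,
\qquad c_0 = 1,\;\; c_1 = p(D \mid M) \\[4pt]
p(D \mid M) &= \prod_{k=1}^{K} \frac{c_{\beta_k}}{c_{\beta_{k-1}}},
\qquad
\frac{c_{\beta_k}}{c_{\beta_{k-1}}} = \mathrm{E}_{p_{\beta_{k-1}}}\!\left[\, p(D \mid \theta, M)^{\beta_k - \beta_{k-1}} \right]
\end{aligned}
```

The first line shows both roles of the marginal likelihood: a prior-weighted average of the likelihood and the normalizing constant of the posterior. The last line is why intermediate distributions help: each ratio is estimated from draws of the neighboring, similar distribution rather than by jumping straight from prior to posterior.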

The figure on the left shows a simple example involving only two sequences and a model with 2 parameters: the edge length (ν) and the transition/transversion rate ratio (κ). The stepping-stone method uses a series of probability distributions ranging from the posterior distribution to a reference distribution (here the prior distribution). The prior distribution in a Bayesian analysis represents our belief about the values of the model parameters before the observed sequences are taken into account, while the posterior represents our belief after the sequences have been taken into account. The marginal likelihood is related to both. It is a weighted average of the likelihood (the probability of the data given the model and particular parameter values), with the prior distribution providing the weights. It is also the constant used to normalize the posterior distribution (i.e., to scale it so that it integrates to 1 and is therefore a proper probability distribution). The stepping-stone method uses samples from the prior, the posterior, and several distributions that lie between them to provide an accurate estimate of the marginal likelihood. See Xie et al. (2011) and Fan et al. (2011) for a more complete explanation.
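A minimal Python sketch of the stepping-stone idea, under assumptions that are ours rather than the papers': a toy model with a single parameter μ (a normal mean with known σ and a normal prior with standard deviation τ). In this conjugate model every intermediate "power posterior" p_β(μ) ∝ L(μ)^β × prior(μ) can be sampled exactly, so no MCMC is needed, and the exact marginal likelihood is available for comparison. Each ratio of normalizing constants between neighboring distributions is estimated as the average of L(μ)^(β_k − β_(k−1)) over draws from the earlier distribution, and the log marginal likelihood is the sum of the log ratios. The β values are evenly spaced quantiles of a Beta(α, 1) distribution with α = 0.3, similar to the spacing discussed by Xie et al. (2011). All function names and settings are hypothetical.

```python
import math
import random


def log_likelihood(mu, data, sigma):
    """Log-likelihood of the data under N(mu, sigma^2) with sigma known."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - ss / (2 * sigma ** 2)


def sample_power_posterior(beta, data, sigma, tau, rng):
    """Exact draw from p_beta(mu) ∝ L(mu)^beta * N(mu; 0, tau^2) (toy conjugate model only)."""
    n = len(data)
    xbar = sum(data) / n
    precision = 1.0 / tau ** 2 + beta * n / sigma ** 2
    mean = (beta * n * xbar / sigma ** 2) / precision
    return rng.gauss(mean, 1.0 / math.sqrt(precision))


def stepping_stone_log_ml(data, sigma, tau, K=32, alpha=0.3, n_samples=2000, seed=1):
    """Stepping-stone estimate of log p(data), using the prior as the reference distribution."""
    rng = random.Random(seed)
    # beta_k = (k/K)^(1/alpha) are quantiles of Beta(alpha, 1); they cluster near the prior
    betas = [(k / K) ** (1.0 / alpha) for k in range(K + 1)]
    log_ml = 0.0
    for b_prev, b_next in zip(betas[:-1], betas[1:]):
        # Draws from the distribution at b_prev (b_prev = 0 is the prior itself)
        logls = [log_likelihood(sample_power_posterior(b_prev, data, sigma, tau, rng), data, sigma)
                 for _ in range(n_samples)]
        delta = b_next - b_prev
        m = max(logls)
        # log of mean(L^delta), computed stably by factoring out the largest log-likelihood
        log_ml += delta * m + math.log(sum(math.exp(delta * (l - m)) for l in logls) / n_samples)
    return log_ml


def exact_log_ml(data, sigma, tau):
    """Closed-form log marginal likelihood: data ~ MVN(0, sigma^2 I + tau^2 11^T)."""
    n = len(data)
    s = sum(data)
    s2 = sum(x * x for x in data)
    quad = (s2 - tau ** 2 * s ** 2 / (sigma ** 2 + n * tau ** 2)) / sigma ** 2
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - 0.5 * math.log(1 + n * tau ** 2 / sigma ** 2)
            - 0.5 * quad)


if __name__ == "__main__":
    data_rng = random.Random(42)
    data = [data_rng.gauss(1.5, 1.0) for _ in range(20)]
    print("stepping-stone:", stepping_stone_log_ml(data, sigma=1.0, tau=2.0))
    print("exact:         ", exact_log_ml(data, sigma=1.0, tau=2.0))
```

In a real phylogenetic analysis the draws from each intermediate distribution would come from MCMC and the likelihood would be the tree likelihood; the closed-form check exists here only because the toy model is conjugate.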

References:

Xie, W., P. O. Lewis, Y. Fan, L. Kuo, and M.-H. Chen. 2011. Improving marginal likelihood estimation for Bayesian phylogenetic model selection. Systematic Biology 60:150-160.

Fan, Y., R. Wu, M.-H. Chen, L. Kuo, and P. O. Lewis. 2011. Choosing among partition models in Bayesian phylogenetics. Molecular Biology and Evolution 28(1):523-532.

Prospective Students

View of UConn campus from New Storrs Cemetery

Find yourself drawn to both biology and mathematics? I was. As an undergraduate, I couldn’t decide whether I wanted to be a biologist or a mathematician, so I majored in both math and biology and ended up with a very satisfying career that combines the two!

If you have a love for biology and an itch for mathematics and/or computer programming, please send me an email (paul.lewis@uconn.edu). You needn’t worry if you do not yet have formal training; that’s what graduate school is all about! The important thing is to have a natural curiosity and a willingness to work hard. UConn is recognized for its excellent programs in both evolutionary biology and statistics. Getting a degree here provides the opportunity to work with the best scientists and receive formal training in both fields.

My graduate degrees (M.S. and Ph.D.) are both in Plant Systematics, so I would also entertain the idea of a graduate student working on a purely botanical M.S. or Ph.D. project. The UConn EEB department is remarkably deep in botanical expertise, with faculty specializing in green algae (L. Lewis), bryology (Goffinet), angiosperms (Les, Anderson, Coe), morphology, development and evolution (Jones, Diggle, Schlichting), genomics and evolutionary genetics (Yuan), population and quantitative genetics (Holsinger), and computational genomics (Wegrzyn). So if you like plants, especially the evolutionary history of plants, please feel free to write to me and begin a conversation.

Storrs, Connecticut, is a clean, safe, and beautiful place to live. UConn has long had a reputation for being rather isolated, surrounded by farmland and forest with a paucity of restaurants and, well, stores. The farmland and forest part is still largely true, but in 2013 a major development project came to fruition, resulting in a veritable concentration of stores and restaurants in Storrs Center. Storrs now has a very active Main Street that is still unfolding!

The Jacobson Barn is a UConn landmark.

Bayesian Star Tree Paradox

If sequence data are simulated on a 4-taxon star tree (such as the one shown on the left) and analyzed with standard software for Bayesian phylogenetic inference, one of the 3 possible fully-resolved trees is often supported very strongly. This is paradoxical because most people expect the three possible resolutions to be equally supported in this case, but such an outcome is seen only when the sequences are extremely short (e.g., 1 site). It appears that uncertainty in this case manifests itself as an inability to predict, from data set to data set, which of the 3 possible fully-resolved topologies will be favored. This behavior is troubling, and several researchers have pointed out possible examples of it. Many more potential examples can be found in the literature by looking for high posterior probabilities combined with low bootstrap support and tiny internal edges.
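The symmetry behind the paradox can be illustrated with a small simulation. This is a hedged sketch under our own assumptions (the JC69 substitution model, equal edge lengths t = 0.2, 1000 sites per data set, and hypothetical function names); it does not run a Bayesian analysis. It only counts the site patterns xxyy, xyxy, and xyyx, each of which favors one of the three resolutions. Under a star tree the three counts have equal expectations, yet in any finite data set one of them is ahead by chance, and which one leads changes from data set to data set.

```python
import math
import random

BASES = "ACGT"
SPLITS = ("12|34", "13|24", "14|23")


def evolve(state, t, rng):
    """Change a base along one edge of length t (expected substitutions/site) under JC69."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    if rng.random() < p_change:
        # Given that a change occurs, JC69 picks each of the other three bases equally often
        return rng.choice([b for b in BASES if b != state])
    return state


def simulate_star_tree(n_sites, t, rng):
    """Four tips attached directly to the root by edges of equal length t (a star tree)."""
    sites = []
    for _ in range(n_sites):
        root = rng.choice(BASES)
        sites.append(tuple(evolve(root, t, rng) for _ in range(4)))
    return sites


def split_support(sites):
    """Count sites with patterns xxyy, xyxy, xyyx (x != y), which favor 12|34, 13|24, 14|23."""
    counts = dict.fromkeys(SPLITS, 0)
    for a, b, c, d in sites:
        if a == b and c == d and a != c:
            counts["12|34"] += 1
        elif a == c and b == d and a != b:
            counts["13|24"] += 1
        elif a == d and b == c and a != b:
            counts["14|23"] += 1
    return counts


def replicate_winners(n_reps, n_sites, t, seed):
    """Tally which split has the most supporting sites in each simulated data set."""
    rng = random.Random(seed)
    winners = dict.fromkeys(SPLITS + ("tie",), 0)
    totals = dict.fromkeys(SPLITS, 0)
    for _ in range(n_reps):
        counts = split_support(simulate_star_tree(n_sites, t, rng))
        best = max(counts.values())
        leaders = [s for s in SPLITS if counts[s] == best]
        winners[leaders[0] if len(leaders) == 1 else "tie"] += 1
        for s in SPLITS:
            totals[s] += counts[s]
    return winners, totals


if __name__ == "__main__":
    winners, totals = replicate_winners(n_reps=200, n_sites=1000, t=0.2, seed=2005)
    print("data sets won by each split:", winners)
    print("pooled pattern counts:      ", totals)
```

Pooled over all replicates the three counts come out nearly equal, while the per-data-set winners are spread across all three splits, which mirrors the unpredictability described above.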

We argue that the central problem here is the non-identifiability of the tree topology, and we propose a solution using reversible-jump MCMC. Our rjMCMC sampler visits not only fully-resolved tree topologies but also topologies containing hard polytomies. This effectively places a point mass of prior probability on polytomies, providing an alternative in situations in which a fully-resolved topology is not a reasonable option. The analysis can be made as conservative as desired by modifying the prior distribution assumed for topologies, but in our (albeit limited) experience it does not appear easy to destroy support for real edges by using a prior that strongly favors polytomous topologies.
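The effect of a point mass on the polytomy can be sketched for 4 taxa, where the candidates are the star tree and the three resolved trees. The single weight C below (star tree weight C, each resolved tree weight 1) is a hypothetical simplification of ours, not necessarily the topology prior used by Lewis et al. (2005), and the per-topology log marginal likelihoods passed to `topology_posterior` are placeholder inputs. An rjMCMC sampler estimates such posterior probabilities by visiting topologies rather than by computing them directly.

```python
import math

SPLITS = ("12|34", "13|24", "14|23")


def topology_prior(C):
    """Hypothetical 4-taxon topology prior: the star tree gets weight C, each resolved tree weight 1."""
    if C < 0:
        raise ValueError("C must be non-negative")
    z = C + len(SPLITS)
    prior = {"star": C / z}
    for s in SPLITS:
        prior[s] = 1.0 / z
    return prior


def topology_posterior(prior, log_marginal_likelihoods):
    """Bayes' rule over a discrete set of topologies: posterior ∝ prior × marginal likelihood."""
    logs = {k: math.log(p) + log_marginal_likelihoods[k] for k, p in prior.items() if p > 0}
    m = max(logs.values())
    z = sum(math.exp(v - m) for v in logs.values())
    post = {k: math.exp(v - m) / z for k, v in logs.items()}
    # Topologies with zero prior probability keep zero posterior probability
    return {k: post.get(k, 0.0) for k in prior}


if __name__ == "__main__":
    for C in (0.0, 1.0, 10.0):
        print(C, topology_prior(C))
```

Setting C = 0 recovers the usual analysis in which only fully-resolved trees are possible; larger values of C move more prior mass onto the polytomy, which is one simple way to make the analysis more conservative.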

Reference: Lewis, P. O., M. T. Holder, and K. E. Holsinger. 2005. Polytomies and Bayesian phylogenetic inference. Systematic Biology 54(2):241-253.

Phylodiversity in Desert Green Algae (the other land plants)

A major thrust in the laboratory of Louise Lewis is the diversity and systematics of green algae (Phylum Chlorophyta) living in the soils of North American deserts. These unicellular green algae can tolerate the harsh conditions of desert soil environments, and they represent an important (yet not well understood) component of desert microbiotic crust communities. The 18S rDNA sequences of a number of green algal isolates have been determined, and these data suggest that several lineages of green algae have diversified within deserts. One might be tempted to think that the green algal cells isolated from desert soils are simply the result of spores dispersed into deserts from distant aquatic sources. Our study shows, however, that the 18S sequences of these desert isolates are more divergent from their nearest aquatic relatives than would be predicted if they were merely incidental visitors. We characterize the molecular phylodiversity of desert green algae and demonstrate, with a Bayesian analysis of 150 green algal 18S sequences, that all freshwater classes of green algae have yielded desert lineages. The numerous transitions from aquatic to desert existence apparent from the phylogeny argue that it is no longer accurate to portray land plants as resulting from a single origin. The highly celebrated origin leading to the embryophytes is but one of many transitions to terrestriality.

Reference: Lewis, L. A., and P. O. Lewis. 2005. Unearthing the molecular phylodiversity of desert soil green algae (Chlorophyta). Systematic Biology 54(6):936-947.