Test Likelihood

A tree

To test the Likelihood class, we need a data set and a tree. You have already created a data file named rbcL.nex. Here is a tree file containing the maximum likelihood tree for the data in rbcL.nex. The log-likelihood of this tree under the JC69 model is -286.9238. Create a file containing the text below and name it rbcLjc.tre:


Begin trees;  [Treefile saved Monday, October 26, 2015 at 7:44:59 PM Eastern Daylight Time]
>Data file = rbcL.nex
>Heuristic search settings:
>  Optimality criterion = likelihood
>    Likelihood settings:
>                      Substitution types = 1
>                       State frequencies = equal
>          Proportion of invariable sites = none
>                 Rates at variable sites = equal
>                    Model correspondence = JC69
>      Number of distinct data patterns under this model = 23
>      Molecular clock not enforced
>      Starting branch lengths obtained using Rogers-Swofford approximation method
>      Trees with approximate log likelihoods 5% or further from the target score are rejected without additional iteration
>      Branch-length optimization = one-dimensional Newton-Raphson with pass limit=20, tolerance=1e-07
>      Likelihood calculations performed in single precision
>      Vector processing enabled
>      Conditional-likelihood rescaling threshold = 1e-20
>  Starting tree(s) obtained via stepwise addition
>    Addition sequence: as-is
>    Number of trees held at each step = 1
>  Branch-swapping algorithm: tree-bisection-reconnection (TBR) with reconnection limit = 8
>    Steepest descent option not in effect
>  Initial 'Maxtrees' setting = 1100 (will be auto-increased by 100)
>  Branches collapsed (creating polytomies) if branch length is less than or equal to 1e-08
>  'MulTrees' option in effect
>  No topological constraints in effect
>  Trees are unrooted
>Heuristic search completed
>  Total number of rearrangements tried = 1632
>  Score of best tree(s) found = 286.9238
>  Number of trees retained = 1
>  Time used = 0.14 sec (CPU time = 0.14 sec)
		1 Atractomorpha_echinata,
		2 Bracteacoccus_aerius,
		3 Bracteacoccus_minor,
		4 Chlorotetraedron_incus,
		5 Chromochloris_zofingiensis,
		6 Kirchneriella_aperta,
		7 Mychonastes_homosphaera,
		8 Neochloris_aquatica,
		9 Ourococcus_multisporus,
		10 Pediastrum_duplex,
		11 Pseudomuriella_schumacherensis,
		12 Rotundella_rotunda,
		13 Scenedesmus_obliquus,
		14 Stigeoclonium_helveticum
tree 'PAUP_1' = [&U] (1:0.052781,(((((((((2:1.00000e-08,3:0.051745):0.016986,5:0.070562):0.017072,(6:0.047518,11:0.067139):0.025263):0.016986,9:1.00000e-08):0.016955,4:0.034406):0.016955,13:1.00000e-08):1.00000e-08,7:0.051745):0.016955,12:0.016955):0.034406,(8:1.00000e-08,10:1.00000e-08):1.00000e-08):0.052781,14:0.071604);

The main function

Here is a revised main.cpp that will compute the log-likelihood of the tree above under the JC69 model:

#include <iostream>
#include "node.hpp"
#include "tree.hpp"
#include "data.hpp"
#include "tree_summary.hpp"
#include "likelihood.hpp"

using namespace strom;

int main(int argc, const char * argv[])
    std::cout << "Starting..." << std::endl;

        // Read and store data
        Data::SharedPtr d(new Data());

        // Create a likelihood object that will compute log-likelihoods
        Likelihood<Node>::SharedPtr likelihood(new Likelihood<Node>());

        // Read in a tree
        TreeSummary<Node>::SharedPtr tree_summary(new TreeSummary<Node>());
        tree_summary->readTreefile("rbcLjc.tre", 0);
        Tree<Node>::SharedPtr tree = tree_summary->getTree(0);

        // Calculate the log-likelihood for the tree
        double lnL = likelihood->calcLogLikelihood(tree);
        std::cout << boost::str(boost::format("log likelihood = %.5f") % lnL) << std::endl;
        std::cout << "      (expecting -286.9238)" << std::endl;
    catch (XStrom x)
        std::cerr << "Strom encountered a problem:\n  " << x.what() << std::endl;

    std::cout << "\nFinished!" << std::endl;

    return 0;

The main function first creates a Likelihood object and creates a shared pointer named likelihood pointing to the new object. It then calls the Likelihood::availableResources function to show a summary of resources available to BeagleLib (even though we will only be using the default CPU plugin of BeagleLib).

Next, a shared pointer named d is created pointing to a new Data object, and the Data::getDataFromFile function is used to read the sequences in rbcL.nex.

Next, a TreeSummary object is created and its readTreefile function is called to read the tree file rbcLjc.tre that you just created. A shared pointer named tree to a Tree is created to point to the first (and only) tree read from rbcLjc.tre.

Finally, the init, setTipStates, setPatternWeights, and calcLogLikelihood functions of the Likelihood object are called to obtain the log-likelihood.

Most of the function is wrapped in a try block. If anything goes wrong and an XStrom exception is thrown, the program will jump immediately to the catch block and the message stored by the exception object will be displayed.

If all goes well when you run the program, you should see the following output:

Constructing a TreeSummary
Sequence length:    60
Number of taxa:     14
Number of patterns: 23
log likelihood = -286.92376
      (expecting -286.9238)
Destroying a TreeSummary