The Likelihood class

Not yet finished: please avoid until this notice goes away

By the time you are finished with this lesson, you will be able to read in a tree and calculate the log-likelihood of this tree under the simplest substitution model (the Jukes-Cantor, or JC69, model). Create a new header file named likelihood.hpp containing
the following code.

#pragma once

#include <map>
#include <boost/algorithm/string.hpp>
#include <boost/format.hpp>
#include <boost/shared_ptr.hpp>
#include "libhmsbeagle/beagle.h"
#include "data.hpp"
#include "xstrom.hpp"

namespace strom {

class Likelihood
    {
    public:
                                    Likelihood();
                                    ~Likelihood();
        
        std::string                 availableResources();

        double                      calcLogLikelihood(typename Tree::SharedPtr t);

        void                        setData(Data::SharedPtr d);
        Data::SharedPtr             getData();

    private:

        void                        initBeagleLib();
        void                        setTipStates();
        void                        setPatternWeights();
        void                        setDiscreteGammaShape();
        void                        setModelRateMatrix();
        void                        defineOperations(typename Tree::SharedPtr t);
        void                        updateTransitionMatrices();
        void                        calculatePartials();

        int                         _instance;
        std::map<int, std::string>  _beagle_error;
        std::vector<int>            _operations;
        std::vector<int>            _pmatrix_index;
        std::vector<double>         _edge_lengths;

        Data::SharedPtr             _data;
        unsigned                    _ntaxa;
        unsigned                    _nstates;
        unsigned                    _npatterns;
        bool                        _rooted;
        bool                        _prefer_gpu;

    public:
        typedef std::shared_ptr< Likelihood > SharedPtr;
    };

// member function bodies go here

} // namespace strom

Constructor

The constructors we have created for most classes do not really do anything except report that an object of that type was created. The constructor for the Likelihood class is a little more complex because it creates a vector of possible error messages that could be issued by BeagleLib.

inline Likelihood::Likelihood()
    {
    _instance   = -1;
    _ntaxa      = 0;
    _nstates    = 0;
    _npatterns  = 0;
    _rooted     = false;
    _prefer_gpu = false;

    // store BeagleLib error codes so that useful 
    // error messages may be provided to the user
    _beagle_error[0]  = std::string("success");
    _beagle_error[-1] = std::string("unspecified error");
    _beagle_error[-2] = std::string("not enough memory could be allocated");
    _beagle_error[-3] = std::string("unspecified exception");
    _beagle_error[-4] = std::string("the instance index is out of range, or the instance has not been created");
    _beagle_error[-5] = std::string("one of the indices specified exceeded the range of the array");
    _beagle_error[-6] = std::string("no resource matches requirements");
    _beagle_error[-7] = std::string("no implementation matches requirements");
    _beagle_error[-8] = std::string("floating-point range exceeded");
    }

The constructor first initializes the _instance data member with the value -1. The value of _instance will remain -1 until a valid BeagleLib instance is available, in which case it will be set by BeagleLib to a non-negative integer. For this reason, a negative value stored in _instance indicates that no BeagleLib instance has been created.

The _beagle_error map simply stores BeagleLib error codes (cryptic negative integers) and associates them with meaningful messages so that our users have some chance of knowing what went wrong if BeagleLib fails. The constructor fills the map with all possible error codes so that we are ready for any error code that BeagleLib can spit out.

Destructor

Here is the destructor.

inline Likelihood::~Likelihood()
    {
    if (_instance >= 0)
        {
        int code = beagleFinalizeInstance(_instance);
        if (code != 0)
            throw XStrom(boost::str(boost::format("Likelihood destructor failed to finalize BeagleLib instance. BeagleLib error code was %d (%s).") % code % _beagle_error[code]));
        }
    }

The destructor calls the function beagleFinalizeInstance to tell BeagleLib that our program is shutting down and its services will no longer be needed. If for some reason BeagleLib cannot shut down cleanly, an exception is thrown so that the user is informed.

The getData and setData member functions

These functions allow you to associate a Data object with this Likelihood object. Note that changing the data set will require us to delete ("finalize" in BeagleLib terminology) the existing BeagleLib instance and create a new one.

inline Data::SharedPtr Likelihood::getData()
    {
    return _data;
    }

inline void Likelihood::setData(Data::SharedPtr data)
    {
    _data = data;
    if (_instance >= 0)
        {
        // initBeagleLib function was previously called, so 
        // finalize existing instance and create new one
        int code = beagleFinalizeInstance(_instance);
        if (code != 0)
            throw XStrom(boost::str(boost::format("Likelihood setData function failed to finalize BeagleLib instance. BeagleLib error code was %d (%s).") % code % _beagle_error[code]));
        _instance = -1;
        assert(_ntaxa > 0 && _nstates > 0 && _npatterns > 0);
        initBeagleLib();
        }
    }

The initBeagleLib member function

The initBeagleLib member function initializes BeagleLib, obtaining a BeagleLib instance handle (integer value) that is stored in the data member _instance. Before calling this function, you will need to know details about the data set you plan to analyze: the number of taxa, number of states (e.g. 4 for DNA), number of unique site patterns, whether the tree is rooted, and whether you prefer to use GPUs or the default CPU resource if there is a choice. Here is the body of this function; below the code I will explain more about the call to beagleCreateInstance:

inline void Likelihood::initBeagleLib()
    {
    // a non-operation ("no-op") if a valid instance has already been created
    if (_instance >= 0)
        return;

    assert(_data);

    _ntaxa      = _data->getNumTaxa();
    _npatterns  = _data->getNumPatterns();
    _nstates    = 4;
    _rooted     = false;
    _prefer_gpu = false;

    std::cout << "Sequence length:    " << _data->getSeqLen() << std::endl;
    std::cout << "Number of taxa:     " << _ntaxa << std::endl;
    std::cout << "Number of patterns: " << _npatterns << std::endl;

    unsigned num_internals        = (_rooted ? (_ntaxa - 1) : (_ntaxa - 2));
    unsigned num_transition_probs = (_rooted ? (2*_ntaxa - 2) : (2*_ntaxa - 3));

    long requirementFlags = 0;
    requirementFlags |= BEAGLE_FLAG_PRECISION_DOUBLE;

    long preferenceFlags = 0;
    if (_prefer_gpu)
        preferenceFlags |= BEAGLE_FLAG_PROCESSOR_GPU;
    else
        preferenceFlags |= BEAGLE_FLAG_PROCESSOR_CPU;
        
    BeagleInstanceDetails instance_details;
    _instance = beagleCreateInstance(
         _ntaxa,                    // tips
         num_internals,             // partials
         _ntaxa,                    // sequences
         _nstates,                  // states
         _npatterns,                // patterns
         1,                         // models
         num_transition_probs,      // transition matrices
         1,                         // rate categories
         0,                         // scale buffers
         NULL,                      // resource restrictions
         0,                         // length of resource list
         preferenceFlags,           // preferred flags
         requirementFlags,          // required flags
         &instance_details);        // pointer for details
    
    if (_instance < 0)
        {
        // beagleCreateInstance returns one of the following:
        //   valid instance (0, 1, 2, ...)
        //   error code (negative integer)
        throw XStrom(boost::str(boost::format("Likelihood init function failed to create Likelihood instance (BeagleLib error code was %d)") % _beagle_error[_instance]));
        }

    setTipStates();
    setPatternWeights();

    //std::cout << boost::str(boost::format("BeagleLib instance (%d) created.") % _instance) << std::endl;
    }

Note that beagleCreateInstance is a function defined by BeagleLib. Your program knows about it because you included the beagle.h header file at the top. The actual code for this function will not be stored in your program; it will be loaded on demand from a Dynamic Link Library (provided by a file with a name that ends in .dll on Windows, .dylib on Mac, or .so in Unix) after your program begins to execute.

Calling the beagleCreateInstance requires you to provide a lot of details. Here is a short description of these in the order in which they are supplied to the function:

Number of tips
This is the number of taxa in the tree.
Number of partials
This is the number of partials (also known as conditional likelihood arrays), and should equal the number of internal nodes (tips have data but not partials).
Number of sequences
This is the number of observed sequences, which should ordinarily equal the number of taxa.
Number of states
This is the number of states (e.g. 4 for DNA or RNA sequences, 20 for amino acid sequences, 61 for codons, etc.).
Number of patterns
This is the number of unique data patterns (you will later tell BeagleLib how many sites have each pattern).
Number of models
This will be 1 if a single rate matrix and state frequencies are used for all sites.
Number of transition matrices
This is the number of transition matrices needed, which should ordinarily equal the number of edges in the tree.
Number of rate categories
If your model allows for rate heterogeneity across sites, this is the number of rate categories. Use 1 if rates are assumed equal across sites.
Number of scale buffers
This has to do with scaling site likelihoods to avoid underflow, which is a concern for large trees, where site likelihoods can become so small that the computer cannot distinguish them from zero. The 0 value supplied here means we are not going to worry about scaling just yet.
Number of resource restrictions
This is a pointer to a list of potential resources (e.g. GPUs) on which this instance is allowed; we’ve supplied NULL (points to nothing) to indicate that we do not wish to restrict resources).
Length of resource list
The number of elements in the resource list supplied above (the pointer just points to the first element but stores no information about the total number of elements in the list, hence the need for this value); we’ve supplied 0 because we are not restricting BeagleLib.
Preferred flags
These tell BeagleLib your preferences (a key to possible flags is given below).
Required flags
These tell BeagleLib your requirements (a key to possible flags is given below).
Pointer for details
This is a pointer to a BeagleInstanceDetails structure, which will store details about implementation and resources (i.e. the decisions BeagleLib made given your preferences and requirements).

Here is the BeagleFlags enum in the beagle.h header file (this may change in later version of BeagleLib, so consult the header file itself if you want to be sure to have the most accurate information for your version of BeagleLib):

enum BeagleFlags {
BEAGLE_FLAG_PRECISION_SINGLE    = 1 << 0,    /* Single precision computation */
BEAGLE_FLAG_PRECISION_DOUBLE    = 1 << 1,    /* Double precision computation */

BEAGLE_FLAG_COMPUTATION_SYNCH   = 1 << 2,    /* Synchronous computation (blocking) */
BEAGLE_FLAG_COMPUTATION_ASYNCH  = 1 << 3,    /* Asynchronous computation (non-blocking) */

BEAGLE_FLAG_EIGEN_REAL          = 1 << 4,    /* Real eigenvalue computation */
BEAGLE_FLAG_EIGEN_COMPLEX       = 1 << 5,    /* Complex eigenvalue computation */

BEAGLE_FLAG_SCALING_MANUAL      = 1 << 6,    /* Manual scaling */
BEAGLE_FLAG_SCALING_AUTO        = 1 << 7,    /* Auto-scaling on (deprecated, may not work correctly) */
BEAGLE_FLAG_SCALING_ALWAYS      = 1 << 8,    /* Scale at every updatePartials (deprecated, may not work correctly) */
BEAGLE_FLAG_SCALING_DYNAMIC     = 1 << 25,   /* Manual scaling with dynamic checking (deprecated, may not work correctly) */

BEAGLE_FLAG_SCALERS_RAW         = 1 << 9,    /* Save raw scalers */
BEAGLE_FLAG_SCALERS_LOG         = 1 << 10,   /* Save log scalers */

BEAGLE_FLAG_INVEVEC_STANDARD    = 1 << 20,   /* Inverse eigen vectors passed to BEAGLE have not been transposed */
BEAGLE_FLAG_INVEVEC_TRANSPOSED  = 1 << 21,   /* Inverse eigen vectors passed to BEAGLE have been transposed */

BEAGLE_FLAG_VECTOR_SSE          = 1 << 11,   /* SSE computation */
BEAGLE_FLAG_VECTOR_AVX          = 1 << 24,   /* AVX computation */
BEAGLE_FLAG_VECTOR_NONE         = 1 << 12,   /* No vector computation */

BEAGLE_FLAG_THREADING_OPENMP    = 1 << 13,   /* OpenMP threading */
BEAGLE_FLAG_THREADING_NONE      = 1 << 14,   /* No threading */

BEAGLE_FLAG_PROCESSOR_CPU       = 1 << 15,   /* Use CPU as main processor */
BEAGLE_FLAG_PROCESSOR_GPU       = 1 << 16,   /* Use GPU as main processor */
BEAGLE_FLAG_PROCESSOR_FPGA      = 1 << 17,   /* Use FPGA as main processor */
BEAGLE_FLAG_PROCESSOR_CELL      = 1 << 18,   /* Use Cell as main processor */
BEAGLE_FLAG_PROCESSOR_PHI       = 1 << 19,   /* Use Intel Phi as main processor */
BEAGLE_FLAG_PROCESSOR_OTHER     = 1 << 26,   /* Use other type of processor */

BEAGLE_FLAG_FRAMEWORK_CUDA      = 1 << 22,   /* Use CUDA implementation with GPU resources */
BEAGLE_FLAG_FRAMEWORK_OPENCL    = 1 << 23,   /* Use OpenCL implementation with GPU resources */
BEAGLE_FLAG_FRAMEWORK_CPU       = 1 << 27    /* Use CPU implementation */
};

The availableResources member function

This function fetches the resource list from BeagleLib. This function can be used to determine whether your system has GPUs that may be used by BeagleLib. It also shows the CPU resource, which is used in the absence of GPUs.

inline std::string Likelihood::availableResources()
    {
    BeagleResourceList * rsrcList = beagleGetResourceList();
	std::string s;
    for (int i = 0; i < rsrcList->length; ++i)
        {
        std::string desc = rsrcList->list[i].description;
        boost::trim(desc);
        if (desc.size() > 0)
            s += boost::str(boost::format("%d: %s (%s)\n") % i % rsrcList->list[i].name % desc);
        else
            s += boost::str(boost::format("%d: %s\n") % i % rsrcList->list[i].name);
        }
    return s;
    }

The setTipStates member function

This function provides BeagleLib with the observed data for each tip node (leaf) in the tree. The data are stored in a Data object and made available to the function by means of the shared pointer _data. The for loop provides a pointer to the beginning (address of element 0) of the vector of states for each taxon. By the time these data are used, BeagleLib will know the length of this array of states (it is provided in the beagleCreateInstance function that was called in the init member function above). Note that this function assumes that a BeagleLib instance has already been created.

inline void Likelihood::setTipStates()
    {
    assert(_data);

    const Data::data_matrix_t & data_matrix = _data->getDataMatrix();
    unsigned i = 0;

    typedef const std::vector<int> & ref_int_vect_t;
    for (ref_int_vect_t v : data_matrix)
        {
        int code = beagleSetTipStates(
            _instance,      // Instance number
            i,              // Index of destination compactBuffer
            &v[0]);         // Pointer to compact states vector

        if (code != 0)
            throw XStrom(boost::str(boost::format("failed to set tip state for taxon %d (\"%s\"; BeagleLib error code was %d)") % (i+1) % _data->getTaxonNames()[i] % code % _beagle_error[code]));
        ++i;
        }
    }

The setPatternWeights member function

Before computing a likelihood, BeagleLib will need to know how many sites exhibited each site pattern in the data set. Some patterns will be found at a single site, whereas other patterns will be common to many sites (for example, constant patterns in which the same state is found for all taxa are relatively common). The Data class stores a vector of site counts for each pattern in its _pattern_counts data member, so all we have to do here is transfer those counts to our BeagleLib instance. Note that this function assumes that a BeagleLib instance has already been created.

inline void Likelihood::setPatternWeights()
    {
    assert(_data);

    int code = 0;
    const Data::pattern_counts_t & v = _data->getPatternCounts();
    if (v.empty())
        throw XStrom(boost::str(boost::format("failed to set pattern weights because data matrix has empty pattern count vector") % code));

    code = beagleSetPatternWeights(
       _instance,     // instance number
       &v[0]);        // vector of pattern counts
       
    if (code != 0)
        throw XStrom(boost::str(boost::format("failed to set pattern weights. BeagleLib error code was %d (%s)") % code % _beagle_error[code]));
    }    

The setDiscreteGammaShape member function

BeagleLib handles rate heterogeneity in a very general way (as it handles all aspects of models). It expects you to provide it with an array of relative substitution rates and a corresponding array of rate probabilities. It does not care how you got these two arrays, so both rates and probabilities can be completely arbitrary, but in this tutorial we will ultimately supply BeagleLib with rates and probabilities derived from the discrete Gamma distribution widely used for modeling among-site rate heterogeneity. That explains why the parameter of this function is called shape; however, for the moment we are going to assume that the rate at each site is equal to the rate at every other site, so shape will be ignored. Later we will rewrite this function to take shape into consideration. This function calls two BeagleLib functions, beagleSetCategoryRates and beagleSetCategoryWeights, both of which assume that a valid BeagleLib instance exists. For equal rates, the relative rate for each site is 1.0 and the probability that each site has that relative rate is also 1.0, so the rates and probs arrays passed to BeagleLib both have a single element that has value 1.0.

inline void Likelihood::setDiscreteGammaShape()
    {
    // For now we are ignoring shape and assuming rates are equal
    double rates[1] = {1.0};
    double probs[1] = {1.0};

    // This function sets the vector of category rates for an instance
    int code = beagleSetCategoryRates(
        _instance,  // instance number
        rates);     // vector containing rate scalers
      
    if (code != 0)
        throw XStrom(boost::str(boost::format("failed to set category rates. BeagleLib error code was %d (%s)") % code % _beagle_error[code]));
    
    // This function copies a category weights array into an instance buffer.
    // maybe need to set seqLens times, same as eigenBufferCount
    code = beagleSetCategoryWeights(
        _instance,  // Instance number
        0,          // Index of category weights buffer. eigenIndex
        probs);     // Category weights array (categoryCount)

    if (code != 0)
        throw XStrom(boost::str(boost::format("failed to set category probabilities. BeagleLib error code was %d (%s)") % code % _beagle_error[code]));
    }

The setModelRateMatrix member function

BeagleLib handles the specification of a substitution model in a very general way. Specifically, it requires you to provide the equilibrium state frequencies and the eigenvalues and eigenvectors of the instantaneous rate matrix. We will be using the simplest model (the Jukes-Cantor, or JC69, model), which specifies equal base frequencies and a rate matrix in which all off-diagonal rates are identical. The BeagleLib function beagleSetStateFrequencies is used to transfer the state frequencies, and the function beagleSetEigenDecomposition is used to transfer the eigenvalues, eigenvectors, and inverse eigenvectors (all in the form of flat, single-dimensional arrays), to an existing BeagleLib instance. You may wonder why even simple models such as JC69, for which closed-form expressions (i.e. formulas) are available for transition probabilities, must be specified as eigensystems. The answer is that most models (e.g. GTR) used these days in phylogenetics are complex enough that transition probabilities must be computed by diagonalizing the rate matrix, so the approach used by BeagleLib is straightforward, flexible enough to allow quite complex models of evolution, and yet as simple as possible given the need to maintain generality.

inline void Likelihood::setModelRateMatrix()
    {
    // This function copies a state frequency array into an instance buffer.
    double state_freqs[4] = {0.25, 0.25, 0.25, 0.25};
    int code = beagleSetStateFrequencies(
         _instance,          // Instance number
         0,                  // Index of state frequencies buffer. eigenIndex
         state_freqs);      // State frequencies array (stateCount)

    if (code != 0)
        throw XStrom(boost::str(boost::format("failed to set state frequencies. BeagleLib error code was %d (%s)") % code % _beagle_error[code]));

    double eigenvalues[4] = {
        -4.0/3.0,
        -4.0/3.0,
        -4.0/3.0,
        0.0
        };

    double eigenvectors[16] = {
        -1,   -1,  -1,  1,
         0,    0,   1,  1,
         0,    1,   0,  1,
         1,    0,   0,  1
        };

    double inverse_eigenvectors[16] = {
        -0.25,   -0.25,   -0.25,   0.75,
        -0.25,   -0.25,    0.75,  -0.25,
        -0.25,    0.75,   -0.25,  -0.25,
         0.25,    0.25,    0.25,   0.25
        };

    // This function copies an eigen-decomposition into an instance buffer.
    // Note that BeagleLib transposes the input eigenvectors and inverse-eigenvectors.
    code = beagleSetEigenDecomposition(
        _instance,                              // Instance number
        0,                                      // Index of eigen-decomposition buffer
        (const double *)eigenvectors,           // Flattened matrix (stateCount x stateCount) of eigenvectors
        (const double *)inverse_eigenvectors,   // Flattened matrix (stateCount x stateCount) of inverse-eigenvectors
        eigenvalues);                           // Vector of eigenvalues

    if (code != 0)
        throw XStrom(boost::str(boost::format("failed to set eigen decomposition. BeagleLib error code was %d (%s)") % code % _beagle_error[code]));
    }

The defineOperations member function

BeagleLib does not define or use trees per se. Instead, we must tell BeagleLib exactly which transition matrices to compute and which partial likelihoods to calculate in order to compute the likelihood. To do this, we traverse a tree (passed in using the shared pointer t to a Tree object) in postorder sequence (i.e. visiting all descendants of a given node before visiting the node itself) and create a list of "operations" for BeagleLib to carry out. Each operation is defined by providing 7 values in a particular sequence for each internal node. Here is a description of these 7 values:

1. Destination partials
The index of the internal node (leaf nodes are indexed from 0,…,n-1, and internal nodes are indexed starting at n, which is the number of taxa). This value tells BeagleLib which partial (conditional likelihood array) to compute.
2. Destination scale write
We provide the value BEAGLE_OP_NONE to indicate we are not scaling site likelihoods.
3. Destination scale read
We provide the value BEAGLE_OP_NONE to indicate we are not scaling site likelihoods.
4. Left child partials
The index of the left child node. BeagleLib will use data if the left child is a tip, and only uses a partial if the left child is an internal node.
5. Left child transition matrix
The index of the left child node. This tells BeagleLib which transition matrix to use to accommodate evolution along the edge leading to the left child.
6. Right child partials
The index of the right child node. BeagleLib will use data if the right child is a tip, and only uses a partial if the right child is an internal node.
7. Right child transition matrix
The index of the right child node. This tells BeagleLib which transition matrix to use to accommodate evolution along the edge leading to the right child.

The operations array is flat, meaning that if 3 operations are defined, the operations array will be a single-dimensional array having 21 elements. While we are traversing the tree, we create a vector of node numbers (stored in the data member _pmatrix_index) and edge lengths (stored in the data member _edge_lengths) as they are encountered for use in the calcLogLikelihood function defined below.

template <class T>
inline void Likelihood<T>::defineOperations(typename Tree<T>::SharedPtr t) 
    {
    _operations.clear();
    _pmatrix_index.clear();
    _edge_lengths.clear();

    BOOST_REVERSE_FOREACH(T * nd, t->_preorder)
        {
        if (!nd->_left_child)
            {
            // This is a leaf
            _pmatrix_index.push_back(nd->_number);
            _edge_lengths.push_back(nd->_edge_length);
            }
        else
            {
            // This is an internal node
            _pmatrix_index.push_back(nd->_number);
            _edge_lengths.push_back(nd->_edge_length);

            // Internal nodes have partials to be calculated, so define
            // an operation to compute the partials for this node

            // 1. destination partial to be calculated
            int partial = nd->_number;
            _operations.push_back(partial);

            // 2. destination scaling buffer index to write to
            _operations.push_back(BEAGLE_OP_NONE);

            // 3. destination scaling buffer index to read from
            _operations.push_back(BEAGLE_OP_NONE);

            // 4. left child partial index
            partial = nd->_left_child->_number;
            _operations.push_back(partial);

            // 5. left child transition matrix index
            int tmatrix = nd->_left_child->_number;
            _operations.push_back(tmatrix);

            // 6. right child partial index
            assert(nd->_left_child);
            assert(nd->_left_child->_right_sib);
            partial = nd->_left_child->_right_sib->_number;
            _operations.push_back(partial); // assumes binary tree

            // 7. right child transition matrix index
            tmatrix = nd->_left_child->_right_sib->_number;
            _operations.push_back(tmatrix);
        }
    }

    // if tree is unrooted and thus "rooted" at a leaf, need to 
    // use the transition matrix associated with the leaf
    if (!t->_is_rooted)
        _pmatrix_index[_pmatrix_index.size()-1] = t->getRoot()->_number;
    }

Important! The defineOperations member function body above accesses private data members of both the Tree and Node classes. To avoid compile-time errors, you will thus need to declare the Likelihood class to be a friend of both Tree and Node. In both node.hpp and tree.hpp, you should uncomment the 2 lines below:

template<class T> class Likelihood;
friend class Likelihood<T>;

The updateTransitionMatrices member function

This function recalculates all transition matrices. In the defineOperations member function, we created a vector (_pmatrix_index) comprising the indices of all nodes that have a transition matrix needing to be recomputed, and we also created a vector _edge_lengths containing the edge lengths corresponding to the nodes listed in _pmatrix_index, so here all we need do is provide these lists to BeagleLib.

The first argument must be a valid BeagleLib instance, the second argument is the index of the eigen decomposition to use (0 in our case because there is only one), the third argument is the address of the first element of the array of node indices specifying the transition matrices to be computed, the fourth and fifth arguments are NULL because we are not concerned with calculating first and second deriviatives, the sixth argument is the address of the first element of the arrayof edge lengths, and the seventh and final argument is the number of transition probabilities to calculate.

template <class T>
inline void Likelihood<T>::updateTransitionMatrices()
    {
    int code = beagleUpdateTransitionMatrices(
        _instance,                      // Instance number
        0,                              // Index of eigen-decomposition buffer
        &_pmatrix_index[0],             // transition probability matrices to update
        NULL,                           // first derivative matrices to update
        NULL,                           // second derivative matrices to update
        &_edge_lengths[0],              // List of edge lengths
        (int)_pmatrix_index.size());    // Length of lists

    if (code != 0)
        throw XStrom(boost::str(boost::format("failed to update transition matrices. BeagleLib error code was %d (%s)") % code % _beagle_error[code]));
    }

The calculatePartials member function

This function instructs BeagleLib to update all its partials using the information in the operations array. In the defineOperations member function, we created a vector _operations that stores all the information BeagleLib needs, so here we simply allow it to copy the _operations vector.

The first argument must be a valid BeagleLib instance, the second argument is the address of the starting element of the _operations vector, the third argument is the number of elements in the _operations vector, and the final argument is the index of the scale buffer used to store accumulated scaling factors, but because we are not scaling site likelihoods currently, we supply the value BEAGLE_OP_NONE.

template <class T>
inline void Likelihood<T>::calculatePartials()
    {
    // Calculate or queue for calculation partials using a list of operations
    int totalOperations = (int)(_operations.size()/7);
    int code = beagleUpdatePartials(
        _instance,                              // Instance number
        (BeagleOperation *) &_operations[0],    // BeagleOperation list specifying operations
        totalOperations,                        // Number of operations
        BEAGLE_OP_NONE);                        // Index number of scaleBuffer to store accumulated factors

    if (code != 0)
        throw XStrom(boost::str(boost::format("failed to update partials. BeagleLib error code was %d (%s)") % code % _beagle_error[code]));
    }

The calcLogLikelihood member function

The final member function we need to define is the one that returns the log-likelihood for a tree. For now, we only consider unrooted trees, so the beginning of this function is concerned with ensuring that this is true. After that, the member functions setJCRateMatrix, setDiscreteGammaShape, defineOperations, updateTransitionMatrices and calculatePartials are called: see the description of those functions above for an explanation of what they do. The only remaining BeagleLib function that needs to be called is beagleCalculateEdgeLogLikelihoods. We have computed partials for the tree down to the single descendant (call this node the parent) of the tip node serving as the root (called the child) in this unrooted tree. There is thus one edge left that has not been accounted for, and that is the edge connecting the parent to the child.

The 13 arguments supplied to the beagleCalculateEdgeLogLikelihoods function are:

Instance
This must be a valid BeagleLib instance.
List of indices of parent partials buffers
In our case, the list has only one element, which is the index of the parent node.
List of indices of child partials buffers
In our case, the list has only one element, which is the index of the child node.
List of indices of transition probability matrices for this edge
In our case, the list has only one element, which is the index of the child node.
List of indices of first derivative matrices
We do not need to calculate first derivatives, so NULL is supplied for this.
List of indices of second derivative matrices
We do not need to calculate second derivatives, so NULL is supplied for this.
List of weights to apply to each partials buffer
This list supplies the relative rates and probabilities, of which there is only one set in our case, so we supply the index 0 here.
List of state frequencies for each partials buffer
This list supplies the state frequencies, of which there is only one set in our case, so we supply the index 0 here.
List of scale buffers containing accumulated factors to apply to each partials buffer
Because we are not protecting site likelihoods from underflow yet, we supply BEAGLE_OP_NONE here.
Number of partials buffers
This is the number of elements in each of the lists supplied above. In our case, the number of partials buffers is just 1.
Destination for log-likelihood value
This is the address of a double value in which will be stored the final log-likelihood.
Destination for first derivative value
This is the address of a double value in which will be stored the first derivative (we supply NULL because we do not need first derivatives to be calculated).
Destination for second derivative value
This is the address of a double value in which will be stored the second derivative (we supply NULL because we do not need second derivatives to be calculated).
template <class T>
inline double Likelihood<T>::calcLogLikelihood(typename Tree<T>::SharedPtr t)
    {
    if (t->_is_rooted)
        throw XStrom("can only compute likelihoods for unrooted trees currently");

    if (!_data)
        throw XStrom("must call setData before calcLogLikelihood");

    initBeagleLib(); // this is a no-op if a valid instance already exists
    
    // Assuming "root" is leaf 0
    assert(t->getRoot()->_number == 0 && t->getRoot()->_left_child == t->_preorder[0] && !t->_preorder[0]->_right_sib);

    // Assuming there are as many transition matrices as there are edge lengths
    assert(_pmatrix_index.size() == _edge_lengths.size());

    setModelRateMatrix();
    setDiscreteGammaShape();
    defineOperations(t);
    updateTransitionMatrices();
    calculatePartials();

    // The beagleCalculateEdgeLogLikelihoods function integrates a list of partials 
    // at a parent and child node with respect to a set of partials-weights and 
    // state frequencies to return the log likelihood and first and second derivative sums
    int stateFrequencyIndex  = 0;
    int categoryWeightsIndex = 0;   
    int cumulativeScalingIndex = BEAGLE_OP_NONE;
    double log_likelihood = 0.0;

    // index_focal_child is the root node
    int index_focal_child  = t->getRoot()->_number;

    // index_focal_parent is the only child of root node  
    int index_focal_parent = t->_preorder[0]->_number;  

    int code = beagleCalculateEdgeLogLikelihoods(
        _instance,                  // instance number
        &index_focal_parent,        // indices of parent partialsBuffers
        &index_focal_child,         // indices of child partialsBuffers
        &index_focal_child,         // transition probability matrices for this edge
        NULL,                       // first derivative matrices
        NULL,                       // second derivative matrices
        &categoryWeightsIndex,      // weights to apply to each partialsBuffer
        &stateFrequencyIndex,       // state frequencies for each partialsBuffer
        &cumulativeScalingIndex,    // scaleBuffers containing accumulated factors
        1,                          // Number of partialsBuffer
        &log_likelihood,            // destination for log likelihood
        NULL,                       // destination for first derivative
        NULL);                      // destination for second derivative

    if (code != 0)
        throw XStrom(boost::str(boost::format("failed to calculate edge logLikelihoods in CalcLogLikelihood. BeagleLib error code was %d (%s)") % code % _beagle_error[code]));

    return log_likelihood;
    }