**Not yet finished:** please avoid until this notice goes away

By the time you are finished with this lesson, you will be able to read in a tree and calculate the log-likelihood of this tree under the simplest substitution model (the Jukes-Cantor, or JC69, model). Create a new header file named *likelihood.hpp* containing

the following code.

#pragma once #include <map> #include <boost/algorithm/string.hpp> #include <boost/format.hpp> #include <boost/shared_ptr.hpp> #include "libhmsbeagle/beagle.h" #include "data.hpp" #include "xstrom.hpp" namespace strom { class Likelihood { public: Likelihood(); ~Likelihood(); std::string availableResources(); double calcLogLikelihood(typename Tree::SharedPtr t); void setData(Data::SharedPtr d); Data::SharedPtr getData(); private: void initBeagleLib(); void setTipStates(); void setPatternWeights(); void setDiscreteGammaShape(); void setModelRateMatrix(); void defineOperations(typename Tree::SharedPtr t); void updateTransitionMatrices(); void calculatePartials(); int _instance; std::map<int, std::string> _beagle_error; std::vector<int> _operations; std::vector<int> _pmatrix_index; std::vector<double> _edge_lengths; Data::SharedPtr _data; unsigned _ntaxa; unsigned _nstates; unsigned _npatterns; bool _rooted; bool _prefer_gpu; public: typedef std::shared_ptr< Likelihood > SharedPtr; }; // member function bodies go here } // namespace strom

## Constructor

The constructors we have created for most classes do not really do anything except report that an object of that type was created. The constructor for the Likelihood class is a little more complex because it creates a vector of possible error messages that could be issued by BeagleLib.

inline Likelihood::Likelihood() { _instance = -1; _ntaxa = 0; _nstates = 0; _npatterns = 0; _rooted = false; _prefer_gpu = false; // store BeagleLib error codes so that useful // error messages may be provided to the user _beagle_error[0] = std::string("success"); _beagle_error[-1] = std::string("unspecified error"); _beagle_error[-2] = std::string("not enough memory could be allocated"); _beagle_error[-3] = std::string("unspecified exception"); _beagle_error[-4] = std::string("the instance index is out of range, or the instance has not been created"); _beagle_error[-5] = std::string("one of the indices specified exceeded the range of the array"); _beagle_error[-6] = std::string("no resource matches requirements"); _beagle_error[-7] = std::string("no implementation matches requirements"); _beagle_error[-8] = std::string("floating-point range exceeded"); }

The constructor first initializes the `_instance`

data member with the value -1. The value of `_instance`

will remain -1 until a valid BeagleLib instance is available, in which case it will be set by BeagleLib to a non-negative integer. For this reason, a negative value stored in `_instance`

indicates that no BeagleLib instance has been created.

The `_beagle_error`

map simply stores BeagleLib error codes (cryptic negative integers) and associates them with meaningful messages so that our users have some chance of knowing what went wrong if BeagleLib fails. The constructor fills the map with all possible error codes so that we are ready for any error code that BeagleLib can spit out.

## Destructor

Here is the destructor.

inline Likelihood::~Likelihood() { if (_instance >= 0) { int code = beagleFinalizeInstance(_instance); if (code != 0) throw XStrom(boost::str(boost::format("Likelihood destructor failed to finalize BeagleLib instance. BeagleLib error code was %d (%s).") % code % _beagle_error[code])); } }

The destructor calls the function `beagleFinalizeInstance`

to tell BeagleLib that our program is shutting down and its services will no longer be needed. If for some reason BeagleLib cannot shut down cleanly, an exception is thrown so that the user is informed.

## The getData and setData member functions

These functions allow you to associate a Data object with this Likelihood object. Note that changing the data set will require us to delete ("finalize" in BeagleLib terminology) the existing BeagleLib instance and create a new one.

inline Data::SharedPtr Likelihood::getData() { return _data; } inline void Likelihood::setData(Data::SharedPtr data) { _data = data; if (_instance >= 0) { // initBeagleLib function was previously called, so // finalize existing instance and create new one int code = beagleFinalizeInstance(_instance); if (code != 0) throw XStrom(boost::str(boost::format("Likelihood setData function failed to finalize BeagleLib instance. BeagleLib error code was %d (%s).") % code % _beagle_error[code])); _instance = -1; assert(_ntaxa > 0 && _nstates > 0 && _npatterns > 0); initBeagleLib(); } }

## The initBeagleLib member function

The `initBeagleLib`

member function initializes BeagleLib, obtaining a BeagleLib instance handle (integer value) that is stored in the data member `_instance`

. Before calling this function, you will need to know details about the data set you plan to analyze: the number of taxa, number of states (e.g. 4 for DNA), number of unique site patterns, whether the tree is rooted, and whether you prefer to use GPUs or the default CPU resource if there is a choice. Here is the body of this function; below the code I will explain more about the call to `beagleCreateInstance`

:

inline void Likelihood::initBeagleLib() { // a non-operation ("no-op") if a valid instance has already been created if (_instance >= 0) return; assert(_data); _ntaxa = _data->getNumTaxa(); _npatterns = _data->getNumPatterns(); _nstates = 4; _rooted = false; _prefer_gpu = false; std::cout << "Sequence length: " << _data->getSeqLen() << std::endl; std::cout << "Number of taxa: " << _ntaxa << std::endl; std::cout << "Number of patterns: " << _npatterns << std::endl; unsigned num_internals = (_rooted ? (_ntaxa - 1) : (_ntaxa - 2)); unsigned num_transition_probs = (_rooted ? (2*_ntaxa - 2) : (2*_ntaxa - 3)); long requirementFlags = 0; requirementFlags |= BEAGLE_FLAG_PRECISION_DOUBLE; long preferenceFlags = 0; if (_prefer_gpu) preferenceFlags |= BEAGLE_FLAG_PROCESSOR_GPU; else preferenceFlags |= BEAGLE_FLAG_PROCESSOR_CPU; BeagleInstanceDetails instance_details; _instance = beagleCreateInstance( _ntaxa, // tips num_internals, // partials _ntaxa, // sequences _nstates, // states _npatterns, // patterns 1, // models num_transition_probs, // transition matrices 1, // rate categories 0, // scale buffers NULL, // resource restrictions 0, // length of resource list preferenceFlags, // preferred flags requirementFlags, // required flags &instance_details); // pointer for details if (_instance < 0) { // beagleCreateInstance returns one of the following: // valid instance (0, 1, 2, ...) // error code (negative integer) throw XStrom(boost::str(boost::format("Likelihood init function failed to create Likelihood instance (BeagleLib error code was %d)") % _beagle_error[_instance])); } setTipStates(); setPatternWeights(); //std::cout << boost::str(boost::format("BeagleLib instance (%d) created.") % _instance) << std::endl; }

Note that `beagleCreateInstance`

is a function defined by BeagleLib. Your program knows about it because you included the *beagle.h* header file at the top. The actual code for this function will not be stored in your program; it will be loaded on demand from a Dynamic Link Library (provided by a file with a name that ends in *.dll* on Windows, *.dylib* on Mac, or *.so* in Unix) after your program begins to execute.

Calling the `beagleCreateInstance`

requires you to provide a lot of details. Here is a short description of these in the order in which they are supplied to the function:

- Number of tips
- This is the number of taxa in the tree.
- Number of partials
- This is the number of partials (also known as conditional likelihood arrays), and should equal the number of internal nodes (tips have data but not partials).
- Number of sequences
- This is the number of observed sequences, which should ordinarily equal the number of taxa.
- Number of states
- This is the number of states (e.g. 4 for DNA or RNA sequences, 20 for amino acid sequences, 61 for codons, etc.).
- Number of patterns
- This is the number of unique data patterns (you will later tell BeagleLib how many sites have each pattern).
- Number of models
- This will be 1 if a single rate matrix and state frequencies are used for all sites.
- Number of transition matrices
- This is the number of transition matrices needed, which should ordinarily equal the number of edges in the tree.
- Number of rate categories
- If your model allows for rate heterogeneity across sites, this is the number of rate categories. Use 1 if rates are assumed equal across sites.
- Number of scale buffers
- This has to do with scaling site likelihoods to avoid underflow, which is a concern for large trees, where site likelihoods can become so small that the computer cannot distinguish them from zero. The 0 value supplied here means we are not going to worry about scaling just yet.
- Number of resource restrictions
- This is a pointer to a list of potential resources (e.g. GPUs) on which this instance is allowed; we’ve supplied NULL (points to nothing) to indicate that we do not wish to restrict resources).
- Length of resource list
- The number of elements in the resource list supplied above (the pointer just points to the first element but stores no information about the total number of elements in the list, hence the need for this value); we’ve supplied 0 because we are not restricting BeagleLib.
- Preferred flags
- These tell BeagleLib your preferences (a key to possible flags is given below).
- Required flags
- These tell BeagleLib your requirements (a key to possible flags is given below).
- Pointer for details
- This is a pointer to a BeagleInstanceDetails structure, which will store details about implementation and resources (i.e. the decisions BeagleLib made given your preferences and requirements).

Here is the `BeagleFlags`

enum in the *beagle.h* header file (this may change in later version of BeagleLib, so consult the header file itself if you want to be sure to have the most accurate information for your version of BeagleLib):

enum BeagleFlags { BEAGLE_FLAG_PRECISION_SINGLE = 1 << 0, /* Single precision computation */ BEAGLE_FLAG_PRECISION_DOUBLE = 1 << 1, /* Double precision computation */ BEAGLE_FLAG_COMPUTATION_SYNCH = 1 << 2, /* Synchronous computation (blocking) */ BEAGLE_FLAG_COMPUTATION_ASYNCH = 1 << 3, /* Asynchronous computation (non-blocking) */ BEAGLE_FLAG_EIGEN_REAL = 1 << 4, /* Real eigenvalue computation */ BEAGLE_FLAG_EIGEN_COMPLEX = 1 << 5, /* Complex eigenvalue computation */ BEAGLE_FLAG_SCALING_MANUAL = 1 << 6, /* Manual scaling */ BEAGLE_FLAG_SCALING_AUTO = 1 << 7, /* Auto-scaling on (deprecated, may not work correctly) */ BEAGLE_FLAG_SCALING_ALWAYS = 1 << 8, /* Scale at every updatePartials (deprecated, may not work correctly) */ BEAGLE_FLAG_SCALING_DYNAMIC = 1 << 25, /* Manual scaling with dynamic checking (deprecated, may not work correctly) */ BEAGLE_FLAG_SCALERS_RAW = 1 << 9, /* Save raw scalers */ BEAGLE_FLAG_SCALERS_LOG = 1 << 10, /* Save log scalers */ BEAGLE_FLAG_INVEVEC_STANDARD = 1 << 20, /* Inverse eigen vectors passed to BEAGLE have not been transposed */ BEAGLE_FLAG_INVEVEC_TRANSPOSED = 1 << 21, /* Inverse eigen vectors passed to BEAGLE have been transposed */ BEAGLE_FLAG_VECTOR_SSE = 1 << 11, /* SSE computation */ BEAGLE_FLAG_VECTOR_AVX = 1 << 24, /* AVX computation */ BEAGLE_FLAG_VECTOR_NONE = 1 << 12, /* No vector computation */ BEAGLE_FLAG_THREADING_OPENMP = 1 << 13, /* OpenMP threading */ BEAGLE_FLAG_THREADING_NONE = 1 << 14, /* No threading */ BEAGLE_FLAG_PROCESSOR_CPU = 1 << 15, /* Use CPU as main processor */ BEAGLE_FLAG_PROCESSOR_GPU = 1 << 16, /* Use GPU as main processor */ BEAGLE_FLAG_PROCESSOR_FPGA = 1 << 17, /* Use FPGA as main processor */ BEAGLE_FLAG_PROCESSOR_CELL = 1 << 18, /* Use Cell as main processor */ BEAGLE_FLAG_PROCESSOR_PHI = 1 << 19, /* Use Intel Phi as main processor */ BEAGLE_FLAG_PROCESSOR_OTHER = 1 << 26, /* Use other type of processor */ BEAGLE_FLAG_FRAMEWORK_CUDA = 1 << 22, /* Use CUDA implementation with GPU resources */ BEAGLE_FLAG_FRAMEWORK_OPENCL = 1 << 23, /* Use OpenCL implementation with GPU resources */ BEAGLE_FLAG_FRAMEWORK_CPU = 1 << 27 /* Use CPU implementation */ };

## The availableResources member function

This function fetches the resource list from BeagleLib. This function can be used to determine whether your system has GPUs that may be used by BeagleLib. It also shows the CPU resource, which is used in the absence of GPUs.

inline std::string Likelihood::availableResources() { BeagleResourceList * rsrcList = beagleGetResourceList(); std::string s; for (int i = 0; i < rsrcList->length; ++i) { std::string desc = rsrcList->list[i].description; boost::trim(desc); if (desc.size() > 0) s += boost::str(boost::format("%d: %s (%s)\n") % i % rsrcList->list[i].name % desc); else s += boost::str(boost::format("%d: %s\n") % i % rsrcList->list[i].name); } return s; }

## The setTipStates member function

This function provides BeagleLib with the observed data for each tip node (leaf) in the tree. The data are stored in a `Data`

object and made available to the function by means of the shared pointer `_data`

. The for loop provides a pointer to the beginning (address of element 0) of the vector of states for each taxon. By the time these data are used, BeagleLib will know the length of this array of states (it is provided in the `beagleCreateInstance`

function that was called in the `init`

member function above). Note that this function assumes that a BeagleLib instance has already been created.

inline void Likelihood::setTipStates() { assert(_data); const Data::data_matrix_t & data_matrix = _data->getDataMatrix(); unsigned i = 0; typedef const std::vector<int> & ref_int_vect_t; for (ref_int_vect_t v : data_matrix) { int code = beagleSetTipStates( _instance, // Instance number i, // Index of destination compactBuffer &v[0]); // Pointer to compact states vector if (code != 0) throw XStrom(boost::str(boost::format("failed to set tip state for taxon %d (\"%s\"; BeagleLib error code was %d)") % (i+1) % _data->getTaxonNames()[i] % code % _beagle_error[code])); ++i; } }

## The setPatternWeights member function

Before computing a likelihood, BeagleLib will need to know how many sites exhibited each site pattern in the data set. Some patterns will be found at a single site, whereas other patterns will be common to many sites (for example, constant patterns in which the same state is found for all taxa are relatively common). The Data class stores a vector of site counts for each pattern in its `_pattern_counts`

data member, so all we have to do here is transfer those counts to our BeagleLib instance. Note that this function assumes that a BeagleLib instance has already been created.

inline void Likelihood::setPatternWeights() { assert(_data); int code = 0; const Data::pattern_counts_t & v = _data->getPatternCounts(); if (v.empty()) throw XStrom(boost::str(boost::format("failed to set pattern weights because data matrix has empty pattern count vector") % code)); code = beagleSetPatternWeights( _instance, // instance number &v[0]); // vector of pattern counts if (code != 0) throw XStrom(boost::str(boost::format("failed to set pattern weights. BeagleLib error code was %d (%s)") % code % _beagle_error[code])); }

## The setDiscreteGammaShape member function

BeagleLib handles rate heterogeneity in a very general way (as it handles all aspects of models). It expects you to provide it with an array of relative substitution rates and a corresponding array of rate probabilities. It does not care how you got these two arrays, so both rates and probabilities can be completely arbitrary, but in this tutorial we will ultimately supply BeagleLib with rates and probabilities derived from the discrete Gamma distribution widely used for modeling among-site rate heterogeneity. That explains why the parameter of this function is called `shape`

; however, for the moment we are going to assume that the rate at each site is equal to the rate at every other site, so `shape`

will be ignored. Later we will rewrite this function to take `shape`

into consideration. This function calls two BeagleLib functions, `beagleSetCategoryRates`

and `beagleSetCategoryWeights`

, both of which assume that a valid BeagleLib instance exists. For equal rates, the relative rate for each site is 1.0 and the probability that each site has that relative rate is also 1.0, so the `rates`

and `probs`

arrays passed to BeagleLib both have a single element that has value 1.0.

inline void Likelihood::setDiscreteGammaShape() { // For now we are ignoring shape and assuming rates are equal double rates[1] = {1.0}; double probs[1] = {1.0}; // This function sets the vector of category rates for an instance int code = beagleSetCategoryRates( _instance, // instance number rates); // vector containing rate scalers if (code != 0) throw XStrom(boost::str(boost::format("failed to set category rates. BeagleLib error code was %d (%s)") % code % _beagle_error[code])); // This function copies a category weights array into an instance buffer. // maybe need to set seqLens times, same as eigenBufferCount code = beagleSetCategoryWeights( _instance, // Instance number 0, // Index of category weights buffer. eigenIndex probs); // Category weights array (categoryCount) if (code != 0) throw XStrom(boost::str(boost::format("failed to set category probabilities. BeagleLib error code was %d (%s)") % code % _beagle_error[code])); }

## The setModelRateMatrix member function

BeagleLib handles the specification of a substitution model in a very general way. Specifically, it requires you to provide the equilibrium state frequencies and the eigenvalues and eigenvectors of the instantaneous rate matrix. We will be using the simplest model (the Jukes-Cantor, or JC69, model), which specifies equal base frequencies and a rate matrix in which all off-diagonal rates are identical. The BeagleLib function `beagleSetStateFrequencies`

is used to transfer the state frequencies, and the function `beagleSetEigenDecomposition`

is used to transfer the eigenvalues, eigenvectors, and inverse eigenvectors (all in the form of flat, single-dimensional arrays), to an existing BeagleLib instance. You may wonder why even simple models such as JC69, for which closed-form expressions (i.e. formulas) are available for transition probabilities, must be specified as eigensystems. The answer is that most models (e.g. GTR) used these days in phylogenetics are complex enough that transition probabilities must be computed by diagonalizing the rate matrix, so the approach used by BeagleLib is straightforward, flexible enough to allow quite complex models of evolution, and yet as simple as possible given the need to maintain generality.

inline void Likelihood::setModelRateMatrix() { // This function copies a state frequency array into an instance buffer. double state_freqs[4] = {0.25, 0.25, 0.25, 0.25}; int code = beagleSetStateFrequencies( _instance, // Instance number 0, // Index of state frequencies buffer. eigenIndex state_freqs); // State frequencies array (stateCount) if (code != 0) throw XStrom(boost::str(boost::format("failed to set state frequencies. BeagleLib error code was %d (%s)") % code % _beagle_error[code])); double eigenvalues[4] = { -4.0/3.0, -4.0/3.0, -4.0/3.0, 0.0 }; double eigenvectors[16] = { -1, -1, -1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1 }; double inverse_eigenvectors[16] = { -0.25, -0.25, -0.25, 0.75, -0.25, -0.25, 0.75, -0.25, -0.25, 0.75, -0.25, -0.25, 0.25, 0.25, 0.25, 0.25 }; // This function copies an eigen-decomposition into an instance buffer. // Note that BeagleLib transposes the input eigenvectors and inverse-eigenvectors. code = beagleSetEigenDecomposition( _instance, // Instance number 0, // Index of eigen-decomposition buffer (const double *)eigenvectors, // Flattened matrix (stateCount x stateCount) of eigenvectors (const double *)inverse_eigenvectors, // Flattened matrix (stateCount x stateCount) of inverse-eigenvectors eigenvalues); // Vector of eigenvalues if (code != 0) throw XStrom(boost::str(boost::format("failed to set eigen decomposition. BeagleLib error code was %d (%s)") % code % _beagle_error[code])); }

## The defineOperations member function

BeagleLib does not define or use trees *per se*. Instead, we must tell BeagleLib exactly which transition matrices to compute and which partial likelihoods to calculate in order to compute the likelihood. To do this, we traverse a tree (passed in using the shared pointer `t`

to a `Tree`

object) in postorder sequence (i.e. visiting all descendants of a given node before visiting the node itself) and create a list of "operations" for BeagleLib to carry out. Each operation is defined by providing 7 values in a particular sequence for each internal node. Here is a description of these 7 values:

- 1. Destination partials
- The index of the internal node (leaf nodes are indexed from 0,…,n-1, and internal nodes are indexed starting at n, which is the number of taxa). This value tells BeagleLib which partial (conditional likelihood array) to compute.
- 2. Destination scale write
- We provide the value
`BEAGLE_OP_NONE`

to indicate we are not scaling site likelihoods. - 3. Destination scale read
- We provide the value
`BEAGLE_OP_NONE`

to indicate we are not scaling site likelihoods. - 4. Left child partials
- The index of the left child node. BeagleLib will use data if the left child is a tip, and only uses a partial if the left child is an internal node.
- 5. Left child transition matrix
- The index of the left child node. This tells BeagleLib which transition matrix to use to accommodate evolution along the edge leading to the left child.
- 6. Right child partials
- The index of the right child node. BeagleLib will use data if the right child is a tip, and only uses a partial if the right child is an internal node.
- 7. Right child transition matrix
- The index of the right child node. This tells BeagleLib which transition matrix to use to accommodate evolution along the edge leading to the right child.

The operations array is flat, meaning that if 3 operations are defined, the operations array will be a single-dimensional array having 21 elements. While we are traversing the tree, we create a vector of node numbers (stored in the data member `_pmatrix_index`

) and edge lengths (stored in the data member `_edge_lengths`

) as they are encountered for use in the `calcLogLikelihood`

function defined below.

template <class T> inline void Likelihood<T>::defineOperations(typename Tree<T>::SharedPtr t) { _operations.clear(); _pmatrix_index.clear(); _edge_lengths.clear(); BOOST_REVERSE_FOREACH(T * nd, t->_preorder) { if (!nd->_left_child) { // This is a leaf _pmatrix_index.push_back(nd->_number); _edge_lengths.push_back(nd->_edge_length); } else { // This is an internal node _pmatrix_index.push_back(nd->_number); _edge_lengths.push_back(nd->_edge_length); // Internal nodes have partials to be calculated, so define // an operation to compute the partials for this node // 1. destination partial to be calculated int partial = nd->_number; _operations.push_back(partial); // 2. destination scaling buffer index to write to _operations.push_back(BEAGLE_OP_NONE); // 3. destination scaling buffer index to read from _operations.push_back(BEAGLE_OP_NONE); // 4. left child partial index partial = nd->_left_child->_number; _operations.push_back(partial); // 5. left child transition matrix index int tmatrix = nd->_left_child->_number; _operations.push_back(tmatrix); // 6. right child partial index assert(nd->_left_child); assert(nd->_left_child->_right_sib); partial = nd->_left_child->_right_sib->_number; _operations.push_back(partial); // assumes binary tree // 7. right child transition matrix index tmatrix = nd->_left_child->_right_sib->_number; _operations.push_back(tmatrix); } } // if tree is unrooted and thus "rooted" at a leaf, need to // use the transition matrix associated with the leaf if (!t->_is_rooted) _pmatrix_index[_pmatrix_index.size()-1] = t->getRoot()->_number; }

**Important!** The `defineOperations`

member function body above accesses private data members of both the `Tree`

and `Node`

classes. To avoid compile-time errors, you will thus need to declare the `Likelihood`

class to be a friend of both `Tree`

and `Node`

. In both *node.hpp* and *tree.hpp*, you should uncomment the 2 lines below:

template<class T> class Likelihood; friend class Likelihood<T>;

## The updateTransitionMatrices member function

This function recalculates all transition matrices. In the `defineOperations`

member function, we created a vector (`_pmatrix_index`

) comprising the indices of all nodes that have a transition matrix needing to be recomputed, and we also created a vector `_edge_lengths`

containing the edge lengths corresponding to the nodes listed in `_pmatrix_index`

, so here all we need do is provide these lists to BeagleLib.

The first argument must be a valid BeagleLib instance, the second argument is the index of the eigen decomposition to use (0 in our case because there is only one), the third argument is the address of the first element of the array of node indices specifying the transition matrices to be computed, the fourth and fifth arguments are NULL because we are not concerned with calculating first and second deriviatives, the sixth argument is the address of the first element of the arrayof edge lengths, and the seventh and final argument is the number of transition probabilities to calculate.

template <class T> inline void Likelihood<T>::updateTransitionMatrices() { int code = beagleUpdateTransitionMatrices( _instance, // Instance number 0, // Index of eigen-decomposition buffer &_pmatrix_index[0], // transition probability matrices to update NULL, // first derivative matrices to update NULL, // second derivative matrices to update &_edge_lengths[0], // List of edge lengths (int)_pmatrix_index.size()); // Length of lists if (code != 0) throw XStrom(boost::str(boost::format("failed to update transition matrices. BeagleLib error code was %d (%s)") % code % _beagle_error[code])); }

## The calculatePartials member function

This function instructs BeagleLib to update all its partials using the information in the operations array. In the `defineOperations`

member function, we created a vector `_operations`

that stores all the information BeagleLib needs, so here we simply allow it to copy the `_operations`

vector.

The first argument must be a valid BeagleLib instance, the second argument is the address of the starting element of the `_operations`

vector, the third argument is the number of elements in the `_operations`

vector, and the final argument is the index of the scale buffer used to store accumulated scaling factors, but because we are not scaling site likelihoods currently, we supply the value `BEAGLE_OP_NONE`

.

template <class T> inline void Likelihood<T>::calculatePartials() { // Calculate or queue for calculation partials using a list of operations int totalOperations = (int)(_operations.size()/7); int code = beagleUpdatePartials( _instance, // Instance number (BeagleOperation *) &_operations[0], // BeagleOperation list specifying operations totalOperations, // Number of operations BEAGLE_OP_NONE); // Index number of scaleBuffer to store accumulated factors if (code != 0) throw XStrom(boost::str(boost::format("failed to update partials. BeagleLib error code was %d (%s)") % code % _beagle_error[code])); }

## The calcLogLikelihood member function

The final member function we need to define is the one that returns the log-likelihood for a tree. For now, we only consider unrooted trees, so the beginning of this function is concerned with ensuring that this is true. After that, the member functions `setJCRateMatrix`

, `setDiscreteGammaShape`

, `defineOperations`

, `updateTransitionMatrices`

and `calculatePartials`

are called: see the description of those functions above for an explanation of what they do. The only remaining BeagleLib function that needs to be called is `beagleCalculateEdgeLogLikelihoods`

. We have computed partials for the tree down to the single descendant (call this node the *parent*) of the tip node serving as the root (called the *child*) in this unrooted tree. There is thus one edge left that has not been accounted for, and that is the edge connecting the parent to the child.

The 13 arguments supplied to the `beagleCalculateEdgeLogLikelihoods`

function are:

- Instance
- This must be a valid BeagleLib instance.
- List of indices of parent partials buffers
- In our case, the list has only one element, which is the index of the parent node.
- List of indices of child partials buffers
- In our case, the list has only one element, which is the index of the child node.
- List of indices of transition probability matrices for this edge
- In our case, the list has only one element, which is the index of the child node.
- List of indices of first derivative matrices
- We do not need to calculate first derivatives, so NULL is supplied for this.
- List of indices of second derivative matrices
- We do not need to calculate second derivatives, so NULL is supplied for this.
- List of weights to apply to each partials buffer
- This list supplies the relative rates and probabilities, of which there is only one set in our case, so we supply the index 0 here.
- List of state frequencies for each partials buffer
- This list supplies the state frequencies, of which there is only one set in our case, so we supply the index 0 here.
- List of scale buffers containing accumulated factors to apply to each partials buffer
- Because we are not protecting site likelihoods from underflow yet, we supply
`BEAGLE_OP_NONE`

here. - Number of partials buffers
- This is the number of elements in each of the lists supplied above. In our case, the number of partials buffers is just 1.
- Destination for log-likelihood value
- This is the address of a double value in which will be stored the final log-likelihood.
- Destination for first derivative value
- This is the address of a double value in which will be stored the first derivative (we supply NULL because we do not need first derivatives to be calculated).
- Destination for second derivative value
- This is the address of a double value in which will be stored the second derivative (we supply NULL because we do not need second derivatives to be calculated).

template <class T> inline double Likelihood<T>::calcLogLikelihood(typename Tree<T>::SharedPtr t) { if (t->_is_rooted) throw XStrom("can only compute likelihoods for unrooted trees currently"); if (!_data) throw XStrom("must call setData before calcLogLikelihood"); initBeagleLib(); // this is a no-op if a valid instance already exists // Assuming "root" is leaf 0 assert(t->getRoot()->_number == 0 && t->getRoot()->_left_child == t->_preorder[0] && !t->_preorder[0]->_right_sib); // Assuming there are as many transition matrices as there are edge lengths assert(_pmatrix_index.size() == _edge_lengths.size()); setModelRateMatrix(); setDiscreteGammaShape(); defineOperations(t); updateTransitionMatrices(); calculatePartials(); // The beagleCalculateEdgeLogLikelihoods function integrates a list of partials // at a parent and child node with respect to a set of partials-weights and // state frequencies to return the log likelihood and first and second derivative sums int stateFrequencyIndex = 0; int categoryWeightsIndex = 0; int cumulativeScalingIndex = BEAGLE_OP_NONE; double log_likelihood = 0.0; // index_focal_child is the root node int index_focal_child = t->getRoot()->_number; // index_focal_parent is the only child of root node int index_focal_parent = t->_preorder[0]->_number; int code = beagleCalculateEdgeLogLikelihoods( _instance, // instance number &index_focal_parent, // indices of parent partialsBuffers &index_focal_child, // indices of child partialsBuffers &index_focal_child, // transition probability matrices for this edge NULL, // first derivative matrices NULL, // second derivative matrices &categoryWeightsIndex, // weights to apply to each partialsBuffer &stateFrequencyIndex, // state frequencies for each partialsBuffer &cumulativeScalingIndex, // scaleBuffers containing accumulated factors 1, // Number of partialsBuffer &log_likelihood, // destination for log likelihood NULL, // destination for first derivative NULL); // destination for second derivative if (code != 0) throw XStrom(boost::str(boost::format("failed to calculate edge logLikelihoods in CalcLogLikelihood. BeagleLib error code was %d (%s)") % code % _beagle_error[code])); return log_likelihood; }