Before we add rescaling, let me prove to you that our code as it now stands will not work on a large tree. Download the file *rbcl738.zip* and move the *rbcl738.nex* and *rbcl738nj.tre* files within it to the same directory where you have *rbcL.nex*.

## Modify your *main.cpp* file

To compute the log-likelihood of this 738 taxon tree, modify your *main.cpp* as follows:

#include <iostream> #include "node.hpp" #include "tree_summary.hpp" #include "likelihood.hpp" using namespace strom; int main(int argc, const char * argv[]) { std::cout << "Starting..." << std::endl; try { // Read and store data Data::SharedPtr d(new Data()); d->getDataFromFile("rbcl738.nex"); // Create a substitution model GTRModel::SharedPtr gtr = GTRModel::SharedPtr(new GTRModel()); gtr->setExchangeabilitiesAndStateFreqs( {0.08394222, 0.34116704, 0.03603322, 0.15737940, 0.30297095, 0.07850717}, {0.309769, 0.163380, 0.121023, 0.405828}); gtr->setGammaShape(0.517281); gtr->setGammaNCateg(4); std::cout << gtr->describeModel() << std::endl; // Create a likelihood object that will compute log-likelihoods Likelihood<Node>::SharedPtr likelihood(new Likelihood<Node>()); likelihood->setData(d); likelihood->setModel(gtr); // Read in a tree TreeSummary<Node>::SharedPtr tree_summary(new TreeSummary<Node>()); tree_summary->readTreefile("rbcl738nj.tre", 0); Tree<Node>::SharedPtr tree = tree_summary->getTree(0); // Calculate the log-likelihood for the tree double lnL = likelihood->calcLogLikelihood(tree); std::cout << boost::str(boost::format("log likelihood = %.5f") % lnL) << std::endl; std::cout << " (expecting -144730.8)" << std::endl; } catch (XStrom x) { std::cerr << "Strom encountered a problem:\n " << x.what() << std::endl; } std::cout << "\nFinished!" << std::endl; return 0; }

## Run your software

Execute your software within the debugger. You should find that the log likelihood computed is shown as `-inf`

(negative infinity), which means the likelihood is zero, which in turn means that underflow has occurred in at least some of the site likelihood calculations. The next page will show you how to set up BeagleLib to rescale the likelihood so that this underflow does not happen.

## Read this if you are interested in checking the likelihood in PAUP*…

There is a file named *paup.nex* bundled with *rbcl738.nex* that can be used to verify the log-likelihood in the program PAUP*. If you look at the `format`

command in the *rbcl738.nex* file, you will notice this `equate`

statement:

equate="S=? Y=? R=? M=? W=? K=? V=? D=? B=?"

The sequences in this file contain several ambiguity codes. The full set of ambiguity codes for DNA sequence data is:

R,r ==> {AG} Y,y ==> {CT} M,m ==> {AC} K,k ==> {GT} S,s ==> {CG} W,w ==> {AT} H,h ==> {ACT} B,b ==> {CGT} V,v ==> {ACG} D,d ==> {AGT} N,n ==> {ACGT}

PAUP* recognizes all of these ambiguity codes and handles them accordingly, but BeagleLib treats any ambiguity as complete ambiguity, so the equate statement has been added to the file to effectively turn all of the ambiguity codes present in the file into ?, which represents complete ambiguity and is equivalent to N. That will allow our program to obtain the same log-likelihood as PAUP*.