Adding splits to Node and TreeManip

Before we can do anything interesting with our Split class, we need to add a Split object to the Node class and a storeSplits member function to the TreeManip class.

Adding a Split to the Node class

Add a Split data member to the Node class by uncommenting the line that includes the split.hpp header, which is located at the top of the node.hpp file:

#include "split.hpp"

Also uncomment the line that declares a data member of type Split just below _edge_length inside the Node class declaration:

Split _split;

Finally, uncomment the getSplit accessor member function:

Split getSplit() {return _split;}

Adding the storeSplits member function to the TreeManip class

Add the following #include line at the top of the tree_manip.hpp file.

#include <boost/foreach.hpp> 

Add the following member function prototype to the TreeManip class declaration (just below buildFromNewick member function prototype).

void storeSplits(std::set<Split> & splitset);

Now add the member function body to the end of the tree_manip.hpp file (but before the right curly bracket that terminates the namespace scope).

template <class T>
inline void TreeManip<T>::storeSplits(std::set<Split> & splitset)
    {
    // Start by clearing and resizing all splits
    BOOST_FOREACH(T & nd, _tree->_nodes)
        {
        nd._split.resize(_tree->_nleaves);
        }

    // Now do a postorder traversal and add the bit corresponding
    // to the current node in its parent node's split
    BOOST_REVERSE_FOREACH(T * nd, _tree->_preorder)
        {
        if (nd->_left_child)
            {
            // add this internal node's split to splitset
            splitset.insert(nd->_split);
            }
        else
            {
            // set bit corresponding to this leaf node's number
            nd->_split.setBitAt(nd->_number);
            }

        if (nd->_parent)
            {
            // parent's bits are the union of the bits set in all its children
            nd->_parent->_split.addSplit(nd->_split);
            }  
        }
    }

The first BOOST_FOREACH loop visits each node in the tree (i.e. every element of the _nodes vector) and resizes that node’s _split to ensure that all splits have just enough elements in their _bits vector to store any possible split defined on _nleaves taxa.

The BOOST_FOREACH macro is optional and does not do anything magical; it merely simplifies looping over the elements of a vector. Without BOOST_FOREACH, this loop would be slightly harder to comprehend at a glance:

(note: do not add this to your code! It is for illustration only.)
for (std::vector<T>::iterator it = _nodes.begin(); it != _nodes.end; ++it)
    {
    it->_split.resize(_tree->_nleaves);
    }

The second loop involving BOOST_REVERSE_FOREACH conducts a postorder traversal (i.e. all descendants are visited before the node representing their most recent common ancestor is visited). Because of the resize call we just did for every node, it is safe to assume that all splits have zeros in every bit. When visiting a leaf node (i.e. nd->_left_child = 0), this function sets 1 bit in its split corresponding to that leaf’s number. If the leaf node has _number = 0, then the first bit will be set in its _split. Note that for any node having a _parent (which is all nodes except the root node), the bits in the node being visited are added to the bits already set in the _parent node. Thus, by the time all descendants of an internal node have been visited, the _split for that internal node is already completely specified. When an internal node is visited, all that need be done is to add its _split to the growing splitset.

The parameter splitset is a reference to a set of Split objects. A reference in C++ represents an alias to an existing variable. In this case, the std::set<Split> variable aliased must be created before this function is called, and this function builds up the set as it walks through the tree. When the function exists, the external std::set<Split> variable will contain the same number of splits as there are internal nodes in the tree, and will serve as a tree ID: a unique bar code that identifies the tree topology and allows other trees with the same topology (but probably different edge lengths) to be identified.