University of Connecticut University of UC Title Fallback Connecticut

Adding splits to Node and TreeManip

Before we can do anything interesting with our Split class, we need to add a Split object to the Node class and a storeSplits member function to the TreeManip class.

Adding a Split to the Node class

Add a Split data member to the Node class by uncommenting the line that includes the split.hpp header, which is located at the top of the node.hpp file:

#include "split.hpp"

Also uncomment the line that declares a data member of type Split just below _edge_length inside the Node class declaration:

Split _split;

Finally, uncomment the getSplit accessor member function:

Split getSplit() {return _split;}

Adding the storeSplits member function to the TreeManip class

Add the following member function prototype to the TreeManip class declaration (just below buildFromNewick member function prototype).

void storeSplits(std::set<Split> & splitset);

Now add the member function body to the end of the tree_manip.hpp file (but before the right curly bracket that terminates the namespace scope).

inline void TreeManip::storeSplits(std::set<Split> & splitset)
    {
    // Start by clearing and resizing all splits
    for (auto & nd : _tree->_nodes)
        {
        nd._split.resize(_tree->_nleaves);
        }

    // Now do a postorder traversal and add the bit corresponding
    // to the current node in its parent node's split
    for (auto nd : boost::adaptors::reverse(_tree->_preorder))
        {
        if (nd->_left_child)
            {
            // add this internal node's split to splitset
            splitset.insert(nd->_split);
            }
        else
            {
            // set bit corresponding to this leaf node's number
            nd->_split.setBitAt(nd->_number);
            }

        if (nd->_parent)
            {
            // parent's bits are the union of the bits set in all its children
            nd->_parent->_split.addSplit(nd->_split);
            }  
        }
    }

The first for loop visits each node in the tree (i.e. every element of the _nodes vector) and resizes that node’s _split to ensure that all splits have just enough elements in their _bits vector to store any possible split defined on _nleaves taxa.

The second loop (which involves the use of boost::adaptors::reverse to reverse the order in which nodes in the _preorder vector are visited) conducts a postorder traversal (i.e. all descendants are visited before the node representing their most recent common ancestor is visited). Because of the resize call we just did for every node, it is safe to assume that all splits have zeros in every bit. When visiting a leaf node (i.e. nd->_left_child = 0), this function sets the bit in its split corresponding to that leaf’s taxon number. If the leaf node has _number = 0, then the first (i.e. 0th) bit will be set in its _split.

Note that for any node having a _parent (which is all nodes except the root node), the bits in the node being visited are added to the bits already set in the _parent node. Thus, by the time all descendants of an internal node have been visited, the _split for that internal node is already completely specified. When an internal node is visited, all that needs to be done is to add its _split to the growing splitset and, of course, add the bits defined in its _split to its own parent.

The parameter splitset is a reference to a std::set of Split objects. A reference in C++ represents an alias to an existing variable. In this case, the std::set<Split> variable aliased must be created before this function is called, and this function builds up the set as it walks through the tree. Why use a std::set and not a std::vector here? Different trees may have identical splits, but they may not be in the same order depending on how each edge is rotated. If splits are added to a set, then the order in which they are encountered does not matter: trees with the same set of splits will have identical split sets. When the function exits, the external std::set<Split> variable will contain the same number of splits as there are internal nodes in the tree, and will serve as a tree ID: a unique bar code that identifies the tree topology and allows other trees with the same topology (but probably different edge lengths) to be identified.