Build a tree from a Newick description

We not only need the ability to create descriptions of existing trees, we need to build trees from descriptions. In this section of the tutorial, we will ultimately create the function buildFromNewick, but first we need to create several helper functions that will be called from within buildFromNewick. For some of the functions, I depend on comments in the code itself to explain what is happening. So, if it feels that I’m not explaining things adequately, see if you can understand how the function works by reading the comments (lines starting with //) I’ve placed in the function body itself.

Add declarations to TreeManip class

Start by adding declarations for all the methods we will need to the TreeManip class declaration in the file tree_manip.hpp. New additions are in bold, blue text:

class TreeManip
{
public:
                                TreeManip();
                                TreeManip(Tree::SharedPtr t);
                                ~TreeManip();

    void                        setTree(Tree::SharedPtr t);
    Tree::TreeShPtr             getTree();
    void                        createTestTree();
    std::string                 makeNewick(unsigned precision) const;
    void                        buildFromNewick(const std::string newick, bool rooted = false, bool allow_polytomies);
    void                        rerootAt(int node_index);
    int                         findNodeNumber(int node_number);
    void                        clear();

private:

    void                        refreshPreorder(Node * root_node);
    void                        rerootHelper(Node * m, Node * t);
    void                        extractNodeNumberFromName(Node * nd, std::set<unsigned> & used);
    void                        extractEdgeLen(Node * nd, std::string edge_length_string);
    unsigned                    countNewickLeaves(const std::string newick);
    void                        stripOutNexusComments(std::string & newick);
    bool                        canHaveSibling(Node * nd, bool rooted, bool allow_polytomies);

    Tree::TreeShPtr             _tree;


public:
    typedef std::shared_ptr< TreeManip > SharedPtr;
};

You will also need to add a couple of header files to the top of the file:

#include 
#include 

The extractNodeNumberFromName member function

The program we are creating will only ever read newick tree descriptions in which taxa are represented by positive integers (normally representing their position in the data matrix, with the first taxon being 1). This private function converts the node name (a string representation of an integer greater than zero) into an unsigned integer. If the node name cannot be converted to a positive integer, an XStrom exception is thrown.

inline void TreeManip::extractNodeNumberFromName(Node * nd, std::set<unsigned> & used)
    {
    assert(nd);
    bool success = true;
    unsigned x = 0;
    try
        {
        x = std::stoi(nd->_name);
        }
    catch(std::invalid_argument &)
        {
        // node name could not be converted to an integer value
        success = false;
        }

    if (success)
        {
        // conversion succeeded
        // attempt to insert x into the set of node numbers altready used
        std::pair<std::set<unsigned>::iterator, bool> insert_result = used.insert(x);
        if (insert_result.second)
            {
            // insertion was made, so x has NOT already been used
            nd->_number = x - 1;
            }
        else
            {
            // insertion was not made, so set already contained x
            throw XStrom(boost::str(boost::format("leaf number %d used more than once") % x));
            }
        }
    else
        throw XStrom(boost::str(boost::format("node name (%s) not interpretable as a positive integer") % nd->_name));
    }

The function std::stoi is used to do the conversion of string to integer. The std::stoi function throws a invalid_argument exception if the string passed to it cannot be converted to an integer. This allows us to give the user an accurate error message if we ever encounter a node whose name is not integer-like.

This function also adds tip node numbers to the set used. This allows us to provide important error information to the user if the same tip number is used more than once in the newick string. The more sanity checks you can put into your code, the less frustrated your users will be when they use your software. And remember, you yourself are probably going to be the most frequent user of your programs!

The extractEdgeLen member function

This private function serves to convert a string representation of an edge length (taken from the newick tree description string) into an actual edge length (i.e. floating point value).

The function std::stof is used to do the conversion of string to double. The std::stof function throws a invalid_argument exception if the string passed to it cannot be converted to a floating point number. This allows us to give the user an accurate error message if we ever encounter a string that cannot be converted to a valid edge length.

inline void TreeManip::extractEdgeLen(Node * nd, std::string edge_length_string)
    {
    assert(nd);
    bool success = true;
    double d = 0.0;
    try
        {
        d = std::stof(edge_length_string);
        }
    catch(std::invalid_argument &)
        {
        // edge_length_string could not be converted to a double value
        success = false;
        }

    if (success)
        {
        // conversion succeeded
        nd->_edge_length = (d < 0.0 ? 0.0 : d);
        }
    else
        throw XStrom(boost::str(boost::format("%s is not interpretable as an edge length") % edge_length_string));

    }

The countNewickLeaves member function

This private function counts the number of leaves (i.e. tip nodes) in the tree defined by a newick string. This makes use of the standard regular expression (regex) library, which comes with C++11. The regular expression pattern (taxonexpr) identifies leaf nodes as integer values preceded by either a comma or a left parenthesis and followed by a comma, right parenthesis, or colon. It also allows non-integer taxon names, and even taxon names containing embedded spaces if they are surrounded by single quotes, even though such newick tree descriptions will ultimately be rejected (see section above on the extractNodeNumberFromName function).

The way this function works is concise but admittedly rather opaque. If you are not familiar with how regular expression searches work, you probably should spend some time reading about them before trying to understand this function. The variable m1 iterates through all taxon names in the supplied newick string, while m2 is never initialized and thus represents the value that m1 would take if it was iterated past the end of the string. The function std::distance(m1, m2) counts the number of times m1 needs to be incremented until it equals m2, which is the number of taxa in the newick string.

inline unsigned TreeManip::countNewickLeaves(const std::string newick)
    {
    std::regex taxonexpr("[(,]\\s*(\\d+|\\S+?|['].+?['])\\s*(?=[,):])");
    std::sregex_iterator m1(newick.begin(), newick.end(), taxonexpr);
    std::sregex_iterator m2;
    return (unsigned)std::distance(m1, m2);
    }

The stripOutNexusComments member function

This private function strips out comments from the newick string. Many programs place special comments inside newick strings that only they understand (e.g. FigTree stores color information about nodes in such comments). Comments in newick strings are delimited by square brackets, so this function uses the C++11 standard library’s regex_replace function to replace all such comments with the empty string. Note that the ampersand (&) in the parameter list means that the newick string is supplied as a reference. As a result, it is modified in place, which explains why this function does not return the modified string.

inline void TreeManip::stripOutNexusComments(std::string & newick)
    {
    std::regex commentexpr("\\[.*?\\]");
    newick = std::regex_replace(newick, commentexpr, std::string(""));
    }

The refreshPreorder member function

This private function must be called whenever the structure of the tree changes. It rebuilds the _preorder vector by visiting nodes in preorder sequence and storing pointers to these nodes as it goes.

inline void TreeManip::refreshPreorder(Node * root_node)
    {
    // Create vector of node pointers in preorder sequence
    _tree->_preorder.clear();
    _tree->_preorder.reserve(_tree->_nodes.size() - 1); // _preorder does not include root node

    if (!root_node)
        return;
        
    Node * first_preorder = root_node->_left_child;
        
    // sanity check: first preorder node should be the only child of the root node
    assert(first_preorder->_right_sib == 0);
        
    Node * nd = first_preorder;
    _tree->_preorder.push_back(nd);
    
    while (true)
        {
        if (!nd->_left_child && !nd->_right_sib)
            {
            // nd has no children and no siblings, so next preorder is the right sibling of 
            // the first ancestral node that has a right sibling.
            Node * anc = nd->_parent;
            while (anc && !anc->_right_sib)
                anc = anc->_parent;
            if (anc)
                {
                // We found an ancestor with a right sibling
                _tree->_preorder.push_back(anc->_right_sib);
                nd = anc->_right_sib;
                }
            else
                {
                // nd is last preorder node in the tree
                break;
                }
            }
        else if (nd->_right_sib && !nd->_left_child)
            {
            // nd has no children (it is a tip), but does have a sibling on its right
            _tree->_preorder.push_back(nd->_right_sib);
            nd = nd->_right_sib;
            }
        else if (nd->_left_child && !nd->_right_sib)
            {
            // nd has children (it is an internal node) but no siblings on its right
            _tree->_preorder.push_back(nd->_left_child);
            nd = nd->_left_child;
            }
        else
            {
            // nd has both children and siblings on its right
            _tree->_preorder.push_back(nd->_left_child);
            nd = nd->_left_child;
            }
            
        }   // end while loop
        
    // renumber internal nodes in postorder sequence
    int curr_internal = _tree->_nleaves;
    for (auto nd : boost::adaptors::reverse(_tree->_preorder))
        {
        if (nd->_left_child)
            {
            // nd is an internal node
            
            // node numbers for internal nodes are negative integers
            nd->_number = curr_internal++;
            }
        }
            
    if (_tree->_is_rooted)
        root_node->_number = curr_internal;
    }

The function first removes all elements from the tree’s _preorder vector and allocates enough memory to store pointers to all nodes except the root. It then returns if the data member _root does not point to a node because this can only mean that no tree has yet been built.

The first_preorder variable points to the only child of the root node. This will be the first node pointer added to the _preorder vector. The main while loop is then set to cycle forever (because the constant true is always true!), and only exits when the break keyword is encountered, which only happens after the last node in preorder sequence is visited. The comments inside the while(true) loop explain the decisions needed to determine which node should next be added to _preorder.

The function concludes with a loop that visits each node in postorder sequence. The postorder sequence is simply the reverse of the preorder sequence, which explains the need for boost::adaptors::reverse, which causes the _preorder vector to be traversed in reverse order. The internal nodes are numbered sequentially starting with the number of leaves (_nleaves). This means that no internal node can have a _number equal to any tip node because tip nodes have node numbers ranging from 0 to _nleaves-1. Numbering internals in postorder sequence means that we can use the last number for the root node if the tree happens to be rooted. If the tree is unrooted, remember that the root node is actually a leaf (tip) and thus has a number less than _nleaves.

Why do we even need to number the internal nodes? Later we will use the _number data member of each node as an index into arrays used by the BeagleLib library to compute likelihoods. Thus, it is important that all nodes be numbered and that the numbers are sequential with no gaps.

The canHaveSibling member function

This function is used to check for polytomies (internal nodes with degree > 3). The function first asserts that nd points to something and that the caller is not trying to add a sibling to the root node, which should never have a sibling. If allow_polytomies is true, then canHaveSibling returns true immediately because there is no need to guard against polytomies in this case. Assuming allow_polytomies is false, canHaveSibling returns true if and only if adding a right sibling to nd is allowed (i.e. would not generate a polytomy).

inline bool TreeManip::canHaveSibling(Node * nd, bool rooted, bool allow_polytomies)
    {
    assert(nd);
    if (!nd->_parent)
        {
        // trying to give root node a sibling
        return false;
        }

    if (allow_polytomies)
        return true;

    bool nd_can_have_sibling = true;
    if (nd != nd->_parent->_left_child)
        {
        if (nd->_parent->_parent)
            {
            // trying to give a sibling to a sibling of nd, and nd's parent is not the root
            nd_can_have_sibling = false;
            }
        else
            {
            if (rooted)
                {
                // root node has exactly 2 children in rooted trees
                nd_can_have_sibling = false;
                }
            else if (nd != nd->_parent->_left_child->_right_sib)
                {
                // trying to give root node more than 3 children
                nd_can_have_sibling = false;
                }
            }
        }

    return nd_can_have_sibling;

The rerootAt member function

This private member function reroots the tree at the node whose number is supplied, and is intended to be called only in the case of unrooted trees in order to specify the tip node that should serve as the root. The specified node is assumed to exist in the tree and is assumed to be a leaf (it throws an XStrom exception if either of these assumptions turn out to be false). The way this function works is to move the target node (the one having _number equal to node_number) down the tree one step at a time until it is at the root position. At each step, the nodes below it are moved up to maintain the branching structure of the tree.

This is best explained using a figure. In the figure below, node 4 (labeled nd) is to be the new root node, and the nodes t and m are indicated at each step as they are defined at the moment the rerootHelper function is called. Read the figure starting at the upper left panel and read left to right across rows. Each row does exactly the same thing, but with a different assignment of t and m. There are three steps (three rows) because the new root node starts out three segments away from its destination at the root of the tree. You can download a PDF version of this figure if the one below is too hard to read.
Figure illustrating algorithm for rerooting a tree at a tip

inline void TreeManip::rerootAt(int node_number)
    {
    // Locate node having _number equal to node_number
    Node * nd = 0;
    for (auto & curr : _tree->_nodes)
        {
        if (curr._number == node_number)
            {
            nd = &curr;
            break;
            }
        }
    if (!nd)
        throw XStrom(boost::str(boost::format("no node found with number equal to %d") % node_number));
        
    if (nd->_left_child)
        throw XStrom(boost::str(boost::format("cannot currently root trees at internal nodes (e.g. node %d)") % nd->_number));
    
    Node * t = nd;
    Node * m = nd->_parent;
    while (nd->_parent)
        {
        // Begin by swapping the mover's edge length with nd's edge length
        double tmp_edgelen = m->_edge_length;
        m->_edge_length = nd->_edge_length;
        nd->_edge_length = tmp_edgelen;

        // Make the "mover" node m (along with all of its children except nd) the rightmost child of the "target" node t
        rerootHelper(m, t);

        // The next target node is always the previous mover, and the next mover node is always nd's parent
        t = m;
        m = nd->_parent;
        }
    _tree->_root = nd;

The rerootHelper member function

This private member function simplifies the rerootAt member function described above by handling the node rearrangements needed to move the new root node down one step. See the figure in the description of rerootAt for an illustration of what this function does. At each step, there is a target node (t) and a mover node (m). Node m is moved (along with all of its children except for the child that is along the path to t) up the tree so that it becomes the rightmost child of t. Afterwards, t is assigned to m, and m is assigned to be the parent of the new root node. This function ends up being rather tedious because it has to handle all possible relative positions of m and t.

inline void TreeManip::rerootHelper(Node * m, Node * t)
    {
    assert(m);
    assert(t);
    
    // Save nodes to which m attaches
    Node * m_left_child    = m->_left_child;
    Node * m_right_sib     = m->_right_sib;
    Node * m_parent        = m->_parent;

    // Starting with t, walk down tree to identify x, the child of m that is on the path from m to t
    Node * x = t;
    while (x->_parent != m)
        {
        x = x->_parent;
        assert(x);
        }
    Node * x_right_sib = x->_right_sib;

    // Identify x_left_sib, the immediate left sibling of x (will be NULL if x is _left_child of m)
    Node * x_left_sib = 0;
    if (x != m_left_child)
        {
        x_left_sib = m_left_child;
        while (x_left_sib->_right_sib != x)
            {
            x_left_sib = x_left_sib->_right_sib;
            assert(x_left_sib);
            }
        }

    // identify m_left_sib, the immediate left sibling of m (will be NULL if m is root node or is _left_child of its parent)
    Node * m_left_sib = 0;
    if (m_parent && m != m_parent->_left_child)
        {
        m_left_sib = m_parent->_left_child;
        while (m_left_sib->_right_sib != m)
            {
            m_left_sib = m_left_sib->_right_sib;
            assert(m_left_sib);
            }
        }

    // Put x where m is now
    if (!m_parent)
        {
        // m is the root node
        assert(!m_right_sib);
        assert(!m_left_sib);
        x->_right_sib = 0;
        x->_parent = 0;
        if (x == m_left_child)
            m->_left_child = x_right_sib;
        else
            x_left_sib->_right_sib = x_right_sib;
        }
    else if (m == m_parent->_left_child)
        {
        // m is leftmost child of its parent
        x->_right_sib = m_right_sib;
        x->_parent = m_parent;
        m->_right_sib = 0;
        m->_parent = 0;
        m_parent->_left_child = x;
        if (x == m_left_child)
            m->_left_child = x_right_sib;
        else
            x_left_sib->_right_sib = x_right_sib;
        }
    else
        {
        // m is not leftmost child of its parent
        m_left_sib->_right_sib = x;
        x->_right_sib = m_right_sib;
        x->_parent = m_parent;
        m->_right_sib = 0;
        m->_parent = 0;
        if (x == m_left_child)
            m->_left_child = x_right_sib;
        else
            x_left_sib->_right_sib = x_right_sib;
        }
    
    // Make m the new rightmost child of t
    m->_parent = t;
    if (!t->_left_child)
        t->_left_child = m;
    else
        {
        // Find rightmost child of t
        m_left_sib = t->_left_child;
        while (m_left_sib->_right_sib)
            m_left_sib = m_left_sib->_right_sib;

        // Make rightmost child of t the left sib of m
        m_left_sib->_right_sib = m;
        }
    }

The buildFromNewick member function

Now that all the helper functions are in place, it is time to define the body of the buildFromNewick function.

inline void TreeManip::buildFromNewick(const std::string newick, bool rooted, bool allow_polytomies)
    {
    clear();
    _tree.reset(new Tree());
    _tree->_is_rooted = rooted;
    
    std::set<unsigned> used; // used to ensure that two tips do not have the same number
    unsigned curr_leaf = 0;

    std::string commentless_newick = newick;
    stripOutNexusComments(commentless_newick);

    _tree->_nleaves = countNewickLeaves(commentless_newick);
    if (_tree->_nleaves == 0)
        throw XStrom("Expecting newick tree description to have at least 4 leaves");
    unsigned max_nodes = 2*_tree->_nleaves - (_tree->_is_rooted ? 0 : 2);
    _tree->_nodes.resize(max_nodes);

    // Assign all nodes a default node number that is negative to make it easy to tell if we've not set it
    // (leaves will replace this number with the number equivalent of their name, internal nodes will replace
    // this number with a number higher than any leaf)
    for (unsigned i = 0; i < max_nodes; ++i)
        _tree->_nodes[i]._number = -1;
    
    // This will point to the first tip node encountered so that we can reroot at this node before returning
    Node * first_tip = 0;

    unsigned num_edge_lengths = 0;
    unsigned curr_node_index = 0;

    try
        {
        // Root node
        Node * nd = &_tree->_nodes[curr_node_index];
        _tree->_root = nd;

        if (_tree->_is_rooted)
            {
            nd = &_tree->_nodes[++curr_node_index];
            nd->_parent = &_tree->_nodes[curr_node_index - 1];
            nd->_parent->_left_child = nd;
            }

        // Some flags to keep track of what we did last
        enum {
            Prev_Tok_LParen		= 0x01,	// previous token was a left parenthesis ('(') 
            Prev_Tok_RParen		= 0x02,	// previous token was a right parenthesis (')') 
            Prev_Tok_Colon		= 0x04,	// previous token was a colon (':') 
            Prev_Tok_Comma		= 0x08,	// previous token was a comma (',') 
            Prev_Tok_Name		= 0x10,	// previous token was a node name (e.g. '2', 'P._articulata')
            Prev_Tok_EdgeLen	= 0x20	// previous token was an edge length (e.g. '0.1', '1.7e-3') 
            };
        unsigned previous = Prev_Tok_LParen;

        // Some useful flag combinations 
        unsigned LParen_Valid = (Prev_Tok_LParen | Prev_Tok_Comma);
        unsigned RParen_Valid = (Prev_Tok_RParen | Prev_Tok_Name | Prev_Tok_EdgeLen);
        unsigned Comma_Valid  = (Prev_Tok_RParen | Prev_Tok_Name | Prev_Tok_EdgeLen);
        unsigned Colon_Valid  = (Prev_Tok_RParen | Prev_Tok_Name);
        unsigned Name_Valid   = (Prev_Tok_RParen | Prev_Tok_LParen | Prev_Tok_Comma);

        // Set to true while reading an edge length
        bool inside_edge_length = false;
        std::string edge_length_str;
        unsigned edge_length_position = 0;

        // Set to true while reading a node name surrounded by (single) quotes
        bool inside_quoted_name = false;

        // Set to true while reading a node name not surrounded by (single) quotes
        bool inside_unquoted_name = false;

        // Set to start of each node name and used in case of error
        unsigned node_name_position = 0;

        // loop through the characters in newick, building up tree as we go
        unsigned position_in_string = 0;
        for (auto ch : commentless_newick)
            {
            position_in_string++;

            if (inside_quoted_name)
                {
                if (ch == '\'')
                    {
                    inside_quoted_name = false;
                    node_name_position = 0;
                    if (!nd->_left_child)
                        {
                        extractNodeNumberFromName(nd, used);
                        curr_leaf++;
                        if (!first_tip)
                            first_tip = nd;
                        }
                    previous = Prev_Tok_Name;
                    }
                else if (iswspace(ch))
                    nd->_name += ' ';
                else
                    nd->_name += ch;

                continue;
                }
            else if (inside_unquoted_name)
                {
                if (ch == '(')
                    throw XStrom(boost::str(boost::format("Unexpected left parenthesis inside node name at position %d in tree description") % node_name_position));

                if (iswspace(ch) || ch == ':' || ch == ',' || ch == ')')
                    {
                    inside_unquoted_name = false;

                    // Expect node name only after a left paren (child's name), a comma (sib's name) or a right paren (parent's name)
                    if (!(previous & Name_Valid))
                        throw XStrom(boost::str(boost::format("Unexpected node name (%s) at position %d in tree description") % nd->_name % node_name_position));

                    if (!nd->_left_child)
                        {
                        extractNodeNumberFromName(nd, used);
                        curr_leaf++;
                        if (!first_tip)
                            first_tip = nd;
                        }

                    previous = Prev_Tok_Name;
                    }
                else
                    {
                    nd->_name += ch;
                    continue;
                    }
                }
            else if (inside_edge_length)
                {
                if (ch == ',' || ch == ')' || iswspace(ch))
                    {
                    inside_edge_length = false;
                    edge_length_position = 0;
                    extractEdgeLen(nd, edge_length_str);
                    ++num_edge_lengths;
                    previous = Prev_Tok_EdgeLen;
                    }
                else
                    {
                    bool valid = (ch =='e' || ch == 'E' || ch =='.' || ch == '-' || ch == '+' || isdigit(ch));
                    if (!valid)
                        throw XStrom(boost::str(boost::format("Invalid branch length character (%c) at position %d in tree description") % ch % position_in_string));
                    edge_length_str += ch;
                    continue;
                    }
                }

            if (iswspace(ch))
                continue;

            switch(ch)
                {
                case ';':
                    break;

                case ')':
                    // If nd is bottommost node, expecting left paren or semicolon, but not right paren
                    if (!nd->_parent)
                        throw XStrom(boost::str(boost::format("Too many right parentheses at position %d in tree description") % position_in_string));

                    // Expect right paren only after an edge length, a node name, or another right paren
                    if (!(previous & RParen_Valid))
                        throw XStrom(boost::str(boost::format("Unexpected right parenthesisat position %d in tree description") % position_in_string));

                    // Go down a level
                    nd = nd->_parent;
                    if (!nd->_left_child->_right_sib)
                        throw XStrom(boost::str(boost::format("Internal node has only one child at position %d in tree description") % position_in_string));
                    previous = Prev_Tok_RParen;
                    break;

                case ':':
                    // Expect colon only after a node name or another right paren
                    if (!(previous & Colon_Valid))
                        throw XStrom(boost::str(boost::format("Unexpected colon at position %d in tree description") % position_in_string));
                    previous = Prev_Tok_Colon;
                    break;

                case ',':
                    // Expect comma only after an edge length, a node name, or a right paren
                    if (!nd->_parent || !(previous & Comma_Valid))
                        throw XStrom(boost::str(boost::format("Unexpected comma at position %d in tree description") % position_in_string));

                    // Check for polytomies
                    if (!canHaveSibling(nd, rooted, allow_polytomies))
                        throw XStrom(boost::str(boost::format("Polytomy found in the following tree description but polytomies prohibited:\n%s") % newick));

                    // Create the sibling
                    curr_node_index++;
                    if (curr_node_index == _tree->_nodes.size())
                        throw XStrom(boost::str(boost::format("Too many nodes specified by tree description (%d nodes allocated for %d leaves)") % _tree->_nodes.size() % _tree->_nleaves));
                    nd->_right_sib = &_tree->_nodes[curr_node_index];
                    nd->_right_sib->_parent = nd->_parent;
                    nd = nd->_right_sib;
                    previous = Prev_Tok_Comma;
                    break;

                case '(':
                    // Expect left paren only after a comma or another left paren
                    if (!(previous & LParen_Valid))
                        throw XStrom(boost::str(boost::format("Not expecting left parenthesis at position %d in tree description") % position_in_string));

                    // Create new node above and to the left of the current node
                    assert(!nd->_left_child);
                    curr_node_index++;
                    if (curr_node_index == _tree->_nodes.size())
                        throw XStrom(boost::str(boost::format("malformed tree description (more than %d nodes specified)") % _tree->_nodes.size()));
                    nd->_left_child = &_tree->_nodes[curr_node_index];
                    nd->_left_child->_parent = nd;
                    nd = nd->_left_child;
                    previous = Prev_Tok_LParen;
                    break;

                case '\'':
                    // Encountered an apostrophe, which always indicates the start of a
                    // node name (but note that node names do not have to be quoted)

                    // Expect node name only after a left paren (child's name), a comma (sib's name)
                    // or a right paren (parent's name)
                    if (!(previous & Name_Valid))
                        throw XStrom(boost::str(boost::format("Not expecting node name at position %d in tree description") % position_in_string));

                    // Get the rest of the name
                    nd->_name.clear();

                    inside_quoted_name = true;
                    node_name_position = position_in_string;

                    break;

                default:
                    // Get here if ch is not one of ();:,'

                    // Expecting either an edge length or an unquoted node name
                    if (previous == Prev_Tok_Colon)
                        {
                        // Edge length expected (e.g. "235", "0.12345", "1.7e-3")
                        inside_edge_length = true;
                        edge_length_position = position_in_string;
                        edge_length_str = ch;
                        }
                    else
                        {
                        // Get the node name
                        nd->_name = ch;

                        inside_unquoted_name = true;
                        node_name_position = position_in_string;
                        }

                }   // end of switch statement
            }   // loop over characters in newick string

        if (inside_unquoted_name)
            throw XStrom(boost::str(boost::format("Tree description ended before end of node name starting at position %d was found") % node_name_position));
        if (inside_edge_length)
            throw XStrom(boost::str(boost::format("Tree description ended before end of edge length starting at position %d was found") % edge_length_position));
        if (inside_quoted_name)
            throw XStrom(boost::str(boost::format("Expecting single quote to mark the end of node name at position %d in tree description") % node_name_position));

        if (!_tree->_is_rooted)
            {
            // Root at leaf whose _number = 0
            rerootAt(0);
            }

        refreshPreorder();
        }
    catch(XStrom x)
        {
        clear();
        throw x;
        }

    }