simoneb's blog

about design, development, profession and avoidance of fluff

Labeling Trees - Introduction

A few years ago I came across a video interview on Channel 9 starring Brian Beckman on a topic which I had never heard of before. The interview is still available online (part 1 and part 2) but you don’t need to watch it to follow me as I’ll start from the beginning and guide you gently through the whole topic.

Although I then considered myself an already proficient C# developer I was quite confused, I just couldn’t wrap my head around the crazy concept of passing state around invisibly, although it was done in my favorite language. The interview comes with full source code and an assignment containing 10 exercises, which I was quite disappointed I could not even tackle. The main topic of the interview is labeling a binary tree, and Brian showed how to do it in several ways, which we’ll see and try to understand before moving on to the exercises.

Time has passed and I can finally go back to that interesting topic and hopefully help someone else out there. The topic, needless to say, is monads, and not even the simplest of the monads, the state monad. This post is the first of a series where I’ll introduce the concept and then go through the exercises, hopefully solving them correctly.

Without further ado, here’s the class which will accompany us throughout the journey, during which it will undergo only minor changes: a binary tree in C#.

A binary tree (binary-tree.cs) download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
public abstract class Tree<T>
{
    public abstract void Show(int indent = 0);
}

public class Leaf<T> : Tree<T>
{
    public T Value { get; private set; }

    public Leaf(T value)
    {
        Value = value;
    }

    public override void Show(int indent)
    {
        Console.Write(new string(' ', indent * 2) + "Leaf: ");
        Console.WriteLine(Value.ToString());
    }
}

public class Branch<T> : Tree<T>
{
    public Tree<T> Left { get; private set; }
    public Tree<T> Right { get; private set; }

    public Branch(Tree<T> left, Tree<T> right)
    {
        Left = left;
        Right = right;
    }

    public override void Show(int indent)
    {
        Console.WriteLine(new string(' ', indent * 2) + "Branch:");
        Left.Show(indent + 1);
        Right.Show(indent + 1);
    }
}

As you can see a tree can either be a leaf, containing a Value, or a branch, whose left and right sides will be other trees (and thus either branches or leaves). The entire class hierarchy is generic for no special purpose except that it will allow us to give a meaningful representation to the leaves via their T Value property. I’ve also included a Show(int) method to easily ask the tree to dump itself to the output stream. A couple of helper methods to create branches and leaves will also come useful:

Helper methods to create trees (helper-methods.cs) download
1
2
3
4
5
6
7
8
9
public Tree<T> leaf<T>(T value)
{
    return new Leaf<T>(value);
}

public Tree<T> branch<T>(Tree<T> left, Tree<T> right)
{
    return new Branch<T>(left, right);
}

Now creating and displaying a tree of type string is straightforward:

A simple tree (a-simple-tree.cs) download
1
2
3
4
5
6
7
8
9
10
                                //  tree.Show() outputs:
var tree = branch(              //  Branch:
            leaf("a"),          //    Leaf: a
            branch(             //    Branch:
              branch(           //      Branch:
                leaf("b"),      //        Leaf: b
                leaf("c")),     //        Leaf: c
              leaf("d")));      //      Leaf: d

tree.Show();

Manual labeling

First of all we need to know what labeling a tree means. In our case it means to navigate the tree and assign a integer label to every leaf. Tree navigation can usually be done in two ways, depth-first (DFS) or breadth-first (BFS), we’ll use the former here. In order to store the label in the leaves we need to enrich its structure. We’ll do that by making the generic parameter of the tree a Labeled<T>, defined as follows.

Label (labeled-class.cs) download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
public class Labeled<T>
{
    public int Label { get; private set; }
    public T Value { get; private set; }

    public Labeled(int label, T value)
    {
        Label = label;
        Value = value;
    }

    public override string ToString()
    {
        return new { Label, Value }.ToString();
    }
}

public Labeled<T> labeled<T>(int label, T value)
{
    return new Labeled<T>(label, value);
}

In practice, labeling a structure of type Tree<T> means transforming it into a Tree<Labeled<T>>. With this new knowledge labeling our tree manually is as simple as recreating the same tree structure and replacing the value of the leaves with a Labeled<string> instance, incrementing the value of the label manually every time we visit a leaf.
The value, or state, of the current label at each point in time lives in our brains: we’ll have to remember the previous value and increment it every time we visit a leaf. Not a big deal as long as the tree is small.

Manually labeled tree (manually-labeled-tree.cs) download
1
2
3
4
5
6
7
8
9
10
                                                 // labeledTree.Show() outputs:
var labeledTree = branch(                        // Branch:
                    leaf(labeled(0, "a")),       //   Leaf: { Label = 0, Value = a }
                    branch(                      //   Branch:
                      branch(                    //     Branch:
                        leaf(labeled(1, "b")),   //       Leaf: { Label = 1, Value = b }
                        leaf(labeled(2, "c"))),  //       Leaf: { Label = 2, Value = c }
                      leaf(labeled(3, "d"))));   //     Leaf: { Label = 3, Value = d }

labeledTree.Show();

Functional labeling

As you can see labeling a tree manually is not very useful, imagine to have a tree with more than a bunch branches and leaves, it becomes quickly inconvenient. What if we could label it programmatically? We indeed can, and we’ll have to come up with a function, which given our initial unlabeled Tree<T>, gives us back a Tree<Labeled<T>>.
Let’s call Tree<Labeled<T>> Label<T>(Tree<T>) such a function.

Label function (label-function-empty.cs) download
1
2
3
4
5
public Tree<Labeled<T>> Label<T>(Tree<T> tree)
{
    // what to write here?
    return null;
}

One constraint that we want to have is that we don’t want to store the current label in shared memory and update it whenever we visit a leaf. We would like to do it in a purely functional way. This means that the state needs to be passed around explicitly, with a starting value of 0. This suggests that we’ll need an additional support function that will thread the state via its arguments and return the updated state via its return value.
Given the structure of the data we’re handling this function will also most likely call itself recursively.

Label and Label1 functions (functional-label.cs) download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
public Tree<Labeled<T>> Label<T>(Tree<T> tree)
{
    return Label1(0, tree).Item2;
}

public Tuple<int, Tree<Labeled<T>>> Label1<T>(int label, Tree<T> tree)
{
    if(tree is Leaf<T>)
    {
        var lf = tree as Leaf<T>;

        return Tuple.Create(label + 1, leaf(labeled(label, lf.Value)));
    }
    else
    {
        var br = tree as Branch<T>;

        var left = Label1(label, br.Left);
        var right = Label1(left.Item1, br.Right);

        return Tuple.Create(right.Item1, branch(left.Item2, right.Item2));
    }
}

Label1 accepts an integer value representing the current label and a tree to be labeled. Its return value is a tuple containing the new label value and the labeled tree.

Let’s go through the code to see what it does exactly. You can see that it does two different things according to whether the tree to be labeled is a leaf or a branch.
In the first case it labels the leaf with the current value of the label, bumps it and returns a tuple containing the new label value and a leaf labeled with the value of the label before bumping it.
In the second case it first calls itself recursively to label the left branch of the tree, which returns again a tuple with the current label value and the labeled sub-tree. Then it does the same with the right branch by passing the just-returned label value as the current label value. Finally it returns a tuple containing the final state of the label and a newly constructed labeled branch containing the previously labeled left and right branches.
You might notice that it is this way of recursing that enables the depth-first approach.

Finally, Label calls Label1 passing 0 as the initial state and returning only the second item in the tuple, which is the labeled tree.

The interesting thing about this approach is that it does not rely on any shared state to do the labeling. The current value of the label is passed as a function argument and the new value is returned as part of the result, versus storing the label value in a variable accessible from the function and incremented every time.

In the next post we’ll see how to make this state-passing less explicit, while still avoiding shared memory to store it.

Comments