Monday, April 27, 2009

Polymorphic Binary Trees

Hello all,

This is the week before the week that is before the week exams start, so like all good college students with exams approaching, I'm doing something completely off topic. I'm discussing Polymorphic Binary Search Trees.

For those who don't know what a Tree is, read ahead. For those who do, you can skip this next section to the part where we discuss how to implement these structures.

Trees
A tree is exactly what it sounds like - a data structure that holds objects within nodes. Each node is pointed to by a parent node and can have multiple children nodes. Binary trees are such a data structure, where each node is restricted to two children.

An Example of a Binary Search Tree (If you were to print it out):
10
8 14
4 9 12 15

In this example, 10 is known as the root node. 8 and 14 are the left and right children respectively. Both 8 and 14 branch again to have two children of their own.

Note: It is not required for a Binary Tree to be balanced, although it does optimize operations.

Such binary trees have a multitude of uses - most importantly they store data in sorted order (As shown above). If you were to add another value to the tree, such as 1, the tree would add it to the end of one of it's nodes that has an open child. In this case, 1 would be appended as the left child of 4.

Implementations:
There are a few ways to implement binary trees - the most common way is to define a node class that would look something like this (in Java):

public class node{
  private node left;
  private node right;
  private T data;
  /*Constructors and methods*/
}

Where left and right are the children nodes and data is the object stored in each node. Now there is another class that holds the root node, or the first node. From this one node each other node is processed and added. The tree class would look something like (in Java):

public class tree{
private noderoot;
  /*Constructors and methods*/
}

Now the tree class would have various methods including, add, remove, contains etc. These have very straight forward implementations that can be done both using loops and recursion. In fact, trees lend themselves quite nicely to recursive implementations of method.

While the above method is quite commonly used, there is another implementation that we will be demonstrating here, and that is know as the Polymorphic Binary Tree.

The Polymorphic Binary Tree requires on several classes - the Tree abstract class, the EmptyTree class, and the NonEmptyTree class. Both Empty and NonEmpty extend the abstract Tree class. This implementation focuses on the fact that Empty and NonEmpty trees extend the Tree class, and can therefore be switched as children.

The basic idea is this - NonEmpty trees have references to two children Trees, which can either be the EmptyTree class or the NonEmptyTree class. Hence the name Polymorphic.

Now, each child node in the ParentTree will begin as an EmptyTree, replacing itself with a NonEmptyTree when instructed to do so. The abstract Tree class will look like this (in Java):

public abstract class Tree{
  public Tree add(T data);
  public Tree remove(T data);
  public boolean contains(T data);
  public int size();
  public T max();
  public T min();
  /*Any other methods you need in your tree*/
}

Each of these methods will be implemented differently by each child class. You'll note that the add and remove methods each return a Tree. This is how trees replace themselves when transforming from Empty to NonEmpty. If add is called on an EmptyTree, it returns an instance of NonEmptyTree with the given object. If a NonEmptyTree is a leaf and is to be removed, it returns an EmptyTree, replacing itself.

The add method for a NonEmptyTree will insert a new object based on the compareTo method that all classes in Java have. It will check each node's data object and compare against the data to be inserted. If the new data is greater than the current trees data, it will call add to it's right child. If it's less than the current trees data, it will call add to it's left child. If it's equal, it will update the data, and then return itself.

A call of add to an EmptyTree will return a new NonEmptyTree with the given data.

The contains method for a NonEmptyTree will use the same movement described in the add method to track through the tree in search of the given data. It should also be done recursively, returning the call on either the right or the left until the current Tree has the given data.

Obviously a call of contains to an EmptyTree should return false.

Max and min for a NonEmptyTree should use the same movement described above to track through the tree until they reach an EmptyTree. Max should keep moving to the right until it no longer can. Min should keep moving to the left until it no longer can.

These are the two methods that actually require EmptyTree to do something. There are two ways you can approach this. The first is to have EmptyTree.max/min simply return null to let the calling NonEmptyTree know that it has reached an EmptyTree and therefore return it's own value. The other is to define an exception that is thrown to inform the calling tree that an EmptyTree has been discovered. You would envelope the calls to left and right in a try/catch statement, and if the exception is caught you would simply return the calling NonEmptyTree's data.

The delete method always seems to confuse people (or at least it confused the hell out of people in my programming class) so I'll give a somewhat in depth overview of how to do it. This implementation will be a recursive method. I will not provide code.

The first thing to do is to determine if you have found the data to be removed. If you haven't, use the compareTo method to determine which direction to go from there. Continue until you have found the data, or you reach a EmptyNode.

If, when you find the data, you are in a leaf node (Both right and left children are Empty) simply return an EmptyTree

If you have found the data, you need to find some value to replace it with. There are two ways to do this - you can use a value from either the right or the left. If you use the the former, you will replace the value with the min on the right side. If you use the latter, you will replace the value with the max on the left side. How do you find these values? You use the handy-dandy min/max methods already defined. Once you have replaced this value, you need to remove the value that used to define the new values position in the tree.

To do this, simply call remove on the left or right (whichever you chose to do). This value must be a leaf tree, because it was the min or max. When you have done this, return this to update the entire tree.

Obviously if you call remove on an EmptyTree, it will do nothing.

I can do no more:
Well, I can. I can actually do a ton more. I can write Binary Tree's using both implementations, I can write methods that do all sorts of neat things with them, as well, but I won't. The above methods can be used to as a basis for the other implementation I mentioned, one using recursive helper methods within the parent Tree class.

Interesting Things you can do with Binary Search Trees:
Sort data in O(n log(n)) - Very optimised search. Make sure you insert in random order, otherwise it will degenerate into a linked list.
Sub sets - You can find sub sets between two values very easily using Binary Trees and recursion.

And much more! Comment any questions you have, and I'll try to get back to you. I am however hesitant to write out code because it could be used on closed project homework assignments.