Sunday, November 25, 2012

Binary Trees!

Hello everyone. I know neither John or myself haven't posted in some time and I do apologize, but I am currently taking two advanced courses on top of work (bad decision), and it is catching up with my free time. I can't believe I'm already nearing the end of the 4th week of the class, though. That's ridiculous. It feels like it has been one week at most. Still, in the short amount of time I have learned a lot. Since I don't have any new game programming developments, this is going to be a fairly technical post in computer science for those up to the challenge. If not, and you're only here to see game updates, this entry likely won't interest you. So if you're interested, press on and click the jump, if not, I understand!



The thing to realize is that game development is still a subset of computer science, and some of the topics still mesh together. This is one of those concepts, and one that will help tremendously in game design as we begin to branch out into data structures that are primarily meant for space partitioning as opposed to just holding and sorting data.

This topic brings me to the ever lovely binary tree! This will be the first part in a few part series about binary trees, extending to binary search trees and height balanced trees, also known as AVL trees. Finally, I'll end with the knapsack problem that I am doing in class. I am choosing to do this because I am learning this in more detail as we speak, so I'll be giving a first hand glimpse at what I learn and how I learn it. Before we really get started though, I am assuming a familiarity with basic data structures, including the linked list. As a preface, a linked list is simply a structure of data where a node in the list literally links to the other nodes. So if we have 5 elements, we could label it like so:


The basic concept is that, for each node (A box represented in the image) has allocated space in computer memory that stores the location of where the next node (another box!) is in memory. So the terribly drawn illustration above shows 5 nodes linked together. The difference between this and an array for example is that array is fixed length and contiguous in memory. There's no need to hold a link because the memory address is literally bytes away  by a fixed amount. 32 bit integers, for example, are 4 bytes. So the next integer is 4 bytes away. For a challenge though, can you tell me how Strings are represented? Since a String is typically not the same size for every element, how would an array be handled? I'll come back to this in another entry if there's enough interest. For now, we go back to binary trees.

A binary tree extends the Linked List concept such that it has not one, but two links to data elements. This is fascinating because how could adding this extra link help us in any way? Well, the truth is that without any semblance of balance or order, it doesn't. An arbitrary binary tree doesn't tell us how to actively find an element in a sequence that we're looking for. For example, take a look at the tree below:



Consider that the text values within the nodes are their value. So this is a sequence of integers, but if we wanted to find 10, we'd still have to check all values, or n values where n is the number of elements, because we don't know where to check. In this case, that's 10 elements. It's not much at that level, but for large data sets in the thousands or millions, such as data warehouses, it's a bit much to expect good performance at a linear rate. So this doesn't really help us because it has the same search time as a linked list. In a computer, if we wanted to find a certain value in a linked list, we'd have to search all elements until we find it. On average, this is less than n, but for our purposes we're considering the worst case in searching.

So how do we make binary trees better? We add some order, and call them by a [not so] new name, the binary search tree! Take a look at the revised tree with the same numbers. Note that I rearranged them to fit a specific insertion order, but the insertion order is arbitrary for this example, and I'll explain that more in a moment.


So tell me if you notice anything odd about this tree. What I did was that for every node, the left child is less than the parent node, and the right child is greater than. This assumes that duplicates are not allowed. A little explanation might be needed. A parent is the node that is "linking" to it, a node can have only 1 parent, and in a binary tree it can have at most 2 children, so for example the parent of 7 is 18, and the children of 18 are 7, which is the left child, and 54 which is the right child. 

This is useful because we can cut the amount of items we search roughly in half each time. So if we wanted to find the node with the value 2, we start at 18, and we now know to traverse the left branch because 2 is less than 18. Arriving at 7, we again go left, and we find 2 here in only 3 comparison checks (including 2 itself). 

As for insertion, and deletion, in binary search trees, I am linking to a PowerPoint as this can be somewhat difficult to explain by text and without images. This should also help with search, because a search must first take place before inserting or deleting. The PowerPoint should show the process relatively clearly, and it can be found by clicking --> This Link <--

Note one other thing, though. Notice how the left branch of 18 is strongly weighted as compared to the right branch. These are also called the left sub tree and right sub trees, since any tree can have sub trees at its branches. For example, removing 18 and 54, the node with 7 could be its own tree. This is called a "root". So in this tree, 18 is the root node, but 7 is also the root node of the node with 18's left sub tree. Don't worry if that doesn't make complete sense. For now, realize that the left of 18 has many more nodes than the right of it.

To discuss this, we'll need a few more terms. The height of a tree, is the length of the path from the root node to its longest leaf. Now, a leaf is a node that has no left or right children. That is, it is childless and that part of the tree terminates with its leaf node. In the above tree, the height of the tree at the node with the value 18, is 5 (The height is actually 4 if you come from a discrete math background, but we'll use the Computer Science interpretation here). A tree with only one node is of height 1, so by itself, the node with 16 has a height of 1. The node with 12 has a height of 2, and so on. So what we say here, is that the left sub tree of 18 has a height of 4, and the right sub tree only has a height of 1. That is, it is a leaf of the tree. This is an unbalanced tree. The reason this is harmful is because a binary tree could become linear. What if we had an insertion list that was successively bigger than the last item? We'd end up with a tree that is always attaching to the right branch of the next node. We'll look at how to solve that in the next post or the post after that in case I want to clarify any of the concepts discussed here. 

For now, this is a lot of information at a really fast pace. My intention was to cover very high level concepts, because sometimes looking above all the details can help things click for anyone struggling with the theory. The post on AVL trees will dive into some theory

No comments:

Post a Comment