Binary Search Trees
Now that we know about Binary Search, let’s dig into the Binary Search tree!
Binary Tree
First, let’s define a simple Binary Tree. In programming, a tree is a data structure composed of nodes. A node contains a value and can have a reference to children nodes. A Binary Tree has a maximum of 2 children, a left and a right.
A node that does not have any successor is called a leaf.
The first node of a tree is called the root.
It looks like this:
Binary Search Tree
A Binary Search Tree (BST) is a particular type of Binary Tree, that makes searching for a value an easy task.
It has the following additional properties:
- The left subtree of a node contains only values inferior to the node’s value
- The right subtree of a node contains only values superior to the node’s value
- There can be no duplicate value
- The left and right subtrees of a node also respect the previous conditions, so they both are Binary Search Tree
In the previous example, we can see clearly that this is not a BST. Indeed, with a root of 1, the only children possible on the left side are 0 and negative numbers. But here, 2 is the left child. So not a BST.
This one, however, is a BST:
To represent a BST in code, we can follow this design:
Height of a tree
Before continuing further, we’ll have to define the concept of height. First, we need to define the height of a node.
In any tree — not just binary or BST- the height of a node is the length of the longest path between that node and a leaf. A node that does not have children has a height of 0.
This then leads to the definition of the height of a tree:
The height of a tree is the height of its root node.
For instance, in the previous example, the height of the tree is 2
An algorithm to compute a tree height would look like this:
Searching in a BST
Searching if a value exists in a BST is easy, thanks to the previous properties.
Using the previous tree, let’s search if 8 is in the tree. For this, we start at the root:
The root is 10. 8 is smaller than 10. We know that on a BST, all nodes on the left side are smaller than the current node. So we move to the left child.
The left child is 5. 8 is bigger than 5. We know that on a BST, all nodes on the right sides are bigger than the current node, so we move to the right child.
The right child is 8. Which is the value we are looking for. So we return true.
If we were to describe it as an algorithm, it would take the following steps:
- Choose root as the current node
- While the current node is not null, repeat the following:
- - If the current node’s value is equal to the search value, return true
- - if the searched value is less than the current node value, set the current node to its left child
- - else set the current node to its right child
- If the loop ends, then return false, the value is not present
Which can translate into this pseudo code:
Insertion
For inserting a value in a BST, the basic idea is pretty much the same as with a search.
Starting from the root, we navigate from left to right children until we cannot go down anymore, and add the value where we end up.
Complexity
In the previous search example, we made only 3 comparisons to find the correct value. In the worst-case scenario, you’ll have to make H comparisons, where H is the tree's height. The same goes for inserting as it follows pretty much the same pattern
Depending on how the tree is looking, this can become a problem. For instance, if you have all numbers from 1 to 100, in order, with 1 as the root, you can quickly see that the tree is only a single branch going to the right, and the tree has a height of 99. So for n nodes, we might have a case where we have to perform n comparisons, hence a complexity of O(n).
But there is a way to counteract that and go back to a search of O(log(n)), and it is to use Balanced trees. This is what we’ll see in another article.