Trees
Binary Search Tree
- To build a BST with a given array input, the best time complexity would take O(NlogN) for N elements, and worst case would be O(N^2).
- The worst case is when a the array is in ascending order or descending order.
- This is because for each element, since the BST is skewed, it would take N comparisons to find the position for the corresponding element.
- The expected time to build a BST is asymptotically same as the running time of quicksort.
- Although BST sort performs the same comparisons as quicksort, the overall order might differ.
- The best case of average time complexity could be derived by looking at the average node depth, which is equal to the number of comparisons made for each element.
- Assuming all input permutations are equally likely, we can calculate the average node depth like the figure below.
- However since this does not guarantee that a randomly built BST has a height of O(lgN).
- In order to prove that the height of random binary search tree has a height of O(lgN) in average, we should prove very formally in the following procedure.
1) Prove Jensen's inequality
2) Analyze the exponential hieght of randomly built BST on n nodes.
3) Prove mathematical induction to derive E[Xn] = O(lgN)
Formal proof of average height of random BST
1) Prove Jensen's inequality
2) Analyze the exponential hieght of randomly built BST on n nodes.
Let's define Xn as the height of a randomly built binary search tree on n nodes.
Let's define Yn=2^(Xn) which is the BST's exponential height.
If root of the tree has rank k, then Xn = 1 + max{Xk-1, Xn-k}
Also, we could derive, Yn = 2 * max{Yk-1, Yn-k}
Let's define Znk = 1 if root has rank k, 0 otherwise.
Since Pr{Znk = 1} = E[Znk] = 1/n assuming random BST.
Then,
(Znk works like a onehot vector in Deep Learning, which is a Dummy indicator variable in math)
Using the recurrence formula we derived, we could analyze the running time using Mathematical Induction.
Taking log on each side, E[Xn] <= 3lgN + O(1)
Therefore, it is proved that a random BST has a average height of lgN.
Red Black Tree
- As mentioned above, a random BST does not guarantee a height of lgN even in worst case.
- A balanced search tree is a search-tree data structure that has this guarantee.
- For example there are AVL trees, 2-3 trees, 2-3-4 trees, B-trees, and Red-Black Trees.
- A red black tree is a BST that maintains the Red-black properties.
- If the Red-Black property is satisfied, we could have a BST with height at most lgN.
- This could be proved using Induction.
- Also, SEARCH, MIN, MAX, SUCCESSOR, and PREDECESSOR all run in O(lgN) time.
- When Inserting and deleting items from the Red Black Tree, we should modify the overall tree itself, meaning that colors of nodes could change and tree structure could change via rotation.
- A rotation can be performed in O(1) time.
- Insert take O(lgN) time and delete also takes same symptotic running time so therefore O(lgN) time.