were originally designed to work on direct access secondary storage (Magnetic
disk). In a computer system the memory capacity is consists of two parts: primary
memory (uses memory chips) and secondary storage (based on magnetic disk). When
dealing with large datasets, it is often impossible to maintain a whole dataset
in the primary memory. Instead, a small part of the dataset is maintained in
the primary memory and the remaining data is read from a secondary storage 1.
is a data structure used for sorting huge datasets for fast retrieval from a
disk. A typical B-tree has a single-digit height, even with millions of entries
2. Which means
that few disk accesses are needed to find the place where data is stored in the
tree. At most two accesses are required to search for any node. “From a
practical point of view, B-trees, therefore, guarantee an access time of less
than 10 ms even for extremely large datasets.” —Dr. Rudolf Bayer,
inventor of the B-tree
save time by using nodes with many branches (children). Unlike binary tree, in
which each node has only two children. When a node has many children a record
can be found easily by passing through fewer nodes than if there are two children
per node. See the example in Figure 1.
in Figure 1 shows that the red node in a binary tree has depth of three while
the corresponding node in the B-tree has depth of one. Clearly, B-trees locates
desired records faster, assuming that all system parameters are identical. The
difference in depth between binary tree and B-tree is greater in a practical
database than in the example in Figure 1. Real world B-trees height depends on
the number of records in the database.
tradeoff is that the decision process at each node in B-tree is more
complicated compared with a binary tree. A complex program is required to
execute B-tree’s operations, but this program runs fast because it is stored in
B-tree of order m (the maximum number of children for each node) is a
self-balancing search tree.
3.1 Node Structure
a B-tree node’s structure in Figure 2.
Figure 2: Node structure
K : is a search key. Search operation depends on the key value.
According to its value we can find the record that we are searching for. All
keys in a node are in ascending order: k1 < k2 ....... < kn. - DP : is a data pointer, which can take us to the whole record depending on the search key value that is associated with. - P : is a pointer, which might be connected to a sub-tree. p1 is connected to a sub-tree that contains values which are all less that k1 value. p2 is connected to another sub-tree that contains values which are all bigger than k1 value and less than k2 value. 3.2 B-tree Properties A B-tree satisfies the following properties: · Every non-leaf node has at most m children. · Every non-leaf node (except root) has at least m?2 children. · A non-leaf node with m children contains m-1 keys. · The root has at least two children if it is not a leaf node. · All leaves appear in the same level. Figure 3 shows a B-tree of order four. Each node contains up to three keys, and internal nodes have up to four children 3. Figure 3: A B-tree of order four. The height of an n-key b-tree T of height h with a minimum degree t greater than or equal to 2, , For n The worst case height is O(log n). Since the branches of a B-tree can be large compared to other balanced trees, the base of the algorithm tends to be large. But the number of visited nodes during a search tends to be smaller than required by other trees. Although this is does not affect the worst case height, B-tree tends to has smaller heights than other trees 4. 4. Basic Operations Conventions: · The root of a B-tree is in the main memory (it is not required to perform Disk-Read operation on the root). · Any node passed as a parameter must have had a Disk-Read operation performed on them. · Top down algorithm, starts at the root of the tree. 4.1 Searching on a B-tree Searching a B-tree is similar to searching a binary tree but instead of making two way branching decision at a node, must make multi-way branching decision depending on the number of a node's children. To search for a key in a B-tree, a linear search must be performed on the values of a node. After finding the value which is greater than or equal to the desired value, the child pointer to the immediate left of that value is followed. If all values are less than the desired value, the rightmost child pointer is followed. The search is terminated as soon as the desired node is found 5. 4.1.1 B-TREE-SEARCH pseudo-code Inputs: is a pointer to the root node of a sub-tree. is a key to be searched in the sub-tree. B-TREE-SEARCH(x, k) 1 i 1 2 while i nx and k keyix 3 do i i + 1 4 if i nx and k = keyix 5 then return (x, i) 6 if leaf x 7 then return NIL 8 else DISK-READ(cix) 9 return B-TREE-SEARCH(cix, k) Lines 1-3 : Using a linear search procedure, find the smallest i such that k ? keyix, or else they set i to nx + 1. Lines 4-5 : Check if the key is discovered, then return it. Lines 6-9 : Either end the search unsuccessfully if x is a leaf or recurs to search the appropriate sub-tree of x, after performing necessary DISK-READ operation on that child 6. 4.1.2 B-TREE-SEARCH Running Time The number of disk pages accessed by B-TREE-SEARCH is ?(h) = (logt n), where h is the height of the B-tree and n is the number of keys in the B-tree. The while loop time in lines 2-3 for each node is O(t). The running time is O(th) = O(t logt n) 6. 4.1.3 B-TREE-SEARCH Examples Consider the B-tree in Figure 2. · Example 1 : Search for key = 7. By applying B-TREE-SEARCH: Starts at the root (17), is key = 17? Is key > 17? No, go to left sub-tree.
Next node (4,9), is key = 4? Is key > 4? Yes, move to the next key.
Is key = 9? Is key > 9? No, go to sub-tree left of 9.
Next node (7,8), is key = 7? Yes, found.
Example 2 : Search for key = 11.
By applying B-TREE-SEARCH:
Starts at the root (17), is key = 17? Is
key > 17? No, go to left sub-tree.
Next node (4,9), is key = 4? Is key > 4? Yes, move to the next key.
Is key = 9? Is key > 9? Yes, go to the right sub-tree (there is no
more keys in the node to check).
Next node (12), is key = 12? Is key >12? No, the left sub-tree is a
leaf. Which means that 11 is not in the tree.
4.2 Creating an empty B-tree
B-TREE-CREATE operation builds an empty B-tree by first creating a new empty root
node and then call B-TREE-INSERT to
add new keys. B-TREE-CREATE and
B-TREE-INSERT operations are
both using ALLOCATE-NODE which
allocates one disk page to be used as a new node in O(1) time. When a node is
created by ALLOCATE-NODE it
does not require DISK-READ,
because there is no useful information stored on the disk for that node yet.
1 x ALLOCATE-NODE()
2 leafx TRUE
3 nx 0
B-TREE-CREATE Running Time
time is O(1).
4.3 Splitting a node in a B-tree
a node becomes full it is necessary to perform split operation. B-TREE-SPLIT-CHILD moves the
median key of node y up into its parent x (it has to be non-full node), where y is
the ith child of x. All keys in y right
of the median key are moved to the new node. The keys left of the median key
remain in the original node y. The
new node z becomes the right child of the median key that was moved to
its parent x. The original
node y becomes the left child of the median
key that was moved to its parent x 5.
Inputs: is a non-full
is an index.
the ith child of x and is the node being split.
Node y originally has 2t – 1 children but is
reduced to t – 1 children by this operation.
1 z ALLOCATE-NODE()
2 leafz leafy
3 nz t – 1
4 for j 1 to t – 1
5 do keyjz
6 if not leaf y
7 then for j 1 to t
8 do cjz
9 ny t – 1
10 for j nx + 1 downto i + 1
11 do cj+1x
13 for j nx downto i
14 do keyj+1x
16 nx nx + 1
Lines 1-8 : Create a new node z and give it the larger t –
1 keys and corresponding t children of y.
Line 9 : Adjusts the key count for y.
Lines 10-16 : Insert z as a child of x. To separate y from z,
move the median key from y up to x, and adjust the
key count for x.
Lines 17-19 : write out all modified disk page 6.
B-TREE-SPLIT-CHILD Running Time
The running time is ?(t).
Insert key (j) to the following B-tree of order 5 in Figure 4.
j > F,
Insert key (j) to the right sub-tree. See Figure 5.
Figure 5: The B-tree after
inserting ‘j’ ( B-Tree-Split-Child Example )
It is a full node, By applying B-TREE-SPLIT-CHILD:
The median key which is (j) moves up to the parent node because the
parent not is non-full. All keys on the left of the median key (G,H) remain on
the original node. All keys on the right of the median key (K,M) move to the
new node. See Figure 6.
Figure 6: The B-tree after
Splitting a node ( B-Tree-Split-Child Example )
into a B-tree
Inserting a key into a B-tree T of height h is done
in a single pass down the tree. The B-TREE-INSERT procedure uses
B-TREE-SPLIT-CHILD to guarantee
that the recursion never occurs to a full node 6.
Input: is a key to
1 r rootT
2 if nr
= 2t – 1
3 then s ALLOCATE-NODE()
Lines 1: Root node r is created.
Lines 2-9: Handle the case where the root node r is full, the root is split and a new node s (having two children)
becomes the root. The height of the B-tree is increased at the top. The
procedure finishes by calling B-TREE-INSERT-NONFULL to
perform the insertion of key k in the tree rooted at the non-full root
Lines 10: Recurses and
guarantees that the node to which it recurs is not full by calling B-TREE-SPLIT-CHILD.
The recursive procedure B-TREE-INSERT-NONFULL
inserts key k into node x, assumed to be non-full. The
operation of B-TREE-INSERT and the recursive operation of B-TREE-INSERT-NONFULL guarantee
that this assumption is true 6.
1 i nx
2 if leafx
3 then while i
1 and k < keyix 4 do keyi+1 x keyix 5 i i - 1 6 keyi+1x k 7 nx nx + 1 8 DISK-WRITE(x) 9 else while i 1 and k < keyix 10 do i i - 1 11 i i + 1 12 DISK-READ(cix) 13 if ncix = 2t - 1 14 then B-TREE-SPLIT-CHILD(x,i,cix) 15 if k > keyix
i i + 1
Lines 3-8: Handle the case where x is a leaf node by inserting key k
into node x. If x is not a leaf node, then insert k into
the appropriate leaf node in the sub-tree rooted at internal node x,
in this case go to lines 9-11.
Lines 9-11: Determine the child of node x to which the
Lines 13: Detects whether the recursion would descend to a full child, in
this case go to line 14.
Lines 14: Uses B-TREE-SPLIT-CHILD to
split that child into two non-full children.
Lines 15-16: Determines which one of the two children is the correct is one to
Line 17: Then recurses
to insert k into the appropriate sub-tree 6.
B-TREE-INSERT Running Time
The running time is O(t1ogt n).
Insert the following keys (70, 90, 25) to the B-tree
of order 6 in Figure 7.
Figure 7: a B-tree of order
6 ( B-Tree-Insert Example )
By applying B-TREE-INSERT :
First compare the key (70) with all key
values, it is bigger so it is inserted after key (50).
Then compare the key (90) with all key values, it is
bigger so it is inserted after key (70). See Figure 8.
Figure 8: The B-tree after
inserting ‘70,90’ ( B-Tree-Insert Example )
The node is full, to insert the key (25)
a split procedure must be applied first. Take the median key (50) and promote
it as the new root node and perform split operation on the original node. All
keys on the left of the median key which are (10) and (30) stay on the original
node and this node become the left child of the median key (50) which is the
new root node. All keys on the right of the median key which are (70) and (90)
move to new node and become the right child of the median key (50) which is in the
new root node.
Now insert the key (25) by first
comparing it with the root node key (50) which is bigger, so go to the left
sub-tree. The left sub-tree has two keys (10,30), first compare the key (25)
with the first key (30) which is bigger so the key (25) is inserted before the
key (30). See Figure 9.
Figure 9: The B-tree after
inserting ’25’ ( B-Tree-Insert Example )
4.5 Deleting from a B-tree
There are three possible cases for
deletion in a B-tree. Let k be the key to be deleted and x is the
node containing the key k 7.
The three cases are:
If the key
is a leaf node then simply remove the key to be deleted.
Key k is in node x and x is a
leaf, simply delete k from x.
Figure 10: a B-tree of order
4 ( B-Tree-Delete-key Case-? Example )
Delete the key (6) from the B-tree of
order 4 in Figure 10.
Figure 11: The B-tree after deleting ‘6’ (
B-Tree-Delete-key Case-? Example )
See the B-tree after deleting key (6) in Figure 11.
If key k is in node x and x is
an internal node, there are three cases to consider:
If the child y that precedes k
in node x has at least t keys (more than the minimum), then find
the predecessor key k’ in the sub-tree rooted at y. Recursively
delete k’ and replace k with k’ in x 7.
If the child node z that follows k
in node x has at least t keys, find the successor k’ and
delete then replace as in Case-??-a. Note that finding k’ and deleting it can be
performed in a single downward pass 7.
· Case-??-b Example
Figure 12: a B-tree of order
4 ( B-Tree-Delete-key Case-?|-b Example )
key (13) from the B-tree of order 4 in Figure 12.
B-tree after deleting key (13) in Figure 13.
Figure 13: The B-tree after deleting ’13’ (
B-Tree-Delete-key Case-?|-b Example )
If both y and z have only t?1
(minimum number) keys, merge k and all of z into y, so
that both k and the pointer to z are removed from x. y
now contains 2t ? 1 keys, and subsequently k is deleted 7.
· Case-??-c Example
Figure 14: a B-tree of order
5 ( B-Tree-Delete-key Case-?|-c Example )
key (7) from the B-tree of order 5 in Figure 14.
B-tree after deleting key (7) in Figure 15.
Figure 15: The B-tree after
deleting ‘7’ ( B-Tree-Delete-key Case-?|-c Example )
If key k is not in an internal
node x, determine the root of the appropriate sub-tree that must contain
k. If the root has only t ? 1 keys, execute either of the
following two cases to descend to a node containing at least t keys. Then
recurs to the appropriate child of x 7.
If the root has only t?1 keys and has a
sibling with t keys, give the root an extra key by moving a key from node x
to the root. Moving a key from the roots immediate left or right sibling up
into node x, and moving the appropriate child from the sibling to node x
· Case-???-a Example
Figure 16: a B-tree of order 6 ( B-Tree-Delete-key
Case-?|?-a Example )
key (2) from the B-tree of order 6 in Figure 16.
Figure 17: The B-tree after
deleting ‘2’ ( B-Tree-Delete-key Case-?|?-a Example )
B-tree after deleting key (2) in Figure 17.
If the root and all of its siblings have t?1
keys, merge the root with one sibling. Then move a key down from node x
into the new merged node to become the median key for that node.
· Case-???-b Example
Figure 18: a B-tree of order
6 ( B-Tree-Delete-key Case-?|?-b Example )
Delete the key (4) from the B-tree of
order 6 in Figure 18.
Figure 19: The B-tree after
deleting ‘4’ ( B-Tree-Delete-key Case-?|?-b Example )
B-tree after deleting key (2) in Figure 19 and Figure 20.
Figure 20: The B-tree after
deleting ‘4’ ( B-Tree-Delete-key Case-?|?-b Example ) – 2
1 if not leafx then
2 y ? Preceding-Child(x)
3 z ? Successor-Child(x)
4 if ny > t
? 1 then
5 k’ ?
6 Move-Key(k’, y, x)
7 Move-Key(k, x, z)
9 else if nz
> t ? 1 then
10 k’ ?
11 Move-Key(k’, z, x)
12 Move-Key(k, x, y)
15 Move-Key(k, x, y)
16 Merge-Nodes(y, z)
18 else (leaf node)
19 y ? Preceding-Child(x)
20 z ? Successor-Child(x)
21 w ? root(x)
22 v ? RootKey(x)
23 if nx > t
? 1 then Remove-Key(k, x)
24 else if ny
> t ? 1 then
25 k’ ?
26 Move-Key(k’, y,w)
27 k’ ?
28 Move-Key(k’,w, x)
30 else if nw > t ? 1 then
31 k’ ? Find-Successor-Key(w,
32 Move-Key(k’, z,w)
33 k’ ?
34 Move-Key(k’,w, x)
37 s ? Find-Sibling(w)
38 w’ ? root(w)
39 if nw’ = t
? 1 then
44 Move-Key(v,w, x)
Preceding-Child(x): Returns the left child of key x.
Move-Key(k, x, z): Moves key k from node x to
Merge-Nodes(y, z): Merges the keys of nodes y and z
into a new node.
Find-Predecessor-Key(k, x): Returns the key preceding key k
in the child of node x.
Remove-Key(k, x): Deletes key k from node x.
x must be a leaf node.
memory, increasing the branching factor will reduce the height of the B-tree and
will make each level of the B-tree hold more nodes. Fewer levels means fewer
disk access (if any) and better performance 8.
But if the branching
factor in a B-tree is so big, it means that the first view levels cannot be
held in the memory. Which means accessing each level can cause disk swapping and
poor performance 8.
There are two major concerns:
memory: a B-tree parameters are limited by the available memory 8. All search starts
from the top of a B-tree, so the first level of nodes need to be in memory. If the
number of nodes fits in main memory, it means that there is no need for disk
storage. But scaling physical memory is an expensive proposition and infeasible
beyond a point 8.
solution to this problem is sharding. Sharding a B-tree storage enables a
B-tree to hold a limited number of keys. Sharding the database along with its
b-tree indices into different physical machine is one of the most effective
scaling solutions for growing data volumes