Inserting a value into an existing B-Tree so that it remains a valid B-Tree turns out to be amazingly easy for so complex a data structure. If there is still room in the page then insertion really is trivial. The only problem arises if the page is full, i.e. already has 2n items. In this case the full page is split into two new pages at the same level each containing n items and one of the items is inserted into the page above - see Figure 8.

Figure 8: Inserting an item into a B-Tree

Of course there is always the possibility that the item that you have to insert in the page above will need that page to be split and so on, propagating splits perhaps even as far up the tree as the root page.

Notice that the splitting operation propagating back to the root is the only way that a B-Tree ever gets any deeper - which is a weird way to grow a tree. You can work out a similar operation for deleting elements from a B-Tree.

The advantages of the B-Tree form of index are reasonably obvious:

You only need access at most as many pages as there are levels in the tree.

As long as pages are the same size as a disk sector, reading or writing a page only involves a single disk access.

At worst only 50% of the disk space allocated to the index is empty.

Updating the index requires each page involved in the update to be accessed and manipulated only once. A less obvious advantage is the local nature of the update.

In a multi-user system or LAN only the pages involved in the modification have to be locked making it possible to share an index efficiently.

There are subtle ways of improving the performance of a B-Tree by redistributing items between pages to achieve a better balance but this is icing on the tree.

Many database packages proclaim the fact that they are better because they use B-Trees - now you know why. If you need to make use of B-Trees yourself then you can program everything from scratch but there are plenty of B-Tree subroutine libraries that will save you hours of coding and now you understand why you need one and how they work.

Credits

I have to admit that my account of B-Trees is based on the one given by Niklaus Wirth in his classic book Algorithms+Data Structure=Programs. If you need a more complete but more abstract approach complete with code fragments then try to get hold of a copy. Alternatively turn to The Algorithm Design Manualor one of other books in the side panel.

Glossary

ancestor

a node above

binary tree

each node has at most two descendants

descendant

a node below

degree of node

the number of direct descendants

degree of tree

the maximum degree of any node in the tree

depth of tree

the maximum level of any node

interior node

any node that isn't a terminal node

internal path

length the sum of all the path lengths

leaf

a final node

level

the root is at level one, its direct descendants

are at level 2 and so on

ordered tree

one in which the order of the descendants from each node is important

path length of node

the distance of the node from the root

root

the first node

sub-tree

a complete set of nodes connected to any given node

With 2012 being designated Alan Turing Year, you may find you are asked to explain just what a Turing Machine is and why it is so important. Here is an illustrated guide.

Part II of our look at data takes us into more sophisticated structures that are fundamental to computing - stacks, queues, deques and trees. If you don't know about these four then you are goin [ ... ]