database Index

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of slower writes and increased storage space.

Indexes can be created using one or more columns of a database table.

In a relational database, an index is a copy of one part of a table. Some databases extend the power of indexing by allowing indexes to be created on functions or expressions.

Index architecture

Index architectures can be classified clustered or unclustered.

In any index, whether clustered or nonclustered, the leaf level contains every key value (or combination of values, for a composite index), in key sequence. The biggest difference between clustered and nonclustered indexes is what else is in the leaf level.

Clustered Indexes

Levels of a clustered index

The leaf level of a clustered index contains the data pages, not just the index keys. So the answer to the question "What else is in the leaf level of a clustered index besides the key value?" is "Everything else"that is, all the columns of every row are in the leaf level. Another way to say this is that the data itself is part of the clustered index. A clustered index keeps the data in a table ordered around the key. The data pages in the table are kept in a doubly linked list called a page chain. (Note that pages in a heap are not linked together.) The order of pages in the page chain, and the order of rows on the data pages, is the order of the index key or keys. Deciding which key to cluster on is an important performance consideration. When the index is traversed to the leaf level, the data itself has been retrieved, not simply pointed to.

Most tables should have a clustered index. If your table will have only one index, it generally should be clustered. Many documents describing SQL Server indexes will tell you that the clustered index physically stores the data in sorted order. This can be misleading if you think of physical storage as the disk itself. If a clustered index had to keep the data on the actual disk in a particular order, it could be prohibitively expensive to make changes. If a page got too full and had to be split in two, all the data on all the succeeding pages would have to be moved down. Sorted order in a clustered index simply means that the data page chain is logically in order. If SQL Server follows the page chain, it can access each row in clustered index key order, but new pages can be added simply by adjusting the links in the page chain. I'll tell you more about page splitting and moving rows later in the chapter when I discuss data modification.

Nonclustered Indexes

Levels of a nonclustered index

In a nonclustered index, the leaf level does not contain all the data. In addition to the key values, each index row in the leaf level (the lowest level of the tree) contains a bookmark that tells SQL Server where to find the data row corresponding to the key in the index. A bookmark can take one of two forms. If the table has a clustered index, the bookmark is the clustered index key for the corresponding data row. If the table is a heap (in other words, it has no clustered index), the bookmark is a row identifier (RID), which is an actual row locator in the form File#:Page#:Slot#.

The presence or absence of a nonclustered index doesn't affect how the data pages are organized, so you're not restricted to having only one nonclustered index per table, as is the case with clustered indexes. Each table can include as many as 249 nonclustered indexes, but you'll usually want to have far fewer than that.