SQL Server is the engine that allows you to store, modify, and retrieve data. Working with databases is a very different prospect from working with application programs, which tend to perform one (or a few) very small tasks sequentially. Requests to a database server often incur massive retrieval and sort operations under the covers, even when the outcome is a very small result set.
In addition to good database design and effective query writing, one of the most effective and essential elements to an efficient database system is indexing.
A database index is very much like the index in a book: the book index has an alphabetized list of topics with page numbers to the location of the data. A database index has an ordered list of values (made up of one or more table columns), with pointers to the row in which the value and its corresponding data reside.
Without indexes, any query or data modification causes the SQL engine to search the referenced tables from the top down. This is akin to searching for a piece of information in a book by reading it from page 1. A single well-placed index can shorten your query time from dozens of minutes to under a second.
There are two kinds of indexes in SQL Server: clustered and nonclustered.
A table can only have one clustered index, because the clustered index sorts the rows in the table itself. Every table in the database should have a well-chosen clustered index to aid data retrieval and modification.
Any column or group of columns can make up a clustered index, but ideally a clustered index should be:
- Small (of a small data type) –The clustered index key is the pointer contained in each nonclustered index. If you therefore have a clustered index key that is large – for example, a 16 bit UNIQUEIDENTIFIER – indexes will take up much more space than if the clustered index key were smaller (e.g., a 4 bit INT).
- Unique or highly selective – The more selective an index, the more efficient.
- Ever-increasing – The clustered index orders the rows in the table. If the clustered index key is ever increasing, new rows are added to the end of the table. Otherwise, new rows are inserted in the middle of the table, and the database engine must reorganize the data on disk more often.
- Static – A frequently changing clustered index key will cause rows to be reordered within the table, causing unnecessary overhead.
Note: The clustered index is not necessarily the same as the primary key. While they do tend to go hand in hand, the purpose of the primary key is to enforce uniqueness within the table. For more on primary keys, see the Petri IT Knowledgebase article How to Design and Build SQL Server Tables – Part 2 of 2, and the Microsoft SQL Server Books Online article PRIMARY KEY Constraints.
A nonclustered index is a separate physical structure from the underlying table. It contains the values for the included columns – called index keys – along with pointers back to the corresponding table row. On a table that has a clustered index, each nonclustered index’s pointer is the clustered index key. Note that a clustered index is ordered, but it does not alter the order of the rows in the table.
There are few hard and fast rules for indexing. You have to see what works for your database over time. There are whole books dedicated to indexing strategies. Here are a few general indexing guidelines:
- Each table should have a clustered index that is (ideally) small, selective, ever increasing, and static. (Note that a table without a clustered index is called a heap.)
- Implement nonclustered indexes on foreign key relationships – in other words, on columns that are likely to be used in JOINs.
- Implement nonclustered indexes on columns that are frequently used in WHERE clauses.
- Do not implement single-column indexes on every column in a table. This will take up space unnecessarily, and cause high overhead on INSERTs and UPDATEs.
- In multi-column indexes, list the most selective (nearest to unique) first in the column list. For example, when indexing an employee table for a query on social security number (SSN) and last name (lastName), your index declaration should be
CREATE NONCLUSTERED INDEX ix_Employee_SSN
ON dbo.Employee (SSN, lastName);
- For the most often-used queries, consider a covering nonclustered index. (A covering index is one that contains all the columns requested from a table.)