Overview
When it comes to retrieving data from your tables, indexes play a fundamental role in finding that data you require and finding it fast. Pretty much every table you create will have an index of one type or another; whether it is a regular index, a primary key or a foreign key.
Primary Keys
When creating your tables, it is often the case that the table will have a unique identifier of one sort or another to allow you to uniquely identify a specific row within a table. The column can either be a numeric value like an auto incrementing integer, a unique identifier (a GUID) or even a character based value like a CHAR(#). You have added this column so that you can access all the data in a specific row and, as it stands, you can just use the WHERE clause to have SQL Server search through the rows to find the value request. However, you can greatly improve the performance of this type of request by adding a Primary Key to the table that indexes this column. Things you should be aware of when creating a primary key:
- Unique Values – A primary key can only be applied to a column that will contain unique values. If you try to insert a value into a primary key column that has already been used, SQL will raise a Primary Key Constraint exception.
- Not Just One Column – Every now and then, the need arises where your unique identifier for the row is actually a combination of two or more columns. A primary key can be set to encompass more than one column. A classic example of this is when you have a Many-to-Many relationship table (also known as a Cross Reference Table) which does nothing more than contain the ids from two different tables (for instance you have a table containing users and one containing departments – the M2M table would hold a user id and a department id so that they were linked. A user can be in many departments and a department can have many users – many-to-many).
- Column Type – The best column type for using as a primary key is always one of a numeric base. Numbers index far better than characters due to the way they are stored.
- Not Null – A primary key will not allow you to insert nulls.
Foreign Keys
A foreign key is used to create an index bridge between two tables (also known as a relationship but has nothing to do with relational data in the true sense). When a foreign key is available, the server is able to join two tables together much faster than without. Here are some things you should know about foreign keys:
- Table Relationships – If a table has relationship with another table (for instance a table containing contacts would have a relationship with a table containing the company the contact works for) then a foreign key should be created so that when performing a join between the two tables, the server has an index to help it quickly find the rows and join them up.
- Constraints – It is possible to add constraints to a foreign key that stops orphaned rows (contacts that don’t link to a company). You can state that a company cannot be deleted if there are contacts linked to it or if you delete a company it automatically deletes any contacts that are linked to it (also known as Cascade Deleting). Both of these should be used with care (especially the latter). Any stored procedures that you create should perform checks to handle this beforehand as it is much more elegant to handle the issue rather than have SQL Server raise a Foreign Key Constraint error.
- Value Must Exist – When you insert a value into a column that links to the id field of another table (say you try to insert a contact that links to a company that does not exist), SQL Server will raise a Foreign Key Constraint error. The parent record must exist.
Indexes
Separate indexes can be created on columns when needed. For example if your solution is likely to perform a lot of searches on a particular column. Having an index on this column will increase the performance. Be careful though with the amount of indexes that you create. Some points to remember:
- Performance Hit – When you create an index on a column, you need to remember that every time a row is created, updated or deleted, the index has to be altered. As a result, you will experience a small performance hit while the index is being updated. Often this can be a sacrifice you are willing to make compared to the performance gain you get from having the index. However, the more indexes you have, the bigger the hit. When you no longer need an index you should drop it.
- Index Rebuild Job – If you have large indexes, you should look to having a scheduled maintenance job that rebuilds all your indexes. Over time, an index can become fragmented and it is not feasible for the server to rebuild the index every time it updates. Rebuilding the index every now and then can aid in performance. When an index is being rebuilt it can put a lot of strain on the database so it is worth performing this task at a time when the database load is minimal.
- Unique – It is also possible to have an index that dictates the values contained within the columns being indexed are unique. This act similar to a Primary Key with the exception that you can create multiple unique indexes. A unique index will also allow you to store nulls.
- Clustered Index – When an index is clustered, it means that the rows are physically stored in the order of the index effectively reducing the size of the index (with a non-clustered index, the index has a second list that contains a pointer to the record) and increasing performance when reading the index. A table can only have one clustered index and if a primary key has been created before a clustered index, the primary key will be clustered.
Summary
That’s it for this post. As you can see, indexes are extremely useful but one should not just go banging indexes on to tables left, right and centre. SQL Management Studio has an option to include the execution script when running a command and can be quite useful as it will offer suggestions on where to place indexes and what kind of performance boost you will get from doing so (don’t just follow this blindly though as the suggestion is only based on the script it has just analysed and is not aware of the rest of the world). I hope this post is helpful and as usual, constructive critisism and suggestions are welcome.