SQL Server databases store all of your data in tables. Each table models something in the real world - in your business - like sales, customers, or intangibles like meetings or suggestions. Because tables model real life objects, the process of designing and creating a table must start on the business side, gathering the rules and real life use cases.
We will walk through an example to illustrate basic guidelines for good table design. In Part 1 we covered the first three guidelines:
- Name: The table and columns have descriptive names.
- Modeling the Business: The table models a thing (or idea), using columns.
- Data Types and NULLability: Columns are strictly typed.
In Part 2, we continue with our example with the final three:
- Primary Key Constraint: Uniquely identify rows
- No Repeating Columns
- Foreign Key Constraint: Child data isn't orphaned
Uniquely Identify Rows
In Part 1, we restricted the type of data allowed in each column with the use of data types. Now it's time to avoid another pitfall: duplicate rows. Our table represents the visitors to our website, and we want to make sure one visitor has only one row. If a visitor has two rows in the table, there is no way to know which row is correct, which row to update, which row we should link to related data.
Identify the unique column(s): With the data we've gathered, we can identify an individual visitor by his email address. This is a rule in our specific business, because we've made the visitor email their web login. One visitor, one email. We can enforce this with the use of a constraint - a rule on a table that the SQL engine enforces, to maintain data integrity.
Primary Key Constraints: Let's consider creating a Primary Key (PK) constraint on the email column. This ensures that no two emails are the same in the table, so each row can be identified uniquely by that email. The PK constraint also disallows NULL values in the email column.
Surrogate Primary Key: It's perfectly valid in this case to use email as the PK. But there are other considerations to choosing the PK:
- The PK is used as the identifier in other tables and structures. The email column is a VARCHAR(200) column, which is a fairly large data type (up to 202 bytes on disk) to be used as an identifier, especially when compared with the 4 byte INTEGER type.
- The table is sorted on disk by the PK column; when a row is inserted, the PK determines where in the table it is written. If our table is sorted based on email, we will experience overhead associated with inserting the row at (what is essentially) a random place in the table.
- Not all tables will have a natural key - a column (or set of columns) that naturally defined the row uniquely.
The solution is a surrogate key - a column created specifically to uniquely define the row. Ideally, the surrogate PK should be compact in size, and ordered. The most commonly used surrogate key is an integer column with the IDENTITY property, which supplies automatic, incremental values. For example:
Notice: We created the VisitorID as an IDENTITY column, and created a PK constraint on that column, to provide a useful and compact identifier for each row. We also created a UNIQUE constraint on the emailAddress column to enforce data integrity.
No Repeating Columns
There is another problem with the table that we need to address. The table currently tracks the number of minute spent on the website (minutesAtSite), and the average minutes on each page (avgMinutesOnEachPage), but only for ONE visit. We want to track this data for each visitor over multiple visits.
One inclination is to create additional columns, e.g. minutesAtSite_visit1, minutesAtSite_visit2, minutesAtSite_visit3 etc. This is unworkable, inflexible, and wastes space. Notice, too, that this doesn't model the visitor, but rather the visits. Let's remove these columns to a Visit table, which references back the visitor table:
Foreign Key Constraint
The last line of the code defines a Foreign Key (FK) constraint, which enforces the relationship between the Visit table and the Visitor table. The constraint provides that one Visitor may have many associated Visits. It also enforces that each visit must have an associated visitor; you can't delete a row from the Visitor table if there are rows in the Visit table that reference it.