SQL Server databases store all of your data in tables. Each table models something in the real world – in your business – like sales, customers, or intangibles like meetings or suggestions. Because tables model real life objects, the process of designing and creating a table must start on the business side, gathering the rules and real life use cases.
We will walk through an example to illustrate basic guidelines for good table design. In Part 1 we covered the first three guidelines:
In Part 2, we continue with our example with the final three:
In Part 1, we restricted the type of data allowed in each column with the use of data types. Now it’s time to avoid another pitfall: duplicate rows. Our table represents the visitors to our website, and we want to make sure one visitor has only one row. If a visitor has two rows in the table, there is no way to know which row is correct, which row to update, which row we should link to related data.
Identify the unique column(s): With the data we’ve gathered, we can identify an individual visitor by his email address. This is a rule in our specific business, because we’ve made the visitor email their web login. One visitor, one email. We can enforce this with the use of a constraint – a rule on a table that the SQL engine enforces, to maintain data integrity.
Primary Key Constraints: Let’s consider creating a Primary Key (PK) constraint on the email column. This ensures that no two emails are the same in the table, so each row can be identified uniquely by that email. The PK constraint also disallows NULL values in the email column.
Surrogate Primary Key: It’s perfectly valid in this case to use email as the PK. But there are other considerations to choosing the PK:
Notice: We created the VisitorID as an IDENTITY column, and created a PK constraint on that column, to provide a useful and compact identifier for each row. We also created a UNIQUE constraint on the emailAddress column to enforce data integrity.
There is another problem with the table that we need to address. The table currently tracks the number of minute spent on the website (minutesAtSite), and the average minutes on each page (avgMinutesOnEachPage), but only for ONE visit. We want to track this data for each visitor over multiple visits.
One inclination is to create additional columns, e.g. minutesAtSite_visit1, minutesAtSite_visit2, minutesAtSite_visit3 etc. This is unworkable, inflexible, and wastes space. Notice, too, that this doesn’t model the visitor, but rather the visits. Let’s remove these columns to a Visit table, which references back the visitor table:
The last line of the code defines a Foreign Key (FK) constraint, which enforces the relationship between the Visit table and the Visitor table. The constraint provides that one Visitor may have many associated Visits. It also enforces that each visit must have an associated visitor; you can’t delete a row from the Visitor table if there are rows in the Visit table that reference it.