SQL Server databases store all of your data in tables. Each table models something in the real world - in your business - like sales, customers, or intangibles like meetings or suggestions.
Because tables model real life objects, the process of designing and creating a table must start on the business side, gathering the rules and real life use cases before you create tables in SQL server 2008.
We will walk through an example to illustrate basic guidelines for good table design. In Part 1 we will cover the first three guidelines:
- Name: The table and columns have descriptive names.
- Modeling the Business: The table models a thing (or idea), using columns.
- Data Types and NULLability: Columns are strictly typed.
In Part 2, we will continue with our example with the final three:
- Primary Key Constraint: Rows don't duplicate
- Columns shouldn't duplicate
- Foreign Key Constraint: Child data isn't orphaned
A table name should be descriptive of the object or idea that it models. For example, "CustomerOrderDetail" is a clear, concise table name, while "Worksheet001" is not. Table names must be 128 characters or fewer, and are subject to the rules for SQL Server identifiers.
Naming conventions are somewhat subject to opinion within the database community, but we suggest you remain consistent, whatever you choose. For example:
- Singular vs plural table name, e.g. Bicycle vs Bicycles
- Capitalization, e.g. CustomerOrderDetail (preferred method) vs customerorderdetail vs CUSTOMERORDERDETAIL (not recommended)
- Use of underscores, e.g. CustomerOrderDetail vs Customer_Order_Detail
- Use of prefixes, e.g. CustomerOrderDetail vs tbl_CustomerOrderDetail vs tblCustomerOrderDetail. We do not recommend object name prefixes.
Modeling the Business
A table is made up of columns, which are attributes - pieces of data - that we want to capture about the business object we wish to model. The data you capture about customers for your business, for example, may differ from the data another business captures, because each company is interested in different data. Here is an example of the website visitor attributes that we might be interested in for our website:
- last name
- first name
- age range (for example, <12, 12-17, 18-24, 25-35, etc)
- gender (M or F)
- zip code
- email address
- site password
- date of first visit
- time spent at the website, in minutes
- average time spent on each web page, in minutes
- interested in newsletter
Once we have a list of attributes, we rename them to become table columns:
Now we have a set of columns that are attributes of a website visitor. Each row in the table represents one person who visits our website.
Data Types and NULLability
We've defined what we want to know about a visitor. The next step is to explicitly define what kind of data each of those attributes is, using a SQL Server data type. Each SQL Server data type holds a different type of data, different range and/or precision, and uses differing amounts of storage on disk.
One of the simplest methods of typing tables is to define each column as a string type (like varchar). While this is possible, it is NOT a good idea; doing this limits functionality, uses extra space on disk, and slows SQL engine operations. Use data types that are appropriate to the type of data to:
- Take advantage of data type-specific features, such as mathematical operations on numbers, and date range functions for dates.
- Save storage space on disk. For example, it takes one quarter the space to store an Age column as a TINYINT (range 0-255, 1 byte on disk), as it does to store it as an INT (range -32k to +32k, 4 bytes on disk).
- Make database operations as efficient as possible.
- Restrict the data to a specific type, at the most basic level. An attempt to insert alphanumeric data into an integer field should cause an error!
There are four categories of basic data types: numeric integers, numeric decimals, dates, and strings. These do not encompass all of the available SQL Server data types.
For each column, select the most appropriate data type. VARCHAR (variable character string) is generally the best choice for string data like names. For FirstVisitDate, SMALLDATETIME is a good choice. We don't need millisecond accuracy, like datetime, and the dates should fall within the SMALLDATETIME date range, which expires in 2079.
The table must also reflect the real world situation where we do not know all the information for a particular visitor. In SQL Server, "NULL" means "unknown" (or not applicable); we strongly recommend the use of NULL, as opposed to creating "special" values that mean "unknown". Please note that nothing can be equal to NULL (after all, does 3 = unknown?) and nothing can be unequal to NULL. SQL Server provides special functionality (IS NULL and ISNULL) to handle NULLs in queries.
Define which columns are allowed to be NULL, and which are absolutely required:
This is a very good start to our table design, but we still have three key elements to address: primary keys, duplicate columns, and foreign keys, which we cover in How to Design and Build SQL Server Tables Part 2.
- SQL Server 2008 Books Online: Identifiers
- SQL Server 2008 Books Online: Data types
- SQL Server 2008 Books Online: NULL
- MidnightDBA.com: This Blog IS About NULL