-
Do you Measure Up-Time?
If you measure up-time you can pro-actively inform your manager how successful you
have been as a DBA. You can do this in 2 ways:
Option 1: High Tech Solution - using Microsoft Operations Manager (MOM)
MOM allows you to monitor and generate reports on the total uptime of your SQL Server
and other service level exceptions. You need the following for these reports:
- Microsoft Operations Manager 2005 and SQL Server on the network when performing
a network scan
- Downloaded Microsoft
SQL Server Management Pack for Microsoft Operations Manager 2005 for free.
Figure: Show managers what a good job you're doing with Microsoft Operations
Manager and associated reporting service reports
Option 2: Low Tech Solution - using a recurring select as a heartbeat
- Run a query as a ping once every a 5 minutes something that takes about 2 seconds
- SELECT * FROM Orders Five times
- Log it with the time
- Graph - See uptime
- Graph See performance
-
Do you check your SQL Server is up-to-date?
Most patches are for security. SQL Slammer showed that it's no good waiting a month
before you decide to install a service pack. I would say wait one week and then
install the service pack (maximum delay should be 2 weeks)
Option 1: Manually check using @@version
Option 2: Run SSW Diagnostics and get all
green ticks (Recommended)
Figure: Use diagnostics to ensure your SQL is up-to-date
Note: To check all servers on my network I use
Net Ping
-
Do you design for change?
Many developers are frightened of making a change to the existing database because
they just don't know what applications are using it. This is especially a problem
when you are dealing with a databases that you did not create. Here are some approaches
to this issue:
- You could run around the office and find some one and hope they know (unbelievably
this seems this the most common method!)
- Trawl through source control, all network locations and all the source code around
to check what connection strings are being used
- You can have a zsApplication table and manually populate with application it uses
(Recommended). This can be populated with a run of a SQL profiler over a period
of a week so all usage is captured.
Figure : Add a zsApplication table to make applications that use it visible
to all developers
- Keep a constantly running login Audit with a SQL Server Profiler Trace that saves
to a table - and make sure all applications have an application name in their connection
string. This method is the most comprehensive option but is not recommended because
you get a constant performance hit from SQL Profiler running.
Figure: SQL Profiler can help you design for change with auditing of Login events
by giving you a guide on what applications are connecting to your database.
-
Do you script out all changes?
Every time a change is made to your product's SQL Server Database, script out the
change. You can use Enterprise Manager, VS.NET or Query Analyzer but every time
you make changes you must save the change as a .sql script file so any alterations
are scripted. Everything at SSW is usually done three times, once on Development,
once on Staging and once on Production. Change control is one of the most
important processes to ensuring a stable database system.
Keep the scripts in a separate directory to any other scripts or files. This way
you can always go back to them and find out what alterations you have made to the
database in version xxx to find errors. If you have all the scripts you are able
to rebuild the database from scratch. At SSW we name this folder SQLChangeScripts so as to not confuse it with other script folders.
Figure 1: A list of change SQL scripts, each file name is in the correct format.
The script file format should be: <version>_<description>.sql
The <version> should be a number which is padded with leading zeros (0) on
the right to firm 3 or 4 digits (however long we need).
What if you are using a code generator?
Every time we use Next Generation
, it creates its own Generated Stored Procs in the Database Project
of our Solution. The folder it is kept in is called "Auto-Generated Stored
Procedures".
The scripts found within this folder are as follows:
- 010_ViewsForStoredProcedures.sql
- 020_StoredProcedures_Select.sql
- 030_StoredProcedures_Insert.sql
- 040_StoredProcedures_Update.sql
- 050_StoredProcedures_Delete.sql
After re-generation of code in the solution, these scripts will be updated with
the required stored procs for new Database Objects found in the application. The
problem is, however, that every time a re-generation occurs these files must always
be added to the large list of scripts in the "SQLChangeScripts" folder as
shown above.
To solve the issue of continually piling up these scripts every time you use
Next Generation
, it is recommended that the scripts are copied over to the "SQLChangeScripts" folder, and the names should not be changed.
By only modifying the first three numbers accordingly for the correct script sequence,
you will be able to find all other Next Generation stored procs, as shown in Figure
2.

Figure 2: Previous NextGen scripts can be removed except the last NextGen script
file e.g 008_StoredProcedures_Delete.sql should not be deleted as it may be the
last script in a previous version which SQL Deploy may need for reference.
Since the previous NextGen Scripts are considered outdated with the newly generated
scripts; deleting the previous NextGen scripts will not affect the Database Objects
found in the application.
Deleting these scripts will in fact decrease the list of scripts significantly and
save a very large amount of time when upgrading the database using SQL Deploy, especially
when the generated scripts contain a lot of SQL commands.
After the Upgrade, you should do a check on the database with the scripts just to
make sure they Reconcile.
|
We have a program called SSW SQL Deploy which allows
you to run scripts automatically
|
-
Do you configure all your SQL Server Services
to use a Domain Account rather than a local service account?
SQL Server 2000 and 2005 have several different services that support them.
- SQL Server
- SQL Server Agent
- SQL Server Reporting Services
- SQL Server Integration Services
- SQL Server Fulltext search
- SQL Server Analysis Services
In the service properties window for these services, ensure that the Service Startup
Account is run as "This Account" and not as "Built-in Account" (SQL 2005) or "Service
Account" (SQL 2000). Otherwise, you won't get all the functionality by default such
as the ability to use Replication, Linked Servers, connect to other machines or
use SQL Server mail.
For security you should not have this domain acccount in the Administrators group.
Figure : Run as Account should use a domain account rather than a built-in account
(Good)
Figure : This service is using a built-in local service account (Bad)
-
Do you name your SQL Server Domain Account as
'SQLServerMachineName' so it is easily identified when performing network operations?
When you create the domain accounts under which SQL Server services run, you should
name this domain account as "SQLServerMachineName". e.g. SSW2000\SQLServerDragon.
If one of the SQL Server services updates a file on the network, then you can then
determine which server wrote to the file.
-
Do your SQL Server Services run under non-Administrator
accounts?
You should always run all SQL Server services with the lowest possible priviledges
allowed in case the account is compromised. In SQL 2000 this was a bit of a pain
as you would have to manually create a minimal account with the bare minimum required
priviledges. SQL Server 2005 setup makes the whole process of granting priviledges
a whole lot easer than in SQL 2000 - because it automatically creates groups with
all the neccessary permissions for you!
Figure : SQL 2005 now creates groups for all the SQL Server services with
the bare minimum permissions for you
For good old SQL 2000, you must manually give the SQL Server Service accounts the
following permissions:
- Ability to log on as a service
- Ability to access and change the MSSQL directory
- Ability to access and change applicable .mdf, .ndf, and .ldf files
- Ability to read and write to certain registry keys under:
HKEY_LOCAL_MACHINE\Software\Microsoft\MSSQLServer.
-or- for any named instance: HKEY_LOCAL_MACHINE\Software\Microsoft\Microsoft SQL
Server.
HKEY_LOCAL_MACHINE\System\CurrentControlset\Services\MSSQLServer.
-or- for any named instance: HKEY_LOCAL_MACHINE\System\CurrentControlset\Services\MSSQL$Instancename.
HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Perflib.
If you are running any SQL Server Service in an user account that has administrator
privileges a user that compromises the account could do anything that administrator
could do - including playing around with the registry with procedures like xp_regdeletevalue.
So, if you use an Administrator account, you're in effect giving away the keys to
the house. Is this something you want to do?
-
Do you use Database Mail (not SQL Mail) in SQL 2005?
SQL Server 2005 includes Database Mail, a replacement for SQL Mail. Database Mail
solves many of the problems inherent in SQL Mail, including:
- HTML messages are now natively supported - so there's no need to use 3rd party dlls
anymore
- There's no need for outlook or MAPI profiles on server - communication is directly
with SMTP server
- Multiple profiles and accounts are supported to specify multiple SMTP servers or
different email infrastructure situations
- SQL Server queues messages even when the external mailing process fails
- High security - users and roles have to be granted permission to send mail
- Logging and auditing
- Attachment size regulations and file extension requirements can now be implemented
Figure : Using SQL Mail (Bad)
|
EXEC master.dbo.xp_smtp_sendmail
@FROM = N'your@email.com',
@FROM_NAME = N'Sophie Belle',
@TO = 'recipient@email.com',
@subject = 'Vendor List',
@message = 'The list of vendors is attached.',
@type = N'text/html',
@server = N'mail.company.com.au'
|
Figure: Avoid using SQL Mail - you need to have Outlook on the server
and there is no built-in logging (Bad)
Figure : Use Database Mail (Good)
|
USE msdb
Execute dbo.sp_send_dbmail
@profile_name = 'UTS',
@recipients = 'your@email.com,
@body = 'The list of vendors is attached.',
@query = 'USE AdventureWorks; SELECT VendorID, Name FROM Purchasing.Vendor',
@subject = 'Vendor List',
@attach_query_result_as_file = 1
|
Figure: Use database mail for scalability, built-in logging and HTML capability
(Good)
For a more in-depth comparison of SQL Mail vs Database Mail, see this
intro to Database Mail in SQL 2005
-
Do you turn on all the default alerts?
Even if you don't have Microsoft Operations Manager, SQL Alerts are valuable because
they can alert administrators of imminent SQL Server failures. e.g. when the msdb
log file is full. To enable, you should change the settings under SQL Server Agent.
Figure : Default Alerts in SQL 2000 were disabled after install. Enable
them.
SQL 2005 on the other hand has no default alerts. You will have to create them,
and I recommend that you add all the fatal level exceptions to alerts.
Figure : SQL 2005 alerts - We recommend that you add the fatal exceptions
as alerts
-
Have you created your own alerts?
In addition don't forget to add your own alerts, such as sending an alert for the
exception "is outstanding amount > $15000". Many peple are not aware of this functionality
and write triggers or get 3rd party products to the same job.
-
Backup - Do you setup a complete Maintenance Plan?
It is surprising how many IT staff create backup plans and then leave it at that.
To have a complete maintenance plan, you should also consider the following:
- Checking database integrity
- Shrinking Databases
- Reorganizing Indexes
- Rebuilding Indexes
- Updating Statistics
- Cleaning up old maintenance histories
- Performing automatic backups
- Backing up System databases
- Last but not least - you should regularly check that the maintenance plans have
been running successfully. Otherwise all your backup and maintenance efforts are
pointless.
Figure : SQL 2005 - A Complete Weekly Maintenance Plan
-
Backup - Do you back up scripts?
Scripts are an important component in the operation of any database. This is why
you should back up all your scripts and historical schema snapshots - so you can
track the scripts that have been run and those that need to be deployed to test
and production databases. We typically store these in source control such as VSS
or Team Foundation Server as a Visual Studio Database project. You should regularly
generate full scripts of all objects changed, keeping the following points in mind:
- Don't encrypt your database objects if you can avoid it - otherwise they can't be
scripted.
- Use the
- Enterprise Manager Generate Scripts Wizard Or
- SQL DMO object model to script out the objects Or
- Try a third party utility called
SQL Scribe (Recommended) to generate your schema snapshot scripts.
-
Backup - Do you take Restoration seriously?
Restoration is vital, we all agree. But rapid restoration with minimal down time
is just as important. Run practice restorations on a regular basis, as you don't
want to find out that your back-up doesn't restore when a problem has already occurred.
This restoration process is so important that you should have a termination clause
in an employee's contract if the restore doesn't work when something goes wrong.
-
Backup - Do you have a Restoration Standard?
If the restoration process is not clear and too complicated, then it will cost you
both time and money when you are already in a pressure situation. Make sure you
have a step by step and comprehensive restore standard with screenshots for every
step of the way so there are no nasty little surprises when your system goes down.
-
Are you aware of all the log files?
SQL Server stores vital error and performance information in several different
logs. You should be aware of all of them:
- SQL Server Error Logs
- Configure how many you want to keep
- You should Back up your SQL Server error logs with your other scripts
- Sp_cycle_errorlog
- SQL Server Agent Error Log
- Recycles after every service restart
- Job History Logs
- Agent properties, Job System tab
- Probably too low by default
- DBMaint history logs
- (Event Viewer) - Issues
-
Do you secure your server by changing the 'defaults'?
- Disable defaults
- Disable Administrator and Rename it, then create a new "honeypot" Administrator
account with no permissions.
- Disable Guest on the SQL
- Change Port 1433
- Delete the sample databases - (AdventureWorks, Northwind and Pubs). These have a
Public Role which is a security risk and allow Massive SQL Statements
- Other security issues
- Use a service account with a strong password
- Dont run SQL Server service as an administrator
- Run in integrated security mode
- Run on NTFS file system - Encrypt the data files
-
Do you turn on security auditing?
- Configure login security auditing
- Not on by default
- Configure on the security tab of Server Properties in Enterprise Manager
- Enable for Failure
- View using the Windows Event Viewer
Figure : Enable Auditing for SQL Server logins
Note: You can turn on a trace for SQL DDL operations statements
-
Do you increase the Log Size of your Event Viewer?
Change the defaults from 512KB and "Overwrite events older than 7 days" to 64000KB
and Overwrite as needed. This will allow the users to view Security audits and errors
much further into the past with a minimal increase in space - and it will never
bloat your server.
Figure : Change from this ridiculously small log size (Bad)...
Figure : ... To a much more reasonable log size (Good)
-
Do you remove unnecessary permissions on databases?
What goes for permissions to Windows objects also goes for SQL Server objects. Remove
all permissions but the bare minimum required to operate your application.
-
Do you use Performance Alerts?
Performance alerts work well for problems that need to be discovered before they
occur.
For example, one problem that you may encounter is database file growth. Since databases
are set to grow to a certain percentage, you needed to configure an alert to let
you know when my database would draw close to that threshold. you can configure
a performance alert that fired off when it reached 80% of that threshold. Here is
an example of what you can do:
To configure an alert to trigger an application, perform the following steps:
- Start the Microsoft Management Console (MMC) Performance snap-in (Start, Programs,
Administrative Tools, Performance).
- Expand Performance Logs and Alerts, and select Alerts.
- Right-click in the right pane, and select New Alert Settings.
- Enter a name for the setting that reflects what the alert will monitor, and click
OK.
- On the General tab, add the counter the alert will monitor and specify the values
that will trigger the action.
- On the Action tab, select the Run this Program checkbox.
- Click the Browse button, and select the name of the application you want to run.
- Click OK.
You have just configured an application to run in response to an alert. Unfortunately,
because the program doesn't interact with the desktop, it runs in the background,
visible only in Task Manager. To enable the program to run interactively, perform
the following steps:
- Start the MMC Services snap-in (Start, Programs, Administrative Tools, Services).
- Right-click Performance Logs and Alerts, and select Properties.
- On the Log On tab, specify the "Local System account" and select the "Allow service
to interact with desktop" checkbox.
-
Do you make sure you use a consistent Collation server-wide?
Collation is the combination of language and sort orders, and you typically don't
notice it until you start running cross database queries.
It would make development simpler if the whole world spoke one language but even
if you are using English, you will still encounter collation issues if you are not
careful. The most common issue is the dreaded 'Cannot resolve collation conflict
for equal to operation' error when joining on columns that have different collation
orders. Collation is a great feature for international companies, but if you are
not consciously using it then you should have ALL the objects in ALL the databases
on ALL the servers using a consistent collation.
Flexibility with collation orders has increased a lot since SQL 7.0:
- SQL 7: Back in SQL Server 7, you could only define the collation at the server level
and, once it was set, you could not change it without rebuilding the master database.
- SQL 2000: This added the ability to have Column level collation which allows you
to set it at the database or column level.
However, with this column-level flexibility come additional issues. It is ideal
for those who only want the column name 'FirstName' to be represented in accent
insensitive sort order. However, one of the side effects, if you are not taking
notice of collation, is that you end up with many different collations on many different
databases.
We feel that the only time you need inconsitent collations is when you have a rogue
3rd Party application like Microsoft Great Plains that enforces its own collation.
See these Knowledge Base articles for more information about the issues you will
encounter when you have inconsistent collations:
- Q211874 - Why do I get the
error 'Cannot resolve collation conflict for equal to operation'?
The database collation differs from the SQL Server default collation because it
was attached or created with a different collation order. This causes issues when
you attempt to join tables in databases that have different collation orders. For
example, if your tempdb database and Northwind each have a different collation you
will get the following error 'Cannot resolve collation conflict for equal to operation'
when you attempt to do a join between tables from these databases
- Q711843 - How do I change
the collation order in my SQL Server 2000 or 7.0 database?
There is no 'recommended' collation as different collations will be used in different
countries but as a guideline, installations in the United States and installations
that require compatibility with SQL Server 7 databases should use the SQL_Latin1_General_Cp1_CI_AS
collation. Non-United States installations in English speaking countries should
use the Latin1_General_CI_AS collation.
Figure : Setting the collation in SQL 2005 Setup - Choose Case Insensitive(CI),
Accent Sensitive (AS)
Use our product
SSW SQL Auditor to automatically implement this rule for you. And you can configure in Tools->Options, Option [Current database(Recommended)] for check your selected database; Option [All database] for check all databases on your selected server;
Download it and give it a go.
|
-
Do you create new databases in the default data directory?
When trying to create a database in SQL Server 2005 from an existing create script
written for SQL Server 2000, we came across a problem. Our create script was trying
to determine the path to save the database file (the path to the default data store)
by using the sysdevices table in the Master database; however, the schema for the
Master database had changed in 2005 and our script could no longer find the column
it relied on to determine this path.
Rather than creating a new script specific to 2005, we found that by removing the
optional FILENAME attribute all together, both SQL Server 2000 and 2005 were happy
and the database files were saved into the default data directory which is what
we were after.
The moral of the story is - keep it simple.
When using a create script to create a new database, let SQL Server determine the
filename and path from its default settings. This will help make the script simpler,
more flexible, and ready to use with utilities such as MS OSQL and
SSW SQL Deploy.
DECLARE @device_directory NVARCHAR(520)
SELECT @device_directory = SUBSTRING(phyname, 1,
CHARINDEX(N'master.mdf', LOWER(phyname)) - 1)
FROM master.dbo.sysdevices
WHERE (name = N'master')
EXECUTE (N'
CREATE DATABASE [DatabaseName]
ON PRIMARY
(
NAME = N''[DatabaseName]'',
FILENAME = N''' + @device_directory + N'[DatabaseName].mdf''
)
LOG ON
(
NAME = N''[DatabaseName]_log'',
FILENAME = N''' + @device_directory + N'[DatabaseName].ldf''
)
COLLATE SQL_Latin1_General_CP1_CI_AS
'
)
Go
Figure: FILENAME Parameter used to specify database path - Bad
CREATE DATABASE [DatabaseName]
COLLATE SQL_Latin1_General_CP1_CI_AS
Go
Figure: Generic CREATE DATABASE used - Good
|
We have a program called Code Auditor
that checks for this rule.
|
-
Are you aware of compatibility issues between SQL Server 2000 and 2005?
The SQL 2005 generated scripts are not compatible to SQL 2000, so use SQL 2000 to
generate your scripts if you want to make your scripts work well on both versions.
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[ProcessTarget]') AND type in (N'P', N'PC'))
drop procedure [dbo].[ProcessTarget]
Figure: script only works on SQL 2005, because 'sys.objects' is only available
in this version.
IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[ProcessTarget]') AND OBJECTPROPERTY(id, N'IsProcedure') = 1)
drop procedure [dbo].[ProcessTarget]
Figure: script works on both SQL 2000 and SQL 2005
-
Do you have a general Contact Detail table?
It is common to have a Contact Detail table to store your contact information such
as phone numbers. Below is an example of Contact Detail table and its related tables.
This is bad because the PartyPhone table is too specific for phone number and you
have to add a new table to save email or other contact information if this is needed
in the future.
-

- Figure:
a too specific Contact Detail table
we normally have a general Contact Detail table that includes all the different
categories of phone numbers, whether is is shared or primary plus emails all in
the same table.
-

- Figure: a general Contact Detail table
We use a Contact Detail Category table to store these categories.
-

- Figure: details of Contact Detail Category table
-
Do you know how to provide best database schema document?
You should not provide a database schema via several screen captures - it has little
information of the details. A well formatted Word document may be providing more
details information, but it is not easy to maintain the document to keep it up-to-date.
The best way is to automatically generate your document with a tool.
We recommend and use
Red-Gate SQL Doc to produce chm help files or html pages of the database
schema. SQL Doc also allows you to run via the command line so you can include the
generation in your build process to be automatically created.
We have also have used these other available tools in the past:
SQL Scribe and
Apex SQL Doc.
-
Do you avoid collation errors?
You don't want this error:
"120_ClientInvoice_ClientIDRequired.sql...Column 'dbo.Client.ClientID' is not of
same collation as referencing column 'ClientInvoice.ClientID' in foreig..."
When you write a stored proc - it must work regardless of the users collation. When
you are joining to a temp table - meaning you are joining 2 different databases
(eg. Northwind and TempDB) they wont always have the same collation.
The reality is that you can't tell a user what collation to run their TempDB - we
can only specify the collation Northwind should be (we don't even want to specify
that - we want that to be the their default (as per their server))
Here is what you need to do:
SELECT
#ClientSummary.ClientID,
DateOfLastReminder = MAX(ClientDiary.DateCreated),
DaysSinceLastReminder = DATEDIFF(day,MAX(ClientDiary.DateCreated),getdate())
INTO #RecentReminderList
FROM
ClientDiary INNER JOIN #ClientSummary
ON ClientDiary.ClientID = #ClientSummary.ClientID COLLATE
database_default
WHERE
ClientDiary.CategoryID LIKE 'DEBT-%'
GROUP BY
#ClientSummary.ClientID
-
Don't allow Nulls in text fields
NULLs complicate your life. To avoid having to constantly differentiate between
empty strings and NULLs, you should avoid storing NULLS if you can.
Why? Well what is wrong with this?
SELECT ContactName
FROM Customer
WHERE ContactName <> ''
Figure: Selecting on empty string
Nothing if your data is perfect, but if you allow Nulls in your database, then statements
like this will give you unexpected results. To get it working you would have to
add the following to the last line:
WHERE ContactName <> '' OR ContactName Is Null
Figure: Allowing null strings makes queries more complex
What about only allowing empty strings? Well we choose to block Nulls because it
is a lot easier to check off a check box in Enterprise Manager than it is to put
a constraint on every field that disallows empty string ('').
|
|
Figure: Don't allow Nulls
|
Not allowing Nulls will give you the following benefits:
* Don't have to enforce every text field with a CHECK constraint such as ([ContactName]<>'').
* Make your query simpler, avoid extra checking in stored procedures. So you don't
have to check for NULLs and empty strings in your WHERE clause.
* SQL Server performs better when nulls are not being used.
* Don't have to deal with the pain in middle tier to explicitly check DBNull.Value,
you can always use contactRow.ContactName == String.Empty. Database Nulls in the
.NET framework are represented as DBNull.Value and it cannot implicitly typecast
to ANY other type, so if you are allowing NULLs in ContactName field, the above
comparing will raise an exception.
* Avoid other nasty issue, a lot of controls in the .NET framework have real problems
binding to DBNull.Value. So you don't have write custom controls to handle this
small thing.
However, you should always be aware that Nulls and empty strings are totally different,
so if you absolutely have to have them, they should be used consistently. In the
ANSI SQL-92 standard, an empty string ('') is never equated to Null, because empty
string can be significance in certain applications.
For example, you have Address1 and Address2 in your database, a Null value in Address2
means you don't know what the Address2 is, but an empty string means you know there
is no data for Address2. You have to use a checkbox on the UI to explicitly distinguish
Null value and empty string:

Figure: A check box is required if you want to allow user to use Null value on the
UI
Some people are not going to like this rule, but this is how it works in Oracle
and Access:
*In Oracle, empty strings are turned into Nulls (which is basically what this rule
is doing). Empty strings per se are not supported in Oracle (This is not ANSI compliant).
*And talking of legacy systems :-) be aware that using Access as a data editor is
a "No-No". Access turns empty strings into a Null.
Finally always listen to the client, Nulls have meaning over an empty string - there
are exceptions where you might use them - but they are rare.
So follow this rule, block Nulls where possible, update your NULLs with proper information
as soon as possible, and keep data consistent and queries simple.
|
We have a program called
SSW SQL Auditor to check for this rule:
o
Alternative #1 (recommended)
- Use NOT NULL constraint and allow empty strings instead of allowing null values
o
Alternative #2 (not recommended)
- Use NULL constraint and disallow empty strings with CHECK constraint
|
|
|
Figure: What
the Customers table will look like after applying the rule above 'Alternative #1
(recommended)'
|
|
|
Figure: What
the Customers table will look like after applying the rule above 'Alternative #2
(not recommended)
Alternative #1 is always preferred, but if you do
have some exceptions in your application, please put comment in
the field "Ignored by SQL Auditor: [your reasons]".
Eg. Ignored by SQL Auditor: Null means unknown addresses. I have provided a checked
box in the UI so the user can signify that.
|
-
Don't allow NULLs in number fields if it has
the same meaning as zero
NULLs create difficulty in the middle-tier because you need to add further handling.
So avoid them where you can, eg. For a Discount field, make the default 0 and don't
allow NULLs.
This rule should not be applied when a NULL value is valid data. Often times data
such as a percent earnings rate on a super fund is nullable because it may not be
supplied or relevant. This is very different to it being zero and you have no way
to determine real zero values from not supplied data. The hit of doing the work
in code is often offset in this case by the validity of query results.
As a general rule of thumbs, don't use NULL if you cannot distinguish it from another
value.
Q. What is the difference between NULL and 0 in discount field?
A. No difference, so don’t allow Nulls.
Q: What is the difference between NULL and 0 in Tumor size?
A: Null means unknown and 0 means no tumor, so allow Nulls.
Note: Nulls are evil, but don't go crazy removing nulls. Never invent your
own constant eg. -999 to represent a Null.
-
Don't start data in character columns with
empty line
Character columns (char, varchar, text, nchar, varchar, text) can store data as
<Ctrl>+<Enter> in the first line and the rest of data in the second
line.
Note: If the front-end is Microsoft Access, then the data in the
second line is not shown.
|
SQL Auditor will check this rule and generate a script
to remove an empty line from character type columns where it is the first character
|
-
Don't start data in character columns with
spaces
Text in character columns (char, varchar, text, nchar, varchar, text) can start
with spaces which is usually data entry error.
|
SQL Auditor will check this rule and generate a script
to remove leading spaces from character data
|
-
Use Identities in SQL Server (but don't use Autonumbers
in Access)
This one is going to be a controversial one. But the bottom line is every now and
then you want to do something and then you curse and wish your database didnt have
an identities. So why use them? Let's look at the problems first:
Cons:
- You cant manually change a Primary Key and let the Cascade Update do its work, eg.
an InvoiceID
- Hassles when importing data into related tables where you want to control the Primary
Key eg. Order and Order Details
- Replication you will get conflicts
In Microsoft Access you have autonumbers and there is no way around them so never
use them.
But in SQL Server you have identities and we have these procs:
- DBCC CHECKIDENT - Checks the current identity value for the specified table and,
if needed, corrects the identity value
- SET IDENTITY_INSERT { table } { ON | OFF } - Allows explicit values to be inserted
into the identity column of a table
Pros:
- Less programming - letting the database take care of it
- Replication (identities are supported by SQL Server with ranges so when you want
replication, no coding
- Avoiding concurrency errors on high INSERT systems so no coding
So the only Con left is the importing of data but we can use one of the above procs
to get around it. See grey box.
The best way to import messy data into SQL Server (with Identities)
Eg. inserting data to the Orders and Orders Details table:
- Use an .adp to copy the first record to Excel
- Get the data into the same column orders
- --
- SET IDENTITY_INSERT Orders ON --this will allow manual identity INSERTS
- Copy and Paste Append the Orders
- SET IDENTITY_INSERT Orders OFF --as it can only be on for one table at a time
- --
- SET IDENTITY_INSERT [Order Details] ON --this will allow manual identity INSERTS
- Copy and Paste Append the [Order Details]
- SET IDENTITY_INSERT [Order Details] OFF
Automatic Identity Range Handling
The simplest way of handling identity ranges across replicas is to allow SQL Server
2000 to manage identity range handling for you. To use automatic identity range
handling, you must first enable the feature at the time the publication is created,
assign a set of initial Publisher and Subscriber identity range values, and then
assign a threshold value that determines when a new identity range is created.
For example, assigning an identity range from 1000 through 2000 to a Publisher,
and a range from 2001 through 3000 to an initial Subscriber a range from 3001 to
4000 to the next publisher etc.
-
Don't delete records - just flag them as Inactive
When users are deleting a lot of records as part of normal operations - they can
and do make mistakes. Instead of the painful process of having to go to a backup
to get these records, why not simply flag the records as Inactive?
Advantages
- You do not have to delete all related records e.g. Customers, Orders, Order Details.
Instead, you can just flag the parent record as deleted with an "IsDeleted" bit
field.
- You do not lose historical data e.g. how many products one of your previous clients
purchased
- You can actually see who deleted the record, as your standard audit columns (e.g.
DateUpdated, UserUpdated are still there. The record does not just vanish
- It is simple to implement - particularly when using a code generator. For example
- our code generator produces views and stored procedures, and all data access layer
code. With all data access done through the data layer views, we simply had to add
a filter to all views ("WHERE IsActive = 0"). Our autogenerated delete stored procedures
simply set the "IsActive" column to false.
Disadvantages
- Depending on your interface design, you may have to join to parent tables to ensure
that deleted child records do not appear. Typically, the interface would be designed
in such a way that you would not need be able to created new records based on the
deleted items (e.g. you cannot create a new order record for a customer that is
deleted). Performance of queries can potentially suffer if you have to do these
joins.
- While storage space is very cheap, you are not removing records from your database.
You may need to archive records if the number of deleted records becomes large.
Also see Using Audit Tools for alternatives to this
approach using 3rd party auditing tools.
-
Date - Make sure you have valid date data in your database
SQL Server dates can range from year 1900 up to year 9999. However, certain date
data in your database just wouldn't make any sense in the context of your business.
For example, if your company started trading in 2005 you should not have any dates
in your database before 2005 (unless you are tracking start dates of your clients,
but this is an exception). An invoice date of 2003 wouldn't make sense at all. You
should run validation queries to ensure no rubbush date data gets into your database.
|
SQL Auditor will check this rule and generate a script
to see data that violates user-defined date range
|
-
Date - DateTime fields must be converted to universal
time
Any DateTime fields must be converted to universal time from the application to
the stored procedures when storing data into the database.
When retrieving data from the database it must be converted back to the local time
of the user.
That way you get an accurate representation of e.g. the time someone entered data
into the database (i.e. the DateUpdated field).
The exception to this rule, however, is for already existing databases that deal
with DateTime as part of their queries.
e.g. SSW Time PRO.NET is an application that allows employees to enter their timesheet.
The table used for storing this information has an important field that has a DateTime
data type.
This cannot be converted to UTC in the database because that would mean:
- Converting every single entry since entries began being stored (in SSW's case since
1996) to keep information consistent;
- Other separate applications currently using the timesheet information in the database
for reporting will also have to be entirely modified.
Currently there will be an issue if for example someone from the US (Pacific time)
has 19 hours difference between her local time and our servers.
Example: Sally in the US enters a timesheet for the 21/04/05. (which will
default to have a time of 12:00:00 AM since the time was not specified)
Our servers will store it as 21/04/05 19:00:00 in other words 21/04/05 07:00:00
PM because the .NET Framework will automatically convert the time accordingly for
our Web Service.
Therefore our servers have to take the Date component of the DateTime and add the
Time component as 12:00:00 AM to make it stored in our local time format.
[WebMethod]
public double GetDateDifference(DateTime dateRemote)
{
DateTime dateLocal = dateRemote.Date;
return (dateRemote.TimeOfDay.TotalHours -
dateLocal.TimeOfDay.TotalHours);
}
Figure: When dateRemote is passed in from the remote machine, .Net Framework will
have already converted it to the UTC equivalent for the local server (i.e. the necessary
hours would have been added to cater for the local server time).
In the above code snippet, the .Date property would cut off the Time portion of
the DateTime variable and set the Time portion to "12:00:00 AM" as default.
This is for applications we currently have that:
- Consider the DateTime component integral for the implementation of the application
- That will be used world wide.
-
Do you use 3rd party audit tools to audit data changes?
In many cases, there are legal requirements to audit all updates to financial records.
In other cases, you will want to be able to track and undo deletes to your database.
Some solutions we have seen in the past are:
- Manually adding triggers on all database tables to log every table
- The business objects or stored procedures all write to 2 tables the main table such
as Customer and CustomerAudit
- Using a logging utility to audit database changes
For ease of reporting and the ability to undo, we recommend that you use a logging
utility such as Lumigent
Log Explorer. This means that you can devote your development time to areas
other than auditing. Also, unlike other utilities which use triggers (such as ApexSQL Audit), there
is no performance overhead because it relies upon log files already created by SQL
Server. If required, you can export the log information to SQL Server, so you can
perform advanced queries on it. It even allows you to recover previously deleted
tables.
-
Do you use invalid characters in object identifiers?
We believe it is not good that use invalid characters (most of are Symbol characters,
like ",;"\/(", etc.) in object identifiers. Though it is legal, it
is easy confused and probably cause error during run script on these objects.
|
We have a program called SQL Auditor
that checks for this rule.
|
-
Do you check invalid characters in character data?
Always avoid invalid characters in your data (most of are Symbol characters, like
",;"\/\n\r", etc.). You usually get them in your database by mistake people
usually get them from copy and pasting from Word.
They can be costly here is an example of an error you can get.

What could this be? Well in this case the html source of http://www.ssw.com.au/ssw/Download/Download.aspx?GroupCategoryID=5BUS
had this:
<script language="Javascript">
document.write('SSW Smart Tags for Word ');
...
</script>
There is an unwanted [return char] in the end of [SSW Smart Tags for Word]. So,
then you trace it to a database record and find that the CategoryName field is SSW
Smart Tags for Word? in that record. After you I remove the error chars, this bug
was fixed all very expensive and all very costly.
|
We have a program called SQL Auditor
that checks for this rule.
|
-
Do you use a URL instead of a image in your database?
We recommend that you use a URL instead of a image in your database, this will make
you
- avoid size of your database increasing too speedy (which may will bring a serial
of problems, like performance, log and disk space, etc);
- easy to validate and change the image;
|
SQL Auditor will check this rule and generate a report
of image columns
|
-
Only use Unicode datatypes (nchar, nvarchar and ntext)
in special circumstances
Columns defined using the nchar, nvarchar and ntext datatypes can store any character
defined by the Unicode Standard, which includes all of the characters defined in
the various English and Non-English character sets. These datatypes take twice as
much storage space per characters as non-Unicode data types.
It is not the disk space costs that are the concern. It is the 8060 limit, please
refer to Microsoft FAQ
for details.
If your database stores only English characters, this is a waste of space. Don't
use Unicode double-byte datatypes such as nchar, nvarchar and ntext unless you are
doing multilingual applications.
|
We have a program called SSW SQL Auditor that can
produce a report indicating unicode datatype columns in your database. It also can
generate sql script to change datatype of those columns from nchar to char and from
nvarchar to varchar. To change datatype of ntext columns we recommend to use Enterprise
Manager.

Figure: SQL Auditor will report all unicode columns like nvarchar and ntext. It will
give you a SQL script for the nvarchar and advise you to use Enterprise Manager
to change ntext columns to text.
|
-
Do you always use Varchar?
Use VARCHAR instead of CHAR, unless your data is almost always of a fixed length,
or is very short. For example, a Social Security/Tax File number which is always
9 characters. These situations are rare. SQL Server fits a whole row on a single
page, and will never try to save space by splitting a row across two pages. Running
DBCC SHOWCONTIG against tables shows that a table with fixed length columns takes
up less pages of storage space to store rows of data. General rule is that the shorter
the row length, the more rows you will fit on a page, and the smaller a table will
be. It allows you to save disk space and it means that any retrieval operation such
as SELECT COUNT(*) FROM, runs much quicker against the smaller table.
-
Do you have standard Tables and Columns?
- All tables should have the following fields:
|
Field
|
SQL Server Field Properties
|
Access Field Properties
|
|
Note
|
varchar(500) Allow Nulls=True
|
text(255) Required=False
|
|
DateCreated
|
smalldatetime Allow Nulls=False Default=GetDate()
|
date/time Required=True Default=Now()
|
|
UserCreated
|
varchar(150) Allow Nulls=False
|
text(150) Required=True
|
|
DateModified
|
smalldatetime Allow Nulls=False Default=GetDate()
|
date/time Required=True Default=Now()
|
|
UserModified
|
varchar(150) Allow Nulls=False
|
text(150) Required=True
|
|
Concurrency
|
timestamp Allow Nulls=False
|
integer Required=True Default=1
|
The first three are examples of bad table records. The last one is an example of
how this table structure should be entered.
Sample 1: Implementing in Access
'---------------------------------------------------------------------------
Private Sub Form_BeforeUpdate(Cancel As Integer)
'---------------------------------------------------------------------------
If gstrNetworkUserID = "" Then Dummy = UserLogin(DLookup("EmpID", "ControlLocal"))
ctlEmpUpdated = gstrNetworkUserID
ctlDateUpdated = Now()
End Sub
|
Note: Never set the DateCreated field - instead use a default Now() in Access
or a default getdate() in SQL Server.
Sample 2: Implementing in Windows Forms - VB.NET
Sample 3: Implementing in Web Forms - ASP.NET
Note: the EmpUpdated is the IPAddress and the web page.
- All databases should have a table with one record to store application Defaults.
This table should be called 'Control'.
If the settings are not application-wide, but just for that user then an XML (do
not use an INI file) for simple stuff might be better. Examples are saving the 'User'
for logon, 'Select Date Range' for a report, form positions, etc.
.NET programs have an Application.Configuration which exports to XML file (app.config)
automatically. It works very well, and deployment is very simple. It's integrated
right into the Visual Studio.NET designer as well.
- All databases should have a version table to record structural changes to tables.
See SSW Rules
to Better Code
- Lookup tables that have just two columns should be consistent and follow this convention:
CategoryID (text 6) and CategoryName (text 100).
The benefit is that a generic lookup form can be used. You will just need the generic
lookup form pass in the TableName and Column1 and Column2.
Note #1: The problem with the naming is the primary keys don't match
Note #2: The benefit with the character primary key columns is that queries and
query strings have meaning eg.
http://www.ssw.com.au/ssw/Download/Download.aspx?GroupCategoryID=5BUS from
this URL I can guess that it is in the business category.
|
SQL Auditor will check this rule and generate a script
to add standard columns specified in Tools > Options
|
-
Do you use Bit/Numeric data type correctly?
- Bit data type
Bit data from 0 to 1 (2 values only). Storage size is 1 byte.
Columns of type bit cannot have indexes on them. Also, SQL Server 7
only allows True or False values in a bit column. SQL 2000 introduced the
ability to store NULL as well. Applications built for SQL Server 7 often does not
expect this behaviour, and may create subtle runtime errors.
[more information on bit data type]
Columns of type bit should be prefixed with "Is" or a "Should" ie. IsInvoiceSent
(y/n) or ShouldInvoiceBeSent (y/n) you can tell easily which way the boolean is
directed. [more information
on naming conventions]
This being said, fields of this type should generally be avoided because often a
field like this can contain a date i.e. DateInvoiceSent (Date/Time) is prefered
over InvoiceSent (y/n). If a date is inappropriate then we still recommend an int
field over a bit field anyway, because bits are a pain in the butt :-)
- Tinyint data type
Integer data from 0 through 255. Storage size is 1 byte.
- Smallint data type
Integer data from -2^15 (-32,768) through 2^15-1 (32,767). Storage size is 2 bytes.
- Int data type
Integer (whole number) data from -2^31 (-2,147,483,648) through 2^31-1 (2,147,483,647).
Storage size is 4 bytes. The SQL-92 synonym for int is integer.
- Bigint data type
Integer (whole number) data from -2^63 (-9223372036854775808) through 2^63-1 (9223372036854775807).
Storage size is 8 bytes.
Recommended:
- Use smallint datatype instead of bit datatype - so it can be indexed;
- Use int datatype, where possible, instead of bigint datatype - for saving disk space;
- Use smallint datatype, where possible, instead of int datatype - for saving disk
space;
- Use tinyint datatype, where possible, instead of smallint datatype - for saving
disk space;
|
SQL Auditor will check this rule and generate a script
to replace with the proper data type;
|
-
What sort of primary keys do you use - Natural or Surrogate?
Now this is a controversial one. Which one do you use?
- A "Natural" (or "Intelligent") key is actual data
- Surname, FirstName, DateOfBirth
- An "Acquired Surrogate" (or "Artifical" or "System Generated") key is NOT derived
from data eg. Autonumber
- eg. ClientID 1234
- eg. ClientID JSKDYF
- eg. ReceiptID 1234
- A "Derived Surrogate" (or "User Provided") key is indirectly derived from data eg.
Autonumber
- eg. ClientID SSW (for SSW)
- eg. EmpID AJC (for Adam Jon Cogan)
- eg. ProdID CA (for Code Auditor)
- A "GUID" key automatically generated by SQL Server
The problems with Natural Keys:
- Because they have a business meaning, if that meaning changes (eg. they change their
surname), then that value NEEDS to changed. Changing a value with data is a little
hard - but a lot easier with Cascade Update.
- The main problem is that the key is large and combined and this needs to be used
in all joins
The Problem with Acquired Surrogate Keys:
- A surrogate key has no meaning to a user
- It always requires a join when browsing a child table eg. The InvoiceDetail table
The Problem with Derived Surrogate
- The user needs to enter a unique value
- Because they have a business meaning, if that meaning changes (eg. they change their
company name), then that value MAY NEED to changed. Changing a value with data is
a little hard - but a lot easier with Cascade Update
- More likely to have a problem with Merge Replication
The Problem with GUID key
We like GUID keys. However, GUID generation produces essentially random numbers
which cannot realistically be used as primary keys since new rows are inserted into
the table at random positions leading to extremely slow inserts as the table grows
to even a moderate size. Inserting into the middle of a table with clustered index,
rather than appending to the end can potentially cause the database to have to move
large portions of the data to accommodate space for the insert. This can be very
slow.
Recommendations
- We dont use Natural keys ever
- We use Acquired Surrogate for some tables
- eg. Invoice table
- eg. Receipt table
- a combination of Acquired Surrogate and Derived Surrogate for other tables
- eg. Customer table
- eg. Employee table
- eg. Product table
When we say combination because if the user doesn't enter a value then we put a
random value in (by a middle tier function, so it works with Access or SQL). eg.
ClientID JSKDYF
The user can then change the value to anything else and we validate it is not used,
and then perform a cascade update - or if it is more then 3 levels deep we execute
a stored proc. Unfortunately this is a complicated proc that cycles through all
related tables and performs an UPDATE. Here is an
example
The Derived Surrogate has the benefit being easy for people to remember and can
be used in the interface or even the query string
Over the years experience has lead me to the opinion that the natural vs surrogate
key argument comes down to a style issue. If a client or employer has a standard
one way or another, fine use it. If not, use whichever you method you prefer, recognizing
that there may be some annoyances you face down the road. But don't let somebody
criticize you because your style doesn't fit his preconceived notions.
Links
-
Maximum row size for a table
A tables' maximum row size should be less than the size of a single SQL Server
data page (8060 bytes). Otherwise, data entry forms can give errors is not validated
correctly.
-
Do you create primary key on your tables?
When you specify a PRIMARY KEY constraint for a table, SQL Server enforces data
uniqueness by creating a unique index for the primary key columns. This index also
permits fast access to data when the primary key is used in queries.
Although, strictly speaking, the primary key is not essential you can update tables
in SQL Enterprise Manager without it - we recommend all tables have a primary key
(except tables that have a high volume of continuous transactions). Especially,
when you have a client like Access, it would help you to avoid the problems.
-
Do you create clustered index on your tables?
You're allowed one clustered index per table, so unless you are never going to query
a table, you may as well choose a field to be part of a clustered index. Basically,
1) Every table should have a clustered index;
2) The clustered index should be a unique clustered index where possible;
3) The clustered index should be on a single column where possible;
So how do you choose the right field? Depending on the usage pattern of a table,
clustered indices should be created. If sets of related records are regularly retrieved
from a table in an application, a clustered index could dramatically improve performance.
For example, in an Order to OrderDetails relationship with OrderID as the joining
key, items in an order are regularly retrieved in a bundle. A clustered index on
the OrderID column in OrderDetails table will improve the performance of the application
significantly.
Another example, if a table is frequently used for reporting, and a date range is
used to define the time scope of the report, a clustered index on the date column
is suitable. In more technical terms, if queries such as
SELECT * FROM ReportTable WHERE ItemDate BETWEEN 1/1/2003 AND 1/2/2003
is executed frequently, ItemDate is a good candidate column for a clustered index.
[more information on Clustered Indexes]
|
SQL Auditor will check this rule and generate a script
to create a clustered index on the columns selected in the wizard
|
-
Most applications do not require the range and precision offered by the DateTime
data type. When was the last time you needed to enter an order past the year of
2079? So you end up with better data integrity. Most business applications never
need dates outside the range of 1900-2079.
More Information:
In addition (I don't really care about this) but I get a smaller database.
DateTime type takes up 8 bytes. It can store dates ranging from January 1, 1753,
to December 31, 9999, with time values rounded to increments of .000, .003, or .007
milliseconds.
A SmallDateTime type takes up only 4 bytes, as a consequence, it can only store
dates ranging from January 1, 1900, through June 6, 2079, with accuracy to the minute.
With a million records each with two date fields, you could save 8MB of storage
space. More space could actually be saved if you have indices on those columns.So
that is about 1 cent worth today :-)
|
SQL Auditor will check this rule and generate a script
to change the datatype of datetime columns to smalldatetime
|
-
Do not use indexes on RowGuid column
RowGuids (uniqueidentifier) are large fields (16 bytes) and are basically going
to ALWAYS be unique.
SQL Server adds a RowGUID column to all tables if you are using Merge Replication
(but doesn't add an index).
RowGuids in general slow things down. Some people may consider using a RowGuid as
their primary key. This is a bad idea because the index is going to be quite slow....
you are searching a large field. It goes without saying, NEVER have clustered index
on a RowGuid column.
Another little annoyance with RowGuids is when you are searching for one. You can't
use > or < on a RowGuid column.
Note: There are not many cases where a RowGuid should have an index on it.
(Exception SSW SQL Total Compare which is a tool that compares data is in sync via
rowguids and this makes it lots faster).
Be aware that SQL server adds this column when you perform merge replication. There
are not many cases where this should have an index on it. An exception is if you
are using our utility SQL Total Compare
|
SQL Auditor will check this rule and generate a script
to drop any indexes on rowguid columns
|
-
Do you have a timestamp column?
The SQL Server timestamp data type has nothing to do with times or dates. SQL Server
timestamps are binary numbers that indicate the relative sequence in which data
modifications took place in a database.
All tables should have a timestamp column to aid concurrency checking. A timestamp
improves update performance because only one column needs to be checked when performing
a concurrency check (instead of checking all columns in a table for changes).
Be aware that when replicating with a SQL Server CE Pocket PC device using SQL server,
a timestamp column is added automatically.
|
SQL Auditor will check this rule and generate a script
to add a timestamp column to tables that do not have a timestamp column
|
-
Use FillFactor of 90% for indexes and constraints
Indexes should generally have a fillfactor of 90%. If the amount of data stored
in the database does not prohibit rebuilding indexes, a fillfactor of 90% should
be maintained to increase performance of inserts.
A table that expects a lot of insert operations could use a lower fillfactor.
|
SQL Auditor will check this rule and generate a script
to change FillFactor to 90% for indexes and constraints
|
-
Do not use table names longer than 24 characters
If a SQL Server table name is longer than 24 characters and is linked to an Access
front-end, characters after the 24th will be truncated
[more...]
-
Do you always have version tracking tables?
We always use two tables for tracking versioning information:
- _zsDataVersion tracks the schema changes, and which update script we are up to.
This helps tremendously in determining which version of the scripts are still required
between development, test, and production databases.
- _zsVersionLatest tracks which version the front-end client should be. This allows
us to give a warning to (or even deny) users who are connecting to the database
while not using the right version of the front-end client.
Please see "Is a Back-end structural
change going to be a hassle?" on our Rules to Successful Projects.
-
Do you validate your denormalized fields correctly?
90% of the databases that SSW works with make use of denormalized fields. We believe
this is with good reason. However, several precautions should be taken to ensure
that the data held within these fields is reliable. This is particularly the case
several applications are updating your denormalized data. To illustrate, let's say
that we want to show all Customers with a calculated field totalling their order
amount (ie Customer.OrderTotal).
With this example in mind, the main reasons we use denormalized fields are:
- reducing development complexity. A denormalized field can mean that all SELECT queries
in the database are simpler. Power users find it easier to use for reporting purposes
- without the need for a cube. In our example, we would not need a large view to
retrieve the data (as below).
SELECT Customer.CustomerID, SUM(SalesOrderDetail.OrderQty * (SalesOrderDetail.UnitPrice
- SalesOrderDetail.UnitPriceDiscount)) AS DetailTotal, Customer.SalesPersonID, Customer.TerritoryID,
Customer.AccountNumber, Customer.CustomerType, Customer.ModifiedDate, Customer.rowguid
FROM Customer INNER JOIN SalesOrderHeader ON Customer.CustomerID = SalesOrderHeader.CustomerID
INNER JOIN SalesOrderDetail ON SalesOrderHeader.SalesOrderID = SalesOrderDetail.SalesOrderID
GROUP BY Customer.CustomerID, Customer.SalesPersonID, Customer.TerritoryID, Customer.AccountNumber,
Customer.CustomerType, Customer.ModifiedDate,Customer.rowguid ORDER BY Customer.CustomerID
Figure: A view to get customer totals when no denormalized fields
are used
If we had a denormalized field, the user or developer would simply have run the
following query:
SELECT Customer.CustomerID, Customer.OrderTotal
AS DetailTotal
FROM Customer
ORDER BY Customer.CustomerID
Figure: Queries are much simpler with denormalized fields
Note that this is not a particularly complicated example. However, you can see why
it can simplify development greatly when working with a large number of tables
- Performance is better for read-intensive reports - particularly when reporting on
data with a cube
- when there a multiple tables in a SQL Server view, they can not be updated in one
hit - they must be updated one table at a time
- It is a built-in validation device. For example, if records are accidentally deleted
directly in the database, there is is still a validation check for the correct totals.
The value of this is mitigated when there is a full audit log on the database
However, there are reasons against using denormalized fields
- they have to be maintained - and can potentially get out of synch. This can makes
them unreliable - particularly if several applications are incorrectly updating
the denormalized fields. UPDATE, INSERT, DELETEs are more complicated as they have
to update the denormalized fields
- they can be seen as an unneccessary waste of space
All in all, we choose to still use denormalized fields because they can save development
time. We do this with some provisos. In particular, they must be validated correctly
to ensure the integrity of the data.
Here is how we ensure that this data is validated:
- Change the description on any denormalized fields to include ";Denormalized" in
the description - ";Denormalized: Sum(OrderTotal) FROM Orders" in description in
Enterprise Manager
- Create a view that lists all the denormalized fields in the database - based on
the decription field.
CREATE VIEW dbo.vwValidateDenormalizedFields
AS
SELECT OBJECT_NAME(id) AS TableName, COL_NAME(id, smallid) AS ColumnName,
CAST([value] AS VARCHAR(8000)) AS Description,
'procValidate_' + OBJECT_NAME(id) + '_' + COL_NAME(id, smallid) as
ValidationProcedureName
FROM dbo.sysproperties
WHERE (name = 'MS_Description') AND (CAST([value] AS VARCHAR(8000))
LIKE '%;Denormalized:%')
Figure: Standard view for validation of denormalized fields validators
- Create a stored procedure (based on the above view) that validates whether all denormalized
fields have a stored procedure that validates the data within them
CREATE PROCEDURE procValidateDenormalizedFieldValidators AS
--Checks whether there is a validator for the denormalized field (DDK 6/6/2005)
SELECT ValidationProcedureName as MissingValidationProcedureName
FROM
vwValidateDenormalizedFields
WHERE
ValidationProcedureName
NOT IN
(
SELECT
ValidationProcedureName
FROM
vwValidateDenormalizedFields
LEFT JOIN
sysobjects
ON vwValidateDenormalizedFields.ValidationProcedureName =OBJECT_NAME(sysobjects.id)
WHERE id IS NOT NULL
Figure: Standard stored procedure for validation of denormalized
fields validators
*Note: Moved to
http://sharepoint.ssw.com.au/Standards/SoftwareDevelopment/RulesToBetterSQLServerSchemaDeployment/Pages/DoYouCheckYourDenormalizedFieldIsStillThereWithprocValidate.aspx
-
Do you avoid using user-schema separation?
User-schema separation is a new feature introduced in SQL 2005.
In SQL 2000
- All objects are owned by users
- If a user is deleted, all these objects must be deleted or have the owner reassigned
- In script the naming convention is databaseName.ownerName.objectName
- You need to update all scripts when a user changes.
User-schema separation solves this problem by adding another level of naming, and
shifting ownership of database objects to the schema, not the user. So, is it worth
doing? Unless you are working with a very large database (100+ tables), the answer
is "no". Most smaller databases have all objects with owner "dbo", which is fine
in most cases.
-

- Figure: AdventureWorks using user schema - instead,
keep it simple and avoid using user schema unnecessarily
-

- Figure: Adventure works with user schema cleaned
out (Good). Much simpler and more readable
-
Do you use triggers for denormalized fields?
I believe that de-normalised fields are not a bad thing. When used properly and
sparingly, they can actually improve your application's performance. As an example:
- I have an Orders table containing one record per order
- I also have an OrderItems table which contains line items linked to the main OrderID,
as well as subtotals for each line item
- In my front end I have a report showing the total for each order
To generate this report, I can either
- Calculate the Order total by summing up every single line item for the selected
Order every time the report is loaded, or
- Store the Order subtotal as a de-normalised field in the Orders table which gets
updated using trigger
The second option will save me an expensive JOIN query each time because I can just
tack the denormalised field onto the end of my SELECT query.
- Code: Alter Orders table
ALTER TABLE Orders
ADD SumOfOrderItems money NULL
- Code: Insert trigger
Alter Trigger tri_SumOfOrderItems
On dbo.OrderItems
For Insert
AS
DECLARE @OrderID varchar (5)
SELECT @OrderID = OrderID from inserted
UPDATE Orders
SET Orders.SumOfOrderItems = Orders.SumOfOrderItems +
(SELECT isnull(SUM(ItemValue),0) FROM inserted WHERE inserted.OrderID = Orders.OrderID)
WHERE Orders.OrderID = @OrderID
- Code: Update trigger
Alter Trigger tru_SumOfOrderItems
On dbo.OrderItems
For Update
AS
DECLARE @OrderID varchar (5)
SELECT @OrderID = OrderID from deleted
--Could have used inserted table
UPDATE Orders
SET Orders.SumOfOrderItems = Orders.SumOfOrderItems
+ (SELECT isnull(SUM(ItemValue),0) FROM inserted WHERE inserted.OrderID = Orders.OrderID)
- (SELECT isnull(SUM(ItemValue),0) FROM deleted WHERE deleted.OrderID = Orders.OrderID)
WHERE Orders.OrderID = @OrderID
- Code: Delete trigger
Alter Trigger trd_SumOfOrderItems
On dbo.OrderItems
For Delete
AS
DECLARE @OrderID varchar (5)
SELECT @OrderID = OrderID FROM deleted
UPDATE Orders
SET Orders.SumOfOrderItems = Orders.SumOfOrderItems -
(SELECT isnull(SUM(ItemValue),0) FROM deleted WHERE deleted.OrderID = Orders.OrderID)
WHERE Orders.OrderID = @OrderID
- Code: Maintenance stored procedure
--Stored Procedure for Maintenance
Alter Procedure dt_Maintenance_SumOfItemValue
As
UPDATE Orders
SET Orders.SumOfOrderItems = Isnull((SELECT SUM (ItemValue) FROM OrderItems WHERE OrderItems.OrderID = Orders.OrderID),0)
-
Do you create a consistent primary key column on
your tables?
Make sure you created a consistent primary key column named [TableName]+"ID" on
your tables.
-
Employee.EmployeeID
-
Figure: good example.
-
Employee.ID, Employee.Employee_Code, Employee.Employee
-
Figure: bad example.
-
Do you use separate lookup tables rather than one large lookup table for your lookup
data?
Advantage: Simplifies ORM Mapping
We prefer multiple lookup tables so they make more sense in ORM tools. E.g. you
could have either:
1. OrderType
Or
2. LookupTable
But when you are obtaining the OrderType for an order, you would have
Either
Order.OrderType.OrderTypeID (Good)
Or
Order.LookupTable.Value (Not great as it is not clear what the nature of the lookup
table is). If you have multiple lookups to the one table, you would need
to do your mappings manually rather than using a tool.
Advantage: Maintains Complete Referential Integrity without the need for triggers
Advantage: Maintains Complete Referential Integrity without the need for triggers
The other advantage of having separate lookup tables rather than one large one is
that referential integrity is maintained.
One issue with having one large table is that you can still enter invalid values
in the Order.OrderTypeID column. E.g. if Order TypeIDs range from 1-3 and CustomerTypeIDs
range from 4 to 10.
If I put OrderTypeID = 10, then I will not get referential integrity errors (even
though I should) because I have entered a value which exists in the lookup table
(even though it is for the wrong type).
If I want to enforce referential integrity so I can only enter the correct type
for my lookup table, then I would need to resort to triggers or a (fallible) coded
data tier.
Advantage: You can add new columns specific to each lookup table
For example, if a Lookup table (e.g. CustomerType) has an associated value (e.g.
the field MaximumDebtAmount), we don't need to add a field that is irrelevant to
all the other lookup tables. We can just add it to the individual lookup table.
Disadvantage: Multiple tables make maintenance slightly more difficult, especially
when making changes directly via Management Studio.
It is simpler to Administer one table than multiple tables, but you can reduce this
problem with a good Generic Administration Page UI.
-
Do you avoid de-normalized fields with computed columns?
We should always use computed columns (in SQL Server 2005 and later they can be
persisted) to avoid these types of denormalized columns.
-
-
Figure: bad example.
-
-
Figure: good example.
Computed columns has some limitations - they cannot access fields in other tables,
or other computed fields in the current table.
We use user defined functions (UDF) to encapsulate our logic in reusable functions,
this allows one computed column to use a function to call another function.
Use the suffix Computed to clearly distinguish that this field is a computed field.
-
ALTER FUNCTION [dbo].[udfEmpTime_TimeTotalComputed]
(
@TimeStart as DateTime,
@TimeEnd as DateTime
)
RETURNS DECIMAL(8,6)
AS
BEGIN
-- This function returns the time difference in hours - decimal(8,6)
RETURN (round(isnull(CONVERT([decimal](8,6),@TimeEnd - @TimeStart,(0))*(24),(0)),(2)))
END
-
Figure: This is the user defined function.
-
-
Figure: Setting up computed column in table designer.
-
Do you add zs prefix to table name?
Any type of table in a database where that does not contain application data should
be called zs. So when the other application (e.g. SSW SQL Deploy) or the programmer
populates the table then it should be called zs (e.g. zsDate - the program populates
it, zsVersion - the programmer populates it).
-
Do your stored procedures return a value indicating
the status?
Make sure your stored procedures always return a value indicating the status. All
stored procedures should return the error number (if an error) or a 0 to indicate
no errors (ie success).
-
Do you standardize on the return values of stored
procedures for success and failures?
Standardize on the return values of stored procedures for success and failures.
-
If you need to return the value of variables, do
you use OUTPUT parameters?
The RETURN statement is meant for returning the execution status only, but not data.
If you need to return value of variables, use OUTPUT parameters. There is a compelling
reason for this - if you use return values rather than output values to return data,
money values that you return will silently be truncated.
For more information, see the following KB article
-
Do you check the global variable @@ERROR after
executing a data manipulation statement?
Always check the global variable @@ERROR immediately after executing a data manipulation
statement (like INSERT/UPDATE/DELETE), so that you can rollback the transaction
in case of an error (@@ERROR will be greater than 0 in case of an error). This is
important, because, by default, SQL Server will not rollback all the previous changes
within a transaction if a particular statement fails. This behaviour can be changed
by executing SET XACT_ABORT ON. The @@ROWCOUNT variable also plays an important
role in determining how many rows were affected by a previous data manipulation
(also, retrieval) statement, and based on that you could choose to commit or rollback
a particular transaction.
-
Do you use SCOPE_IDENTITY() to get the most recent row
identity?
When inserting a row in a stored procedure, always use SCOPE_IDENTITY() if you want
to get the ID of the row that was just inserted. A common error is to use @@IDENTITY,
which returns the most recently created identity for your current connection, not
necessarily the identity for the recently added row in a table. You could have a
situation where there is a trigger that inserts a new record in a Logs Table, for
example, when your Stored Procedure or INSERT SQL Statement inserts a record in
the Orders Table. If you use @@IDENTITY to retrieve the identity of the new order,
you will actually get the identity of the record added into the Log Table and not
the Orders Table, which will create a nasty bug in your data access layer. To avoid
the potential problems associated with someone adding a trigger later on, always
use SCOPE_IDENTITY() to return the identity of the recently added row in your INSERT
SQL Statement or Stored Procedure.
Behold this example from SQL Server Books online.
USE tempdb
GO
CREATE TABLE TZ (
Z_id int IDENTITY(1,1)PRIMARY KEY,
Z_name varchar(20) NOT NULL)
INSERT TZ
VALUES ('Lisa')
INSERT TZ
VALUES ('Mike')
INSERT TZ
VALUES ('Carla')
SELECT * FROM TZ
--Result set: This is how table TZ looks.
Z_id Z_name
-------------
1 Lisa
2 Mike
3 Carla
CREATE TABLE TY (
Y_id int IDENTITY(100,5)PRIMARY KEY,
Y_name varchar(20) NULL)
INSERT TY (Y_name)
VALUES ('boathouse')
INSERT TY (Y_name)
VALUES ('rocks')
INSERT TY (Y_name)
VALUES ('elevator')
SELECT * FROM TY
--Result set: This is how TY looks:
Y_id Y_name
---------------
100 boathouse
105 rocks
110 elevator
/*Create the trigger that inserts a row in table TY
when a row is inserted in table TZ*/
CREATE TRIGGER Ztrig
ON TZ
FOR INSERT AS
BEGIN
INSERT TY VALUES ('')
END
/*FIRE the trigger and determine what identity values you obtain
with the @@IDENTITY and SCOPE_IDENTITY functions.*/
INSERT TZ VALUES ('Rosalie')
SELECT SCOPE_IDENTITY() AS [SCOPE_IDENTITY]
GO
SELECT @@IDENTITY AS [@@IDENTITY]
GO
Notice the difference in the result sets. As you can see, it's crucial that you
understand the difference between the 2 commands in order to get the correct ID
of the row you just inserted.
SCOPE_IDENTITY
4
/*SCOPE_IDENTITY returned the last identity value in the same scope. This was the insert on table TZ.*/
@@IDENTITY
115
/*@@IDENTITY returned the last identity value inserted to TY by the trigger. This fired because of an earlier insert on TZ.*/
-
Do you SET NOCOUNT ON for production and NOCOUNT
OFF off for development/debugging purposes?
You should use SET NOCOUNT ON for production and NOCOUNT OFF off for development/debugging
purposes (i.e. when you want the rowcounts to display as the messages from your
T-SQL).
According to SQL Server Books Online:
"For stored procedures that contain several statements that do not return much actual
data, this can provide a significant performance boost because network traffic is
greatly reduced."
Example: Procedure that returns a scalar value (ClientID generated by an insert
statement) should use OUTPUT keyword (not RETURN) to pass back data. This is how
you should return a generated ClientID from the procedure, and also return a status
value
CREATE PROCEDURE procClientInsert
/*
'----------------------------------------------
' Copyright 2001 SSW
' www.ssw.com.au All Rights Reserved.
' VERSION AUTHOR DATE COMMENT
' 1.0 DDK 17/12/2001
'
'Calling example
'DECLARE @pintClientID int
'DECLARE @intReturnValue int
'exec @intReturnValue = procClientInsert 'TEST Entry',
@pintClientID OUTPUT
'PRINT @pintClientID
'PRINT @intReturnValue
'----------------------------------------------
*/
@pstrCoName varchar (254),
@pintClientID int OUTPUT
AS
--IF ONE THING FAILS, ROLLBACK
SET XACT_ABORT ON
--THE COUNT WILL NOT NORMALLY DISPLAY IN AN APPLICATION IN PRODUCTION.
--GET RID OF IT BECAUSE IT IS EXTRA TRAFFIC, AND CAN CAUSE
PROBLEMS WITH SOME CLIENTS
SET NOCOUNT ON
--Generate a random number
SET @pintClientID = (SELECT CAST(RAND() * 100000000 AS int))
INSERT INTO Client (ClientID, CoName) VALUES (@pintClientID ,
@pstrCoName)
SET XACT_ABORT OFF
IF @@ROWCOUNT = 1
RETURN 0 -- SUCCESS
ELSE
BEGIN
IF @@ERROR=0
RETURN 1 -- FAILURE
ELSE
RETURN @@ERROR -- FAILURE
END
SET NOCOUNT OFF
This procedure will display 0 or the error to indicate success or failure. You should
base you actions on this return code.
This separates return values from actual data so that other programmers know what
to expect.
Note:
If you are using SQL Server stored procedures to edit or delete data using a SqlDataAdapter, make sure that you do not use SET NOCOUNT ON in the stored procedure definition. This causes the rows affected count returned to be zero, which the DataAdapter interprets as a concurrency conflict. In this event, a DBConcurrencyException will be thrown.
-
Do you keep your Stored Procs simple?
If you are using the .NET Framework, put validation and defaults in the middle tier.
The backend should have the required fields (Allow Nulls = False), but no complicated
constraints. The following are examples that work with the Products table (with
an added timestamp field called Concurrency) from Northwind.
- Code: Select Procedure
ALTER PROCEDURE dbo.ProductSelect
@ProductID int
AS
SELECT ProductName, SupplierID, CategoryID, QuantityPerUnit, UnitPrice, UnitsInStock,
UnitsOnOrder, ReorderLevel, Discontinued, Concurrency
FROM Products
WHERE (ProductID= @ProductID)
- Code: Insert Procedure
ALTER PROCEDURE dbo.ProductInsert
@ProductName nvarchar(40),
@SupplierID int,
@CategoryID int,
@QuantityPerUnit nvarchar(20),
@UnitPrice money,
@UnitsInStock smallint,
@UnitsOnOrder smallint,
@ReorderLevel smallint,
@Discontinued bit
AS
INSERT INTO Products (ProductName, SupplierID, CategoryID, QuantityPerUnit, UnitPrice,
UnitsInStock, UnitsOnOrder, ReorderLevel, Discontinued)
VALUES (@ProductName, @SupplierID, @CategoryID, @QuantityPerUnit, @UnitPrice, @UnitsInStock,
@UnitsOnOrder, @ReorderLevel, @Discontinued, 1)
SELECT Scope_Identity() AS [SCOPE_IDENTITY] --If table has identity column
--SELECT @@ROWCOUNT --If table doesn't have identity column
-- Note: The middle tier must check the ROWCOUNT = 1
- Code: Update Procedure
ALTER PROCEDURE dbo.ProductUpdate
@ProductID int,
@ProductName nvarchar(40),
@SupplierID int,
@CategoryID int,
@QuantityPerUnit nvarchar(20),
@UnitPrice money,
@UnitsInStock smallint,
@UnitsOnOrder smallint,
@ReorderLevel smallint,
@Discontinued bit,
@Concurrency timestamp
UPDATE Products
SET ProductName = @ProductName,
SupplierID = @SupplierID,
CategoryID = @CategoryID,
QuantityPerUnit = @QuantityPerUnit,
UnitPrice = @UnitPrice,
UnitsInStock = @UnitsInStock,
UnitsOnOrder = @UnitsOnOrder,
ReorderLevel = @ReorderLevel,
Discontinued = @Discontinued
WHERE (Concurrency = @Concurrency) AND (ProductID= @ProductID) --Note the double criteria to ensure concurrency
SELECT @@ROWCOUNT
-- Note: The middle tier must check the ROWCOUNT = 1
- Code: Delete Procedure
ALTER PROCEDURE dbo.ProductDelete
@ProductID int,
@Concurrency timestamp
AS
DELETE FROM Products
WHERE (ProductID= @ProductID) AND (Concurrency = @Concurrency)
--Note the double criteria to ensure concurrency
SELECT @@ROWCOUNT
--Note: The middle tier must check the ROWCOUNT = 1
-
Do not start user stored procedures with system
prefix "sp_" or "dt_"
System stored procedures are created and stored in the master database and have
the sp_ prefix. System stored procedures can be executed from any database without
having to qualify the stored procedure name fully using the database name master.
It is strongly recommended that you do not create any stored procedures using sp_
as a prefix. SQL Server always looks for a stored procedure beginning with sp_ in
this order:
- The stored procedure in the master database.
- The stored procedure based on any qualifiers provided (database name or owner).
- The stored procedure using dbo as the owner, if one is not specified.
Therefore, although the user-created stored procedure prefixed with sp_ may exist
in the current database, the master database is always checked first, even if the
stored procedure is qualified with the database name.
Important: If any user-created stored procedure has the same name
as a system stored procedure, the user-created stored procedure will never be executed.
|
SQL Auditor will check this rule and generate a script
to replace user-defined stored procedures prefix of 'sp_' or 'dt_' with prefix defined
in Tools > Options
|
-
Use company standard description in your stored
procedures
All stored procedures Must Have Company Description.
-
Do you avoid using SELECT * when inserting data?
Using a statement like "INSERT tableName SELECT * FROM otherTable", makes your stored
procedures vulnerable to failure. Once either of the two tables changs, your stored
procedure won't work. Not only that, when the inserting table has an identity column,
such a statement will cause an error - "An explicit value for the identity column
in table ParaRight can only be specified when a column list is used and IDENTITY_INSERT
is ON."
-
USE [ParaGreg]
GO
/****** Object: StoredProcedure [dbo].[procMove] Script Date: 08/08/2008 12:18:33 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[procMove]
@id AS Char,
@direction AS INT
AS
IF @direction = 0
BEGIN
INSERT INTO ParaRight
SELECT * FROM ParaLeft
WHERE ParaID = @id
DELETE FROM ParaLeft
WHERE ParaID = @id
END
ELSE IF @direction = 1
BEGIN
INSERT INTO ParaLeft
SELECT * FROM ParaRight
WHERE ParaID = @id
DELETE FROM ParaRight
WHERE ParaID = @id
END
-
Bad example: Using SELECT * when inserting data. Besides, this stored procedure
should have an Else section to raise error when no condition is satisfied.
-
USE [ParaGreg]
GO
/****** Object: StoredProcedure [dbo].[procMove] Script Date: 08/08/2008 12:18:33 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[procMove]
@id AS Char,
@direction AS INT
AS
IF @direction = 0
BEGIN
INSERT INTO ParaRight
SELECT Col1,Col2 FROM ParaLeft
WHERE ParaID = @id
DELETE FROM ParaLeft
WHERE ParaID = @id
END
ELSE IF @direction = 1
BEGIN
INSERT INTO ParaLeft
SELECT * FROM ParaRight
WHERE ParaID = @id
DELETE FROM ParaRight
WHERE ParaID = @id
END
ELSE BEGIN PRINT "Please use a correct direction"
END
-
Good example: Using concrete columns instead of * and provide an Else section
to raise errors.
-
Do you use transactions for complicated stored procedures?
A transaction means an atomic operation, it assures that all operations within the
transaction are successful, if not, the transaction will cancel all operations and
roll back to the original state of the database, that means no dirty data and mess
exists in the database, so if a stored procedure has many steps, and each step has
relation with other steps, it is strongly recommended that you encapsulate the procedure
in a transaction.
-
ALTER PROCEDURE [dbo].[procInit]
AS
DELETE ParaLeft
DELETE ParaRight
INSERT INTO ParaLeft (ParaID)
SELECT ParaID FROM Para
-
Bad example: No transaction here, if any of operations fail, the database
will only partially update, resulting in an unwanted result.
-
ALTER PROCEDURE [dbo].[procInit]
AS
BEGIN TRANSACTION
DELETE ParaLeft
DELETE ParaRight
INSERT INTO ParaLeft (ParaID)
SELECT ParaID FROM Para
COMMIT
-
Good example: Using a transaction to assure that all operations within the
transaction will be successful, otherwise, the database will roll back to original
state.
-
Do you know SQL stored procedure names should be prefixed with the owner?
Always specify the schema prefix when creating stored procedures. This way you know that it will always be dbo.procedure_name no matter who is logged in when it is created.
There are 2 other benefits to including the schema prefix on all object references:
- This prevents the database engine from checking for an object under the users schema first
- Also avoids the issue where multiple plans are cached for the exact same statement/batch just because they were executed by users with different default schemas
Aaron Bertrand agrees with this rule - My stored procedure "best practices" checklist .
-
CREATE PROCEDURE procCustomer_Update
@CustomerID INT,
…..
BEGIN
- Figure: Bad example
-
CREATE PROCEDURE dbo.procCustomer_Update
@CustomerID INT,
…..
BEGIN
- Figure: Good example