Welcome to Julian Kuiters Friday, October 19 2018 @ 09:00 AM AEDT

LEN(NULL) != 0

Database Design

It's pretty often that I run across SQL code that's supposed to handle empty or NULL varchar values. But a lot of the time it doesn't. And usually it's because a developer has made the false assumption that NULL has a length of 0. It doesn't.


LEN(NULL) does not = anything.
LEN(NULL) does not != anything.
LEN(NULL) is actually NULL, and comparing NULL with = or != always evaluates to UNKNOWN, never TRUE.

If you want to handle rows where there is a possibility of the value being null or an empty string, you have to handle both conditions:

WHERE column IS NULL OR LEN(column) = 0

Here's an example showing that LEN(column) = 0 won't return NULL rows.

-- Create a table with nullable varchar column
CREATE TABLE dbo.lentest (
col1 VARCHAR(32) NULL,
col2 VARCHAR(128) NULL);

-- Insert some mixed value rows
INSERT INTO dbo.lentest (col1, col2) VALUES ('Some text', 'A String of Text');
INSERT INTO dbo.lentest (col1, col2) VALUES ('', 'An empty String');
INSERT INTO dbo.lentest (col1, col2) VALUES (NULL, 'Row with a null Value');

-- Notice the col1length values?
-- LEN(NULL) is actually NULL
SELECT col1,
Len(col1) AS col1length
FROM dbo.lentest;

-- This only returns one row
SELECT col1,
Len(col1) AS col1length
FROM dbo.lentest
WHERE Len(col1) = 0;

-- and the negative logic also returns only one row
SELECT col1,
Len(col1) AS col1length
FROM dbo.lentest
WHERE Len(col1) != 0;

-- hey! what happened to that third row!?
SELECT col1,
Len(col1) AS col1length
FROM dbo.lentest
WHERE Len(col1) IS NULL ;

-- the above is really the same as saying
SELECT col1,
Len(col1) AS col1length
FROM dbo.lentest
WHERE col1 IS NULL;

-- so to get all the rows that have a null or empty value
SELECT col1,
Len(col1) AS col1length
FROM dbo.lentest
WHERE col1 IS NULL OR Len(col1) = 0;
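If you'd rather not write two conditions, one alternative is to coalesce NULL to an empty string before taking the length. This is a sketch against the dbo.lentest sample table from above:

```sql
-- One condition that covers both NULL and empty values:
-- COALESCE turns NULL into '', so LEN() can never return NULL here
SELECT col1,
       LEN(col1) AS col1length
FROM dbo.lentest
WHERE LEN(COALESCE(col1, '')) = 0;
```

Bear in mind that wrapping the column in a function like this can prevent the optimiser from using an index on col1, so the explicit two-condition WHERE clause is usually the safer habit.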


How Common Are SQL Injection Vulnerabilities?

Database Design

Michael Sutton has run a test to see how many websites have SQL injection issues and finds that 11% of the websites he tests are potentially vulnerable.

The article is well written in that he discusses the methods used to select URLs for testing, the query modification he uses to attempt a non-destructive SQL injection, and how he determines a vulnerability exists.

Michael isn't just testing SQL Server (read the comments as well); the method will work with just about any database-driven website.

You should remember that SQL Injection Attacks can occur whenever the application layer does not correctly verify and restrict input parameters. In the case of the websites Michael found vulnerabilities in, there are actually two big issues:

  1. User input was not correctly tested and typecast (id values should have been converted to a number, single quotes should have been escaped)
  2. Error messages were printed out for the user to see (an error handling routine should have hidden this)
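As a sketch of fixing issue 1, T-SQL itself offers parameterised execution through sp_executesql; the table and column names below are hypothetical:

```sql
-- The id value travels as a typed parameter, not as part of the query text,
-- so input like '1; DROP TABLE products' can never alter the statement
DECLARE @id INT;
SET @id = 42;  -- in a real app this comes from validated, typecast user input
EXEC sp_executesql
    N'SELECT name, price FROM dbo.products WHERE id = @id',
    N'@id INT',
    @id = @id;
```

The same idea applies at the application layer: use the parameterised query support of your data access library rather than concatenating user input into SQL strings.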

In the case of the IIS web server in Windows you can create a default 500:100 error page which is displayed whenever an error occurs in your web application. This page should not show the error message to the user, just a polite "sorry, we had a problem, please try again". The page should also alert the website administrator of the error, and full details should be recorded so the problem can be correctly handled in the future. Using a 500:100 page can help catch any errors a developer may have missed.

Spitting out an internal error message from SQL or any programming language is going to confuse a user. It also tells hackers information about your servers you really don't want them to know.

When MONEY is not enough

Database Design

SQL Server's MONEY data type is where you should be looking first if you are storing something like prices and order totals. But what about pesky GST and VAT values? These sometimes need to extend out to 6, 8 or even more decimal places in order to calculate the correct tax inclusive price.

Along comes the DECIMAL data type. It's configurable in that you get to choose how many digits maximum are in your number, and how many of those are after the decimal place.

From SQL Server Books Online "Money Data Type, Overview":

Monetary data can be stored to an accuracy of four decimal places.
If a greater number of decimal places are required, use the decimal data type instead.

DECIMAL is an acceptable substitute for the MONEY / SMALLMONEY data types, as it represents an exact number. DECIMAL is synonymous with the NUMERIC and DEC data types.

The DECIMAL data type is defined as DECIMAL(precision, scale):

Precision is the total number of digits allowed in the number, both left and right of the decimal point.
Scale is the maximum number of digits allowed to the right of the decimal point (the number of decimal places).

NEVER USE FLOAT OR REAL DATATYPES FOR MONETARY VALUES! Both FLOAT and REAL are approximate number data types. The number you put in, and the number you get out will almost never be the same.

Here are some examples of how DECIMAL can be used:

-- 4 decimal places
SELECT CAST(1.1234567890123456789 AS DECIMAL(22,4))


-- 8 decimal places
SELECT CAST(1.1234567890123456789 AS DECIMAL(22,8))


-- 12 decimal places
SELECT CAST(1.1234567890123456789 AS DECIMAL(22,12))
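To see why FLOAT is dangerous for money, compare an approximate and an exact type side by side. This is a sketch; the exact FLOAT value displayed will depend on how your client formats it:

```sql
-- FLOAT stores a binary approximation of 0.1; DECIMAL stores it exactly
DECLARE @f FLOAT, @d DECIMAL(10, 4);
SET @f = 0.1;
SET @d = 0.1;
SELECT @f + @f + @f AS float_sum,    -- approximately 0.3, but not exactly
       @d + @d + @d AS decimal_sum;  -- exactly 0.3000
```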


Improving BCP / BULK INSERT Performance by Disabling Auto Statistics

Database Design

One of the tasks regularly performed in ETL processes is using BCP or BULK INSERT to import large amounts of data into an empty database. You might be using these tools to import a single file into a table, or many files into a table. Either way, SQL Server is going to want to update the statistics used by the SQL Server Query Optimiser for decision making.

It's important to keep up-to-date statistics for the optimiser to use, but if your ETL process calls for populating an empty database before using the data, you may get a performance increase from turning off automatic creation / updating of statistics. Why? Because SQL Server could decide to create/update stats in the background while you are populating tables, thus degrading BCP / BULK INSERT performance. This problem would be most obvious if you import multiple files into a table. After each import, SQL Server would determine the statistics to be out-of-date and potentially update them in between your BCP / BULK INSERT calls.

So how can you optimise your database for BCP / BULK INSERTs and still have stats for your queries post-population?

Easy: Turn off Auto-create stats and Auto-update stats after creating the database, and use sp_createstats to create statistics afterwards.

Turning off Auto Create Stats and Auto Update Stats In SQL Server 2000

exec sp_dboption N'databasename', 'auto create statistics', 'false'

exec sp_dboption N'databasename', 'auto update statistics', 'false'

Note: SQL Server 2005 documentation states the sp_dboption feature will be removed from future versions of SQL Server, and suggests using ALTER DATABASE instead.

In SQL Server 2005

ALTER DATABASE databasename SET AUTO_CREATE_STATISTICS OFF

ALTER DATABASE databasename SET AUTO_UPDATE_STATISTICS OFF

Generating statistics after the import process is complete In SQL Server 2000 and SQL Server 2005

-- to generate statistics quickly using a sample of data from the table
exec sp_createstats

-- to generate statistics using a full scan of the table
exec sp_createstats @fullscan = 'fullscan'

With the statistics created, you may want to turn the auto stats options back on. Your database should now be ready for querying.
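Assuming you want to return to the default behaviour, the SQL Server 2005 syntax to switch the automatic statistics options back on is:

```sql
-- Re-enable automatic statistics once the bulk load is finished
ALTER DATABASE databasename SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE databasename SET AUTO_UPDATE_STATISTICS ON;
```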

More performance tips and details about statistics can be found at Sql-Server-Performance.com.

Changing a Database File Logical Name

Database Design

Applies to: SQL Server 2000, SQL Server 2005 (Yukon)

It's pretty often I see the need for a database to be renamed. Sometimes it's because a developer has given their database a very generic name, or sometimes because the use of the database has shifted.

Renaming a database is a pretty easy task (make sure you have renamed the physical files to reflect the new database name).

Restoring a database under a new name will also leave it using the original logical filenames.

Most often however, people forget to change the logical filename for their database files. While this doesn't have any real implications, it can cause some confusion.

Say for example you create a database Flowers_Dev02 by restoring a backup of Flowers_Dev01. If you were to run SELECT * FROM sysfiles on both databases, you might get something like:

-- Flowers_Dev01
fileid ... name                 filename
------     ------------------   --------------------
1          Flowers_Dev01_Data   f:\Flowers_Dev01.mdf
2          Flowers_Dev01_Log    f:\Flowers_Dev01.ldf

-- Flowers_Dev02 (restored copy; the logical names still refer to Dev01)
fileid ... name                 filename
------     ------------------   --------------------
1          Flowers_Dev01_Data   f:\Flowers_Dev02.mdf
2          Flowers_Dev01_Log    f:\Flowers_Dev02.ldf

Renaming the logical file name is a simple task, and doesn't require any downtime of the database. The syntax of the T-SQL command is simply:

alter database database_name modify file (name = current_logical_name, newname = new_logical_name)

So in our case we would simply run the following to fix our logical database file names:

alter database Flowers_Dev02 modify file (name = Flowers_Dev01_Data, newname = Flowers_Dev02_Data)
alter database Flowers_Dev02 modify file (name = Flowers_Dev01_Log, newname = Flowers_Dev02_Log)
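To confirm the rename took effect, you can query sysfiles again from inside the renamed database:

```sql
-- The logical names should now match the database name
USE Flowers_Dev02;
SELECT fileid, name, filename FROM sysfiles;
```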

Column Confusion

Database Design

- Don't start column names with FK_
- Always name a FK column the same as the PK

I happened to glance at a database model today that had a column naming scheme that scared me: column names that started with FK_.

I'm pretty sure the designer was intending to show that the column was actually used as part of a Foreign Key (FK); however, the only object names that should start with FK_ are in fact the foreign keys themselves. Having other types of objects starting with FK_ could create a lot of confusion, especially for developers who might be left wondering if they are supposed to populate this column or not.

This led directly to the next thing I noticed in the diagram: columns being renamed across tables. I'm talking about a Primary Key (PK) column in TableA called TableAID being used as the FK in TableB, but being called LinkToTableAPK. This makes it hard for someone glancing at the database diagram to see the relationship between these columns. Whatever the column is named in TableA, that is what it should be called in TableB.

Wherever you need to repeat or copy a column from one table to the next, you should use the same name. Unless of course the data type is different, or its meaning is different: in which case it's not the same column anyway.
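As a sketch of both rules, here are two hypothetical tables: the FK column in dbo.OrderItem keeps exactly the same name as the PK column it references, and only the foreign key constraint itself carries the FK_ prefix:

```sql
CREATE TABLE dbo.ProductOrder (
    ProductOrderID INT NOT NULL PRIMARY KEY);

CREATE TABLE dbo.OrderItem (
    OrderItemID    INT NOT NULL PRIMARY KEY,
    -- same name as the PK column it references; no FK_ prefix on the column
    ProductOrderID INT NOT NULL
        CONSTRAINT FK_OrderItem_ProductOrder
        FOREIGN KEY REFERENCES dbo.ProductOrder (ProductOrderID));
```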

Keeping the names the same, and not using FK_ in column names will keep your database diagram simple and easy to read.