Its pretty often that I run across SQL code that supposed to handle empty or
null varchar values. But a lot of the time it doesn't. And usually its because a
developer has made the false assumption that null has a length of 0. It doesn't.
LEN(NULL) IS NULL.
LEN(NULL) does not = anything.
LEN(NULL) does not != anything.
If you want to handle rows where there is a possibility of the value being
null or an empty string, you have to handle both conditions:
WHERE column IS NULL or LEN(column) = 0
Here's an example to showing you LEN(column) = 0 wont return null rows.
-- Create a table with nullable varchar column
CREATE TABLE dbo.lentest (
col1 VARCHAR(32) NULL,
col2 VARCHAR(128) NULL);
-- Insert some mixed value rows
INSERT INTO dbo.lentest
(col1,
col2)
VALUES ('Some text',
'A String of Text');
INSERT INTO dbo.lentest
(col1,
col2)
VALUES ('', 'An empty String');
INSERT INTO dbo.lentest
(col1,
col2)
VALUES (NULL,
'Row with a null Value');
-- Notice the Col1Length values ? -- LEN(NULL) is actually NULL SELECT col1,
Len(col1) AS col1length,
col2
FROM dbo.lentest;
-- This only returns one row
SELECT col1,
Len(col1) AS col1length,
col2
FROM dbo.lentest
WHERE Len(col1) = 0;
-- and the negative logic also returns only one row SELECT col1,
Len(col1) AS col1length,
col2
FROM dbo.lentest
WHERE Len(col1) != 0;
-- hey! what happened to that third row!? SELECT col1,
Len(col1) AS col1length,
col2
FROM dbo.lentest
WHERE Len(col1) IS NULL ;
-- the above is really the same as saying SELECT col1,
Len(col1) AS col1length,
col2
FROM dbo.lentest
WHERE col1 IS NULL ;
-- so to get all the rows that have a null or empty value SELECT col1,
Len(col1) AS col1length,
col2
FROM dbo.lentest
WHERE col1 IS NULL OR Len(col1) = 0;
Michael Sutton has
run a test to see how many websites have SQL injection issues and finds that
11% of the websites he tests are potentially vulnerable.
The article is well written in that he discusses the methods used to select
urls for testing, the change he uses to attempt a non-destructive SQL injection,
and how he determines a vulnerability exists.
Michael isn't just testing SQL Server (read the Comments as well) the method
will work with just about any database driven website.
You should remember that SQL Injection Attacks can occur whenever the
application layer does not correctly verify and restrict input parameters. In
the case of the websites Michael found vulnerabilities in, there are actually
two big issues:
User input was not correctly tested and typecast (id values should have
been converted to a number, single quotes should have been escaped)
Error messages were printed out for the user to see (an error handling
routine should have hidden this)
In the case of the IIS Webservice in Windows you can create a default 500:100
error page which is displayed whenever an error occurs in your web application.
This page should not show the error message to the user, just a polite "sorry we
had a problem, please try again". The page should also alert the website
administrator of the error and full details should be recorded so the problem
can be correctly handled in the future. Using a 500:100 page can help catch any
errors a developer may have missed.
Spitting out an internal error message from SQL or any programming language
is going to confuse a user. It also tells hackers information about your servers
you really don't want them to know.
SQL Server's MONEY data type is where you should be looking first if you are storing something like prices and order totals. But what about pesky GST and VAT values? These sometimes need to extend out to 6, 8 or even more decimal places in order to calculate the correct tax inclusive price.
Along comes the DECIMAL data type. It's configurable in that you get to choose how many digits maximum are in your number, and how many of those are after the decimal place.
From SQL Server Books Online "Money Data Type, Overview":
Monetary data can be stored to an accuracy of four decimal places. ... If a greater number of decimal places are required, use the decimal data type instead.
DECIMAL is an acceptable substituate for the MONEY / SMALLMONEY data types, as it represents an exact number. DECIMAL is synonymous with the NUMERIC and DEC datatypes.
The DECIMAL data type is defined as:
DECIMAL(precision,scale) Precision is the total number of digits allowed in the number, both left and right of the decimal point. Scale is the maximum number of digits allowed to the right of the decimal point. (The number of decimal places).
NEVER USE FLOAT OR REAL DATATYPES FOR MONETARY VALUES! Both FLOAT and REAL are approximate number data types. The number you put in, and the number you get out will almost never be the same.
Here are some example of how DECIMAL can be used:
-- 4 decimal places
SELECT CAST(1.1234567890123456789 AS DECIMAL(22,4))
------------------------
1.1235
-- 8 decimal places
SELECT CAST(1.1234567890123456789 AS DECIMAL(22,8))
------------------------
1.12345679
-- 12 decimal places
SELECT CAST(1.1234567890123456789 AS DECIMAL(22,12))
------------------------
1.123456789012
One of the tasks regularly performed in ETL processes is using BCP or BULK INSERT to import large amounts of data into a empty database. You might be using these tools to import a single file into a table, or many files into a table. Either way, SQL Server is going to want to update the statistics used by the SQL Server Query Optimiser for decision making.
Its important to keep up-to-date statistics for the optimiser to use, but if your ETL process calls for populating an empty database before using the data, you may get a performance increase from turning off automatic creation / updating of statistics. Why? Because SQL Server could decide to create/update stats in the background while you are populating tables, thus degrading BCP / BULK INSERT performance. This problem would be most obvious if you import multiple files into a table. After each import, SQL Server would determine the statistics to be out-of-date and potentially update them in between your BCP / BULK INSERT calls.
So how can you optimise your database for BCP / BULK INSERTs and still have stats for your queries post-population ?
Easy: Turn off Auto-create stats and Auto-update stats after creating the database, and use sp_createstats to create statistics afterwards.
Turning off Auto Create Stats and Auto Update Stats
In SQL Server 2000
Note: SQL Server 2005 documentation stats the sp_dboption feature will be removed from future versions of SQL Server, and suggests using ALTER DATABASE instead.
In SQL Server 2005
ALTER DATABASE databasename SET AUTO_CREATE_STATISTICS OFF WITH NO_WAIT
ALTER DATABASE databasename SET AUTO_UPDATE_STATISTICS OFF WITH NO_WAIT
Generating statistics after import process is complete
In SQL Server 2000 and SQL Server 2005
-- to generate statistics quickly using a sample of data from the table
exec sp_createstats
-- to generate statistics using a full scan of the table
exec sp_createstats @fullscan = 'fullscan'
With the statistics created, you may want to turn the auto stats options back on. Your database should now be ready for querying.
Applies to: SQL Server 2000, SQL Server 2005 (Yukon)
It's pretty often I see the need for a database to be renames. Sometimes its because a developer has given their database a very generic name, or sometimes becuase the use of the database has shifted.
Renaming a database is a pretty easy task (make sure you have renamed the physical files to reflect the new database name).
Restoring a database will also see it using the original logical filenames.
Most often however, people forget to change the logical filename for their database files. While this doesn't have any real implications, it can cause some confusion.
Say for example you create a database Flowers_Dev02 by restoring a backup of Flowers_Dev01. If you were to run select * from sysfiles on both databases, you might get something like:
- Don't start column names with FK_
- Always name a FK column the same as the PK
I happened to glance a database model today that had a column naming scheme that scared me: column names that started with FK_.
I'm pretty sure the designer was intending to show that the column was actually used as part of a Foreign Key (FK), however the only object names that should start with FK_ are infact foreign keys. Having other types of objects starting with FK_ could create a lot of confusion, especially for developers who might be left wondering if they are supposed to populate this column or not.
This led directly into the next thing I noticed in the diagram: columns being renamed across tables. I'm talking about a Primary Key (PK) column in tableA called TableAID being used as the FK in tableB, but being called LinkToTableAPK. This makes it hard for someone glancing the database diagram to see the relationship between these columns. Whatever the column is named in TableA, it should be called in TableB.
Wherever you need to repeat or copy a column from one table to the next you should use the same name. Unless of course the data type is different, or it's meaning is different: in which case it's not the same column anyway.
Keeping the names the same, and not using FK_ in column names will keep your database diagram simple and easy to read.
We have over 13 years experience designing and building websites. We can handle websites of all sizes, from a simple web presence to a fully customised database driven website designed to handle thousands of customers a minute.
We can create you a custom website from scratch - that meets your needs as a business. And we can update your existing website if you wish.
Our design services can produce stunning website, print and logo designs that will create a recognisable brand for your business and invoke customer loyalty.
Our aim is to increase your sales and increase your customers through brand recognition and targeted marketing