Monday, September 25, 2006

Loading Legacy Data Into Parent/Child Tables With Identities

This is one of those cases where the OUTPUT clause comes in very handy.

CREATE TABLE Staging(EmployeeCode CHAR(30), FullName VARCHAR(30), PhoneNumber VARCHAR(30))
INSERT INTO Staging VALUES('ABC1234567890XYZ0987654321', 'Doe, John', '(123)456-7890 home')
INSERT INTO Staging VALUES('ABC1234567890XYZ0987654321', 'Doe, John', '(312)456-7890 cell')
INSERT INTO Staging VALUES('ABD123FHSDJKFH367FHASJ4321', 'Doe, Jane', '(567)456-7891 home')
go

CREATE TABLE Party(PartyID INT IDENTITY PRIMARY KEY, EmployeeCode CHAR(30), FullName VARCHAR(30))
CREATE TABLE Phone(PartyID INT NOT NULL, PhoneNumber VARCHAR(30))
go

DECLARE @InsertedIds TABLE(PartyID INT, EmployeeCode CHAR(30))


INSERT Party(EmployeeCode, FullName)
OUTPUT INSERTED.PartyID, INSERTED.EmployeeCode INTO @InsertedIds
SELECT DISTINCT EmployeeCode, FullName
FROM Staging


SELECT * FROM Party


PartyID     EmployeeCode                   FullName
----------- ------------------------------ ------------------------------
1           ABC1234567890XYZ0987654321     Doe, John
2           ABD123FHSDJKFH367FHASJ4321     Doe, Jane


SELECT PartyID, EmployeeCode FROM @InsertedIds


PartyID EmployeeCode
----------- ------------------------------
1 ABC1234567890XYZ0987654321
2 ABD123FHSDJKFH367FHASJ4321


INSERT Phone(PartyID, PhoneNumber)
SELECT
--(SELECT PartyID FROM @InsertedIds i WHERE Staging.EmployeeCode = i.EmployeeCode)
  PartyID,
  PhoneNumber
FROM Staging JOIN @InsertedIds i ON Staging.EmployeeCode = i.EmployeeCode


SELECT * FROM Phone


PartyID PhoneNumber
----------- ------------------------------
1 (123)456-7890 home
1 (312)456-7890 cell
2 (567)456-7891 home
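In a real load one would probably also wrap both inserts in a single transaction, so that the parent and child rows commit or roll back together. A minimal sketch, with error handling kept deliberately simple:

```sql
BEGIN TRY
  BEGIN TRANSACTION

  DECLARE @InsertedIds TABLE(PartyID INT, EmployeeCode CHAR(30))

  -- load the parents, capturing the generated identities
  INSERT Party(EmployeeCode, FullName)
  OUTPUT INSERTED.PartyID, INSERTED.EmployeeCode INTO @InsertedIds
  SELECT DISTINCT EmployeeCode, FullName
  FROM Staging

  -- load the children, resolving PartyID via the captured identities
  INSERT Phone(PartyID, PhoneNumber)
  SELECT i.PartyID, s.PhoneNumber
  FROM Staging s JOIN @InsertedIds i ON s.EmployeeCode = i.EmployeeCode

  COMMIT TRANSACTION
END TRY
BEGIN CATCH
  -- either both tables get the data or neither does
  ROLLBACK TRANSACTION
END CATCH
```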

Tuesday, September 19, 2006

When Eliminating Cursors Hurts Performance

Not every non-cursor solution is performant. There are at least two different situations:

1. You open a cursor and use it to modify the underlying table.
2. You open a cursor, do something, but you do not touch the underlying
table at all.


We need to distinguish between these two cases: the popular rule of
thumb that "cursors should be avoided whatever it takes" does not
always apply in case 2.
Have a look at this simple script (the table Numbers stores the numbers
from 1 to 10000):


DROP PROCEDURE CursorTest
go
CREATE PROCEDURE CursorTest
AS
DECLARE @KeepLooping INT, @n INT, @d1 DATETIME, @d2 DATETIME
DECLARE Test1 CURSOR LOCAL FORWARD_ONLY
FOR SELECT Number
FROM dbo.Numbers
ORDER BY Number


SET @KeepLooping = 1


SET @d1 = GETDATE()


OPEN Test1


WHILE @KeepLooping = 1 BEGIN
FETCH Test1 INTO @n
IF @@FETCH_STATUS <> 0 BEGIN
SET @KeepLooping = 0
END
END


SET @d2 = GETDATE()
SELECT 'CursorTest' testname, DATEDIFF(ms, @d1, @d2) duration


CLOSE Test1
DEALLOCATE Test1
go
DROP PROCEDURE LoopTest
go
CREATE PROCEDURE LoopTest
AS
DECLARE @n INT, @d1 DATETIME, @d2 DATETIME


SET @n = -1
SET @d1 = GETDATE()


WHILE @n IS NOT NULL BEGIN
SELECT @n = MIN(Number)
FROM dbo.Numbers WHERE Number > @n
END
SET @d2 = GETDATE()
SELECT 'LoopTest' testname, DATEDIFF(ms, @d1, @d2) duration
go
EXEC LoopTest
EXEC CursorTest
EXEC LoopTest
EXEC CursorTest
EXEC LoopTest
EXEC CursorTest


testname duration
-------- -----------
LoopTest 30000


(1 row(s) affected)


testname duration
---------- -----------
CursorTest 106


(1 row(s) affected)


testname duration
-------- -----------
LoopTest 32190


(1 row(s) affected)


testname duration
---------- -----------
CursorTest 106


(1 row(s) affected)


testname duration
-------- -----------
LoopTest 29736


(1 row(s) affected)


testname duration
---------- -----------
CursorTest 106


(1 row(s) affected)

How To Quickly Verify That A Date Has No Time Part

I did some quick benchmarking. I always knew that the DATEDIFF approach should
be faster than converting to CHAR, but I had no idea it was that much faster (at least 7 times):


DECLARE @d1 DATETIME, @d2 DATETIME, @i INT, @cnt INT
DECLARE @d TABLE(ddd DATETIME)
SET NOCOUNT ON


SET @i = 0
WHILE @i<100000 BEGIN
INSERT @d VALUES('20060101')
SET @i = @i + 1
END
SET @d1 = GETDATE()
SET @i = (SELECT COUNT(*) FROM @d WHERE (convert(char(8), ddd, 112) = ddd))
SET @d2 = GETDATE()
SELECT DATEDIFF(ms, @d1, @d2), 'char'


SET @d1 = GETDATE()
SET @i = (SELECT COUNT(*) FROM @d WHERE (dateadd(d, datediff(d, '1990-01-01', ddd), '1990-01-01') = ddd))
SET @d2 = GETDATE()
SELECT DATEDIFF(ms, @d1, @d2), 'datediff'

----------- ----
346 char
----------- --------
46 datediff
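For completeness, the same DATEDIFF trick works as a standalone predicate or in a CHECK constraint, so the rule "no time-of-day part" can be enforced declaratively. A sketch (the table and constraint names here are made up for illustration):

```sql
-- Hypothetical table, for illustration only.
-- A value has no time part exactly when it survives a round trip
-- through whole-day DATEDIFF/DATEADD unchanged.
CREATE TABLE Meetings(
  MeetingDate DATETIME
    CONSTRAINT MeetingDateHasNoTime
    CHECK (DATEADD(d, DATEDIFF(d, '19900101', MeetingDate), '19900101') = MeetingDate)
)
```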

Monday, September 18, 2006

Yet Another Index Covering Tip

Suppose you have a table:

create table a(email_alias varchar(20), full_name varchar(20), /*many other columns*/)

Suppose you need to make sure email_alias values are unique. Obviously you will create a unique index on email_alias. Suppose you also need an index on (email_alias, full_name) to cover several frequently run queries. In SQL Server 2005 one and the same index can accomplish both goals: use the new INCLUDE option of the CREATE INDEX statement:

create unique index a1 on a(email_alias) include(full_name)

Both columns will be stored in the index, which will guarantee index covering, but the uniqueness of email_alias will be also preserved:

--- succeeds
insert into a values('jsmith', 'Jack Smith')
go
--- fails
insert into a values('jsmith', 'Jared Smith')

Msg 2601, Level 14, State 1, Line 1
Cannot insert duplicate key row in object 'dbo.a' with unique index 'a1'.
The statement has been terminated.

It is very important to keep the number of indexes as low as possible, and the new INCLUDE option comes in very handy in accomplishing that goal.
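For instance, a frequently run lookup such as the following (a hypothetical query, for illustration) can be satisfied entirely from index a1, without touching the base table:

```sql
-- both columns live in index a1, so no bookmark lookup is needed
select email_alias, full_name
from a
where email_alias = 'jsmith'
```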

Mimicking MERGE Statement in SQL Server 2005

Suppose you have a table

create table permanent(id int, d float, comment varchar(15))
go
insert into permanent values(1, 10., 'Original Row')
insert into permanent values(2, 10., 'Original Row')
insert into permanent values(3, 10., 'Original Row')
go

And you want to modify it with incoming values from a staging table

create table staging(id int, d float)
go
insert into staging values(2, 15.)
insert into staging values(3, 15.)
insert into staging values(4, 15.)
go

You want to modify rows 2 and 3 and to add a new row 4. In DB2 and Oracle there is a very convenient MERGE statement which does precisely this. In SQL Server 2005 you can use the OUTPUT clause of an UPDATE statement:

go
declare @updated_ids table(id int)
update p set d = s.d, comment = 'Modified Row'
output inserted.id into @updated_ids
from permanent p join staging s on p.id = s.id

insert into permanent
select id, d, 'New Row' from staging where id not in(select id from @updated_ids)
go
select * from permanent
go
id          d                      comment
----------- ---------------------- ---------------
1           10                     Original Row
2           15                     Modified Row
3           15                     Modified Row
4           15                     New Row

Populating a Column with Random Values

Suppose you have a table:

create table t(i int, d float)
insert into t values(1, 0.0)
insert into t values(2, 0.0)
insert into t values(3, 0.0)
insert into t values(4, 0.0)

Suppose you want to populate d with random values:

update t set d = rand()

Unfortunately it will assign one and the same value to all the rows, because RAND() is evaluated only once per query:

select * from t

i           d
----------- ----------------------
1 0.430774201738828
2 0.430774201738828
3 0.430774201738828
4 0.430774201738828
(4 row(s) affected)


Here is the right way to do it:

create view wrapped_rand_view
as
select rand( ) as random_value
go
create function wrapped_rand()
returns float
as
begin
declare @f float
set @f = (select random_value from wrapped_rand_view)
return @f
end
go
update t set d = dbo.wrapped_rand()

select * from t

i d
----------- ----------------------
1 0.665934939858962
2 0.99919527046644
3 0.844039650356367
4 0.146401486624627
(4 row(s) affected)
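For completeness, another commonly used workaround, not part of the original tip: derive a pseudo-random float from NEWID(), which, unlike RAND(), is evaluated once per row, so no wrapper view or function is needed. A sketch (the precision of the result is limited by the chosen modulus):

```sql
-- NEWID() yields a fresh GUID per row; CHECKSUM folds it into an integer,
-- and the modulo/ABS/division maps it into the range [0, 1)
update t set d = abs(checksum(newid()) % 1000000) / 1000000.0
```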

Sunday, September 17, 2006

My DB2 UDB articles

1. When We Think That the Optimizer Doesn't Get It Right
Some simple tips to make sure the optimizer makes good decisions for your queries.

http://www.devx.com/ibm/Article/11179/0

2. Essential Facts about Index Covering in DB2 Universal Database
Examples to illustrate cases in which adding a column or two to an index can boost a query's performance by "covering" the columns used in a SELECT statement. The DB2 optimizer is very good at recognizing such situations and can choose a very efficient execution plan.

http://www-128.ibm.com/developerworks/db2/library/techarticle/0303kuznetsov/0303kuznetsov.html

3. Using DB2 UDB OLAP functions
Online analytical processing (OLAP) functions are very flexible and powerful. Using them, you may come up with simple solutions to some problems that would otherwise require either iteration through one or several cursors or recursion. In some other cases it is much easier to write a query using OLAP functions or auxiliary tables than to write an equivalent query not using them.

http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0401kuznetsov/index.html

4. Using Materialized Query Tables to Speed up Queries in DB2 UDB
Sometimes a simple change in physical database structure dramatically improves query performance. In addition to indexes, DB2 UDB provides you with materialized query tables, which in some cases are more efficient than indexes. This article provides ...

http://www-128.ibm.com/developerworks/db2/library/techarticle/0208kuznetsov/0208kuznetsov.html

My MS SQL Server Articles


1. Indexes on Computed Columns: Speed Up Queries, Add Business Rules
Using an index on a computed column, you can implement complex business rules or just give your SQL Server queries a performance boost.

http://www.devx.com/dbzone/Article/30786/0

2. Avoid These Common Business Rule Implementation Mistakes
Some solutions for implementing business rules can allow invalid data to get into the database. Find out where the loopholes lie in these all-too-common approaches.

http://www.devx.com/dbzone/Article/31985/0

3. Index Covering Boosts SQL Server Query Performance
Index covering is a quick and easy solution to many query performance problems. By just adding a column or two to an index you can significantly boost a query's performance.

http://www.devx.com/dbzone/Article/29530/0

4. The Less SQL Server Sorts, the Faster It Responds (with Rajiv Rajak)
Learn coding techniques to tweak your SQL queries so that SQL Server sorts less data and returns much faster.

http://www.devx.com/dbzone/Article/30149/0