Saturday, December 09, 2006

Maximum selectivity for bookmark lookups depends on row size.

Consider a range query on a column that has a non clusterd index. For narrow ranges the optimizer chooses to access the table via bookmark lookups. For wider ranges it just scans the clustered index, as it is cheaper than bookmark lookups. There is no hardcoded threshold - it depeneds on row size and bookmark size. A simple example - in all three cases the optimizer chooses bookmark lookups, but if you increase the range just a little bit, it scans the whole clustered index.

-- Assuming Numbers table with 10000 rows

CREATE TABLE WideRows(i INT NOT NULL PRIMARY KEY, j INT NOT NULL, c
CHAR(1000))
go

INSERT WideRows SELECT Numbers.Number, Numbers.Number + 1, 'asdf' FROM
dbo.Numbers
go

CREATE INDEX WideRows_j ON WideRows(j)

---- break even point close to 4%
SELECT i, j, c FROM WideRows WHERE j BETWEEN 1000 AND 1400
go

DROP TABLE WideRows
go

CREATE TABLE HugeRows(i INT NOT NULL PRIMARY KEY, j INT NOT NULL, c
CHAR(3500))
go

INSERT HugeRows SELECT Numbers.Number, Numbers.Number + 1, 'asdf' FROM
dbo.Numbers
go

CREATE INDEX HugeRows_j ON HugeRows(j)

---- break even point almost 20%
SELECT i, j, c FROM HugeRows WHERE j BETWEEN 1000 AND 2900
go

DROP TABLE HugeRows
go

CREATE TABLE NarrowRows(i INT NOT NULL PRIMARY KEY, j INT NOT NULL, c
CHAR(10))
go

INSERT NarrowRows SELECT Numbers.Number, Numbers.Number + 1, 'asdf'
FROM dbo.Numbers
go

CREATE INDEX NarrowRows_j ON NarrowRows(j)

---- break even point less than 1%
SELECT i, j, c FROM NarrowRows WHERE j BETWEEN 1000 AND 1040
go

DROP TABLE NarrowRows
go

3 Comments:

At 1:08 PM, Anonymous Anonymous said...

Hi Alex,

I think that the selectivity (number of row pulled based on the criteria) does not depend on the row size, but the IO cost associated to the bookmark does.

select *
from dbo.whatever_of_the_3_tables
where j BETWEEN 1000 AND 1400

will be 401 rows no matter the row size.

 
At 5:53 PM, Blogger Alex Kuznetsov said...

Hi,
yes the selectivity will be the same, in this case 4%. However, if an average row is wide, the optimizer will satisfy the query via a non-clustered index and a bookmark. On the other hand, if the table has narrow rows, the optimizer will choose to scan the whole table. In either case there will be some threshold selectivity at which the optimizer will stop choosing bookmark lookups. For a table with narrow rows this threshold will be much lower.

Cheers!

 
At 10:37 PM, Anonymous Anonymous said...

Nice tip Alex, thanks! Since I am fairly new to DB concepts i had to read up on selectivity - good article here:

Selectivity in index

 

Post a Comment

<< Home