I have a small table (600 rows) that is used in a query with a relatively large table (200,000 rows). Strangely, I get the best performance from the query by removing all indexes from the small table (including the primary key). This seems a little odd to me, and I'm wondering if this is a common scenario, and if there is some general rule that can be applied when indexing small tables.
I have tried changing the primary key index (clustered vs nonclustered) and adding other indexes to both tables, but I always get the best performance by removing all indexes from the small table. The performance difference is significant, execution time goes from 2 seconds to just over 1 second, and this is a query that gets executed quite a bit.
I can't delete the PK index as the table needs a PK for data integrity reasons. So I'm not sure how to achieve the non-index performance without removing the index.
Any suggestions would be much appreciated. By the way it's SQL Server 2005 Express.
Without indexes, you will always do a table scan. However, with such a small table, a table scan is probably much better than hopping through the index tree to get data. Though, I do not see how a clustered index would be a bad thing here. Can you show us some code (ddl for both table, including all keys/constraints)?
|||The small table:
create table EmployeeOffice (
EmployeeID uniqueidentifier not null,
OfficeID uniqueidentifier not null,
Sequence smallint not null,
CanAccessAllCases bit default 0 not null,
IsManager bit default 0 not null)
go
alter table EmployeeOffice
add constraint EmployeeOffice_PK primary key nonclustered (EmployeeID, OfficeID)
The large table:
create table CaseInfo (
CaseNumber int not null,
OfficeID uniqueidentifier not null,
EmployeeID uniqueidentifier null,
TeamID uniqueidentifier null,
... - there's about another 20 columns
InvoiceRequired" bit default 0 not null,
LastChanged" timestamp not null)
go
alter table CaseInfo
add constraint CaseInfo_PK primary key clustered (CaseNumber)
There's an Employee and an Office table as well with RI contraints between the four tables but they are not used in this particular query. The query is fairly large and involves other tables but this particular behaviour is specific to these two tables.
The query (simplified):
SELECT CaseNumber, InvoiceRequired,...
FROM CaseInfo
WHERE EmployeeID = 'BCA6F76E-A2EF-4FDA-AB38-061047ADBFDE'
or OfficeID in (Select OfficeID From EmployeeOffice Where EmployeeID = 'BCA6F76E-A2EF-4FDA-AB38-061047ADBFDE'
and (CanAccessAllCases=1 or IsManager=1))
In the test database ALL the records in the CaseInfo table have the same office, and the sub query returns no rows. The execution plan shows that when the index exists, it creates an "Index Spool" with 131,992 rows (which is the number of rows in CaseInfo where EmployeeID does not match. If I use a table hint to ignore the index on this table (or delete it) the "Index Spool" is not created.
|||
A few things:
1. I would rewrite the query to use join instead of subquery.
e.g.
Code Snippet
SELECT CaseNumber, InvoiceRequired
FROM CaseInfo ci JOIN EmployeeOffice eo ON ci.EmployeeID=eo.EmployeeID
WHERE ci.EmployeeID = 'BCA6F76E-A2EF-4FDA-AB38-061047ADBFDE'
union all
SELECT CaseNumber, InvoiceRequired
FROM CaseInfo ci JOIN EmployeeOffice eo ON ci.OfficeID=eo.OfficeID
WHERE eo.EmployeeID = 'BCA6F76E-A2EF-4FDA-AB38-061047ADBFDE'
and (eo.CanAccessAllCases|eo.IsManager=1)
2. Create an index on CaseInfo(EmployeeID,OfficeID)
3. Index spool is used to cache the data to allow faster lookup. This is not neccessarily a bad thing.
No comments:
Post a Comment