Careful what you subselect for
Posted by Chris Alcock on Thursday 11th January 2007 at 12:24 am | Tagged as: Database, Development, SQL
I quite often find myself using a subselect to get some metadata key value to use as part of a where clause. Some thing like the following:
select * from product where producttype_fkey=(select producttype_key from ProductType where TypeId='TEST')
I had been under the impression that the database engine would be clever enough to work out that the value the subselect returned was constant and that it would simply work in an equivalent way to this query:
declare @prodType int;
select @prodtype=producttype_key from ProductType where TypeId='TEST';
select * from product where producttype_fkey=@prodType
Infact this is not the case. The first query ends running as though the product table and the productType table were being inner joined together, giving an execution plan like this:
The second query (using the variable) has two parts to its execution, however the join is no longer there:
The significant thing about this is that when these two types of query are run together, the first query accounts for almost 60% of the execution, where as the two stage with variable version only accounts for a little over 40%, so it is significantly better to select the value out into a variable rather than subselecting (especially where the same value is required multiple times in a stored procedure.
The optimizer doesn’t know that the select will only ever return one record so it will be creating an execution plan that caters for the eventuality that more than one row is returned from the subselect.
Would be interesting to see what the effect of changing the subselect to “SELECT TOP 1 …” is.
I thought of that – all that happens is you get a top operation in the execution plan between the join and the subselect. Since its is impossible for the subselect to return multiple rows (if it does you get a SQL Server error 512 ‘Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.’), I don’t see any good reason the the optimizer to choose a join over simply treating the subselect as a constant.
I wonder if the optimizer in SQL 2005 is any better at this?