Indexes are very crucial part of a table and to be able to decide on which index is required, which one should be dropped and which one should be changed is one of the most important part of DBA’s job, especially if you are into performance tuning of an application connecting to your database.
In this post, we will cover how to arrive at indexing column(s) in a table when data is available however you are unsure about the query pattern hitting to a database. This is general rule and doesn’t apply to all the cases and may vary application to application however this would give you an idea on how can we design an index and which all columns should be part of it and why?
We are working on Stackoverflow database, you may download it from provided link and can start playing around with it for different query and indexes.
Let’s take few queries and see how would the execution plan look like or how will be logical reads(we are not focusing on CPU time for now):
Let’s see logical reads now, as you guessed it right – since, there is no non-clustered index on the table, it would go for full table reads.
We may create below index and above query will go super-fast:
create index Reputation on Users(Reputation)
With above index, Logical reads will come down to one thousand pages from hundred thousand pages:
Execution plan, it would go for index seek as it is equality operator and is on same column.
Now, imagine one more condition is added in the where clause to include Location column as below:
Logical read will not reduce from one thousand pages irrespective of condition since there is no other supportive index for Location column.
Let’s add another index to support Location as below:
create index Location on Users(Location)
With above index, logical read will come down to one hundred from one thousand
However, it will use both the indexes i.e. one on Location and other one of Reputation:
What if we add one more condition in the where clause and include a condition on DisplayName:
Again, logical read won’t come down lesser than hundred pages, what if we introduce one more index on DisplayName as below:
create index DisplayName on Users(DisplayName)
Now, its using all of our indexes as you would have expected in the execution plan:
But, that’s not the case. Since, its equality operator – Whichever is most selective, will be chosen first.
And it depends upon the value in the where clause and not the column alone.
To continue, to the first theory about using all our indexes – we will use something else in the where clause:
Isn’t it something different? Yes, lets check the execution plan now:
Any guesses? Why it didn’t use other indexes like one on Location or one on Reputation column. The reason lies on size of index and its underlying column – DisplayName column(nvarchar(40) is much smaller than Location column(nvarchar(100).
How about having only one index and drop all the previously created ones. New Index definition will be:
create index Reputation_Location_DisplayName on Users(Reputation, Location, DisplayName)
Check the logical reads – Isn’t it better than all other indexes and no need to check selectivity and column size however first column is very important. We should choose that column as a first column – which makes the result as much convergent.
I have a video tutorial on this topic and explained it with different examples: