Find Data Quickly … By Using Indexes

Frequently, when a person is interviewing for a database job (i.e. developer or administrator), someone asks them … What is an index?  Or they may ask, what is the difference between a clustered and non-clustered index?  Now, while I do feel that those are good questions to ask, I think they should take it a step or two further to see if a person really understands what an index is and how it is used by the database engine.  For example, a developer I worked with in the past knew that indexes were a good thing, and put one on every column of the table and combinations on columns of the table.  In his case, some of them were never used and unnecessary.

Indexing Basics

First of all, indexes should ALWAYS be used because they help us to access data quickly.  We could liken an index to a “card catalog” at a library … which the some may remember (http://en.wikipedia.org/wiki/Library_catalog).  Just imagine going to the library to find books on a certain subject and having to look at every book in the library.  That could take a tremendous amount of time and be overwhelming.  So, that’s where the benefit comes from using a “card catalog” or even an online versions of it.  They direct your attention to only a portion of the library that contains books relevant to your subject.

It is very similar with finding or retrieving data in a SQL Server.  Without an index, it is like searching through that whole library, or in our case, the whole table.  Of course, when the database is small you may not feel the pain.  You could liken this to searching a personal library in a home, where there are only a dozen books or so.  But, as that library or the table grows larger, with millions or billions of items to search through, this becomes time consuming and even unbearable.

So, that is where indexes come in.  They help SQL Server to find data quickly. 

Now, since there is so much information available regarding indexes, I am not going to spend a lot of time on this.  But, I just wanted to relate some basics about indexes (Create Index: http://msdn.microsoft.com/en-us/library/ms188783.aspx):

  • Clustered – An index in which the logical order of the key values determines the physical order of the corresponding rows in a table. The bottom, or leaf, level of the clustered index contains the actual data rows of the table. A table or view is allowed one clustered index at a time.
  • Nonclustered – An index that specifies the logical ordering of a table. With a nonclustered index, the physical order of the data rows is independent of their indexed order.
  • Unique – A unique index is one in which no two rows are permitted to have the same index key value.

Here’s a nice reference on SQL Server index basics that includes an illustration: http://www.simple-talk.com/sql/learn-sql-server/sql-server-index-basics/

Maintaining Indexes

Secondly, indexes should ALWAYS be maintained or optimized.  Maintaining an index is like keeping the books in the library in order.  For example, imagine looking at the card catalog and have it directing you all over the library to find books on a certain subject.  You would conclude that the arrangement of books was inefficient and it could be a time consuming process.  It is similar with SQL Server.  A badly maintained or fragmented index will cause the SQL Server to spend more time trying to find its data.

So, how do we find and fix fragmentation?

An easy way to find fragmentation is by using the dynamic management view (DMV), sys.dm_db_index_physical_stats (MSDN: http://msdn.microsoft.com/en-us/library/ms188917.aspx).

Here’s an example that lists the schema, table, index, description, fragmentation percentage, and page count:

SELECT
schema_name = s.name,
table_name = o.name,
index_name = b.name,
a.index_type_desc,
a.avg_fragmentation_in_percent,
a.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL , 'LIMITED') a
INNER JOIN sys.indexes AS b
ON a.object_id = b.object_id AND a.index_id = b.index_id
INNER JOIN sys.objects o
ON a.object_id = o.object_id
INNER JOIN sys.schemas s
ON o.schema_id = s.schema_id
ORDER BY table_name, index_name

In order to fix fragmentation, we have some choices to make.  Depending on the fragmentation percentage and possibly the page count, you may choose to reorganize or rebuild the index.  Many sources cite reorganizing for indexes that are between 5% to 30% fragmented, and for fragmentation levels over that rebuild.

Here’s the basic syntax:

ALTER INDEX [IndexName] ON [TableName] REORGANIZE
ALTER INDEX [IndexName] ON [TableName] REBUILD

Of course, you would NEVER try to handle all your fragmentation in a manual process.  But, rather this should be automated and scheduled.  SQL Server includes built in maintenance plans and I know of a few third party software vendors who also have products.  However, a nice solution that I like to use is from Ola Hallengren (http://ola.hallengren.com/sql-server-index-and-statistics-maintenance.html).

Is there a way to determine if there are any indexes that are missing or that would make the SQL Server more efficient and perform better?

Yes, and again we will look at the DMV’s for assistance.  The following code will list tables and columns that if indexed the SQL Server would have been able to use.  Additionally, it lists the impact and number of queries that it would have used the index on.

SELECT
table_name = OBJECT_NAME(mid.object_id, mid.database_id),
mid.equality_columns, -- table.column = constant_value
mid.inequality_columns, -- table.column > constant_value
mid.included_columns,
migs.avg_user_impact,
migs.user_seeks,
migs.user_scans
FROM sys.dm_db_missing_index_groups mig
INNER JOIN sys.dm_db_missing_index_details mid
ON mig.index_handle = mid.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats migs
ON migs.group_handle = mig.index_group_handle
WHERE mid.database_id = DB_ID()
ORDER BY migs.avg_user_impact  DESC

How do we determine if some nonclustered indexes are not being used?

As I mentioned at the outset, we don’t want to get carried away with our index creation.  They do use resources and can have a negative impact on performance if we are not aware of what we are doing.  The following query shows the usage of nonclustered indexes:

SELECT
schema_name = s.name,
table_name = o.name,
index_name = i.name,
ius.user_seeks,
ius.user_scans,
ius.user_lookups,
ius.user_updates
FROM sys.dm_db_index_usage_stats ius
INNER JOIN sys.objects o
ON ius.object_id = o.object_id
INNER JOIN sys.schemas s
ON o.schema_id = s.schema_id
INNER JOIN sys.indexes i
ON i.index_id = ius.index_id AND i.object_id = ius.object_id
WHERE OBJECTPROPERTY(ius.object_id,'IsUserTable') = 1
AND ius.database_id = DB_ID()
AND i.type_desc = 'NONCLUSTERED'
AND i.is_primary_key = 0
AND i.is_unique_constraint = 0
ORDER BY (ius.user_seeks + ius.user_scans + ius.user_lookups + ius.user_updates)

MSDN reference for index related DMV’s: http://msdn.microsoft.com/en-us/library/ms187974.aspx

So, make sure that you are using and maintaining indexes on your tables, especially if you want to find data quickly.

Advertisements
This entry was posted in DBA, Developer and tagged , , . Bookmark the permalink.

2 Responses to Find Data Quickly … By Using Indexes

  1. Pingback: Database Administrators or Developers … Who is Responsible for Indexing? | SQL Bad Boy – Michael Lowery

  2. Pingback: DBA Basics – A Review for the New Year | SQL Bad Boy – Michael Lowery

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s