Exploding Data Sizes … Better Try Data Compression

It is estimated that the amount of data in the digital universe is doubling every two years. By 2020, the amount of data is estimated to reach 44 zettabytes, or 44 trillion gigabytes (http://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm). Of course, not all of this data is being put into database systems. But, I mention this to illustrate the point that our company’s data is probably growing at a similar rate. For example, at my company in the last year, the amount of used space in our database files for all our systems grew from about 82 TB to 190 TB. We literally doubled our data footprint in a year! This is great for the business because it means more revenue since we have hosting services, but this can be difficult for the technical team (storage and database admins).

Now, there is no way for anyone to really stop this data explosion, but we can try to contain or manage it. So, this is one reason where data compression could help (http://msdn.microsoft.com/en-us/library/cc280449.aspx).

SQL Server supports two types of compression:
Row compression (http://msdn.microsoft.com/en-us/library/cc280576.aspx) – This type of compression deals with changing the physical storage format of the data that is associated with a data type but not its syntax or semantics.
Page compression (http://msdn.microsoft.com/en-us/library/cc280464.aspx) – This type of compression actually implements multiple operations: row compression, prefix compression, and dictionary compression. Prefix compression searches a column for repeated values and dictionary compression searches the whole page.

Determine whether data compression is enabled
We determine whether compression is enabled by looking at the data_compression column of the sys.partitions table (http://technet.microsoft.com/en-us/library/ms175012.aspx). In a previous post (https://sqlbadboy.wordpress.com/2014/01/24/should-size-matter-in-database-decisions/), I gave some example code that showed not only how to find table sizes, but also data on the type of compression used. Here is a simplified version of that code:

SELECT
	s.name AS [schema_name],
	o.name AS [table_name],
	i.name AS [index_name],
	i.type_desc AS [index_type],
	p.data_compression_desc
FROM sys.schemas s
	INNER JOIN sys.objects o
		ON s.schema_id = o.schema_id
	INNER JOIN sys.indexes i
		ON o.object_id = i.object_id
	INNER JOIN sys.partitions p
		ON i.object_id = p.object_id
			AND i.index_id = p.index_id
WHERE o.type = 'U'
ORDER BY
	[schema_name],
	[table_name],
	[index_type],
	[index_name]

Find the space savings for data compression
In the Enterprise edition of SQL Server, which seems to be the only one which supports data compression, there is a stored procedure, sp_estimate_data_compression_savings (http://msdn.microsoft.com/en-us/library/cc280574.aspx). This stored procedure returns the current size of the requested object and estimates the object size for the requested compression state.

EXEC sp_estimate_data_compression_savings
  @schema_name = 'SchemaName',
  @object_name = 'ObjectName',
  @index_id = IndexId,
  @partition_number = PartitionNumber,
  @data_compression = 'DataCompressionType' -- NONE, ROW, PAGE

Here’s a useful post on estimating data compression savings for an entire database and includes a sample script (http://blogs.msdn.com/b/dfurman/archive/2011/02/25/estimating-data-compression-savings-for-entire-database.aspx).

Enabling data compression
In order to enable data compression, you have to rebuild the table or index (http://msdn.microsoft.com/en-us/library/hh710070.aspx). The following examples show the code for compressing a table or individual index. And if you want to disable data compression, you simply specify DATA_COMPRESSION = NONE.

ALTER TABLE SchemaName.TableName REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = ROW)

ALTER INDEX IndexName ON SchemaName.TableName REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE)

Although you can’t prevent exploding data sizes, you can try to manage it by using data compression. Additionally, by using data compression you may experience better performance since SQL Server will need to read less data, although it may increase CPU usage.

Advertisements
This entry was posted in DBA and tagged , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s