
Part of optimizing the IO subsystem of a machine running Microsoft SQL Server is deciding how many files a user database and tempdb should be broken up into. Denny Cherry offers some guidelines for making this decision in an article he published earlier this year. In it, he gives a couple of formulas that mathematically determine the appropriate number of files for the hardware being used. In the following discussion of these formulas, let C, O, and P be defined as follows:

C = Number of CPUs
O = Number of cores per CPU
P = Number of physical files

According to Cherry, the number of files that should make up a user database can be computed using this formula:

P = C * O / 4

For example, if you have two dual-core CPUs, you would have one physical file for the user databases:

P = C * O / 4
P = 2 * 2 / 4
P = 1

Or, with a dual quad-core machine, the number of physical files is two:

P = C * O / 4
P = 2 * 4 / 4
P = 2
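
If you'd rather let the server do the arithmetic, the sys.dm_os_sys_info DMV (available in SQL Server 2005 and later) reports the machine's logical processor count. Here is a minimal sketch, assuming hyperthreading is disabled so that cpu_count equals the total number of cores (C * O):

SELECT cpu_count,                           -- logical processors; C * O when hyperthreading is off
       cpu_count / 4 AS user_db_file_count  -- Cherry's user database formula: C * O / 4
FROM sys.dm_os_sys_info

The integer division is deliberate; on the dual dual-core box above, this returns 1.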

To compute the number of files that make up tempdb, use this formula:

P = C * O

For example, if you have two dual-core CPUs, you would have four files for tempdb:

P = C * O
P = 2 * 2
P = 4

Or, with two quad-core CPUs, the number of physical files is eight:

P = C * O
P = 2 * 4
P = 8
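
Once you know the target count, additional tempdb data files are added with ALTER DATABASE ... ADD FILE. Here is a sketch for the second file on the dual quad-core machine; the T: drive and the tempdev2 logical name are placeholders I've made up, and each new file should match the size of the existing ones so that SQL Server's proportional fill spreads the IO evenly across them:

ALTER DATABASE tempdb
ADD FILE
(NAME = tempdev2,              -- placeholder logical name
FILENAME = 'T:\tempdb2.ndf',   -- placeholder path
SIZE = 100MB,
FILEGROWTH = 20MB)

Repeat for tempdev3 through tempdev8 to reach the eight-file target.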

Keep in mind that I'm talking about files here, not filegroups. Filegroups can purportedly be used to increase parallelism, as noted on Microsoft's Web site; however, I would take that suggestion with a grain of salt. In her book Inside SQL Server 2005: The Storage Engine, Kalen Delaney has this to say about using filegroups to increase parallelism:

Most SQL Server databases have a single data file in the default filegroup....As a user acquires greater database sophistication, she might want to use multiple devices to spread out the IO...The easiest way to do this is to create the database file on a RAID device. Still no need for filegroups...the user might decide that she really wants multiple files...In this case, she still does not need the filegroups — she can accomplish her goals by using create database with a list of files in separate drives.

More sophisticated database administrators might decide to have different tables assigned to different drives or to use the table and index partition feature of SQL Server 2005. Only then will they need to use filegroups...

There's usually no performance benefit in doing so.

To make this distinction clear, what I'm talking about here is a CREATE DATABASE statement such as this one:

CREATE DATABASE Archive
ON PRIMARY
(NAME = Arch1,
FILENAME = 'W:\archdat1.mdf', -- Drive one: W
SIZE = 100MB,
MAXSIZE = 200MB,
FILEGROWTH = 20MB),
(NAME = Arch2,
FILENAME = 'X:\archdat2.ndf', -- Drive two: X
SIZE = 100MB,
MAXSIZE = 200MB,
FILEGROWTH = 20MB),
(NAME = Arch3,
FILENAME = 'Y:\archdat3.ndf', -- Drive three: Y
SIZE = 100MB,
MAXSIZE = 200MB,
FILEGROWTH = 20MB)
LOG ON
(NAME = Archlog1,
FILENAME = 'Z:\archlog1.ldf', -- Drive four: Z
SIZE = 100MB,
MAXSIZE = 200MB,
FILEGROWTH = 20MB),
(NAME = Archlog2,
FILENAME = 'Z:\archlog2.ldf', -- Drive four: Z
SIZE = 100MB,
MAXSIZE = 200MB,
FILEGROWTH = 20MB)

In doing this, you are instructing SQL Server to create separate files to store your data. You are not telling it which objects to put in which files, however; that can only be done with filegroups. Instead, SQL Server decides which files to put things in. If each of these files is on a different drive, or array of drives, you can achieve greater performance, as Delaney points out.
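
For contrast, here is a sketch of what the filegroup approach looks like; the Sales database, the OrdersFG filegroup, and the file names and paths are all made up for illustration. Only by declaring a named filegroup and then targeting it with an ON clause do you control which set of files an object lands in:

CREATE DATABASE Sales
ON PRIMARY
(NAME = SalesPrimary,
FILENAME = 'W:\salesprim.mdf',
SIZE = 100MB),
FILEGROUP OrdersFG
(NAME = SalesOrders,
FILENAME = 'X:\salesord.ndf',
SIZE = 100MB)
LOG ON
(NAME = SalesLog,
FILENAME = 'Z:\saleslog.ldf',
SIZE = 100MB)
GO

USE Sales
GO

-- Place the table on the named filegroup explicitly
CREATE TABLE dbo.Orders (OrderID int NOT NULL) ON OrdersFG

Without the ON OrdersFG clause, the table would simply go to the default filegroup (PRIMARY here), just as everything does in the Archive example above.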