SQL Server Customer Advisory Team - SQL Server Best Practices

Enabling SQL Server customers to navigate the most challenging frontiers of large scale data management.

Technical Notes

How many files should a database have? - Part 1: OLAP workloads

Author: Thomas Kejser 

Contributors and reviewers: Stuart Ozer, Sanjay Mishra, Lubor Kollar, Juergen Thomas, Lindsey Allen, Mike Ruthruff, Mike Kilpatrick, Burzin Pazel, Kevin Cox, Prem Mehra and Hermann Daeubler

 

The subject - how many files a database should have - is a question that comes up often. The answer is of course: it depends. But, what does it depend on?

Background information

If a filegroup in SQL Server contains more than one file, SQL Server will “stripe” allocations across the files by using a proportional fill algorithm. If the files in the group have the same size (which we recommend), the allocation is essentially a round-robin. The “stripe size” of this round-robin is by default one extent - 64KB. Hence, the first allocated extent goes to the first file in a filegroup, the second extent to the second and so on. This striping mechanism can be quite useful, because you can spread your I/O over several LUNs by allocating a data file to each. You should strive to have all files in a filegroup be of equal size. Using Files and Filegroups on TechNet rovides more background on filegroup and allocation.

Each file in the database has its own PFS, GAM and SGAM pages. These special “administration pages” track the free space and allocation in the in the file. Every new allocation in the file will have to access a PFS page and in some  cases also the GAM/SGAM pages. (For more background on this see "Inside SQL Server 2005: The Storage Engine” by Kalen Delaney).

In this Tech Note, we look at files from an OLAP / Data Warehouse workload perspective. Because OLAP and OLTP workloads differ greatly, different recommendation for file allocation apply.

Too few files in a filegroup

If a filegroup receives a lot of insert activity, the pressure on the PFS and GAM/SGAM page access becomes significant. At some point, this becomes a bottleneck, effectively slowing down the insert throughput. In a data warehouse load, the contention is typically on PFS pages.

If PFS contention is present in a workload, it will show up as waiting for PAGELATCH_UP in sys.dm_os_wait_stats. You can use sys.dm_os_waiting_tasks to see which pages you are waiting for. You will see something like the following:

 

 

The format of the resource_description column is: DBID:FILEID:PAGEID. You can use the resource_description to look up the page in sys.dm_os_buffer_descriptors to see if your wait is on a PFS page.

If you discover that you have many waits for PFS pages, you probably need to add more files to the affected filegroup. Because each file has its own administration pages – the presence of more files reduce contention on PFS pages.

Too many files in a filegroup

Increasing the number of files is useful if you use the files to “stripe” across LUNs or if you run into the PFS  bottleneck as described above.

However, there is a disadvantage in having too many files. Remember that SQL Server will stripe the extents over the files in stripe sizes of 64KB. Assume that you have great deal of insert activity on a single filegroup that contains many files. Since SQL Server distributes the extents across the files, your average I/O request will typically have a size of 64KB. If you instead had fewer files in the filegroup, SQL Server could “bundle” the allocations and hence, drive larger block sizes. Most I/O systems deliver a better throughput if you can drive large block sizes.

Testing with TPC-H LINEITEM data shows the following pattern when loading the database using minimally logged operations:
 

# Files in filegroup Avg I/O block size (KB)
1 256
2 196
4 132
8 64
16 64
32 64

The above numbers are for a single bulk stream to the file.

SQL Server is quite good at bundling I/O operations together in large blocks – a process known as scatter/gather. In our test, we tried to increase the concurrency of the bulk load to utilize this functionality. With 64 bulk streams, we were able to drive block sizes up to 196 KB, even with 32 files in a single filegroup. But still, with 1 file in each of multiple filegroups, we were getting a faster, 256 KB block sized I/O.

So, while adding more files can benefit performance by eliminating PFS contention, it can make the I/O pattern less efficient. You can measure the size of the block requests using the perfmon counter: Logical Disk: Avg Disk Bytes / Write to gauge how efficient your block size is.

Having many filegroups in a database adds an administrative burden: you now have to balance the space usage in the database between the filegroups. Therefore, you probably do not want to go overboard in optimizing your I/O block sizes by adding filegroups with few files and allocating your tables across them. In our TPC-H load test we only saw a 5% increase in disk throughput from optimizing block sizes and the benefit was only realized above several hundred MB / Sec insert speeds.

Another factor to consider, from an administrative perspective, is database startup recovery. File recovery after server restart or after a SET ONLINE operation on the database are done sequentially. If you have many files (hundreds) in your database, this recovery process can take a long time because each file is opened sequentially.

The –E startup flag

The SQL Server startup flag –E forces SQL Server to allocate 4 extents at a time to each file, essentially quadrupling the stripe size. In heavy insert scenarios, this drives larger block sizes to the disk. Also, your pages allocation will be more sequential with the same data file, allowing better sequential I/O for range and table scan operations (which are common in OLAP workloads).

This startup flag provides most, but not all, of the above mentioned benefits to I/O system – without the overhead of managing multiple filegroups. Be aware that this flag is supported only in 64-bit environments. You can find information about the –E startup flag in File allocation extension in SQL Server 2000 (64-bit) and SQL Server 2005 (KB329526).

So, how many database files should I have in my OLAP system?

To allocate the optimal number of files you must understand your database workload. The amount of insert activity is the determining factor. You also must balance the following factors:

  • PFS contention
  • need for SQL based striping
  • I/O pattern – block sizes
  • File recovery times 
PFS contention and SQL-based striping drives you towards allocating more files. Optimizing I/O requests and file recovery leans towards fewer files. If your work load is very insert heavily, you generally want more files, but in a controlled manner. We have seen benefits of having up to half the amount of files as you have cores – and even more in the case of tempdb. If your data load is more read intensive, having fewer files may benefit you, since PFS contention is not a problem in this case and your I/O will arrive in larger bundles.

In extreme cases, when inserting hundreds of MB / Sec, you can benefit from partitioning your table into filegroups, each with low number of files to bundle I/O requests together and create larger block sizes with sequential disk access. But, by doing this you assign performance priority over ease of administration.

So the answer to the question, “How many files should my data warehouse have?” is: as few as possible, without running into PFS contention and without sacrificing striping ability.

Comments

 

colin leversuch-roberts said:

This is very useful as a reference point to present when considering the use of multiple files in a filegroup. I note that although the -E is not support for x32 it is present - would it be useful for very heavy tempdb usage where I seem to be getting contention, even with multiple files?

March 13, 2008 8:53 AM
 

tkejser said:

Hi Colin

Thanks  for your comments. I am not 100% sure what you are asking regarding tempdb but here is a stab at an answer anyway.

If you have PFS or SGAM/GAM contention in tempdb, -E will probably not help you. You should instead look at adding more files or even use the trace flag -T1118 described here: support.microsoft.com/.../328551

March 14, 2008 6:48 PM
 

Impedance Mismatch said:

In questi giorni sono finalmente riuscito a sentire l'amico Tomas del SQL CAT (Customer Advisory

March 17, 2008 10:35 AM
 

EXEC dbo.LongTermMemory__Archive said:

A question I get asked quite a lot is "how many files / filegroups should my database have?". As we all

March 19, 2008 3:21 PM
 

EXEC dbo.LongTermMemory__Archive said:

A question I get asked quite a lot is "how many files / filegroups should my database have?"

March 19, 2008 4:36 PM
 

Allen Hermann » Blog Archive » How many files should a database have? - Part 1: OLAP workloads said:

Pingback from  Allen Hermann  » Blog Archive   » How many files should a database have? - Part 1: OLAP workloads

June 10, 2008 2:38 PM
 

ernst said:

Hi Thomas,

Great article!

Just a quick question about multiple raid-groups to complicate matters a bit.

In general more spindles means better performance. Many hardware vendors selling SAN disk sub systems have a limit to the optimal amounts of disks in a single raid-group. This means that in large databases you will create several raid-groups. Say we create 8 raid-groups. This means that we will present 8 LUNs to the operating system. To use all LUNs (all spindles) we will have to create at least 8 files in our filegroup (one per LUN). This means as I understand it that we in average will write with 64 I/O block sizes to the files (because we have 8 files in our filegroup).

I’m I correct in assuming this or does SQL Server take into account that we are writing to different LUNs and thus uses another algorithm?

Would the –E option make any sense? With the –E option we will write bigger blocks to our files but at the same time limit the amount of spindles over which we distribute the data.

Greetings,

Ernst.

February 16, 2009 3:50 AM
 

tkejser said:

Hi Ernst

Thanks for your feedback. The more spindles = more performance is not always fully true. Once a RAID array becomes very wide, performance on sequential I/O will not always scale linear.

Using multiple files, you can work around this limitation by driving sequential streams directly to a narrow groups. With the -E flag you can benefit even futher, because data is now grouped together.

You may find the I/O configurations in the fast track warehouse a useful read, futher detailing this subject:

msdn.microsoft.com/.../dd459146.aspx

February 26, 2009 5:03 AM
sql, server, best practices, whitepapers, analysis services, data mining, olap, datawarehouse, datawarehousing, availability, clustering, capacity, collation, data types, data warehouse, database, design, index, mirroring, optimization, partitions, performance, precision, processing, querying, scalability, security, reporting services, integration services
Copyright 2008 Microsoft Corporation. All Rights Reserved.