So you've identified that your SQL Server is under IO pressure. What can you do about it?
If disk is being used by SQL Server, try the following:
Use the Query Store view Top Resource Consuming Queries in SSMS. Look for high values of Logical Reads, Logical Writes and Physical Reads. These indicate IO intensive queries.
Here's a simple in-depth presentation on techniques to optimize SQL queries and reduce the IO required.
This solution is often a quick and easy solution that may be less expensive than more extensive optimization. If the cost is not overly high, it often provides a better return on investment than spending large amounts of effort on optimization.
Use the following query from the Microsoft Learn article Troubleshoot slow SQL Server performance caused by I/O issues to identify which database files are under pressure.
SELECT LEFT(mf.physical_name,100),ReadLatency = CASE WHEN num_of_reads = 0 THEN 0 ELSE (io_stall_read_ms / num_of_reads) END,WriteLatency = CASE WHEN num_of_writes = 0 THEN 0 ELSE (io_stall_write_ms / num_of_writes) END,AvgLatency = CASE WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0ELSE (io_stall / (num_of_reads + num_of_writes)) END,LatencyAssessment = CASE WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 'No data' ELSECASE WHEN (io_stall / (num_of_reads + num_of_writes)) < 2 THEN 'Excellent'WHEN (io_stall / (num_of_reads + num_of_writes)) BETWEEN 2 AND 5 THEN 'Very good'WHEN (io_stall / (num_of_reads + num_of_writes)) BETWEEN 6 AND 15 THEN 'Good'WHEN (io_stall / (num_of_reads + num_of_writes)) BETWEEN 16 AND 100 THEN 'Poor'WHEN (io_stall / (num_of_reads + num_of_writes)) BETWEEN 100 AND 500 THEN 'Bad'ELSE 'Deplorable' END END,[Avg KBs/Transfer] = CASE WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0ELSE ((([num_of_bytes_read] + [num_of_bytes_written]) / (num_of_reads + num_of_writes)) / 1024) END,LEFT (mf.physical_name, 2) AS Volume,LEFT(DB_NAME (vfs.database_id),32) AS [Database Name]FROM sys.dm_io_virtual_file_stats (NULL,NULL) AS vfsJOIN sys.master_files AS mf ON vfs.database_id = mf.database_idAND vfs.file_id = mf.file_idORDER BY AvgLatency DESC
This then gives somewhere to investigate.
You can find out what types of IO waits are occuring in SQL Server with the following query.
SELECT r.session_id, r.wait_type, r.wait_time as wait_time_msFROM sys.dm_exec_requests r JOIN sys.dm_exec_sessions sON r.session_id = s.session_idWHERE wait_type in ('PAGEIOLATCH_EX','PAGEIOLATCH_SH','PAGEIOLATCH_UP','WRITELOG','ASYNC_IO_COMPLETION','IO_COMPLETION','BACKUPIO')AND is_user_process = 1
If these waits exceed 10-15 milliseconds consistently, I/O is considered a bottleneck.
Occurs when a task is waiting on a latch for a data or index page (buffer) in an I/O request. The latch request is in the Exclusive mode. An Exclusive mode is used when the buffer is being written to disk. Long waits may indicate problems with the disk subsystem.
Occurs when a task is waiting on a latch for a data or index page (buffer) in an I/O request. The latch request is in the Shared mode. The Shared mode is used when the buffer is being read from the disk. Long waits may indicate problems with the disk subsystem.
Occurs when a task is waiting on a latch for a buffer in an I/O request. The latch request is in the Update mode. Long waits may indicate problems with the disk subsystem.
Occurs when a task is waiting for a transaction log flush to complete. A flush occurs when the Log Manager writes its temporary contents to disk. Common operations that cause log flushes are transaction commits and checkpoints.
Common reasons for long waits on WRITELOG are:
This is the most common cause of WRITELOG waits. Generally, the recommendation is to keep the data and log files on separate volumes. Transaction log writes are sequential writes while reading or writing data from a data file is random. Mixing data and log files on one drive volume (especially conventional spinning disk drives) will cause excessive disk head movement.
Too many virtual log files (VLFs) can cause WRITELOG waits. Too many VLFs can cause other types of issues, such as long recovery.
While large transactions can lead to blocking, too many small transactions can lead to another set of issues. If you don't explicitly begin a transaction, any insert, delete, or update will result in a transaction (we call this auto transaction). If you do 1,000 inserts in a loop, there will be 1,000 transactions generated. Each transaction in this example needs to commit, which results in a transaction log flush and 1,000 transaction flushes. When possible, group individual update, delete, or insert into a bigger transaction to reduce transaction log flushes and increase performance. This operation can lead to fewer WRITELOG waits.
Prior to SQL Server 2016, a single Log Writer thread performed all log writes. If there were issues with thread scheduling (for example, high CPU), both the Log Writer thread and log flushes could get delayed. In SQL Server 2016, up to four Log Writer threads were added to increase the log-writing throughput. See SQL 2016 - It Just Runs Faster: Multiple Log Writer Workers. In SQL Server 2019, up to eight Log Writer threads were added, which improves throughput even more. Also, in SQL Server 2019, each regular worker thread can do log writes directly instead of posting to the Log writer thread. With these improvements, WRITELOG waits would rarely be triggered by scheduling issues.
Occurs when some of the following I/O activities happen:
Occurs while waiting for I/O operations to complete. This wait type generally involves I/Os not related to data pages (buffers). Examples include:
Occurs when a backup task is waiting for data, or is waiting for a buffer to store data. This type isn't typical, except when a task is waiting for a tape mount.