I decided to write this off the back of a conversation I was having the other day around the SOS_SCHEDULER_YIELD wait type.
The conversation went something along the lines of “but David, I’m seeing SOS_SCHEDULER_YIELD, we must have CPU issues”.
Yes, this particular customer had been CPU bound recently, but was that really their problem now? What does SOS_SCHEDULER_YIELD really mean?
“Occurs when a task voluntarily yields the scheduler for other tasks to execute. During this wait the task is waiting for its quantum to be renewed.” (https://docs.microsoft.com/)
But What Does That Mean?
When a process runs on the processor in SQL Server, it has a quantum, the maximum time that it’s allowed to run before it has to yield. In the case of SQL Server that’s 4ms.
The processor scheduler is basically a first in first out queue (that’s not entirely true but for the most part, that’s how it works). When a process is ready to execute, it jumps on the scheduler in a RUNNABLE state. When it gets to the front of the queue, it changes to RUNNING and is able to execute.
Just like queuing for the toilet, you’ve got to wait there until it’s free and then you’re good to do your thing.
That RUNNING query will carry on running until one of two things happens.
- The process has to stop and wait for something, perhaps for a page to come back from the IO subsystem or for a lock. At this point, it’ll jump off the scheduler and switch to the SUSPENDED state.
- It exceeds its 4ms quantum and has to hop to the back of the queue, registering a SOS_SCHEDULER_YIELD wait.
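If you want a rough feel for how these states look on a live instance, something like the sketch below counts current requests by state. It assumes the status column of sys.dm_exec_requests, which reports running, runnable and suspended, and that you have VIEW SERVER STATE permission. The request_count alias is just a name picked for this example, and system sessions are included in the counts.

```sql
-- Count current requests in each of the three scheduler states.
SELECT status,
       COUNT(*) AS request_count
FROM sys.dm_exec_requests
WHERE status IN ('running', 'runnable', 'suspended')
GROUP BY status
ORDER BY request_count DESC;
```

This is only a point-in-time snapshot, so run it a few times before reading anything into the numbers.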
So Does SOS_SCHEDULER_YIELD Actually Matter?
Maybe, maybe not…
Think of the situation where our process is the only one on that particular scheduler. It still has to yield the scheduler when the 4ms quantum is up, even when no one else is waiting, so it will register a SOS_SCHEDULER_YIELD, but it can then go straight back into the RUNNING state.
In this situation there’s no problem. Even though the process has to chalk up a SOS_SCHEDULER_YIELD wait, it’s not actually waiting on anything and can get straight back on with what it was doing.
Where it does become a problem is when we have a number of other processes already waiting on the scheduler. In that case, when our process exceeds its 4ms and yields the scheduler, it’ll need to go to the back of the queue and wait its turn before it can do any more processing. It’s this waiting in line that’s going to slow us down and indicate contention on CPU.
How To Spot SOS_SCHEDULER_YIELD Waits That We Do Care About?
There’s another number that we can look at in sys.dm_os_wait_stats, and that’s signal_wait_time_ms.
signal_wait_time_ms is the time that the process has spent standing in that queue, waiting to execute. This means that if you’re seeing a large number of SOS_SCHEDULER_YIELD waits but the signal wait time is low, then I’d tend not to worry. You might still want to look at why your queries are using as much CPU as they are, as there could be some scope for optimisation somewhere.
But if you’re seeing big signal wait numbers then CPU contention is something that you might want to think about.
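As a starting point, a query along these lines pulls the relevant figures out of sys.dm_os_wait_stats. Treat it as a sketch: the avg_signal_wait_ms column is simply an alias I’ve made up for signal wait time divided by the number of waits, and the numbers are cumulative since the instance last started or the wait stats were last cleared.

```sql
-- Cumulative SOS_SCHEDULER_YIELD figures for the instance.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms,
       -- Average time spent queuing per yield; NULLIF avoids a divide-by-zero.
       CAST(signal_wait_time_ms AS decimal(18, 2))
           / NULLIF(waiting_tasks_count, 0) AS avg_signal_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'SOS_SCHEDULER_YIELD';
```

A high waiting_tasks_count with a tiny average is the “nothing to see here” case described above. What counts as a big average will depend on your workload, so compare against your own baseline rather than a fixed threshold.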
If you want to take a look at what’s actually waiting on the schedulers, you can query sys.dm_os_tasks:
SELECT scheduler_id, session_id, task_state FROM sys.dm_os_tasks WHERE task_state IN ('RUNNING','RUNNABLE');
If you’re seeing a large number of tasks in the RUNNABLE state, then you may well be feeling some pain from CPU contention.
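If you’d rather see a count per scheduler than a row per task, sys.dm_os_schedulers exposes a runnable_tasks_count column. The sketch below assumes you only care about the schedulers that run user work, which is what the 'VISIBLE ONLINE' status filter is for.

```sql
-- One row per scheduler that runs user work, busiest queue first.
SELECT scheduler_id,
       current_tasks_count,
       runnable_tasks_count
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE'
ORDER BY runnable_tasks_count DESC;
```

Again this is a snapshot, so a runnable_tasks_count that stays above zero across repeated runs tells you more than a single spike.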