SOS_SCHEDULER_YIELD – What is it really telling us?

I decided to write this off the back of a conversation I was having the other day around the SOS_SCHEDULER_YIELD wait type.

The conversation went something along the lines of “but David, I’m seeing SOS_SCHEDULER_YIELD, we must have CPU issues”.

Yes, this particular customer had been CPU-bound recently, but was that really their problem now? What does SOS_SCHEDULER_YIELD really mean?

Microsoft Says

Occurs when a task voluntarily yields the scheduler for other tasks to execute. During this wait the task is waiting for its quantum to be renewed.

https://docs.microsoft.com/

But What Does That Mean?

When a process runs on a processor in SQL Server, it gets a quantum, the maximum time that it’s allowed to run before it has to yield. In SQL Server’s case, that’s 4ms.

The processor scheduler is basically a first-in, first-out queue (that’s not entirely true, but for the most part that’s how it works). When a process is ready to execute, it joins the scheduler’s queue in the RUNNABLE state. When it gets to the front of the queue, it changes to RUNNING and is able to execute.

Just like queuing for the toilet, you’ve got to wait there until it’s free, and then you’re good to do your thing.


That RUNNING query will carry on running until one of two things happens.

  1. The process has to stop and wait for something, perhaps for a page to come back from the IO subsystem or for a lock. At this point, it’ll jump off the scheduler and switch to the SUSPENDED state.
  2. It exceeds its 4ms quantum and has to hop to the back of the queue and register a SOS_SCHEDULER_YIELD wait.
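If you want to see these states on a live server, one option is sys.dm_exec_requests, whose status column reports running, runnable and suspended (among other values). This is a minimal sketch, assuming you have VIEW SERVER STATE permission; the choice of columns and the filter are my own:

```sql
-- Snapshot of current requests and the scheduler state each one is in.
-- status: 'running'   = on the CPU right now
--         'runnable'  = queued, waiting for its turn on the CPU
--         'suspended' = waiting on a resource (IO, lock, etc.)
SELECT r.session_id,
       r.scheduler_id,
       r.status,
       r.wait_type,        -- resource a suspended request is waiting on (NULL if not waiting)
       r.last_wait_type,   -- CPU-heavy requests may show SOS_SCHEDULER_YIELD here
       r.cpu_time          -- CPU milliseconds used so far by this request
FROM sys.dm_exec_requests AS r
WHERE r.session_id <> @@SPID                          -- leave out this diagnostic query
  AND r.status IN ('running', 'runnable', 'suspended')
ORDER BY r.scheduler_id, r.status;
```

It’s only a point-in-time picture, so it’s worth running a few times before drawing conclusions.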

So Does SOS_SCHEDULER_YIELD Actually Matter?

Maybe, maybe not…

Think of the situation where our process is the only one on that particular scheduler. It still has to yield the scheduler when its 4ms quantum is up, even though no one else is waiting, so it will register a SOS_SCHEDULER_YIELD wait, but it can then go straight back into the RUNNING state.

In this situation there’s no problem. Even though the process has to chalk up a SOS_SCHEDULER_YIELD wait, it doesn’t actually have to wait on anything and can get straight back on with what it was doing.

Where it does become a problem is when a number of other processes are already waiting on the scheduler. In that case, when our process exceeds its 4ms and yields the scheduler, it has to go to the back of the queue and wait its turn before it can do any more processing. It’s this waiting in line that slows us down and indicates contention on CPU.
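To get a feel for how long that line is, sys.dm_os_schedulers exposes runnable_tasks_count, the number of workers sitting in each scheduler’s runnable queue waiting for their turn. A minimal sketch, assuming VIEW SERVER STATE permission; filtering on 'VISIBLE ONLINE' is my choice, to restrict the output to the schedulers that run user work:

```sql
-- Per-scheduler view of CPU queueing.
-- current_tasks_count : tasks currently associated with this scheduler
-- runnable_tasks_count: workers queued for the CPU on this scheduler
SELECT s.scheduler_id,
       s.cpu_id,
       s.current_tasks_count,
       s.runnable_tasks_count
FROM sys.dm_os_schedulers AS s
WHERE s.status = 'VISIBLE ONLINE'   -- user schedulers only
ORDER BY s.runnable_tasks_count DESC;
```

A process yielding on a scheduler with a runnable_tasks_count of zero walks straight back in; one yielding where the count stays high across repeated samples has to queue behind everyone else.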

How To Spot the SOS_SCHEDULER_YIELD Waits That We Do Care About

There’s another number that we can look at in sys.dm_os_wait_stats, and that’s signal_wait_time_ms.

The signal_wait_time_ms column is the time that processes have spent standing in that queue, waiting to get back onto the CPU. This means that if you’re seeing a large number of SOS_SCHEDULER_YIELD waits but the signal wait time is low, then I’d tend not to worry, although you might want to look at why your queries are using as much CPU as they are; there could be some scope for optimisation somewhere.
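Pulling the SOS_SCHEDULER_YIELD row out of sys.dm_os_wait_stats and dividing the signal wait by the number of waits makes this comparison concrete. The figures are cumulative since the instance last started (or since the stats were last cleared), and the avg_signal_wait_ms column name is just my own label:

```sql
-- Cumulative SOS_SCHEDULER_YIELD statistics since instance start (or last clear).
SELECT ws.wait_type,
       ws.waiting_tasks_count,   -- number of yields recorded
       ws.wait_time_ms,          -- total wait time, signal wait included
       ws.signal_wait_time_ms,   -- time spent queued, waiting to get back on the CPU
       -- average ms each yield spent waiting for the CPU (NULL if no waits yet)
       CAST(ws.signal_wait_time_ms * 1.0
            / NULLIF(ws.waiting_tasks_count, 0) AS DECIMAL(18, 3)) AS avg_signal_wait_ms
FROM sys.dm_os_wait_stats AS ws
WHERE ws.wait_type = N'SOS_SCHEDULER_YIELD';
```

A high waiting_tasks_count with an average close to zero is the "walk out of the toilet and straight back in" case; a rising average means each yield is spending real time in the queue.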

But if you’re seeing big signal wait numbers, then CPU contention is something that you might want to think about.
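The numbers in sys.dm_os_wait_stats are cumulative since the instance last started, so big signal wait figures could be left over from a busy period days ago. One way to see what’s happening right now is to take two snapshots a short interval apart and compare them. A sketch, assuming a one-minute sample is representative of your workload; the temp table name #waits_before is only illustrative:

```sql
-- Capture wait stats, wait one minute, then report what accumulated in between.
IF OBJECT_ID('tempdb..#waits_before') IS NOT NULL
    DROP TABLE #waits_before;

SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
INTO #waits_before
FROM sys.dm_os_wait_stats;

WAITFOR DELAY '00:01:00';   -- sample interval (hh:mm:ss)

SELECT a.wait_type,
       a.waiting_tasks_count - b.waiting_tasks_count AS waits_in_interval,
       a.wait_time_ms        - b.wait_time_ms        AS wait_ms_in_interval,
       a.signal_wait_time_ms - b.signal_wait_time_ms AS signal_wait_ms_in_interval
FROM sys.dm_os_wait_stats AS a
JOIN #waits_before AS b
  ON b.wait_type = a.wait_type
WHERE a.signal_wait_time_ms - b.signal_wait_time_ms > 0   -- only waits with new signal wait
ORDER BY signal_wait_ms_in_interval DESC;

DROP TABLE #waits_before;
```

If SOS_SCHEDULER_YIELD sits near the top of this list with a large signal_wait_ms_in_interval, the queueing is happening now rather than at some point in the past.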

If you want to take a look at what’s actually waiting on the schedulers, you can query sys.dm_os_tasks:

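-- Tasks on the CPU (RUNNING) or queued for it (RUNNABLE), grouped by scheduler
SELECT scheduler_id, session_id, task_state
FROM sys.dm_os_tasks
WHERE task_state IN ('RUNNING', 'RUNNABLE')
ORDER BY scheduler_id, task_state;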

If you’re seeing a large number of tasks in the RUNNABLE state, then you may well be feeling some pain from CPU contention.

6 thoughts on “SOS_SCHEDULER_YIELD – What is it really telling us?”


  1. Thanks – this is one of the clearest explanations I’ve read around this wait type and what it actually means.


  2. I am confused by one part:

    “That RUNNING query will carry on running until one of two things happens. ”

    Once a thread is RUNNING, shouldn’t it run until it’s finished?
    -The process has to stop and wait for something…
    -It exceeds its 4ms quota and has to hop to the back of the queue and register a SOS_SCHEDULER_YIELD wait.

    What about the situation where it finishes, like when a simple query completes successfully?

    Point 2 makes it sound like every 4ms the thread yields the processor, whether it needs to or not, and then jumps back in immediately afterwards. Going back to the toilet analogy, the customer would enter, sit down for 4ms, then leave and re-enter the toilet.

    That goes against my understanding of non-preemptive scheduling, where a thread can run for as long as it needs to unless there is a resource wait.


    1. Within SQL, a worker has no knowledge of any other workers that are on the same scheduler. When its 4ms quantum has expired it must yield the scheduler, even if that means walking out of the toilet and going straight back in again.


      1. Thank you! It’s now clear how it works.
        I can see my question is a little convoluted, so thanks for answering anyway.
        I lack a technical background, so sometimes it’s hard to put my thought process into words in English.


  3. Thanks for the article; however, I am puzzled by one aspect of SOS_SCHEDULER_YIELD. If I look at the wait stats for this on one of my production servers, I see a wait time of, say, 970,000 seconds and a signal wait time of, say, 969,000 seconds. This leaves a discrepancy of 1,000 seconds, which for most other wait types we’d put down as the time waiting for the resource. But for the SOS_SCHEDULER_YIELD wait type, there is no resource being waited on (apart from the CPU, which is accounted for by the signal wait). So why do we have this discrepancy? What is happening during these 1,000 seconds?

