How do yield and await implement flow of control in .NET?

yield is the easier of the two, so let's examine it first.

Say we have:

public IEnumerable<int> CountToTen()
{
  for (int i = 1; i <= 10; ++i)
  {
    yield return i;
  }
}

This gets compiled roughly as if we'd written:

// Deliberately use name that isn't valid C# to not clash with anything
private class <CountToTen> : IEnumerator<int>, IEnumerable<int>
{
    private int _i;
    private int _current;
    private int _state;
    private int _initialThreadId = Environment.CurrentManagedThreadId;

    public IEnumerator<int> GetEnumerator()
    {
        // Use self if never ran and same thread (so safe)
        // otherwise create a new object, already in its starting state.
        if (_state != 0 || _initialThreadId != Environment.CurrentManagedThreadId)
        {
            return new <CountToTen> { _state = 1 };
        }

        _state = 1;
        return this;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public int Current => _current;

    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        switch(_state)
        {
            case 1:
                _i = 1;
                _current = _i;
                _state = 2;
                return true;
            case 2:
                ++_i;
                if (_i <= 10)
                {
                    _current = _i;
                    return true;
                }
                break;
        }
        _state = -1;
        return false;
    }

    public void Dispose()
    {
        // If the yield-using method had a `using`, it would
        // be translated into something happening here.
    }

    public void Reset()
    {
        throw new NotSupportedException();
    }
}

So it's not as efficient as a hand-written implementation of IEnumerable<int> and IEnumerator<int> (here, for example, we would likely not keep separate _state, _i and _current fields), but it's not bad: the trick of reusing itself when it's safe to do so, rather than creating a new object, is a good one. The approach also extends to very complicated yield-using methods.
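For comparison, here is a minimal sketch of what a hand-written version might look like. The names CountToTenSequence and Enumerator are made up for illustration, and this is just one possible way to write it, not what the compiler emits. A single field does the work of _state, _i and _current:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical hand-written equivalent of CountToTen(); not compiler output.
public sealed class CountToTenSequence : IEnumerable<int>
{
    public IEnumerator<int> GetEnumerator() => new Enumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    private sealed class Enumerator : IEnumerator<int>
    {
        // 0 means "before the first element"; 1..10 are the values produced.
        private int _current;

        public int Current => _current;

        object IEnumerator.Current => Current;

        public bool MoveNext()
        {
            if (_current >= 10)
            {
                return false; // Already at the last element.
            }

            ++_current;
            return true;
        }

        public void Reset() => throw new NotSupportedException();

        public void Dispose()
        {
            // Nothing to clean up.
        }
    }
}
```

Note the trade-off: this sketch allocates a fresh enumerator on every GetEnumerator() call, whereas the generated class above avoids that allocation in the common case by returning itself.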

And of course since

foreach(var a in b)
{
  DoSomething(a);
}

is the same as:

using(var en = b.GetEnumerator())
{
  while(en.MoveNext())
  {
     var a = en.Current;
     DoSomething(a);
  }
}

the generated MoveNext() is called repeatedly, once for each pass through the loop.
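One way to see this for yourself is to log each step. The sketch below uses a made-up Traced iterator and drives it with the expanded form of foreach; the names are illustrative only. It suggests that none of the iterator body runs until the first MoveNext() call, and that each call runs just as far as the next yield return:

```csharp
using System;
using System.Collections.Generic;

public static class LazyIterationDemo
{
    // Hypothetical iterator that records each step so the order is visible.
    public static IEnumerable<int> Traced(List<string> log)
    {
        log.Add("start");
        for (int i = 1; i <= 2; ++i)
        {
            log.Add("yield " + i);
            yield return i;
        }
        log.Add("end");
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        using (var en = Traced(log).GetEnumerator())
        {
            log.Add("got enumerator"); // Nothing inside Traced has run yet.
            while (en.MoveNext())
            {
                log.Add("consumed " + en.Current);
            }
        }
        return log;
    }
}
```

Run() returns the entries "got enumerator", "start", "yield 1", "consumed 1", "yield 2", "consumed 2", "end", in that order.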

The async case works on much the same principle, but with a bit of extra complexity. To reuse an example from another answer, code like:

private async Task LoopAsync()
{
    int count = 0;
    while(count < 5)
    {
       await SomeNetworkCallAsync();
       count++;
    }
}

produces code like:

private struct LoopAsyncStateMachine : IAsyncStateMachine
{
  public int _state;
  public AsyncTaskMethodBuilder _builder;
  public TestAsync _this;
  public int _count;
  private TaskAwaiter _awaiter;
  void IAsyncStateMachine.MoveNext()
  {
    try
    {
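      if (_state != 0)
      {
        // First call (state -1): run the code that precedes the loop.
        _count = 0;
        goto afterSetup;
      }
      // Resuming after a suspended await: pick the saved awaiter back up.
      TaskAwaiter awaiter = _awaiter;
      _awaiter = default(TaskAwaiter);
      _state = -1;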
    loopBack:
      awaiter.GetResult();
      awaiter = default(TaskAwaiter);
      _count++;
    afterSetup:
      if (_count < 5)
      {
        awaiter = _this.SomeNetworkCallAsync().GetAwaiter();
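        if (!awaiter.IsCompleted)
        {
          // Not finished yet: remember where we are, register this state
          // machine as the continuation, and return to the caller.
          _state = 0;
          _awaiter = awaiter;
          _builder.AwaitUnsafeOnCompleted<TaskAwaiter, TestAsync.LoopAsyncStateMachine>(ref awaiter, ref this);
          return;
        }
        // Already complete: carry on synchronously.
        goto loopBack;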
      }
      _state = -2;
      _builder.SetResult();
    }
    catch (Exception exception)
    {
      _state = -2;
      _builder.SetException(exception);
      return;
    }
  }
  [DebuggerHidden]
  void IAsyncStateMachine.SetStateMachine(IAsyncStateMachine param0)
  {
    _builder.SetStateMachine(param0);
  }
}

private Task LoopAsync()
{
  LoopAsyncStateMachine stateMachine = new LoopAsyncStateMachine();
  stateMachine._this = this;
  AsyncTaskMethodBuilder builder = AsyncTaskMethodBuilder.Create();
  stateMachine._builder = builder;
  stateMachine._state = -1;
  builder.Start(ref stateMachine);
  return builder.Task;
}

It's more complicated, but the basic principle is very similar. The main extra complication is that GetAwaiter() is now being used. Whenever awaiter.IsCompleted is checked and returns true, because the awaited task has already completed (e.g. cases where it could return synchronously), the method keeps moving through its states. Otherwise it registers itself as a callback with the awaiter and returns.

Just what happens next depends on the awaiter: what triggers the callback (e.g. async I/O completion, or a task running on a thread completing), what requirements there are for marshalling to a particular thread or running on a thread-pool thread, what context from the original call may or may not be needed, and so on. Whatever it is, though, something in that awaiter will call into MoveNext(), which will either continue with the next piece of work (up to the next await) or finish and return, in which case the Task it is implementing becomes completed.
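The two paths can be poked at directly with TaskAwaiter<int>. This is only a sketch: AwaiterPathsDemo is a made-up name, and it calls OnCompleted by hand, whereas the generated code goes through the builder's AwaitUnsafeOnCompleted. It also assumes a plain console-style environment with no special SynchronizationContext:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class AwaiterPathsDemo
{
    public static string Run()
    {
        // Fast path: the task is already complete, so a state machine
        // would call GetResult() and keep going without suspending.
        TaskAwaiter<int> done = Task.FromResult(1).GetAwaiter();
        bool fastPath = done.IsCompleted;

        // Slow path: the task is still pending, so a callback is registered.
        var source = new TaskCompletionSource<int>();
        TaskAwaiter<int> pending = source.Task.GetAwaiter();
        bool pendingBefore = pending.IsCompleted;

        int resumedWith = 0;
        using (var resumed = new ManualResetEventSlim())
        {
            pending.OnCompleted(() =>
            {
                resumedWith = pending.GetResult();
                resumed.Set();
            });

            source.SetResult(42);                  // Completing the task triggers the callback.
            resumed.Wait(TimeSpan.FromSeconds(5)); // The callback may run on another thread.
        }

        return $"fast={fastPath}, pendingBefore={pendingBefore}, resumedWith={resumedWith}";
    }
}
```

Where the callback actually runs (inline on the completing thread, on the thread pool, or posted to a captured context) is exactly the awaiter-specific behaviour described above, which is why the sketch waits on an event rather than assuming a thread.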


I'll answer your specific questions below, but you would likely do well to simply read my extensive articles on how we designed yield and await.

https://blogs.msdn.microsoft.com/ericlippert/tag/continuation-passing-style/

https://blogs.msdn.microsoft.com/ericlippert/tag/iterators/

https://blogs.msdn.microsoft.com/ericlippert/tag/async/

Some of these articles are out of date now; the code generated is different in a lot of ways. But these will certainly give you the idea of how it works.

Also, if you do not understand how lambdas are generated as closure classes, understand that first. You won't make heads or tails of async if you don't have lambdas down.

When an await is reached, how does the runtime know what piece of code should execute next?

await is generated as:

if (the task is not completed)
  assign a delegate which executes the remainder of the method as the continuation of the task
  return to the caller
else
  execute the remainder of the method now

That's basically it. Await is just a fancy return.
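To make that concrete, here is a hand-rolled sketch of the same shape. AwaitByHand and restOfMethod are hypothetical names invented for illustration; the compiler does not generate a helper like this, it generates the state machine and uses the method builder to register the continuation.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public static class FancyReturnDemo
{
    // Hypothetical stand-in for:
    //     int value = await task;
    //     ...rest of the method, using value...
    public static void AwaitByHand(Task<int> task, Action<int> restOfMethod)
    {
        TaskAwaiter<int> awaiter = task.GetAwaiter();
        if (!awaiter.IsCompleted)
        {
            // Assign the remainder of the method as the continuation...
            awaiter.OnCompleted(() => restOfMethod(awaiter.GetResult()));
            // ...and return to the caller.
            return;
        }

        // Already completed: execute the remainder of the method now.
        restOfMethod(awaiter.GetResult());
    }
}
```

Called with Task.FromResult(7), the remainder runs before AwaitByHand returns. Called with a task that is still pending, AwaitByHand returns straight away and the remainder runs only once the task completes.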

How does it know when it can resume where it left off, and how does it remember where?

Well, how do you do that without await? When method foo calls method bar, somehow we remember how to get back to the middle of foo, with all the locals of the activation of foo intact, no matter what bar does.

You know how that's done in assembler. An activation record for foo is pushed onto the stack; it contains the values of the locals. At the point of the call the return address in foo is pushed onto the stack. When bar is done, the stack pointer and instruction pointer are reset to where they need to be and foo keeps going from where it left off.

The continuation of an await is exactly the same, except that the record is put onto the heap for the obvious reason that the sequence of activations does not form a stack.

The delegate which await gives as the continuation to the task contains (1) a number which is the input to a lookup table that gives the instruction pointer that you need to execute next, and (2) all the values of locals and temporaries.

There is some additional gear in there; for instance, in .NET it is illegal to branch into the middle of a try block, so you can't simply stick the address of code inside a try block into the table. But these are bookkeeping details. Conceptually, the activation record is simply moved onto the heap.

What happens to the current call stack, does it get saved somehow?

The relevant information in the current activation record is never put on the stack in the first place; it is allocated off the heap from the get-go. (Well, formal parameters are passed on the stack or in registers normally and then copied into a heap location when the method begins.)

The activation records of the callers are not stored; the await is probably going to return to them, remember, so they'll be dealt with normally.

Note that this is a germane difference between the simplified continuation passing style of await, and true call-with-current-continuation structures that you see in languages like Scheme. In those languages the entire continuation including the continuation back into the callers is captured by call-cc.

What if the calling method makes other method calls before it awaits? Why doesn't the stack get overwritten?

Those method calls return, and so their activation records are no longer on the stack at the point of the await.

And how on earth would the runtime work its way through all this in the case of an exception and a stack unwind?

In the event of an uncaught exception in the async method, the exception is caught by the generated code, stored inside the task, and re-thrown when the task is awaited or its result is fetched.

Remember all that bookkeeping I mentioned before? Getting exception semantics right was a huge pain, let me tell you.
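A small sketch shows the observable behaviour. FailAsync and StoredExceptionDemo are made-up names, and the method awaits an already-completed task so that everything runs synchronously and the example needs no real asynchrony:

```csharp
using System;
using System.Threading.Tasks;

public static class StoredExceptionDemo
{
    // Hypothetical async method that always fails.
    public static async Task FailAsync()
    {
        await Task.CompletedTask;
        throw new InvalidOperationException("boom");
    }

    public static string Run()
    {
        // Calling the method does not throw; the exception is stored in the task.
        Task task = FailAsync();
        bool faulted = task.IsFaulted;

        try
        {
            // Fetching the result re-throws the stored exception.
            task.GetAwaiter().GetResult();
            return "no exception";
        }
        catch (InvalidOperationException ex)
        {
            return "faulted=" + faulted + ", message=" + ex.Message;
        }
    }
}
```

Run() returns "faulted=True, message=boom": the call to FailAsync() itself completes normally, and the exception only surfaces where the result is fetched.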

When yield is reached, how does the runtime keep track of the point where things should be picked up? How is iterator state preserved?

Same way. The state of locals is moved onto the heap, and a number representing the instruction at which MoveNext should resume the next time it is called is stored along with the locals.

And again, there's a bunch of gear in an iterator block to make sure that exceptions are handled correctly.
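You can observe that the locals live with the iterator object rather than on the stack by running two enumerators over the same sequence. This is only a sketch with made-up names (RunningTotals, IteratorStateDemo); each enumerator keeps its own copy of total and i, and its own resume point:

```csharp
using System.Collections.Generic;

public static class IteratorStateDemo
{
    // Hypothetical iterator with a local that must survive between calls.
    public static IEnumerable<int> RunningTotals()
    {
        int total = 0; // Ends up as state on the heap-allocated iterator object.
        for (int i = 1; i <= 3; ++i)
        {
            total += i;
            yield return total;
        }
    }

    public static int[] Run()
    {
        IEnumerable<int> sequence = RunningTotals();
        using (IEnumerator<int> first = sequence.GetEnumerator())
        using (IEnumerator<int> second = sequence.GetEnumerator())
        {
            first.MoveNext();  // first: total is 1
            first.MoveNext();  // first: total is 3
            second.MoveNext(); // second starts from scratch: total is 1
            return new[] { first.Current, second.Current };
        }
    }
}
```

Run() returns { 3, 1 }: advancing the first enumerator has no effect on the second, because each one carries its own saved locals and position.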