Is C# type system sound and decidable?

Is C# type system decidable?

A type system is "decidable" if the compiler is in theory always able to decide whether the program type checks or not in finite time.

The C# type system is not decidable.

C# has "nominal" subtyping -- that is, you give classes and interfaces names and say what the base classes and interfaces are by name when you declare a class.

C# also has generic types, and, as of C# 4, covariance and contravariance of generic interfaces.

Those three things -- nominal subtyping, generic interfaces, and contravariance -- are sufficient to make a type system undecidable (in the absence of other restrictions on the ways that subtypes may mention each other.)

When this answer was originally written in 2014, that was suspected but not known. The history of this discovery is interesting.

First, the designers of the C# generic type system wondered the same thing, and wrote a paper in 2007 describing different ways in which type checking can go wrong, and what restrictions one can put on a nominal subtyping system that make it decidable.

https://www.microsoft.com/en-us/research/publication/on-decidability-of-nominal-subtyping-with-variance/

A more gentle introduction to the problem can be found on my blog, here:

https://ericlippert.com/2008/05/07/covariance-and-contravariance-part-11-to-infinity-but-not-beyond/

I have written about this subject on SE sites before; a researcher noticed the problem mentioned in that posting and solved it; we now know that nominal subtyping is in general undecidable if there is generic contravariance thrown into the mix. You can encode a Turing Machine into the type system and force the compiler to emulate its operation, and since the question "does this TM halt?" is undecidable, so must type checking be undecidable.

See https://arxiv.org/abs/1605.05274 for the details.

Is the C# type system sound?

A type system is "sound" if we are guaranteed that a program which type checks at compile time has no type errors at runtime.

The C# type system is not sound.

There are many reasons why it is not, but my least favourite is array covariance:

Giraffe[] giraffes = new[] { new Giraffe() };
Animal[] animals = giraffes; // This is legal!
animals[0] = new Tiger(); // crashes at runtime with a type error

The idea here is that most methods that take arrays only read the array, they do not write it, and it is safe to read an animal out of an array of giraffes. Java allows this, and so the CLR allows it because the CLR designers wanted to be able to implement variations on Java. C# allows it because the CLR allows it. The consequence is that every time you write anything into an array of a base class, the runtime must do a check to verify that the array is not an array of an incompatible derived class. The common case gets slower so that the rare error case can get an exception.

That brings up a good point though: C# is at least well-defined as to the consequences of a type error. Type errors at runtime produce sane behaviour in the form of exceptions. It's not like C or C++ where the compiler can and will blithely generate code that does arbitrarily crazy things.

There are a few other ways in which the C# type system is unsound by design.

  • If you consider getting a null reference exception to be a kind of runtime type error, then C# pre C# 8 is very unsound in that it does almost nothing to prevent this kind of error. C# 8 has many improvements in support for detecting nullity errors statically, but the null reference type checking is not sound; it has both false positives and false negatives. The idea is that some compile-time checking is better than none, even if it is not 100% reliable.

  • Many cast expressions allow the user to override the type system and declare "I know this expression will be of a more specific type at runtime, and if I'm wrong, throw an exception". (Some casts mean the opposite: "I know this expression is of type X, please generate code to convert it to an equivalent value of type Y". Those are generally safe.) Since this is a place where the developer is specifically saying that they know better than the type system, one can hardly blame the type system for the resulting crash.

There are also a handful of features that generate cast-like behaviour even though there is no cast in the code. For example, if you have a list of animals you can say

foreach(Giraffe g in animals)

and if there is a tiger in there, your program will crash. As the specification notes, the compiler simply inserts a cast on your behalf. (If you want to loop over all the giraffes and ignore the tigers, that's foreach(Giraffe g in animals.OfType<Giraffe>()).)

  • The unsafe subset of C# makes all bets off; you can break the rules of the runtime arbitrarily with it. Turning off a safety system turns a safety system off, so it should not be surprising that C# is not sound when you turn off soundness checking.

It's not particularly hard to create problems that the C# complier cannot solve in a reasonable amount of time. Some of the problems it is posed with (often related to generics/type inference) are NP-hard problems. Eric Lippert describes one such example here:

class MainClass
{
    class T{}
    class F{}
    delegate void DT(T t);
    delegate void DF(F f);
    static void M(DT dt)
    {
        System.Console.WriteLine("true");
        dt(new T());
    }
    static void M(DF df)
    {
        System.Console.WriteLine("false");
        df(new F());
    }
    static T Or(T a1, T a2, T a3){return new T();}
    static T Or(T a1, T a2, F a3){return new T();}
    static T Or(T a1, F a2, T a3){return new T();}
    static T Or(T a1, F a2, F a3){return new T();}
    static T Or(F a1, T a2, T a3){return new T();}
    static T Or(F a1, T a2, F a3){return new T();}
    static T Or(F a1, F a2, T a3){return new T();}
    static F Or(F a1, F a2, F a3){return new F();}
    static T And(T a1, T a2){return new T();}
    static F And(T a1, F a2){return new F();}
    static F And(F a1, T a2){return new F();}
    static F And(F a1, F a2){return new F();}
    static F Not(T a){return new F();}
    static T Not(F a){return new T();}
    static void MustBeT(T t){}
    static void Main()
    {
        // Introduce enough variables and then encode any Boolean predicate:
        // eg, here we encode (!x3) & ((!x1) & ((x1 | x2 | x1) & (x2 | x3 | x2)))
        M(x1=>M(x2=>M(x3=>MustBeT(
          And(
            Not(x3), 
            And(
              Not(x1), 
              And(
                Or(x1, x2, x1), 
                Or(x2, x3, x2))))))));
    }
}