Unsafe string manipulation mutates unexisting value

You are modifying the string in the interned string table, as the following code demonstrates:

using System;

namespace CoreApp1
{
    class Program
    {
        const string constFoo = "FOO";

        static unsafe void Main(string[] args)
        {
            fixed (char* p = constFoo)
            {
                for (int i = 0; i < constFoo.Length; i++)
                    p[i] = 'M';
            }

            // Madness ensues: The next line prints "MMM":
            Console.WriteLine("FOO"); // Prints the interned value of "FOO" which is now "MMM"
        }
    }
}

Here's something a little harder to explain:

using System;
using System.Runtime.InteropServices;

namespace CoreApp1
{
    class Program
    {
        const string constFoo = "FOO";

        static void Main()
        {
            char[] chars = new StringToChar {str = constFoo }.chr;

            for (int i = 0; i < constFoo.Length; i++)
            {
                chars[i] = 'M';
                Console.WriteLine(chars[i]); // Always prints "M".
            }

            Console.WriteLine("FOO"); // x86: Prints "MMM". x64: Prints "FOM".
        }
    }

    [StructLayout(LayoutKind.Explicit)]
    public struct StringToChar
    {
        [FieldOffset(0)] public string str;
        [FieldOffset(0)] public char[] chr;
    }
}

This doesn't use any unsafe code, but it still mutates the string in the intern table.

What's harder to explain here is that for x86 the interned string is changed to "MMM" as you'd expect, but for x64 it gets changed to "FOM". What happened to the changes to the first two characters? I can't explain this, but I'm guessing it's to do with fitting two characters into a word for x64 rather than just one.


To help you understand this, you can decompile the assembly and inspect the IL code.

Taking your second snippet, you will get something like this:

// static fields initialization
.method specialname static void .cctor () cil managed 
{
    IL_0000: ldstr "FOO"
    IL_0005: stsfld string Program::foo

    IL_000a: ldstr "FOO"
    IL_000f: stsfld string Program::bar
}

.method static void Main() cil managed 
{
    .entrypoint
    .locals init (
        [0] char* p,
        [1] string pinned,
        // ...
    )

    // fixed (char* ptr = "FOO")
    IL_0001: ldstr "FOO"
    IL_0006: stloc.1
    IL_0007: ldloc.1
    IL_0008: conv.u
    IL_0009: stloc.0
    // ...
}

Note that in all three cases, the string is loaded onto the evaluation stack using the ldstr opcode.

From the documentation:

The Common Language Infrastructure (CLI) guarantees that the result of two ldstr instructions referring to two metadata tokens that have the same sequence of characters return precisely the same string object (a process known as "string interning").

So in all three cases, you get the same string object - the interned string instance. This explains the "mutated" const object.

Tags:

C#

.Net

String