Parse UTF8 string from ReadOnlySequence<byte>

The first thing we should do here is test whether the sequence actually is a single span; if it is, we can hugely simplify and optimize.

Once we know that we have a multi-segment (discontiguous) buffer, there are two ways we can go:

  1. linearize the segments into a contiguous buffer, probably leasing an oversized buffer from ArrayPool<byte>.Shared, and use Encoding.UTF8.GetString on the correct portion of the leased buffer, or
  2. use the GetDecoder() API on the encoding to decode segment by segment into a new string, which on older frameworks means overwriting a newly allocated string, and on newer frameworks means using the string.Create API

The first option is massively simpler. It costs a few memory-copy operations (one per segment), but no allocations other than the string itself:

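using System;
using System.Buffers;
using System.Text;

public static class ReadOnlySequenceExtensions
{
    public static string GetString(in this ReadOnlySequence<byte> payload,
        Encoding encoding = null)
    {
        encoding ??= Encoding.UTF8;
        // fast path: a single contiguous segment can be decoded directly
        // (FirstSpan needs .NET Core 3.0+; use payload.First.Span on older targets)
        return payload.IsSingleSegment ? encoding.GetString(payload.FirstSpan)
            : GetStringSlow(payload, encoding);

        static string GetStringSlow(in ReadOnlySequence<byte> payload, Encoding encoding)
        {
            // linearize: copy every segment into one leased buffer
            int length = checked((int)payload.Length);
            var oversized = ArrayPool<byte>.Shared.Rent(length);
            try
            {
                payload.CopyTo(oversized);
                // the leased array may be larger than requested; decode only the first 'length' bytes
                return encoding.GetString(oversized, 0, length);
            }
            finally
            {
                ArrayPool<byte>.Shared.Return(oversized);
            }
        }
    }
}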
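
For comparison, here is a rough sketch of the second option. It is a simplified variant, not the string.Create approach described above: it feeds each segment to a stateful Decoder and collects the output in a leased char buffer, then builds the string from that. That trades one extra char copy for not needing the exact character count up front. The names SequenceDecoding, GetStringViaDecoder and Segment are made up for illustration, and the span-based Decoder.GetChars overload assumes .NET Core 2.1 or later.

```csharp
using System;
using System.Buffers;
using System.Text;

public static class SequenceDecoding
{
    // Hypothetical helper: decodes a (possibly multi-segment) payload with a stateful Decoder.
    public static string GetStringViaDecoder(in ReadOnlySequence<byte> payload,
        Encoding encoding = null)
    {
        encoding ??= Encoding.UTF8;
        if (payload.IsEmpty) return "";

        int byteCount = checked((int)payload.Length);
        // GetMaxCharCount gives an upper bound on the decoded length, not the exact length
        char[] oversized = ArrayPool<char>.Shared.Rent(encoding.GetMaxCharCount(byteCount));
        try
        {
            // the decoder remembers partial multi-byte sequences between calls,
            // so a character split across two segments is still decoded correctly
            Decoder decoder = encoding.GetDecoder();
            int written = 0;
            foreach (ReadOnlyMemory<byte> segment in payload)
            {
                written += decoder.GetChars(segment.Span, oversized.AsSpan(written), flush: false);
            }
            // final flush: emits whatever the fallback produces for a dangling partial sequence
            written += decoder.GetChars(ReadOnlySpan<byte>.Empty, oversized.AsSpan(written), flush: true);
            return new string(oversized, 0, written);
        }
        finally
        {
            ArrayPool<char>.Shared.Return(oversized);
        }
    }
}

// Hypothetical demo helper: the smallest segment type needed to build a multi-segment sequence.
public sealed class Segment : ReadOnlySequenceSegment<byte>
{
    public Segment(ReadOnlyMemory<byte> memory) => Memory = memory;

    public Segment Append(ReadOnlyMemory<byte> memory)
    {
        var next = new Segment(memory) { RunningIndex = RunningIndex + Memory.Length };
        Next = next;
        return next;
    }
}
```

To try it, split the UTF-8 bytes of a string in the middle of a multi-byte character (for example, after the second byte of "h€llo"), chain the two halves with Segment.Append, wrap them in a ReadOnlySequence<byte>, and pass that to GetStringViaDecoder. The decoder carries the partial character across the segment boundary, which is the part the linearizing version sidesteps by copying everything first.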

Tags:

C#

.Net