Overhead of converting from []byte to string and vice-versa

In Go, strings are immutable, so any change creates a new string. As a general rule, convert a string to a byte or rune slice once, do all the work on the slice, and convert back to a string once. To avoid reallocations for small, transient buffers, over-allocate to leave a safety margin when you don't know the exact size.

For example,

package main

import (
    "bytes"
    "fmt"
    "unicode"
    "unicode/utf8"

    // The go.text packages moved from code.google.com to golang.org/x/text.
    "golang.org/x/text/transform"
    "golang.org/x/text/unicode/norm"
)

var isMn = func(r rune) bool {
    return unicode.Is(unicode.Mn, r) // Mn: nonspacing marks
}

var transliterations = map[rune]string{
    'Æ': "AE", 'Ð': "D", 'Ł': "L", 'Ø': "OE", 'Þ': "Th",
    'ß': "ss", 'æ': "ae", 'ð': "d", 'ł': "l", 'ø': "oe",
    'þ': "th", 'Œ': "OE", 'œ': "oe",
}

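// RemoveAccents strips nonspacing marks (accents) from b and then
// replaces the letters listed in transliterations.
func RemoveAccents(b []byte) ([]byte, error) {
    // Decompose, drop the marks, recompose.
    t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
    // transform.Bytes grows its destination as needed, so a low size
    // estimate cannot fail with transform.ErrShortDst.
    mnBuf, _, err := transform.Bytes(t, b)
    if err != nil {
        return nil, err
    }
    // The capacity is an estimate with 25% headroom; the buffer still
    // grows if the estimate turns out to be too small.
    tlBuf := bytes.NewBuffer(make([]byte, 0, len(mnBuf)*125/100))
    for i := 0; i < len(mnBuf); {
        r, width := utf8.DecodeRune(mnBuf[i:])
        if s, ok := transliterations[r]; ok {
            tlBuf.WriteString(s)
        } else {
            tlBuf.WriteRune(r)
        }
        i += width
    }
    return tlBuf.Bytes(), nil
}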

func main() {
    in := "test stringß"
    fmt.Println(in)
    inBytes := []byte(in)
    outBytes, err := RemoveAccents(inBytes)
    if err != nil {
        fmt.Println(err)
        return
    }
    out := string(outBytes)
    fmt.Println(out)
}

Output:

test stringß
test stringss
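
A smaller sketch of the same rule, using only the standard library, may make the pattern easier to see: one conversion in, all edits on the byte slice, one conversion out. The function doubleVowels and the 25% headroom are illustrative assumptions, not part of the code above.

```go
package main

import "fmt"

// doubleVowels is a hypothetical helper that repeats every ASCII vowel.
// It converts to []byte once, edits the bytes, and converts back once.
func doubleVowels(s string) string {
	src := []byte(s) // single conversion in (copies s)
	// Guess the final size with 25% headroom; append still grows
	// the slice if the guess is too small.
	dst := make([]byte, 0, len(src)+len(src)/4)
	for _, c := range src {
		dst = append(dst, c)
		switch c {
		case 'a', 'e', 'i', 'o', 'u':
			dst = append(dst, c)
		}
	}
	return string(dst) // single conversion out (copies dst)
}

func main() {
	fmt.Println(doubleVowels("go strings")) // prints "goo striings"
}
```

Working on bytes is safe here because the vowels are ASCII and every byte of a multi-byte UTF-8 sequence is 0x80 or above, so it can never match one of the cases.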

There is no answer to this question. If these conversions are a performance bottleneck in your application, you should fix them. If they are not, leave them alone.

Did you profile your application under realistic load and find that RemoveAccents is the bottleneck? No? So why bother?
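
If you want numbers before deciding, a rough measurement is cheap to get. The sketch below uses testing.Benchmark from the standard library to time a []byte to string round trip. The names roundTrip and sink and the input are made up for illustration; substitute RemoveAccents and data that resembles your real load.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the result alive so the compiler cannot discard the work.
var sink []byte

// roundTrip is a hypothetical stand-in for the code under suspicion.
// Each conversion nominally copies the data.
func roundTrip(b []byte) []byte {
	s := string(b)
	return []byte(s)
}

func main() {
	input := []byte("test stringß")
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = roundTrip(input)
		}
	})
	// Prints iterations and ns/op, followed by B/op and allocs/op.
	fmt.Println(res.String(), res.MemString())
}
```

A micro-benchmark like this only tells you what the conversion costs in isolation. Whether that cost matters is still a question for a profile of the whole application.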

Really: I assume one could do better (in the sense of less garbage, fewer iterations and fewer conversions), e.g. by chaining in some "TransliterationTransformer". But I doubt it would be worth the hassle.
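
To give an idea of what "less garbage" could look like, here is the transliteration step on its own, written as a single append-style pass over the bytes using only the standard library. This is not the "TransliterationTransformer" itself: it does not implement transform.Transformer, and appendTransliterated is a hypothetical name. The table repeats a few entries from the first answer.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// A few entries copied from the table in the first answer.
var transliterations = map[rune]string{'Æ': "AE", 'ß': "ss", 'ø': "oe", 'Þ': "Th"}

// appendTransliterated is a hypothetical helper in the style of
// strconv.AppendInt: it appends the transliterated form of src to dst
// and returns the extended slice, so the caller can reuse one buffer.
func appendTransliterated(dst, src []byte) []byte {
	for i := 0; i < len(src); {
		r, width := utf8.DecodeRune(src[i:])
		if s, ok := transliterations[r]; ok {
			dst = append(dst, s...)
		} else {
			// Copy the original bytes so invalid UTF-8 passes through unchanged.
			dst = append(dst, src[i:i+width]...)
		}
		i += width
	}
	return dst
}

func main() {
	buf := make([]byte, 0, 64) // scratch buffer, reused across calls
	buf = appendTransliterated(buf[:0], []byte("Straße"))
	fmt.Println(string(buf)) // prints "Strasse"
}
```

Reusing buf across calls avoids a fresh allocation per input as long as the result fits in its capacity.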
