Why are standard string functions faster than my custom string functions?

strcmp and other library routines are written in assembly, or specialized C code, by experienced engineers and use a variety of techniques.

For example, the assembly implementation might load four bytes at a time into a register, and compare that register (as a 32-bit integer) to four bytes from the other string. On some machines, the assembly implementation might load eight bytes or even more. If the comparison shows the bytes are equal, the implementation moves on to the next four bytes. If the comparison shows the bytes are unequal, the implementation stops.

Even with this simple optimization, there are a number of issues to be dealt with. If the string addresses are not multiples of four bytes, the processor might not have an instruction that will load four bytes (many processors require four-byte loads to use addresses that are aligned to multiples of four bytes). Depending on the processor, the implementation might have to use slower unaligned loads or to write special code for each alignment case that does aligned loads and shifts bytes in registers to align the bytes to be compared.

When the implementation loads four bytes at once, it must ensure it does not load bytes beyond the terminating null character if those bytes might cause a segment fault (error because you tried to load an address that is not readable).

If the four bytes do contain the terminating null character, the implementation must detect it and not continue comparing further bytes, even if the current four are equal in the two strings.

Many of these issues require detailed assembly instructions, and the required control over the exact instructions used is not available in C. The exact techniques used vary from processor model to processor model and vary greatly from architecture to architecture.


if(xstrlen(str1)!=xstrlen(str2))    //computing length of str1
    return -1;                      
k=xstrlen(str1)-1;                  //computing length of str1 AGAIN!

You're computing the length of str1 TWICE. That is one reason why your function loses the game.

Also, your implemetation of xstrcmp is very naive compared to the ones defined in (most) Standard libraries. For example, your xstrcmp compares one byte at a time, when in fact it could compare multiple bytes in one go, taking advantage of proper alignment as well, or can do little preprocessing so as to align memory blocks, before actual comparison.


Faster implementation of strlen:

//Return difference in addresses - 1 as we don't count null terminator in strlen.
int xstrlen(char *str)
{
    char* ptr = str;
    while (*str++);
    return str - ptr - 1;
}

//Pretty nifty strcmp from here:
//http://vijayinterviewquestions.blogspot.com/2007/07/implement-strcmpstr1-str2-function.html
int mystrcmp(const char *s1, const char *s2)
{
    while (*s1==*s2)
    {
        if(*s1=='\0')
            return(0);
        ++s1;
        ++s2;
    }
    return(*s1-*s2);
}

I'll do the other one later if I have time. You should also note that most of these are done in assembly language or using other optimized means which will be faster than the best stright C implementation you can write.