How do I create a URL shortener?

Not an answer to your question, but I wouldn't use case-sensitive shortened URLs. They are hard to remember, usually unreadable (many fonts render 1 and l, 0 and O and other characters very very similar that they are near impossible to tell the difference) and downright error prone. Try to use lower or upper case only.

Also, try to have a format where you mix the numbers and characters in a predefined form. There are studies that show that people tend to remember one form better than others (think phone numbers, where the numbers are grouped in a specific form). Try something like num-char-char-num-char-char. I know this will lower the combinations, especially if you don't have upper and lower case, but it would be more usable and therefore useful.


Why would you want to use a hash?

You can just use a simple translation of your auto-increment value to an alphanumeric value. You can do that easily by using some base conversion. Say you character space (A-Z, a-z, 0-9, etc.) has 62 characters, convert the id to a base-40 number and use the characters as the digits.


public class UrlShortener {
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private static final int    BASE     = ALPHABET.length();

    public static String encode(int num) {
        StringBuilder sb = new StringBuilder();
        while ( num > 0 ) {
            sb.append( ALPHABET.charAt( num % BASE ) );
            num /= BASE;
        }
        return sb.reverse().toString();   
    }

    public static int decode(String str) {
        int num = 0;
        for ( int i = 0; i < str.length(); i++ )
            num = num * BASE + ALPHABET.indexOf(str.charAt(i));
        return num;
    }   
}

I would continue your "convert number to string" approach. However, you will realize that your proposed algorithm fails if your ID is a prime and greater than 52.

Theoretical background

You need a Bijective Function f. This is necessary so that you can find a inverse function g('abc') = 123 for your f(123) = 'abc' function. This means:

  • There must be no x1, x2 (with x1 ≠ x2) that will make f(x1) = f(x2),
  • and for every y you must be able to find an x so that f(x) = y.

How to convert the ID to a shortened URL

  1. Think of an alphabet we want to use. In your case, that's [a-zA-Z0-9]. It contains 62 letters.
  2. Take an auto-generated, unique numerical key (the auto-incremented id of a MySQL table for example).

    For this example, I will use 12510 (125 with a base of 10).

  3. Now you have to convert 12510 to X62 (base 62).

    12510 = 2×621 + 1×620 = [2,1]

    This requires the use of integer division and modulo. A pseudo-code example:

    digits = []
    
    while num > 0
      remainder = modulo(num, 62)
      digits.push(remainder)
      num = divide(num, 62)
    
    digits = digits.reverse
    

    Now map the indices 2 and 1 to your alphabet. This is how your mapping (with an array for example) could look like:

    0  → a
    1  → b
    ...
    25 → z
    ...
    52 → 0
    61 → 9
    

    With 2 → c and 1 → b, you will receive cb62 as the shortened URL.

    http://shor.ty/cb
    

How to resolve a shortened URL to the initial ID

The reverse is even easier. You just do a reverse lookup in your alphabet.

  1. e9a62 will be resolved to "4th, 61st, and 0th letter in the alphabet".

    e9a62 = [4,61,0] = 4×622 + 61×621 + 0×620 = 1915810

  2. Now find your database-record with WHERE id = 19158 and do the redirect.

Example implementations (provided by commenters)

  • C++
  • Python
  • Ruby
  • Haskell
  • C#
  • CoffeeScript
  • Perl

Tags:

Algorithm

Url