How can I get the MIME type of an InputStream of a file that is being uploaded?

I'm a big proponent of "do it yourself first, then look for a library solution". Luckily, this is a case where doing it yourself is feasible.

You have to know the file's "magic number", i.e. its signature. Here is an example that detects whether an InputStream represents a PNG file.

The PNG signature is composed by appending together the following bytes, in hex:

1) error-checking byte - 0x89

2) string "PNG" in ASCII:

     P - 0x50
     N - 0x4E
     G - 0x47

3) CR (carriage return) - 0x0D

4) LF (line feed) - 0x0A

5) SUB (substitute) - 0x1A

6) LF (line feed) - 0x0A

So, the magic number is

89   50 4E 47 0D 0A 1A 0A

137  80 78 71 13 10 26 10 (decimal)
-119 80 78 71 13 10 26 10 (in Java)

Explanation of 137 -> -119 conversion

An N-bit number can represent 2^N different values. For a byte (8 bits) that is 2^8 = 256 values, i.e. the range 0..255 when unsigned. Java treats byte primitives as signed, so their range is -128..127. Thus the bit pattern of 137 is interpreted as a signed value and represents 137 - 256 = -119.
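The conversion can be checked directly in Java. This is a small sketch (the class and method names are made up for illustration): a narrowing cast to byte keeps the low 8 bits and reinterprets them as signed, and masking with 0xFF recovers the unsigned value.

```java
// Illustrative only: SignedByteDemo and its methods are not from any library.
class SignedByteDemo {

    // The narrowing cast keeps the low 8 bits and reinterprets them as a signed value.
    static byte toSignedByte(int unsigned) {
        return (byte) unsigned;
    }

    // Masking with 0xFF widens the byte back to its unsigned 0..255 value.
    static int toUnsignedInt(byte signed) {
        return signed & 0xFF;
    }

    public static void main(String[] args) {
        byte first = toSignedByte(0x89);          // 0x89 is 137 in decimal
        System.out.println(first);                // -119
        System.out.println(toUnsignedInt(first)); // 137
    }
}
```

The same masking trick is useful when comparing bytes read from a stream against signatures written as hex literals.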

Example in Kotlin

private fun InputStream.isPng(): Boolean {
    val magicNumbers = byteArrayOf(-119, 80, 78, 71, 13, 10, 26, 10)
    val signatureBytes = ByteArray(magicNumbers.size)
    // Note: this consumes the first bytes of the stream.
    // read() may return fewer bytes than requested (or -1), so check the count.
    val bytesRead = read(signatureBytes, 0, signatureBytes.size)
    return bytesRead == magicNumbers.size && signatureBytes.contentEquals(magicNumbers)
}

Of course, in order to support many MIME types, you have to scale this solution somehow, and if you are not happy with the result, consider some library.
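One possible way to scale the idea is a small table of signatures that is checked against the first few bytes of the stream. The sketch below is only an illustration: SignatureSniffer, its table, and the three entries in it are assumptions of mine, not a complete or authoritative list. It uses mark/reset so the caller can still read the stream from the start afterwards.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of any library.
class SignatureSniffer {

    // One known signature (unsigned byte values) and the MIME type it implies.
    private static final class Signature {
        final int[] bytes;
        final String mimeType;
        Signature(String mimeType, int... bytes) {
            this.mimeType = mimeType;
            this.bytes = bytes;
        }
    }

    // Sample entries only; extend as needed.
    private static final Signature[] SIGNATURES = {
        new Signature("image/png", 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A),
        new Signature("application/pdf", 0x25, 0x50, 0x44, 0x46), // "%PDF"
        new Signature("image/gif", 0x47, 0x49, 0x46, 0x38),       // "GIF8"
    };
    private static final int MAX_SIGNATURE_LENGTH = 8;

    // Returns the MIME type of the first matching signature, or null if none matches.
    // The stream must support mark/reset (e.g. a BufferedInputStream); it is rewound afterwards.
    static String sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(MAX_SIGNATURE_LENGTH);
        byte[] head = new byte[MAX_SIGNATURE_LENGTH];
        int total = 0;
        // A single read() may return fewer bytes than requested, so loop.
        while (total < head.length) {
            int n = in.read(head, total, head.length - total);
            if (n < 0) break; // end of stream
            total += n;
        }
        in.reset();
        for (Signature s : SIGNATURES) {
            if (startsWith(head, total, s.bytes)) return s.mimeType;
        }
        return null;
    }

    private static boolean startsWith(byte[] head, int available, int[] signature) {
        if (available < signature.length) return false;
        for (int i = 0; i < signature.length; i++) {
            // Mask to compare the signed Java byte with the unsigned table value.
            if ((head[i] & 0xFF) != signature[i]) return false;
        }
        return true;
    }
}
```

For an upload stream that does not support mark/reset, wrapping it first (for example `new BufferedInputStream(upload)`) and then calling `SignatureSniffer.sniff(...)` on the wrapper should work, as long as the wrapper is what gets read afterwards.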


I wrote my own content-type detector for a byte[] because the libraries above weren't suitable or I didn't have access to them. Hopefully this helps someone out.

// retrieve file as byte[]
byte[] b = odHit.retrieve( "" );

// copy the top 32 bytes (or fewer, if the file is shorter) and pass them
// to the guessMimeType(byte[]) function
byte[] topOfStream = java.util.Arrays.copyOf(b, Math.min(b.length, 32));
String mimeGuess = guessMimeType(topOfStream);

...

// needs: java.io.FileInputStream, java.io.IOException,
//        java.nio.charset.StandardCharsets, java.util.Properties
private static String guessMimeType(byte[] topOfStream) {

    String mimeType = null;
    Properties magicmimes = new Properties();

    // Read in the magicmimes.properties file (example listed below);
    // try-with-resources closes the stream even if load() fails
    try (FileInputStream in = new FileInputStream("magicmimes.properties")) {
        magicmimes.load(in);
    } catch (IOException e) {
        e.printStackTrace();
    }

    // loop over each file signature; if a match is found, return its mime type
    for (String key : magicmimes.stringPropertyNames()) {
        if (key.length() > topOfStream.length) {
            continue; // signature is longer than the bytes we were given
        }
        // ISO-8859-1 maps every byte 0x00..0xFF to the same code point, so a
        // byte such as 0x89 matches the escape for U+0089 in the properties file
        String sample = new String(topOfStream, 0, key.length(), StandardCharsets.ISO_8859_1);
        if (key.equals(sample)) {
            mimeType = magicmimes.getProperty(key);
            System.out.println("Mime Found! " + mimeType);
            break;
        } else {
            System.out.println("trying " + key + " == " + sample);
        }
    }

    return mimeType;
}

magicmimes.properties file example (not sure these signatures are correct, but they worked for my uses)

# SignatureKey                  content/type
\u0000\u201E\u00f1\u00d9        text/plain
\u0025\u0050\u0044\u0046        application/pdf
%PDF                            application/pdf
\u0042\u004d                    image/bmp
GIF8                            image/gif
\u0047\u0049\u0046\u0038        image/gif
\u0049\u0049\u002a\u0000        image/tiff
\u004d\u004d\u0000\u002a        image/tiff
\u0089\u0050\u004e\u0047        image/png
\u00ff\u00d8\u00ff\u00e0        image/jpeg
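To see how a file in this format is interpreted, here is a minimal sketch (MagicMimesDemo and lookup are hypothetical names). It assumes the standard behaviour of java.util.Properties: the key ends at the first whitespace, and Unicode escapes in the key are decoded to single characters. The bytes are decoded as ISO-8859-1 so that each byte becomes exactly one character with the same numeric value.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class, not part of any library.
class MagicMimesDemo {

    // Loads signature-to-MIME mappings from properties-formatted text and
    // returns the MIME type whose key matches the start of head, or null.
    static String lookup(String propertiesText, byte[] head) throws IOException {
        Properties magicmimes = new Properties();
        magicmimes.load(new StringReader(propertiesText));
        for (String key : magicmimes.stringPropertyNames()) {
            if (key.length() > head.length) continue;
            // One byte becomes one char with the same value (0x00..0xFF).
            String sample = new String(head, 0, key.length(), StandardCharsets.ISO_8859_1);
            if (key.equals(sample)) return magicmimes.getProperty(key);
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Doubled backslashes keep the escapes literal so that Properties decodes them.
        String text =
            "# SignatureKey                  content/type\n" +
            "%PDF                            application/pdf\n" +
            "\\u0089\\u0050\\u004e\\u0047        image/png\n";
        byte[] pngHead = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        System.out.println(lookup(text, pngHead)); // image/png
    }
}
```

Decoding with the platform default charset instead (often UTF-8) would turn a lone 0x89 byte into a replacement character, so the PNG entry would never match.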

It depends on where you are getting the input stream from. If you are getting it from a servlet, then the content type is accessible through the HttpServletRequest object that is an argument of doPost. If you are using a REST framework such as Jersey, the request can be injected with @Context. If you are uploading the file through a raw socket, it is your responsibility to specify the MIME type as part of your protocol, since you will not inherit the HTTP headers.


According to Real Gagnon's excellent site, the better solution for your case would be to use Apache Tika.