How to correctly import data zipped with the "deflate" algorithm?

The Base64 string you provided as an example is not an encoding of a gzip stream (RFC 1952). It is an encoding of a zlib stream (RFC 1950). For background, these are two different wrappers around the raw "deflate" compressed data format (RFC 1951). Each wrapper adds a header and a trailer that provide information about the compressed data and an integrity check value.
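As a rough illustration of how the wrappers differ, the following Python sketch guesses the wrapper from the first two bytes of a stream. The function name guess_wrapper is made up for this example; the magic numbers come from RFC 1950 and RFC 1952. It is a heuristic only, since raw deflate has no header and can begin with almost any bytes.

```python
import base64

def guess_wrapper(data: bytes) -> str:
    """Heuristically guess the wrapper around deflate data (hypothetical helper)."""
    if len(data) < 2:
        return "unknown"
    # RFC 1952: a gzip stream starts with the magic bytes 0x1f 0x8b.
    if data[0] == 0x1F and data[1] == 0x8B:
        return "gzip"
    # RFC 1950: the low nibble of the first byte is 8 (deflate), the
    # window-size nibble is at most 7, and the first two bytes read as a
    # big-endian number are a multiple of 31.
    cmf, flg = data[0], data[1]
    if (cmf & 0x0F) == 8 and (cmf >> 4) <= 7 and (cmf * 256 + flg) % 31 == 0:
        return "zlib"
    return "raw deflate or unknown"

# The first four Base64 characters of the string in the question.
head = base64.b64decode("eJwN")
print(head.hex(), guess_wrapper(head))
```

For the string in the question this reports a zlib header, not a gzip one.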

The Mathematica GZIP importer does not see a gzip stream, so it rejects the data. The list of import formats in Mathematica 10.0 includes neither zlib nor raw deflate.

The documentation for the format calls the format "GZipBase64Binary". If the example you gave is a valid element in that format, then the documentation is very misleading.

The documentation also says: "The third encoding,Base64GzipBinary, compresses the binary data using ZLIB and then converts the data to a Base64 representation." Here they misname their own format (swapping the Gzip and the Base64), and then say it is compressed using ZLIB. The zlib compression library can compress to any of: the zlib format, the gzip format, or raw deflate. The specification should, but does not, specify how the zlib library is to be used.
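To make the ambiguity concrete, here is a small Python sketch (Python's zlib module is a binding to the zlib library) that compresses one payload into all three formats by changing only the wbits argument. The payload and the helper name compress_as are arbitrary choices for illustration.

```python
import zlib

payload = b"example payload " * 8

def compress_as(data: bytes, wbits: int) -> bytes:
    """Compress with the zlib library; wbits selects the output wrapper."""
    compressor = zlib.compressobj(wbits=wbits)
    return compressor.compress(data) + compressor.flush()

zlib_stream = compress_as(payload, 15)   # zlib wrapper (RFC 1950)
gzip_stream = compress_as(payload, 31)   # gzip wrapper (RFC 1952): 16 + 15
raw_stream = compress_as(payload, -15)   # raw deflate (RFC 1951), no wrapper

for name, s in [("zlib", zlib_stream), ("gzip", gzip_stream), ("raw", raw_stream)]:
    print(f"{name:4s} {len(s):3d} bytes, starts with {s[:2].hex()}")

# A decoder has to be told which wrapper to expect.
assert zlib.decompress(zlib_stream, wbits=15) == payload
assert zlib.decompress(gzip_stream, wbits=31) == payload
assert zlib.decompress(raw_stream, wbits=-15) == payload
```

All three outputs are "compressed using ZLIB", which is why that phrase alone does not pin down the format.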

The specification is poorly written.

The Java Inflater class (misspelled, should be "Inflator") in fact decodes the zlib format, which is why that works. The Java documentation is also not clear, in some places saying it operates on deflate data, and in others that it operates on zlib data. In fact, it operates on zlib data, unless the nowrap parameter is true in the Inflater constructor, in which case it will operate on raw deflate data.
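The same switch exists in other bindings of zlib. As a sketch of what nowrap changes, the Python helper below decodes either a zlib stream or the bare deflate data inside it. The name inflate and its nowrap flag are invented here to mirror the Java constructor argument; they are not part of any library.

```python
import zlib

def inflate(data: bytes, nowrap: bool = False) -> bytes:
    """Decode zlib data, or raw deflate data when nowrap is True.

    Hypothetical helper that mirrors the nowrap argument of Java's Inflater.
    """
    # Negative wbits tells zlib to expect no header and no Adler-32 trailer.
    return zlib.decompress(data, wbits=-15 if nowrap else 15)

original = b"hello hello hello hello"
zlib_stream = zlib.compress(original)

# Assuming no preset dictionary, a zlib stream is a 2-byte header, the raw
# deflate data, and a 4-byte Adler-32 check of the uncompressed data.
raw_deflate = zlib_stream[2:-4]

print(inflate(zlib_stream) == original)               # zlib mode
print(inflate(raw_deflate, nowrap=True) == original)  # raw deflate mode
```

Note that in raw mode the Adler-32 integrity check is not available, because the trailer that carries it has been stripped.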

Update:

I thought I would be able to trick Mathematica into decompressing zlib streams by embedding them as comments in PNG files. The PNG format compresses images and comments using the zlib format, and Mathematica will decompress them. Alas, Mathematica refuses to decompress comments with arbitrary binary data that contains, for example, zeros. Your decompressed data starts with zeros. It looks like other byte values are dropped as well.

You will need to use external code to decompress zlib streams, until such time as Wolfram Language includes "ZLIB" as an Import format.
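As one example of such external code, a few lines of Python are enough. This is only a sketch: it assumes the payload is a zlib stream of little-endian 32-bit floats, which fits the string in the question but is not something the format's documentation states, and the variable names are arbitrary.

```python
import base64
import struct
import zlib

# Base64 string from the question.
data = "eJwNylEVgDAMQ9EIwAAGMNDvNQiYAQzUAAZmY7MxG60NdJC/vHsCAJW9VWZb83RtB4avOd1sq9MjPhlYeVAfRlw0MwK3rMseWche2eAPY5cekQ=="

# zlib.decompress expects a zlib stream (RFC 1950) by default.
raw = zlib.decompress(base64.b64decode(data))

# Interpret the bytes as little-endian float32 values (an assumption).
count = len(raw) // 4
values = struct.unpack(f"<{count}f", raw[: count * 4])
print([round(v, 6) for v in values])
```

The decoded list could then be written to a file or passed back to Mathematica in any format it does import.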


A simple and efficient, though undocumented, way to use zlib's deflate compression in Mathematica is to use the functions Developer`RawCompress[] and Developer`RawUncompress[].

They have the following syntax:

zlibStreamBytes = Developer`RawCompress[uncompressedDataBytes]
uncompressedDataBytes = Developer`RawUncompress[zlibStreamBytes]

The input and output of both functions are lists of bytes, where each byte is an integer from 0 to 255.

uncompressedDataBytes is the list of bytes to compress.

zlibStreamBytes is a list of bytes representing a zlib stream.

Simple example:

uncompressedDataBytes = ConstantArray[42, 30];
zlibStreamBytes = Developer`RawCompress[uncompressedDataBytes]

{120, 156, 211, 210, 194, 7, 0, 76, 104, 4, 237}

uncompressedDataBytes = Developer`RawUncompress[zlibStreamBytes]

{42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42}
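The byte list above can be cross-checked outside Mathematica. The following Python sketch decodes the same eleven bytes with Python's zlib module and splits them into header, deflate data, and Adler-32 trailer. It suggests that RawCompress emits an ordinary zlib stream, at least for this input.

```python
import zlib

# Output of Developer`RawCompress[ConstantArray[42, 30]] as shown above.
stream = bytes([120, 156, 211, 210, 194, 7, 0, 76, 104, 4, 237])

decoded = zlib.decompress(stream)
print(list(decoded) == [42] * 30)

header, body, trailer = stream[:2], stream[2:-4], stream[-4:]
print("header :", header.hex())   # 78 9c: deflate method, 32 KiB window
print("deflate:", body.hex())
# The trailer is the big-endian Adler-32 of the uncompressed bytes.
checksum_ok = zlib.adler32(decoded) == int.from_bytes(trailer, "big")
print("adler32:", trailer.hex(), checksum_ok)
```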

Example from your question:

data = "eJwNylEVgDAMQ9EIwAAGMNDvNQiYAQzUAAZmY7MxG60NdJC/vHsCAJW9VWZb83RtB4avOd1sq9MjPhlYeVAfRlw0MwK3rMseWche2eAPY5cekQ==";
decoded = Developer`RawUncompress[ImportString[data, {"Base64", "Binary"}]];
ImportString[FromCharacterCode[decoded], "Real32"]

{0., 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.}

Remark:

If you want to use only documented features, you can write your own LibraryLink wrapper around zlib. Such a method gives the same high performance as the Developer`* functions, but requires compiling a dynamic library for each operating system you use.