Parsing Age of Empires game record files (.mgx)

This is an old question, but here is a sample of what I did:

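import os
import struct
import zlib
from collections import namedtuple

# Player isn't defined in the original snippet; this namedtuple is an
# assumed stand-in holding the three fields the parser fills in.
Player = namedtuple('Player', ['id', 'type', 'name'])


class GameRecordParser:

    def __init__(self, filename):
        self.filename = filename
        with open(filename, 'rb') as f:
            # Header size (counts the 4- or 8-byte prefix as well)
            header_size = struct.unpack('<I', f.read(4))[0]
            # Some recordings store a second uint32 (an offset into the file)
            # before the compressed header; others start the deflate stream here.
            sub = struct.unpack('<I', f.read(4))[0]
            if sub != 0 and sub < os.stat(filename).st_size:
                # Plausible offset: the compressed header starts at byte 8
                self.header_start = 8
            else:
                # Not an offset: those 4 bytes already belong to the compressed data
                f.seek(4)
                self.header_start = 4

            # Read and decompress the header (raw deflate, no zlib header)
            header = f.read(header_size - self.header_start)
            self.header_data = zlib.decompress(header, -zlib.MAX_WBITS)

            # Everything after the header is the body
            self.body = f.read()

        # Player entries follow the "Gaia" entry in the decompressed header
        sep = b'\x04\x00\x00\x00Gaia'
        found = self.header_data.find(sep)
        if found == -1:
            raise ValueError('player list not found in header')
        pos = found + len(sep)
        self.players = []
        for _ in range(8):
            # Each entry: uint32 id, uint32 type, uint32 name length, then the name
            player_id, player_type, name_size = struct.unpack_from('<III', self.header_data, pos)
            pos += 12
            name = self.header_data[pos:pos + name_size].decode('utf-8')
            pos += name_size
            if player_id < 9:
                self.players.append(Player(player_id, player_type, name))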
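The parser relies on the length prefix to know where the compressed header ends. A raw deflate stream also marks its own end, so as an alternative sketch you can let zlib.decompressobj find the boundary: whatever follows the stream ends up in unused_data. The byte layout below (one uint32 prefix, the stream, then a body) and the helper name split_header_and_body are made up for illustration; this is not the real .mgx layout.

```python
import struct
import zlib


def split_header_and_body(blob, header_start):
    """Hypothetical helper: decompress the raw-deflate header starting at
    header_start and return (decompressed_header, remaining_body_bytes)."""
    d = zlib.decompressobj(-zlib.MAX_WBITS)  # raw deflate, no zlib header/trailer
    header = d.decompress(blob[header_start:])
    header += d.flush()
    if not d.eof:
        raise ValueError('compressed header is truncated')
    # Bytes after the end of the deflate stream are left in unused_data
    return header, d.unused_data


# Synthetic file: uint32 length prefix, raw-deflated header, then a body
c = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
compressed = c.compress(b'VER 9.8\x00 fake header') + c.flush()
blob = struct.pack('<I', 4 + len(compressed)) + compressed + b'BODY BYTES'

header, body = split_header_and_body(blob, 4)
print(header)  # b'VER 9.8\x00 fake header'
print(body)    # b'BODY BYTES'
```

This avoids trusting the prefix, at the cost of not detecting a prefix that disagrees with the stream.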

Hope it helps future programmers :)

By the way, I am planning on writing such a library.


Your first problem is that you shouldn't be reversing the data; just get rid of the [::-1].

But if you do that, instead of getting that error -3, you get a different error -3, usually about an unknown compression method.

The problem is that this is headerless zlib data (raw deflate), much like what gzip wraps in its own header. In theory, this means the information about the compression method, window size, preset dictionary, etc. has to be supplied somewhere else in the file (in gzip's case, by the gzip header). But in practice, everyone uses deflate with the maximum window size and no preset dictionary, so if I were designing a compact format for a game back in the days when every byte counted, I'd just hardcode them. (That raw format is in fact standardized, as RFC 1951, "DEFLATE Compressed Data Format Specification", but most 90s PC games weren't designed with RFCs in mind...)
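To see the difference concretely, here's a small sketch that produces headerless (raw) deflate data with zlib.compressobj and a negative wbits, then tries to read it back both ways. The payload is made up; the point is only that the default zlib.decompress rejects raw data with an error -3, while -zlib.MAX_WBITS accepts it.

```python
import zlib

payload = b'VER 9.8\x00' + b'\x00' * 100  # made-up sample data

# Raw deflate: negative wbits means "no zlib header or trailer"
c = zlib.compressobj(wbits=-zlib.MAX_WBITS)
raw = c.compress(payload) + c.flush()

# Default wbits expects a 2-byte zlib header, so this raises zlib.error
try:
    zlib.decompress(raw)
    default_ok = True
except zlib.error as exc:
    default_ok = False
    print('default wbits:', exc)

# Telling zlib the stream is headerless works
restored = zlib.decompress(raw, -zlib.MAX_WBITS)
print(restored == payload)  # True
```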

So:

>>> uncompressed_data = zlib.decompress(compressed_data, -zlib.MAX_WBITS)
>>> uncompressed_data[:8] # version
b'VER 9.8\x00'
>>> uncompressed_data[8:12] # unknown_const
b'\xf6(<A'

So not only did it decompress, but the result starts with something that looks like a version string, and… well, I guess anything looks like an unknown constant, but it's the same unknown constant as in the spec, so I think we're good.
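If you want to poke at those bytes, struct can show a few interpretations of them. Read as a little-endian 32-bit float, the "unknown constant" comes out to about 11.76; whether the game actually means it as a float (for example a save-format version) is a guess, not something the spec confirms.

```python
import struct

# First 12 bytes of the decompressed header, as shown in the session above
prefix = b'VER 9.8\x00' + b'\xf6(<A'

version = prefix[:8].rstrip(b'\x00').decode('ascii')
(as_uint,) = struct.unpack('<I', prefix[8:12])
(as_float,) = struct.unpack('<f', prefix[8:12])

print(version)             # VER 9.8
print(hex(as_uint))        # 0x413c28f6
print(round(as_float, 2))  # 11.76
```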

As the decompress docs explain, MAX_WBITS (15) is the default and most common window size (and the only size used by what's usually called "zlib deflate" as opposed to "zlib"), and passing a negative value tells zlib to expect a raw deflate stream with no header or trailer; the other arguments we can leave at their defaults.
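The same wbits parameter also selects the other containers zlib understands. Per the Python zlib docs, 9 to 15 means a zlib header, -9 to -15 means raw deflate, 16 + (9 to 15) means a gzip wrapper, and 32 + (9 to 15) auto-detects zlib or gzip (but not raw). This sketch compresses one made-up payload three ways and records which wbits values can read each result back.

```python
import zlib

payload = b'hello, mgx' * 10  # made-up sample data


def pack(wbits):
    c = zlib.compressobj(wbits=wbits)
    return c.compress(payload) + c.flush()


def reads_back(data, wbits):
    # True only if decompression succeeds and returns the original payload
    try:
        return zlib.decompress(data, wbits) == payload
    except zlib.error:
        return False


streams = {
    'zlib': pack(zlib.MAX_WBITS),       # 2-byte header + Adler-32 trailer
    'raw': pack(-zlib.MAX_WBITS),       # bare DEFLATE (RFC 1951)
    'gzip': pack(16 + zlib.MAX_WBITS),  # gzip header + CRC-32 trailer
}
readers = {
    'zlib': zlib.MAX_WBITS,
    'raw': -zlib.MAX_WBITS,
    'gzip': 16 + zlib.MAX_WBITS,
    'auto': 32 + zlib.MAX_WBITS,
}

table = {
    name: {r: reads_back(data, w) for r, w in readers.items()}
    for name, data in streams.items()
}
for name, row in table.items():
    print(name, row)
```

Only the raw stream needs the negative value, which is why the automatic header detection can't help with the .mgx header.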

See also this answer, the Advanced Functions section in the zlib docs, and RFC 1951. (Thanks to the OP for finding the links.)