What is the best way to compress JSON to store in a memory-based store like Redis or memcached?

We just use gzip as a compressor.

import gzip
import cStringIO

def decompressStringToFile(value, outputFile):
  """
  decompress the given string value (which must be valid compressed gzip
  data) and write the result in the given open file.
  """
  stream = cStringIO.StringIO(value)
  decompressor = gzip.GzipFile(fileobj=stream, mode='r')
  while True:  # until EOF
    chunk = decompressor.read(8192)
    if not chunk:
      decompressor.close()
      outputFile.close()
      return 
    outputFile.write(chunk)

def compressFileToString(inputFile):
  """
  read the given open file, compress the data and return it as string.
  """
  stream = cStringIO.StringIO()
  compressor = gzip.GzipFile(fileobj=stream, mode='w')
  while True:  # until EOF
    chunk = inputFile.read(8192)
    if not chunk:  # EOF?
      compressor.close()
      return stream.getvalue()
    compressor.write(chunk)

In our use case we store the result as files, as you can imagine. To work with purely in-memory strings, you can use a cStringIO.StringIO() object as a replacement for the file as well.
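
For example, to gzip a JSON document entirely in memory with the functions above, wrap the string in a StringIO object (a minimal sketch, assuming the Python 2 code in this answer; the sample document and variable names are illustrative):

import json

# serialize an illustrative document and wrap it so it behaves like an open file
jsonString = json.dumps({"key": "value", "numbers": [1, 2, 3]})
compressedValue = compressFileToString(cStringIO.StringIO(jsonString))
# compressedValue is now a gzip-compressed string ready to be stored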


Based on @Alfe's answer above, here is a version that keeps the contents in memory (useful for network I/O tasks). I also made a few changes to support Python 3.

import gzip
from io import BytesIO

def decompressBytesToString(inputBytes):
  """
  decompress the given byte array (which must be valid 
  compressed gzip data) and return the decoded text (utf-8).
  """
  bio = BytesIO()
  stream = BytesIO(inputBytes)
  decompressor = gzip.GzipFile(fileobj=stream, mode='r')
  while True:  # until EOF
    chunk = decompressor.read(8192)
    if not chunk:
      decompressor.close()
      bio.seek(0)
      return bio.read().decode("utf-8")
    bio.write(chunk)

def compressStringToBytes(inputString):
  """
  read the given string, encode it in utf-8,
  compress the data and return it as a byte array.
  """
  bio = BytesIO()
  bio.write(inputString.encode("utf-8"))
  bio.seek(0)
  stream = BytesIO()
  compressor = gzip.GzipFile(fileobj=stream, mode='w')
  while True:  # until EOF
    chunk = bio.read(8192)
    if not chunk:  # EOF?
      compressor.close()
      return stream.getvalue()
    compressor.write(chunk)

To test the compression, try:

inputString = "asdf" * 1000
len(inputString)
len(compressStringToBytes(inputString))
decompressBytesToString(compressStringToBytes(inputString))
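
To tie this back to the original question, the compressed bytes can be stored in Redis directly. A minimal sketch, assuming the redis-py client and the Python 3 functions above (the connection settings and key name are illustrative):

import redis

r = redis.Redis(host="localhost", port=6379)

# store the gzip-compressed payload under an illustrative key
r.set("my:json:blob", compressStringToBytes(inputString))

# read it back and decompress
restored = decompressBytesToString(r.get("my:json:blob"))
assert restored == inputString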

If you want it to be fast, try lz4. If you want it to compress better, go for lzma.
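
A rough sketch of both options, assuming the third-party lz4 package is installed (lzma ships with the Python 3 standard library); the sample payload is illustrative:

import json
import lzma

import lz4.frame  # pip install lz4

payload = json.dumps({"numbers": list(range(1000))}).encode("utf-8")

# lz4: very fast, moderate compression ratio
lz4_blob = lz4.frame.compress(payload)
assert lz4.frame.decompress(lz4_blob) == payload

# lzma: slower, but usually the best compression ratio
lzma_blob = lzma.compress(payload)
assert lzma.decompress(lzma_blob) == payload

print(len(payload), len(lz4_blob), len(lzma_blob))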

Are there any better ways to compress JSON to save memory in Redis (while also ensuring lightweight decoding afterwards)?

How good a candidate would msgpack [http://msgpack.org/] be?

Msgpack is relatively fast and has a smaller memory footprint, but ujson is generally faster for me. You should compare them on your own data: measure the serialization and deserialization speed as well as the size of the encoded output.
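
A minimal comparison sketch, assuming the msgpack and ujson packages are installed; substitute your own documents, since the numbers depend heavily on the data:

import json
import timeit

import msgpack  # pip install msgpack
import ujson    # pip install ujson

doc = {"ids": list(range(1000)), "name": "example", "active": True}

as_json = json.dumps(doc).encode("utf-8")
as_ujson = ujson.dumps(doc).encode("utf-8")
as_msgpack = msgpack.packb(doc)

# compare encoded sizes
print(len(as_json), len(as_ujson), len(as_msgpack))

# compare encoding speed
print(timeit.timeit(lambda: ujson.dumps(doc), number=10000))
print(timeit.timeit(lambda: msgpack.packb(doc), number=10000))

# round-trip check (with msgpack >= 1.0, string keys come back as str)
assert msgpack.unpackb(as_msgpack) == doc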

Shall I consider options like pickle as well?

Consider both pickle (cPickle in particular) and marshal. They are fast, but remember that they are not secure or scalable, and you pay for the speed with the added responsibility.
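
A brief sketch of both, assuming Python 3 (where the C implementation formerly known as cPickle backs the standard pickle module); never unpickle data that comes from an untrusted source:

import marshal
import pickle

doc = {"ids": list(range(1000)), "name": "example"}

# pickle: fast with the highest protocol, but loading untrusted data can execute arbitrary code
p = pickle.dumps(doc, protocol=pickle.HIGHEST_PROTOCOL)
assert pickle.loads(p) == doc

# marshal: even faster for plain built-in types, but the format is
# Python-version-specific and not intended for long-term storage
m = marshal.dumps(doc)
assert marshal.loads(m) == doc

print(len(p), len(m))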