Mark data as sensitive in python

... The only solution to this is to use mutable data structures. That is, you must only use data structures that allow you to dynamically replace elements. For example, in Python you can use lists to store an array of characters. However, every time you add or remove an element from a list, the language might copy the entire list behind your back, depending on the implementation details. To be safe, if you have to dynamically resize a data structure, you should create a new one, copy data, and then write over the old one. For example:

def paranoid_add_character_to_list(ch, l):
  """Copy l, adding a new character, ch.  Erase l.  Return the result."""
  new_list = []
  for i in range(len(l)):
    new_list.append(0)
  new_list.append(ch)
  for i in range(len(l)):
    new_list[i] = l[i]
    l[i] = 0
  return new_list

Source: http://www.ibm.com/developerworks/library/s-data.html

  • Author: John Viega ([email protected]) is co-author of Building Secure Software (Addison-Wesley, 2001) and Java Enterprise Architecture (O'Reilly and Associates, 2001). John has authored more than 50 technical publications, primarily in the area of software security. He also wrote Mailman, the GNU Mailing List Manager and ITS4, a tool for finding security vulnerabilities in C and C++ code.

No way to "mark as sensitive", but you could encrypt the data in memory and decrypt it again when you need to use it -- not a perfect solution but the best I can think of.


Edit

I have made a solution that uses ctypes (which in turn uses C) to zero memory.

import sys
import ctypes

def zerome(string):
    location = id(string) + 20
    size     = sys.getsizeof(string) - 20

    memset =  ctypes.cdll.msvcrt.memset
    # For Linux, use the following. Change the 6 to whatever it is on your computer.
    # memset =  ctypes.CDLL("libc.so.6").memset

    print "Clearing 0x%08x size %i bytes" % (location, size)

    memset(location, 0, size)

I make no guarantees of the safety of this code. It is tested to work on x86 and CPython 2.6.2. A longer writeup is here.

Decrypting and encrypting in Python will not work. Strings and Integers are interned and persistent, which means you are leaving a mess of password information all over the place.

Hashing is the standard answer, though of course the plaintext eventually needs to be processed somewhere.

The correct solution is to do the sensitive processes as a C module.

But if your memory is constantly being compromised, I would rethink your security setup.