Write to a file thread-safely in Python

We used the logging module:

import logging

logpath = "/tmp/log.log"
logger = logging.getLogger('log')
logger.setLevel(logging.INFO)
ch = logging.FileHandler(logpath)
ch.setFormatter(logging.Formatter('%(message)s'))
logger.addHandler(ch)


def application(env, start_response):
    # Pass the arguments to the logger; "%s %s".format(...) would not substitute them.
    logger.info("%s %s", "hello", "world!")
    start_response('200 OK', [('Content-Type', 'text/html')])
    return ["Hello!"]

I've made a simple writer that uses threading and Queue, and it works fine with multiple threads. Pros: in theory it can accept data from multiple threads without blocking them and writes asynchronously in a separate thread. Cons: the extra writer thread consumes resources, and in CPython threading doesn't give true parallelism anyway.

from queue import Queue, Empty
from threading import Thread

class SafeWriter:
    def __init__(self, *args):
        self.filewriter = open(*args)
        self.queue = Queue()
        self.finished = False
        # A single background thread does all of the actual writing.
        Thread(name="SafeWriter", target=self.internal_writer).start()

    def write(self, data):
        # Called from any thread; just enqueues the data.
        self.queue.put(data)

    def internal_writer(self):
        while not self.finished:
            try:
                # Wake up once a second so the thread can notice self.finished.
                data = self.queue.get(True, 1)
            except Empty:
                continue
            self.filewriter.write(data)
            self.queue.task_done()

    def close(self):
        # Wait until everything queued so far has been written, then stop the thread.
        self.queue.join()
        self.finished = True
        self.filewriter.close()
                    
# use it like an ordinary open(), e.g.:
w = SafeWriter("filename", "w")
w.write("can be used among multiple threads")
w.close()  # important: close it, or the non-daemon writer thread keeps the program alive
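If the code doing the writing can raise, it may be worth guarding close() with try/finally so the writer thread always terminates; a minimal usage sketch (not part of the original answer):

w = SafeWriter("filename", "w")
try:
    w.write("guarded against exceptions\n")
finally:
    w.close()  # always runs, so the writer thread can exit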

Look at the Queue class; it is thread-safe.

from Queue import Queue  # "queue" (lowercase) in Python 3
writeQueue = Queue()

In each worker thread:

writeQueue.put(repr(some_object))

Then, to dump it to a file:

with open(path, 'w') as outFile:
    # qsize() is only approximate; this assumes the worker threads have finished.
    while writeQueue.qsize():
        outFile.write(writeQueue.get())

Queue will accept any Python object, so if you're trying to do something other than print to a file, just store the objects from the worker threads via Queue.put.
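For example, each worker thread can put a tuple or dict describing its result on the queue, and a single consumer can decide later how to persist it. A small sketch along those lines (Python 3; the names are illustrative, not from the original answer):

from queue import Queue
from threading import Thread

results = Queue()

def worker(n):
    # Any Python object can go on the queue, not just strings.
    results.put({"worker": n, "status": "done"})

threads = [Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with open("results.log", "w") as f:
    while not results.empty():
        f.write(repr(results.get()) + "\n")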

If you need to split the commits across multiple invocations of the script, you'll need a way to cache partially built commits to disk. To avoid multiple copies trying to write to the file at the same time, use the lockfile module, available via pip. I usually use json to encode data for these purposes; it supports serializing strings, unicode, lists, numbers, and dicts, and is safer than pickle.

import json
import lockfile

with lockfile.LockFile('/path/to/file'):
    with open('/path/to/file') as fin:
        data = json.load(fin)
    data.append(newdata)
    with open('/path/to/file', 'w') as fout:
        json.dump(data, fout)

Note that depending on OS features, the time taken to lock and unlock the file, as well as to rewrite it for every request, may be more than you expect. If possible, just append to the file instead, as that is faster.

Alternatively, you may want to use a client/server model, where each 'request' launches a worker script that connects to a server process and forwards the data over a network socket. This sidesteps the need for lockfiles; depending on how much data is involved, the server process may be able to hold it all in memory, or it may need to serialize it to disk and pass it to the database that way.
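A minimal sketch of that client/server idea using the standard library's socketserver; the port, file name, and send() helper are illustrative, not from the original answer:

import socket
import socketserver

class AppendHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # TCPServer handles connections one at a time, so only this
        # process ever touches the file and no lockfile is needed.
        with open("collected.log", "ab") as f:
            f.write(self.rfile.read())

def send(data, host="127.0.0.1", port=9999):
    # Each 'request' connects and forwards its data to the server.
    with socket.create_connection((host, port)) as sock:
        sock.sendall(data.encode())

if __name__ == "__main__":
    with socketserver.TCPServer(("127.0.0.1", 9999), AppendHandler) as server:
        server.serve_forever()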

WSGI server example:

from Queue import Queue  # "queue" (lowercase) in Python 3
q = Queue()

def flushQueue():
    # Append so that earlier flushes are not overwritten.
    with open(path, 'a') as f:
        while q.qsize():
            f.write(q.get())

def application(env, start_response):
    q.put("Hello World!")
    if q.qsize() > 999:
        flushQueue()
    start_response('200 OK', [('Content-Type', 'text/html')])
    return ["Hello!"]

Tags:

Python