File Uploads with Turbogears 2

I just want anyone who comes here looking for answers, to know that Allesandro Molina's great library Depot constitutes the best answer to this question.

It solves both the naming and copying issues, and it will incorporate nicely into your TurboGears application. You can use it with MongoDB GridFS, as in this example:

from depot.manager import DepotManager

# Configure a *default* depot to store files on MongoDB GridFS
DepotManager.configure('default', {
    'depot.backend': 'depot.io.gridfs.GridFSStorage',
    'depot.mongouri': 'mongodb://localhost/db'
})

depot = DepotManager.get()

# Save the file and get the fileid
fileid = depot.create(open('/tmp/file.png'))

# Get the file back
stored_file = depot.get(fileid)
print stored_file.filename
print stored_file.content_type

or you can easily create attachment fields in your SQLAlchemy models, like:

from depot.fields.sqlalchemy import UploadedFileField

class Document(Base):
    __tablename__ = 'document'

    uid = Column(Integer, autoincrement=True, primary_key=True)
    name = Column(Unicode(16), unique=True)

    content = Column(UploadedFileField)

… and then, storing documents with attached files (the source can be a file or bytes) becomes as easy as:

doc = Document(name=u'Foo', content=open('/tmp/document.xls'))
DBSession.add(doc)

Depot supports both LocalFileStorage, MongoDB's GridFSStorage, and Amazon's S3Storage. And, at least for files stored locally and in S3, the fileid will be generated by uuid.uuid1().


I don't know much about Turbogears and whether it can provide anything to avoid the following, but it seems to me that this code is fraught with danger. It may be possible for a malicious user to overwrite (or create) any file that the Turbogears python process has write access to.

What if asset_dirname is /tmp, the contents of file.filename is ../../../../../../../etc/passwd and the contents of the file root::0:0:root:/root:/bin/bash? In a UNIX environment this code (permissions pending) would open the file /tmp/../../../../../../../etc/passwd in truncate mode and then copy the contents of the uploaded file to it - effectively overwriting your system's password file and specifying a root user without a password. Presumably there are nasty things that can be done to a Windows machine too.

OK, this is an extreme example that requires that python is running as root (no one does that, do they?). Even if python is running as a low-priveleged user, previously uploaded files could be overwritten at will.

To summarise, don't trust user input, in this case the user supplied filename that is available in file.filename.


@mhawke - you're right you have to handle that - depends on what you are doing with the file, if it doesn't matter if there is a name collision eg you only care for the latest version of some data then theres probably no issue, or if the filename isn't actually important just the file contents, but its still bad practice.

You could use a named tempfile in a tmp dir, then move the file once validated to its final location. Or you could check the filename doesn't already exist like so:

file.name = slugify(myfile.filename)
name, ext = os.path.splitext(file.name)
while os.path.exists(os.path.join(permanent_store, file.name)):
    name += '_'
    file.name = name + ext

raw_file = os.path.join(permanent_store, file.name)

The slugify method would be used to tidy up the filename...