Is mounting S3 buckets directly into EC2 instances safe?

If you are using S3 to store user uploads, especially in a distributed environment, one big consideration is that S3 is 'eventually consistent' (although some regions offer read-after-write consistency for new objects). As a consequence, you may successfully upload a file, yet a check for its existence immediately afterwards may report that it does not exist. The problem is more pronounced for updates and deletes, where even read-after-write consistency does not help: a read shortly after an overwrite or delete may still return the old data. (Since December 2020, AWS states that S3 provides strong read-after-write consistency for all GET, PUT, LIST and DELETE operations in all regions, so on current S3 this applies mainly to older setups.)
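If you do need to check for an upload that may not be visible yet, one common mitigation is to poll with exponential backoff. Below is a minimal Python sketch, assuming a hypothetical `exists` callable you supply (for example, a wrapper around an S3 HEAD request made through the SDK, or `os.path.exists` on an s3fs mount); `wait_until_visible` and its parameters are illustrative names, not part of any library. Polling only helps with new objects appearing; it cannot confirm that an overwrite or delete has propagated.

```python
import time
from typing import Callable


def wait_until_visible(
    exists: Callable[[str], bool],
    key: str,
    attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `exists(key)` with exponential backoff.

    Returns True as soon as the object is reported as present, or False
    if it is still missing after `attempts` checks.
    """
    for attempt in range(attempts):
        if exists(key):
            return True
        if attempt < attempts - 1:
            # Back off 0.1s, 0.2s, 0.4s, ... (with the defaults) before retrying.
            sleep(base_delay * (2 ** attempt))
    return False
```

Passing `sleep` in as a parameter keeps the function testable without real delays.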

This applies to your S3 uploads regardless of the approach you take. In fact, the same is true of most problems you might expect with S3: it is not so much the method used to store the data as the limitations of S3 itself that are likely to cause the most trouble.

S3fs uses the S3 API, just as the PHP (or any other) SDK does. Moreover, S3 is designed to handle fairly high levels of concurrency, so (other than the consistency issues) there shouldn't be a problem mounting the same bucket on multiple instances. Keep in mind that it isn't a traditional file system: there is no file locking across instances, and if two instances write the same key at the same time, S3 simply keeps the last write.

That said, there are some potential advantages and disadvantages of each implementation:

S3fs:

  • No support for partial/chunked downloads (as far as I know), so you must download the full file to read any part of it; probably not an issue if you are just using it to store (and serve) uploads.
  • Written in C++, which may give some performance gains
  • Your application benefits from any updates to s3fs
  • Implements caching (of both full files and file metadata), which can improve speed a bit and reduce costs
  • Limited to the operations that FUSE exposes
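Because s3fs presents the bucket as an ordinary directory, the application only needs plain file operations. A minimal Python sketch, assuming the bucket is already mounted at a hypothetical `/mnt/s3-uploads`; the mount point and the `save_upload`/`read_upload` helpers are illustrative, not part of s3fs:

```python
import os
import shutil
from typing import BinaryIO

# Hypothetical mount point: wherever s3fs has mounted the bucket.
UPLOAD_ROOT = "/mnt/s3-uploads"


def save_upload(stream: BinaryIO, filename: str, root: str = UPLOAD_ROOT) -> str:
    """Copy an uploaded file into the (s3fs-mounted) upload directory.

    Only ordinary file operations are used, so the same code works on a
    local disk or on an s3fs mount, where s3fs turns them into S3 requests.
    """
    # Strip directory components so a client cannot write outside `root`.
    safe_name = os.path.basename(filename)
    if not safe_name or safe_name in (".", ".."):
        raise ValueError(f"invalid upload name: {filename!r}")
    dest = os.path.join(root, safe_name)
    with open(dest, "wb") as out:
        shutil.copyfileobj(stream, out)
    return dest


def read_upload(name: str, root: str = UPLOAD_ROOT) -> bytes:
    # On an s3fs mount this read becomes S3 requests (or is served from
    # s3fs's local cache, if one is configured).
    with open(os.path.join(root, os.path.basename(name)), "rb") as f:
        return f.read()
```

The same functions work unchanged against a local directory, which is a large part of the appeal of the s3fs approach.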

SDK:

  • Exposes the full set of features S3 has to offer; depending on your use case, this alone may be enough to justify using the SDK
  • Potentially tighter integration with your application: the returned errors, for example, may let your application make better-informed (and therefore more precise) choices
  • Any such advantages have to be coded for: your application must explicitly take advantage of them and be kept up to date with future changes to S3
  • Adds complexity and overhead to your code
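As an example of a feature the SDK exposes directly, S3 supports ranged (partial) reads, which, as noted above, s3fs may not give you. Here is a sketch in Python assuming a boto3 S3 client (`boto3.client("s3")`), whose `get_object` accepts a `Range` argument and returns a dict with a readable `"Body"` stream; `read_range` is a hypothetical helper name:

```python
from typing import Any


def read_range(s3_client: Any, bucket: str, key: str, start: int, length: int) -> bytes:
    """Fetch `length` bytes of an object, starting at byte offset `start`.

    Assumes `s3_client` behaves like a boto3 S3 client.
    """
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    end = start + length - 1  # HTTP byte ranges are inclusive at both ends
    response = s3_client.get_object(
        Bucket=bucket, Key=key, Range=f"bytes={start}-{end}"
    )
    return response["Body"].read()
```

Usage might look like `read_range(boto3.client("s3"), "my-uploads-bucket", "videos/intro.mp4", 0, 1024)`, where the bucket and key are placeholders. Taking the client as a parameter keeps credentials and region configuration out of the helper.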

In terms of 'safety', you could mean either 'preventing data corruption' or 'preventing unauthorized access'. With regard to the former, the SDK might help a little with eventual consistency (through more detailed errors), but the underlying storage is the same, and I expect the differences to be minor. With regard to access control, you can use IAM to create a limited account, but that account will still need read/write access to your S3 files. Both approaches should be adequately secure: in either case, your system has to be compromised before anyone can reach your S3 bucket. I would suggest, however, that s3fs offers slightly better security, since its credentials are typically stored outside the webroot and are not accessible at all via PHP.
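To keep that IAM account limited, you can scope its policy to a single bucket. A minimal sketch in Python that builds such a policy document; the bucket name is a placeholder, `upload_bucket_policy` and `policy_json` are hypothetical helpers, and the exact action list is an assumption you should adjust to your setup (for example, drop `s3:DeleteObject` if you never delete uploads, or add actions if s3fs or your SDK code needs more):

```python
import json


def upload_bucket_policy(bucket: str) -> dict:
    """Return an IAM policy document limited to a single S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level access: read, write and delete objects in the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # Bucket-level access: listing keys, e.g. for directory listings.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


def policy_json(bucket: str) -> str:
    # Serialize for pasting into the IAM console or `aws iam put-user-policy`.
    return json.dumps(upload_bucket_policy(bucket), indent=2)
```

Note that object actions apply to the `bucket/*` ARN while `s3:ListBucket` applies to the bucket ARN itself, which is why they sit in separate statements.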

Personal opinion: I'd favour s3fs for a case where there is a single upload directory (e.g. one site making use of it) and where the access will be fairly simple (just need to upload files and occasionally update/delete). If you are going to need more complex access (e.g. partial downloads, multiple buckets, etc) or are going to use the S3 SDK for other purposes, then I would stick with the SDK for the uploads as well.