Proper way to implement RESTful large file upload

I would recommend taking a look at the Amazon S3 REST API's approach to multipart file upload. The documentation can be found here.

To summarize the procedure Amazon uses:

  1. The client sends a request to initiate a multipart upload, and the API responds with an upload ID.

  2. The client uploads each file chunk along with a part number (to maintain the ordering of the file), the size of the part, the MD5 hash of the part, and the upload ID; each chunk is sent in a separate HTTP request. The API validates the chunk by checking that the MD5 hash of the received chunk matches the hash the client supplied and that the size of the chunk matches the size the client supplied. The API responds with a tag (unique ID) for the chunk. If you deploy your API across multiple locations, you will need to consider how to store the chunks and later access them in a location-transparent way.

  3. The client issues a request to complete the upload, which contains a list of each chunk number and the associated chunk tag (unique ID) received from the API. The API validates that there are no missing chunks and that each chunk number matches the correct chunk tag, and then either assembles the file or returns an error response.
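
The three steps above can be sketched on the server side roughly as follows. This is a minimal in-memory illustration, not Amazon's implementation: the names `MultipartStore`, `UploadError`, `initiate`, `upload_part` and `complete` are hypothetical, parts are assumed to be numbered contiguously from 1, and a real API would persist chunks in durable storage rather than a dictionary.

```python
import hashlib
import uuid


class UploadError(Exception):
    """Raised when a part or a completion request fails validation."""


class MultipartStore:
    """Hypothetical in-memory store; a real API would persist parts durably."""

    def __init__(self):
        # upload_id -> {part_number: (tag, data)}
        self._uploads = {}

    def _parts(self, upload_id):
        try:
            return self._uploads[upload_id]
        except KeyError:
            raise UploadError("unknown upload id") from None

    def initiate(self):
        """Step 1: create an upload and hand back its id."""
        upload_id = uuid.uuid4().hex
        self._uploads[upload_id] = {}
        return upload_id

    def upload_part(self, upload_id, part_number, data, size, md5_hex):
        """Step 2: validate size and MD5, store the part, return its tag."""
        parts = self._parts(upload_id)
        if len(data) != size:
            raise UploadError("size mismatch")
        if hashlib.md5(data).hexdigest() != md5_hex:
            raise UploadError("md5 mismatch")
        tag = uuid.uuid4().hex  # assumption: any unique id works as a tag
        parts[part_number] = (tag, data)
        return tag

    def complete(self, upload_id, manifest):
        """Step 3: manifest is a list of (part_number, tag) pairs."""
        parts = self._parts(upload_id)
        if not manifest:
            raise UploadError("empty manifest")
        # Assumption: part numbers run 1..n with no gaps.
        expected = list(range(1, len(parts) + 1))
        numbers = sorted(number for number, _ in manifest)
        if sorted(parts) != expected or numbers != expected:
            raise UploadError("missing or unexpected parts")
        for number, tag in manifest:
            if parts[number][0] != tag:
                raise UploadError(f"tag mismatch for part {number}")
        assembled = b"".join(parts[number][1] for number in expected)
        del self._uploads[upload_id]
        return assembled
```

A client would call `initiate()` once, `upload_part()` once per chunk while collecting the returned tags, and finally `complete()` with the collected `(part_number, tag)` pairs.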

Amazon also supplies methods to abort the upload and to list the chunks associated with the upload. You may also want to consider a timeout for the upload, after which the chunks are destroyed if the upload has not been completed.
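
As a rough sketch of how abort, list, and timeout-based cleanup might fit together, here is a small in-memory registry. Everything in it is an assumption for illustration: the class name `UploadRegistry`, its method names, the default 24-hour lifetime, and the idea of calling `purge_expired()` from a periodic job.

```python
import time


class UploadRegistry:
    """Hypothetical bookkeeping for in-progress uploads."""

    def __init__(self, ttl_seconds=24 * 3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        # upload_id -> {"started": float, "parts": {part_number: bytes}}
        self._uploads = {}

    def start(self, upload_id):
        self._uploads[upload_id] = {"started": self._clock(), "parts": {}}

    def add_part(self, upload_id, part_number, data):
        self._uploads[upload_id]["parts"][part_number] = data

    def list_parts(self, upload_id):
        """Return (part_number, size) pairs in part order."""
        parts = self._uploads[upload_id]["parts"]
        return [(number, len(parts[number])) for number in sorted(parts)]

    def abort(self, upload_id):
        """Discard the upload and its chunks; True if it existed."""
        return self._uploads.pop(upload_id, None) is not None

    def purge_expired(self):
        """Destroy uploads older than the TTL and return their ids."""
        now = self._clock()
        expired = [
            upload_id
            for upload_id, upload in self._uploads.items()
            if now - upload["started"] > self._ttl
        ]
        for upload_id in expired:
            del self._uploads[upload_id]
        return expired
```

Measuring the lifetime from the start of the upload is only one option; resetting the timer whenever a new chunk arrives would be a reasonable alternative for slow clients.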

In terms of controlling the chunk sizes that the client uploads, you won't have much control over how the client decides to split up the upload. You could consider having a maximum chunk size configured for the upload and supply error responses for requests that contain chunks larger than the max size.
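
To illustrate the maximum chunk size idea, here is a small sketch of a server-side size check and a client-side splitter that respects the limit. The 5 MiB limit and the function names `check_part_size` and `split_into_parts` are assumptions chosen for the example; the status codes are the standard HTTP 411 (Length Required) and 413 (content too large).

```python
MAX_PART_SIZE = 5 * 1024 * 1024  # hypothetical limit: 5 MiB


def check_part_size(content_length, max_size=MAX_PART_SIZE):
    """Return an (HTTP status, message) pair for an incoming chunk."""
    if content_length is None:
        return 411, "Content-Length header is required"
    if content_length > max_size:
        return 413, f"chunk exceeds maximum size of {max_size} bytes"
    return 200, "OK"


def split_into_parts(data, part_size=MAX_PART_SIZE):
    """Yield (part_number, chunk) pairs, numbering parts from 1."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    for number, offset in enumerate(range(0, len(data), part_size), start=1):
        yield number, data[offset:offset + part_size]
```

Publishing the limit in the API documentation, or returning it in the response to the initiate request, lets well-behaved clients pick a compliant chunk size up front.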

I've found the procedure works very well for handling large file uploads in REST APIs and facilitates the handling of the many edge cases associated with file upload. Unfortunately, I've yet to find a library that makes this easy to implement in any language so you pretty much have to write all of the logic yourself.


https://tus.io/ is a resumable upload protocol that helps with chunked uploading and with resuming an upload after a timeout. It is an open protocol, and open-source client and server implementations already exist in various languages.