Upload Files Directly to AWS S3 Private Bucket

Posted on November 05, 2018 · 12 min read · in aws

Amazon S3 is one of the most widely used cloud object storage services, built to store and retrieve any amount of data from anywhere: websites and mobile apps, corporate applications, and IoT sensors or devices. Most of the time, files are uploaded to S3 from the server side using an SDK. In this process, the server first receives the file from the client and then uploads it to S3. During this transfer from client to server to S3, the file is temporarily held on the server, in memory or on disk. This might not be an issue for small files, but it becomes a real problem when the files are very large.

Consider a scenario where users can upload up to 5 GB of files at a time and the application server is hosted on an AWS EC2 instance with 60 GB of storage. If 20 people upload 5 GB each at the same moment, the server receives 100 GB in total, which is more than it can hold. We certainly don't want this to happen on our own production server. So, the solution to this problem is to upload files directly to S3 without routing them through the server.

In this post, I will try to give a high-level idea of how to handle such a scenario. I will be using Python and Boto3, the official AWS SDK for Python.

Install Boto3

Install the AWS SDK for Python:

pip install boto3

Create IAM User with Appropriate Permissions

Log in to the AWS Management Console, create an IAM user, grant it the necessary read/write permissions for S3, and note the AWS_ACCESS_KEY_ID and AWS_SECRET_KEY for that user.
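
As a rough illustration of what "necessary read/write permissions" might look like, the sketch below builds a minimal IAM policy document that only allows uploading and downloading objects in a single bucket. The function name build_s3_policy is an assumption made for this example. Creating the bucket and setting CORS later in this post need additional permissions (for example s3:CreateBucket and s3:PutBucketCORS) that are deliberately left out here.

```python
import json


def build_s3_policy(bucket_name):
    """Return an IAM policy document limited to object reads and writes in one bucket."""
    return {
        'Version': '2012-10-17',
        'Statement': [{
            'Effect': 'Allow',
            'Action': ['s3:GetObject', 's3:PutObject'],
            # The trailing /* scopes the statement to objects inside the bucket.
            'Resource': f'arn:aws:s3:::{bucket_name}/*',
        }],
    }


if __name__ == '__main__':
    # Print the JSON so it can be pasted into the IAM policy editor.
    print(json.dumps(build_s3_policy('mybucket'), indent=2))
```

The printed JSON can be attached to the IAM user as an inline policy; widen the Action list only as far as the application actually needs.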

Create S3 Client

First, we need to create an S3 client using Boto3 by providing the AWS_ACCESS_KEY_ID, the AWS_SECRET_KEY and the AWS region we want to work in.

import boto3

# Replace the placeholders with the values for your IAM user and region.
AWS_ACCESS_KEY_ID = '<access_key_id>'
AWS_SECRET_KEY = '<secret_key>'
AWS_REGION = '<region_name>'

client = boto3.client(
    's3',
    aws_access_key_id = AWS_ACCESS_KEY_ID,
    aws_secret_access_key = AWS_SECRET_KEY,
    region_name = AWS_REGION
)
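
Hardcoding keys in source files makes them easy to leak, so one option is to read them from environment variables instead. The sketch below is an assumption about how that could look: the helper name s3_client_kwargs is made up for this example, while the variable names AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION are the standard ones that the AWS tooling also recognizes.

```python
import os


def s3_client_kwargs(env=os.environ):
    """Collect keyword arguments for boto3.client('s3', ...) from environment variables."""
    required = {
        'aws_access_key_id': 'AWS_ACCESS_KEY_ID',
        'aws_secret_access_key': 'AWS_SECRET_ACCESS_KEY',
        'region_name': 'AWS_DEFAULT_REGION',
    }
    # Fail early with a clear message instead of a confusing error from AWS later.
    missing = [name for name in required.values() if not env.get(name)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return {kwarg: env[name] for kwarg, name in required.items()}


# Usage (requires boto3 and the three variables to be set):
#   import boto3
#   client = boto3.client('s3', **s3_client_kwargs())
```

Boto3 also picks these variables up on its own when no credentials are passed, so the helper mainly adds the explicit check for missing values.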

Create S3 Bucket

Create a private bucket:

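AWS_BUCKET_NAME = 'mybucket'
# Note: in any region other than us-east-1, create_bucket also needs
# CreateBucketConfiguration = {'LocationConstraint': AWS_REGION}
client.create_bucket(Bucket = AWS_BUCKET_NAME)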
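
S3 treats us-east-1 differently from every other region when creating a bucket: elsewhere the region has to be passed as a LocationConstraint, while for us-east-1 it has to be left out. One way to handle both cases in a single code path is sketched below; the helper name create_bucket_kwargs is an assumption made for this example.

```python
def create_bucket_kwargs(bucket_name, region_name):
    """Build keyword arguments for client.create_bucket for the given region."""
    kwargs = {'Bucket': bucket_name}
    # us-east-1 is the default location and must not be sent as a constraint.
    if region_name != 'us-east-1':
        kwargs['CreateBucketConfiguration'] = {'LocationConstraint': region_name}
    return kwargs


# Usage with the client and constants defined in this post:
#   client.create_bucket(**create_bucket_kwargs(AWS_BUCKET_NAME, AWS_REGION))
```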

Update CORS configuration

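Set the following CORS configuration on the newly created bucket. This step is essential: without it, browsers will block cross-origin requests from our web page to the bucket. The configuration may change depending on our needs; in particular, the wildcard in AllowedOrigins allows any site and is usually replaced with the application's own origin in production. Replace AWS_BUCKET_NAME with our bucket name.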

cors_configuration = {
    'CORSRules': [{
        'AllowedHeaders': ['*'],
        'AllowedMethods': ['GET', 'PUT'],
        'AllowedOrigins': ['*'],
        'MaxAgeSeconds': 3000
    }]
}

client.put_bucket_cors(Bucket=AWS_BUCKET_NAME, CORSConfiguration=cors_configuration)

Btw, the previous two steps can also be done via the AWS Management Console.
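
The CORS rule above accepts requests from any origin. A slightly stricter variant, sketched here under the assumption that the front end is served from a known list of origins, builds the same structure with explicit origins. The helper name build_cors_configuration and the origin https://app.example.com are placeholders for illustration.

```python
def build_cors_configuration(allowed_origins, max_age_seconds=3000):
    """Return a CORS configuration that allows GET and PUT from the given origins only."""
    if not allowed_origins:
        raise ValueError('At least one origin is required')
    return {
        'CORSRules': [{
            'AllowedHeaders': ['*'],
            'AllowedMethods': ['GET', 'PUT'],
            # Copy into a list so tuples or other iterables are accepted too.
            'AllowedOrigins': list(allowed_origins),
            'MaxAgeSeconds': max_age_seconds,
        }]
    }


# Usage with the client and bucket name defined in this post:
#   client.put_bucket_cors(
#       Bucket=AWS_BUCKET_NAME,
#       CORSConfiguration=build_cors_configuration(['https://app.example.com']),
#   )
```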

Upload Files

To upload files directly from client side to S3, first we need to generate a presigned URL for upload. Again, we'll be using our S3 client.

FILE_NAME = 'flower.png'
FILE_PATH = '../images/flower.png'
AWS_BUCKET_NAME = 'mybucket'

PRESIGNED_UPLOAD_URL = client.generate_presigned_url(
    ClientMethod = 'put_object',  
    Params = {
        'Bucket': AWS_BUCKET_NAME,
        'Key': FILE_NAME,
    }, 
    ExpiresIn = 3600,
)
print(PRESIGNED_UPLOAD_URL)

It will return a temporary URL that expires after one hour (ExpiresIn is given in seconds). For uploads, the ClientMethod is put_object. Next, we need to send our file to this URL with an HTTP PUT request.

curl --request PUT --upload-file FILE_PATH "PRESIGNED_UPLOAD_URL"

Here, I have used the curl command to upload the file. FILE_PATH and PRESIGNED_UPLOAD_URL stand for the values from the previous step.
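
The same PUT request can be made from Python. The following is a minimal sketch using only the standard library, not the only way to do it; the function name upload_via_presigned_url is an assumption for this example. It streams the file from disk instead of loading it into memory and sets Content-Length explicitly so the body is not sent with chunked encoding.

```python
import http.client
import os
from urllib.parse import urlsplit


def upload_via_presigned_url(presigned_url, file_path, timeout=60.0):
    """PUT the file at file_path to presigned_url and return the HTTP status code."""
    parts = urlsplit(presigned_url)
    if parts.scheme == 'https':
        connection = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        connection = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    # The signature lives in the query string, so it must be sent unchanged.
    target = parts.path or '/'
    if parts.query:
        target += '?' + parts.query
    try:
        with open(file_path, 'rb') as body:
            # Passing the open file lets http.client read and send it in blocks.
            connection.request(
                'PUT', target, body=body,
                headers={'Content-Length': str(os.path.getsize(file_path))},
            )
            response = connection.getresponse()
            response.read()  # Drain the (normally empty) response body.
            return response.status
    finally:
        connection.close()


# Usage with the values from this post:
#   status = upload_via_presigned_url(PRESIGNED_UPLOAD_URL, FILE_PATH)
```

A status of 200 means S3 accepted the object; a 403 usually points to an expired or altered URL.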

Download Files

To download files from our private bucket, we need to generate another presigned URL. This time the ClientMethod will be get_object.

PRESIGNED_DOWNLOAD_URL = client.generate_presigned_url(
    ClientMethod = 'get_object',  
    Params = {
        'Bucket': AWS_BUCKET_NAME,
        'Key': FILE_NAME,
    }, 
    ExpiresIn = 3600,
)
print(PRESIGNED_DOWNLOAD_URL)

Finally, open this PRESIGNED_DOWNLOAD_URL in any browser and the file will be downloaded directly from S3.
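
Outside a browser, the download URL can be fetched with any HTTP client. As a small sketch using only the standard library (the function name download_via_presigned_url is an assumption for this example), the code below streams the response to disk so that large files are never held in memory as a whole.

```python
import shutil
import urllib.request


def download_via_presigned_url(presigned_url, destination_path, timeout=60.0):
    """Stream the object behind a presigned GET URL into destination_path."""
    # urlopen raises urllib.error.HTTPError for error responses,
    # for example 403 when the URL has expired.
    with urllib.request.urlopen(presigned_url, timeout=timeout) as response:
        with open(destination_path, 'wb') as out:
            # copyfileobj copies in fixed-size chunks rather than all at once.
            shutil.copyfileobj(response, out)
    return destination_path


# Usage with the URL generated above:
#   download_via_presigned_url(PRESIGNED_DOWNLOAD_URL, 'flower.png')
```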

A practical implementation is to create an API endpoint that receives the filename and client method and returns the presigned URL. Then, when the upload form is submitted, use AJAX to read the filename from the file field, call the API and get the presigned URL. Finally, use AJAX to send the file to this URL with a PUT request.
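
The server side of such an endpoint could look roughly like the sketch below. It is framework-agnostic on purpose: the function presign_endpoint, the parameter names filename and client_method, and the JSON response shape are all assumptions for illustration, and the S3 client is passed in so the function can be wired into whatever web framework is in use. Only put_object and get_object are accepted, so callers cannot request presigned URLs for other operations.

```python
import json

ALLOWED_CLIENT_METHODS = {'put_object', 'get_object'}


def presign_endpoint(client, bucket_name, params, expires_in=3600):
    """Return (status_code, json_body) for a request asking for a presigned URL."""
    file_name = params.get('filename', '')
    client_method = params.get('client_method', '')
    # Reject missing filenames and any operation other than upload or download.
    if not file_name or client_method not in ALLOWED_CLIENT_METHODS:
        return 400, json.dumps({'error': 'filename and a valid client_method are required'})
    url = client.generate_presigned_url(
        ClientMethod=client_method,
        Params={'Bucket': bucket_name, 'Key': file_name},
        ExpiresIn=expires_in,
    )
    return 200, json.dumps({'url': url})


# Usage inside a request handler, with the Boto3 client from this post:
#   status, body = presign_endpoint(client, AWS_BUCKET_NAME, request_params)
```

In a real application this endpoint would normally sit behind authentication, since anyone who can call it can write to or read from the bucket.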
