S3 presigned POST URL issue using boto3
I have a web app served with apache2, running Python Flask in the backend. The app relies heavily on S3 Object Storage, and I'm using `boto3` to interact with it. My issue is with the `generate_presigned_post` method when used in production. It returns the following structure:
```python
{
    'url': 'https://eu-central-1.linodeobjects.com/my-s3-bucket',
    'fields': {
        'ACL': 'private',
        'key': 'foo.bar',
        'AWSAccessKeyId': 'FOOBAR',
        'policy': 'base64longhash...',
        'signature': 'foobar'
    }
}
```
Every time I call this method in the same Python session, the `policy` key returns a longer value (about a 1.5x increase in length for each subsequent request). After a few requests the `policy` gets really large (tens of MB) and the app breaks. If I restart the Python service, the `policy` size resets.
After digging through the `boto3` documentation and some threads on GitHub and here, I couldn't find anything that helped me reset the S3 connection without restarting the whole Python session. Restarting the apache2 service periodically is not a good approach, so my workaround was to call `generate_presigned_post` from a standalone script using `subprocess` and parse its string output back into JSON before using it. That is not ideal, as I'd rather not keep spawning external processes from inside apache. The main functions I use are below:
```python
import json
from subprocess import Popen, PIPE

import boto3

# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_ENDPOINT_URL and AWS_BUCKET are defined elsewhere (config)
AWS_BUCKET_PARAMS = {'ACL': 'private'}

# connect to my Linode's S3 bucket
def awsSign():
    return boto3.client('s3', aws_access_key_id=AWS_ACCESS_KEY_ID, aws_secret_access_key=AWS_SECRET_ACCESS_KEY, endpoint_url=AWS_ENDPOINT_URL)

# generate presigned post object for uploading files
def awsPostForm(file_path):
    s3 = awsSign()
    return s3.generate_presigned_post(AWS_BUCKET, file_path, AWS_BUCKET_PARAMS, [AWS_BUCKET_PARAMS], 1800)

# generate post object from an external script
def awsPostFormTerminal(file_path):
    cmd = ['python3', '-c', f'from utils import awsPostForm; print(awsPostForm("{file_path}"))']
    output = Popen(cmd, stdout=PIPE).communicate()[0]
    # the child prints a Python dict, so quotes are swapped to make it parse as JSON
    return json.loads(output.decode('utf-8').replace('\n', '').replace("'", '"'))
```
The problem occurs whether I call `awsSign()` once or many times for a list of files.
In short, I'm looking for a better way to retrieve successive POST forms from `generate_presigned_post` in the same Python session, without the `policy` growing on every new request. Maybe there is a proper way to reset the `boto3` connection, or some parameter I missed when setting up the API calls, or maybe it's something particular to Linode's S3 Object Storage service.
I also posted the same question on Stack Overflow.
If anyone can point me in the right direction, I'd appreciate it!
3 Replies
The policy is base64-encoded. I believe this is a JSON document.
Have you tried decoding the base64 policy after each request to identify what is different about it each time?
Is it possible one of the parameters you’re feeding in is somehow including details from a previous request?
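A minimal sketch of that check, assuming `post` is the dict returned by `awsPostForm()` from the question: decode the `policy` field of two successive results and list the conditions that appear only in the second one. The helper names `decode_policy` and `new_conditions` are hypothetical, not part of boto3.

```python
import base64
import json


def decode_policy(policy_b64):
    """Decode the base64 'policy' field of a presigned POST into a dict."""
    # b64decode returns bytes; json.loads accepts UTF-8 bytes directly (Python 3.6+).
    return json.loads(base64.b64decode(policy_b64))


def new_conditions(first_policy_b64, second_policy_b64):
    """Return the conditions present in the second policy but not in the first."""
    first = decode_policy(first_policy_b64).get("conditions", [])
    second = decode_policy(second_policy_b64).get("conditions", [])
    # Conditions are dicts or lists, so membership is checked by equality.
    return [c for c in second if c not in first]


# Hypothetical usage against the question's functions:
# a = awsPostForm("foo.bar")["fields"]["policy"]
# b = awsPostForm("foo.bar")["fields"]["policy"]
# print(json.dumps(new_conditions(a, b), indent=2))
```

If the second policy contains a condition that itself carries the previous `policy` or `signature`, that points to data from one request leaking into the next.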
Decoding the policy did the job! It turns out the `AWS_BUCKET_PARAMS` variable was being modified in place (by reference) when passed to `generate_presigned_post`, so each request also carried along all the data returned by the previous one. Copying the variable inside the function scope before making the request fixed it: there are no more duplications and the size of the returned object is stable. Thanks!
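A minimal sketch of that fix, under two assumptions: the S3 client and bucket name are passed in as arguments (a deviation from the question's `awsPostForm`, which builds them from `awsSign()` and `AWS_BUCKET`), and the values in `AWS_BUCKET_PARAMS` are plain strings, so a shallow `dict()` copy is enough. As found above, `generate_presigned_post` writes entries such as `key`, `policy` and `signature` into the `Fields` dict it receives. Because the original code also passed that same dict inside `Conditions`, each new policy presumably embedded the previous one, which would match the growth described in the question.

```python
# Template only: never hand this dict to boto3 directly, because
# generate_presigned_post modifies the Fields dict it receives.
AWS_BUCKET_PARAMS = {'ACL': 'private'}


def awsPostForm(s3, bucket, file_path, expires_in=1800):
    """Return a presigned POST (url + fields) for uploading file_path to bucket.

    s3 is assumed to be a boto3 S3 client, e.g. the one returned by awsSign().
    """
    # Fresh objects on every call: a new Fields dict, plus a separate dict for
    # the condition, so in-place changes made by the library never reach the
    # template or leak from one request into the next.
    fields = dict(AWS_BUCKET_PARAMS)
    conditions = [dict(AWS_BUCKET_PARAMS)]
    return s3.generate_presigned_post(
        Bucket=bucket,
        Key=file_path,
        Fields=fields,
        Conditions=conditions,
        ExpiresIn=expires_in,
    )
```

Usage would look like `post = awsPostForm(awsSign(), AWS_BUCKET, 'foo.bar')`. Keeping `fields` and the condition as two distinct dicts matters, since in the original call they were the same object.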
@SilvanaNobre brilliant!
Thanks for letting us know, your solution may help someone else in the same situation!