Processing SQS Messages in Batches with AWS Lambda

For Node.js, check out https://www.npmjs.com/package/@middy/sqs-partial-batch-failure.

const middy = require('@middy/core')
const sqsBatch = require('@middy/sqs-partial-batch-failure')

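// Process every record concurrently and return the settled results, so the
// middleware can tell which records succeeded and which failed.
const originalHandler = (event, context) => {
  const recordPromises = event.Records.map(async (record, index) => {
    /* Custom message processing logic */
  })
  return Promise.allSettled(recordPromises)
}

const handler = middy(originalHandler)
  .use(sqsBatch())

// Lambda needs the wrapped handler to be exported.
module.exports = { handler }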

Check out https://medium.com/@brettandrews/handling-sqs-partial-batch-failures-in-aws-lambda-d9d6940a17aa for more details.


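As of November 2019, AWS supports Bisect On Function Error (BisectBatchOnFunctionError), along with Maximum Retry Attempts. Note that these settings apply to Kinesis and DynamoDB Streams event sources, not to SQS. If your function is idempotent, you can use them.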
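Both settings live on the event source mapping. Here's a minimal sketch using the boto3 Lambda client's update_event_source_mapping call; the helper name enable_bisect_on_error and the default of 3 retries are illustrative assumptions, and the mapping must be a Kinesis or DynamoDB Streams one.

```python
from typing import Any, Dict


def enable_bisect_on_error(
    lambda_client: Any, mapping_uuid: str, max_retries: int = 3
) -> Dict[str, Any]:
    """Hypothetical helper: turn on batch bisection for a stream event source mapping."""
    # BisectBatchOnFunctionError and MaximumRetryAttempts are parameters of the
    # Lambda UpdateEventSourceMapping API (Kinesis / DynamoDB Streams sources only).
    return lambda_client.update_event_source_mapping(
        UUID=mapping_uuid,
        BisectBatchOnFunctionError=True,
        MaximumRetryAttempts=max_retries,
    )
```

In practice you would pass boto3.client("lambda") as lambda_client and the UUID of your existing mapping.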

With this approach, your function should throw an error if even one item in the batch fails. AWS will then split the batch in two and retry each half. The half without the bad record should now succeed; the half that contains it keeps being split until the bad record is isolated. Records in the succeeding halves may be processed more than once along the way, which is why idempotency matters.
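To make the splitting concrete, here's a small local simulation of the idea. It is not Lambda's actual implementation: it ignores MaximumRetryAttempts, ordering guarantees and on-failure destinations, and simply treats each chunk as one invocation that fails if any record in it raises.

```python
from typing import Callable, List, Sequence, Tuple


def bisect_batch(
    batch: Sequence[str], process: Callable[[str], None]
) -> Tuple[List[str], int]:
    """Simulate bisect-on-error; return (isolated bad records, invocation count)."""
    bad: List[str] = []
    invocations = 0
    pending: List[Sequence[str]] = [batch]
    while pending:
        chunk = pending.pop()
        invocations += 1
        try:
            # One simulated invocation: every record in the chunk must succeed.
            for record in chunk:
                process(record)
        except Exception:
            if len(chunk) == 1:
                bad.append(chunk[0])  # the failing record has been isolated
            else:
                mid = len(chunk) // 2
                # Retry each half; the first half is popped (retried) first.
                pending.append(chunk[mid:])
                pending.append(chunk[:mid])
    return bad, invocations
```

With a batch of ["a", "b", "c", "d"] where only "c" fails, this takes five invocations to isolate "c", and "a" and "b" are each processed twice, once in the full batch and once in their own half.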


There's an excellent article here. The relevant parts for you are...

  • Using a batchSize of 1, so that messages succeed or fail on their own.
  • Making sure your processing is idempotent, so reprocessing a message isn't harmful, outside of the extra processing cost.
  • Handling errors within your function code, perhaps by catching them and sending the message to a dead letter queue for further processing.
  • Calling the DeleteMessage API manually within your function after successfully processing a message.
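The idempotency and error-handling points can be combined per record. This is a minimal sketch under stated assumptions: processed_ids is an in-memory stand-in for a durable store (for example a DynamoDB table with conditional writes), dlq_url is a hypothetical dead letter queue URL, and only the message body is forwarded. Deduplicating on messageId works for redeliveries because SQS keeps the same message ID each time a message is received.

```python
from typing import Any, Callable, Dict, Set


def process_record(
    record: Dict[str, Any],
    handle: Callable[[Dict[str, Any]], None],
    processed_ids: Set[str],
    sqs_client: Any,
    dlq_url: str,
) -> str:
    """Process one SQS record at most once; dead-letter it on failure."""
    msg_id = record["messageId"]
    if msg_id in processed_ids:
        # Already handled on an earlier delivery: skip it, costing only a lookup.
        return "skipped"
    try:
        handle(record)
    except Exception:
        # Swallow the error so the trigger treats the message as done,
        # and park a copy of the body on the dead letter queue instead.
        sqs_client.send_message(QueueUrl=dlq_url, MessageBody=record["body"])
        return "dead-lettered"
    processed_ids.add(msg_id)
    return "processed"
```

Because the error is caught rather than re-raised, the invocation succeeds and the original message is removed from the source queue, leaving the copy on the dead letter queue for inspection.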

The last bullet point is how I've managed to deal with the same problem. Instead of returning errors immediately, store them or note that an error has occurred, then continue handling the rest of the messages in the batch. At the end of processing, return or raise an error so that the SQS-to-Lambda trigger knows not to delete the failed messages. All successful messages will already have been deleted by your Lambda handler.

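import logging
import os

import boto3

logger = logging.getLogger(__name__)
sqs = boto3.client('sqs')

# URL of the queue that triggers this function. Reading it from a QUEUE_URL
# environment variable is an assumption; supply it however suits your setup.
QUEUE_URL = os.environ['QUEUE_URL']


def handler(event, context):
    failed = False

    for msg in event['Records']:
        try:
            # Do something with the message.
            handle_message(msg)
        except Exception:
            # OK, it failed, but allow the loop to finish.
            logger.exception('Failed to handle message')
            failed = True
        else:
            # The message was handled successfully. We can delete it now.
            sqs.delete_message(
                QueueUrl=QUEUE_URL,
                ReceiptHandle=msg['receiptHandle'],
            )

    # It doesn't matter what the error is. You just want to raise here
    # to ensure the trigger doesn't delete any of the failed messages.
    if failed:
        raise RuntimeError('Failed to process one or more messages')


def handle_message(msg):
    # Your processing logic goes here; raise an exception on failure.
    ...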
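DeleteMessage needs the URL of the queue the message came from. If you'd rather not configure it separately, one option is to derive it from each record's eventSourceARN and look it up with the SQS GetQueueUrl API. A minimal sketch; the helper names are hypothetical, and it assumes the SQS client was created in the queue's region.

```python
from typing import Any, Dict, Tuple


def parse_sqs_arn(arn: str) -> Tuple[str, str, str]:
    """Split an SQS ARN (arn:partition:sqs:region:account-id:queue-name)."""
    parts = arn.split(":")
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sqs":
        raise ValueError(f"not an SQS queue ARN: {arn}")
    _, _, _, region, account_id, queue_name = parts
    return region, account_id, queue_name


def queue_url_for_record(record: Dict[str, Any], sqs_client: Any) -> str:
    """Look up the URL of the queue a Lambda SQS record came from."""
    _, account_id, queue_name = parse_sqs_arn(record["eventSourceARN"])
    # GetQueueUrl resolves the name in the client's own region.
    resp = sqs_client.get_queue_url(
        QueueName=queue_name, QueueOwnerAWSAccountId=account_id
    )
    return resp["QueueUrl"]
```

Since every record in a batch comes from the same queue, you would typically look the URL up once per invocation (or cache it) rather than once per message.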