AWS CloudFront - forward User-Agent but don't cache against it

You can use Lambda@Edge functions (https://docs.aws.amazon.com/lambda/latest/dg/lambda-edge.html) associated with your CloudFront distribution. You would need two functions:

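  1. A Viewer-Request event handler, which reads the User-Agent header and copies it to a custom header, e.g. X-My-User-Agent. The Viewer-Request handler is invoked when CloudFront receives a request from the client, before CloudFront checks its cache.
  2. An Origin-Request event handler, which reads X-My-User-Agent and writes its value back into User-Agent. The Origin-Request handler is invoked only when CloudFront does not find the requested object in its cache and forwards the request to the origin. This step is needed because, when User-Agent is not part of the cache key, CloudFront replaces it with "Amazon CloudFront" before forwarding the request.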
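Both handlers operate on the same request object. As a rough sketch (trimmed to the fields used here, with made-up values), a Lambda@Edge request event looks like the object below: header names are lowercase keys, and each maps to an array of { key, value } pairs, which is why the handlers look headers up with toLowerCase() and read [0].value. The getHeader helper is hypothetical and not part of any AWS library.

```javascript
// Trimmed, illustrative Lambda@Edge viewer-request event (values are made up).
const sampleEvent = {
  Records: [
    {
      cf: {
        config: { eventType: 'viewer-request' },
        request: {
          method: 'GET',
          uri: '/index.html',
          querystring: '',
          headers: {
            // Keys are lowercase header names; `key` keeps the original casing.
            host: [{ key: 'Host', value: 'd111111abcdef8.cloudfront.net' }],
            'user-agent': [{ key: 'User-Agent', value: 'curl/8.4.0' }]
          }
        }
      }
    }
  ]
};

// Hypothetical helper: return the first value of a header, or undefined if absent.
function getHeader(request, name) {
  const entries = request.headers[name.toLowerCase()];
  return entries && entries.length > 0 ? entries[0].value : undefined;
}

const req = sampleEvent.Records[0].cf.request;
console.log(getHeader(req, 'User-Agent'));      // curl/8.4.0
console.log(getHeader(req, 'X-My-User-Agent')); // undefined
```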

Note that you should NOT add User-Agent to the CloudFront header whitelist. From the AWS documentation:

You can configure CloudFront to cache objects based on values in the Date and User-Agent headers, but we don't recommend it. These headers have a lot of possible values, and caching based on their values would cause CloudFront to forward significantly more requests to your origin.

Ref: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html#request-custom-headers-behavior
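To make the cost concrete, here is a minimal sketch that counts distinct cache keys for a handful of made-up requests, once keyed on the path alone and once on path plus User-Agent. The cacheKey function is a hypothetical simplification for illustration, not CloudFront's actual cache-key algorithm.

```javascript
// Made-up sample of viewer requests: 3 paths, several browser versions.
const requests = [
  { path: '/', userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0' },
  { path: '/', userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 14_1) Safari/605.1.15' },
  { path: '/', userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/121.0' },
  { path: '/about', userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0' },
  { path: '/about', userAgent: 'curl/8.4.0' },
  { path: '/app.js', userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 14_1) Safari/605.1.15' },
  { path: '/app.js', userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0' }
];

// Hypothetical, simplified cache key: the path, optionally plus the User-Agent.
function cacheKey(req, includeUserAgent) {
  return includeUserAgent ? `${req.path}|${req.userAgent}` : req.path;
}

function countDistinctKeys(reqs, includeUserAgent) {
  return new Set(reqs.map((r) => cacheKey(r, includeUserAgent))).size;
}

// With a cold cache, each distinct key costs at least one request to the origin.
console.log(countDistinctKeys(requests, false)); // 3
console.log(countDistinctKeys(requests, true));  // 7
```

With User-Agent in the key, every one of the seven requests in this sample is a separate cache entry, so none of them would be served from cache.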

Example of a Viewer-Request handler (Lambda@Edge functions can be written only in Node.js or Python; ref: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-requirements-limits.html#lambda-requirements-lambda-function-configuration):

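'use strict';

exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  const headers = request.headers;
  const customUserAgentHeaderName = 'X-My-User-Agent';

  // Lambda@Edge header keys are lowercase. Some clients send no User-Agent,
  // so guard against a missing header instead of throwing a TypeError.
  if (headers['user-agent']) {
    // Copy the viewer's User-Agent into the custom header.
    headers[customUserAgentHeaderName.toLowerCase()] = [
      {
        key: customUserAgentHeaderName,
        value: headers['user-agent'][0].value
      }
    ];
  }

  callback(null, request);
};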

Example of an Origin-Request handler:

'use strict';

exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  const headers = request.headers;
  const customUserAgentHeaderName = 'X-My-User-Agent';
  const customHeader = headers[customUserAgentHeaderName.toLowerCase()];

  // Restore the viewer's original User-Agent (saved by the Viewer-Request
  // handler). Skip if the custom header is missing to avoid a TypeError.
  if (customHeader) {
    headers['user-agent'] = [
      {
        key: 'User-Agent',
        value: customHeader[0].value
      }
    ];
  }

  callback(null, request);
};
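The two handlers can be exercised together locally with a fake event. This is a minimal sketch, not an AWS test tool: viewerRequest, originRequest, makeEvent and runPipeline are hypothetical names, the handlers are compact copies of the logic above, and the harness assumes, as the approach above does, that the custom header set at the viewer-request stage is still present in the origin-request event. It also simulates CloudFront overwriting User-Agent with "Amazon CloudFront" on a miss.

```javascript
'use strict';

const CUSTOM = 'X-My-User-Agent';

// Compact copy of the Viewer-Request handler logic above.
function viewerRequest(event, context, callback) {
  const request = event.Records[0].cf.request;
  const ua = request.headers['user-agent'];
  if (ua) {
    request.headers[CUSTOM.toLowerCase()] = [{ key: CUSTOM, value: ua[0].value }];
  }
  callback(null, request);
}

// Compact copy of the Origin-Request handler logic above.
function originRequest(event, context, callback) {
  const request = event.Records[0].cf.request;
  const saved = request.headers[CUSTOM.toLowerCase()];
  if (saved) {
    request.headers['user-agent'] = [{ key: 'User-Agent', value: saved[0].value }];
  }
  callback(null, request);
}

// Wrap a request in a minimal (trimmed) Lambda@Edge event.
function makeEvent(request) {
  return { Records: [{ cf: { request } }] };
}

// Returns the User-Agent value the origin would end up seeing on a cache miss.
function runPipeline(viewerUserAgent) {
  let request = {
    method: 'GET',
    uri: '/index.html',
    headers: { 'user-agent': [{ key: 'User-Agent', value: viewerUserAgent }] }
  };
  // The handlers call the callback synchronously, so `request` is updated in place.
  viewerRequest(makeEvent(request), {}, (err, out) => { request = out; });

  // Simulated CloudFront behaviour: User-Agent is not in the cache key,
  // so it is replaced before forwarding. ASSUMPTION: the custom header survives.
  request.headers['user-agent'] = [{ key: 'User-Agent', value: 'Amazon CloudFront' }];

  originRequest(makeEvent(request), {}, (err, out) => { request = out; });
  return request.headers['user-agent'][0].value;
}

console.log(runPipeline('curl/8.4.0')); // curl/8.4.0
```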

If responses are cached across different user agents (that is, User-Agent is not part of the cache key), then on a cache hit the real user agent is not passed to the origin at all: CloudFront simply returns the cached response.

You mentioned that you would like to send the user-agent information to Elasticsearch. Unless you are interested only in cache misses, you cannot rely on logs collected by the origin application.

Even if Lambda@Edge forwards the real user agent to the origin, as long as the User-Agent header is not part of the cache key the origin still will not receive that data on a cache hit, because such requests never reach the origin.
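The hit/miss effect can be illustrated with a toy cache. This is a hypothetical model, not CloudFront's implementation: makeEdge and origin are made-up names, the cache key is the path only, and the origin records the User-Agent of every request it actually receives.

```javascript
// Toy edge cache keyed on the path only (User-Agent is NOT part of the key).
function makeEdge(origin) {
  const cache = new Map();
  return function handle(req) {
    if (cache.has(req.path)) {
      return { result: 'Hit', body: cache.get(req.path) };
    }
    // Miss: forward to the origin, passing the real User-Agent along
    // (as the Lambda@Edge pair above is meant to do).
    const body = origin(req);
    cache.set(req.path, body);
    return { result: 'Miss', body };
  };
}

// Toy origin that logs every User-Agent it receives.
const originLog = [];
function origin(req) {
  originLog.push(req.userAgent);
  return `content of ${req.path}`;
}

const edge = makeEdge(origin);
const results = [
  { path: '/', userAgent: 'Chrome' },
  { path: '/', userAgent: 'Firefox' },
  { path: '/', userAgent: 'Safari' }
].map((req) => edge(req).result);

console.log(results);   // [ 'Miss', 'Hit', 'Hit' ]
console.log(originLog); // [ 'Chrome' ]
```

Firefox and Safari are served from cache, so the origin log never records them.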

The only solution I see here is to use the access logs generated by CloudFront [1]. CloudFront access logs contain not only the user agent but also client IP addresses and other useful information, and they are written for both hits and misses. Setting up Logstash to ship these logs to Elasticsearch is straightforward [2].
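To show what these logs provide, here is a minimal parsing sketch. It assumes the standard tab-separated log format with a "#Fields:" directive naming the columns, and reads column names from that line rather than hard-coding positions. The sample lines are made up and trimmed to a few columns. The User-Agent column is URL-encoded in these logs, so the sketch decodes it once and keeps the raw value if decoding fails. parseCloudFrontLog and decodeUserAgent are hypothetical helpers; in practice Logstash would do this parsing.

```javascript
// Parse CloudFront standard log text into objects keyed by "#Fields:" names.
function parseCloudFrontLog(text) {
  let fields = [];
  const records = [];
  for (const line of text.split('\n')) {
    if (line.startsWith('#Fields:')) {
      fields = line.slice('#Fields:'.length).trim().split(/\s+/);
      continue;
    }
    if (line.trim() === '' || line.startsWith('#')) continue; // blanks, other directives
    const values = line.split('\t');
    const record = {};
    fields.forEach((name, i) => { record[name] = values[i]; });
    records.push(record);
  }
  return records;
}

// Decode the URL-encoded User-Agent column, falling back to the raw value.
function decodeUserAgent(raw) {
  try {
    return decodeURIComponent(raw);
  } catch (e) {
    return raw;
  }
}

// Made-up, trimmed sample: real log files have many more columns.
const sample = [
  '#Version: 1.0',
  '#Fields: date time cs-uri-stem cs(User-Agent) x-edge-result-type',
  '2024-01-01\t12:00:00\t/index.html\tMozilla/5.0%20(X11;%20Linux%20x86_64)\tMiss',
  '2024-01-01\t12:00:05\t/index.html\tcurl/8.4.0\tHit'
].join('\n');

for (const r of parseCloudFrontLog(sample)) {
  console.log(r['x-edge-result-type'], decodeUserAgent(r['cs(User-Agent)']));
}
// Miss Mozilla/5.0 (X11; Linux x86_64)
// Hit curl/8.4.0
```

Unlike the origin's own logs, both the Hit and the Miss row carry the viewer's User-Agent.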

  • [1] https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/AccessLogs.html
  • [2] https://aws.amazon.com/premiumsupport/knowledge-center/cloudfront-logs-elasticsearch/