How are data sources used in Terraform?

Good examples up there!

The main differences between a Terraform data source, a resource, and a variable are:

Resource: Provisions infrastructure on our platform. Terraform creates, updates, and deletes it.

Variable: Supplies predefined values to our IaC configuration. Resources use these values when provisioning.

Data source: Fetches values from existing infrastructure or from the provider, and makes that data available to our resources when they provision infrastructure.
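
To make the three roles concrete, here is a minimal sketch that uses one of each. The names sg_name and app-sg are made up for illustration, and it assumes the account still has a default VPC:

# Variable: a predefined value supplied to the configuration.
variable "sg_name" {
  type    = string
  default = "app-sg" # hypothetical name
}

# Data source: reads the existing default VPC; creates nothing.
data "aws_vpc" "default" {
  default = true
}

# Resource: actually provisions something, using both of the above.
resource "aws_security_group" "app" {
  name   = var.sg_name
  vpc_id = data.aws_vpc.default.id
}

Terraform manages the security group here, while the VPC is only looked up.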

Examples are well explained above :)


Data sources provide information about entities that are not managed by the current Terraform configuration.

This may include:

  • Configuration data from Consul
  • Information about the state of manually-configured infrastructure components

In other words, data sources are read-only views into the state of pre-existing components external to our configuration.

Once you have defined a data source, you can use the data elsewhere in your Terraform configuration.
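
As a rough sketch of the Consul case from the list above, the consul_keys data source can read a key and expose it to the rest of the configuration. The key path and the ami name below are placeholders, not something from a real setup:

# Read a value that some other system wrote into Consul.
data "consul_keys" "app" {
  key {
    name = "ami"                    # local name for the value (hypothetical)
    path = "service/app/launch_ami" # hypothetical key path in Consul
  }
}

# The value is exposed under .var.<name> and can be used anywhere.
output "app_ami_from_consul" {
  value = data.consul_keys.app.var.ami
}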

For example, let's suppose we want to create a Terraform configuration for a new AWS EC2 instance. We want to use an AMI that was created and uploaded by a Jenkins job using the AWS CLI, and that is not managed by Terraform. As part of the configuration for our Jenkins job, this AMI will always have a name with the prefix app-.

In this case, we can use the aws_ami data source to obtain information about the most recent AMI image that has the name prefix app-.

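data "aws_ami" "app_ami" {
  most_recent = true

  # Some AWS provider versions require "owners". "self" assumes the
  # Jenkins job uploads the AMI into the same AWS account.
  owners = ["self"]

  filter {
    name   = "name"
    values = ["app-*"]
  }
}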

Data sources export attributes, just like resources do. We can reference these attributes using the syntax data.TYPE.NAME.ATTR. In our example, the AMI ID is available as data.aws_ami.app_ami.id, and we can pass it as the ami argument of our aws_instance resource. (Terraform 0.11 and earlier required wrapping the reference in "${...}"; from 0.12 onward the bare expression is preferred.)

resource "aws_instance" "app" {
  ami           = data.aws_ami.app_ami.id
  instance_type = "t2.micro"
}

Data sources are most powerful when retrieving information about dynamic entities - those whose properties change value often. For example, the next time Terraform fetches data for our aws_ami data source, the value of the exported attributes may be different (we might have built and pushed a new AMI).

Variables are used for static values, those that rarely change, such as your access and secret keys, or a standard list of sudoers for your servers.
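
For comparison, a sketch of what such static values might look like as variables. The variable names and user names are invented, and sensitive = true assumes Terraform 0.14 or later:

# A list that rarely changes: fine as a variable with a default.
variable "sudoers" {
  description = "Users granted sudo on every server"
  type        = list(string)
  default     = ["alice", "bob"] # hypothetical users
}

# A secret supplied from outside (for example a TF_VAR_ environment
# variable); no default, so it never lives in the configuration.
variable "aws_secret_key" {
  type      = string
  sensitive = true
}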


Data sources can be used for a number of reasons, but their goal is always the same: query something, then hand the result back to you as data.

Let's take an example from the AWS provider documentation:

# Find the latest available AMI that is tagged with Component = web
data "aws_ami" "web" {
  filter {
    name   = "state"
    values = ["available"]
  }

  filter {
    name   = "tag:Component"
    values = ["web"]
  }

  most_recent = true
}

This uses the aws_ami data source, which is different from a resource: it only gives you information and does not create anything. This particular example calls the EC2 DescribeImages API (describe-images in the AWS CLI), passes in the filters as specified, and returns an object that you can read information from. Take a look at these attributes:

  • name
  • owner_id
  • description
  • image_id

... The list goes on. This is really useful if, say, I always want to pull the latest AMI matching some tags and keep a launch configuration up to date with it. I can use this data source rather than updating a variable or hard-coding the ID every time.
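
A minimal sketch of that launch configuration idea, relying on the data "aws_ami" "web" block shown earlier. The name prefix and instance type are placeholders I picked for illustration:

# Launch configuration that follows whatever AMI the data source finds.
resource "aws_launch_configuration" "web" {
  name_prefix   = "web-"                # hypothetical prefix
  image_id      = data.aws_ami.web.id
  instance_type = "t2.micro"            # hypothetical size

  # Launch configurations cannot be modified in place, so build the
  # replacement before destroying the old one.
  lifecycle {
    create_before_destroy = true
  }
}

# Other exported attributes can be read the same way.
output "web_ami_name" {
  value = data.aws_ami.web.name
}

When a newer AMI with the matching tags appears, the next plan should show the launch configuration being replaced, with no variable to edit.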

Data sources can be used for other purposes as well; one of my favorites is the template provider.

Good luck!