Why can't I connect to an AWS RDS instance from an EC2 instance in another VPC after peering?

VPC Peering works much the same way as how Public Subnets connect to the Internet Gateway -- the Route Tables define how traffic goes in/out of the Subnets.

For VPC Peering to work:

  • Invite & Accept the peering connection (Done)
  • Create a Route table in each VPC that points to the Peering connection for the other VPC's IP range (Done)
  • Associate each subnet that should be able to use the peering connection with that Route Table
  • Alternatively, edit existing route tables to include the peering entry
  • If your RDS database is public, and you are attempting to connect using the public DNS of the database, then you will need to edit the DNS settings of your peering connection to allow DNS resolution, so that the public DNS name resolves to the database's private IP address from the peer VPC.

The routing works as follows:

  • When traffic leaves a subnet, the Route Table is consulted to determine where to send the traffic
  • The most specific matching route (longest prefix, eg /24) takes precedence over less specific ones (eg /0)
  • The traffic is routed according to the appropriate Route Table entry
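
The lookup described above can be sketched in a few lines of Python. This is only an illustration of longest-prefix matching, not how AWS implements it; the route table contents and the `igw-example` target are made up for the example.

```python
import ipaddress


def pick_route(route_table, destination_ip):
    """Return the (cidr, target) entry with the longest matching prefix, or None."""
    dest = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table
        if dest in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # The most specific route is the one with the largest prefix length.
    network, target = max(matches, key=lambda m: m[0].prefixlen)
    return str(network), target


# Hypothetical route table for a subnet in the 10.0.1.0/24 VPC.
ROUTES = [
    ("10.0.1.0/24", "local"),
    ("10.0.2.0/24", "pcx-e8e8e8e8"),
    ("0.0.0.0/0", "igw-example"),
]

print(pick_route(ROUTES, "10.0.2.15"))  # ('10.0.2.0/24', 'pcx-e8e8e8e8')
print(pick_route(ROUTES, "8.8.8.8"))    # ('0.0.0.0/0', 'igw-example')
```

If the peering route is missing from the table, traffic for the other VPC falls through to the default route (or is dropped when there is none), which is the usual reason a peered connection "does not work".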

This means that you can configure some of the subnets to peer, rather than having to include all of them. Traditionally, it is the Private subnets that peer and possibly only specific Private subnets -- but that is totally your choice.

Think of it as directions on a roadmap, telling traffic where it should be directed.


Below are the steps to make private RDS accessible via VPC peering:

Let’s say you have 2 VPCs:

  • Production VPC: 10.0.1.0/24
  • RDS VPC: 10.0.2.0/24

Step 1: create a VPC peering connection between the two VPCs, then accept the request to establish the connection. You will get a connection ID such as: pcx-e8e8e8e8

Step 2: configure the route table in each VPC

  • Production VPC: add this route to RDS VPC: 10.0.2.0/24 —> pcx-e8e8e8e8
  • RDS VPC: add this route to Production VPC: 10.0.1.0/24 —> pcx-e8e8e8e8
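
As a rough sketch of Step 2, the helper below derives the two route entries from the VPC CIDRs and refuses overlapping ranges, since peered VPCs cannot have overlapping CIDR blocks. The function name and the returned dictionary layout are my own; the inner keys are chosen to match the parameter names of the EC2 CreateRoute call, so you would still need to add the RouteTableId of each side before passing them on.

```python
import ipaddress


def peering_routes(vpc_a_cidr, vpc_b_cidr, peering_id):
    """Build one route entry per VPC, each pointing at the other VPC's CIDR."""
    a = ipaddress.ip_network(vpc_a_cidr)
    b = ipaddress.ip_network(vpc_b_cidr)
    if a.overlaps(b):
        raise ValueError(f"{a} and {b} overlap, so the VPCs cannot be peered")
    return {
        # Route to add in VPC A's route table.
        "vpc_a": {"DestinationCidrBlock": str(b), "VpcPeeringConnectionId": peering_id},
        # Route to add in VPC B's route table.
        "vpc_b": {"DestinationCidrBlock": str(a), "VpcPeeringConnectionId": peering_id},
    }


routes = peering_routes("10.0.1.0/24", "10.0.2.0/24", "pcx-e8e8e8e8")
print(routes["vpc_a"])  # Production VPC -> RDS VPC
print(routes["vpc_b"])  # RDS VPC -> Production VPC
```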

Step 3: configure the security group of the RDS instance to accept traffic from the IP range of the Production VPC by adding this inbound rule:

  • Port (MS SQL: 1433, MySQL: 3306, etc) — allow source: 10.0.1.0/24
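
To sanity-check the inbound rule from Step 3 on paper, something like the following can help. It is a simplified model of a security group (CIDR sources and TCP port ranges only); the rule dictionary layout is an assumption made for this example, not an AWS data structure.

```python
import ipaddress


def ingress_allows(rules, source_ip, port):
    """Return True if any inbound rule covers this source address and port."""
    src = ipaddress.ip_address(source_ip)
    return any(
        rule["from_port"] <= port <= rule["to_port"]
        and src in ipaddress.ip_network(rule["cidr"])
        for rule in rules
    )


# Inbound rule from Step 3: MS SQL port, source = Production VPC range.
RDS_INGRESS = [{"from_port": 1433, "to_port": 1433, "cidr": "10.0.1.0/24"}]

print(ingress_allows(RDS_INGRESS, "10.0.1.25", 1433))  # True
print(ingress_allows(RDS_INGRESS, "10.0.3.25", 1433))  # False
```

Security groups are stateful, so the reply traffic from the database is allowed automatically once the inbound rule matches.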

The database should now be ready to accept connections.

Note: when connecting to RDS, you should use the provided DNS name for better resiliency. AWS VPC DNS will take care of resolving this name to the private IP address of the RDS instance.
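
A quick way to confirm the whole path (routes, security group, DNS) from the EC2 instance is a plain TCP probe against the database endpoint. The sketch below uses only the Python standard library; the hostname in the usage comment is a placeholder, so substitute your own RDS endpoint and port.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Usage (placeholder endpoint, replace with your own):
# can_connect("your-db-endpoint.example", 1433)
```

A timeout usually points at routing, a security group, or a network ACL, while a DNS failure points at the name resolution settings.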


VPC peering is all about the details. Here are the items we had to run down to get it to work.

Peer VPC 1 to VPC 2 (obvious, but included for those that did not do this step). From VPC 1, establish peering to VPC 2. Accept request. If different region, switch to VPC 2 region and accept the peer request.

Examples:

VPC 1 CIDR = 10.0.0.0/16
VPC 2 CIDR = 172.16.0.0/16

VPC 1 (VPC With RDS Instance)
1. Route Table Servicing Subnet of RDS Instance - Add route destination to VPC 2 CIDR block (172.16.0.0/16) and target VPC 2 peering connection (select from list - pcx-#####).
2. RDS Security Group - Add inbound rule for DB port with source IP being the VPC 2 CIDR block (172.16.0.0/16). So, you will have two inbound rules for the DB port. One for the VPC 1 (10.0.0.0/16) CIDR Block and one for VPC 2 (172.16.0.0/16).
3. Network Access Control List for the Private Subnet of the RDS Instance - if you are only allowing certain ports, add a rule for the DB Port, source = VPC 2 CIDR block (172.16.0.0/16) and Allow.

VPC 2
1. Route Table Servicing Subnet of EC2 Instance - Add route destination to VPC 1 CIDR block (10.0.0.0/16) and target VPC 1 peering connection (select from list - pcx-#####).
2. Instance Security Group - Add inbound rule for DB port with source IP being the VPC 1 CIDR block (10.0.0.0/16).
3. Network Access Control List for the Subnet of the EC2 Instance - if you are only allowing certain ports, add a rule for the DB Port, source = VPC 1 CIDR block (10.0.0.0/16) and Allow.
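
For the network ACL items above, keep in mind that ACL rules are evaluated in rule-number order and the first match wins, with an implicit deny at the end. Here is a small sketch of that evaluation for inbound traffic; the rule layout is invented for the example and only models source CIDR and port range.

```python
import ipaddress


def nacl_decision(rules, source_ip, port):
    """Evaluate rules in ascending rule-number order; the first match decides."""
    src = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r["number"]):
        in_cidr = src in ipaddress.ip_network(rule["cidr"])
        in_ports = rule["from_port"] <= port <= rule["to_port"]
        if in_cidr and in_ports:
            return rule["action"]
    return "deny"  # implicit deny when nothing matches


# Inbound ACL on the RDS side: allow the DB port from VPC 2.
VPC1_INBOUND = [
    {"number": 100, "cidr": "172.16.0.0/16", "from_port": 1433, "to_port": 1433, "action": "allow"},
]

print(nacl_decision(VPC1_INBOUND, "172.16.4.10", 1433))  # allow
print(nacl_decision(VPC1_INBOUND, "192.168.1.1", 1433))  # deny
```

Unlike security groups, network ACLs are stateless, so the return traffic needs its own matching rule in the opposite direction.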

I think that was it - but if I find another setting, I will update this message.

Just some history: we were doing this for disaster recovery. Our production instances and RDS MS SQL DB are in us-east-1 (VPC 1) and our disaster recovery warm standby instances are in us-west-2 (VPC 2). We mostly get traffic from the US, but we may consider making the standby site a true production copy (scaling group) and then changing the Route 53 records to latency-based routing.