Enforcing Terraform Policies and Standards

Terraform is a powerful tool for defining infrastructure as code, and it makes it easy to build consistent, repeatable infrastructure. Its providers expose a huge set of resource types, so you can define almost anything that can exist in AWS, GCP, Azure, and many other platforms.

The flexibility of Terraform is a great strength of the tool, until it comes time for code review. The problem is not that the code is difficult to read and understand, but that certain teams or people should not be defining particular resources. For example, if you have a pipeline that is designed to deploy IAM roles, then the Terraform that is executed in that pipeline should not define things like VPCs or EC2 instances.

I am going to explain one way (I’m sure there are others) of enforcing policies and standards on Terraform definitions that can be executed automatically in a CI pipeline (or manually, if you’re into that). It provides an easy way to block particular things from being defined, and to have that ‘block’ enforced by the CI pipeline before the code gets merged and deployed.

Before I start, I would like to call out that HashiCorp provides a tool called Sentinel that can be used for a similar purpose. From my understanding, Sentinel is really intended to be used with Terraform Cloud. I found a way to use the Sentinel Simulator to run tests from a CLI, but unfortunately there isn’t a way to run those tests against a real Terraform plan file. It depends on Terraform plan mocks, which I don’t like. In my scenario, I don’t have a Terraform Cloud account, so I had to find a different solution.

My typical usage of Terraform in CI pipelines is to execute terraform plan -out tfplan, which generates a file containing a Terraform plan (the set of actions that Terraform will execute to move the current state of the infrastructure to the defined state). This command prints all of the changes that are going to occur to stdout, and also stores the plan in a file called ‘tfplan’. After this, my CI pipelines ask me as a user to click a button to indicate whether or not I approve the changes. If I click approve, the pipeline executes terraform apply tfplan, which applies the generated plan and causes the defined infrastructure to actually be created, updated, or destroyed. So, in simple terms, this is what my CI pipelines do:

  1. Execute terraform plan -out tfplan
  2. Ask for user approval/denial
  3. If user approves, execute terraform apply tfplan

Adding enforcement of Terraform policies is made pretty easy by the fact that the Terraform CLI provides the show command to translate a plan file into JSON: terraform show -json tfplan. This generates JSON in a consistent, documented format (Terraform’s JSON output format). Using this JSON, I was able to create a simple app that runs validations on the JSON keys and values as a way to automate enforcement of particular attributes.
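To make the shape of that JSON more concrete, here is a minimal sketch that decodes only the handful of fields used later in this post, with nothing but the standard library. The type names (planDoc, resourceChange), the parsePlan helper, and the sample document are made up for illustration; only the JSON keys (resource_changes, address, type, change.actions, change.after) come from Terraform's JSON output format, and real output contains many more fields.

```go
package main

import (
    "encoding/json"
    "fmt"
)

// planDoc mirrors a small subset of the output of `terraform show -json`.
// The Go type names are assumptions for this sketch; only the JSON keys
// are taken from Terraform's output format.
type planDoc struct {
    FormatVersion   string           `json:"format_version"`
    ResourceChanges []resourceChange `json:"resource_changes"`
}

type resourceChange struct {
    Address string `json:"address"`
    Type    string `json:"type"`
    Change  struct {
        Actions []string               `json:"actions"`
        After   map[string]interface{} `json:"after"`
    } `json:"change"`
}

// samplePlan is a hand-written, heavily trimmed example, not real Terraform output.
const samplePlan = `{
  "format_version": "0.1",
  "resource_changes": [
    {
      "address": "aws_vpc.test",
      "type": "aws_vpc",
      "change": {"actions": ["create"], "after": {"cidr_block": "10.0.0.0/16"}}
    }
  ]
}`

// parsePlan decodes the JSON document into the trimmed-down planDoc type.
func parsePlan(b []byte) (planDoc, error) {
    var p planDoc
    if err := json.Unmarshal(b, &p); err != nil {
        return planDoc{}, fmt.Errorf("failed to parse plan json: %w", err)
    }
    return p, nil
}

func main() {
    p, err := parsePlan([]byte(samplePlan))
    if err != nil {
        panic(err)
    }
    // Print the address, type, and planned actions of every resource change.
    for _, rc := range p.ResourceChanges {
        fmt.Println(rc.Address, rc.Type, rc.Change.Actions)
    }
}
```

The rest of this post uses the tfjson.Plan type from github.com/hashicorp/terraform-json instead of hand-written structs, which saves you from maintaining this mapping yourself.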

Here is an example of what a Go app could look like that validates that no ‘aws_vpc’ resources have been defined (just a very simple validation):

package main

import (
    "encoding/json"
    "flag"
    "fmt"
    "log"
    "os"

    tfjson "github.com/hashicorp/terraform-json"
)

func main() {
    tfplan := flag.String("tfplan", "plan.json", "Path to tfplan json file to be analyzed")
    flag.Parse()

    // Read in the tfplan json
    b, err := os.ReadFile(*tfplan)
    if err != nil {
        panic(fmt.Errorf("failed to read file: %w", err))
    }

    // Parse the json into a tfjson.Plan
    var plan tfjson.Plan
    if err := json.Unmarshal(b, &plan); err != nil {
        panic(fmt.Errorf("failed to parse plan json: %w", err))
    }

    // Define the checks that will be enforced (message -> check)
    enforcements := map[string]func(plan tfjson.Plan) []string{
        "cannot define aws_vpc resources": CheckVpcDefinition,
    }

    // Evaluate all defined enforcements
    violations := map[string][]string{}
    for rule, f := range enforcements {
        violations[rule] = f(plan)
    }

    // Print out all violations to stderr
    l := log.New(os.Stderr, "", 0)
    for rule, addresses := range violations {
        for _, a := range addresses {
            l.Printf("%s (%s)\n", rule, a)
        }
    }
}

// CheckVpcDefinition will validate that no aws_vpc resources
// have been defined. It returns the address of every offender.
func CheckVpcDefinition(plan tfjson.Plan) []string {
    var addresses []string
    for _, resourceChange := range plan.ResourceChanges {
        if resourceChange.Type == "aws_vpc" {
            addresses = append(addresses, resourceChange.Address)
        }
    }
    return addresses
}

The application does the following:

  1. Reads in the Terraform plan JSON file specified with the tfplan flag
  2. Parses the JSON file into a tfjson.Plan object
  3. Defines which enforcements should be executed as a map of message -> func. (The message is a string that is printed with each violator returned by the func. The func defines a particular policy or standard and returns the address of each Terraform resource that violates it)
  4. Executes all defined enforcements
  5. Prints a message to stderr for each violation, containing the address of the offending resource, like:
cannot define aws_vpc resources (aws_vpc.test)
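Note that the app as written only prints the violations. For a CI pipeline to stop on a violation, the process also has to finish with a non-zero exit status. Below is a minimal sketch of one way to do that, assuming the same map of rule -> addresses as above. The violationLines and exitStatus helpers are hypothetical names that are not part of the original app, and sorting the rule names is an extra I added so the output is stable between runs (Go randomizes map iteration order).

```go
package main

import (
    "fmt"
    "os"
    "sort"
)

// violationLines returns one message per violation, ordered by rule name.
// (Hypothetical helper for this sketch.)
func violationLines(violations map[string][]string) []string {
    rules := make([]string, 0, len(violations))
    for rule := range violations {
        rules = append(rules, rule)
    }
    sort.Strings(rules)

    var lines []string
    for _, rule := range rules {
        for _, addr := range violations[rule] {
            lines = append(lines, fmt.Sprintf("%s (%s)", rule, addr))
        }
    }
    return lines
}

// exitStatus maps the number of violations to a process exit status.
func exitStatus(n int) int {
    if n > 0 {
        return 1
    }
    return 0
}

func main() {
    // Made-up result of running the enforcements.
    violations := map[string][]string{
        "cannot define aws_vpc resources": {"aws_vpc.test"},
    }

    lines := violationLines(violations)
    for _, line := range lines {
        fmt.Fprintln(os.Stderr, line)
    }

    // The real tool would call os.Exit(exitStatus(len(lines))) here so that
    // the CI step fails; this sketch only reports the status it would use.
    fmt.Printf("%d violation(s), exit status would be %d\n", len(lines), exitStatus(len(lines)))
}
```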

This example is very simple, but you could write your validations to check any attribute on any of the resources. For example, this would be a validation to ensure that no IAM policy resources are defined with ‘*’ permissions:

// CheckIamPolicyStar returns the address of every aws_iam_policy resource
// that contains an Allow statement with the "*" action.
// It needs the "strings" package in addition to the imports used earlier.
func CheckIamPolicyStar(plan tfjson.Plan) []string {
    var addresses []string
    for _, resourceChange := range plan.ResourceChanges {
        if resourceChange.Type != "aws_iam_policy" || resourceChange.Change == nil {
            continue
        }

        // After holds the planned attribute values of the resource
        after, ok := resourceChange.Change.After.(map[string]interface{})
        if !ok {
            continue
        }
        p, ok := after["policy"].(string)
        if !ok {
            continue
        }

        // Unmarshal the policy that is nested as an escaped json string
        var j Policy
        if err := json.Unmarshal([]byte(p), &j); err != nil {
            continue // skip policies that cannot be parsed
        }

        if allowsStar(j) {
            addresses = append(addresses, resourceChange.Address)
        }
    }

    return addresses
}

// allowsStar reports whether any Allow statement contains the "*" action
func allowsStar(p Policy) bool {
    for _, s := range p.Statements {
        if strings.ToUpper(s.Effect) != "ALLOW" {
            continue
        }
        for _, a := range s.Action {
            if a == "*" {
                return true
            }
        }
    }
    return false
}

type Policy struct {
    Statements []Statement `json:"Statement"`
}

type Statement struct {
    Effect string `json:"Effect"`
    Action Value  `json:"Action"`
}

// Value is a custom slice of strings needed for unmarshalling IAM policies
type Value []string

// UnmarshalJSON is a custom unmarshaller that will allow a json Value to be
// defined either as a string or []string
func (value *Value) UnmarshalJSON(b []byte) error {
    var raw interface{}
    if err := json.Unmarshal(b, &raw); err != nil {
        return err
    }

    var p []string
    // value can be string or []string, convert everything to []string
    switch v := raw.(type) {
    case string:
        p = []string{v}
    case []interface{}:
        for _, item := range v {
            p = append(p, fmt.Sprintf("%v", item))
        }
    default:
        return fmt.Errorf("invalid value %s: only string or []string is allowed", b)
    }

    *value = p
    return nil
}

In this example, I had to define a struct type for an IAM policy because, in Terraform, IAM policies are typically written using heredoc syntax and end up as an escaped JSON string nested within one of the attributes of the resource. For example, this would be a policy defined with ‘*’ permissions:

resource "aws_iam_policy" "example" {
  name        = "example-policy"
  description = "This is an example policy"
  policy      = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "*",
      "Resource": "*"
    }
  ]
}
EOF
}

In my case, I only care about particular parts of the IAM policy, so I defined the Policy struct type and Statement struct type to contain only those specific fields.
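To illustrate why decoding only those fields is enough, here is a small standalone sketch that parses two policy documents, one with Action as a single string and one with Action as a list, and reports whether an Allow statement grants ‘*’. It is a variation on the code above rather than a copy of it: the stringOrSlice and policyDoc types and the allowsAllActions function are hypothetical names, and the unmarshaller tries a string first and then a list instead of switching on the decoded type.

```go
package main

import (
    "encoding/json"
    "fmt"
    "strings"
)

// stringOrSlice accepts either "x" or ["x", "y"] in JSON.
// (Illustrative type; plays the same role as Value above.)
type stringOrSlice []string

func (s *stringOrSlice) UnmarshalJSON(b []byte) error {
    // Try the single-string form first.
    var one string
    if err := json.Unmarshal(b, &one); err == nil {
        *s = []string{one}
        return nil
    }
    // Fall back to the list form.
    var many []string
    if err := json.Unmarshal(b, &many); err != nil {
        return fmt.Errorf("expected string or []string, got %s", b)
    }
    *s = many
    return nil
}

// policyDoc keeps only the parts of an IAM policy that this check needs.
type policyDoc struct {
    Statement []struct {
        Effect string        `json:"Effect"`
        Action stringOrSlice `json:"Action"`
    } `json:"Statement"`
}

// allowsAllActions reports whether any Allow statement lists the "*" action.
func allowsAllActions(policyJSON string) (bool, error) {
    var doc policyDoc
    if err := json.Unmarshal([]byte(policyJSON), &doc); err != nil {
        return false, fmt.Errorf("failed to parse policy: %w", err)
    }
    for _, st := range doc.Statement {
        if !strings.EqualFold(st.Effect, "Allow") {
            continue
        }
        for _, a := range st.Action {
            if a == "*" {
                return true, nil
            }
        }
    }
    return false, nil
}

func main() {
    single := `{"Statement":[{"Effect":"Allow","Action":"*","Resource":"*"}]}`
    list := `{"Statement":[{"Effect":"Allow","Action":["s3:GetObject","s3:ListBucket"],"Resource":"*"}]}`

    for _, p := range []string{single, list} {
        ok, err := allowsAllActions(p)
        fmt.Println(ok, err)
    }
}
```

Fields that are not declared on the struct, such as Version and Resource, are simply ignored by encoding/json, which is what makes the trimmed-down types safe to use here.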

Now, with all of my enforcements defined in my Go app, I can run the app in my CI pipeline before asking a user to approve the deployment. The new basic flow of the CI pipeline looks like this:

  1. Execute terraform plan -out tfplan
  2. Execute terraform show -json tfplan > tfplan.json
  3. Execute the app that checks whether the defined Terraform complies with the defined enforcements: tfenforce -tfplan tfplan.json
  4. If any violations are detected, they are printed to stderr, and the CI pipeline should be configured to terminate if this occurs
  5. If no violations were detected, then ask for user approval/denial
  6. If user approves, execute terraform apply tfplan
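The gating logic of this flow can be sketched in a few lines of Go. This is not how my CI system is implemented; it is only a dry-run illustration of the ordering. The runPipeline function and its two callbacks are hypothetical, and the sketch assumes that tfenforce finishes with a non-zero exit status when it finds violations, so that run returns an error for that step.

```go
package main

import (
    "errors"
    "fmt"
)

// runPipeline executes the gated flow: plan, export to JSON, enforce, and
// apply only after approval. run executes one shell command and approve asks
// the user; both are injected so the flow can be dry-run or tested.
func runPipeline(run func(cmd string) error, approve func() bool) error {
    gate := []string{
        "terraform plan -out tfplan",
        "terraform show -json tfplan > tfplan.json",
        "tfenforce -tfplan tfplan.json", // assumed to fail on violations
    }
    for _, cmd := range gate {
        if err := run(cmd); err != nil {
            // Stop before the approval step if any gate command fails.
            return fmt.Errorf("%q failed: %w", cmd, err)
        }
    }

    if !approve() {
        return errors.New("deployment was not approved")
    }
    return run("terraform apply tfplan")
}

func main() {
    // Dry run: print the commands instead of executing them.
    dryRun := func(cmd string) error {
        fmt.Println("would run:", cmd)
        return nil
    }
    alwaysApprove := func() bool { return true }

    if err := runPipeline(dryRun, alwaysApprove); err != nil {
        fmt.Println("pipeline stopped:", err)
    }
}
```

The important property is that the approval prompt and terraform apply are never reached when the enforcement step fails.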

This is meant as an example of how enforcement of Terraform policies and standards can be automated. Feel free to post any suggestions for improving my approach.

Source code for my examples can be found here.
