Puppet code versus hiera data

Ever since the introduction of the create_resources function, Puppet has let us store resource definitions in hiera and create them at run time. This means we can define resources in hiera data as well as in Puppet manifests. But when do you put something in hiera, and when do you use regular Puppet code? In this blog post, we give you some guidelines.

The history

But before we discuss this issue, let’s first dive into some history. The create_resources function was added to Puppet on the 16th of March 2011. If you are interested, you can check out the change here. What is interesting is the last line of the commit message:

Now I can dynamically determine how webserver instances are deployed to nodes
by updating the YAML files.

So before this commit, it was pretty hard to get dynamic data into your Puppet code. This commit was a big step forward.

Strong points

What are the advantages of using create_resources? First of all, it allows you to specify different resources for different environments without using conditional statements in your Puppet code. The hiera lookup hierarchy allows you to return different values based on the values of Puppet variables. This is mostly used to differentiate between environments, operating systems, or hardware architectures. Check out the Puppet documentation for hiera for the details. Before create_resources, we could also do this, but we needed long and hard-to-understand conditional statements.
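
As a rough sketch of what such a hierarchy can look like, here is a minimal hiera.yaml in the Hiera 3 format. The data directory and the level names (nodes, environments, common) are assumptions chosen for illustration, not something prescribed by Puppet.

```yaml
# hiera.yaml (Hiera 3 format); paths and level names are illustrative assumptions
---
:backends:
  - yaml
:yaml:
  :datadir: /etc/puppet/hieradata
:hierarchy:
  - "nodes/%{::fqdn}"                # most specific: one file per node
  - "environments/%{::environment}"  # one file per environment
  - common                           # fallback for everything else
```

Hiera searches the levels from top to bottom, so a value in a node file wins over the same key in an environment file.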

Let’s see this in an example. We need to add different firewall rules based on the environment. Here is some contrived Puppet-only code to do this:


# Defaults for every firewall resource in this scope
Firewall {
  chain  => 'OUTPUT',
  proto  => 'tcp',
  state  => 'NEW',
  action => 'reject',
}

case $environment {
  'development': {
    # No firewall rule needed for development
  }
  'test': {
    firewall { '100 outbound Containment':
      dport  => 8215,
      action => 'log',
    }
  }
  'production': {
    firewall { '100 outbound Containment':
      dport  => 8212,
      action => 'reject',
    }
  }
}

If we use hiera to store the environment-specific resources at the different levels of the hierarchy, the Puppet code becomes:

$rules = hiera('firewall_rules')
$defaults = {
  chain  => 'OUTPUT',
  proto  => 'tcp',
  state  => 'NEW',
  action => 'reject',
}
create_resources('firewall', $rules, $defaults)

Although the differences in this example aren’t that big, the create_resources variant stays more readable as the number of resources grows.
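
For completeness, the hiera data behind this could look roughly like the sketch below. The file paths are assumptions about how the hierarchy is laid out; the rule values mirror the case statement shown earlier. Development gets an empty hash, because a plain hiera('firewall_rules') lookup fails when the key is not found anywhere and no default is given.

```yaml
# environments/development.yaml (hypothetical path)
---
firewall_rules: {}   # no rules needed; the empty hash keeps the lookup from failing
# environments/test.yaml (hypothetical path)
---
firewall_rules:
  '100 outbound Containment':
    dport: 8215
    action: log
# environments/production.yaml (hypothetical path)
---
firewall_rules:
  '100 outbound Containment':
    dport: 8212
    action: reject
```

Attributes that are not listed here, such as chain and proto, come from the $defaults hash passed to create_resources.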

Another use case for create_resources is organizations that have a strict separation of duties. Splitting resources over different files and using create_resources to create them is a way to implement this.

Weak points

But there are weak points as well. When create_resources fails, you sometimes get pretty weird error messages, and it is hard to track down:

  • Which resource failed
  • Where the Puppet code is that called the create_resources function
  • Which line of yaml contained the actual resource
  • Which yaml file was used

In general, applying a resource using the basic Puppet language gives you much clearer failure messages.

Also, the Puppet language is more versatile than yaml. With the introduction of the future parser, the language has become even stronger. So when you want to interpolate variables or calculate with them, you can do this much more easily in Puppet than in hiera and yaml. Let’s see an example of this with interpolation. Here is some Puppet code using interpolation on a resource attribute:

$port   = 8215
$action = 'reject'

firewall{'100 outbound Containment':
  chain  => 'OUTPUT',
  proto  => 'tcp',
  state  => 'NEW',
  dport  => $port,
  action => $action,
}

If we want to do the same interpolation in hiera, it looks like this:

firewall_rules:
  '100 outbound Containment':
    chain: OUTPUT
    proto: tcp
    state: NEW
    dport:  "%{hiera('port')}"
    action: "%{hiera('action')}"

Again, in these small examples the differences are not that big, but when your configuration grows, these %{hiera('...')} statements become tedious.

Our philosophy

In the next paragraphs, we explain the guidelines we use to decide whether to put a resource into Puppet code or into hiera data.

Static resource

Does the resource belong to the static part of your infrastructure? If so, put your code into Puppet. By the static part, we mean that it is the same no matter what the OS, hardware architecture, or environment is.

Time-sensitive

Is your resource sensitive to time? Then consider putting it into hiera data. Examples of time-sensitive resources are versions of packages, or the set of patches you want to apply to Oracle RDBMS or WebLogic. We prefer to put this data into hiera, because that makes it easy to upgrade. Here is an example of a hiera yaml file containing all the WebLogic patches we need to have applied:

opatch_instances:
   '17584181':
      ensure:                   present
      oracle_product_home_dir:  "/opt/oracle/middleware11g/Oracle_SOA1"
      patch_id:                 17584181
      patch_file:               p17584181_111170_Generic.zip
      remote_file:              false

When a new patch is needed, we add it to our hiera data. We might first want to test it on one node. In that case, we put it in the hiera data for that one node.
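
A minimal sketch of such a node-level file is shown below. The file path and host name are hypothetical; the patch entry simply repeats the one from the example above. Note that a plain hiera() call returns the first match it finds in the hierarchy, so this hash replaces, rather than extends, any opatch_instances defined at a less specific level. Using hiera_hash() for the lookup is one way to get a merged result instead.

```yaml
# nodes/wls-test01.example.com.yaml (hypothetical node-level file)
---
opatch_instances:
  '17584181':
    ensure:                  present
    oracle_product_home_dir: "/opt/oracle/middleware11g/Oracle_SOA1"
    patch_id:                17584181
    patch_file:              p17584181_111170_Generic.zip
    remote_file:             false
```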

Although the case is not as clear-cut as for patches, you can also choose to put major versions in your hiera data. Here is an example using connect.

include 'features/oracle11EE'  # Uncomment this line when you want to use Oracle 11.
#include 'features/oracle12EE' # Uncomment this line when you want to use Oracle 12.

In this sample, we include the settings for an Oracle 11 Enterprise Edition database. When you are ready to upgrade to Oracle 12, you comment out the Oracle 11 line, uncomment the Oracle 12 line, and you are done. To give you a feel for the actual data, here is one of the included files.

with profile::ora::software:: do
  version     =  '11.2.0.4'
  file_name   = 'p13390677_112040_Linux-x86-64'
  type        = 'EE'
end

Because these kinds of changes are less frequent, putting this data into your Puppet code is also fine.

Environment specific resources

Is your resource environment-specific? Then putting it into your hiera data is probably a good idea. Examples of environment-specific data are:

  • The LDAP server in use. You probably have a different LDAP server in production than in test.
  • The size of your database. You will probably like a smaller database on your development systems than on your production environment.
  • You probably have different IP gateways based on the location and subnet you are in.

All of these are excellent examples of settings to look up in your hiera data.
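
To make this concrete, here is a small sketch of how two of these settings could be spread over environment files. The file paths, key names, and values are all made up for illustration; use whatever keys your own modules look up.

```yaml
# environments/production.yaml (hypothetical path; keys and values are illustrative)
---
ldap_server:   ldap.prod.example.com
database_size: 500G
# environments/development.yaml (hypothetical path)
---
ldap_server:   ldap.dev.example.com
database_size: 10G
```

The Puppet code stays the same in every environment and only asks for the key, for example with hiera('ldap_server').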

Implementation settings

Implementation settings are very similar to environment-specific settings. They are also candidates to put into hiera.

An example of implementation settings would be the disk groups used for Oracle ASM. You could have redundant disks in your disk groups, but you could also have configurations that don’t need these high-availability options. Here is an example in yaml of an HA setup for disk groups.

ora_rac::settings::asm_disk_groups:
  'RECODG@+ASM1':
    ensure:          present
    redundancy_type: normal
    disks:
      FAILGROUP1:
        - 
          diskname:     RECODG_001
          path:         ORCL:RECODG_001
      FAILGROUP2:
        -
          diskname:     RECODG_002
          path:         ORCL:RECODG_002

Conclusion

As in most real-life situations, there are no rules set in stone. There are some guidelines though. We use this flow chart to decide whether we want the resource in hiera or just put it into Puppet.

Flow Chart Puppet or Hiera

Check out our reference implementation for more examples on how to best choose between putting your code into Puppet or hiera.
