Chapter 11. Separating Data from Code

When using modules, it’s important to separate the code from the input data. A module written for a single target node may work fine with explicit data within the code; however, it won’t be usable on other systems without changes to the code.

If the data resides within the code, you’ll find yourself constantly going back to hack if/else conditions into the code for each necessary difference. I’m sure you’ve done this before, or may even have to do this now to maintain scripts you use today. This chapter will introduce a better way.

Moving the data (values) out of the code (manifest) creates reusable blocks of code that can implement configurable, data-driven policy.

Introducing Hiera

Hiera is a key/value lookup tool for configuration data. Puppet uses Hiera to dynamically look up configuration data for Puppet manifests.

Hiera allows you to provide node-specific data to a Puppet module to create a customized policy for the node. Hiera utilizes a configurable hierarchy of information that allows you to tune Hiera appropriately for how information is structured within your organization.

For example, at a small company, you may organize your data in this way:

  1. Company-wide common data
  2. Operating system–specific changes
  3. Site-specific information

A much larger organization might have a hierarchy such as the following:

  1. Enterprise-level common data
  2. Company specifics
  3. Division overrides
  4. Production/staging/QA/development
  5. Region (US, EU, Asia)-specific changes
  6. Operating system–specific configurations
  7. Cluster-specific changes
  8. Application-specific details

The multilevel hierarchy can be used to merge common data with node and environment-specific overrides, making it easy to utilize the same shared code throughout a diverse organization.

Selecting a Data Source

Hiera allows data to be provided by pluggable backends. Each data source in the hierarchy can select which backend to use. This allows you to supply data to Puppet with any file format or code you desire.

In this section, we’ll go over the built-in backends and what the file formats look like.

Selecting a Hiera Backend

There are three data backends built into Hiera:

yaml_data
Parses YAML format files
json_data
Parses JSON format files
hocon_data
Parses HOCON format files

Tip
Custom backends can be added to Hiera, as documented in “Backend configuration” and “Using Custom Backends in Environments”.

Creating Hiera Data

The built-in Hiera backends support five data types:

  • String
  • Number
  • Boolean (true/false)
  • Array
  • Hash

Let’s review how to utilize these data types in each backend.

Hiera data in YAML

The easiest and most common way to provide data to Hiera is by utilizing the YAML file format.

Files in YAML format conventionally start with three dashes by themselves on the first line. The YAML format utilizes indentation to indicate the relationships between data. YAML should always be written using spaces for indentation (do not use tabs).

Here are some examples of a string, a boolean, an array, a hash, and a variable lookup in YAML:

# string
agent_running: 'running'

# boolean
agent_atboot: true

# array
puppet_components:
  - facter
  - puppet

# a hash of values
puppet:
  ensure: 'present'
  version: '4.10.9'

# a variable lookup (quoted, because a plain YAML value cannot begin with %)
hostname: "%{facts.hostname}"

As this data is all about managing the Puppet agent, why don’t we organize this within a single hash? That could look as simple as this:

puppet:
  ensure: 'present'
  version: '4.10.9'
  agent:
    running: 'running'
    atboot: true
  components:
    - 'facter'
    - 'puppet'

As you can see, YAML provides a human-friendly, readable way to provide data without too much syntax. You can find out more about YAML at the Yaml Cookbook for Ruby site.

Tip
It is not always necessary to quote strings in YAML. The words running, facter, and puppet in the preceding example would be correctly interpreted as strings without the quotes. However, the rules for when to quote strings in YAML are many and often subtle. It’s better to be safe than sorry.
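
The number type is not shown in the preceding examples, and it is the type most affected by quoting. The following is a minimal sketch; all of the key names are hypothetical and chosen only for illustration:

```yaml
---
# Hypothetical keys, for illustration only
agent_runinterval: 1800    # number (integer)
agent_splay_ratio: 0.5     # number (float)
agent_version: '4.10'      # string: unquoted, this would be read as the float 4.1
agent_noop: 'true'         # string, not a boolean, because it is quoted
```

Quoting is therefore not only a safety habit: it decides which of the five data types Puppet receives.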

Hiera data in JSON

As is common with almost every use of JSON, the root of each data source must be a single hash. Each key within the hash names a piece of configuration data. Each value within the hash can be any valid JSON data type.

Our YAML example rewritten in JSON format would look like the following (this example shows values of a string, a boolean, and an array of strings):

{
  "puppet": {
    "ensure": "present",
    "version": "4.10.9",
    "agent": {
      "running": "running",
      "atboot": true
    },
    "components": [
      "facter",
      "puppet"
    ]
  }
}

Warning
JSON requires that the final entry in any data structure does not end in a comma. This is different from Puppet, where a final comma presents no difficulty.

JSON is a very strict format implemented for programmatic input and output. As such, it is friendly to parsers and less so to humans.

You can find complete details of the JSON data format at the Introducing JSON site.

Hiera data in HOCON

Human-Optimized Config Object Notation (HOCON) keeps the semantics of JSON, while attempting to make it more convenient as a human-editable file format.

In particular, HOCON allows the following things that JSON format does not (verbatim from the HOCON specification):

  • Ability to refer to another part of the configuration (set a value to another value)
  • Import/include another configuration file into the current file
  • A mapping to a flat properties list such as Java’s system properties
  • Ability to get values from environment variables
  • Ability to write comments

Tip
Puppet has chosen HOCON format for the configuration format of many new products based on these features.

Our JSON example would work perfectly if supplied to the HOCON backend. The following snippet shows HOCON features that the JSON backend would reject:

{
  "puppet": {
    # parameter values supplied to the puppet class
    ensure: present, # simple strings unquoted as in YAML
    "version": ${PUPPET_VERSION}, # environment variable
  }
}

You can find complete details of the HOCON file format at the HOCON informal specification site.

Puppet variable and function lookup

All Hiera data can be interpolated, enabling use of Puppet variables or Hiera lookup functions to supply data within a Hiera value. Interpolation is performed on any expression prefixed by % and surrounded by curly braces {}.

To return the value of a Puppet variable, place the variable name within the braces, for example, hostname: "%{facts.hostname}".

Hiera’s own interpolation functions (lookup, alias, literal, and scope) can be invoked within the braces as well: server: "%{lookup('ntp::primary')}" (the key names here are only illustrative). Puppet functions such as split() are not available within Hiera data.
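
To make the interpolation syntax concrete, here is a minimal sketch of a data file that mixes fact interpolation with a lookup of another Hiera key. Every key name and the server name are hypothetical; only the %{...} syntax, the facts hash, and the lookup interpolation function come from Hiera itself:

```yaml
---
# Hypothetical keys, for illustration only
motd::hostname: "%{facts.hostname}"
motd::banner: "Welcome to %{facts.fqdn}, running %{facts.os.family}"

ntp::primary: 'time1.example.com'
# Reuse the value of another Hiera key through the lookup interpolation function
chrony::server: "%{lookup('ntp::primary')}"
```

Each interpolated value is quoted, because a plain YAML value cannot begin with the % character.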

Configuring Hiera

Puppet looks for a Hiera configuration file at the location specified by the hiera_config configuration variable. By default, this is ${confdir}/hiera.yaml, or /etc/puppetlabs/puppet/hiera.yaml in Puppet 4.

Puppet 4.9 and above use the Hiera version 5 format, although they will successfully parse the older version 3 files. All settings other than the version are optional and fall back to default values. Let’s review these settings now.

Version

The version setting must exist, and must be set to 5.

Defaults

The defaults hash can define a default backend, data path, and options for any level in the hierarchy that does not supply them. This can greatly reduce repetition within the file.

defaults:
  datadir: data          # relative directory path
  data_hash: yaml_data   # expect a hash of results from the YAML parser
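
As a sketch of how the defaults interact with individual levels, the following configuration has one level that supplies its own datadir and backend and one level that inherits both. The level names, the inventory directory, and the nodes.json file are assumptions made up for this illustration:

```yaml
---
version: 5
defaults:
  datadir: data
  data_hash: yaml_data

hierarchy:
  - name: "Inventory export"   # hypothetical level that overrides both defaults
    datadir: inventory
    data_hash: json_data
    path: "nodes.json"

  - name: "common"             # inherits datadir and data_hash from defaults
    path: "common.yaml"
```

Only the first level needs to spell out its backend; every other level can stay as short as a name and a path.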

Hierarchy

The hierarchy key defines an array containing the data sources in priority order to query for values. Each entry of the array defines a data source, format, and query options.

One of the most powerful features of Hiera is the interpolation of a node’s data, such as the hostname or operating system of the node, to select the source file location. In a larger enterprise, the data lookup hierarchy could be quite complex; however, I recommend the following for a good starting point:

  1. Put default values in a file named common.yaml.
  2. Put all operating system–specific information in a file named for the OS family as returned by Facter (e.g., RedHat.yaml, Debian.yaml, FreeBSD.yaml).
  3. Put information specific to a single node within a file named for the node’s fully qualified domain name, with a .yaml extension.

You would implement this hierarchy using the following configuration syntax (as you can see, we are interpolating data provided by Facter to choose which files will be read):

hierarchy:
  - name: "Node specific values"
    path: "fqdn/%{facts.fqdn}.yaml"

  - name: "OS-specific values"
    path: "os/%{facts.os.family}.yaml"

  - name: "common"
    path: "common.yaml"

All paths are relative to the datadir of the hierarchy entry, or the value provided in the defaults hash.

Warning
The path must contain the file extension, which differs from previous versions of the Hiera configuration format. Even the version 4 format used up through Puppet 4.8 assumes the file extension. The version 5 format provides more flexibility.

Paths

Unlike previous versions of Hiera, data can be sourced from multiple files in a single hierarchy level. These are the hash keys that can be used to identify data locations:

path
A single file name.
paths
An array of file names.
glob
A Ruby glob pattern that may match multiple files.
globs
An array of Ruby glob patterns, each of which may match multiple files.
mapped_paths
Iterates over an array or hash of values to assemble multiple paths.

These options are exclusive; only one can be used in each hierarchy entry. Each of these values can contain interpolation, as shown in the previous example. Here are examples demonstrating each:

hierarchy:
  # Read files specific to the host from multiple paths
  - name: "Node specific values"
    paths: ["infra/fqdn/%{facts.fqdn}.yaml", "product/fqdn/%{facts.fqdn}.yaml"]

  # Read every YAML file in the OS directory
  - name: "OS-specific values"
    glob: "os/%{facts.os.family}/*.yaml"

  # Recursively read every YAML file in each team's directory tree
  - name: "Service value tree"
    globs: ["team1/%{facts.team1}/**/*.yaml", "team2/%{facts.team2}/**/*.yaml"]

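  # Read one file per entry in an array of services. The first element is the
  # name of the variable holding the array, not an interpolation expression.
  - name: "multiple services"
    mapped_paths: [facts.services_array, name, "service/%{name}.yaml"]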

Details of Ruby glob patterns can be found in the documentation for Ruby’s Dir.glob method. The mapped_paths option is documented only in the PUP-7204 JIRA issue that created it.
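
Because mapped_paths is the least obvious of these options, a sketch of its expansion may help. This assumes a hypothetical custom fact named services_array whose value is the array ['web', 'db']; under that assumption, the two levels below should search the same files:

```yaml
hierarchy:
  # Three elements: the variable holding the collection, a temporary
  # variable name, and a path template that interpolates that name
  - name: "multiple services"
    mapped_paths: [facts.services_array, name, "service/%{name}.yaml"]

  # The equivalent explicit list when services_array is ['web', 'db']
  - name: "multiple services (expanded)"
    paths:
      - "service/web.yaml"
      - "service/db.yaml"
```

The mapped form keeps working when a node's list of services changes, whereas the explicit list would need to be edited by hand.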

Backend configuration

Each entry in the hierarchy array can have a hash of configuration options assigned in an optional options key. The options are specific to the backend; refer to the backend’s documentation for more details.

Tip
No options are required by the built-in YAML, JSON, or HOCON backends.

Following is an example of an options hash with the private and public keys used by the EYAML custom provider:

  - name: "encrypted secrets"
    lookup_key: eyaml_lookup_key
    path: "secrets/%{facts.domain}.eyaml"
    options:
      pkcs7_public_key: /etc/puppetlabs/puppet/eyaml/public_key.pkcs7
      pkcs7_private_key: /etc/puppetlabs/puppet/eyaml/private_key.pkcs7

Note
This provider is not included in Puppet 4 by default, but will be available in Puppet 5. You can find installation instructions at https://github.com/voxpupuli/hiera-eyaml.

Merge Strategy

In previous versions of Puppet, you could only set merge strategy globally using a single :merge_behavior configuration key. This was not flexible enough for even basic use cases. Hiera provides the ability to change the merge strategy on a per-lookup basis.

For single values, Hiera will proceed through the hierarchy until it finds a value (at which point, it will stop). For arrays and hashes, Hiera will merge data from each level of the hierarchy, as configured by the merge strategy selected for that lookup key. The following merge strategies are supported:

first (default)—formerly known as priority
Returns the first value found, with no merging. Keys found at the higher priority will return the value from that priority level without any recursion.
hash—formerly known as native
Merges the top-level keys of hashes from every level. When the same key appears at more than one level, the value from the highest-priority level is used whole, without merging in anything from the lower-priority value.
deep—formerly known as deeper
Recursively merge array and hash values. If a key exists at multiple levels, the lower-priority values that don’t conflict will be merged with the higher-priority values.
unique—formerly known as array
Flatten array and scalar values from all priority levels into a single list. Duplicate values will be dropped. Hashes will cause an error.
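
As an illustration of the unique strategy, here is a minimal sketch. The ntp_servers key, the server names, and the choice of files are hypothetical; the --- lines separate two data files shown in a single listing:

```yaml
---
# common.yaml (lowest priority)
ntp_servers:
  - 'time1.example.com'
  - 'time2.example.com'
---
# os/RedHat.yaml (higher priority)
ntp_servers:
  - 'time2.example.com'
  - 'rhel-time.example.com'
```

With the default first strategy, a RedHat node would receive only the two servers listed in os/RedHat.yaml. With unique, it would receive all three distinct servers, with time2.example.com appearing once.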

Puppet allows setting the merge strategy on a per-lookup basis, using the following two methods:

An entry in the lookup_options hash of the data source
A key in the lookup_options hash in the data source. The values assigned to the key are lookup options, including the merge strategy.
Options provided in the lookup() function
Any options used in a lookup() function call will override the lookup_options provided in the Hiera data.

We’ll provide examples of using both methods in “Looking Up Hiera Data”.

Complete Example

Following is a complete example of a Hiera configuration file. This example is what we will use for the rest of testing code within this book. It enables YAML data input from /etc/puppetlabs/code/hieradata, which allows us to share the same Hiera data across all environments.

Tip
When the environments are distinct only to test code, use a shared Hiera path for ease of data management.

The following hierarchy prioritizes host-specific values, then operating system values, and finally falls back to the lowest-priority values common to every host:

---
version: 5
defaults:        # for any hierarchy level without these keys
  datadir: /etc/puppetlabs/code/hieradata  # shared by all environments
  data_hash: yaml_data

hierarchy:
  - name: "Hostname"
    path: "hostname/%{trusted.hostname}.yaml"

  - name: "OS-specific values"
    path: "os/%{facts.os.family}.yaml"

  - name: "common"
    path: "common.yaml"

Let’s go ahead and add this file now to your Puppet configuration directory:

[vagrant@client ~]$ cp /vagrant/etc-puppet/hiera.yaml /etc/puppetlabs/puppet/

Looking Up Hiera Data

There are several ways to validate whether your data is defined correctly in Hiera. For the following tests, create a Hiera data file containing values to control the Puppet service (this will be much like the Puppet manifest we created in Part I):

$ mkdir /etc/puppetlabs/code/hieradata
$ $EDITOR /etc/puppetlabs/code/hieradata/common.yaml

Within this file, place the following values:

---
puppet::status: 'running'
puppet::enabled: true

Now, let’s set up an override for this host. Create a file in the /etc/puppetlabs/code/hieradata/hostname/ directory with the name of the node:

$ mkdir /etc/puppetlabs/code/hieradata/hostname
$ facter hostname
client
$ $EDITOR /etc/puppetlabs/code/hieradata/hostname/client.yaml

In this file, place the following values:

---
puppet::status: 'stopped'
puppet::enabled: false

Checking Hiera Values from the Command Line

You can utilize the hiera command-line tool to test lookups of Hiera data. Unfortunately, this tool doesn’t retrieve values in the same manner as Puppet:

[vagrant@client ~]$ hiera puppet::enabled
true
[vagrant@client ~]$ hiera puppet::status
running

Wait, weren’t these values changed for the client host? Yes, they were. But the hiera command-line tool doesn’t have facts and other configuration data available from Puppet. As a result, it didn’t properly interpret the hierarchy that uses filenames derived from facts. Thus, it has returned the only values it knows to find: the default values from the common.yaml file.

For this reason, it is significantly easier and more accurate to test Hiera values by using the lookup() function to query Hiera data, as follows:

[vagrant@client ~]$ puppet apply -e "notice(lookup('puppet::enabled'))"
Notice: Scope(Class[main]): false
Notice: Compiled catalog for client.example.com in environment production
Notice: Applied catalog in 0.01 seconds

Better yet, it is possible to test Puppet lookups directly from the command line without evaluating code:

[vagrant@client code]$ puppet lookup puppet::status
--- stopped
...

Note
If you don’t get the values back that you expect, it may be because you don’t have the personal configuration file installed that points at the system $confdir and $codedir directories. Rerun with sudo and compare.

Performing Hiera Lookups in a Manifest

Let’s modify one of our manifests to utilize Hiera data. First, create a manifest that contains variables for the configuration of the Puppet agent service. Name it something like hierasample.pp:

# Always set a default value when performing a Hiera lookup
$status = lookup({ name => 'puppet::status',  default_value => 'running' })
$enabled = lookup({ name => 'puppet::enabled', default_value => true })

# Now the same code can be used regardless of the value
service { 'puppet':
  ensure => $status,
  enable => $enabled,
}

Execute this manifest utilizing the Hiera data we created:

[vagrant@client ~]$ sudo puppet apply /vagrant/manifests/hierasample.pp
Notice: Compiled catalog for client.example.com in environment production
Notice: /Stage/Main/Service[puppet]/ensure: ensure changed 'running' to 'stopped'
Notice: /Stage/Main/Service[puppet]/enable: enable changed 'true' to 'false'
Notice: Applied catalog in 0.07 seconds

This has stopped the Puppet service, and prevented it from starting at boot, due to the overrides we added for this specific hostname.

Now let’s remove the host-specific override file, hostname/client.yaml, and reapply the manifest:

[vagrant@client ~]$ rm /etc/puppetlabs/code/hieradata/hostname/client.yaml
[vagrant@client ~]$ sudo puppet apply /vagrant/manifests/hierasample.pp
Notice: Compiled catalog for client.example.com in environment production
Notice: /Stage/Main/Service[puppet]/ensure: ensure changed 'stopped' to 'running'
Notice: /Stage/Main/Service[puppet]/enable: enable changed 'false' to 'true'
Notice: Applied catalog in 0.07 seconds

Now that the host-specific override has been removed, the default values we placed in common.yaml are applied.

Testing Merge Strategy

For a more complex example, let’s test how data in hashes can be merged together. Using the example Hiera configuration we created in the previous section, define some users in the global common.yaml file, like so:

# common.yaml
users:
  jill:
    uid: 1000
    home: '/home/jill'
  jack:
    uid: 1001
    home: '/home/jack'

Let’s query this data source and confirm the global values. The following are the results for any system without any higher-priority overrides:

[vagrant@client ~]$ puppet lookup users
---
jill:
  uid: 1000
  home: "/home/jill"
jack:
  uid: 1001
  home: "/home/jack"

On the client system, we actually want home directories in /homes/ for some strange reason. In addition, we have a special local user, Jane. As we only want to change the users’ home directories, we might create a higher-priority hostname/client.yaml file with just the differences:

# hostname/client.yaml
users:
  jill:
    home: '/homes/jill'
  jack:
    home: '/homes/jack'
  jane:
    uid: 999
    home: '/homes/jane'

So let’s test this out now with a command-line lookup() query. We’ll compare the results for a default match with a match for the client machine, as shown here:

[vagrant@client ~]$ puppet lookup --node default users
---
jill:
  uid: 1000
  home: "/home/jill"
jack:
  uid: 1001
  home: "/home/jack"

[vagrant@client ~]$ puppet lookup users
---
jill:
  home: "/homes/jill"
jack:
  home: "/homes/jack"
jane:
  uid: 999
  home: "/homes/jane"

What happened to the users’ UIDs when run on the local client node? With the default first strategy, the lookup found the users key in the hostname/client.yaml file, accepted the whole value associated with that key, and ignored the sub-keys found only in the lower-priority global file. To get complete user entries with first or hash merging, you’d have to repeat all keys in the higher-priority file.
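
For comparison, this is a sketch of what hostname/client.yaml would have to contain to return complete user entries without any merging. The values are simply those of the two files above combined by hand:

```yaml
# hostname/client.yaml, written for the default first strategy:
# every user and every attribute is restated, because nothing
# is inherited from common.yaml
users:
  jill:
    uid: 1000
    home: '/homes/jill'
  jack:
    uid: 1001
    home: '/homes/jack'
  jane:
    uid: 999
    home: '/homes/jane'
```

Each UID now lives in two files, so a change in common.yaml would have to be repeated in every host file that overrides users.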

Warning

The --node option causes lookup() to request the facts from PuppetDB, which we haven’t discussed yet. Without PuppetDB, the query for puppet lookup --node client.example.com won’t return the same results, as the Hiera hierarchy depends on fact values. Leave off the --node option to use the local node’s facts in the hierarchy.

Let’s try a query that performs a recursive merge through both hashes and finds unique keys at each level:

[vagrant@client ~]$ puppet lookup users --merge deep
---
jill:
  uid: 1000
  home: "/homes/jill"
jack:
  uid: 1001
  home: "/homes/jack"
jane:
  uid: 999
  home: "/homes/jane"

As you can see, this result merged hash keys from all levels in the hierarchy. When keys matched ("home" for instance), it chose the higher-priority value.

As we know that users should be merged from all levels in the hierarchy, let’s set the merge strategy using the lookup_options hash in common.yaml, as shown here:

# common.yaml
lookup_options:
  users:
    merge: deep

By placing this configuration in the data, the deep merge strategy will be used by default for lookups of the users key:

[vagrant@client ~]$ puppet lookup users
---
jill:
  uid: 1000
  home: "/homes/jill"
jack:
  uid: 1001
  home: "/homes/jack"
jane:
  uid: 999
  home: "/homes/jane"

Best Practice

Use the deep merge strategy for Don’t Repeat Yourself (DRY) data management.

Providing Global Data

In this chapter, you created a global data repository that can provide data to customize the catalog of a node. You have configured this to use YAML data files from the global directory /etc/puppetlabs/code/hieradata, independent of the node’s environment.

This configuration is easy to maintain, and works well for nodes using puppet apply. It provides the source for global data parameters common to all environments.

In “Using Environment Data”, you will learn how to create environment-specific Hiera data hierarchies.