When using modules, it’s important to separate the code from the input data. A module written for a single target node may work fine with explicit data within the code; however, it won’t be usable on other systems without changes to the code.
If the data resides within the code, you’ll find yourself constantly going back to hack if/else conditions into the code for each necessary difference. I’m sure you’ve done this before, or may even have to do this now to maintain scripts you use today. This chapter will introduce a better way.
Moving the data (values) out of the code (manifest) creates reusable blocks of code that can implement configurable, data-driven policy.
Hiera is a key/value lookup tool for configuration data. Puppet uses Hiera to dynamically look up configuration data for Puppet manifests.
Hiera allows you to provide node-specific data to a Puppet module to create a customized policy for the node. Hiera utilizes a configurable hierarchy of information that allows you to tune Hiera appropriately for how information is structured within your organization.
For example, at a small company, you may organize your data in this way:
A much larger organization might have a hierarchy such as the following:
The multilevel hierarchy can be used to merge common data with node and environment-specific overrides, making it easy to utilize the same shared code throughout a diverse organization.
Hiera allows data to be provided by pluggable backends. Each data source in the hierarchy can select which backend to use. This allows you to supply data to Puppet with any file format or code you desire.
In this section, we’ll go over the built-in backends and what the file formats look like.
There are three data backends built into Hiera:
- yaml_data
- json_data
- hocon_data

The built-in Hiera backends support five data types:

- String
- Number
- Boolean (true/false)
- Array
- Hash

Let’s review how to utilize these data types in each backend.
The easiest and most common way to provide data to Hiera is with the YAML file format.
Files in YAML format always start with three dashes by themselves on the first line. The YAML format utilizes indentation to indicate the relationships between data. YAML should always be written using spaces for indentation (do not use tabs).
Here are some examples of strings, booleans, arrays, and hashes in YAML:

    # string
    agent_running: 'running'

    # boolean
    agent_atboot: true

    # array
    puppet_components:
      - facter
      - puppet

    # a hash of values
    puppet:
      ensure: 'present'
      version: '4.10.9'

    # A variable lookup (quoted, because a plain YAML value cannot start with %)
    hostname: "%{facts.hostname}"
As this data is all about managing the Puppet agent, why don’t we organize this within a single hash? That could look as simple as this:
    puppet:
      ensure: 'present'
      version: '4.10.9'
      agent:
        running: 'running'
        atboot: true
      components:
        - 'facter'
        - 'puppet'
As you can see, YAML provides a human-friendly, readable way to provide data without too much syntax. You can find out more about YAML at the Yaml Cookbook for Ruby site.
running, facter, and puppet in the preceding example would be correctly interpreted as strings without the quotes. However, the rules for when to quote strings in YAML are many and often subtle. It’s better to be safe than sorry.

As is common with almost every use of JSON, the root of each data source must be a single hash. Each key within the hash names a piece of configuration data. Each value within the hash can be any valid JSON data type.
Our YAML example rewritten in JSON format would look like the following (this example shows values of a string, a boolean, and an array of strings):
    {
      "puppet": {
        "ensure": "present",
        "version": "4.10.9",
        "agent": {
          "running": "running",
          "atboot": true
        },
        "components": [
          "facter",
          "puppet"
        ]
      }
    }
JSON is a very strict format implemented for programmatic input and output. As such, it is friendly to parsers and less so to humans.
You can find complete details of the JSON data format at the Introducing JSON site.
Human-Optimized Config Object Notation (HOCON) keeps the semantics of JSON, while attempting to make it more convenient as a human-editable file format.
In particular, HOCON allows the following things that JSON format does not (verbatim from the HOCON specification):
Our JSON example would work perfectly if supplied to the HOCON backend. The following snippet shows HOCON features illegal in the JSON backend:
    {
      "puppet": {  # parameter values supplied to the puppet class
        ensure: present,  # simple strings unquoted as in YAML
        "version": ${PUPPET_VERSION},  # environment variable
You can find complete details of the HOCON file format at the HOCON informal specification site.
All Hiera data can be interpolated, enabling use of Puppet variables or functions to supply data within a Hiera value. Interpolation is performed on any value prefixed by % and surrounded by curly braces {}.
To return the value of a Puppet variable, place the variable name within the braces—for example, hostname: "%{facts.hostname}".
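A small set of Hiera interpolation functions (lookup, alias, literal, and scope) can be invoked within the interpolation braces as well, for example server: "%{lookup('ntp::server')}". These are not general Puppet functions; only the interpolation functions are available here.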
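To make the interpolation forms concrete, here is a minimal sketch of a YAML data file that uses them. Every key name in it (motd::message, ntp::server, and so on) is hypothetical and chosen only for illustration; the lookup, alias, and literal functions are Hiera’s own interpolation functions.

```yaml
---
# Hypothetical keys, for illustration only.

# A Puppet variable (here a fact) embedded in a larger string
motd::message: "Welcome to %{facts.hostname}"

# The lookup function reuses the value of another Hiera key as a string
ntp::server: "time.example.com"
chrony::server: "%{lookup('ntp::server')}"

# The alias function reuses another key and preserves its data type;
# it must be the only content of the value
profile::admins:
  - 'jill'
  - 'jack'
sudo::admins: "%{alias('profile::admins')}"

# The literal function emits a percent sign without interpolation,
# so this value becomes the text %{SERVER_NAME}
apache::server_name: "%{literal('%')}{SERVER_NAME}"
```

Note that every interpolated value is quoted. A plain (unquoted) YAML value cannot begin with a percent sign, so quoting is required here rather than merely cautious.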
Puppet looks for a Hiera configuration file at the location specified by the hiera_config configuration variable. By default, this is ${confdir}/hiera.yaml, or /etc/puppetlabs/puppet/hiera.yaml in Puppet 4.
Puppet 4.9 and above use the Hiera version 5 format, although these versions will still parse the older version 3 files successfully. All settings other than the version are optional and fall back to default values. Let’s review these settings now.
The defaults hash can define a default backend, data path, and options for any level in the hierarchy that does not supply them. This can greatly reduce repetition within the file.
    defaults:
      datadir: data         # relative directory path
      data_hash: yaml_data  # expect a hash of results from the YAML parser
The hierarchy key defines an array containing the data sources in priority order to query for values. Each entry of the array defines a data source, format, and query options.
One of the most powerful features of Hiera is the interpolation of a node’s data, such as the hostname or operating system of the node, to select the source file location. In a larger enterprise, the data lookup hierarchy could be quite complex; however, I recommend the following for a good starting point:
You would implement this hierarchy using the following configuration syntax (as you can see, we are interpolating data provided by Facter to choose which files will be read):
    hierarchy:
      - name: "Node specific values"
        path: "fqdn/%{facts.fqdn}.yaml"

      - name: "OS-specific values"
        path: "os/%{facts.os.family}.yaml"

      - name: "common"
        path: "common.yaml"
All paths are relative to the datadir of the hierarchy entry, or the value provided in the defaults hash.
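As a sketch of how the defaults hash and per-level settings interact, the following configuration lets two levels inherit the defaults while one level overrides both the data directory and the backend. The inventory directory name and the JSON level are assumptions made up for this illustration.

```yaml
---
version: 5

defaults:
  datadir: data          # used by any level that does not set its own
  data_hash: yaml_data   # used by any level that does not name a backend

hierarchy:
  # Inherits datadir and data_hash from defaults;
  # reads data/fqdn/<fqdn>.yaml
  - name: "Node specific values"
    path: "fqdn/%{facts.fqdn}.yaml"

  # Overrides both defaults; reads inventory/os/<family>.json
  - name: "OS-specific values (JSON)"
    datadir: inventory        # hypothetical directory name
    data_hash: json_data
    path: "os/%{facts.os.family}.json"

  # Inherits both defaults again; reads data/common.yaml
  - name: "common"
    path: "common.yaml"
```

Keeping the common settings in defaults means a level only needs to state what makes it different.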
The path must contain the file extension. This differs from previous versions of the Hiera configuration format; even the version 4 format used up through Puppet 4.8 assumed the file extension. The new format provides more flexibility.

Unlike previous versions of Hiera, data can be sourced from multiple files in a single hierarchy level. These are the hash keys that can be used to identify data locations:

- path
- paths
- glob
- globs
- mapped_paths

These options are exclusive; only one can be used in each hierarchy entry. Each of these values can contain interpolation, as shown in the previous example. Here are examples demonstrating each:

    hierarchy:
      # Read files specific to the host from multiple paths
      - name: "Node specific values"
        paths: ["infra/fqdn/%{facts.fqdn}.yaml", "product/fqdn/%{facts.fqdn}.yaml"]

      # Read every YAML file in the OS directory
      - name: "OS-specific values"
        glob: "os/%{facts.os.family}/*.yaml"

      # Recursively read every YAML file in the team1 and team2 directories
      - name: "Service value tree"
        globs: ["team1/%{facts.team1}/**/*.yaml", "team2/%{facts.team2}/**/*.yaml"]

      # Read values from an array of services; the first element names the
      # variable holding the array, so it is not wrapped in %{}
      - name: "multiple services"
        mapped_paths: [facts.services_array, name, "service/%{name}.yaml"]
Details of Ruby glob patterns can be found in the documentation for Ruby’s Dir.glob method. Documentation for the mapped_paths key can be found in the PUP-7204 JIRA issue that created it.
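Because mapped_paths is the least obvious of these keys, here is a sketch of how it expands, as I understand its semantics. It assumes a hypothetical custom fact named services_array whose value on this node is ['web', 'db']; the second entry shows the explicit list of paths that the first entry should be equivalent to on that node.

```yaml
# Assumption: the custom fact services_array is ['web', 'db'] on this node.
hierarchy:
  # Each element of the array is bound to the temporary variable "name",
  # which is then interpolated into the path template
  - name: "multiple services"
    mapped_paths: [facts.services_array, name, "service/%{name}.yaml"]

  # On that node, the entry above should read the same files as this one
  - name: "multiple services (expanded)"
    paths:
      - "service/web.yaml"
      - "service/db.yaml"
```

The advantage of the mapped form is that the list of files follows the node’s data, so adding a service to the fact requires no change to the Hiera configuration.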
Each entry in the hierarchy array can have a hash of configuration options assigned in an optional options key. The options are specific to the backend; refer to the backend’s documentation for more details.
Following is an example of an options hash with the private and public keys used by the EYAML custom provider:
      - name: "encrypted secrets"
        lookup_key: eyaml_lookup_key
        path: "secrets/%{facts.domain}.eyaml"
        options:
          pkcs7_public_key: /etc/puppetlabs/puppet/eyaml/public_key.pkcs7
          pkcs7_private_key: /etc/puppetlabs/puppet/eyaml/private_key.pkcs7
In previous versions of Puppet, you could only set merge strategy globally using a single :merge_behavior configuration key. This was not flexible enough for even basic use cases. Hiera provides the ability to change the merge strategy on a per-lookup basis.
For single values, Hiera will proceed through the hierarchy until it finds a value (at which point, it will stop). For arrays and hashes, Hiera will merge data from each level of the hierarchy, as configured by the merge strategy selected for that lookup key. The following merge strategies are supported:
- first (default), formerly known as priority
- hash, formerly known as native
- deep, formerly known as deeper
- unique, formerly known as array

Puppet allows setting the merge strategy on a per-lookup basis, using the following two methods:

lookup_options hash of the data source
    Keys can be listed in the lookup_options hash in the data source. The values assigned to the key are lookup options, including the merge strategy.

lookup() function
    Options provided in the lookup() function call will override the lookup_options provided in the data source.

We’ll provide examples of using both methods in “Looking Up Hiera Data”.
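As a rough sketch of the first method, a lookup_options hash in a data file might look like the following. The key names ntp::servers and sudo::rules are hypothetical; the short form takes just a strategy name, while the long form takes a hash that can carry extra options such as knockout_prefix for deep merges.

```yaml
---
# Hypothetical keys, for illustration only.
lookup_options:
  # Short form: collect array values from every hierarchy level,
  # dropping duplicates
  ntp::servers:
    merge: unique

  # Long form: recursively merge hashes from every level. A value that
  # starts with the knockout prefix in a higher-priority level removes
  # the matching entry from the merged result.
  sudo::rules:
    merge:
      strategy: deep
      knockout_prefix: '--'
```

Placing these options in the data keeps the merge behavior next to the values it governs, so callers do not need to remember which strategy each key expects.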
Following is a complete example of a Hiera configuration file. This example is what we will use for the rest of testing code within this book. It enables YAML data input from /etc/puppetlabs/code/hieradata, which allows us to share the same Hiera data across all environments.
The following hierarchy prioritizes host-specific values, followed by operating system values, and defaulting to the lowest priority values common to every host:
    ---
    version: 5

    defaults:  # for any hierarchy level without these keys
      datadir: /etc/puppetlabs/code/hieradata  # global data directory shared by all environments
      data_hash: yaml_data

    hierarchy:
      - name: "Hostname"
        path: "hostname/%{trusted.hostname}.yaml"

      - name: "OS-specific values"
        path: "os/%{facts.os.family}.yaml"

      - name: "common"
        path: "common.yaml"

Let’s go ahead and add this file now to your Puppet configuration directory:

    [vagrant@client ~]$ cp /vagrant/etc-puppet/hiera.yaml /etc/puppetlabs/puppet/
There are several ways to validate whether your data is defined correctly in Hiera. For the following tests, create a Hiera data file containing values to control the Puppet service (this will be much like the Puppet manifest we created in Part I):
    $ mkdir /etc/puppetlabs/code/hieradata
    $ $EDITOR /etc/puppetlabs/code/hieradata/common.yaml
Within this file, place the following values:
    ---
    puppet::status: 'running'
    puppet::enabled: true
Now, let’s set up an override for this host. Create a file in the /etc/puppetlabs/code/hieradata/hostname/ directory with the name of the node:
    $ mkdir /etc/puppetlabs/code/hieradata/hostname
    $ facter hostname
    client
    $ $EDITOR /etc/puppetlabs/code/hieradata/hostname/client.yaml
In this file, place the following values:
    ---
    puppet::status: 'stopped'
    puppet::enabled: false
You can utilize the hiera command-line tool to test lookups of Hiera data. Unfortunately, this tool doesn’t retrieve values in the same manner as Puppet:
    [vagrant@client ~]$ hiera puppet::enabled
    true
    [vagrant@client ~]$ hiera puppet::status
    running
Wait, weren’t these values changed for the client host? Yes, they were. But the hiera command-line tool doesn’t have facts and other configuration data available from Puppet. As a result, it didn’t properly interpret the hierarchy that uses filenames derived from facts. Thus, it has returned the only values it knows to find: the default values from the common.yaml file.
For this reason, it is significantly easier and more accurate to test Hiera values by using the lookup() function to query Hiera data, as follows:
    [vagrant@client ~]$ puppet apply -e "notice(lookup('puppet::enabled'))"
    Notice: Scope(Class[main]): false
    Notice: Compiled catalog for client.example.com in environment production
    Notice: Applied catalog in 0.01 seconds
Better yet, it is possible to test Puppet lookups directly from the command line without evaluating code:
    [vagrant@client code]$ puppet lookup puppet::status
    --- stopped
    ...

If your results differ, keep in mind that a non-root user has different $confdir and $codedir directories. Rerun with sudo and compare.

Let’s modify one of our manifests to utilize Hiera data. First, create a manifest that contains variables for the configuration of the Puppet agent service. Name it something like hierasample.pp:

    # Always set a default value when performing a Hiera lookup
    $status  = lookup({ name => 'puppet::status',  default_value => 'running' })
    $enabled = lookup({ name => 'puppet::enabled', default_value => true })

    # Now the same code can be used regardless of the value
    service { 'puppet':
      ensure => $status,
      enable => $enabled,
    }
Execute this manifest utilizing the Hiera data we created:
    [vagrant@client ~]$ sudo puppet apply /vagrant/manifests/hierasample.pp
    Notice: Compiled catalog for client.example.com in environment production
    Notice: /Stage/Main/Service[puppet]/ensure: ensure changed 'running' to 'stopped'
    Notice: /Stage/Main/Service[puppet]/enable: enable changed 'true' to 'false'
    Notice: Applied catalog in 0.07 seconds
This has stopped the Puppet service, and prevented it from starting at boot, due to the overrides we added for this specific hostname.
Now let’s remove the host-specific override file hostname/client.yaml, and reapply the manifest:

    [vagrant@client ~]$ rm /etc/puppetlabs/code/hieradata/hostname/client.yaml
    [vagrant@client ~]$ sudo puppet apply hierasample.pp
    Notice: Compiled catalog for client.example.com in environment production
    Notice: /Stage/Main/Service[puppet]/ensure: ensure changed 'stopped' to 'running'
    Notice: /Stage/Main/Service[puppet]/enable: enable changed 'false' to 'true'
    Notice: Applied catalog in 0.07 seconds
Now that the host-specific override has been removed, the default values we placed in common.yaml are applied.
For a more complex example, let’s test how data in hashes can be merged together. Using the example Hiera configuration we created in the previous section, define some users in the global common.yaml file, like so:
    # common.yaml
    users:
      jill:
        uid: 1000
        home: '/home/jill'
      jack:
        uid: 1001
        home: '/home/jack'
Let’s query this data source and confirm the global values. The following are the results for any system without any higher-priority overrides:
    [vagrant@client ~]$ puppet lookup users
    ---
    jill:
      uid: 1000
      home: "/home/jill"
    jack:
      uid: 1001
      home: "/home/jack"

On the client system, we actually want home directories in /homes/ for some strange reason. In addition, we have a special local user, Jane. As we only want to change the users’ home directories, we might create a higher-priority hostname/client.yaml file with just the differences:

    # hostname/client.yaml
    users:
      jill:
        home: '/homes/jill'
      jack:
        home: '/homes/jack'
      jane:
        uid: 999
        home: '/homes/jane'
So let’s test this out now with a command-line lookup() query. We’ll compare the results for a default match with a match for the client machine, as shown here:
    [vagrant@client ~]$ puppet lookup --node default users
    ---
    jill:
      uid: 1000
      home: "/home/jill"
    jack:
      uid: 1001
      home: "/home/jack"

    [vagrant@client ~]$ puppet lookup users
    ---
    jill:
      home: "/homes/jill"
    jack:
      home: "/homes/jack"
    jane:
      uid: 999
      home: "/homes/jane"

What happened to the users’ UIDs when run on the local client node? When searching for users, it found the jill and jack user keys in the hostname/client.yaml file. It accepted the value associated with that key, and ignored the unique sub-keys in the lower-priority global file. To get the same results with first or hash merging, you’d have to repeat all keys in the higher-priority file.
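To illustrate that last point, here is a sketch of what hostname/client.yaml would have to contain for the client node to receive complete user entries without a deep merge. It simply restates the uid values from common.yaml alongside the overridden home directories.

```yaml
# hostname/client.yaml
# Without a deep merge, the values in this file are used as-is and the
# sub-keys in common.yaml are ignored, so every sub-key is restated here.
users:
  jill:
    uid: 1000
    home: '/homes/jill'
  jack:
    uid: 1001
    home: '/homes/jack'
  jane:
    uid: 999
    home: '/homes/jane'
```

This works, but every change to a UID in common.yaml would then have to be repeated in each override file, which is exactly the duplication that merging is meant to avoid.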
The --node option causes lookup() to request the facts from PuppetDB, which we haven’t discussed yet. Without PuppetDB, the query for puppet lookup --node client.example.com won’t return the same results, as the Hiera hierarchy depends on fact values. Leave off the --node option to use the local node’s facts in the hierarchy.
Let’s try a query that performs a recursive merge through both hashes and finds unique keys at each level:

    [vagrant@client ~]$ puppet lookup users --merge deep
    ---
    jill:
      uid: 1000
      home: "/homes/jill"
    jack:
      uid: 1001
      home: "/homes/jack"
    jane:
      uid: 999
      home: "/homes/jane"
As you can see, this result merged hash keys from all levels in the hierarchy. When keys matched ("home" for instance), it chose the higher-priority value.
As we know that users should be merged from all levels in the hierarchy, let’s set the merge strategy using the lookup_options hash in common.yaml, as shown here:
    # common.yaml
    lookup_options:
      users:
        merge: deep
By placing this configuration in the data, the deep merge strategy will be used by default for lookups of the users key:
    [vagrant@client ~]$ puppet lookup users
    ---
    jill:
      uid: 1000
      home: "/homes/jill"
    jack:
      uid: 1001
      home: "/homes/jack"
    jane:
      uid: 999
      home: "/homes/jane"
In this chapter, you created a global data repository that can provide data to customize the catalog of a node. You have configured this to use YAML data files from the global directory /etc/puppetlabs/code/hieradata, independent of the node’s environment.
This configuration is easy to maintain, and works well for nodes using puppet apply. It provides the source for global data parameters common to all environments.
In “Using Environment Data”, you will learn how to create environment-specific Hiera data hierarchies.