Chapter 7. Expressing Relationships

This chapter focuses on metaparameters that create and manage relationships between resources.

After parsing all of the Puppet manifests, Puppet builds a dependency graph used to structure the application of changes. Relationships between resources control the order in which resources are evaluated.

Resource relationships and ordering are perhaps the most confusing topics for newcomers to Puppet. Most people are familiar with linear processing, controlled by the order expressed within the file. Puppet provides metaparameters to define dependencies to be handled within and between manifests. This is significantly more powerful than rigid, linear ordering for the following reasons:

Linear ordering is easy to write once, but difficult to maintain over time.
Linear ordering prevents code from easily extending common or shared code.
Targeted relationships allow for multiple dependencies beyond strict ordering.
Many-to-one relationships are considerably more powerful, albeit harder to learn.
Loose ordering limits the impact of a failed resource to the resources that depend on it.

You will appreciate the power and flexibility of Puppet’s resource ordering when you build a module that extends (or “wraps”) a community-provided module. For now, simply keep in mind that Puppet will process the resources by evaluating the dependency graph created from the metaparameters introduced in this chapter.

Managing Dependencies

There are situations where avoiding the implicit dependencies of linear ordering can provide significant value.

In linear ordering, every statement is assumed to depend on the statement before it. If a statement fails, nothing following it in the script should be processed, because everything after it is assumed to depend on the statement that failed.

In a scenario where the manifest has six operations listed in the order A → B → C → D → E → F, if A fails, should B through F be skipped?

This would be undesirable if a resource early in the manifest was not essential to other resources in the manifest. By allowing you to explicitly declare dependencies, Puppet can enforce significantly more of the catalog. In the example just cited, it may be that only step F depends on A, so resources B through E can be processed.

During convergence, Puppet evaluates the dependencies of each resource. If a dependency of a resource fails, Puppet will apply neither that resource nor any resource that depends on it. This is generally desirable behavior.
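To make the A → F scenario concrete, here is a minimal sketch with hypothetical resource names and paths. The exec standing in for step A always fails, the file for step F requires it, and the file for step B does not. When applied, Puppet should report the failure of step A, skip step F because of its failed dependency, and still create the file for step B.

```puppet
# Hypothetical step A: /bin/false always exits nonzero, so this resource fails.
exec { 'step-a':
  command => '/bin/false',
}

# Hypothetical step B: no dependency on step A, so Puppet still applies it.
file { '/tmp/step-b.txt':
  ensure  => file,
  content => "step B was applied\n",
}

# Hypothetical step F: depends on step A, so Puppet skips it when step A fails.
file { '/tmp/step-f.txt':
  ensure  => file,
  content => "step F was applied\n",
  require => Exec['step-a'],
}
```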

Puppet’s explicit dependency metaparameters provide for complex and powerful dependency management. Let’s show you how to use them.

Referring to Resources

As shown throughout all previous examples, a resource is declared using the resource type in lowercase, with the definition enclosed in curly braces:

package { 'puppet-agent':
  ensure => present,
}

Once the resource has been given a unique title, it is possible to refer to that resource by name. This is called a resource reference. In this chapter, we’re going to refer to specific resources quite often, so let’s describe how to do it. To create a resource reference, capitalize the first letter of the resource type and enclose the title in square brackets. For example, when referring to the preceding package resource, you’d use Package['puppet-agent']. Here’s an example that creates a service to run the Puppet agent:

service { 'puppet':
  ensure  => running,
  enable  => true,
  require => Package['puppet-agent'],
}

Remember: create a resource with the lowercase type, and refer to an existing resource with a capitalized first letter.

Tip

An easy way to remember this is the common name versus proper name rule of English. A park is a resource type, but Golden Gate Park is a specific instance—that is, a proper noun, the first letter of which is always capitalized.
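A single metaparameter can also point at several resources by taking an array of resource references. In the following sketch, the configuration file resource is illustrative only (a real manifest would also manage its content). The service is started only after both the package and the file have been applied:

```puppet
package { 'puppet-agent':
  ensure => present,
}

# Illustrative only: a real manifest would also manage this file's content.
file { '/etc/puppetlabs/puppet/puppet.conf':
  ensure => file,
}

service { 'puppet':
  ensure  => running,
  enable  => true,
  # An array of resource references: both resources are applied first.
  require => [
    Package['puppet-agent'],
    File['/etc/puppetlabs/puppet/puppet.conf']
  ],
}
```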

Ordering Resources

In many situations, some resources must be applied before others. For example, you cannot start a service until after you install the package that contains the application. Here we will show you the before and require metaparameters you can use to ensure the package is installed before the service is started:

package { 'puppet-agent':
  ensure => present,
  before => Service['puppet'],
}

service { 'puppet':
  ensure  => running,
  enable  => true,
  require => Package['puppet-agent'],
}

The before and require metaparameters are redundant in this case. Either one would work by itself. Use the one that fits your manifest and is easiest to read. Belt-and-suspenders people like myself often use both when possible.

Tip

Ordering resources can be a trap. Many Puppet novices try to order every resource into a strict pattern, whether or not the resources are truly dependent. This makes an implementation fragile. Adopt a less-is-more approach, and list only the necessary dependencies.

Assuming Implicit Dependencies

Many Puppet types define autorequire dependencies on other Puppet resources. For example, a file or directory has an implicit dependency on its nearest parent directory that is managed in the catalog:

file { '/var/log':
  ensure => directory,
}

file { '/var/log/puppet':
  ensure => directory,
  # autorequire => File['/var/log'],   # implicit dependency added by Puppet
}

The autorequire line is not in the manifest; Puppet adds the dependency automatically. However, unlike an explicit dependency, this dependency is soft, meaning that it only exists if the other resource is found within the catalog. If you haven’t explicitly defined a File['/var/log'] or File['/var'] resource, then no dependency will be added.

File resources also autorequire the user resource of the user who owns the file or directory:

user { 'jill':
  ensure => present,
  shell  => '/bin/bash',
}

file { '/home/jill':
  ensure => directory,
  owner  => 'jill',
  # require => User['jill'],   # implicit dependency added by Puppet
}

In this situation, Puppet will always order the application of user jill before it tries to create the directory owned by her. You don’t need to set this dependency explicitly. If the user jill were not defined by a resource, then the file resource would not have a dependency and would blindly attempt to change the file ownership, with the assumption that the node already has an account for Jill.

Triggering Refresh Events

The before and require metaparameters ensure that dependencies are processed before the resources that require them. However, these parameters only control ordering; they do not notify the dependent resource when the resource it depends on changes.

The notify and subscribe metaparameters operate in a similar manner, but will also send a refresh event to the dependent resource if the dependency is changed. The dependent resource will take a resource-specific action. For example, a service would restart after the configuration file has been changed.

Let’s modify our previous policy to upgrade the Puppet package whenever a newer version is available:

package { 'puppet-agent':
  ensure  => latest,
  notify  => Service['puppet'],
}

service { 'puppet':
  ensure  => running,
  enable  => true,
  subscribe => Package['puppet-agent'],
}

If a newer version of Puppet is available, then the puppet-agent package will be upgraded. Any time this package is installed or upgraded, the puppet service will be restarted.

As noted previously, the notify and subscribe metaparameters are redundant. Either one would send the refresh event without the other. However, there is no harm in applying a belt-and-suspenders approach.
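The same refresh mechanism covers the common package, configuration file, and service pattern. In this sketch the file source (a module named example) is hypothetical. The package is installed before the configuration file is written, and a change to either the package or the file restarts the service:

```puppet
package { 'puppet-agent':
  ensure => latest,
  before => File['/etc/puppetlabs/puppet/puppet.conf'],
}

# Hypothetical content source; substitute your own module file or template.
file { '/etc/puppetlabs/puppet/puppet.conf':
  ensure => file,
  source => 'puppet:///modules/example/puppet.conf',
  notify => Service['puppet'],
}

service { 'puppet':
  ensure    => running,
  enable    => true,
  subscribe => Package['puppet-agent'],
}
```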

The refresh event has special meaning for exec resources with the attribute refreshonly set to true. The exec resource will not be evaluated unless it receives a refresh event. In the following example, we will update the facts.yaml file for MCollective only after Puppet has been upgraded:

package { 'puppet-agent':
  ensure => latest,
  notify => Exec['update-facts'],
}

exec { 'update-facts':
  path        => ['/bin','/usr/bin'],
  command     => 'facter --puppet --yaml > /etc/mcollective/facts.yaml',
  refreshonly => true,
}

Under normal conditions, this exec resource will not execute. However, if the puppet-agent package is installed or upgraded, the notify metaparameter will send a refresh event and the command will be run.

Warning

refreshonly has numerous limitations, the primary one being that the command runs only on notification. It doesn’t evaluate and converge state; it’s more like a tossed rock. Test the system state with creates, onlyif, or unless whenever possible.

An exec resource used in a chain of notifications can silently block the entire chain if a dependency fails for any reason. Felix has a great write-up on the issues that resource dependencies can create at “Friends Don’t Let Friends Use Refreshonly”.
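For comparison, here is a minimal sketch of an exec that tests system state instead of waiting for a notification. The command and directory are hypothetical. The creates attribute tells Puppet to run the command only when the named path does not exist, so the resource converges on every run rather than depending on a refresh event:

```puppet
exec { 'create-cache-dir':
  path    => ['/bin', '/usr/bin'],
  command => 'mkdir -p /var/cache/example',   # hypothetical directory
  # Skip the command whenever this path already exists.
  creates => '/var/cache/example',
}
```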

Chaining Resources with Arrows

You can also order related resources using chaining arrows. Place the required resource on the left, and a dependent resource on the right, linked together with ->. For example, to install Puppet before starting the service, you could declare it like so:

Package['puppet-agent'] -> Service['puppet']

You can use ~> to also send a refresh event, like notify does. For example, this will restart the Puppet service after the package is upgraded:

Package['puppet-agent'] ~> Service['puppet']

The chaining arrow syntax is harder to read than the metaparameters, and should be avoided when possible. In particular, right-to-left relationships are harder to read and explicitly against the Puppet Language Style Guide:

# Don't do this. Order it left -> right instead.
Service['puppet'] <~ Package['puppet-agent']
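An operand of a chaining arrow can refer to several resources at once, which expresses a many-to-one relationship in a single left-to-right statement. In this sketch the package and service names are hypothetical; the multi-title reference on the left matches both packages:

```puppet
# Hypothetical package and service names, for illustration only.
package { ['example-server', 'example-tools']:
  ensure => present,
}

service { 'example-server':
  ensure => running,
  enable => true,
}

# Both packages are applied before the service, and a change to either
# one sends a refresh event to the service.
Package['example-server', 'example-tools'] ~> Service['example-server']
```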

Processing with Collectors

A collector selects a group of resources from the catalog. You can use collectors to affect many resources at once.

Warning

We refer to collectors as “agents of unintended consequences.” Use collectors sparingly and carefully.

A collector is declared by the capitalized type followed by <|, an optional attribute comparison, and |>. Let’s examine some collectors:

User <||>                       # every user declared in the catalog
User <| groups == 'wheel' |>    # users in the wheel group
Package <||>                    # every package declared in the catalog
Package <| tag == 'yum' |>      # packages tagged with 'yum' tag
Service <||>                    # every service declared in the catalog
Service <| enable == true |>    # services set to start at boot time

Search expressions may be grouped with parentheses and combined, as shown here:

# Services running OR set to start at boot time
Service <| ( ensure == running ) or ( enable == true ) |>

# Services other than Puppet set to be running
Service <| ( ensure == running ) and ( title != 'puppet' ) |>

Warning

Note the comment “declared in the catalog.” The User collector in the second example would only match users who are declared (in a Puppet manifest) to be members of the wheel group, and not a user added to the group by, say, the useradd command. Collectors act on resources in the catalog; they do not inspect the system for undeclared resources.

One scenario where chaining arrows have proven very useful is processing many resources with collectors. By combining chaining arrows with collectors, you can set dependencies for every resource of one type.

For example, you could have the previous update-facts exec regenerate facts.yaml whenever a package is added or removed:

# Regenerate the facts whenever a package is added, upgraded, or removed
Package <||> ~> Exec['update-facts']

Likewise, you could ensure that the Puppet Labs Yum repository is installed before any packages tagged with puppet or mcollective:

Yumrepo['puppetlabs'] -> Package <| tag == 'puppet' |>
Yumrepo['puppetlabs'] -> Package <| tag == 'mcollective' |>

Best Practice

Limit use of collectors to clearly scoped and limited effect. A collector that matches all resources of a given type will affect a resource another person adds to the catalog, unaware that your collector will affect it. The best usage of collectors affects only the resources within the same manifest.
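One way to keep a collector tightly scoped is to match on a tag that only your own manifest applies. This is a sketch; the repository URL, package names, and the tag example_repo_pkgs are all hypothetical:

```puppet
# Hypothetical repository, for illustration only.
yumrepo { 'example-repo':
  ensure  => present,
  descr   => 'Hypothetical example repository',
  baseurl => 'https://repo.example.com/el/7/x86_64/',
  enabled => '1',
}

# Only packages carrying this manifest's own tag will match the collector.
package { ['example-server', 'example-tools']:
  ensure => present,
  tag    => 'example_repo_pkgs',
}

Yumrepo['example-repo'] -> Package <| tag == 'example_repo_pkgs' |>
```

A package that someone else declares elsewhere will not carry this tag, so the collector leaves it alone.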

You can find more details about collectors at “Language: Resource Collectors” on the Puppet docs site.

Understanding Puppet Ordering

During the catalog build, prior to applying any resources, Puppet creates a dependency graph using the Directed Acyclic Graph (DAG) model, which requires that no path through the graph loops back on itself. Each catalog resource is a vertex in the graph. The directed edges between vertices are created from the implicit dependencies of related resources, followed by dependencies declared using the metaparameters and chaining arrows discussed in this chapter. Puppet uses this loop-free directed graph to order the resource evaluation.

Resources without explicit ordering parameters are not guaranteed to be ordered in any specific way. In versions of Puppet greater than 2.6, unrelated resources were evaluated in an order that was apparently random, but was consistent from run to run. (In versions of Puppet prior to 2.6, it was not consistent from node to node or run to run.) The only way to ensure that one resource was evaluated before another was to define dependencies explicitly.

The ordering configuration option was introduced in Puppet 3.3 to allow control of ordering for unrelated resources. This configuration option accepts three values:

title-hash (default in Puppet 3): Orders unrelated resources randomly but consistently between runs.
manifest (default in Puppet 4): Orders unrelated resources by the order they are declared in the manifest.
random: Orders resources randomly and changes the order on each run. This is useful for identifying missing dependencies in a manifest.

Although resources in a manifest will generally be evaluated in the order defined, never count on manifest ordering. Always define all dependencies explicitly. This is especially important when you are extending another manifest or module, or when your manifest or module could be extended by someone else. This happens more than you might expect.

Best Practice

State all dependencies explicitly.
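As a minimal sketch with hypothetical file names: in the following manifest the file happens to be declared first, so manifest ordering would apply it first anyway. Puppet cannot infer from the command string that cp reads the file, however, so the explicit require is what keeps this relationship intact under any ordering setting, including random:

```puppet
# Hypothetical source file that the exec below reads.
file { '/tmp/greeting.txt':
  ensure  => file,
  content => "hello\n",
}

# The require states the dependency explicitly instead of relying on
# the file being declared above the exec.
exec { 'copy-greeting':
  path    => ['/bin', '/usr/bin'],
  command => 'cp /tmp/greeting.txt /tmp/greeting-copy.txt',
  creates => '/tmp/greeting-copy.txt',
  require => File['/tmp/greeting.txt'],
}
```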

You can flush out missing dependencies by testing your manifests with the random ordering option. Each time you run the following, the resources will be ordered differently. This almost always causes failures for any resources missing necessary dependencies:

$ puppet apply --ordering=random testmanifest.pp

Debugging Dependency Cycles

It is necessary to avoid loops in dependencies, where two things each depend on the other being created first. The first time you run into this you may realize that many of the expressed dependencies aren’t really essential. Many newcomers to Puppet try to order every resource into a strict pattern, no matter whether the resources are truly dependent or not. This will make dependency cycle problems show up far more often than necessary:

[vagrant@client ~]$ puppet apply /vagrant/manifests/depcycle.pp
Notice: Compiled catalog for client.example.com in environment production
Error: Failed to apply catalog: Found 1 dependency cycle:
(Cron[check-exists] => File[/tmp/file-exists.txt] => Cron[check-exists])
Try the '--graph' option and opening the resulting '.dot' file
    in OmniGraffle or GraphViz

Puppet will tell you about the dependency cycle, and the output for this simple example is obvious and easy to read. You could edit this manifest and fix this cycle within a minute.
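The depcycle.pp manifest itself is not shown. A hypothetical manifest along these lines would produce a cycle of the same shape, because each resource requires the other and so neither can be applied first:

```puppet
# Hypothetical reconstruction, not the original depcycle.pp.
cron { 'check-exists':
  command => '/usr/bin/test -f /tmp/file-exists.txt',
  user    => 'root',
  minute  => '*/5',
  require => File['/tmp/file-exists.txt'],
}

file { '/tmp/file-exists.txt':
  ensure  => file,
  # This require closes the loop; deleting either require breaks the cycle.
  require => Cron['check-exists'],
}
```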

As the amount of Puppet code in use grows, avoiding this problem can take a fairly significant effort. If different teams are all writing their own modules with their own dependencies, you may find a situation where only one group of nodes sees a dependency loop that doesn’t affect hundreds of other cluster configurations. It all depends on which resources are included in the node’s catalog.

When you’re dealing with a large catalog of interdependent modules, that analysis can be very difficult. Thankfully, Puppet will show you the DAG-model dependency graph of the generated Puppet catalog, so that you can evaluate it visually.

[vagrant@client ~]$ puppet apply /vagrant/manifests/depcycle.pp --graph
Notice: Compiled catalog for client.example.com in environment production
Error: Failed to apply catalog: Found 1 dependency cycle:
(Cron[check-exists] => File[/tmp/file-exists.txt] => Cron[check-exists])
Cycle graph written to
  /home/vagrant/.puppetlabs/opt/puppet/cache/state/graphs/cycles.dot.

After you have created the cycles.dot file you can load it up in a viewer. Here are some suggestions:

I’m a big fan of OmniGraffle and it works great for this. There’s a free 14-day trial.
You can download GraphViz to convert the files, and ZGRViewer to view them.
You can copy and paste the contents of cycles.dot into a web resource like WebGraphviz to see an online rendering.

The graphical representation can be very useful for people who think in a visual manner. When working with a big team, it can be helpful to print it out in very large form, hang it on the wall, and discuss potential solutions with pins and markers.

Avoiding the Root User Trap

There is a very common dependency cycle trap that nearly every Puppet user falls smack into at least once. High up in the dependency graph are always several basic systems management resources owned by the root user that must be installed before anything else is done—for example, configuring authentication and name service ordering.

Farther down the dependency graph are things like mounting NFS volumes, creating users, and so on. Seems reasonable, yeah? You could declare a small dependency set such as the following to create users and ensure their home directories are mounted:

file { '/home':
  ensure => directory,
  owner  => 'root',
}

mount { '/home':
  ensure  => 'mounted',
  fstype  => 'nfs',
  device  => 'netapp:/home',
  require => File['/home'],
}

# $users is a hash of user names and attributes (for example, supplied by Hiera)
$users.each |$user, $config| {
  user { $user:
    uid      => $config['uid'],
    password => $config['passwd'],
    home     => $config['home'],
    shell    => $config['shell'],
    require  => Mount['/home'],
  }
}

And if your users are jack, jill, tina, and mike, then this recipe will work perfectly. Then one day you add the root user to the list, so that you can centrally manage the root password:

[vagrant@client ~]$ puppet apply /vagrant/manifests/depcycle2.pp
Notice: Compiled catalog for client.example.com in environment production
Error: Failed to apply catalog: Found 1 dependency cycle:
(File[/home] => Mount[/home] => User[root] => File[/home])
Try the '--graph' option and opening the resulting '.dot' file
 in OmniGraffle or GraphViz

At this point you’re saying, “Wait, what? There have been no code changes. How could data break a manifest dependency?” This is where you learn an important lesson: the dependency graph only tracks resources in the catalog.

Yesterday the root user wasn’t in the catalog. The file resource defined a soft autorequire upon the user, but because that user resource wasn’t declared, the dependency was ignored. When a file resource’s owner attribute contains a user name not declared in the catalog, the file resource will blindly attempt to chown the file or directory to the named user.

However, when you add the root user to the Puppet catalog, the autorequire matches and a dependency is created. This creates a dependency loop that didn’t exist yesterday.

There are several simple ways to get out of this scenario:

Avoid creating a root user resource in the catalog. Create an alternate root account if necessary for root login.
Create the root user (and other users necessary for critical dependencies) early in a separate manifest.
Utilize numeric owner => 0 and group => 0 attribute values on the file resource to avoid creating the implicit root user dependency.

All of these solutions have drawbacks. You’ll have to figure out which one best suits your needs.
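As a sketch of the third option, the /home directory from the earlier example can be declared with numeric IDs. The file type only autorequires an owner or group given by name, so this removes the File['/home'] → User['root'] edge even when the root user is managed:

```puppet
# Numeric IDs do not name a user or group, so Puppet adds no autorequire
# on User['root'] or Group['root'] for this directory.
file { '/home':
  ensure => directory,
  owner  => 0,
  group  => 0,
}
```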

Utilizing Stages

Stage resources allow you to break up the Puppet run into stages to ensure that some things happen before other things. You create a stage and define the order of staging using the ordering metaparameters:

stage { 'initialize':
  before => Stage['main'],
}
stage { 'finalize':
  after => Stage['main'],
}

Then you could assign classes to stages using the stage metaparameter.

Note

We haven’t covered Puppet classes yet, but topically this is the best time to discuss catalog ordering. (This book has a dependency cycle!) Just know that Puppet classes are named blocks of Puppet code.
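A minimal sketch of that assignment follows. It assumes the initialize stage declared earlier is part of the same manifest, and the class name early_setup is hypothetical:

```puppet
# Hypothetical class whose resources should be applied before Stage['main'].
class early_setup {
  file { '/tmp/early-setup.txt':
    ensure  => file,
    content => "applied in the initialize stage\n",
  }
}

# Resource-style class declaration; the stage metaparameter assigns the class.
class { 'early_setup':
  stage => 'initialize',
}
```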

In theory, this sounds great. In practice, I’ve been forced to remove stage from every place I’ve tried to use it, due to the overwhelming limitations of this approach:

It becomes even harder to sort out dependency cycles, as the assignment of a class to a stage prevents resolving conflicts with dependencies in a different stage.
You cannot notify or subscribe to resources across stage boundaries.
You cannot assign classes to stages using Hiera data, which forces you back to inflexible old-style class resource declarations (covered in Part II).

Stages are effectively unusable except for small corner cases where a very small manifest with no dependencies needs to run first or last. Even then it’s usually easier to set this up with class ordering metaparameters.
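For comparison, here is a sketch of the class ordering approach with hypothetical class names. A chaining arrow between two classes orders the resources declared directly in each class and needs no stage:

```puppet
# Hypothetical classes, for illustration only.
class base_setup {
  file { '/tmp/base-setup.txt':
    ensure => file,
  }
}

class application {
  file { '/tmp/application.txt':
    ensure => file,
  }
}

include base_setup
include application

# Resources declared in base_setup are applied before those in application.
Class['base_setup'] -> Class['application']
```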

Reviewing Resource Relationships

Puppet evaluates resources in a manifest according to a dependency graph created from the following explicit dependency controls:

before metaparameter and the -> chaining arrow
notify metaparameter and the ~> chaining arrow
require metaparameter
subscribe metaparameter

Puppet 4 evaluates resources that have no relationships in the dependency graph in the order in which they are declared. You can change this ordering using the ordering configuration option. Never depend on the manifest ordering; instead, declare all relationships explicitly.

The random ordering option is useful for testing manifests for missing dependencies.
