Regulatory compliance is a major driver for security. In some organizations, compliance is the main driver for the security program, defining how changes are made, what reviews and testing are done and when, which vulnerabilities are fixed and which ones are not, and how developers, testers, operations, and security people work together.
Regulations such as PCI DSS, HIPAA and HITECH, SOX, GLBA, SEC regulations and MiFID, 23 NYCRR 500, FDA safety regulations for software validation, FERC and NERC, FERPA, FedRAMP, FFIEC and FISMA, and COBIT and HITRUST and ISO/IEC 27001 and the NIST and CIS standards that organizations follow to meet these regulations, all define requirements and rules, guidelines and constraints on system design and assurance, staff training and awareness, risk management and vulnerability management, change control and release management, auditing and data retention, network and system monitoring, and IT operations.
If you work in a regulated environment, you must understand how compliance impacts your security program and how to include compliance in development and operations. In other parts of this book we try to help you to think like an attacker, to focus on threats and vulnerabilities and forensics. In this chapter, we will help you to think like an auditor, and understand how to look at your system from an auditor’s point of view, to focus on risks and controls and evidence.
Now, before we get into what compliance means at a practical level, it’s important to draw attention to the difference between compliance and security for a moment.
Security is the process of putting in place physical, technical, and administrative controls to protect the confidentiality, integrity, and availability of information. It means taking steps to ensure that only the people who are authorized to access our systems and data can do so, that the data we store is accurate and remains that way, and that customers can use a system as and when they need to.
Compliance is a bit different—it’s about showing that we understand and are committed to security, and meeting requirements so that we’re allowed to keep operating. Security is the policies and processes that protect all of our information and systems. Compliance allows us to operate in certain fields or store certain types of information by showing that we can do so safely.
Compliance standards and regulations like PCI and HIPAA and GLBA, and government privacy laws such as California state privacy laws and German and Italian national privacy laws, Canada’s PIPEDA, and the EU’s new General Data Protection Regulation (GDPR), lay out legal requirements and specific obligations for protecting personal and private information, where this information can and cannot be kept, and what you have to do in the event that these obligations are violated.
Throughout this chapter, and in other chapters, we’ll refer to different categories of personal or private information. Some of these categories are important enough to have acronyms of their own: PHI and PII.
Personal (or Protected) Health Information: any information that relates to the physical or mental health of an individual, including the patient’s condition, provision of health care, or payment for health care.
Personally Identifiable Information: any information that can be used to identify an individual, including name, date of birth, biometric records, SSN, IP address, and credit card records.
The point of regulatory compliance is to achieve the following:
Make sure that you understand how to do things responsibly by defining a minimum set of requirements for security and privacy, generally requiring you to follow recognized standards or best practices for risk management, change management, software development, vulnerability management, identity and access management, data encryption, and IT operations.
Force you to prove that you are doing things responsibly by requiring you to establish and maintain top-down policies, track all changes, regularly scan and test for vulnerabilities, and undergo periodic assessments and audits to check that your controls are effective.
Punish your organization if you fail to meet these requirements through fines and other sanctions. These fines could run to tens or hundreds of thousands of dollars per month for banks that do not comply with PCI DSS, and to several million dollars for health care organizations in the event of a data breach under HIPAA.
In this chapter, we’ll look at compliance to understand how it impacts:
Design and requirements management (regulatory auditing and reporting requirements, and traceability).
Testing and reviews (assurance and assessments, validation, and verification).
Delivering software changes into production (change control and release management).
Oversight of developers, especially in production (separation of duties).
Protection of data (privacy, access control and encryption, and data retention).
Documentation (how to get by with the least amount of paperwork necessary).
We’ll examine different compliance frameworks, including PCI DSS, throughout this book, to show how they work and what they mean at a technical level. And we’ll explore ways to meet compliance requirements effectively and efficiently in Agile environments.
But we won’t be providing exhaustive coverage of any regulation. We aren’t offering legal advice. You need to take time to understand and confirm your compliance obligations, what systems and data and activities are in scope, and work with your compliance and risk and security officers, your auditors, and management to come up with an approach that works for you and for them.
Compliance may be enforced on your organization directly, or, if you are a service provider, by your customers. Some organizations operate under multiple regulations, with overlapping and varying requirements. For example, if you are a public company offering an online medical service, you could be subject to SOX, HIPAA, PCI DSS (if your customers can pay using credit/debit cards), and different data privacy regulations. You will need to come up with an approach that satisfies all of these requirements, both together and independently.
Some regulations give you a pretty clear idea of what you need to do. Others? Not so much. This is because there are two very different approaches that regulators can take when framing regulations:
A set of specific guidelines on what you must or can, and what you must not or cannot, do. Prescriptive, rules-based regulations lay out a set of specific controls and procedures, highlight risks, and tell you—and auditors—what you must do, when or how often, and what evidence you need to keep. These are regulations that can be mostly captured in checklists.
Examples of rules-based regulations include PCI DSS, FedRAMP, FISMA, and various safety regulations.
Outcome-based regulations, in contrast, describe security, risk management, or operational objectives, or legal obligations, that you must meet, but they don’t specify how you need to do this. There are usually a few detailed rules around regulatory reporting and assessments, but otherwise your organization is free to choose its own approach, provided that your controls are deemed to be “adequate,” “effective,” and “reasonable.”
In theory, this allows you to be more efficient and more innovative in meeting regulatory requirements. But this also means that compliance is less certain. Auditors have more leeway to judge if your programs and controls are sufficient or satisfactory, and penalties are assessed by how inadequate, ineffective, or unreasonable the auditors found them to be. You will need to defend your approach and clearly show how it supports the regulations.
This is why so many organizations fall back on heavyweight control frameworks like COBIT and ITIL. Following a standardized, widely accepted, and comprehensive governance model adds costs and delays, but reduces the risk that your program will be found inadequate by auditors or investigators, which should help minimize penalties or liability when something goes wrong.
Examples of outcome-based regulations are HIPAA, FDA QSR, SOX 404, and SEC Regulation SCI.
Let’s take a quick look at a couple of examples to show how fundamentally different these regulatory approaches are.
The Payment Card Industry Data Security Standard (PCI DSS) is a cross-industry standard of practice that many teams will end up having to deal with, as it applies to any system that deals with credit or debit card payments, either directly, or through a third-party service.
Rather than vague legal statements that you must demonstrate appropriate due care, PCI DSS lays out specific, mostly concrete requirements and obligations: you must do this, you must not do that, and this is how you must prove it. It sets reasonably clear expectations about what risks need to be managed and how, about what data needs to be protected and how, what testing or reviews you need to do, and when you need to do them.
PCI DSS is described in a number of guides, supplementary documents, notices, checklists, and FAQs; but the core of the regulation is relatively small. While it isn’t fun (regulations are never fun), it’s not that difficult to follow. There’s even a Quick Reference Guide, only about 40 pages long, that does a decent job of outlining what you need to do.
There are 12 sections in a PCI-compliant security program. While people can—and do—argue about the effectiveness or completeness of the specific controls, PCI DSS is a reasonably good model for a security program, whether or not you deal with credit card information. It anticipates many of the risks and security issues that you need to deal with in any online system that holds important data, and lays out a structured approach to manage them. Just replace “cardholder data” with whatever sensitive or private data you need to manage:
Use of firewalls and network segmentation. Ensure that configurations are periodically reviewed and that all configuration changes are tested. Map all connections and data flows.
Identifying all of the systems involved, and making sure that they are configured securely according to “industry-accepted definitions” (e.g., CIS), specifically making sure to change vendor-supplied default credentials.
Limit and restrict storage of cardholder data. Use one-way hashing, strong encryption, tokenization, masking, or truncation (see the sketch after this list). There are specific restrictions on what data can and cannot be stored.
Using strong encryption and security protocols when transmitting sensitive data over open networks.
Use of antivirus systems.
Developing and maintaining secure systems and applications. We’ll look at this section in more detail below.
Access control restricted based on business need-to-know.
Identifying and authenticating all access to system components, including when multifactor authentication is required.
Restricting and tracking physical access to data centers and sensitive areas, and to backup media.
Auditing and logging. Specific requirements for auditing of activities, access to restricted data, and retention of audit trails.
Scanning for wireless access points, internal and external network vulnerability scanning, penetration testing, intrusion detection/prevention, and detective change control.
Governance, risk assessment, and information security policies to prove that management and everyone else understands the requirements and how they are being met, in addition to insisting that operational procedures and policies for each of the previous requirements are “documented, in use, and known to all affected parties.”
This includes regular security and compliance awareness training for developers and operations.
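To make Requirement 3’s storage restrictions more concrete, here is a minimal Python sketch of truncation, masking, and one-way hashing of a card number. It is our illustration rather than anything prescribed by PCI DSS; the function names and the use of a keyed SHA-256 hash are our own assumptions, and in a real system the key would come from a managed secret store rather than being generated inline.

```python
import hashlib
import hmac
import os

def truncate_pan(pan: str) -> str:
    """Keep only the first 6 and last 4 digits (within PCI truncation limits)."""
    digits = pan.replace(" ", "")
    return digits[:6] + "*" * (len(digits) - 10) + digits[-4:]

def mask_pan_for_display(pan: str) -> str:
    """Show no more than the last 4 digits, e.g., on receipts or screens."""
    digits = pan.replace(" ", "")
    return "*" * (len(digits) - 4) + digits[-4:]

def hash_pan(pan: str, secret_key: bytes) -> str:
    """Keyed, one-way hash so the PAN itself never needs to be stored."""
    return hmac.new(secret_key, pan.encode(), hashlib.sha256).hexdigest()

if __name__ == "__main__":
    key = os.urandom(32)  # illustration only: use a managed secret in practice
    pan = "4111 1111 1111 1111"
    print(truncate_pan(pan))          # 411111******1111
    print(mask_pan_for_display(pan))  # ************1111
    print(hash_pan(pan, key))
```

The point is that once one of these transformations is applied, the full card number never needs to be stored, logged, or displayed.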
In order to be compliant, your organization must meet all of these requirements, and undergo regular assessments.
Requirement 6 is the section that applies most directly to software development. It addresses requirements, design, coding, testing, reviews, and implementation.
Identify vulnerabilities, determine priority based on risk, and deal with them. We looked at vulnerability management in more detail in Chapter 6, Agile Vulnerability Management.
Make sure that patches to third-party software are up to date. Critical patches must be installed within one month.
Develop software securely. This includes specific requirements such as ensuring that all test accounts and credentials are removed before applications are put into production (a simple automated check for this is sketched after this list), reviewing code changes (manually or using automated tools) for security vulnerabilities, and making sure that code review results are reviewed and approved by management before the code changes are released.
Follow change control processes for all changes, including clear separation of duties between production and development/test, evidence of security testing, impact analysis and backout planning, and documented change approval by authorized parties.
Train developers in secure coding at least annually, and provide them with secure coding guidelines.
For public-facing web applications, an application vulnerability assessment is required at least once per year, or before rolling out major changes. Alternatively, you can protect your system by implementing a runtime defense solution that blocks attacks (such as a Web Application Firewall).
Document your policies and procedures for developing and maintaining secure systems, to prove all of this.
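As one example of pushing a Requirement 6 check into automation, here is a minimal sketch of a pre-deployment gate that fails the pipeline if leftover test accounts are found. The database, table name, and the “test_”/“demo_”/“qa_” naming convention are illustrative assumptions, not part of the standard.

```python
"""Pre-deployment check: fail the pipeline if leftover test accounts exist."""
import sqlite3  # stand-in for your real database driver
import sys

TEST_ACCOUNT_PREFIXES = ("test_", "demo_", "qa_")  # assumed naming convention

def find_test_accounts(db_path: str) -> list:
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT username FROM users").fetchall()
    finally:
        conn.close()
    return [u for (u,) in rows if u.lower().startswith(TEST_ACCOUNT_PREFIXES)]

if __name__ == "__main__":
    leftovers = find_test_accounts("production.db")  # illustrative path
    if leftovers:
        print(f"FAIL: test accounts present in production: {leftovers}")
        sys.exit(1)  # non-zero exit fails the build/deploy step
    print("OK: no test accounts found")
```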
That’s “all there is” to PCI compliance from a development perspective, beyond correctly implementing controls over credit cardholder data in Requirement 3 and access restrictions to cardholder data in Requirement 7. The devil, of course, is in the details.
Contrast this with Reg SCI (Systems Compliance and Integrity), which is one of a number of regulations that stock exchanges and other financial services organizations in the US need to meet. It is concerned with ensuring the capacity, integrity, resiliency, availability, and security of financial systems.
Reg SCI is 743 pages of legalese, much of it describing the process followed to draft the regulation. The core of the regulation is captured in about 20 pages starting somewhere after page 700. It requires that organizations develop and maintain “written policies and procedures reasonably designed” to ensure that systems are developed and operated safely and that they meet all necessary legal requirements.
These policies and procedures must cover system design and development, testing, capacity planning and testing, continuity planning and testing, change management, network design and management, system operations, monitoring, incident response, physical security, and information security, as well as outsourcing any of these responsibilities. Reg SCI also lays out obligations for reporting to the SEC, and for annual audits.
Policies and procedures may be considered to be “reasonably designed” if they comply with “information technology practices that are widely available to information professionals in the financial sector and issued by a recognized organization” such as the US government.
The list of recognized best practices includes, for almost every area, NIST SP 800-53r4, which is a 462-page document describing 218 specific risk management, security, and privacy controls and how to apply them for US federal information systems (based on the risk profile of the system). Controls that apply to software development are described under a number of different control areas, such as System and Services Acquisition (mostly), Configuration Management, Risk Assessment, System and Information Integrity, Awareness and Training, Privacy, Security Assessment and Authorization, and so on. Each of these controls references other controls or other NIST documents, and other publications and standards.
The overall effect is, well, overwhelming. Determining what you need to do, what you should do, what you can afford to do, and how to prove it, is left up to each organization—and to its auditors.
NIST SP 800-53r4 doesn’t say you can’t build and deliver systems using Agile and Lean methods, but it is a shining example of everything that Agile and Lean stand against: legal policies and shelf-loads of paper that are designed to make a bureaucrat happy, but that don’t translate to specific, measurable requirements that can be satisfied.
The SEC makes it clear that the list of standards is offered as guidance. Your controls and programs would be assessed against these standards. If you don’t choose to follow them, the burden is on you to prove that your system of controls is “reasonably designed.”
While Reg SCI’s outcome-based approach allows an organization some flexibility and freedom to make risk-based decisions on how to meet requirements, PCI’s prescriptive checklists are much clearer: you know what you have to do, and what you will be measured against. If you fail to meet PCI’s requirements, you should have a pretty good idea why, and what to do about it. On the other hand, it’s difficult to get auditors and assessors to look up from their detailed checklists at what you are doing, to help you understand whether your program is actually making your organization secure.
Regulations and governance frameworks are risk driven: they are intended to help protect the organization, and more important, customers and society, against security and privacy risks.
Risk management in Agile development was covered in Chapter 7, Risk for Agile Teams. For now, let’s look at how risk management drives compliance-related decisions and the application of controls from a compliance perspective.
Regulators require your organization to have an active risk management program to address security-related and privacy-related risks, and related operational and technical risks. This ensures that you’re not only trying to meet the basics of a compliance checklist, but that management and everyone working on projects and in operations are proactively working to identify, understand, and deal with rapidly changing risks and threats.
Auditors will look for a formal policy statement explaining why risk management is important to the organization, and outlining people’s responsibilities for managing risks. They will also look for evidence of your risk management program in operation, and test that your operational and security controls and procedures are being regularly reviewed and updated to deal with changing risks.
For example, PCI DSS 12.1.2 requires organizations to perform a formal risk assessment, at least annually, to review threats and vulnerabilities, as well as any changes to the environment, and to ensure that controls and programs are in place to effectively address these risks.
On a day-to-day level, your risk management program should include doing the following:
Using threat information to prioritize patching, testing, reviews, and security awareness training—we explored this in Chapter 8, Threat Assessments and Understanding Attacks
Taking advantage of regular Agile retrospectives and reviews, and including a review of new security, operational, or compliance risks and how they need to be dealt with
Conducting postmortem analysis after operational problems, including outages and security breaches, as well as after pen tests and other assessments, to identify weaknesses in controls and opportunities for improvement
Ensuring that team members are aware of common risks and have taken appropriate steps to deal with them—risks such as the OWASP Top 10
The Open Web Application Security Project (OWASP), a community of application security experts, regularly publishes a set of the top 10 risks that web applications face: the OWASP Top 10.
This is a list of the 10 most critical web application security risks. For each risk, the OWASP Top 10 provides examples of vulnerabilities and attacks, and guidance on how to test for these problems and how to prevent them.
The OWASP Top 10 is widely referenced in regulations such as PCI DSS, as a key risk management tool. Security training for developers is expected to cover the OWASP Top 10 risks. Security scanning tools map findings to the OWASP Top 10, and application pen testers will report whether their findings map to the OWASP Top 10.
A fundamental requirement of compliance and governance frameworks is the ability to prove that all changes are properly authorized and tracked—and that unauthorized changes have been prevented. This involves tracing changes from when they were requested, to when they were delivered, and proving that the necessary checks and tests were performed:
When was a change made?
Who made the change?
Who authorized it?
Who reviewed it?
Was the change tested?
Did the tests pass?
How and when was the change deployed to production?
This could require a lot of paperwork. Or, as we’ll explain in this chapter, it could be done without any paperwork at all, by relying on your existing workflows and toolchains.
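As a rough illustration of relying on existing toolchains, the following Python sketch pulls basic change-history answers straight out of version control. It assumes commit messages reference a ticket or story ID such as “PROJ-123”; the tag names and output format are illustrative, and answers about testing and deployment would come from your CI/CD tool’s own records in the same way.

```python
"""Answer basic change-traceability questions from version control metadata."""
import re
import subprocess

TICKET_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")  # e.g., PROJ-123 (assumed convention)

def changes_for_release(from_tag: str, to_tag: str) -> list:
    log = subprocess.run(
        ["git", "log", "--pretty=format:%H|%an|%aI|%s", f"{from_tag}..{to_tag}"],
        capture_output=True, text=True, check=True,
    ).stdout
    changes = []
    for line in log.splitlines():
        commit, author, date, subject = line.split("|", 3)
        ticket = TICKET_PATTERN.search(subject)
        changes.append({
            "commit": commit,
            "who": author,    # who made the change
            "when": date,     # when the change was made
            "ticket": ticket.group(0) if ticket else "UNTRACKED",  # links back to who asked for it
        })
    return changes

if __name__ == "__main__":
    for change in changes_for_release("v1.4.0", "v1.5.0"):  # illustrative release tags
        print(change)
```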
HIPAA, PCI DSS, and GLBA are examples of regulations that address protecting private or confidential information: health records, credit cardholder details, and customer personal financial data, respectively.
These regulations, and government privacy laws as we mentioned earlier, identify information which must be classified as private or sensitive, and lay out rules and restrictions for protecting this data, and what you need to do to prove that you are protecting it (including assurance, auditing, and data retention).
We are only providing general guidance here. Consult your data privacy officer or compliance officer or legal counsel to understand your organization’s data privacy obligations.
The basic steps to dealing with data privacy include:
Provide clear, vetted guidelines and rules up front to the Product Owner and the rest of the team, as well as training if necessary, so that everyone appreciates and understands data privacy risks, rules, and constraints.
Work with the team to ensure that compliance requirements are included in the team’s Definition of Done. This includes any documentation or other evidence that needs to be created or recorded for auditors.
Create a matrix of all information that is private or sensitive, who in the organization owns the data, and who is allowed to access (create, change, read, delete) it. Make this easily available to the team, and keep it up to date.
Understand and confirm data residency constraints, especially if you are running a system in the cloud. Some privacy laws may dictate that protected data cannot be shared or stored outside of specific legal jurisdictions.
Map out where protected data is collected, stored, updated, deleted, shared, and referenced, using the same kind of data flow diagrams that you created in threat modeling. Include temporary copies and working copies, caches, reports, spreadsheets, and backups.
Ensure that protected data is encrypted at rest and in transit using a recognized standard encryption algorithm, masked or pseudonymized, or substituted with a secure token.
Write stories to handle requirements for privacy consent and opt-out, right to be forgotten and notice provisions, as well as mandatory auditing, logging, data retention, and regulatory reporting.
For any story that collects, stores, updates, deletes, references, or shares protected data, carefully review the acceptance criteria against your compliance requirements, and mark these stories for additional legal reviews or compliance testing before they can be considered done.
Regularly scan content, databases, and files, including logs, test data, and operations working directories, for protected data that is not encrypted, masked, or tokenized (a simple scanning sketch follows this list).
Scan code for references to protected data and calls to crypto, masking, or tokenization functions, and alert when this code is changed so that it can be reviewed.
Write compliance stories for audits, external reviews, pen tests, and other mandated checks so that they will be scheduled and planned for.
Create an incident response plan to deal with data breaches and disclosure, and regularly test your incident response capability (as we outline in Chapter 13).
Record evidence that you did all of this properly, including code review comments, acceptance tests, change control records, system access logs, and audit results.
Regularly review with the team to make sure that these practices and guidelines are being followed consistently.
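Here is a minimal sketch of the kind of scan described above: a Python script that walks a directory tree looking for obvious card number and US SSN patterns. The patterns, the scan path, and the lack of Luhn validation are deliberate simplifications; a production scan would also cover databases and use stricter matching to cut down on false positives.

```python
"""Scan files (logs, exports, working directories) for unprotected PII."""
import pathlib
import re

PATTERNS = {
    "possible card number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "possible SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_tree(root: str) -> list:
    findings = []
    for path in pathlib.Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file; skip
        for lineno, line in enumerate(text.splitlines(), start=1):
            for label, pattern in PATTERNS.items():
                if pattern.search(line):
                    findings.append((str(path), lineno, label))
    return findings

if __name__ == "__main__":
    for file, lineno, label in scan_tree("/var/log/app"):  # illustrative path
        print(f"{file}:{lineno}: {label}")
```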
To help you to follow good practices for crypto, experts at OWASP have put together these cheat sheets:
For an explanation on how to store data safely, use the Cryptographic Storage Cheat Sheet.
Because safely working with passwords is a separate, specific problem, there is a separate, specific Password Storage Cheat Sheet.
For safely communicating data, use the Transport Layer Protection Cheat Sheet.
To reiterate: do not implement your own encryption algorithm. If you have questions about how to handle crypto, get expert help.
How can you adapt your Agile or DevOps approach to the constraints of highly regulated environments?
Start by confronting compliance head-on. Don’t take a passive approach to compliance, and wait for someone to tell you what to do and how to do it—or worse, to tell you when you have failed to do something that you should have. Try to understand the intent and goals of regulatory directives, list out any clearly defined hard rules that they state, and look for ways to wire compliance into your workflows, and especially into automation.
Consider how you can leverage good technical practices and automation to meet compliance as well as your own support needs. If you can ensure—and prove—that every change is made through automated pipelines after it has been reviewed and checked in to version control, you have a powerful response to auditors as well as a powerful tool for investigating, and even preventing, operational problems.
You can do all this in a way that doesn’t slow teams down, that puts them in charge, and makes them accountable for meeting compliance requirements.
Throughout this book we emphasize how important it is that the security team find ways to be an enabler rather than a blocker: ways to help development and operations get their jobs done safely rather than getting in their way.
But sometimes in order to stay onside of compliance, you have to stand up and say, “No, we can’t do that,” and reject a change or block a deployment. Each time this has to happen, go back to the team and to management and find a way to prevent the situation from coming up again, by building compliance into the design or into the team’s workflows.
Mandatory compliance testing, reviews, and other items in compliance checklists need to be considered in the team’s Definition of Done, for stories, sprints, and releases:
What reviews need to be done and who needs to do them
What testing needs to be done
What documentation needs to be updated
What evidence needs to be provided that all of this was done
In Chapter 5 we looked at how to write security stories to help in implementing security requirements or security controls. In the same way, you may want to write compliance stories to describe explicit steps that need to be done and proven, separate from compliance criteria that might be part of specific user stories.
Compliance stories can act as reminders or schedule placeholders for controls that need to be put into place up front, including compulsory training for team members, and for assessments such as audits and pen tests.
Use tools like osquery and InSpec to write automated online compliance checks, and to provide traceability back to specific regulations or rule areas or governance controls.
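InSpec profiles are written in their own Ruby-based DSL, so rather than reproducing that syntax, here is a hedged Python sketch of the same idea using osquery’s command-line JSON output: each check runs a query and maps the result back to a control identifier. The specific query, the “CRYPTO-01” control ID, and the pass/fail convention are illustrative assumptions, and table availability varies by platform.

```python
"""Automated compliance check via osquery, mapped back to a control ID."""
import json
import subprocess

CHECKS = [
    {
        "control": "CRYPTO-01: data encrypted at rest",  # illustrative control ID
        "query": "SELECT encrypted FROM disk_encryption WHERE encrypted = 0;",
        "expect_no_rows": True,  # any row means an unencrypted volume
    },
]

def run_check(check: dict) -> bool:
    out = subprocess.run(
        ["osqueryi", "--json", check["query"]],
        capture_output=True, text=True, check=True,
    ).stdout
    rows = json.loads(out)
    return (len(rows) == 0) == check["expect_no_rows"]

if __name__ == "__main__":
    for check in CHECKS:
        status = "PASS" if run_check(check) else "FAIL"
        print(f"{status} {check['control']}")
```

Because each result is tagged with a control identifier, the output itself becomes traceable evidence that can be stored alongside other audit artifacts.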
Agile and Lean teams look for ways to minimize waste by keeping documentation to a minimum: working code over documentation.
But when it comes to compliance, some paperwork is unavoidable.
So what is the minimum amount of documentation that you need to satisfy your compliance and governance obligations? And what can you leverage from artifacts that you are already creating?
You will need documented security and risk management policies and guidelines, directional stuff establishing management accountability and clearly communicating responsibilities. And legal protection, such as confidentiality and nondisclosure agreements. All of this will need to be handed out to all staff, signed off, filed, and regularly reviewed and updated.
If you write something down in a policy, you must be prepared for everyone to follow it—and to prove that they followed it correctly.
Don’t copy and paste from a policy template that you find on the web (such as at https://www.sans.org/security-resources/policies) without making sure that it is practical for your organization, or write something down because you think it sounds like a good idea. If your organization fails to meet its own policies, auditors could use this as evidence of a material failure of controls, especially if you are in an outcome-based regulatory environment. People can be fired for this, and executives can be held personally liable.
Policies must be taken seriously. Treat them like promises—and you always keep your promises, right?
You will also need to keep a record of requirements so that they can be audited, and to trace changes: you won’t be able to just write stories on index cards or sticky notes pasted on a wall, and then throw them out as they are implemented.
But most compliance procedures and detailed checklists can be, and should be, taken out of documents and pushed into team workflows for developers and operations, and, wherever possible, into code. These should be enforced through rules in your automated build and deployment pipelines, in automated tests and checks, and by taking advantage of the audit trails that are automatically created.
To do this you will need to bring management, compliance, internal audit, the PMO, and the security team together with development and operations.
Compliance rules and control workflows need to be defined up front by all of these stakeholders working together—and any changes to the rules or workflows should be formally approved by management and documented, for example, in a change advisory board (CAB) meeting. Developers and operations need to walk through procedures and checklists with compliance and security and the PMO, map out key controls, and agree on simple ways to automate them. Management needs to understand how operational risks, security risks, and other risks will be controlled and managed through these automated workflows, tests, and pipelines.
Let’s look at how to solve some specific compliance concerns, in code.
In Chapter 11, we explained how an automated build pipeline works, and how to use continuous integration and continuous delivery tools and workflows to test and deploy changes quickly and safely.
Now you get to use all of this for compliance, tracing each change from when it was requested to when it was delivered, to show that changes are handled properly and consistently. These are the steps that an Agile team could take to prove traceability and assurance in a regulatory environment:
As we’ve seen, instead of working from detailed requirements specifications, Agile teams like to write up requirements in simple, concrete user stories on index cards or sticky notes that are thrown away once the story is implemented. But to satisfy compliance, you’ll need to keep a record of each feature and change, who asked for it, who approved it, and the acceptance criteria (i.e., conditions of satisfaction) that were agreed to.
While you could try to do this using a spreadsheet, for example, teams today can take advantage of Agile project management tools or story trackers like Rally, VersionOne, or Jira to provide insight into what the team is doing, and what it has already done, from an auditing perspective.
The team works with compliance to agree on its Definition of Done, including the evidence needed to prove compliance for stories, sprints, and releases.
Everything—application code, configuration recipes and templates, tests, policies, and documentation (everything, that is, except for secrets)—is committed to version control, with a tie back to the specific requirement or change request or bug report (using a story ID, ticket number, or some other unique identifying tag that can be referenced as a comment on check-in), so that you have a detailed history of everything associated with each change.
Commit filters automatically scan for secrets and unsafe calls before code can be merged into the mainline (a hook along these lines is sketched after this list).
Changes to code and configuration are reviewed—before commit if possible—and the results of reviews are visible to the team, using Git pull requests or a collaborative code review tool like Gerrit or Review Board.
Reviewers follow checklists to ensure that all code meets the team’s standards and guidelines, and to watch out for unsafe coding practices. Management periodically audits to make sure that reviews are done consistently, and that engineers aren’t rubber-stamping each other’s work.
Every change (to application code and to configuration) is tested through continuous integration or continuous delivery: TDD, static code analysis, and other scanning, and automated acceptance testing as described in Chapter 11. Test coverage is automatically measured. Any serious problems cause the build to fail.
Because tests are checked into a source repo, you can review the tests and match them up against the acceptance criteria for each story, to see if the requirements were implemented correctly.
Code (including code dependencies) and infrastructure are regularly scanned for vulnerabilities as part of your automated pipelines, and vulnerabilities found are recorded and pushed into the team’s backlog to be fixed.
Changes are deployed automatically to acceptance test, then staging, and, if all tests and checks pass, to production, so that you can show when a change was promoted to each environment, and how this was done.
Systems are regularly scanned for unauthorized changes using detective change controls (like Tripwire or OSSEC).
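As a sketch of such a commit filter, the following Python hook blocks a commit if the staged diff contains likely secrets, and (when wired in as a commit-msg hook) if the message does not reference a ticket ID. The secret patterns and the “PROJ-123” convention are illustrative assumptions; many teams use purpose-built scanning tools for this instead.

```python
"""Git hook sketch: block commits containing likely secrets or missing a ticket ID.

Install as .git/hooks/pre-commit (and/or commit-msg) and make it executable.
"""
import re
import subprocess
import sys

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                          # AWS access key ID format
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # private key material
    re.compile(r"(?i)(password|secret|api_key)\s*=\s*['\"][^'\"]+['\"]"),
]
TICKET_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")  # e.g., PROJ-123 (assumed convention)

def staged_diff() -> str:
    return subprocess.run(["git", "diff", "--cached", "--unified=0"],
                          capture_output=True, text=True, check=True).stdout

def main() -> int:
    problems = []
    diff = staged_diff()
    for pattern in SECRET_PATTERNS:
        if pattern.search(diff):
            problems.append(f"possible secret matches {pattern.pattern!r}")
    # When used as a commit-msg hook, git passes the message file as argv[1].
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            if not TICKET_PATTERN.search(f.read()):
                problems.append("commit message does not reference a ticket ID")
    for p in problems:
        print(f"BLOCKED: {p}")
    return 1 if problems else 0  # non-zero exit rejects the commit

if __name__ == "__main__":
    sys.exit(main())
```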
This is a beautiful thing for operations and support. You can tell when changes are made or vulnerabilities are patched, and, if something goes wrong, you can trace exactly what was changed so that you can fix it quickly. It’s a beautiful thing for governance, because you can follow each change and ensure that the changes were made consistently and responsibly. And it’s a beautiful thing for compliance, because you can prove all of this to your auditors.
Change control is fundamental to compliance regulations, and to governance frameworks like ITIL (the IT Infrastructure Library) and COBIT. This comprises:
Making sure that all changes are authorized—SOX, insider threats, fraud, Reg SCI
Minimizing the operational risk of change—making sure that changes are understood, tested, and safe
Ensuring that changes are auditable, which builds on traceability
Most governance frameworks deal with change management in a bureaucratic, paperwork-heavy way, with forward planning, and formal approvals at change advisory board (CAB) meetings where a committee meets to assess the risks and impact of a change, determine preparedness, and agree to scheduling.
This is in essential conflict with Agile and Lean development, and especially with DevOps and continuous deployment which are predicated on frequent, iterative change, including running A/B experiments in production to get feedback. This could mean making changes several times a week, or several times a day—or at organizations like Amazon, several times per second.
How the heck can you have change control when you are rolling out changes every few seconds?
By taking the risk out of change up front, running every change through the same battery of tests and checks that we described earlier. And by optimizing for small, incremental changes.
While ITIL change management is designed to deal with infrequent, high-risk “big bang” changes, most changes by Agile and DevOps teams are small and low-risk, and can flow under the bar. They can be treated as standard or routine changes that have been preapproved by management, and that don’t require a heavyweight change review meeting.
Many larger changes can also be made this way, using dark launching. This is a practice made famous at Flickr and Facebook, where changes to code are hidden behind a feature flag: a switch that will only be turned on after getting approval. In the meantime, the team can continue to make and test changes in incremental steps, releasing the code to production without impacting operations. This can, in some cases, involve running the new code in simulation mode to collect data on usage and performance, or deploying it to a small community of users for beta testing, until everyone is confident that the feature is ready.
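A minimal sketch of what a feature flag looks like in code follows. The in-code flag store, the percentage rollout rule, and the stable hashing scheme are illustrative assumptions; most teams would use a feature-flag service or configuration system rather than a hardcoded dictionary.

```python
"""Dark launching behind a feature flag."""
import hashlib

FLAGS = {
    # flag name: (switch on?, percent of users who see it while "dark")
    "new_checkout_flow": (True, 5),   # live for 5% of users (beta community)
    "fraud_model_v2":    (False, 0),  # code deployed, switch still off
}

def is_enabled(flag: str, user_id: str) -> bool:
    enabled, percent = FLAGS.get(flag, (False, 0))
    if not enabled:
        return False
    # Stable bucketing: the same user always lands in the same bucket.
    bucket = int(hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest(), 16) % 100
    return bucket < percent

def checkout(user_id: str) -> str:
    if is_enabled("new_checkout_flow", user_id):
        return "new checkout flow"   # dark-launched path
    return "existing checkout flow"  # unchanged behavior for everyone else

if __name__ == "__main__":
    print(checkout("user-42"))
```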
Feature flags and dark launching carry some potential operational and security risks that you need to understand and consider:
Although dark features are hidden to users while they are being rolled out, the code may still be accessible to attackers. Adding these features increases the attack surface of the application, which is especially a concern given that some of this code is still work in progress and is more likely to contain bugs which could be exploitable.
While dark features are being rolled out, the code will be more complicated, harder to understand, more difficult to change, and more expensive to test because you need to cover more paths, and sometimes combinations of paths if multiple features overlap. Feature switches should be short-lived, to minimize these risks. Once the feature has been rolled out, the switch should be deprecated, and the code cleaned up and refactored.
Before turning on the feature, the team members could hold an operational readiness review, or “pre-mortem” review meeting to go over their preparedness, explore failure scenarios and their ability to respond to them, and ensure that everyone is informed of the change in advance.
The risk of change can also be minimized by automating the steps involved, ensuring that changes are tested and deployed consistently and repeatably. All changes—to code and to configuration—should be rolled out using your automated build and delivery pipelines, the same pipelines that you use for testing, to take advantage of the built-in controls, to make sure that the steps have been rehearsed and proven, and to provide full traceability and visibility. Everyone knows what changes were made, when, by who, how they were tested, and how and when they were deployed.
By the time that you’re ready to deploy to production, you’ve already run the change through development, acceptance testing, and staging, all using the same steps.
In this model, changes become constant, routine, and predictable.
One concern that needs to be addressed in change management, especially in DevOps environments where engineers can push changes automatically into production using continuous deployment, is separation of duties.
Separation of duties ensures that no single person can have full control from start to end of a change, and that changes cannot be made without testing and approval. This is intended to prevent malicious insider attacks and fraud, and to prevent honest insiders from taking shortcuts and bypassing the checks and balances that are designed to protect the system and the organization from security risks and operational risks.
Separation of duties is spelled out as a practice in governance frameworks like ITIL and COBIT, and it’s expected in other frameworks and regulations like ISO/IEC 27001, SOC 1 and SOC 2, SOX 404, PCI DSS, NIST 800-53, Reg SCI, and others. It ties into change control, as well as data privacy (by limiting the number of people who have access to production data).
Auditors are used to looking for evidence of separation of duties, such as network segmentation between development and production systems, and matrices clearly showing that developers, testers, operations, and support analysts are assigned to different roles, with access restrictions to the different systems and commands.
The most obvious implementation of this principle is the Waterfall/CMMI model which requires a series of documented handoffs between roles:
Business analysts define requirements, and get approval from the business owner.
Designers take the requirements and create specifications.
Developers write code to implement the specifications, and then hand the code off for testing.
Independent testers verify that the code meets the specifications.
Operations packages the code into a release, and waits for a change manager to complete an impact analysis and risk assessment, and schedule the change to be deployed.
All these steps are documented and tracked for management review and sign off.
Auditors like this a lot. Look at the clear, documented handoffs and reviews and approvals, the double checks and opportunities to catch mistakes and malfeasance.
But look at all the unnecessary delays and overhead costs, and the many chances for misunderstandings and miscommunication. This is why almost nobody builds and delivers systems this way any more.
The DevOps Audit Toolkit makes the argument that you don’t need all of these handoffs to prevent fraud and insider maliciousness, and to ensure that all changes are properly authorized, tested, and tracked. You can even empower developers to push changes directly to production, as long as they do the following:
The team agrees and ensures that all changes meet the Definition of Done, which requires that all changes are tested against the acceptance criteria defined for each story and successfully reviewed by the Product Owner, who represents the interests of the business and management.
Peer reviews (or pair programming) ensure that no engineer (dev or ops) can make a change without at least one other person in the organization being aware of it and understanding it. You could even insist that the team assign reviewers randomly to prevent collusion.
Every change—to code or configuration—is made through automated build and deployment pipelines, which ensure that they are tested and tracked.
Developers can be given read-only access to production system logs and monitors so that they can help in troubleshooting. But any fixes or patches need to be made through the automated build pipeline (fixing forward) or by automatically rolling changes back (again, through the same automated pipeline).
All changes made through the pipeline are done in a transparent way, logged and published to dashboards, chat rooms, and so on.
Production access logs are regularly reviewed by management or compliance.
Access credentials are reviewed regularly: including access to different environments, access to repos, and access to the pipelines and configuration management tools.
Automated detective change control tools (such as Tripwire, OSSEC, AIDE, and UpGuard) are used to alert on unauthorized changes to the build environment and to production systems. If you are deploying changes a few times per month or even several times per week, this is straightforward. If you are making changes multiple times a day, you need to be careful to filter out approved, automated changes to show the exceptions. It’s also important that these alerts are sent to someone outside of the engineering team, to security or compliance or management for review, to ensure that there is no conflict of interest.
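Filtering approved, automated changes out of detective change-control alerts can itself be automated, as the last point suggests. The sketch below compares detected file-change events against a record of pipeline deployments and escalates only the exceptions; the event format, the time window, and the hardcoded sample data are illustrative assumptions.

```python
"""Filter detective change-control alerts against approved pipeline deployments."""
from datetime import datetime, timedelta

# Change events detected on production hosts (what changed, and when).
events = [
    {"path": "/opt/app/release.jar", "time": datetime(2017, 3, 1, 14, 2)},
    {"path": "/etc/passwd",          "time": datetime(2017, 3, 1, 23, 40)},
]

# Deployments made through the approved, automated pipeline.
approved_deploys = [
    {"paths": ["/opt/app/release.jar"], "time": datetime(2017, 3, 1, 14, 0)},
]

WINDOW = timedelta(minutes=15)  # how close an event must be to a known deploy

def is_explained(event: dict) -> bool:
    return any(
        event["path"] in deploy["paths"]
        and abs(event["time"] - deploy["time"]) <= WINDOW
        for deploy in approved_deploys
    )

# Escalate only unexplained changes, to someone outside the engineering team.
for event in events:
    if not is_explained(event):
        print(f"UNAUTHORIZED CHANGE? {event['path']} at {event['time']}")
```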
For an informed and balanced auditor’s perspective on separation of duties in DevOps environments, see Douglas Barbin’s article, “Auditing DevOps – Developers with Access to Production”.
Building compliance into your culture takes time and persistence. This is something that needs to be done from the top down, and from the bottom up.
Management needs to understand what is required for compliance purposes and communicate these requirements down to every team. They also need to show that they are serious about meeting compliance, in the way that they spend money, and the way that they make decisions about priorities.
For teams, compliance should—and has to—build on top of the team’s commitment to doing things right and delivering working software. Teams that are already working toward zero defect tolerance, and teams that are following good technical practices, including continuous integration, should be more successful in meeting compliance.
The more that you can leverage from these practices, and especially from automation, the easier it will be to satisfy compliance. In the model that we’ve described here, many compliance requirements can be met by reinforcing good engineering practices that teams are already following, or should be following, and taking advantage of audit trails provided by automated tooling.
Even within a structured and prescriptive control framework like PCI, give engineering teams the opportunity to come up with their own ideas on how to meet requirements, the chance to automate as much work as possible, and a voice in what documentation needs to be done. Help them to understand where the lines are, how high the bar needs to be set, and where they could have some flexibility to meet a governance or compliance requirement.
These aren’t problems to be avoided or evaded. They are problems that need to be solved in ways that are efficient, effective and, where possible, satisfying to the people who need to do the work. They should be treated in a Lean way: map the value stream, recognize and understand the constraints, identify the bottlenecks and inefficiencies, measure, learn, and improve.
All of this is a heavy lift for startups, or for teams that don’t have strong leadership and management support. But it is achievable—and worth achieving.
To make auditors happy, you need to provide them with evidence that you’ve met specific compliance requirements, or evidence that proves compliance with defined outcomes.
Just as beauty is in the eye of the beholder, compliance is in the opinion of the auditor. Auditors might not understand the approach that you are taking at first, especially auditors who are accustomed to reviewing detailed policies and procedures, and making people fill out checklists and spreadsheets.
You will need to explain how your pipelines work, walk them through your controls, and through the code and repos and tests and audit logs, and show how it all ties together. But a good auditor should be able to appreciate what you are doing, and recognize how it is good for you, as well as for them.
If you follow the approach outlined here, using your automated build pipelines as a control plane for compliance, you can prove that you are regularly scanning and reviewing code and infrastructure for vulnerabilities, and you can track when vulnerabilities were found and fixed.
You can provide a complete audit trail for every change, from when the change was requested and why, who authorized the change, who made the change and what that person changed, who reviewed the change and what they found in the review, how and when the change was tested and what the results were, to when it was deployed:
You can prove that changes were reviewed and tested, and prove that changes were all made in a standardized, repeatable way.
You can show that compliance policies and checks were enforced in reviews, in testing, in scanning, and in release control.
You can show that you’ve enforced separation of duties between development and operations.
And, if you are following the controls consistently, you can prove that all of this was done for every change.
In the same way that frequently exercising build and deployment steps reduces operational risks, exercising compliance on every change, following the same standardized process and automated steps each time, minimizes the risks of compliance violations. You — and your auditors — can be confident that all changes are made the same way, that all code is run through the same tests and checks, and that everything is tracked the same way, from start to finish.
As part of your evidence, be prepared to show auditors logs of test failures and build failures, and that these errors were subsequently remediated, to demonstrate that your controls are actually working and catching mistakes.
Auditors can verify that your control program is consistent, complete, repeatable, and auditable. We can see the smile on your auditor’s face already.
Your auditors won’t always be happy, especially if they are coming in to investigate after a security breach.
Although many regulations lay out reporting requirements in the event of a breach, it’s not always clear what your priorities should be, especially when you are in the middle of dealing with a security incident. There are a lot of factors to understand and consider.
How serious is the breach? Different agencies have different bars for what you need to report, and when. What if you fall under several different compliance and legal jurisdictions?
Who do you need to report to first? Legal counsel? Your Board? A forensics specialist? Law enforcement? Your partners and customers? Your insurance agent? The regulators? Government agencies? Who do you contact? What information do you need to provide, and how quickly?
You should have all of this worked out in advance as part of your incident response playbooks.
Once you’ve dealt with disclosure, you need to prepare for the follow-up analysis and audit(s) to understand what went wrong, how badly it went wrong, who will be held responsible or accountable, and how much it will cost to make up for what went wrong. If you experience a serious breach, and you work in a regulated environment, your organization will by definition be found noncompliant—after all, if you were 100% compliant, how could anything have possibly gone wrong? An investigation just has to prove this, and with the power of perfect hindsight, it will. After a breach or a serious compliance failure, everyone will be trying to find the smoking gun, and they will keep looking for mistakes and gaps and scapegoats until they have enough smoke to fill out their report.
Repeatable, automated workflows with built-in audit trails, and evidence checked into version control, will at least make this less painful and less expensive for you and for the investigators, and help to show that you were doing some (hopefully most) things right.
Getting certified to a standard like ISO/IEC 27001, or an attestation that you’ve met the requirements of a SOC review, or a similar assessment can be an important goal in meeting your compliance requirements.
This paperwork helps you make a case to your auditors, as well as to your customers and your owners—and your competitors—that you’ve taken responsible and reasonable steps to protect your organization and your customers. Certification could also be used as safe harbor in defending legal actions in the event of a breach or some other failure.
Certification takes a lot of work. But it shows that your organization has reached a significant level of maturity, and it’s an authentic validation of your commitment to do things right.
In compliance environments, we talk about certification and attestation as if they are the end goal of the compliance process. In some respects, however, they are just the beginning. The aim of compliance schemes is to ensure that the organization is continuously compliant. Each of the requirements (at least in prescriptive, rules-based regulations) is designed to be recurring and present in all day-to-day activities. This is why most certification or attestation reviews are done across a period of time, such as a year, to verify that your controls were consistently applied.
This moves compliance from being a point-in-time, auditable requirement to a continuous compliance need. By using the automation suggestions in this chapter, your team and organization can verify continued compliance and identify issues early. While this may not prevent a compliance breach from happening, it will mean that if the auditors are sent to your door, you will have evidence of your processes and approaches up to and at the time of the breach, not just from your last attestation or certification audit.
Certification or a successful attestation report does not mean that your organization is compliant. It means that you’ve satisfied a series of conditions that a regulator would reasonably expect, using a recognized and validated approach. A regulator could still find you in breach of compliance for some reason, although you’ve reduced your risk significantly.
Certification does not mean that you are secure. Several organizations that held certifications or passed all of their audits still suffered from high-profile breaches or serious operational failures. Just because you’ve met a certification qualification doesn’t mean that you can stop looking, learning, and improving.
While compliance does not imply security for all systems, it is a requirement to operate for many organizations, and can play an important part in building your security approaches and culture.
Compliance regulations lay out minimum requirements for security controls, reviews, and oversight. This can be done in two different ways: a detailed rules-based approach like PCI DSS, or an outcome-based model like SOX 404.
Compliance will put constraints on an Agile team’s freedom to “inspect and adapt” the way that it works, but the team can still have a voice (and choices) in how to meet compliance obligations.
Instead of trying to enforce compliance through manual checklists and point-in-time audits, compliance rules can be built into engineering workflows and automation, to continuously enforce controls and record evidence for auditors in your build pipelines.
Separation of duties between developers and operations presents concerns for regulators as responsibilities between these organizations blur, especially in DevOps environments. You will need to take careful steps and rely heavily on automation to manage these risks.
Compliance is everyone’s responsibility, even if you are not cutting code every day, or if you are only playing a supporting role in the development life cycle. You can help protect the organization’s systems and help meet compliance requirements through good information security practices like locking your devices whenever you step away; choosing strong, unique passwords for all your work systems; and not sharing your accounts or passwords with others.
And finally, be vigilant and speak up if something doesn’t seem right. Everyone needs to do their part to detect and report any incidents or issues that could compromise compliance. If you spot something, please report it. Remember, compliance needs to be 24/7, and the consequences for a lapse can be serious. It’s up to everyone to play their part.