sed & awk, 2nd Edition, by Arnold Robbins. Published by O'Reilly Media, Inc., 1997.

Advanced Flow Control Commands

You have already seen several examples of changes in sed’s normal flow control. In this section, we’ll look at two commands that allow you to direct which portions of the script get executed and when. The branch (b) and test (t) commands transfer control in a script to a line containing a specified label. If no label is specified, control passes to the end of the script. The branch command transfers control unconditionally, while the test command performs a conditional transfer, occurring only if a substitute command has changed the current line.

A label is any sequence of up to seven characters.[1] A label is put on a line by itself that begins with a colon:

:mylabel

No spaces are permitted between the colon and the label, and spaces at the end of the line are considered part of the label. When you specify the label in a branch or test command, a space is permitted between the command and the label itself:

b mylabel

Be sure you don’t put a space after the label.

Branching

The branch command allows you to transfer control to another line in the script.

[address]b[label]

The label is optional, and if not supplied, control is transferred to the end of the script. If a label is supplied, execution resumes at the line following the label.

In Chapter 4, we looked at a typesetting script that transformed quotation marks and hyphens into their typesetting counterparts. If we wanted to avoid making these changes on certain lines, then we could use the branch command to skip that portion of the script. For instance, text inside computer-generated examples marked by the .ES and .EE macros should not be changed. Thus, we could write the previous script like this:

/^\.ES/,/^\.EE/b
s/^"/``/
s/"$/''/
s/"? /''? /g
.
.
.
s/\\(em\\^"/\\(em``/g
s/"\\(em/''\\(em/g
s/\\(em"/\\(em``/g
s/@DQ@/"/g

Because no label is supplied, the branch command branches to the end of the script, skipping all subsequent commands.
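
To see the branch in action, here is a minimal runnable sketch of the script above, wrapped in a shell function. It keeps only the first two substitutions (the elided middle of the script is not reproduced), and the function name `typeset_quotes` and the sample input are hypothetical. It assumes a POSIX-conforming sed such as GNU sed. The last substitution is written in double quotes only because a single-quoted shell string cannot contain an apostrophe.

```shell
# Hypothetical helper: convert straight double quotes at the start and
# end of a line into typesetting quotes, but branch to the end of the
# script (skipping both substitutions) for lines from .ES through .EE.
typeset_quotes() {
    sed -e '/^\.ES/,/^\.EE/b' \
        -e 's/^"/``/' \
        -e "s/\"\$/''/"
}

printf '%s\n' '"Hello"' '.ES' '"raw"' '.EE' '"Bye"' | typeset_quotes
# Prints:
# ``Hello''
# .ES
# "raw"
# .EE
# ``Bye''
```

Because the range address is tested before any substitution, the .ES and .EE lines themselves are also passed through untouched.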

The branch command can be used to execute a set of commands as a procedure, one that can be called repeatedly from the main body of the script. As in the case above, it also allows you to avoid executing the procedure at all based on matching a pattern in the input.

You can achieve a similar effect by using ! and grouping a set of commands. The advantage of the branch command over ! for our application is that we can more easily specify multiple conditions to avoid. The ! symbol applies either to a single command or to a group of commands enclosed in braces that immediately follows it. The branch command, on the other hand, gives you almost unlimited control over movement around the script.

For example, if we are using multiple macro packages, there may be other macro pairs besides .ES and .EE that define a range of lines we want to avoid altogether. We can skip all of them with one branch per range:

/^\.ES/,/^\.EE/b
/^\.PS/,/^\.PE/b
/^\.G1/,/^\.G2/b
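
Here is a runnable sketch of the multi-range version, next to the ! form for a single range. The substitution that turns `--` into `\(em` is a hypothetical stand-in for the rest of the typesetting script, and both function names are illustrative only. It assumes a POSIX-conforming sed such as GNU sed.

```shell
# Hypothetical helper: one unconditional branch per protected range,
# then the (illustrative) dash conversion for every other line.
skip_with_branch() {
    sed '
/^\.ES/,/^\.EE/b
/^\.PS/,/^\.PE/b
/^\.G1/,/^\.G2/b
s/--/\\(em/g
'
}

# The same idea for a single range, using ! and a brace group:
# the group runs only for lines outside .ES ... .EE.
skip_with_bang() {
    sed '/^\.ES/,/^\.EE/!{
s/--/\\(em/g
}'
}

printf '%s\n' 'a--b' '.PS' 'x--y' '.PE' 'c--d' | skip_with_branch
# Prints:
# a\(emb
# .PS
# x--y
# .PE
# c\(emd
```

Adding another protected range to the branch version is a one-line change; with ! you would have to nest another brace group inside the first.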

To get a good idea of the types of flow control possible in a sed script, let’s look at some simple but abstract examples. The first example shows you how to use the branch command to create a loop. Once an input line is read, command1 and command2 will be applied to the line; afterwards, if the contents of the pattern space match the pattern, then control will be passed to the line following the label “top,” which means command1 then command2 will be executed again.

:top
command1
command2
/pattern/b top
command3

The script reaches command3 only once the pattern no longer matches. Assuming the loop eventually terminates, all three commands are executed, although the first two may be executed multiple times.
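
As a concrete instance of this loop, the sketch below (our own illustration, not a script from the text) inserts commas into a number at the start of the line. The substitution plays the role of command1: it splits the last three digits off the leading run of digits. The address /^[0-9]\{4\}/ is the loop condition, true while four or more digits still lead the line, and the final substitution, which prefixes a dollar sign, stands in for command3. The function name `commify` is hypothetical, and a POSIX-conforming sed is assumed.

```shell
# Hypothetical helper: add thousands separators to a leading number.
#   :top ... /^[0-9]\{4\}/b top  loops while 4+ leading digits remain
#   s/^\(...\)\(...\)/\1,\2/     splits off the last three digits
#   s/^/$/                       runs once, after the loop ends
commify() {
    sed '
:top
s/^\([0-9]*[0-9]\)\([0-9]\{3\}\)/\1,\2/
/^[0-9]\{4\}/b top
s/^/$/
'
}

printf '%s\n' 1234567 999 | commify
# Prints:
# $1,234,567
# $999
```

The loop is guaranteed to end because each pass shortens the leading run of digits.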

In the next example, command1 is executed. If the pattern is matched, control passes to the line following the label “end.” This means command2 is skipped.

command1
/pattern/b end
command2
:end
command3

In all cases, command1 and command3 are executed.
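
Here is a small, concrete version of this pattern. The commands are invented for illustration: command1 strips trailing blanks, the branch skips comment lines (those beginning with #), command2 replaces foo with bar, and command3 prefixes every line with `> `. The function name `skip_demo` is hypothetical, and a POSIX-conforming sed is assumed.

```shell
# Hypothetical helper with the shape:
#   command1; /pattern/b end; command2; :end; command3
skip_demo() {
    sed '
s/ *$//
/^#/b end
s/foo/bar/g
:end
s/^/> /
'
}

printf '%s\n' 'foo foo   ' '# foo  ' | skip_demo
# Prints:
# > bar bar
# > # foo
```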

Now let’s look at how to specify that either command2 or command3 is executed, but not both. In the next script, there are two branch commands.

command1
/pattern/b dothree
command2
b
:dothree
command3

If the pattern is matched, the first branch command transfers control to command3. If it is not matched, command2 is executed, and the branch command following it sends control to the end of the script, bypassing command3. The first of the branch commands is conditional upon matching the pattern; the second is not. We will look at a “real-world” example after looking at the test command.
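
The same either/or structure in runnable form, again with invented commands: lines beginning with # are handed to the dothree procedure and relabeled as comments, while all other lines get a `code: ` prefix. The bare b after command2 is what keeps the two paths apart. The function name `either_or` is hypothetical, and a POSIX-conforming sed is assumed.

```shell
# Hypothetical helper with the shape:
#   command1; /pattern/b dothree; command2; b; :dothree; command3
either_or() {
    sed '
s/ *$//
/^#/b dothree
s/^/code: /
b
:dothree
s/^#* */comment: /
'
}

printf '%s\n' 'x = 1  ' '# note' | either_or
# Prints:
# code: x = 1
# comment: note
```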

The Test Command

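The test command branches to a label (or to the end of the script) if a successful substitution has been made on the currently addressed line; more precisely, if any substitution has succeeded since the most recent input line was read or the last test command was executed. It is therefore a conditional branch. Its syntax follows: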

[address]t[label]

If no label is supplied, control falls through to the end of the script. If the label is supplied, then execution resumes at the line following the label.
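
Because the test command branches only after a successful substitution, it is a natural way to repeat a substitution until it no longer applies. The sketch below, our own illustration with the hypothetical function name `strip_parens`, deletes parenthesized groups from the innermost outward. Taking the t branch resets the flag, so once a pass changes nothing, t falls through and the line is printed. A POSIX-conforming sed is assumed.

```shell
# Hypothetical helper: remove an innermost "(...)" group, then loop
# back with t for as long as the substitution keeps succeeding.
strip_parens() {
    sed '
:again
s/([^()]*)//
t again
'
}

printf '%s\n' 'a(b(c)d)e' 'f(x) + g(y)' | strip_parens
# Prints:
# ae
# f + g
```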

Let’s look at an example from Tim O’Reilly. He was trying to generate automatic index entries based on evaluating the arguments in a macro that produced the top of a command reference page. If there were three quoted arguments, he wanted to do something different from what he did when there were two, or only one. The task was to try to match each of these cases in succession (3, 2, 1) and, when a successful substitution was made, avoid making any further matches. Here’s Tim’s script:

/\.Rh 0/{
s/"\(.*\)" "\(.*\)" "\(.*\)"/"\1" "\2" "\3"/
t
s/"\(.*\)" "\(.*\)"/"\1" "\2"/
t
s/"\(.*\)"/"\1"/
}

The test command allows us to drop to the end of the script once a substitution has been made. If there are three arguments on the .Rh line, the test command after the first substitute command will be true, and sed will go on to the next input line. If there are fewer than three arguments, no substitution will be made, the test command will be evaluated false, and the next substitute command will be tried. This will be repeated until all the possibilities are used up.

The test command provides functionality similar to a switch statement in C or a case statement in the shell. You test each case in turn, and when one proves true, you exit the construct.
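
To make the case construct concrete, here is a runnable variation on Tim's script. Instead of rewriting each argument in place, each case labels the line with the number of arguments it found, an output format we made up for illustration. We also anchor the address with ^ and use [^"]* instead of .* so that an argument cannot run past its closing quote; both are our changes, not part of the original script. The function name `classify_rh` is hypothetical, and a POSIX-conforming sed is assumed.

```shell
# Hypothetical helper: try the 3-, 2-, then 1-argument case; a bare t
# after a successful case jumps to the end of the script.
classify_rh() {
    sed '
/^\.Rh 0/{
s/"\([^"]*\)" "\([^"]*\)" "\([^"]*\)"/three: \1|\2|\3/
t
s/"\([^"]*\)" "\([^"]*\)"/two: \1|\2/
t
s/"\([^"]*\)"/one: \1/
}
'
}

printf '%s\n' '.Rh 0 "a" "b" "c"' '.Rh 0 "a" "b"' '.Rh 0 "a"' | classify_rh
# Prints:
# .Rh 0 three: a|b|c
# .Rh 0 two: a|b
# .Rh 0 one: a
```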

If the above script were part of a larger script, we could use a label, perhaps tellingly named “break,” to drop to the end of the command grouping where additional commands can be applied.

/\.Rh 0/{
s/"\(.*\)" "\(.*\)" "\(.*\)"/"\1" "\2" "\3"/
t break
.
.
.
}
:break
more commands
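
The following sketch puts the break label to work on the same hypothetical classification. Once a case succeeds, t break skips the remaining cases but, unlike a bare t, still runs the commands after the group. Here the "more commands" part is a single substitution, chosen only for illustration, that rewrites the | separators as commas; it applies to every input line, whether or not it is an .Rh line. The function name `classify_rh_break` is hypothetical, and a POSIX-conforming sed is assumed.

```shell
# Hypothetical helper: like a case statement with "break", then
# common post-processing that every line goes through.
classify_rh_break() {
    sed '
/^\.Rh 0/{
s/"\([^"]*\)" "\([^"]*\)" "\([^"]*\)"/three: \1|\2|\3/
t break
s/"\([^"]*\)" "\([^"]*\)"/two: \1|\2/
t break
s/"\([^"]*\)"/one: \1/
}
:break
s/|/, /g
'
}

printf '%s\n' '.Rh 0 "a" "b" "c"' 'x|y' | classify_rh_break
# Prints:
# .Rh 0 three: a, b, c
# x, y
```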

The next section gives a full example of the test command and the use of labels.

One More Case

Remember Lenny? He was the fellow given the task of converting Scribe documents to troff. We had sent him the following script:

# Scribe font change script. 
s/@f1(\([^)]*\))/\\fB\1\\fR/g
/@f1(.*/{
N
s/@f1(\(.*\n[^)]*\))/\\fB\1\\fR/g
P
D
}

He sent the following mail after using the script:

Thank you so much!  You've not only fixed the script but shown me
where I was confused about the way it works.  I can repair the
conversion script so that it works with what you've done, but to be
optimal it should do two more things that I can't seem to get working
at all—maybe it's hopeless and I should be content with what's
there.  

First, I'd like to reduce multiple blank lines down to one.
Second, I'd like to make sed match the pattern over more than two
(say, even only three) lines.  

Thanks again.  

Lenny

The first request, reducing a series of blank lines to one, has already been shown in this chapter. The following four lines perform this function:

/^$/{
N
/^\n$/D
}
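
Wrapped in a shell function (the name `squeeze_blank` is ours), the four lines above can be tried directly. The sample input deliberately avoids trailing blank lines, because sed implementations differ on whether N prints the pattern space when it runs out of input.

```shell
# Hypothetical helper: on a blank line, append the next line; if that
# one is blank too, D drops the first and restarts the cycle without
# reading new input, so a run of blanks shrinks to one.
squeeze_blank() {
    sed '
/^$/{
N
/^\n$/D
}
'
}

printf 'a\n\n\n\nb\n' | squeeze_blank
# Prints "a", a single empty line, then "b".
```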

We want to look mainly at accomplishing the second request. Our previous font-change script created a two-line pattern space, tried to make the match across those lines, and then output the first line. The second line became the first line in the pattern space and control passed to the top of the script where another line was read in.
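
The limitation is easy to demonstrate. The sketch below runs the earlier script, wrapped in a hypothetical `fontchange_v1` function, on an @f1 request spread over two lines, which it converts, and over three lines, which it leaves untouched: by the time the closing parenthesis arrives, the opening @f1( has already been printed by P and removed by D. A POSIX-conforming sed such as GNU sed is assumed.

```shell
# Hypothetical wrapper around the earlier two-line font-change script.
fontchange_v1() {
    sed '
s/@f1(\([^)]*\))/\\fB\1\\fR/g
/@f1(.*/{
N
s/@f1(\(.*\n[^)]*\))/\\fB\1\\fR/g
P
D
}
'
}

# Two lines: converted to \fBhello / world\fR end
printf '%s\n' '@f1(hello' 'world) end' | fontchange_v1
# Three lines: passed through unchanged
printf '%s\n' '@f1(a' 'b' 'c) d' | fontchange_v1
```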

We can use labels to set up a loop that reads multiple lines and makes it possible to match a pattern across multiple lines. The following script sets up two labels: begin at the top of the script and again near the bottom. Look at the improved script:

# Scribe font change script.  New and Improved.
:begin
/@f1(\([^)]*\))/{
s//\\fB\1\\fR/g
b begin
}
/@f1(.*/{
N
s/@f1(\([^)]*\n[^)]*\))/\\fB\1\\fR/g
t again
b begin
}
:again
P
D

Let’s look more closely at this script, which has three parts. Beginning with the line that follows :begin, the first part attempts to match the font change syntax if it is found completely on one line. After making the substitution, the branch command transfers control back to the label begin. In other words, once we have made a match, we want to go back to the top and look for other possible matches, including the instruction that has already been applied—there could be multiple occurrences on the line.

The second part attempts to match the pattern over multiple lines. The Next command builds a multiline pattern space, and the substitute command attempts to locate the pattern with an embedded newline. If it succeeds, the test command passes control to the line following the again label. If no substitution is made, the branch command passes control to the line following the label begin, so that another line can be read in. This is a loop that goes into effect when we’ve matched the beginning sequence of a font change request but have not yet found the ending sequence: sed loops back and keeps appending lines to the pattern space until a match has been found.

The third part is the procedure following the label again. The first line in the pattern space is output and then deleted. As in the previous version of this script, we deal with multiple lines in succession. Control never reaches the bottom of the script; instead, the Delete command redirects it to the top of the script.
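
Here is the improved script wrapped in a shell function (the name `fontchange_v2` and the sample input are our own), run on the three-line case that defeated the earlier version and on a line with two complete requests. A POSIX-conforming sed such as GNU sed is assumed; note that s// reuses the last regular expression, here the address on the line before it.

```shell
# Hypothetical wrapper around the improved font-change script.
fontchange_v2() {
    sed '
:begin
/@f1(\([^)]*\))/{
s//\\fB\1\\fR/g
b begin
}
/@f1(.*/{
N
s/@f1(\([^)]*\n[^)]*\))/\\fB\1\\fR/g
t again
b begin
}
:again
P
D
'
}

# Three lines: \fBa / b / c\fR d, followed by next
printf '%s\n' '@f1(a' 'b' 'c) d' 'next' | fontchange_v2
# One line, two requests: x \fBa\fR y \fBb\fR z
printf '%s\n' 'x @f1(a) y @f1(b) z' | fontchange_v2
```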



[1] The POSIX standard says that an implementation can allow longer labels if it wishes to. GNU sed allows labels to be of any length.