sed & awk, 2nd Edition, by Arnold Robbins. Published by O'Reilly Media, Inc., 1997.

Expressions

The use of expressions in which you can store, manipulate, and retrieve data is quite different from anything you can do in sed, yet it is a common feature of most programming languages.

An expression is evaluated and returns a value. An expression consists of any combination of numeric and string constants, variables, operators, functions, and regular expressions. We covered regular expressions in detail in Chapter 3, and they are summarized in Appendix B. Functions will be discussed fully in Chapter 9. In this section, we will look at expressions consisting of constants, variables, and operators.

There are two types of constants: string and numeric ("red" and 1, for example). A string constant must be quoted in an expression. Strings can make use of the escape sequences listed in Table 7.1.

Table 7.1. Escape Sequences
Sequence    Description
\a          Alert character, usually ASCII BEL character
\b          Backspace
\f          Formfeed
\n          Newline
\r          Carriage return
\t          Horizontal tab
\v          Vertical tab
\ddd        Character represented as 1 to 3 digit octal value
\xhex       Character represented as hexadecimal value[4]
\c          Any literal character c (e.g., \" for ")[5]

[4] POSIX does not provide “\x”, but it is commonly available.

[5] Like ANSI C, POSIX leaves purposely undefined what you get when you put a backslash before any character not listed in the table. In most awks, you just get that character.
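
As a small illustration of Table 7.1, here is a sketch that uses three of the portable sequences. The names and grades in the strings are made up for the example; only \t, \n, and \" are being demonstrated.

```sh
# \t becomes a tab, \n a newline, and \" embeds a double quote in the string.
awk 'BEGIN {
    print "john\t85\nandrea\t89"
    print "He said \"hello\""
}'
```

The first print produces two tab-separated lines from a single string; the second prints the sentence with the double quotes included.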

A variable is an identifier that references a value. To define a variable, you only have to name it and assign it a value. The name can contain only letters, digits, and underscores, and may not start with a digit. Case distinctions in variable names are important: Salary and salary are two different variables. Variables are not declared; you do not have to tell awk what type of value will be stored in a variable. Each variable has a string value and a numeric value, and awk uses the appropriate value based on the context of the expression. (Strings that do not begin with a number have a numeric value of 0.) Variables do not have to be initialized; awk automatically initializes them to the empty string, which acts like 0 if used as a number.

The following expression assigns a value to x:

x = 1

x is the name of the variable, = is an assignment operator, and 1 is a numeric constant.
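
The dual string and numeric nature of variables can be seen in a short sketch. The variable names n, s, and d are hypothetical, chosen only for this illustration; n is deliberately never assigned.

```sh
awk 'BEGIN {
    # n is uninitialized: empty as a string, 0 as a number
    print "[" n "]", n + 0
    # s holds a non-numeric string, so its numeric value is 0
    s = "red"
    print s + 1
    # d holds a numeric string: usable as a number or as a string
    d = "10"
    print d + 5, d "5"
}'
```

This prints "[] 0", then "1", then "15 105". The same variable d yields 15 in an arithmetic context and "105" when concatenated with another string.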

The following expression assigns the string “Hello” to the variable z:

z = "Hello"

A space is the string concatenation operator. The expression:

z = "Hello" "World"

concatenates the two strings and assigns “HelloWorld” to the variable z.
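
A brief sketch of concatenation follows. It assumes nothing beyond the expressions above, except for the "total: " label, which is made up for the example. Note that a blank between the words has to be supplied as a string of its own, and that arithmetic operators bind more tightly than concatenation.

```sh
awk 'BEGIN {
    z = "Hello" "World"
    print z                   # HelloWorld
    z = "Hello" " " "World"   # the blank is itself a string constant
    print z                   # Hello World
    print "total: " 2 + 3     # 2 + 3 is evaluated first: total: 5
}'
```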

The dollar sign ($) operator is used to reference fields. The following expression assigns the value of the first field of the current input record to the variable w:

w = $1
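
Because $ is an operator, its operand can itself be a variable. The following sketch feeds awk one made-up input line; the variable n is a hypothetical name that holds a field number.

```sh
printf 'john 85 92\n' | awk '{
    w = $1        # value of the first field
    n = 3
    print w, $n   # $n is the field whose number is stored in n
}'
```

This prints "john 92".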

A variety of operators can be used in expressions. Arithmetic operators are listed in Table 7.2.

Table 7.2. Arithmetic Operators
Operator    Description
+           Addition
-           Subtraction
*           Multiplication
/           Division
%           Modulo
^           Exponentiation
**          Exponentiation[6]

[6] This is a common extension. It is not in the POSIX standard, and often not in the system documentation, either. Its use is thus nonportable.
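
The portable operators in Table 7.2 can be tried out in a one-line sketch. The operands 7 and 2 are arbitrary, and the nonportable ** form is left out on purpose.

```sh
# Addition, subtraction, multiplication, division, modulo, exponentiation.
awk 'BEGIN { print 7 + 2, 7 - 2, 7 * 2, 7 / 2, 7 % 2, 7 ^ 2 }'
```

This prints "9 5 14 3.5 1 49". Division is not truncated to an integer: 7 / 2 yields 3.5.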

Once a variable has been assigned a value, that value can be referenced using the name of the variable. The following expression adds 1 to the value of x and assigns it to the variable y:

y = x + 1

That is, awk evaluates x, adds 1 to it, and puts the result into the variable y. The statement:

print y

prints the value of y. If the following sequence of statements appears in a script:

x = 1
y = x + 1
print y

then the value of y is 2.

We could reduce these three statements to two:

x = 1
print x + 1

Notice, however, that after the print statement the value of x is still 1. We didn’t change the value of x; we simply added 1 to it and printed that value. In other words, if a third statement print x followed, it would output 1. If, in fact, we wished to accumulate the value in x, we could use the assignment operator +=. This operator combines two operations: in the expression x += 1, it adds 1 to x and assigns the new value to x. Table 7.3 lists the assignment operators used in awk expressions.

Table 7.3. Assignment Operators
Operator    Description
++          Add 1 to variable.
--          Subtract 1 from variable.
+=          Assign result of addition.
-=          Assign result of subtraction.
*=          Assign result of multiplication.
/=          Assign result of division.
%=          Assign result of modulo.
^=          Assign result of exponentiation.
**=         Assign result of exponentiation.[7]

[7] As with “**”, this is a common extension, which is also nonportable.
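
The difference between using a value and assigning it can be sketched as follows. The starting values are arbitrary, and the nonportable **= form is left out.

```sh
awk 'BEGIN {
    x = 1
    print x + 1   # prints 2, but x is unchanged
    print x       # prints 1
    x += 1        # x is now 2
    x *= 5        # 10
    x -= 4        # 6
    x /= 4        # 1.5
    print x
    x = 17
    x %= 5        # remainder of 17 / 5, which is 2
    print x
    x ^= 3        # 2 cubed, which is 8
    print x
}'
```

The output is 2, 1, 1.5, 2, and 8, each on its own line.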

Look at the following example, which counts each blank line in a file.

# Count blank lines.
/^$/ { 
	print x += 1 
     }

Although we didn’t initialize the value of x, we can safely assume that its value is 0 up until the first blank line is encountered. The expression “x += 1” is evaluated each time a blank line is matched and the value of x is incremented by 1. The print statement prints the value returned by the expression. Because we execute the print statement for every blank line, we get a running count of blank lines.

There are different ways to write expressions, some more terse than others. The expression “x += 1” is more concise than the following equivalent expression:

x = x + 1

But neither of these expressions is as terse as the following expression:

++x

“++” is the increment operator. (“--” is the decrement operator.) Each time the expression is evaluated the value of the variable is incremented by one. The increment and decrement operators can appear on either side of the operand, as prefix or postfix operators. The position determines the value that the expression returns.

++x	Increment x before returning value (prefix)
x++	Increment x after returning value (postfix)
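
A quick sketch of the difference, using hypothetical variables a and b to capture the value each form returns:

```sh
awk 'BEGIN {
    x = 5
    a = x++   # a gets the old value 5; x becomes 6
    b = ++x   # x becomes 7 first; b gets 7
    print a, b, x
}'
```

This prints "5 7 7".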

For instance, suppose our example were written as follows:

/^$/ { 
	print x++
     }

When the first blank line is matched, the expression returns the value “0”; the second blank line returns “1”, and so on. If we put the increment operator before x, then the first time the expression is evaluated, it will return “1.”

Let’s implement that expression in our example. In addition, instead of printing a count each time a blank line is matched, we’ll accumulate the count as the value of x and print only the total number of blank lines. The END pattern is the place to put the print that displays the value of x after the last input line is read.

# Count blank lines.
/^$/ { 
	++x
}
END {
	print x
}

Let’s try it on the sample file that has three blank lines in it.

$ awk -f awkscr test
3

The script outputs the number of blank lines.
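
One consequence of automatic initialization is worth a sketch: if the input has no blank lines, x is never assigned, and print x outputs an empty line rather than 0. Adding 0 forces the numeric value. The input lines here are made up, and the x + 0 idiom is a suggested variation, not part of the script above.

```sh
# Input with no blank lines: x is still the empty string at END.
printf 'one\ntwo\n' | awk '/^$/ { ++x } END { print x }'       # prints an empty line
printf 'one\ntwo\n' | awk '/^$/ { ++x } END { print x + 0 }'   # prints 0

# Input with three blank lines: both forms print 3.
printf 'a\n\nb\n\n\nc\n' | awk '/^$/ { ++x } END { print x + 0 }'
```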

Averaging Student Grades

Let’s look at another example, one in which we sum a series of student grades and then calculate the average. Here’s what the input file looks like:

john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84

There are five grades following the student’s name. Here is the script that will give us each student’s average:

# average five grades 
{ total = $2 + $3 + $4 + $5 + $6
  avg = total / 5
  print $1, avg }

This script adds together fields 2 through 6 to get the sum total of the five grades. The value of total is divided by 5 and assigned to the variable avg. (“/” is the operator for division.) The print statement outputs the student’s name and average. Note that we could have skipped the assignment of avg and instead calculated the average as part of the print statement, as follows:

print $1, total / 5

This script shows how easy it is to write programs in awk. Awk parses the input into fields and records. You are spared reading individual characters and declaring data types. Awk does this for you, automatically.

Let’s see a sample run of the script that calculates student averages:

$ awk -f grades.awk grades
john 87.4
andrea 86
jasper 85.6
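
The same ideas combine with the += operator and the END pattern to produce a class-wide average as well. This is only a sketch under stated assumptions: class_total and students are hypothetical variable names, the input is the three sample lines supplied inline, and every student is assumed to have exactly five grades.

```sh
printf '%s\n' 'john 85 92 78 94 88' 'andrea 89 90 75 90 86' 'jasper 84 88 80 92 84' |
awk '{
    total = $2 + $3 + $4 + $5 + $6
    print $1, total / 5
    class_total += total   # accumulate across all input lines
    ++students             # count the students seen so far
}
END {
    # average over every grade read; assumes at least one input line
    print "class", class_total / (students * 5)
}'
```

In addition to the three per-student lines shown above, this prints "class 86.3333". With empty input, students would be 0 and the division in END would fail, so a real script would need to guard against that case.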