sed & awk, 2nd Edition, by Arnold Robbins. Published by O'Reilly Media, Inc., 1997.
Arithmetic Functions

Nine of the built-in functions can be classified as arithmetic functions. Most of them take a numeric argument and return a numeric value. Table 9.1 summarizes these arithmetic functions.

Table 9.1. awk’s Built-In Arithmetic Functions

Awk Function    Description
cos(x)          Returns cosine of x (x is in radians).
exp(x)          Returns e to the power x.
int(x)          Returns truncated value of x.
log(x)          Returns natural logarithm (base-e) of x.
sin(x)          Returns sine of x (x is in radians).
sqrt(x)         Returns square root of x.
atan2(y,x)      Returns arctangent of y/x in the range -π to π.
rand( )         Returns pseudo-random number r, where 0 <= r < 1.
srand(x)        Establishes new seed for rand( ). If no seed is specified, uses time of day. Returns the old seed.

Trigonometric Functions

The trigonometric functions cos( ) and sin( ) work the same way, taking a single argument that is the size of an angle in radians and returning the cosine or sine for that angle. (To convert from degrees to radians, multiply the number by π/180.) The trigonometric function atan2( ) takes two arguments and returns the arctangent of their quotient. The expression

atan2(0, -1)

produces π.
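The degree-to-radian conversion described above can be sketched as a small shell script. This is only an illustration: the function name deg2rad and the variable DEG are made up for this example, and the four-decimal output format is an arbitrary choice.

```sh
# deg2rad -- hypothetical helper: convert degrees to radians
deg2rad() {
    awk -v DEG="$1" 'BEGIN {
        pi = atan2(0, -1)        # awk has no built-in constant for pi
        printf("%.4f\n", DEG * pi / 180)
    }'
}

deg2rad 180     # half a turn, which is pi radians
deg2rad 90
```

Computing π once with atan2(0, -1) avoids typing in a truncated constant by hand.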

The function exp( ) uses the natural exponential, which is also known as base-e exponentiation. The expression

exp(1)

returns 2.71828, the base of the natural logarithms, referred to as e. Thus, exp(x) is e to the x-th power.

The log( ) function gives the inverse of the exp( ) function, the natural logarithm of x. The sqrt( ) function takes a single argument and returns the (positive) square root of that argument.
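Because log( ) only provides the natural logarithm, a logarithm in another base has to be built from it by dividing by the logarithm of the base. The following is a minimal sketch of that idea; the function name log10 and the variable X are assumptions made for this example, not part of awk.

```sh
# log10 -- hypothetical helper: base-10 logarithm built from log( )
log10() {
    awk -v X="$1" 'BEGIN {
        # change of base: log10(x) = ln(x) / ln(10)
        printf("%.4f\n", log(X) / log(10))
    }'
}

log10 1000
log10 2
```

The output is rounded with printf because the quotient of two floating-point logarithms can land a hair off the exact answer.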

Integer Function

The int( ) function truncates a numeric value by removing digits to the right of the decimal point. Look at the following two statements:

print 100/3
print int(100/3)

The output from these statements is shown below:

33.3333
33

The int( ) function simply truncates; it does not round up or down. (Use the printf format “%.0f” to perform rounding.)[1]
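To see the difference between truncating and rounding side by side, here is a small sketch. The function name round_demo and the variable X are invented for the example. It deliberately avoids exact halfway values such as 2.5, since how those come out depends on the underlying printf implementation.

```sh
# round_demo -- hypothetical helper: print int(X), then X rounded by printf
round_demo() {
    awk -v X="$1" 'BEGIN {
        # int( ) drops the fraction; "%.0f" rounds to the nearest integer
        printf("%d %.0f\n", int(X), X)
    }'
}

round_demo 33.7
round_demo -33.7
```

Note that int( ) truncates toward zero, so a negative value such as -33.7 becomes -33, while rounding gives -34.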

Random Number Generation

The rand( ) function generates a pseudo-random floating-point number between 0 and 1. The srand( ) function sets the seed or starting point for random number generation. If srand( ) is called without an argument, it uses the time of day to generate the seed. With an argument x, srand( ) uses x as the seed.

If you don’t call srand( ) at all, awk acts as if srand( ) had been called with a constant argument before your program started, causing you to get the same starting point every time you run your program. This is useful if you want reproducible behavior for testing, but inappropriate if you really do want your program to behave differently every time. Look at the following script:

# rand.awk -- test random number generation
BEGIN {
	print rand( )
	print rand( )
	srand( )
	print rand( )
	print rand( )
}

We print the result of the rand( ) function twice, and then call the srand( ) function before printing the result of the rand( ) function two more times. Let’s run the script.

$ awk -f rand.awk
0.513871
0.175726
0.760277
0.263863

Four random numbers are generated. Now look what happens when we run the program again:

$ awk -f rand.awk
0.513871
0.175726
0.787988
0.305033

The first two “random” numbers are identical to the numbers generated in the previous run of the program, while the last two are different. They differ because the call to srand( ) provided the rand( ) function with a new seed.

The return value of the srand( ) function is the seed it was using. This can be used to keep track of sequences of random numbers, and re-run them if needed.
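As a rough sketch of re-running a sequence, the script below saves the value returned by srand( ) and later hands it back to srand( ). The seeds 42 and 7 are arbitrary, and the names replay_demo, first, old, and again are made up for this illustration.

```sh
# replay_demo -- hypothetical helper: restart a random sequence from a saved seed
replay_demo() {
    awk 'BEGIN {
        srand(42)               # start from a known seed
        first = rand( )
        old = srand(7)          # switch seeds; srand( ) returns the seed it replaced
        other = rand( )         # a number from the new sequence
        srand(old)              # restore the earlier seed...
        again = rand( )         # ...and its sequence starts over
        print old, (first == again ? "same" : "different")
    }'
}

replay_demo
```

The script prints the saved seed followed by the word same, showing that reseeding with the saved value reproduces the first number of the original sequence.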

Pick ’em

To show how to use rand( ), we’ll look at a script that implements a “quick-pick” for a lottery game. This script, named lotto, picks x numbers from a series of numbers 1 to y. Two arguments can be supplied on the command line: how many numbers to pick (the default is 6) and the highest number in the series (the default is 30). Using the default values for x and y, the script generates six unique random numbers between 1 and 30. The numbers are sorted for readability from lowest to highest and output. Before looking at the script itself, let’s run the program:

$ lotto
Pick 6 of 30
9 13 25 28 29 30
$ lotto 7 35
Pick 7 of 35
1 6 9 16 20 22 27

The first example uses the default values to print six random numbers from 1 to 30. The second example prints seven random numbers out of 35.

The full lotto script is fairly complicated, so before looking at the entire script, let’s look at a smaller script that generates a single random number in a series:

awk -v TOPNUM=$1 '
# pick1 - pick one random number out of y 
# main routine
BEGIN {
# seed random number using time of day 
	srand( ) 
# get a random number
	select = 1 + int(rand( ) * TOPNUM)
# print pick
	print select
}'

The shell script expects a single argument from the command line and this is passed into the program as “TOPNUM=$1,” using the -v option. All the action happens in the BEGIN procedure. Since there are no other statements in the program, awk exits when the BEGIN procedure is done.

The main routine first calls the srand( ) function to seed the random number generator. Then we get a random number by calling the rand( ) function:

select = 1 + int(rand( ) * TOPNUM)

It might be helpful to see this expression broken up so each part of it is obvious.

Statement                    Result
print r = rand( )            0.467315
print r * TOPNUM             14.0195
print int(r * TOPNUM)        14
print 1 + int(r * TOPNUM)    15

Because the rand( ) function returns a number from 0 up to, but not including, 1, we multiply it by TOPNUM to get a number from 0 up to, but not including, TOPNUM. We then truncate the number to remove the fractional part, which leaves an integer from 0 to TOPNUM - 1, and add 1 to shift the range to 1 through TOPNUM. The addition is necessary because rand( ) could return 0. In this example, the random number that is generated is 15. You could use this program to print any single number, such as picking a number between 1 and 100.

$ pick1 100
83
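To check the arithmetic at the edges of the range, the expression can be fed fixed values in place of rand( ). This sketch assumes a made-up helper named scale, with the stand-in value passed as R. The value 0.999999 is simply a number close to the largest result rand( ) can return.

```sh
# scale -- hypothetical helper: apply 1 + int(R * TOPNUM) to a fixed value R
scale() {
    awk -v R="$1" -v TOPNUM="$2" 'BEGIN {
        print 1 + int(R * TOPNUM)
    }'
}

scale 0 30          # smallest possible value of rand( )
scale 0.467315 30   # the value used in the table above
scale 0.999999 30   # just under 1
```

The three calls print 1, 15, and 30, so the result never falls below 1 or goes above TOPNUM.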

The lotto script must “pick one” multiple times. Basically, we need to set up a for loop to execute the rand( ) function as many times as needed. One of the reasons this is difficult is that we have to worry about duplicates. In other words, it is possible for a number to be picked again; therefore we have to keep track of the numbers already picked.

Here’s the lotto script:

awk -v NUM=$1 -v TOPNUM=$2 '
# lotto - pick x random numbers out of y 
# main routine
BEGIN {
# test command line args; NUM = $1, how many numbers to pick 
# 	              TOPNUM = $2, last number in series
	if (NUM <= 0) 
		NUM = 6
	if (TOPNUM <= 0) 
		TOPNUM = 30
# print "Pick x of y"
	printf("Pick %d of %d\n", NUM, TOPNUM) 
# seed random number using time and date; do this once
	srand( ) 
# loop until we have NUM selections
	for (j = 1; j <= NUM; ++j) {
		# loop to find a not-yet-seen selection
		do {
			select = 1 + int(rand( ) * TOPNUM)
		} while (select in pick)
		pick[select] = select
	}
# loop through array and print picks.
	for (j in pick) 
		printf("%s ", pick[j])
	printf("\n")
}'

Unlike the previous program, this one looks for two command-line arguments, indicating x numbers out of y. The main routine looks to see if these numbers were supplied and if not, assigns default values.

There is only one array, pick, for holding the random numbers that are selected. Each number is guaranteed to be in the desired range, because the result of rand( ) (a value from 0 up to, but not including, 1) is multiplied by TOPNUM, truncated, and then increased by 1. The heart of the script is a loop that occurs NUM times to assign NUM elements to the pick array.

To get a new non-duplicate random number, we use an inner loop that generates selections and tests to see if they are in the pick array. (Using the in operator is much faster than looping through the array comparing subscripts.) While select in pick is true, that number has been picked already, so the selection is a duplicate and we reject it by generating another. Once select in pick is false, we assign select to an element of the pick array. From then on, an in test for that number returns true, so the do loop continues if the same number comes up again.
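One consequence of rejecting duplicates is worth noting: if NUM is larger than TOPNUM, there are not enough distinct numbers to go around, and the do loop never finishes. The following is a sketch of one possible guard, not part of the lotto script itself. The name lotto_safe is made up, and capping NUM at TOPNUM is an assumption about the desired behavior (printing an error and exiting would be another reasonable choice).

```sh
# lotto_safe -- hypothetical variant of lotto that caps NUM at TOPNUM
lotto_safe() {
    awk -v NUM="$1" -v TOPNUM="$2" 'BEGIN {
        if (NUM <= 0)
            NUM = 6
        if (TOPNUM <= 0)
            TOPNUM = 30
        # assumption: only TOPNUM distinct values exist, so never ask for more
        if (NUM > TOPNUM)
            NUM = TOPNUM
        srand( )
        for (j = 1; j <= NUM; ++j) {
            # loop to find a not-yet-seen selection
            do {
                select = 1 + int(rand( ) * TOPNUM)
            } while (select in pick)
            pick[select] = select
        }
        for (j in pick)
            printf("%s ", pick[j])
        printf("\n")
    }'
}

lotto_safe 10 5     # prints the numbers 1 to 5 in some order
```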

Finally, the program loops through the pick array and prints the elements. This version of the lotto script leaves one thing out. See if you can tell what it is if we run it again:

$ lotto 7 35
Pick 7 of 35
5 21 9 30 29 20 2

That’s right, the numbers are not sorted. We’ll defer showing the code for the sort routine until we discuss user-defined functions. While it’s not necessary to have written the sorting code as a function, it makes a lot of sense. One reason is that you can tackle a more generalized problem and retain the solution for use in other programs. Later on, we will write a function that sorts the elements of an array.

Note that the pick array isn’t ready for sorting, since its indices are the same as its values, not numbers in order. We would have to set up a separate array for sorting by our sort function:

# create a numerically indexed array for sorting
i = 1
for (j in pick)
	sortedpick[i++] = pick[j]
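To make the idea concrete, here is a minimal sketch that copies a stand-in pick array into a numerically indexed array and then sorts it with a simple insertion sort. It is only an illustration and not necessarily the sort function developed later. The name sort_demo, the sample values, and the choice of insertion sort are all assumptions made for this example.

```sh
# sort_demo -- hypothetical helper: copy pick into sortedpick and sort it
sort_demo() {
    awk 'BEGIN {
        # stand-in for the pick array: indices are the same as values
        pick[21] = 21; pick[5] = 5; pick[30] = 30; pick[9] = 9
        # create a numerically indexed array for sorting
        n = 0
        for (j in pick)
            sortedpick[++n] = pick[j]
        # insertion sort on sortedpick[1] through sortedpick[n]
        for (i = 2; i <= n; ++i) {
            v = sortedpick[i]
            for (k = i - 1; k >= 1 && sortedpick[k] > v; --k)
                sortedpick[k + 1] = sortedpick[k]
            sortedpick[k + 1] = v
        }
        # print the sorted values on one line
        for (i = 1; i <= n; ++i)
            printf("%s%s", sortedpick[i], (i < n ? " " : "\n"))
    }'
}

sort_demo
```

The numeric indices matter here: the sort walks the array by position, which is not possible with the pick array because a for (j in pick) loop visits its elements in no particular order.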

The lotto program is set up to do everything in the BEGIN block. No input is processed. You could, however, revise this script to read a list of names from a file and for each name generate a “quick-pick.”



[1] The way printf does rounding is discussed in Appendix B.