Table of Contents for sed & awk, 2nd Edition


sed & awk, 2nd Edition, by Dale Dougherty and Arnold Robbins. Published by O'Reilly Media, Inc., 1997.
  sed & awk, 2nd Edition
  Cover
  sed & awk, 2nd Edition
  A Note Regarding Supplemental Files
  Dedication
  Preface
    Scope of This Handbook
    Availability of sed and awk
    Obtaining Example Source Code
    Conventions Used in This Handbook
    About the Second Edition
    Acknowledgments from the First Edition
    Comments and Questions
  1. Power Tools for Editing
    1.1. May You Solve Interesting Problems
    1.2. A Stream Editor
    1.3. A Pattern-Matching Programming Language
    1.4. Four Hurdles to Mastering sed and awk
  2. Understanding Basic Operations
    2.1. Awk, by Sed and Grep, out of Ed
    2.2. Command-Line Syntax
    2.3. Using sed
    2.4. Using awk
    2.5. Using sed and awk Together
  3. Understanding Regular Expression Syntax
    3.1. That’s an Expression
    3.2. A Line-Up of Characters
    3.3. I Never Metacharacter I Didn’t Like
  4. Writing sed Scripts
    4.1. Applying Commands in a Script
    4.2. A Global Perspective on Addressing
    4.3. Testing and Saving Output
    4.4. Four Types of sed Scripts
    4.5. Getting to the PromiSed Land
  5. Basic sed Commands
    5.1. About the Syntax of sed Commands
    5.2. Comment
    5.3. Substitution
    5.4. Delete
    5.5. Append, Insert, and Change
    5.6. List
    5.7. Transform
    5.8. Print
    5.9. Print Line Number
    5.10. Next
    5.11. Reading and Writing Files
    5.12. Quit
  6. Advanced sed Commands
    6.1. Multiline Pattern Space
    6.2. A Case for Study
    6.3. Hold That Line
    6.4. Advanced Flow Control Commands
    6.5. To Join a Phrase
  7. Writing Scripts for awk
    7.1. Playing the Game
    7.2. Hello, World
    7.3. Awk’s Programming Model
    7.4. Pattern Matching
    7.5. Records and Fields
    7.6. Expressions
    7.7. System Variables
    7.8. Relational and Boolean Operators
    7.9. Formatted Printing
    7.10. Passing Parameters Into a Script
    7.11. Information Retrieval
  8. Conditionals, Loops, and Arrays
    8.1. Conditional Statements
    8.2. Looping
    8.3. Other Statements That Affect Flow Control
    8.4. Arrays
    8.5. An Acronym Processor
    8.6. System Variables That Are Arrays
  9. Functions
    9.1. Arithmetic Functions
    9.2. String Functions
    9.3. Writing Your Own Functions
  10. The Bottom Drawer
    10.1. The getline Function
    10.2. The close( ) Function
    10.3. The system( ) Function
    10.4. A Menu-Based Command Generator
    10.5. Directing Output to Files and Pipes
    10.6. Generating Columnar Reports
    10.7. Debugging
    10.8. Limitations
    10.9. Invoking awk Using the #! Syntax
  11. A Flock of awks
    11.1. Original awk
    11.2. Freely Available awks
    11.3. Commercial awks
    11.4. Epilogue
  12. Full-Featured Applications
    12.1. An Interactive Spelling Checker
    12.2. Generating a Formatted Index
    12.3. Spare Details of the masterindex Program
  13. A Miscellany of Scripts
    13.1. uutot.awk—Report UUCP Statistics
    13.2. phonebill—Track Phone Usage
    13.3. combine—Extract Multipart uuencoded Binaries
    13.4. mailavg—Check Size of Mailboxes
    13.5. adj—Adjust Lines for Text Files
    13.6. readsource—Format Program Source Files for troff
    13.7. gent—Get a termcap Entry
    13.8. plpr—lpr Preprocessor
    13.9. transpose—Perform a Matrix Transposition
    13.10. m1—Simple Macro Processor
  A. Quick Reference for sed
    A.1. Command-Line Syntax
    A.2. Syntax of sed Commands
    A.3. Command Summary for sed
  B. Quick Reference for awk
    B.1. Command-Line Syntax
    B.2. Language Summary for awk
    B.3. Command Summary for awk
  C. Supplement for Chapter 12
    C.1. Full Listing of spellcheck.awk
    C.2. Listing of masterindex Shell Script
    C.3. Documentation for masterindex
      masterindex
      C.3.1. Background Details
      C.3.2. Coding Index Entries
      C.3.3. Output Format
      C.3.4. Compiling a Master Index
  Index
  About the Authors
  Colophon
  Copyright

Command Summary for awk

The following alphabetical list of statements and functions includes all that are available in POSIX awk, nawk, or gawk. See Chapter 11 for extensions available in different implementations.

atan2( )

atan2(y, x)

Return the arctangent of y/x in radians. The signs of both arguments determine the quadrant, so the result lies between -π and π.
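A minimal sketch, run from a POSIX shell with awk on the PATH; the variable names are invented for illustration. It shows the common 4 * atan2(1, 1) idiom for π and how atan2 places a point in the correct quadrant.

```sh
# atan2(1, 1) is pi/4, so four times it gives pi.
pi=$(awk 'BEGIN { printf "%.5f\n", 4 * atan2(1, 1) }')
echo "$pi"      # 3.14159

# The point (x = -1, y = 1) is in the second quadrant: 3*pi/4.
q2=$(awk 'BEGIN { printf "%.4f\n", atan2(1, -1) }')
echo "$q2"      # 2.3562
```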

break

Exit from a while, for, or do loop.
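A small sketch of break, assuming a POSIX shell; the input is made up. For each line it reports the first negative field and stops scanning that line.

```sh
out=$(printf '3 7 -2 9\n4 -1 -5 0\n' | awk '{
    for (i = 1; i <= NF; i++)
        if ($i < 0) {
            print NR ": field " i " = " $i
            break              # leave the for loop; later fields are skipped
        }
}')
echo "$out"
```

Only -2 and -1 are reported; -5 on the second line is never examined.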

close( )

close(filename-expr)

close(command-expr)

In most implementations of awk, you can only have a limited number of files and/or pipes open simultaneously. Therefore, awk provides a close( ) function that allows you to close a file or a pipe. It takes as an argument the same expression that opened the pipe or file. This expression must be identical, character by character, to the one that opened the file or pipe—even whitespace is significant.
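A sketch, assuming a POSIX shell with mktemp; the temporary file is only for illustration. It writes a file, closes it with the same expression that opened it, and then reads it back.

```sh
tmp=$(mktemp)
count=$(awk -v f="$tmp" 'BEGIN {
    print "alpha" > f
    print "beta"  > f          # same open file, so this line is added, not truncated
    close(f)                   # close with the identical expression before rereading
    n = 0
    while ((getline line < f) > 0)
        n++
    close(f)
    print n
}')
echo "$count"    # 2
rm -f "$tmp"
```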

continue

Begin next iteration of while, for, or do loop.
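A sketch of continue, assuming a POSIX shell and invented input: each line's positive fields are summed, and non-positive fields are skipped. In a for loop, continue still runs incr-expr before the next test.

```sh
sums=$(printf '1 -2 3\n-4 5 6\n' | awk '{
    total = 0
    for (i = 1; i <= NF; i++) {
        if ($i <= 0)
            continue           # skip to i++ and the next test
        total += $i
    }
    print total
}')
echo "$sums"
```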

cos( )

cos(x)

Return the cosine of x, where x is in radians.
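A quick check, assuming a POSIX shell; π is computed with the atan2 idiom because awk has no built-in constant for it.

```sh
out=$(awk 'BEGIN {
    pi = 4 * atan2(1, 1)
    printf "%.4f %.4f\n", cos(0), cos(pi)
}')
echo "$out"    # 1.0000 -1.0000
```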

delete

delete array[element]

Delete element of an array.
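A sketch, assuming a POSIX shell; the stock array and its keys are hypothetical. After delete, the in operator reports that the element is gone; unlike a plain reference such as stock["pears"], in does not re-create the element.

```sh
out=$(awk 'BEGIN {
    stock["apples"] = 3
    stock["pears"] = 0
    delete stock["pears"]
    has_apples = ("apples" in stock)    # membership test, creates nothing
    has_pears = ("pears" in stock)
    print has_apples, has_pears
}')
echo "$out"    # 1 0
```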

do

do
    body
while (expr)

Looping statement. Execute the statements in body, then evaluate expr; if it is true, execute body again. Because the test comes after the body, body always runs at least once.
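A sketch, assuming a POSIX shell. The condition is false from the start, but the body still runs once.

```sh
out=$(awk 'BEGIN {
    i = 10
    do {
        print "i is", i
        i++
    } while (i < 3)            # tested only after the body has run
}')
echo "$out"    # i is 10
```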

exit

exit [expr]

Exit from the script, reading no new input. The END rule, if it exists, will be executed; an exit inside the END rule ends the program immediately. An optional expr becomes awk’s exit status.
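A sketch, assuming a POSIX shell and invented input. awk stops after the second record, the END rule still runs, and the shell sees exit status 3. The `|| status=$?` form keeps the snippet safe under `set -e`.

```sh
status=0
out=$(printf 'a\nb\nc\n' | awk '
    { print "read", $0; if (NR == 2) exit 3 }
    END { print "END ran after", NR, "records" }
') || status=$?
echo "$out"
echo "exit status: $status"    # exit status: 3
```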

exp( )

exp(x)

Return exponential of x (e ^ x).
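A quick check, assuming a POSIX shell: exp(1) is e, and exp(0) is 1.

```sh
out=$(awk 'BEGIN { printf "%.5f %.5f\n", exp(1), exp(0) }')
echo "$out"    # 2.71828 1.00000
```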

for

for (init-expr; test-expr; incr-expr) statement

C-style looping construct. init-expr assigns the initial value of the counter variable. test-expr is a relational expression that is evaluated each time before executing the statement. When test-expr is false, the loop is exited. incr-expr is used to increment the counter variable after each pass.
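A sketch, assuming a POSIX shell and a made-up input line, that counts downward to print the fields in reverse order.

```sh
out=$(echo 'one two three' | awk '{
    for (i = NF; i > 1; i--)     # init; test before each pass; decrement after
        printf "%s ", $i
    print $1
}')
echo "$out"    # three two one
```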

for (item in array) statement

Special loop designed for reading associative arrays. For each index in the array, item is set to that index and statement is executed; the corresponding element can be referenced as array[item]. The order in which the indices are visited is unspecified.
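A word-count sketch, assuming a POSIX shell and invented input. Because the for-in order is unspecified, the output is piped through sort to make it predictable.

```sh
out=$(printf 'b a b\nc b a\n' | awk '{
    for (i = 1; i <= NF; i++)
        count[$i]++
}
END {
    for (word in count)        # word takes each index in turn
        print word, count[word]
}' | sort)
echo "$out"
```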

getline

Read next line of input.

getline [var] [<file]

command | getline [var]

The first form reads from file, or from the current input if no file is given; the second form reads the output of command. Both forms read one line at a time, and each time the statement is executed it gets the next line of input. Without var, the line is assigned to $0 and parsed into fields, setting NF. If var is specified, the line is assigned to var instead, and $0 and NF are not changed. Reading from the current input also updates NR and FNR, and command | getline updates NR; getline <file changes neither. getline is actually a function: it returns 1 if it reads a record successfully, 0 if end-of-file is encountered, and -1 if for some reason it is otherwise unsuccessful (for example, if file cannot be opened).
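Two small sketches, assuming a POSIX shell; the input, the command string, and the variable names are invented. The first joins pairs of input lines with getline var, so $0 keeps the first line of each pair. The second reads a command's output until getline stops returning 1.

```sh
# getline var: join each pair of input lines as key=value.
pairs=$(printf 'k1\nv1\nk2\nv2\n' | awk '{
    key = $0
    if ((getline value) > 0)        # 1 means a line was read
        print key "=" value
}')
echo "$pairs"

# command | getline var: testing "> 0" stops on end of file (0) and on errors (-1).
lines=$(awk 'BEGIN {
    cmd = "echo first; echo second"
    n = 0
    while ((cmd | getline line) > 0) {
        n++
        last = line
    }
    close(cmd)
    print n, last
}')
echo "$lines"    # 2 second
```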

gsub( )

gsub(r, s [, t])

Globally substitute s for each match of the regular expression r in the string t. Return the number of substitutions. If t is not supplied, it defaults to $0. In s, an & stands for the matched text; write “\\&” to get a literal ampersand.
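A sketch, assuming a POSIX shell and a made-up input line. Every match of /cat/ in $0 is wrapped in brackets, with & supplying the matched text, and the return value counts the replacements.

```sh
out=$(echo 'cat scat category' | awk '{
    n = gsub(/cat/, "[&]")     # target defaults to $0
    print n ": " $0
}')
echo "$out"    # 3: [cat] s[cat] [cat]egory
```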

if

if (expr) statement1

[ else statement2 ]

Conditional statement. Evaluate expr and, if true, execute statement1; if else clause is supplied, execute statement2 if expr is false.
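A sketch, assuming a POSIX shell; the threshold of 10 and the input values are arbitrary.

```sh
out=$(printf '15\n4\n' | awk '{
    if ($1 > 10)
        print $1, "is large"
    else
        print $1, "is small"
}')
echo "$out"
```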

index( )

index(str, substr)

Return the position (starting at 1) of the first occurrence of substr in str, or 0 if substr does not occur.
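A quick check, assuming a POSIX shell.

```sh
out=$(awk 'BEGIN { print index("banana", "an"), index("banana", "x") }')
echo "$out"    # 2 0
```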

int( )

int(x)

Return the integer part of x by truncating any digits following the decimal point. Truncation is toward zero, so int(-3.9) is -3.
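A quick check, assuming a POSIX shell. It shows truncation toward zero and the conversion of a string's leading numeric part.

```sh
out=$(awk 'BEGIN { print int(3.9), int(-3.9), int("42abc") }')
echo "$out"    # 3 -3 42
```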

length( )

length([str])

Return the length of str, or the length of $0 if no argument is given.
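A sketch, assuming a POSIX shell. It uses the bare length form (the length of $0), an explicit argument, and a number, which is converted to a string first.

```sh
out=$(echo 'two words' | awk '{ print length, length($1), length(12345) }')
echo "$out"    # 9 3 5
```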

log( )

log(x)

Return natural logarithm (base e) of x.
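A sketch, assuming a POSIX shell. awk has no base-10 logarithm in this list, so the change-of-base formula log(x) / log(10) is shown as one way to get it.

```sh
out=$(awk 'BEGIN {
    printf "%.4f\n", log(exp(2))            # log undoes exp
    printf "%.4f\n", log(1000) / log(10)    # base-10 log via change of base
}')
echo "$out"
```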

match( )

match(s, r)

Function that matches the pattern, specified by the regular expression r, in the string s and returns either the position in s where the match begins, or 0 if no occurrences are found. It sets RSTART and RLENGTH to the start and length of the match, respectively; if there is no match, RSTART is set to 0 and RLENGTH to -1.
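A sketch, assuming a POSIX shell and a made-up input line. RSTART and RLENGTH are passed to substr( ) to pull out the matched digits. The second command shows the values after a failed match.

```sh
out=$(echo 'order id: A1234, shipped' | awk '{
    if (match($0, /[0-9]+/))
        print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
}')
echo "$out"     # 12 4 1234

none=$(awk 'BEGIN { r = match("abc", /z/); print r, RSTART, RLENGTH }')
echo "$none"    # 0 0 -1
```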

next

Stop processing the current line, read the next input line, and begin executing the script again at the first rule.
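A sketch, assuming a POSIX shell and invented input. Comment lines are skipped with next, so the second rule never sees them.

```sh
out=$(printf '# header\ndata 1\n# note\ndata 2\n' | awk '
    /^#/ { next }              # abandon this record; go back to the first rule
    { print "got:", $2 }
')
echo "$out"
```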

print

print [ output-expr ] [ dest-expr ]

Evaluate each output-expr and write the results to standard output, separated by the value of OFS and followed by the value of ORS. With no output-expr, print prints $0. dest-expr is an optional expression that directs the output to a file or pipe. “> file” directs the output to a file, overwriting its previous contents when the file is first opened; later output to the same file during the same run is added to the end. “>> file” appends the output to a file, preserving its previous contents. In both of these cases, the file will be created if it does not already exist. “| command” sends the output to the standard input of command.
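A sketch, assuming a POSIX shell with mktemp and sort; the directory, file names, and input are invented. It splits input into one file per first field, then pipes output through sort.

```sh
dir=$(mktemp -d)
printf 'even 2\nodd 1\neven 4\n' | awk -v d="$dir" '{
    # parenthesize the file name so the concatenation is not misparsed
    print $2 > (d "/" $1 ".txt")
}'
evens=$(cat "$dir/even.txt")    # both 2 and 4: the file was truncated only once
odds=$(cat "$dir/odd.txt")
rm -rf "$dir"

# "| command": the lines are written to the standard input of sort.
sorted=$(printf 'pear\napple\n' | awk '{ print | "sort" }')
echo "$sorted"
```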

printf

printf (format-expr [, expr-list ]) [ dest-expr ]

An alternative output statement borrowed from the C language. It has the ability to produce formatted output. It can also be used to output data without automatically producing a newline. format-expr is a string of format specifications and constants; see next section for a list of format specifiers. expr-list is a list of arguments corresponding to format specifiers. See the print statement for a description of dest-expr.
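A sketch, assuming a POSIX shell; the values are arbitrary. It shows a left-justified string, a fixed-precision number, and a printf that ends without a newline.

```sh
out=$(awk 'BEGIN {
    printf "%-8s|%6.2f|\n", "apples", 3.14159
    printf "total: "           # printf adds no newline of its own
    printf "%d items\n", 12
}')
echo "$out"
```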

rand( )

rand( )

Return a random number greater than or equal to 0 and less than 1. In many implementations, this function returns the same series of numbers each time the script is executed, unless the random number generator is seeded using the srand( ) function.
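A sketch of the common die-roll idiom, assuming a POSIX shell. Because rand( ) is less than 1, int(rand() * 6) is 0 through 5, and adding 1 gives 1 through 6. The actual numbers vary from run to run.

```sh
rolls=$(awk 'BEGIN {
    srand()                    # seed from the time of day
    for (i = 1; i <= 5; i++)
        printf "%d ", int(rand() * 6) + 1
    print ""
}')
echo "$rolls"
```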

return

return [expr]

Exit from a user-defined function, returning the value of expr if one is given. return may appear anywhere in the function body, not only at the end.
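A sketch, assuming a POSIX shell; max is a hypothetical helper written for this example, not a built-in.

```sh
out=$(awk '
# max: return the larger of two numbers (hypothetical helper)
function max(a, b) {
    if (a > b)
        return a               # leaves the function immediately
    return b
}
BEGIN { print max(3, 7), max(10, -2) }')
echo "$out"    # 7 10
```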

sin( )

sin(x)

Return the sine of x, where x is in radians.
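A sketch, assuming a POSIX shell. An angle in degrees must be converted to radians before it is passed to sin( ).

```sh
out=$(awk 'BEGIN {
    pi = 4 * atan2(1, 1)
    deg = 30
    printf "%.4f\n", sin(deg * pi / 180)   # degrees to radians first
}')
echo "$out"    # 0.5000
```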

split( )

split(str, array [, sep])

Split str into elements of array, using sep as the field separator, and return the number of elements. The first piece is stored in array[1]. If sep is omitted, the value of FS is used. Array splitting works the same as field splitting.
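A sketch, assuming a POSIX shell; the date string and array names are invented. The second call relies on the default FS, so runs of blanks separate pieces and leading and trailing blanks are ignored.

```sh
out=$(awk 'BEGIN {
    n = split("2024-05-17", part, "-")   # explicit separator
    m = split("  a   b  ", word)          # default FS
    print n, part[1], part[3], m, word[1] word[2]
}')
echo "$out"    # 3 2024 17 2 ab
```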

sprintf( )

sprintf (format-expr [, expr-list ] )

Function that returns string formatted according to printf format specification. It formats data but does not output it. format-expr is a string of format specifications and constants; see the next section for a list of format specifiers. expr-list is a list of arguments corresponding to format specifiers.
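A sketch, assuming a POSIX shell; the label format is made up. sprintf( ) builds a zero-padded string and stores it in a variable instead of printing it.

```sh
out=$(awk 'BEGIN {
    label = sprintf("item-%03d", 7)   # formatted, but not output
    print label, length(label)
}')
echo "$out"    # item-007 8
```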

sqrt( )

sqrt(x)

Return square root of x.
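A quick check, assuming a POSIX shell. The whole number 5 prints as an integer, while sqrt(2) is printed with the default “%.6g” format.

```sh
out=$(awk 'BEGIN { print sqrt(3*3 + 4*4), sqrt(2) }')
echo "$out"    # 5 1.41421
```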

srand( )

srand([expr])

Use expr to set a new seed for the random number generator. If expr is omitted, the time of day is used. The return value is the previous seed.
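A sketch, assuming a POSIX shell; the seed 42 is arbitrary. Reseeding with the same value restarts the same sequence, which is useful for repeatable test runs.

```sh
same=$(awk 'BEGIN {
    srand(42); a = rand()
    srand(42); b = rand()      # same seed, same first number
    if (a == b) print "same"; else print "different"
}')
echo "$same"    # same
```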

sub( )

sub(r, s [, t])

Substitute s for the first match of the regular expression r in the string t. Return 1 if successful; 0 otherwise. If t is not supplied, it defaults to $0. As with gsub( ), an & in s stands for the matched text.
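A sketch, assuming a POSIX shell and invented strings. Only the first match is replaced, either in $0 or in a variable passed as the third argument.

```sh
out=$(echo 'a-b-c' | awk '{
    n = sub(/-/, "+")          # target defaults to $0; first match only
    s = "x.y.z"
    sub(/\./, "_", s)          # third argument: edit a variable instead
    print n, $0, s
}')
echo "$out"    # 1 a+b-c x_y.z
```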

substr( )

substr(str, beg [, len])

Return the substring of str that starts at position beg (the first character is position 1) and is at most len characters long. If len is not given, return the rest of the string from beg onward.
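A sketch, assuming a POSIX shell.

```sh
out=$(awk 'BEGIN {
    s = "sed & awk"
    print substr(s, 7)         # from position 7 to the end
    print substr(s, 1, 3)      # first three characters
}')
echo "$out"
```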

system( )

system(command)

Function that executes the specified command and returns its status. The status of the executed command typically indicates success or failure. A value of 0 means that the command executed successfully. A non-zero value, whether positive or negative, indicates a failure of some sort. The documentation for the command you’re running will give you the details. The output of the command is not available for processing within the awk script. Use "command | getline" to read the output of a command into the script.
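A sketch, assuming a POSIX shell with the standard test and echo utilities; the directory paths are only examples. system( ) reports success or failure, and command | getline is used when the output itself is needed.

```sh
out=$(awk 'BEGIN {
    ok  = system("test -d /")                 # 0: the command succeeded
    bad = system("test -d /no/such/dir")      # non-zero: it failed
    failed = (bad != 0)
    "echo captured" | getline line            # getline captures output
    close("echo captured")
    print ok, failed, line
}')
echo "$out"    # 0 1 captured
```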

tolower( )

tolower(str)

Translate all uppercase characters in str to lowercase and return the new string.[3]
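A sketch, assuming a POSIX shell and invented log lines. Lowercasing the record before matching makes the match case-insensitive.

```sh
out=$(printf 'Error: disk\nok\nERROR again\n' | awk '
    tolower($0) ~ /error/ { n++ }
    END { print n }
')
echo "$out"    # 2
```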

toupper( )

toupper(str)

Translate all lowercase characters in str to uppercase and return the new string.
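A sketch, assuming a POSIX shell, that capitalizes the first letter of every field by combining toupper( ) with substr( ). Assigning to a field rebuilds $0 using OFS.

```sh
out=$(echo 'hello world' | awk '{
    for (i = 1; i <= NF; i++)
        $i = toupper(substr($i, 1, 1)) substr($i, 2)
    print
}')
echo "$out"    # Hello World
```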

while

while (expr) statement

Looping construct. While expr is true, execute statement.
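A sketch, assuming a POSIX shell, using a while loop to compute a greatest common divisor with Euclid's algorithm. The starting values are arbitrary.

```sh
out=$(awk 'BEGIN {
    a = 84; b = 36
    while (b != 0) {           # tested before every pass
        t = a % b
        a = b
        b = t
    }
    print a
}')
echo "$out"    # 12
```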

Format Expressions Used in printf and sprintf

A format expression can take three optional modifiers following “%” and preceding the format specifier:

%-width.precision format-specifier

The width of the output field is a numeric value. When you specify a field width, the contents of the field will be right-justified by default. You must specify “-” to get left-justification. Thus, “%-20s” outputs a string left-justified in a field 20 characters wide. If the string is shorter than 20 characters, the field is padded with spaces; if it is longer, the field expands to hold it, because the width is only a minimum.

The precision modifier controls the number of digits that appear to the right of the decimal point for the e, E, and f formats, and the maximum number of significant digits for g and G. For d and i, it sets the minimum number of digits to print. For string formats, it controls the maximum number of characters from the string to print.
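A sketch, assuming a POSIX shell; the brackets only make the field boundaries visible.

```sh
out=$(awk 'BEGIN {
    printf "[%8.3f]\n", 3.14159    # width 8, three digits after the point
    printf "[%-6s]\n", "ab"        # left-justified, padded to 6
    printf "[%.3s]\n", "abcdef"    # precision truncates a string
    printf "[%3s]\n", "abcdef"     # width is a minimum; the field grows
}')
echo "$out"
```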

You can specify both the width and precision dynamically, via values in the printf or sprintf argument list. You do this by specifying asterisks, instead of specifying literal values.

printf("%*.*g\n", 5, 3, myvar);

In this example, the width is 5, the precision is 3, and the value to print will come from myvar. Older versions of nawk may not support this.

Note that the default format for numeric values output by print is “%.6g”. The default can be changed by setting the system variable OFMT. This affects the precision used by the print statement when outputting numbers; values that are whole numbers are still printed as integers. For instance, if you are using awk to write reports that contain dollar values, you might prefer to change OFMT to “%.2f”.
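A sketch, assuming a POSIX shell. Changing OFMT alters how print shows a non-integral number, but a whole number such as 10 still prints without decimals.

```sh
out=$(awk 'BEGIN {
    x = 3.14159
    print x                    # default "%.6g"
    OFMT = "%.2f"
    print x                    # now two decimals
    print 10                   # integral value: printed as an integer
}')
echo "$out"
```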

The format specifiers, shown in Table B.6, are used with the printf statement and the sprintf( ) function.

Table B.6. Format Specifiers Used in printf

Character   Description
c           ASCII character.
d           Decimal integer.
i           Decimal integer. Added in POSIX.
e           Floating-point format ([-]d.precisione[+-]dd).
E           Floating-point format ([-]d.precisionE[+-]dd).
f           Floating-point format ([-]ddd.precision).
g           e or f conversion, whichever is shorter, with trailing zeros removed.
G           E or f conversion, whichever is shorter, with trailing zeros removed.
o           Unsigned octal value.
s           String.
x           Unsigned hexadecimal number. Uses a-f for 10 to 15.
X           Unsigned hexadecimal number. Uses A-F for 10 to 15.
%           Literal %.
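A sketch, assuming a POSIX shell, that exercises most of the specifiers in Table B.6 on arbitrary values. %i is left out because some older awks may not accept it.

```sh
out=$(awk 'BEGIN {
    printf "%c|%d|%o|%x|%X|%e|%g|%s|%%\n", 65, 255, 8, 255, 255, 1234.5, 0.0001, "str"
}')
echo "$out"    # A|255|10|ff|FF|1.234500e+03|0.0001|str|%
```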

Often, whatever format specifiers are available in the system’s sprintf(3) subroutine are available in awk.

The way printf and sprintf( ) do rounding will often depend upon the system’s C sprintf(3) subroutine. On many machines, sprintf rounding is “unbiased,” which means it doesn’t always round a trailing “.5” up, contrary to naive expectations. In unbiased rounding, “.5” rounds to even, rather than always up, so 1.5 rounds to 2 but 4.5 rounds to 4. The result is that if you are using a format that does rounding (e.g., “%.0f”) you should check what your system does. The following function does traditional rounding; it might be useful if your awk’s printf does unbiased rounding.

# round --- do normal rounding
#	Arnold Robbins, arnold@gnu.ai.mit.edu
#	Public Domain
function round(x,       ival, aval, fraction)
{
        ival = int(x)	# integer part, int( ) truncates
	# see if fractional part
	if (ival == x)	# no fraction
		return x
	if (x < 0) {
		aval = -x	# absolute value
		ival = int(aval)
		fraction = aval - ival
		if (fraction >= .5)
			return int(x) - 1		# -2.5 --> -3
		else
			return int(x)		# -2.3 --> -2
	} else {
		fraction = x - ival
		if (fraction >= .5)
			return ival + 1
		else
			return ival
	}
}
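To see which kind of rounding your awk's printf uses, a quick check like this may help, assuming a POSIX shell. The inputs are exact halves, so binary representation does not blur the result. With unbiased (round-half-even) rounding you would expect “0 2 2 4”, and with traditional rounding “1 2 3 5”. The round( ) function above returns 1, 2, 3, and 5 for these inputs.

```sh
out=$(awk 'BEGIN { printf "%.0f %.0f %.0f %.0f\n", 0.5, 1.5, 2.5, 4.5 }')
echo "$out"
```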


[3] Very early versions of nawk, such as that in SunOS 4.1.x, don’t support tolower( ) and toupper( ). However, they are now part of the POSIX specification for awk.