sed & awk, 2nd Edition, by Dale Dougherty and Arnold Robbins. Published by O'Reilly Media, Inc., 1997.

The getline Function

The getline function is used to read another line of input. Not only can getline read from the regular input data stream, it can also handle input from files and pipes.

The getline function is similar to awk’s next statement. Both cause the next input line to be read, but the next statement passes control back to the top of the script, whereas getline reads the next line and lets execution continue with the statement that follows it. Possible return values are:

1    If it was able to read a line.
0    If it encounters the end-of-file.
-1   If it encounters an error.
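The three return values can be observed directly. The following is a minimal sketch, assuming a POSIX shell and a modern awk; the function name getline_status_demo and the temporary file names are hypothetical.

```shell
#!/bin/sh
# getline_status_demo (hypothetical name): print the value getline
# returns for a successful read, for end-of-file, and for an error.
getline_status_demo() {
    tmp=$(mktemp -d)
    printf 'only line\n' > "$tmp/one.txt"
    awk -v f="$tmp/one.txt" -v missing="$tmp/no-such-file" 'BEGIN {
        r1 = (getline line < f)        # 1: a line was read
        r2 = (getline line < f)        # 0: end-of-file reached
        r3 = (getline line < missing)  # -1: the file cannot be opened
        print r1, r2, r3
    }'
    rm -r "$tmp"
}

getline_status_demo    # prints: 1 0 -1
```

The nonexistent file is what produces -1 here; a file that exists but is empty would give 0 on the first call instead.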

Note

Although getline is called a function and it does return a value, its syntax resembles a statement. Do not write getline( ); its syntax does not permit parentheses.

In the previous chapter, we used a manual page source file as an example. The -man macros typically place the text argument on the next line. Although the macro is the pattern that you use to find the line, it is actually the next line that you process. For instance, to extract the name of the command from the manpage, the following example matches the heading “Name,” reads the next line, and prints the first field of it:

# getline.awk -- test getline function
/^\.SH "?Name"?/ { 
	getline # get next line
	print $1 # print $1 of new line.
}

The pattern matches any line with “.SH” followed by “Name,” which might be enclosed in quotes. Once this line is matched, we use getline to read the next input line. When the new line is read, getline assigns it to $0 and parses it into fields. The system variables NF, NR, and FNR are also set. Thus, the new line becomes the current line, and we are able to refer to “$1” and retrieve the first field. Note that the previous line is no longer available as $0. However, if necessary, you can assign the line read by getline to a variable and avoid changing $0, as we’ll see shortly.

Here’s an example that shows how the previous script works, printing out the first field of the line following “.SH Name.”

$ awk -f getline.awk test
XSubImage
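If the “Name” heading happens to be the last line of the file, getline returns 0 and, in common implementations, leaves $0 unchanged, so the script above would print “.SH” instead of a command name. A slightly more defensive sketch checks the return value first. The function name print_manpage_name and the sample -man fragment are hypothetical.

```shell
#!/bin/sh
# print_manpage_name (hypothetical): a guarded variant of getline.awk
# that prints $1 only if getline actually read a line.
print_manpage_name() {
    awk '/^\.SH "?Name"?/ {
        if (getline > 0)    # 1 means a new line is now in $0
            print $1
    }'
}

# A made-up -man fragment for illustration.
printf '.TH XSubImage 3X11\n.SH "Name"\nXSubImage - create a subimage\n' |
    print_manpage_name    # prints: XSubImage
```

When the heading is the final line, this version simply prints nothing.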

The sorter.awk program that we demonstrated at the end of Chapter 9 could have used getline to read all the lines after the heading “Related Commands.” We can test the return value of getline in a while loop to read a number of lines from the input. The following procedure replaces the first two procedures in the sorter program:

# Match "Related Commands" and collect them
/^\.SH "?Related Commands"?/ {
	print
	while (getline > 0)
		commandList = commandList $0
}

The expression “getline > 0” will be true as long as getline successfully reads an input line. When it gets to the end-of-file, getline returns 0 and the loop is exited.
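To try the loop on its own, the sketch below feeds it a small, made-up fragment and prints what was collected in an END rule. The function name collect_related and the sample command names are hypothetical; the END rule merely stands in for the rest of the sorter program.

```shell
#!/bin/sh
# collect_related (hypothetical): the getline loop from above, plus an
# END rule that shows what ended up in commandList.
collect_related() {
    awk '/^\.SH "?Related Commands"?/ {
        print
        while (getline > 0)              # stops when getline returns 0 (EOF)
            commandList = commandList $0
    }
    END { print "collected:", commandList }'
}

printf '.SH "Related Commands"\nXGetImage,\nXPutImage,\nXSubImage\n' |
    collect_related
# prints:
# .SH "Related Commands"
# collected: XGetImage,XPutImage,XSubImage
```

Because the loop condition is “> 0” rather than just getline, an error (-1) also ends the loop instead of spinning forever.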

Reading Input from Files

Besides reading from the regular input stream, the getline function allows you to read input from a file or a pipe. For instance, the following statement reads the next line from the file data:

getline < "data"

Although the filename can be supplied through a variable, it is typically specified as a string constant, which must be enclosed in quotes. The symbol “<” is the same as the shell’s input redirection symbol and will not be interpreted as the “less than” symbol. We can use a while loop to read all the lines from a file, testing for an end-of-file to exit the loop. The following example opens the file data and prints all of its lines:

BEGIN {
	while ( (getline < "data") > 0 )
		print
}
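As mentioned earlier, the filename does not have to be a string constant. In the following sketch it arrives in the awk variable data, set with -v. The sketch also prints NR and NF for every line read: a redirected getline sets $0 and NF but, unlike a plain getline, leaves the record counters NR and FNR alone. The helper name read_file_lines and the sample file are hypothetical.

```shell
#!/bin/sh
# read_file_lines (hypothetical): read every line of the named file with
# getline < file, printing NR, NF, and the line itself.
read_file_lines() {
    awk -v data="$1" 'BEGIN {
        while ((getline < data) > 0)   # "<" redirects, ">" compares
            print NR, NF, $0           # NR stays 0: not main input
    }'
}

tmp=$(mktemp -d)
printf 'alpha beta\ngamma\n' > "$tmp/data"
read_file_lines "$tmp/data"
# prints:
# 0 2 alpha beta
# 0 1 gamma
rm -r "$tmp"
```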

(We parenthesize to avoid confusion; the “<” is a redirection, while the “>” is a comparison of the return value.) The input can also come from standard input. You can use getline following a prompt for the user to enter information:

BEGIN { printf "Enter your name: "
	getline < "-"
	print  
}

This sample code prints the prompt “Enter your name:” (printf is used because we don’t want a newline after the prompt), and then calls getline to gather the user’s response.[1] The response is assigned to $0, and the print statement outputs that value.
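The prompt does not have to be answered at a keyboard; piping a response into awk works the same way, since “-” simply names standard input. This sketch assumes an awk that accepts “-” here (see footnote [1]); ask_name is a hypothetical helper name.

```shell
#!/bin/sh
# ask_name (hypothetical): the prompt example wrapped in a shell
# function so its standard input can come from a pipe.
ask_name() {
    awk 'BEGIN { printf "Enter your name: "
        getline < "-"    # read the reply from standard input into $0
        print
    }'
}

echo Dale | ask_name    # prints: Enter your name: Dale
```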

Assigning the Input to a Variable

The getline function allows you to assign the input record to a variable. The name of the variable is supplied as an argument. Thus, the following statement reads the next line of input into the variable input:

getline input

Assigning the input to a variable does not affect the current input line; that is, $0 is not affected. The new input line is not split into fields, and thus the variable NF is also unaffected. It does increment the record counters, NR and FNR.

The previous example demonstrated how to prompt the user. That example could be written as follows, assigning the user’s response to the variable name.

BEGIN { printf "Enter your name: "
	getline name < "-"
	print name
}
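The difference between a plain getline and getline with a variable can be checked by printing NR, NF, and $0 after each form. This is a sketch under the assumption of a POSIX shell; compare_getline_forms and the variable name following are hypothetical.

```shell
#!/bin/sh
# compare_getline_forms (hypothetical): on the first record, call plain
# getline and then getline var, printing NR, NF, and $0 after each.
compare_getline_forms() {
    awk 'NR == 1 {
        getline                  # record 2 replaces $0; NF and NR change
        print NR, NF, $0
        getline following        # record 3 goes into following; $0, NF stay
        print NR, NF, $0, "|", following
    }'
}

printf 'a b\nc d e\nf\n' | compare_getline_forms
# prints:
# 2 3 c d e
# 3 3 c d e | f
```

Both forms advance NR when reading from the regular input; only the plain form replaces $0 and re-splits it.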

Study the syntax for assigning the input data to a variable because it is a common mistake to instead write:

name = getline     # wrong

which reads the next line into $0 and assigns getline’s return value (1, 0, or -1) to the variable name.
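The mistake is easy to demonstrate side by side with the correct form. In this sketch, getline_assign_demo and the variable names wrong and right are hypothetical.

```shell
#!/bin/sh
# getline_assign_demo (hypothetical): contrast the mistaken form with
# the correct one on three input lines.
getline_assign_demo() {
    awk 'NR == 1 {
        wrong = getline    # reads "beta" into $0; wrong gets the return value
        getline right      # reads "gamma" into the variable right
        print wrong, $0, right
    }'
}

printf 'alpha\nbeta\ngamma\n' | getline_assign_demo    # prints: 1 beta gamma
```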

Reading Input from a Pipe

You can execute a command and pipe the output into getline. For example, look at the following expression:

"who am i" | getline

That expression sets “$0” to the output of the who am i command.

dale       ttyC3        Jul 18 13:37

The line is parsed into fields and the system variable NF is set. Similarly, you can assign the result to a variable:

"who am i" | getline me

Assigning the output to a variable leaves $0 and NF unchanged; the line is stored in me as a whole and is not split into fields.
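The two pipe forms can be compared with commands whose output is known in advance. In this sketch echo stands in for who am i, and pipe_forms_demo is a hypothetical helper name.

```shell
#!/bin/sh
# pipe_forms_demo (hypothetical): compare "cmd" | getline with
# "cmd" | getline var, using echo in place of who am i.
pipe_forms_demo() {
    awk 'BEGIN {
        "echo dale ttyC3 Jul 18 13:37" | getline    # sets $0 and NF
        print NF, $1
        "echo sherman ttyC4" | getline me            # sets only me
        print NF, me                                 # NF is still 5
    }'
}

pipe_forms_demo
# prints:
# 5 dale
# 5 sherman ttyC4
```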

The following script is a fairly simple example of piping the output of a command to getline. It uses the output from the who am i command to get the user’s name. It then looks up the name in /etc/passwd and prints the fifth field of the matching entry, which holds the user’s full name:

awk '# getname - print the full name of the user from /etc/passwd
BEGIN { "who am i" | getline 
	name = $1
	FS = ":"
}
$1 == name { print $5 }
' /etc/passwd

The command is executed from the BEGIN procedure, and it provides us with the name of the user that will be used to find the user’s entry in /etc/passwd. As explained above, who am i outputs a single line, which getline assigns to $0. $1, the first field of that output, is then assigned to name.

The field separator is set to a colon (:) to allow us to access individual fields in entries in the /etc/passwd file. Notice that FS is set after getline or else the parsing of the command’s output would be affected.

Finally, the main procedure is designed to test that the first field matches name. If it does, the fifth field of the entry is printed. For instance, when Dale runs this script, it prints “Dale Dougherty.”
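Because who am i may print nothing when standard input is not a terminal (in a cron job, for example), the script is awkward to try out in batch. The sketch below keeps the same structure but takes the command and the password file as arguments, so a stand-in command and a made-up passwd file can be used. It compares with $1 == name, an exact match. The function name getname_from and the sample entries are hypothetical.

```shell
#!/bin/sh
# getname_from (hypothetical): like getname, but the login-name command
# and the password file are parameters.
getname_from() {
    awk -v cmd="$1" 'BEGIN {
        cmd | getline    # first line of the command output into $0
        name = $1        # split with the default FS
        FS = ":"         # passwd entries are colon-separated
    }
    $1 == name { print $5 }' "$2"
}

tmp=$(mktemp -d)
printf '%s\n' 'dal:x:101:101:Someone Else:/home/dal:/bin/sh' \
    'dale:x:100:100:Dale Dougherty:/home/dale:/bin/sh' > "$tmp/passwd"
getname_from 'echo dale ttyC3' "$tmp/passwd"    # prints: Dale Dougherty
rm -r "$tmp"
```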

When the output of a command is piped to getline and it contains multiple lines, getline reads a line at a time. The first time getline is called it reads the first line of output. If you call it again, it reads the second line. To read all the lines of output, you must set up a loop that executes getline until there is no more output. For instance, the following example uses a while loop to read each line of output and assign it to the next element of the array, who_out:

BEGIN {
	while ( ("who" | getline) > 0 )
		who_out[++i] = $0
}

Each time the getline function is called, it reads the next line of output. The who command, however, is executed only once.
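The same loop works for any command. This sketch, using the hypothetical helper name collect_output, stores each line of a command’s output in an array and reports how many lines arrived.

```shell
#!/bin/sh
# collect_output (hypothetical): read every line a command prints into
# an array, then report how many lines arrived plus the first and last.
collect_output() {
    awk -v cmd="$1" 'BEGIN {
        while ((cmd | getline) > 0)   # the command is started only once
            out[++n] = $0
        print n, out[1], out[n]
    }'
}

collect_output 'echo alpha; echo beta; echo gamma'    # prints: 3 alpha gamma
```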

The next example looks for “@date” in a document and replaces it with today’s date:

# subdate.awk -- replace @date with today's date
/@date/ {
	"date +'%a., %h %d, %Y'" | getline today
	gsub(/@date/, today)
}
{ print }

The date command, using its formatting options,[2] provides the date and getline assigns it to the variable today. The gsub( ) function replaces each instance of “@date” with today’s date.

This script might be used to insert the date in a form letter:

To: Peabody
From: Sherman 
Date: @date

I am writing you on @date to 
remind you about our special offer.

All lines of the input file would be passed through as is, except the lines containing “@date”, which are replaced with today’s date:

$ awk -f subdate.awk subdate.test
To: Peabody
From: Sherman
Date: Sun., May 05, 1996

I am writing you on Sun., May 05, 1996 to
remind you about our special offer.
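A variation, sketched here under the assumption that one date per run is enough, runs the date command once in a BEGIN rule. Passing the command in as a parameter also lets echo stand in for date so the output is predictable. The function name subdate_once is hypothetical.

```shell
#!/bin/sh
# subdate_once (hypothetical): like subdate.awk, but the date command
# (passed as an argument) is run once, in BEGIN.
subdate_once() {
    awk -v datecmd="$1" '
    BEGIN { datecmd | getline today }   # one line of output into today
    /@date/ { gsub(/@date/, today) }
    { print }'
}

# echo stands in for date so the output is predictable.
printf 'Date: @date\nI am writing you on @date to\n' |
    subdate_once 'echo "Sun., May 05, 1996"'
# prints:
# Date: Sun., May 05, 1996
# I am writing you on Sun., May 05, 1996 to
```

In real use the argument would be the date command itself, for example subdate_once "date +'%a., %h %d, %Y'". Fetching the date in BEGIN avoids relying on getline leaving today unchanged when the second and later calls reach end-of-file on the pipe.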


[1] At least at one time, SGI versions of nawk did not support the use of “-” with getline to read from standard input. Caveat emptor.

[2] Older versions of date don’t support formatting options. This includes the one on SunOS 4.1.x systems, where you have to use /usr/5bin/date instead. Check your local documentation.