sed & awk, 2nd Edition, by Arnold Robbins. Published by O'Reilly Media, Inc., 1997.

Formatted Printing

Many of the scripts that we’ve written so far perform the data processing tasks just fine, but the output has not been formatted properly. That is because there is only so much you can do with the basic print statement. And since one of awk’s most common functions is to produce reports, it is crucial that we be able to format our reports in an orderly fashion. The filesum program performs the arithmetic tasks well but the report lacks an orderly format.

Awk offers an alternative to the print statement, printf, which is borrowed from the C programming language. The printf statement can output a simple string just like the print statement.

awk 'BEGIN { printf ("Hello, world\n") }'

The main difference that you will notice at the outset is that, unlike print, printf does not automatically supply a newline. You must specify it explicitly as “\n”.
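
As a minimal sketch of this difference, the following can be run from a POSIX shell. The strings are arbitrary, and the shell variable out is only a convenience for capturing what awk writes.

```shell
# print appends a newline (the value of ORS); printf writes only what the format contains.
out=$(awk 'BEGIN {
    print "one"           # newline supplied by print
    printf("two")         # no newline: the next output continues on this line
    printf(" three\n")    # explicit newline ends the line
}')
printf '%s\n' "$out"
```

The second and third strings land on the same output line because the first printf never ends it.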

The full syntax of the printf statement has two parts:

printf ( format-expression [, arguments] )

The parentheses are optional. The first part is an expression that describes the format specifications; usually this is supplied as a string constant in quotes. The second part is an argument list, such as a list of variable names, whose items correspond to the format specifications. A format specification begins with a percent sign (%) and ends with a specifier, one of the characters shown in Table 7.6. The two main format specifiers are s for strings and d for decimal integers.[12]

Table 7.6. Format Specifiers Used in printf
Character  Description
c          ASCII character
d          Decimal integer
i          Decimal integer. (Added in POSIX)
e          Floating-point format ([-]d.precisione[+-]dd)
E          Floating-point format ([-]d.precisionE[+-]dd)
f          Floating-point format ([-]ddd.precision)
g          e or f conversion, whichever is shortest, with trailing zeros removed
G          E or f conversion, whichever is shortest, with trailing zeros removed
o          Unsigned octal value
s          String
x          Unsigned hexadecimal number. Uses a-f for 10 to 15
X          Unsigned hexadecimal number. Uses A-F for 10 to 15
%          Literal %

This example uses the printf statement to produce the output for rule 2 in the filesum program. It outputs a string and a decimal value found in two different fields:

printf("%d\t%s\n", $5, $9)

The value of $5 is to be output, followed by a tab (\t) and $9 and then a newline (\n).[13] For each format specification, you must supply a corresponding argument.
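
To see the statement at work without a directory to list, here is a small sketch that feeds awk two made-up lines in ls -l layout. The owner, group, dates, and times are invented for illustration; only the position of the size (field 5) and the filename (field 9) matters.

```shell
# Two invented lines in ls -l layout: field 5 is the size, field 9 the filename.
out=$(printf '%s\n' \
    '-rw-r--r-- 1 dale staff 2237 Jan 10 09:15 gawk' \
    '-rw-r--r-- 1 dale staff 74 Jan 12 14:02 gawk.test' |
    awk '{ printf("%d\t%s\n", $5, $9) }')
printf '%s\n' "$out"
```

Each output line holds the size, a tab, and the filename, in the order the arguments are listed after the format string.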

The printf statement can also be used to specify the width and alignment of output fields. A format expression can take three optional modifiers following “%” and preceding the format specifier:

%-width.precision format-specifier

The width of the output field is a numeric value. When you specify a field width, the contents of the field will be right-justified by default. You must specify “-” to get left-justification. Thus, “%-20s” outputs a string left-justified in a field 20 characters wide. If the string is shorter than 20 characters, the field will be padded with spaces to fill it. In the following examples, a “|” is output to indicate the actual width of the field. The first example right-justifies the text:

printf("|%10s|\n", "hello")

It produces:

|     hello|

The next example left-justifies the text:

printf("|%-10s|\n", "hello")

It produces:

|hello     |

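The precision modifier applies to both numeric and string values. For the floating-point formats e, E, and f, it controls the number of digits that appear to the right of the decimal point; for g and G, it sets the maximum number of significant digits; for the integer formats, it sets the minimum number of digits printed. For string values, it controls the maximum number of characters from the string that will be printed. Note that the default format used by the print statement for the output of numeric values is “%.6g”.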
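
A short sketch of the precision modifier, alone and combined with a width, may help. The values are arbitrary, and the “|” characters mark the edges of each field as in the earlier examples.

```shell
# Precision limits digits after the decimal point (f) or characters printed (s).
out=$(awk 'BEGIN {
    printf("|%.2f|\n", 3.14159)      # two digits after the decimal point
    printf("|%8.2f|\n", 3.14159)     # same value, right-justified in 8 columns
    printf("|%.3s|\n", "hello")      # at most three characters of the string
    printf("|%-8.3s|\n", "hello")    # truncated first, then left-justified in 8 columns
}')
printf '%s\n' "$out"
```

It should produce:

```
|3.14|
|    3.14|
|hel|
|hel     |
```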

You can specify both the width and precision dynamically, via values in the printf or sprintf argument list. You do this by specifying asterisks, instead of literal values.

printf("%*.*g\n", 5, 3, myvar);

In this example, the width is 5, the precision is 3, and the value to print will come from myvar.
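
The following sketch compares the dynamic form with its literal equivalent. It assumes an awk that accepts asterisks in the format, such as gawk; the value assigned to myvar is arbitrary.

```shell
# Width and precision taken from the argument list, then written literally.
out=$(awk 'BEGIN {
    myvar = 3.14159
    printf("|%*.*g|\n", 5, 3, myvar)    # width 5 and precision 3 come from the arguments
    printf("|%5.3g|\n", myvar)          # the same format with literal values
}')
printf '%s\n' "$out"
```

Both statements print the value with three significant digits, right-justified in a field five characters wide.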

The default precision used by the print statement when outputting numbers can be changed by setting the system variable OFMT. For instance, if you are using awk to write reports that contain dollar values, you might prefer to change OFMT to “%.2f”.
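
As a rough illustration, the sketch below sets OFMT and prints a few values. The variable name price and the numbers are invented for the example.

```shell
# OFMT governs how print converts non-integer numbers to text.
out=$(awk 'BEGIN {
    OFMT = "%.2f"
    price = 12.5
    print price                # non-integer value: converted using OFMT
    print 17                   # integer values are still printed as integers
    printf("%d items\n", 3)    # printf relies on its own format, not OFMT
}')
printf '%s\n' "$out"
```

It should produce:

```
12.50
17
3 items
```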

Using the full syntax of the format expression can solve the problem with filesum of getting fields and headings properly aligned. One reason we output the file size before the filename was that the fields had a greater chance of aligning themselves if they were output in that order. The solution that printf offers us is the ability to fix the width of output fields; therefore, each field begins in the same column.

Let’s rearrange the output fields in the filesum report. We want a minimum field width so that the second field always begins at the same position. You place the field width between the % and the format specifier. “%-15s” specifies a minimum field width of 15 characters in which the value is left-justified. “%10d”, without the hyphen, is right-justified, which is what we want for a decimal value.

printf("%-15s\t%10d\n", $9, $5)       # print filename and size

This will produce a report in which the data is aligned in columns and the numbers are right-justified. Look at how the printf statement is used in the END action:

printf("Total: %d bytes  (%d files)\n", sum, filenum)

The column header in the BEGIN rule is also changed appropriately. With the use of the printf statement, filesum now produces the following output:

$ filesum g*
FILE                 BYTES
g                       23
gawk                  2237
gawk.mail             1171
gawk.test               74
gawkro                 264
gfilesum               610
grades                  64
grades.awk             231
grepscript               6
Total: 4680 bytes  (9 files)
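
To show how the three printf statements fit together, here is a self-contained sketch. It is not the book's filesum listing: the pattern used to select regular files, the header format, and the three input lines (with invented owner, group, and dates) are assumptions made for illustration.

```shell
# Hypothetical stand-in for filesum, fed three invented lines in ls -l layout.
out=$(printf '%s\n' \
    '-rw-r--r-- 1 dale staff 23 Jan 10 09:15 g' \
    '-rw-r--r-- 1 dale staff 2237 Jan 10 09:15 gawk' \
    '-rw-r--r-- 1 dale staff 1171 Jan 11 16:40 gawk.mail' |
    awk '
    BEGIN { printf("%-15s\t%10s\n", "FILE", "BYTES") }   # header uses the same widths, with s for text
    NF == 9 && /^-/ {
        sum += $5          # accumulate file sizes
        ++filenum          # count regular files
        printf("%-15s\t%10d\n", $9, $5)
    }
    END { printf("Total: %d bytes  (%d files)\n", sum, filenum) }')
printf '%s\n' "$out"
```

Because the header and the data lines share the same widths, the column titles line up with the values beneath them.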


[12] The way printf does rounding is discussed in Appendix B.

[13] Compare this statement with the print statement in the filesum program that prints the header line. The print statement automatically supplies a newline (the value of ORS); when using printf, you must supply the newline yourself, because it is never provided automatically.