sed & awk, 2nd Edition, by Arnold Robbins. Published by O'Reilly Media, Inc., 1997.

Testing and Saving Output

In our previous discussion of the pattern space, you saw that sed:

  1. Makes a copy of the input line.

  2. Modifies that copy in the pattern space.

  3. Outputs the copy to standard output.

What this means is that sed has a built-in safeguard so that you don’t make changes to the original file. Thus, the following command line:

$ sed -f sedscr testfile

does not make the change in testfile. It sends all lines to standard output (typically the screen), both the lines that were modified and the lines that are unchanged. You have to capture this output in a new file if you want to save it.

$ sed -f sedscr testfile > newfile

The redirection symbol “>” directs the output from sed to the file newfile. Don’t redirect the output from the command back to the input file or you will overwrite the input file. The shell empties the output file before sed even gets a chance to read it, so sed finds nothing to process and your data is destroyed.
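Both behaviors can be tried safely in a scratch directory. The following is a minimal sketch, not part of the original example: the file contents, the one-line sedscr, and the names scratch and victim are illustrative assumptions, and it assumes mktemp -d is available on your system.

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are at risk.
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1

# Hypothetical sample input and a one-line editing script.
printf 'one\ntwo\n' > testfile
printf 's/one/ONE/\n' > sedscr

# Safe: output goes to a new file, so testfile is left untouched.
sed -f sedscr testfile > newfile
cat testfile     # still the original two lines
cat newfile      # the edited copy

# Unsafe: the shell empties "victim" before sed ever opens it.
cp testfile victim
sed -f sedscr victim > victim
wc -c < victim   # reports 0 bytes
```

Only the copy named victim is lost here. On a real file there would be nothing left to recover.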

One important reason to redirect the output to a file is to verify your results. You can examine the contents of newfile and compare it to testfile. If you want to be very methodical about checking your results (and you should be), use the diff program to point out the differences between the two files.

$ diff testfile newfile

This command will display lines that are unique to testfile preceded by a “<” and lines unique to newfile preceded by a “>”. When you have verified your results, make a backup copy of the original input file and then use the mv command to overwrite the original with the new version. Be sure that the editing script is working properly before abandoning the original version.
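The verify, back up, and replace sequence might look like the sketch below. The sample file contents, the one-line sedscr, and the backup name testfile.bak are assumptions made for illustration. The sketch relies on diff returning an exit status of 0 when the files are identical and 1 when they differ; a status greater than 1 means diff itself ran into trouble, which this sketch does not handle separately.

```shell
#!/bin/sh
# Work in a throwaway directory (assumes mktemp -d is available).
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1

printf 'alpha\nbeta\n' > testfile    # hypothetical input file
printf 's/beta/BETA/\n' > sedscr     # hypothetical editing script

sed -f sedscr testfile > newfile

# diff prints the differences; its exit status tells us whether any exist.
if diff testfile newfile
then
    echo "no changes were made"
else
    cp testfile testfile.bak   # keep a backup of the original
    mv newfile testfile        # replace the original with the edited version
fi
```

In practice you would read the diff output before running the cp and mv steps rather than chaining them automatically.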

Because these steps are repeated so frequently, you will find it helpful to put them into a shell script. While we can’t go into much depth about the workings of shell scripts, these scripts are fairly simple to understand and use. Writing a shell script involves using a text editor to enter one or more command lines in a file, saving the file and then using the chmod command to make the file executable. The name of the file is the name of the command, and it can be entered at the system prompt. If you are unfamiliar with shell scripts, follow the shell scripts presented in this book as recipes in which you make your own substitutions.
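As a rough illustration of those steps, the sketch below creates a one-line script, makes it executable, and runs it. The script name hello and its contents are hypothetical, and a here-document stands in for the text editor. The ./ prefix is used on the assumption that the current directory is not in your search path.

```shell
#!/bin/sh
# Work in a throwaway directory (assumes mktemp -d is available).
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1

# Enter the command lines in a file (a here-document replaces the editor).
cat > hello <<'EOF'
#! /bin/sh
echo "hello from a shell script"
EOF

chmod +x hello   # make the file executable
./hello          # the name of the file is now the name of the command
```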

The following two shell scripts are useful for testing sed scripts and then making the changes permanently in a file. They are particularly useful when the same script needs to be run on multiple files.

testsed

The shell script testsed automates the process of saving the output of sed in a temporary file. It expects to find the script file, sedscr, in the current directory and applies these instructions to the input file named on the command line. The output is placed in a temporary file.

for x
do
	# quote the names so that unusual file names are passed through intact
	sed -f sedscr "$x" > "tmp.$x"
done

The name of at least one file must be specified on the command line. For each file, this shell script saves the output in a temporary file whose name is the original name with the prefix “tmp.” added. You can examine the temporary file to determine if your edits were made correctly. If you approve of the results, you can use mv to overwrite the original file with the temporary file.

You might also incorporate the diff command into the shell script. (Add diff $x tmp.$x after the sed command.)
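A sketch of testsed with the diff step added follows. It is wrapped in a shell function here, which is an assumption made for convenience so that it can be tried at the prompt; in a script file you would keep only the for loop. As in the original, it assumes sedscr is in the current directory and that each argument is a plain file name in that directory.

```shell
#!/bin/sh
# testsed with the suggested diff step added.
# Assumes each argument is a file name without a directory part.
testsed() {
    for x
    do
        sed -f sedscr "$x" > "tmp.$x"
        diff "$x" "tmp.$x"   # show what the script changed in this file
    done
}
```

Calling testsed ch01 ch02 would then leave tmp.ch01 and tmp.ch02 behind and print the differences for each file as it goes.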

If you find that your script did not produce the results you expected, remember that the easiest “fix” is usually to perfect the editing script and run it again on the original input file. Don’t write a new script to “undo” or improve upon changes made in the temporary file.

runsed

The shell script runsed was developed to make changes to an input file permanently. In other words, it is used in cases when you would want the input file and the output file to be the same. Like testsed, it creates a temporary file, but then it takes the next step: copying the file over the original.

#! /bin/sh

for x
do
   printf 'editing %s: ' "$x"
   if test "$x" = sedscr; then
      echo "not editing sedscript!"
   elif test -s "$x"; then
      sed -f sedscr "$x" > "/tmp/$x$$"
      if test -s "/tmp/$x$$"
      then
         if cmp -s "$x" "/tmp/$x$$"
         then
            printf 'file not changed: '
         else
            mv "$x" "$x.bak"  # save original, just in case
            cp "/tmp/$x$$" "$x"
         fi
         echo "done"
      else
         printf 'Sed produced an empty file'
         echo " - check your sedscript."
      fi
      rm -f "/tmp/$x$$"
   else
      echo "original file is empty."
   fi
done
echo "all done"

To use runsed, create a sed script named sedscr in the directory where you want to make the edits. Supply the name or names of the files to edit on the command line. Shell metacharacters can be used to specify a set of files.

$ runsed ch0?

runsed simply invokes sed -f sedscr on the named files, one at a time, and redirects the output to a temporary file. runsed then tests this temporary file to make sure that output was produced before copying it over the original.

The muscle of this shell script (line 9) is essentially the same as testsed. The additional lines are intended to test for unsuccessful runs, for instance, when no output is produced. The script checks whether an empty output file was produced and compares the two files to see if changes were actually made before it overwrites the original.
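The decision logic can be isolated in a small helper to see the three possible outcomes. The function name check_output and the words it prints are hypothetical. This is a sketch of the same test -s and cmp -s checks, not a replacement for runsed.

```shell
#!/bin/sh
# check_output ORIGINAL CANDIDATE
# Hypothetical helper: reports which of the three outcomes applies.
check_output() {
    if test ! -s "$2"; then
        echo "empty"       # sed produced no output; keep the original
    elif cmp -s "$1" "$2"; then
        echo "unchanged"   # files are identical; nothing to copy
    else
        echo "changed"     # safe to back up the original and replace it
    fi
}
```

For example, after sed -f sedscr ch01 > tmp.ch01, the command check_output ch01 tmp.ch01 prints one of the three words, and only “changed” would justify overwriting the original.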

However, runsed does not protect you from imperfect editing scripts. You should use testsed first to verify your changes before actually making them permanent with runsed.