Learning Linux Shell Scripting
by Ganesh Sanjiv Naik, published by Packt Publishing, 2015

Using awk

awk is a program with its own programming language for processing data and generating reports.

The GNU version of awk is gawk.

awk processes data, which can be received from a standard input, input file, or as the output of any other command or process.

awk processes data in a way similar to sed, that is, line by line. It checks every line against the specified pattern and performs the specified actions. If only a pattern is specified, then all the lines matching the pattern are displayed. If no pattern is specified, then the specified actions are performed on all the lines.

The meaning of awk

The name of the program awk is made from the initials of three authors of the language, namely Alfred Aho, Peter Weinberger and Brian Kernighan. It is not very clear why they selected the name awk instead of kaw or wak!

Using awk

The following are different ways to use awk:

  • Syntax while using only pattern:
    $ awk 'pattern' filename
    

    In this case, all the lines containing pattern will be printed.

  • Syntax using only action:
    $ awk '{action}' filename
    

    In this case, action will be applied to all the lines.

  • Syntax using pattern and action:
    $ awk 'pattern {action}' filename
    

    In this case, action will be applied on all the lines containing pattern.

As seen previously, the awk instruction consists of patterns, actions, or a combination of both.

Actions will be enclosed in curly brackets. Actions can contain many statements separated by a semicolon or a newline.

awk commands can be on the command line or in the awk script file. The input lines could be received from keyboard, pipe, or a file.

Input from files

Let's see a few examples of the preceding syntax with input from files:

$ cat people.txt

The output is as follows:

Bill Thomas  8000  08/9/1968
Fred Martin  6500  22/7/1982
Julie Moore  4500  25/2/1978
Marie Jones  6000  05/8/1972
Tom Walker   7000  14/1/1977

Enter the next command as follows:

$ awk '/Martin/' people.txt

The output is as follows:

Fred Martin  6500  22/7/1982

This prints a line containing the Martin pattern.

Enter the next command as follows:

$ awk '{print $1}' people.txt

The output is as follows:

Bill
Fred
Julie
Marie
Tom

This awk command prints the first field of every line of the people.txt file.

For example:

$ awk '/Martin/{print $1, $2}' people.txt
Fred Martin 

This prints the first and second field of the line that contains the Martin pattern.

Input from commands

We can use the output of any other Linux command as the input to the awk program. We need to use a pipe to send the output of the other command to awk.

The syntax is as follows:

$ command | awk 'pattern'
$ command | awk '{action}'
$ command | awk 'pattern {action}'

For example:

$ cat people.txt | awk '$3 > 6500'

The output is as follows:

Bill Thomas  8000  08/9/1968
Tom Walker   7000  14/1/1977

This prints all lines in which field 3 is greater than 6500.

For example:

$ cat people.txt | awk '/1972$/{print $1, $2}'

The output is as follows:

Marie Jones

This prints fields 1 and 2 of the line that ends with the 1972 pattern.

For example:

$ cat people.txt | awk '$3 > 6500 {print $1, $2}'

This prints fields 1 and 2 of the lines in which the third field is greater than 6500.
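As a minimal sketch of the pipeline form, the filter can be fed by any command that writes records to standard output. Here printf stands in for cat so that the example needs no file; the variable name result is only illustrative.

```shell
# printf generates the same records as people.txt, so no file is needed.
result=$(printf '%s\n' \
    'Bill Thomas  8000  08/9/1968' \
    'Fred Martin  6500  22/7/1982' \
    'Julie Moore  4500  25/2/1978' \
    'Marie Jones  6000  05/8/1972' \
    'Tom Walker   7000  14/1/1977' |
    awk '$3 > 6500 {print $1, $2}')

echo "$result"
# Prints:
# Bill Thomas
# Tom Walker
```

Only Bill (8000) and Tom (7000) pass the test, because 6500 is not greater than 6500.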

How awk works

Let's understand how the awk program processes every line. We will consider a simple file, sample.txt:

$ cat sample.txt
Happy Birth Day
We should live every day.

Let's consider the following awk command:

$ awk '{print $1, $3}' sample.txt

The following diagram shows how awk processes every line in memory:

[Figure: How awk works]

The explanation of the preceding diagram is as follows:

  • awk reads a line from the file and puts it into an internal variable called $0. Each line is called a record. By default, every record is terminated by a newline.
  • Then, every record is divided into separate words or fields. Every field is stored in a numbered variable: $1, $2, and so on. Some implementations allow as many as 100 fields per record; the limit depends on the implementation.
  • awk has a built-in variable called FS (field separator). By default, fields are separated by whitespace, that is, runs of spaces and tabs. If we want to use any other field separator, such as the colon : in the /etc/passwd file, then we need to specify it on the awk command line.

The action '{print $1, $3}' tells awk to print the first and third fields. The fields will be separated by a space in the output. The command will be as follows:

$ awk '{print $1, $3}' sample.txt

The output will be as follows:

Happy Day
We live

The explanation of the output is as follows:

  • There is one more built-in variable called OFS (output field separator). This is normally a space. It is used for separating fields when they are printed as output.
  • Once the first line is processed, awk loads the next line in $0 and it continues as discussed earlier.
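The splitting and joining described above can be observed with a small sketch. It feeds the two lines of sample.txt through printf (an assumption made so that the example is self-contained) and runs the same action twice, once with the default output separator and once with OFS set to a hyphen in a BEGIN block. The shell variable names are illustrative.

```shell
# Two-line sample matching sample.txt from the text.
sample='Happy Birth Day
We should live every day.'

# Default behaviour: fields are split on whitespace and joined with a space.
default_out=$(printf '%s\n' "$sample" | awk '{print $1, $3}')

# Setting OFS changes the separator that print inserts between
# its comma-separated arguments.
custom_out=$(printf '%s\n' "$sample" | awk 'BEGIN { OFS = "-" } {print $1, $3}')

echo "$default_out"
# Happy Day
# We live
echo "$custom_out"
# Happy-Day
# We-live
```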

awk commands from within a file

We can put awk commands in a file. We then pass the -f option followed by the script file name to tell awk to read its processing instructions from that file. awk copies the first line of the data file into $0 and applies all the processing instructions to that record. It then discards that record and loads the next line from the data file, and continues in this way until the last line of the data file. If the action is not specified, the lines matching the pattern are printed on screen. If the pattern is not specified, the specified action is performed on all lines of the data file.

For example:

$ cat people.txt
Bill Thomas  8000  08/9/1968
Fred Martin  6500  22/7/1982
Julie Moore  4500  25/2/1978
Marie Jones  6000  05/8/1972
Tom Walker   7000  14/1/1977
$ cat awk_script 
/Martin/{print $1, $2}

Enter the next command as follows:

$ awk -f awk_script people.txt

The output is as follows:

Fred Martin

The awk command file contains the Martin pattern and specifies the action of printing fields 1 and 2 of the line matching the pattern. Therefore, awk has printed the first and second fields of the line containing the Martin pattern.

Records and fields

Every line terminated by a newline is called a record, and every word separated by whitespace is called a field. We will learn more about them in this section.

Records

awk does not see the file as one continuous stream of data; instead, it processes the file line by line. Each line is terminated by a newline character. awk copies each line into an internal buffer called a record.

The record separator

By default, a newline is both the input record separator and the output record separator. The input record separator is stored in the built-in variable RS, and the output record separator is stored in ORS. We can modify RS and ORS, if required.
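A minimal sketch of changing both separators follows. The input string and the separator values are made up for illustration: RS is set to a semicolon so that each semicolon-delimited chunk becomes a record, and ORS is set to " | " so that print no longer ends each record with a newline.

```shell
# RS=";" splits the input into records at each semicolon.
# ORS=" | " replaces the newline that print normally appends.
records=$(printf 'alpha;beta;gamma' |
    awk 'BEGIN { RS = ";"; ORS = " | " } { print NR ":" $0 }')

echo "$records"
# 1:alpha | 2:beta | 3:gamma | 
```

Note that ORS is written after every record, including the last one, which is why the output ends with the separator.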

The $0 variable

The entire line that is copied into the buffer, that is, the record, is called $0.

Take the following command for example:

$ cat people.txt

The output is as follows:

Bill Thomas  8000  08/9/1968
Fred Martin  6500  22/7/1982
Julie Moore  4500  25/2/1978
Marie Jones  6000  05/8/1972
Tom Walker   7000  14/1/1977
$ awk '{print $0}' people.txt

The output is as follows:

Bill Thomas  8000  08/9/1968
Fred Martin  6500  22/7/1982
Julie Moore  4500  25/2/1978
Marie Jones  6000  05/8/1972
Tom Walker   7000  14/1/1977

This has printed all the lines of the text file. Similar results can be seen by the following command:

$ awk '{print}' people.txt

The NR variable

awk has a built-in variable called NR. It stores the number of the current record. NR is 1 for the first record, and it is incremented by one for each new record.

Take, for example, the following command:

$ cat people.txt

The output will be:

Bill Thomas  8000  08/9/1968
Fred Martin  6500  22/7/1982
Julie Moore  4500  25/2/1978
Marie Jones  6000  05/8/1972
Tom Walker   7000  14/1/1977
$ awk '{print NR, $0}'  people.txt

The output will be:

1 Bill Thomas  8000  08/9/1968
2 Fred Martin  6500  22/7/1982
3 Julie Moore  4500  25/2/1978
4 Marie Jones  6000  05/8/1972
5 Tom Walker   7000  14/1/1977

This has printed every record, that is, $0, preceded by its record number, which is stored in NR. That is why we see 1, 2, 3, and so on before every line of output.

Fields

Every line is called a record, and every word in a record is called a field. By default, fields are separated by whitespace, that is, spaces or tabs. awk has a built-in variable called NF, which holds the number of fields in the current record. Typically, the maximum number of fields is 100, but this depends on the implementation. The following example has five records with four fields each.

For example:

$1    $2        $3       $4
Bill Thomas  8000  08/9/1968
Fred Martin  6500  22/7/1982
Julie Moore  4500  25/2/1978
Marie Jones  6000  05/8/1972
Tom Walker   7000  14/1/1977
$ awk '{print NR, $1, $2, $4}' people.txt

The output will be:

1 Bill Thomas 08/9/1968
2 Fred Martin 22/7/1982
3 Julie Moore 25/2/1978
4 Marie Jones 05/8/1972
5 Tom Walker 14/1/1977

This has printed the record number and fields 1, 2, and 4 on the screen.
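Because NF holds the number of fields in the current record, $NF refers to the last field. The following sketch uses two made-up input lines of different lengths to show both values; the variable name nf_out is illustrative.

```shell
# The first record has four fields, the second has two.
nf_out=$(printf '%s\n' 'Bill Thomas  8000  08/9/1968' 'one two' |
    awk '{ print NF, $NF }')

echo "$nf_out"
# 4 08/9/1968
# 2 two
```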

Field separators

By default, fields are separated by whitespace. We will learn more about field separators in this section.

The input field separator

We have already discussed that the input field separator is whitespace by default. awk stores it in the built-in variable FS. We can change FS to other values on the command line with the -F option or by assigning to it in a BEGIN statement.

For example:

$ cat people.txt

The output will be:

Bill Thomas:8000:08/9/1968
Fred Martin:6500:22/7/1982
Julie Moore:4500:25/2/1978
Marie Jones:6000:05/8/1972
Tom Walker:7000:14/1/1977
$ awk -F: '/Marie/{print $1, $2}' people.txt

The output will be:

Marie Jones 6000

We have used the -F option to specify a colon (:) as the field separator instead of the default. Therefore, it has printed fields 1 and 2 of the record in which the Marie pattern was matched. We can even specify more than one field separator on the command line as follows:

$ awk -F'[ :\t]' '{print $1, $2, $3}' people.txt

This will use space, colon, and tab characters as field separators.
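The field separator can also be set inside the program by assigning to the built-in variable FS in a BEGIN block. The sketch below runs both forms on two colon-separated records (a shortened copy of the data, supplied through a shell variable for illustration) and shows that they give the same result.

```shell
colon_data='Bill Thomas:8000:08/9/1968
Marie Jones:6000:05/8/1972'

# Separator given on the command line.
with_option=$(printf '%s\n' "$colon_data" | awk -F: '/Marie/{print $1, $2}')

# Separator assigned in a BEGIN block, before any input is read.
with_begin=$(printf '%s\n' "$colon_data" |
    awk 'BEGIN { FS = ":" } /Marie/{print $1, $2}')

echo "$with_option"   # Marie Jones 6000
echo "$with_begin"    # Marie Jones 6000
```

The BEGIN form is convenient in script files, where the separator then travels with the script instead of with the command line.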

Patterns and actions

While executing commands using awk, we need to define patterns and actions. Let's learn more about them in this section.

Patterns

awk uses patterns to control the processing of actions. When the pattern or regular expression is found in the record, the action is performed. If no action is defined, awk simply prints the line on screen.

For example:

$ cat people.txt

The output will be:

Bill Thomas  8000  08/9/1968
Fred Martin  6500  22/7/1982
Julie Moore  4500  25/2/1978
Marie Jones  6000  05/8/1972
Tom Walker   7000  14/1/1977
$ awk '/Bill/' people.txt

The output will be:

Bill Thomas  8000  08/9/1968

In this example, when the Bill pattern is found in the record, that record is printed on screen. A pattern can also be a comparison:

$ awk '$3 > 5000' people.txt

The output will be:

Bill Thomas  8000  08/9/1968
Fred Martin  6500  22/7/1982
Marie Jones  6000  05/8/1972
Tom Walker   7000  14/1/1977

In this example, when field 3 is greater than 5000, that record is printed on screen.

Actions

Actions are performed when the required pattern is found in a record. Actions are enclosed in curly brackets, '{' and '}'. We can specify several statements in the same curly brackets, but they must be separated by a semicolon or a newline.

The syntax is as follows:

pattern { action statement; action statement; ... }
     or
pattern {
    action statement
    action statement
}

The following example gives a better idea:

$ awk '/Bill/{print $1, $2 ", Happy Birth Day !"}' people.txt

Output:

Bill Thomas, Happy Birth Day !

Whenever a record contains the Bill pattern, awk prints field 1 and field 2, followed by the message Happy Birth Day.

Regular expressions

A regular expression is a pattern enclosed in forward slashes. A regular expression can contain metacharacters. If the pattern matches any string in the record, then the condition is true and the associated action, if any, is executed. If no action is specified, the record is simply printed on screen.

Metacharacters used in awk regular expressions are as follows:

Metacharacter    What it does
.                Any single character is matched
*                Zero or more occurrences of the preceding character are matched
^                The beginning of the string is matched
$                The end of the string is matched
+                One or more occurrences of the preceding character are matched
?                Zero or one occurrence of the preceding character is matched
[ABC]            Any one character in the set of characters A, B, or C is matched
[^ABC]           Any one character not in the set of characters A, B, or C is matched
[A-Z]            Any one character in the range from A to Z is matched
a|b              Either a or b is matched
(AB)+            One or more repetitions of AB, such as AB, ABAB, and so on, are matched
\*               A literal asterisk is matched
&                In a replacement string, this represents the text that was matched by the search pattern

In the following example, all lines are searched for the regular expression Moore, and fields 1 and 2 of the matching record are displayed on screen:

$ awk  '/Moore/{print $1, $2}' people.txt

The output is as follows:

Julie Moore
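A few of the metacharacters can be tried on the same data with the following sketch. The records are supplied through a shell variable so that the example is self-contained, and the variable names are illustrative. The last command uses the sub() function, which is not covered in the text above, to show & in a replacement string.

```shell
people='Bill Thomas  8000  08/9/1968
Fred Martin  6500  22/7/1982
Julie Moore  4500  25/2/1978
Marie Jones  6000  05/8/1972
Tom Walker   7000  14/1/1977'

# ^ and a bracket set: first names beginning with J or M.
starts=$(printf '%s\n' "$people" | awk '/^[JM]/{print $1}')

# Grouping, alternation, and $: records ending in 1968 or 1977.
ends=$(printf '%s\n' "$people" | awk '/(1968|1977)$/{print $1}')

# & in the replacement of sub() stands for the matched text.
marked=$(printf '%s\n' "$people" |
    awk '/Moore/{ sub(/Moore/, "[&]"); print $1, $2 }')

echo "$starts"   # Julie and Marie, one per line
echo "$ends"     # Bill and Tom, one per line
echo "$marked"   # Julie [Moore]
```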

Writing the awk script file

Whenever we need to write multiple patterns and actions, it is more convenient to write a script file. The script file contains patterns and actions. If multiple statements are on the same line, they must be separated by a semicolon; otherwise, we need to write them on separate lines. A comment starts with the pound (#) sign.

For example:

$ cat people.txt

The output is as follows:

Bill Thomas  8000  08/9/1968
Fred Martin  6500  22/7/1982
Julie Moore  4500  25/2/1978
Marie Jones  6000  05/8/1972
Tom Walker   7000  14/1/1977

(The awk script)

$ cat report

The output is as follows:

/Bill/{print "Birth date of " $1, $2 " is " $4}
/^Julie/{print $1, $2 " has a salary of $" $3 "."}
/Marie/{print NR, $0}

Enter the next command as follows:

$ awk -f report people.txt

The output will be:

Birth date of Bill Thomas is 08/9/1968
Julie Moore has a salary of $4500.
4 Marie Jones  6000  05/8/1972

In this example, the awk command is followed by the -f option, which specifies report as the script file, and awk then applies all the commands in it to the text file people.txt.

In this script, if the regular expression Bill is matched, awk prints some text, field 1, field 2, and then the birth date. If the regular expression Julie is matched at the start of the line, awk prints her salary information. If the regular expression Marie is matched, awk prints the record number NR followed by the complete record.
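The following sketch shows a script file that also uses a comment line and two statements separated by a semicolon. The script is written to a temporary file created with mktemp, and two records are piped in with printf, so the file names used in the text are not touched. The variable names script and report_out, and the counter count, are illustrative.

```shell
# Write a small awk script to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
# Lines beginning with # are comments.
/Bill/  { print "Birth date of " $1, $2 " is " $4 }
/Marie/ { count++; print NR, $1 }   # two statements on one line
EOF

report_out=$(printf '%s\n' \
    'Bill Thomas  8000  08/9/1968' \
    'Marie Jones  6000  05/8/1972' |
    awk -f "$script")

echo "$report_out"
# Birth date of Bill Thomas is 08/9/1968
# 2 Marie

rm -f "$script"
```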

Using variables in awk

We can simply use a variable in an awk script without declaring or initializing it. A variable holds either a string or a number, and numbers are normally stored as floating-point values. No type declaration is required, unlike in C programming. awk determines the type of a variable from the value assigned to it or from the way it is used in the script.

An uninitialized variable has the value 0 when it is used as a number and the null string "" when it is used as a string.

Consider the following statement:

name = "Ganesh"

The variable name is of the string type.

j++

The variable j is a number. It starts at zero because it is uninitialized, and it is then incremented by one.

value = 50

The variable value is a number with the initial value 50.

To use a string variable as a number, add zero to it:

name + 0

To use a numeric variable as a string, concatenate the null string to it:

value ""

User-defined variables can be made up of letters, digits, and underscores. The variable cannot start with a digit.
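These rules can be checked in a BEGIN block, which runs without any input. This is a small sketch; the variable names unset_var, text, and joined are made up for the illustration.

```shell
vars=$(awk 'BEGIN {
    print "[" unset_var "]"     # uninitialized, used as a string: empty
    print unset_var + 0         # uninitialized, used as a number: 0
    text = "42"
    print text + 8              # string coerced to a number: 50
    value = 50
    joined = value ""           # number coerced to a string
    print length(joined)        # the string "50" has 2 characters
}')

echo "$vars"
# []
# 0
# 50
# 2
```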

Decision making using an if statement

In awk programming, the if statement is used for decision making. The syntax is as follows:

if (conditional-expression)
  action1
else
  action2

If the condition is true, then action1 will be performed, else action2 will be performed. This is very similar to C programming if constructs.

An example of using the if statement in the awk command is as follows:

$ cat people.txt

The output is as follows:

Bill Thomas  8000  08/9/1968
Fred Martin  6500  22/7/1982
Julie Moore  4500  25/2/1978
Marie Jones  6000  05/8/1972
Tom Walker   7000  14/1/1977
$ awk '{
if ($3 > 7000) { print "person with salary more than 7000 is\n" $1, $2 }
}' people.txt

The output is as follows:

person with salary more than 7000 is
Bill Thomas

In this example, field 3 is checked in every record. If field 3 is greater than 7000 for any record, then the message and the name of the person (fields 1 and 2) are printed.
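The else branch from the syntax can be sketched on the same data. The threshold of 6000 and the labels high and standard are made up for the illustration, and the records are supplied through a shell variable so that the example is self-contained.

```shell
people='Bill Thomas  8000  08/9/1968
Fred Martin  6500  22/7/1982
Julie Moore  4500  25/2/1978
Marie Jones  6000  05/8/1972
Tom Walker   7000  14/1/1977'

# Every record takes exactly one of the two branches.
labels=$(printf '%s\n' "$people" | awk '{
    if ($3 > 6000) {
        print $1, "high"
    } else {
        print $1, "standard"
    }
}')

echo "$labels"
# Bill high
# Fred high
# Julie standard
# Marie standard
# Tom high
```

Marie falls into the else branch because 6000 is not greater than 6000.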

Using the for loop

The for loop is used for doing certain actions repetitively. The syntax is as follows:

for(initialization; condition; increment/decrement)
actions

Initially, a variable is initialized. Then, the condition is checked. If it is true, the action is performed; several actions must be enclosed in curly brackets. Then, the variable is incremented or decremented, and the condition is checked again. If the condition is true, the actions are performed again; otherwise, the loop is terminated.

An example of the awk command with the for loop is as follows:

$ awk '{ for( i = 1; i <= NF; i++) print NF, $i }' people.txt

Initially, the i variable is initialized to 1. Then, the condition is checked to see whether i is less than or equal to NF. If true, the action of printing NF and the field is performed. Then i is incremented by one. The condition is checked again. If it is true, the action is performed again; otherwise, the loop terminates.
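To make the result of the loop visible, the following sketch runs the same for loop on a single record supplied with printf. The variable name fields is illustrative.

```shell
# One record with four fields: the loop body runs four times.
fields=$(printf '%s\n' 'Bill Thomas  8000  08/9/1968' |
    awk '{ for (i = 1; i <= NF; i++) print NF, $i }')

echo "$fields"
# 4 Bill
# 4 Thomas
# 4 8000
# 4 08/9/1968
```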

Using the while loop

Similar to C programming, awk has a while loop for doing the tasks repeatedly. while will check for the condition. If the condition is true, then actions will be performed. If condition is false, then it will terminate the loop.

The syntax is as follows:

  while(condition)
    actions

An example of using the while construct in awk is as follows:

$ awk '{ i = 1; while ( i <= NF ) { print NF, $i ; i++ } }' people.txt

NF is the number of fields in the record. The variable i is initialized to 1. Then, while i is less than or equal to NF, the print action is performed. The print command prints NF and each field of the record from the file people.txt. In the action block, i is incremented by one. The while construct performs the action repeatedly as long as i is less than or equal to NF.
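A variation of this loop, sketched on a single made-up record, prints the loop counter next to each field so that the progress of i can be followed. The variable name numbered is illustrative.

```shell
# i starts at 1 and the loop stops once i exceeds NF (3 here).
numbered=$(printf '%s\n' 'Bill Thomas  8000' |
    awk '{ i = 1; while (i <= NF) { print i, $i; i++ } }')

echo "$numbered"
# 1 Bill
# 2 Thomas
# 3 8000
```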

Using the do while loop

The do while loop is similar to the while loop. The difference is that the action is performed at least once even if the condition is false, unlike in the while loop.

The syntax is as follows:

do
action
while (condition)

After the action or actions are performed, the condition is checked again. If the condition is true, then the action will be performed again, otherwise, the loop will be terminated.

The following is an example of using the do while loop:

$ cat awk_script
BEGIN {
  do {
    ++x
    print x
  } while ( x <= 4 )
}
$ awk -f awk_script
1
2
3
4
5

In this example, x is incremented to 1 and the value of x is printed. Then the condition is checked to see whether x is less than or equal to 4. If the condition is true, the action is performed again. The loop ends after 5 is printed, because 5 is not less than or equal to 4.
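The property that the body of a do while loop runs at least once can be seen in the following sketch, where the condition is already false on the first check. The starting value 10 and the message text are made up for the illustration.

```shell
once=$(awk 'BEGIN {
    x = 10
    do {
        print "body ran with x =", x
    } while (x < 5)      # false on the first check, so the loop ends
}')

echo "$once"
# body ran with x = 10
```

A plain while loop with the same condition would print nothing, because the condition would be tested before the body.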