bash shell programming is a lot like any kind of programming, and that includes having variables—containers that hold strings and numbers, which can be changed, compared, and passed around. bash variables have some very special operators that can be used when you refer to a variable. bash also has some important built-in variables, ones that provide important information about the other variables in your script. This chapter takes a look at bash variables and some special mechanisms for referencing variables, and shows how they can be put to use in your scripts.
Variables in a bash script are often written as all-uppercase names, though that is not required—just a common practice. You don’t need to declare them; just use them where you want them. They are basically all of type string, though some bash operations can treat their contents as a number. They look like this in use:
# trivial script using shell variables
# (but at least it is commented!)
MYVAR="something"
echo $MYVAR

# similar but with no quotes
MY_2ND=anotherone
echo $MY_2ND

# quotes are needed here:
MYOTHER="more stuff to echo"
echo $MYOTHER
There are two significant aspects of bash variable syntax that may not be intuitively obvious. First, in the assignment, the name=value syntax is straightforward enough, but there cannot be any spaces around the equals sign.
Let’s consider for a moment why this is the case. Remember that the main purpose of the shell is to launch programs—you name the program on the command line and that is the program that gets launched. Any words of text that follow that name on the command line are passed along as arguments to the program. For example, when you type:
ls filename
the word ls is the name of the command, and filename is the first and only argument in this example.
Why is that relevant? Well, consider what a variable assignment would look like if you allowed spaces around the equals sign, like this:
MYVAR = something
Can you see that the shell would have a hard time distinguishing between the name of a command to invoke (like in the ls example) and the assignment of a variable? This would be especially true for commands that can use = symbols as one or more of their arguments (e.g., test). So to keep it simple, the shell doesn’t allow spaces around the equals sign in an assignment. The flip side of this is also worth noting—don’t use an equals sign in a filename, especially not one for a shell script (it is possible, just not recommended).
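The difference is easy to see in a short experiment. This is a sketch; the exact error message wording varies by system:

```shell
#!/usr/bin/env bash
# With spaces, bash parses MYVAR as a command name and passes "=" and
# "something" to it as arguments, so this fails with "command not found":
MYVAR = something 2>/dev/null || echo "no command named MYVAR was found"

# With no spaces it is an assignment, as intended:
MYVAR=something
echo "$MYVAR"
```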
The second aspect of shell variable syntax worth noting is the use of the dollar sign ($) when referring to a variable. You don’t use the dollar sign on the variable name to assign it a value, but you do use the dollar sign to get the value of the variable. (The exception to this is using variables inside a $(( )) expression.) In compiler jargon, this difference in syntax for assigning and retrieving the value is the difference between the L-value and the R-value of the variable (for Left and Right side of an assignment operator).
Once again, the reason for this is simple disambiguation. Consider the following:
MYVAR=something
echo MYVAR is now MYVAR
As this example tries to point out, how would one distinguish between the literal string MYVAR and the value of the $MYVAR variable? Use quotes, you say? If you were to require quoting around literal strings, then everything would get a lot messier—you would have to quote every non-variable name, which includes commands! Who wants to type:
"ls" "-l" "/usr/bin/xmms"
(Yes, for those of you who thought about trying it, it does work.) So rather than have to put quotes around everything, the onus is put on the variable reference by using the R-value syntax. Put a dollar sign on a variable name when you want to get at the value associated with that variable name:
MYVAR=something
echo MYVAR is now $MYVAR
Just remember that since everything in bash is a string, we need the dollar sign to indicate a variable reference. We may also want to add braces around the variable name, for reasons we describe in Recipe 5.4.
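As a quick illustration of the $(( )) exception just mentioned, the dollar sign is optional inside arithmetic expansion (the variable name here is invented for this sketch):

```shell
#!/usr/bin/env bash
# Inside $(( )) a bare name is treated as a variable reference,
# so both of these lines print 42:
COUNT=41
echo $(( COUNT + 1 ))
echo $(( $COUNT + 1 ))
```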
Some people have described shell syntax, regular expressions, and other parts of shell scripting as write-only syntax, implying that it is nearly impossible to understand the intricacies of many shell scripts.
One of your best defenses against letting your shell scripts fall into this trap is the liberal use of comments (another is the use of meaningful variable names). It helps to put a comment before strange syntax or terse expressions:
# replace the semicolon with a space
NEWPATH=${PATH/;/ }
#
# switch the text on either side of a semicolon
sed -e 's/^\(.*\);\(.*\)$/\2;\1/' < $FILE
Comments can even be typed in at the command prompt with an interactive shell. This feature can be turned off, but it is on by default. There may be a few occasions when it is useful to make interactive comments.
See “shopt Options” in Appendix A to learn how to turn interactive comments on or off.
Embed documentation in the script using the “do nothing” builtin (a colon) and a here-document, as illustrated in Example 5-1.
#!/usr/bin/env bash
# cookbook filename: embedded_documentation

echo 'Shell script code goes here'

# Use a : NOOP and here document to embed documentation,
: << 'END_OF_DOCS'

Embedded documentation such as Perl's Plain Old Documentation (POD),
or even plain text here.

Any accurate documentation is better than none at all.

Sample documentation in Perl's Plain Old Documentation (POD) format adapted from
CODE/ch07/Ch07.001_Best_Ex7.1 and 7.2 in the Perl Best Practices example tarball
"PBP_code.tar.gz".

=head1 NAME

MY~PROGRAM--One-line description here

=head1 SYNOPSIS

MY~PROGRAM [OPTIONS] <file>

=head1 OPTIONS

    -h = This usage.
    -v = Be verbose.
    -V = Show version, copyright, and license information.

=head1 DESCRIPTION

A full description of the application and its features.
May include numerous subsections (i.e., =head2, =head3, etc.)

[...]

=head1 LICENSE AND COPYRIGHT

=cut

END_OF_DOCS
Then, to extract and use that POD documentation, try these commands:
# To read on-screen, automatically paginated
$ perldoc myscript

# Just the "usage" sections
$ pod2usage myscript

# Create an HTML version
$ pod2html myscript > myscript.html

# Create a manpage
$ pod2man myscript > myscript.1
Any plain-text documentation or markup can be used this way, either interspersed throughout the code, or better yet, collected at the end of the script. Since computer systems that have bash will probably also have Perl, its Plain Old Documentation (POD) format may be a good choice. Perl usually comes with pod2* programs to convert POD to HTML, LaTeX, manpage, text, and usage files.
Damian Conway’s Perl Best Practices (O’Reilly) has some excellent library module and application documentation templates that could be easily translated into any documentation format, including plain text. See CODE/ch07/Ch07.001_Best_Ex7.1 and 7.2 in that book’s examples tarball.
If you keep all of your embedded documentation at the very bottom of the script, you could also add an exit 0 right before the documentation begins. That will simply exit the script rather than forcing the shell to parse each line looking for the end of the here-document, so it will be a little faster. You need to be careful not to override a previous exit code from a command that failed, though, so consider using set -e. And do not use this trick if you intersperse code and embedded documentation in the body of the script.
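That layout might look like this minimal sketch (the documentation text is a placeholder):

```shell
#!/usr/bin/env bash
# All the real work happens above this point.
echo 'script work goes here'

# Exit before the embedded docs so the shell never has to scan
# them looking for the end of the here-document.
exit 0

: << 'END_OF_DOCS'
Embedded documentation (placeholder) goes here; the shell
stops reading the file at the exit above.
END_OF_DOCS
```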
Follow these best practices:
Document your script as noted in Recipe 5.1 and Recipe 5.2.
Indent and use vertical whitespace wisely.
Use meaningful variable names.
Use functions (Recipe 10.4), and give them meaningful names.
Break lines at meaningful places at less than 76 characters or so.
Put the most meaningful bits to the left.
Document your intent, not the trivial details of the code. If you follow the rest of the points, the code should be pretty clear. Write reminders, provide sample data layouts or headers, and make a note of all the details that are in your head now, as you write the code. Document the code itself too if it is subtle or obscure.
We recommend indenting using four spaces per level, with no tabs and especially no mixed tabs and spaces. There are many reasons for this, though it often is a matter of personal preference or company standards. Four spaces is always four spaces, no matter how your editor (excepting proportional fonts) or printer is set. It’s big enough to be easily visible as you glance across the script but small enough that you can have several levels of indenting without running the lines off the right side of your screen or printed page. We also suggest indenting continued lines with two additional spaces, or as needed, to make the code the most clear.
Use vertical whitespace, with separators if you like them, to create blocks of similar code. Of course, you’ll do that with functions as well.
Use meaningful names for variables and functions, and spell them out. The only time $i or $x is ever acceptable is in a for loop. You may think that short, cryptic names are saving you time and typing now, but we guarantee that you will lose that time 10- or 100-fold somewhere down the line when you have to fix or modify your script.
Break long lines at around 76 characters. Yes, we know that most screens (or rather, terminal programs) can handle a lot more than that, but 80-character paper and screens are still the default, and it never hurts to have some whitespace to the right of the code. Constantly having to scroll to the right or having lines wrap awkwardly on the screen or printout is annoying and distracting. Don’t cause it.
Unfortunately, there are sometimes exceptions to the long line rule. When creating lines to pass elsewhere, perhaps via Secure Shell (SSH), and in certain other cases, breaking up the line can cause many more code headaches than it solves. But in most cases, it makes sense.
Try to put the most meaningful bits to the left when you break a line—we read shell code left-to-right, so the unusual fact of a continued line will stand out more. It’s also easier to scan down the left edge of the code for continued lines, should you need to find them. Which is more clear?
# Good
[ -n "$results" ] \
    && echo "Got a good result in $results" \
    || echo 'Got an empty result, something is wrong'

# Also good
[ -n "$results" ] && echo "Got a good result in $results" \
                  || echo 'Got an empty result, something is wrong'

# OK, but not ideal
[ -n "$results" ] && echo "Got a good result in $results" \
    || echo 'Got an empty result, something is wrong'

# Bad
[ -n "$results" ] && echo "Got a good result in $results" || \
echo 'Got an empty result, something is wrong'

# Bad (trailing \s are optional here, but recommended for clarity)
[ -n "$results" ] && \
echo "Got a good result in $results" || \
echo 'Got an empty result, something is wrong'
You need to print a variable along with other text. You are using the dollar sign in referring to the variable, but how do you distinguish the end of the variable name from other text that follows? For example, say you wanted to use a shell variable as part of a filename, as in:
for FN in 1 2 3 4 5
do
    somescript /tmp/rep$FNport.txt
done
How will the shell read that? It will think that the variable name starts with the $ and ends with the punctuation. In other words, it will think that $FNport is the variable name, not the intended $FN.
Because shell variables can contain only alphanumeric characters and the underscore, there are many instances where you won’t need to use the braces. Any whitespace or punctuation (except the underscore) provides enough of a clue to where the variable name ends. But when in doubt, use the braces. In fact, some people would argue that always using the braces is a good habit so you never have to worry about when they are needed or not, and provides a consistent look throughout your scripts. Others find that to be too much typing of characters that are optional but awkward to reach, and think they can make the code look very busy or noisy. Ultimately, it’s a matter of personal preference.
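Returning to the loop from the problem statement, braces make the end of the variable name explicit. In this self-contained sketch, echo stands in for somescript so you can see the filenames being built:

```shell
#!/usr/bin/env bash
# ${FN} marks where the variable name ends, so "port.txt" is literal text:
for FN in 1 2 3 4 5
do
    echo "/tmp/rep${FN}port.txt"
done
```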
Export variables that you want to pass on to other scripts:
export MYVAR
export NAME=value
Sometimes it’s a good thing that one script doesn’t know about the other script’s variables. If you called a shell script from within a for loop in the first script, you wouldn’t want the second script messing up the iterations of your for loop (which it probably can’t do anyway since it’s almost certainly running in a subshell, but work with us here).
But sometimes you do want the information passed along. In those cases, you can export the variable so that its value is passed along to any other program that the script invokes.
If you want to see a list of all the exported variables, just type the command env (or use the builtin export -p) for a list of each variable and its value. All of these are available for your script when it runs. Many will have already been set up by the bash startup scripts (see Chapter 16 for more on configuring and customizing bash).
You can make the export part of any variable assignment, though that won’t work in old versions of the shell. You can also have the export statement just name the variable that will be exported. Though the export statement can be put anywhere prior to where you need the value to be exported, script writers often group these statements together, like variable declarations, at the top of a script.
Once exported, you can assign repeatedly to the variable without exporting it each time. So, sometimes you’ll see statements like:
export FNAME
export SIZE
export MAX
...
MAX=2048
SIZE=64
FNAME=/tmp/scratch
and at other times you’ll see:
export FNAME=/tmp/scratch
export SIZE=64
export MAX=2048
...
FNAME=/tmp/scratch2
...
FNAME=/tmp/stillexported
One word of caution: the exported variables are, in effect, call by value. Changing the value of the exported variable in the called script does not change that variable’s value back in the calling script.
This raises the question, “How would you pass back a changed value from the called script?” Answer: you can’t.
You can only design your scripts so that they don’t need to do this. What mechanisms have people used to cope with this limitation?
One approach might be to have the called script echo its changed value as output from the script, letting you read the output with the resulting changed value. For example, suppose one script exports a variable $VAL and then calls another script that modifies $VAL. To get the new value of $VAL in the original script, you have to write the changed value to standard output, capture it, and assign it to the variable, as in:
VAL=$(anotherscript)
(See Recipe 10.5 for an explanation of the $() syntax.) You could even change multiple values and echo them each in turn to standard output. The calling program could then use a shell read to capture each line of output one at a time into the appropriate variables. This requires that the called script write no other output to standard output (at least not before or among the variables), however, and sets up a very strong interdependency between the scripts (not good from a maintenance standpoint).
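Here is a sketch of that technique. The name othercalc is invented, and it is written as a function so the example is self-contained; in practice it would be the called script. Process substitution keeps the reads in the current shell:

```shell
#!/usr/bin/env bash
# othercalc stands in for the called script; it echoes two
# changed values, one per line, and nothing else.
othercalc() {
    echo "first-changed-value"
    echo "second-changed-value"
}

# Read each line of output into its own variable. Using process
# substitution (not a pipeline) keeps the reads in the current
# shell, so FIRST and SECOND survive after this block.
{
    read -r FIRST
    read -r SECOND
} < <(othercalc)

echo "FIRST=$FIRST"
echo "SECOND=$SECOND"
```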
help export
Chapter 16 for more information on configuring and customizing bash
Recipe 10.5, “Using Functions: Parameters and Return Values”
Use the set command to see the values of all variables and function definitions in the current shell.
Use the env (or export -p) command to see only those variables that have been exported and would be available to a subshell.
In bash version 4 or newer, you can also use the declare -p command.
The set command, with no other arguments, produces (on standard output) a list of all the shell variables currently defined along with their values, in a name=value format. The env command is similar. If you run either, you will find a rather long list of variables, many of which you might not recognize. Those variables have been created for you, as part of the shell’s startup process.
The list produced by env is a subset of the list produced by set, since not all variables are exported.
If there are particular variables or values that are of interest, and you don’t want the entire list, just pipe it into a grep command. For example:
set | grep MY
will show only those variables whose name or value has the two-character sequence MY somewhere in it.
The output from the newer declare -p command shows the variable names and values as if they were being declared and initialized. Here is a snippet of output:
$ declare -p
...
declare -i MYCOUNT="5"
declare -x MYENV="10.5.1.2"
declare -r MYFIXED="unchangeable"
declare -a MYRA=([0]="5" [1]="10" [2]="15")
...
$
The output is in the form of declare statements that could be used as source code in a shell script to recreate these variables and their values.
The various arguments (-i, -x, -r, -a) indicate that the variable is an integer, has been exported, is read-only, or is an array, respectively.
help set
help export
help declare
man env
Chapter 16 for more on configuring and customizing bash
Appendix A for reference lists for all of the built-in shell variables
Use command-line parameters. Any words on the command line that follow the name of a shell script are available to the script as numbered variables. Suppose we have the following script, simplest.sh:
# simple shell script
echo ${1}
The script will echo the first parameter supplied on the command line when it is invoked. Here it is in action:
$ cat simplest.sh
# simple shell script
echo ${1}
$ ./simplest.sh you see what I mean
you
$ ./simplest.sh one more time
one
$
The other parameters are available as ${2}, ${3}, ${4}, ${5}, and so on. You don’t need the braces for the single-digit numbers, except to separate the variable name from the surrounding text. Typical scripts have only a handful of parameters, but when you get to ${10} you need to use the braces because the shell will interpret $10 as ${1} followed immediately by the literal string 0, as we see here:
$ cat tricky.sh
echo $1 $10 ${10}
$ ./tricky.sh I II III IV V VI VII VIII IX X XI
I I0 X
$
The tenth argument has the value X, but if you write $10 in your script, the shell will give you $1, the first parameter, followed immediately by a zero, the literal character that you put next to the $1 in your echo statement.
You want to take some set of actions for a given list of arguments. You could write your shell script to perform those actions for one argument and use $1 to reference the parameter. But what if you’d like to do this for a whole bunch of files? You would like to be able to invoke your script like this:
./actall *.txt
knowing that the shell will pattern match and build a list of filenames that match the *.txt pattern (any filename ending with .txt).
Use the shell special variable $* to refer to all of your arguments, and use that in a for loop as in Example 5-2.
#!/usr/bin/env bash
# cookbook filename: chmod_all.1
#
# change permissions on a bunch of files
#
for FN in $*
do
    echo changing $FN
    chmod 0750 $FN
done
The variable $FN is our choice; we could have used any shell variable name we wanted there. The $* refers to all the arguments supplied on the command line. For example, if the user types:
./actall abc.txt another.txt allmynotes.txt
the script will be invoked with $1 equal to abc.txt, $2 equal to another.txt, and $3 equal to allmynotes.txt, but $* will be equal to the entire list. In other words, after the shell has substituted the list for $* in the for statement, it will be as if the script had read:
for FN in abc.txt another.txt allmynotes.txt
do
    echo changing $FN
    chmod 0750 $FN
done
The for loop will take the first value from the list, assign it to the variable $FN, and proceed through the list of statements between the do and the done. It will then repeat that loop for each of the other values.
But you’re not finished yet! This script works fine when filenames have no spaces in them, but sometimes you encounter filenames with spaces. Read the next two recipes to see how this script can be improved.
Thanks a lot, Apple! Trying to be user-friendly, the designers popularized the concept of space characters as valid characters in filenames, so users could name their files with names like My Report and Our Dept Data instead of the ugly and unreadable MyReport and Our_Dept_Data. (How could anyone possibly understand what those old-fashioned names meant?) Well, that makes life tough for the shell, where the space is the fundamental separator between words, so filenames were always kept to a single word. Not so anymore.
So how do we handle this?
Where a shell script once had simply ls -l $1, it is better to write ls -l "$1", with quotes around the parameter. Otherwise, if the parameter has an embedded space, it will be parsed into separate words, and only part of the name will be in $1. Let’s show you how this doesn’t work:
$ cat simpls.sh
# simple shell script
ls -l ${1}
$
$ ./simpls.sh Oh the Waste
ls: Oh: No such file or directory
$
If we don’t put quotes around the filename when we invoke the script, bash sees three arguments and substitutes the first argument (Oh) for $1. The ls command runs with Oh as its only argument and can’t find that file.
So now let’s put quotes around the filename when we invoke the script:
$ ./simpls.sh "Oh the Waste"
ls: Oh: No such file or directory
ls: the: No such file or directory
ls: Waste: No such file or directory
$
Still not good. bash has taken the three-word filename and substituted it for $1 on the ls command line in our script. So far so good. Since we don’t have quotes around the variable reference in our script, however, ls sees each word as a separate argument (i.e., as a separate filename). Again, it can’t find any of them.
Let’s try a script that quotes the variable reference:
$ cat quoted.sh
# note the quotes
ls -l "${1}"
$
$ ./quoted.sh "Oh the Waste"
-rw-r--r-- 1 smith users 28470 2007-01-11 19:22 Oh the Waste
$
When we quoted the reference "${1}", it was treated as a single word (a single filename); ls then had only one argument, the filename, and it could complete its task.
Chapter 19 for common goofs
Recipe 1.8, “Using Shell Quoting” for tips on shell quoting
Appendix C for more information on command-line processing
OK, you have quotes around your variable as the previous recipe recommended. But you’re still getting errors. It’s just like the script from Recipe 5.8, but it fails when a file has a space in its name:
for FN in $*
do
    chmod 0750 "$FN"
done
It has to do with the $* in the script, used in the for loop. For this case we need to use a different but related shell variable, $@. When it is quoted, the resulting list has quotes around each argument separately. The shell script should be written as shown in Example 5-3.
#!/usr/bin/env bash
# cookbook filename: chmod_all.2
#
# change permissions on a bunch of files
# with better quoting in case of filenames with spaces
#
for FN in "$@"
do
    chmod 0750 "$FN"
done
The parameter $* expands to the list of arguments supplied to the shell script. If you invoke your script like this:
myscript these are args
then $* refers to the three arguments these are args. And when it’s used in a for loop, such as:
for FN in $*
the first time through the loop $FN is assigned the first word (these), the second time the second word (are), etc.
If the arguments are filenames and they are put on the command line by pattern matching, as when you invoke the script this way:
myscript *.mp3
then the shell will match all the files in the current directory whose names end with the four characters .mp3, and they will be passed to the script. So consider an example where there are three MP3 files whose names are:
vocals.mp3 cool music.mp3 tophit.mp3
The second song title has a space in the filename between cool and music. When you invoke the script with:
myscript *.mp3
you’ll get, in effect:
myscript vocals.mp3 cool music.mp3 tophit.mp3
If your script contains the line:
for FN in $*
that will expand to:
for FN in vocals.mp3 cool music.mp3 tophit.mp3
which has four words in its list, not three. The second song title has a space as the fifth character (cool music.mp3), and the space causes the shell to see that as two separate words (cool and music.mp3), so $FN will be cool on the second iteration through the for loop. On the third iteration $FN will have the value music.mp3, but that is not the name of your file either, so you’ll get file-not-found error messages.
It might seem logical to try quoting the $*, but this:
for FN in "$*"
will expand to:
for FN in "vocals.mp3 cool music.mp3 tophit.mp3"
and you will end up with a single value for $FN equal to the entire list. You’ll get an error message like this:
chmod: cannot access 'vocals.mp3 cool music.mp3 tophit.mp3': No such file or directory
Instead, you need to use the shell variable $@ and quote it. Left unquoted, $* and $@ give you the same thing. But when quoted, bash treats them differently. A reference to $* inside of quotes gives the entire list inside one set of quotes, as we just saw. But a reference to $@ inside of quotes returns not one string but a list of quoted strings, one for each argument.
In our example using the MP3 filenames, this:
for FN in "$@"
will expand to:
for FN in "vocals.mp3" "cool music.mp3" "tophit.mp3"
You can see that the second filename is now quoted, so that its space will be kept as part of its name and not considered a separator between two words.
The second time through this loop, $FN will be assigned the value cool music.mp3, which has an embedded space. So, be careful how you refer to $FN—you’ll probably want to put it in quotes too, so that the space in the filename is kept as part of that string and not used as a separator. That is, you’ll want to use "$FN", as in:
chmod 0750 "$FN"
Shouldn’t you always use "$@" in your for loop? Well, it’s a lot harder to type, so for quick-and-dirty scripts, when you know your filenames don’t have spaces, it’s probably OK to keep using the old-fashioned $* syntax. For more robust scripting though, we recommend "$@" as the safer way to go. We’ll probably use them interchangeably throughout this book, because even though we know better, old habits die hard—and some of us never use spaces in our filenames! (Famous last words.)
Use the shell builtin variable ${#}. Example 5-4 shows some scripting to enforce an exact count of three arguments.
#!/usr/bin/env bash
# cookbook filename: check_arg_count
#
# Check for the correct # of arguments:
# Use this syntax or use: if [ $# -lt 3 ]
if (( $# < 3 ))
then
    printf "%b" "Error. Not enough arguments.\n" >&2
    printf "%b" "usage: myscript file1 op file2\n" >&2
    exit 1
elif (( $# > 3 ))
then
    printf "%b" "Error. Too many arguments.\n" >&2
    printf "%b" "usage: myscript file1 op file2\n" >&2
    exit 2
else
    printf "%b" "Argument count correct. Proceeding...\n"
fi
And here is what it looks like when we run it, once with too many arguments and once with the correct number of arguments:
$ ./myscript myfile is copied into yourfile
Error. Too many arguments.
usage: myscript file1 op file2

$ ./myscript myfile copy yourfile
Argument count correct. Proceeding...
After the opening comments (always a helpful thing to have in a script), we have the if test to see whether the number of arguments supplied (found in $#) is less than three; the elif then checks whether it is more than three. In either case, we print an error message, remind the user of the correct usage, and exit.
The output from the error messages is redirected to standard error. This is in keeping with the intent of standard error as the channel for all error messages.
The script also has a different return value depending on the error that was detected. While not that significant here, it is useful for any script that might be invoked by other scripts, so that there is a programmatic way not only to detect failure (a nonzero exit value), but to distinguish between error types.
One word of caution: don’t confuse ${#} with ${#VAR} or even ${VAR#alt} just because they all use the hash character (#) inside of braces. The first gives the number of arguments, whereas the second gives the length of the value in the variable VAR and the third does a certain kind of substitution.
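A short sketch shows all three side by side (the variable name and values are invented for illustration):

```shell
#!/usr/bin/env bash
# Three different meanings of # inside braces:
set -- one two three       # pretend the script got three arguments
VAR="hello.txt"

echo ${#}                  # number of arguments: prints 3
echo ${#VAR}               # length of the value of VAR: prints 9
echo ${VAR#*.}             # strip shortest leading match of "*.": prints txt
```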
For any serious shell script, you are likely to have two kinds of arguments—options that modify the behavior of the script and the real arguments you want to work with. You need a way to get rid of the option arguments after you’ve processed them.
For example, you have this script:
for FN in "$@"
do
    echo changing $FN
    chmod 0750 "$FN"
done
It’s simple enough—it echoes the filename that it is working on, then it changes that file’s permissions. But you want it to work quietly sometimes, not echoing the filename. How can you add an option to turn off this verbose behavior while preserving the for loop?
Use shift to remove an argument after you’ve handled it, as illustrated in Example 5-5.
#!/usr/bin/env bash
# cookbook filename: use_up_option
#
# use and consume an option
#
# parse the optional argument
VERBOSE=0
if [[ $1 = -v ]]
then
    VERBOSE=1
    shift
fi
#
# the real work is here
#
for FN in "$@"
do
    if (( VERBOSE == 1 ))
    then
        echo changing $FN
    fi
    chmod 0750 "$FN"
done
We add a flag variable, $VERBOSE, to tell us whether or not to echo the filename as we work. But once the shell script has seen the -v and set the flag, we don’t want the -v in the argument list any more. The shift statement tells bash to shift its arguments down one position, getting rid of the first argument ($1) as $2 becomes $1, $3 becomes $2, and so on.
That way, when the for loop runs, the list of parameters (in $@) no longer contains the -v but starts with the next parameter.
This approach of parsing arguments is all right for handling a single option, but if you want more than one option, you need a bit more logic. By convention, options to a shell script should not be dependent on position; e.g., myscript -a -p should be the same as myscript -p -a. Moreover, a robust script should be able to handle repeated options and either ignore them or report an error. For more robust parsing, see the recipe on bash’s getopts builtin (Recipe 13.1).
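As a preview, a position-independent version of the -v handling might be sketched with the getopts builtin like this (the -q flag is invented here so there are two options to reorder; when flags repeat, the last one wins):

```shell
#!/usr/bin/env bash
# Accept -v and -q in any order, any number of times.
VERBOSE=0
while getopts "vq" OPT
do
    case $OPT in
        v) VERBOSE=1 ;;
        q) VERBOSE=0 ;;
    esac
done
shift $(( OPTIND - 1 ))    # discard the options we consumed

echo "VERBOSE=$VERBOSE, files: $*"
```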
There is a series of special operators available when referencing a shell variable. The :- operator says that if the specified parameter (here, $1) is not set or is null, whatever follows (/tmp in our example) should be used as the value. Otherwise, it will use the value that is already set. It can be used on any shell variable, not just the positional parameters ($1, $2, $3, etc.), though they are probably its most common use.
Of course, you could do this the long way by constructing an if statement and checking to see if the variable is null or unset (we leave that as an exercise to the reader), but this sort of thing is so common in shell scripts that this syntax has been welcomed as a convenient shorthand.
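For instance (FILEDIR is a name chosen for this sketch), a script that defaults its first argument to /tmp could begin:

```shell
#!/usr/bin/env bash
# Use $1 if it was supplied and non-null; otherwise fall back to /tmp:
FILEDIR=${1:-/tmp}
echo "using directory: $FILEDIR"
```

Run with no arguments, it prints “using directory: /tmp”; run with an argument, it uses that argument instead.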
The bash manpage on parameter substitution
Learning the bash Shell, 3rd Edition, by Cameron Newham (O’Reilly), pages 91–92
Classic Shell Scripting by Nelson H. F. Beebe and Arnold Robbins (O’Reilly), pages 113–114
Your script relies on certain environment variables, either widely used ones (e.g., $USER) or ones specific to your own business. If you want to build a robust shell script, you should make sure that these variables each have a reasonable value. So how do you guarantee a reasonable default value?
The reference to $HOME in the example will return the current value of $HOME unless it is empty or not set at all. In those cases (empty or not set), it will return the value /tmp, which will also be assigned to $HOME so that further references to $HOME will have this new value.
We can see this in action here:
$ echo ${HOME:=/tmp}
/home/uid002
$ unset HOME # generally not wise to do
$ echo ${HOME:=/tmp}
/tmp
$ echo $HOME
/tmp
$ cd ; pwd
/tmp
$
Once we unset the variable, it no longer had any value. When we then used the := operator as part of our reference to it, the new value (/tmp) was substituted. The subsequent references to $HOME returned its new value.
One important exception to keep in mind about the assignment operator: this mechanism will not work with positional parameter arguments (e.g., $1 or $*). For those cases, use :- in expressions like ${1:-default}, which will return the value without trying to do the assignment.
As an aside, it might help you to remember some of these crazy symbols if you think of the visual difference between ${VAR:=value} and ${VAR:-value}. The := will do an assignment as well as returning the value to the right of the operator. The :- will do half of that—it returns the value but doesn’t do the assignment—so its symbol is only half of an equals sign (i.e., one horizontal bar, not two). If this doesn’t help, forget that we mentioned it.
You need to set a default value, but you want to allow an empty string as a valid value. You only want to substitute the default in the case where the value is unset.
The ${:=} operator has two cases where the new value will be used: first, when the value of the shell variable has previously not been set (or has been explicitly unset); and second, where the value has been set but is empty, as in HOME="" or HOME=$OTHER (where $OTHER has no value).
The shell can distinguish between these two cases, and omitting the colon (:) indicates that you want to make the substitution only if the value is unset. If you write only ${HOME=/tmp} without the colon, the assignment will take place only in the case where the variable is not set (never set or explicitly unset).
Let’s play with the $HOME variable again, but this time without the colon in the operator:
$ echo ${HOME=/tmp} # no substitution needed
/home/uid002
$ HOME="" # generally not wise
$ echo ${HOME=/tmp} # will NOT substitute
$ unset HOME # generally not wise
$ echo ${HOME=/tmp} # will substitute
/tmp
$ echo $HOME
/tmp
$
In the case where we simply made the $HOME variable an empty string, the = operator didn’t do the substitution since $HOME did have a value, albeit null. But when we unset the variable, the substitution occurred. If you want to allow for empty strings, use just the = with no colon. Most times, though, the := is used because you can do little with an empty value, deliberate or not.
You can use quite a bit more on the righthand side of these shell variable references. For example:
cd ${BASE:="$(pwd)"}
As the example shows, the value that will be substituted doesn’t have to be just a string constant. Rather, it can be the result of a more complex shell expression, including running commands in a subshell (as in the example). In our example, if $BASE is not set, the shell will run the pwd builtin command (to get the current directory) and use the string that it returns as the value.
So what can you do on the righthand side of this (and the other similar) operators? The bash manpage says that what we put to the right of the operator “is subject to tilde expansion, parameter expansion, command substitution, and arithmetic expansion.”
Here is what that means:
Parameter expansion means that we could use other shell variables in this expression, as in ${BASE:=${HOME}}.
Tilde expansion means that we can use an expression like ~bob, and it will expand that to refer to the home directory of the user bob. Use ${BASE:=~uid17} to set the default value to the home directory for user uid17, but don’t put quotes around this string, as that will defeat the tilde expansion.
Command substitution is what we used in the example; it will run the commands and take their output as the value for the variable. Commands are enclosed in the single parentheses syntax, $(cmds).
Arithmetic expansion means that we can do integer arithmetic, using the $(( )) syntax in this expression. Here’s an example:
echo ${BASE:=/home/uid$((ID+1))}
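Putting that arithmetic case together, a short sketch (assuming $ID holds a number and $BASE starts out unset):

```shell
ID=16
unset BASE
# arithmetic expansion runs inside the default value,
# and := assigns the result to BASE as well as returning it
echo "${BASE:=/home/uid$((ID+1))}"   # prints /home/uid17
echo "$BASE"                         # now assigned: /home/uid17
```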
Those shorthands for giving a default value are cool, but sometimes you need to force the users to give you a value; otherwise, you don’t want to proceed. Perhaps if they left off a parameter, they don’t really understand how to invoke your script. You want to leave nothing to guesswork. Is there anything shorter than lots of if statements to check each of your several parameters?
Use the ${:?} syntax when referring to the parameters, as in Example 5-6. bash will print an error message and then exit if a parameter is unset or null.
#!/usr/bin/env bash
# cookbook filename: check_unset_parms
#
USAGE="usage: myscript scratchdir sourcefile conversion"
FILEDIR=${1:?"Error. You must supply a scratch directory."}
FILESRC=${2:?"Error. You must supply a source file."}
CVTTYPE=${3:?"Error. ${USAGE}"}
Here’s what happens when we run that script with insufficient arguments:
$ ./myscript /tmp /dev/null
./myscript: line 7: 3: Error. usage: myscript scratchdir sourcefile conversion
$
The check is made to see if each parameter is set (or null); if not, bash will print an error message and exit.
The third variable uses another shell variable in its message. You can even run another command inside it:
CVTTYPE=${3:?"Error. $USAGE. $(rm $SCRATCHFILE)"}
If parameter three is not set, then the error message will contain the phrase “Error.” along with the value of the variable named $USAGE and then any output from the command that removes the file named by the variable $SCRATCHFILE. OK, so we’re getting carried away. You can make your shell scripts awfully compact, and we do mean awfully. It is better to waste some whitespace and a few bytes to make the logic ever so much more readable, as in:
if [ -z "$3" ]
then
    echo "Error. $USAGE"
    rm $SCRATCHFILE
fi
One other consideration: the error message produced by the ${:?} feature comes out with the shell script filename and line number. For example, the script fragment in Example 5-6 produces:
$ ./check_unset_parms
./check_unset_parms: line 5: 1: Error. You must supply a scratch directory.
$ ./check_unset_parms somedir
./check_unset_parms: line 6: 2: Error. You must supply a source file.
$ ./check_unset_parms somedir somefile
./check_unset_parms: line 7: 3: Error. usage: myscript scratchdir sourcefile conversion
$
Because you have no control over this part of the message, and since it looks like an error in the shell script itself, combined with the issue of readability, this technique is not so popular in commercial-grade shell scripts. (It is handy for debugging, though.)
If you’d rather have this behavior for all variables without having to change each one of them, use the set -u command to “treat unset variables as an error when substituting”:
$ echo "$foo"

$ set -u
$ echo "$foo"
bash: foo: unbound variable
$ echo $?        # exit code
1
$ set +u
$ echo "$foo"

$ echo $?        # exit code
0
$
Use a bash parameter expansion feature that will remove text that matches a pattern, as illustrated in Example 5-7.
#!/usr/bin/env bash
# cookbook filename: suffixer
#
# rename files that end in .bad to be .bash

for FN in *.bad
do
    mv "${FN}" "${FN%bad}bash"
done
The for loop will iterate over a list of filenames in the current directory that all end in .bad. The variable $FN will take the value of each name, one at a time. Inside the loop, the mv command will rename the file (move it from the old name to the new name). We need to put quotes around each filename in case the filename contains embedded spaces.
The crux of this operation is the reference to $FN that includes an automatic deletion of the trailing bad characters. The ${ } delimits the reference so that the literal text bash adjacent to it is simply appended onto the end of the resulting string.
Here it is broken down into a few more steps:
NOBAD="${FN%bad}"
NEWNAME="${NOBAD}bash"
mv "${FN}" "${NEWNAME}"
This way you can see the individual steps of stripping off the unwanted suffix, creating the new name, and then renaming the files. Putting it all on one line isn’t so bad though, once you get used to the special operators.
Since we are not just removing a substring from the variable but are replacing the bad with bash, we might have used the substitution operator for variable references, the slash (/). Similar to editor commands (e.g., those found in vi and sed) that use the slash to delimit substitutions, we could have written:
# Not anchored; don't do this
mv "${FN}" "${FN/bad/bash}"
(Unlike with the editor commands, you don’t use a final slash—the righthand brace serves that function.)
However, one reason that we didn’t do it this way is because the substitution isn’t anchored, and can be made anywhere in the variable. If, for example, we had a file named subaddon.bad the substitution would leave us with subashdon.bad, which is not what we want. If we used a double slash in place of the first slash, it would substitute every occurrence within the variable. That would result in subashdon.bash, which isn’t what we want either. This is better:
# Add the "." to "anchor" the pattern; this is better, but not foolproof
mv "${FN}" "${FN/.bad/.bash}"
The ${FN%bad}bash we used in our solution is already anchored—it will only remove the text from the end of the string, which in this case is exactly what we want.
There are several operators that do various sorts of manipulation on the string values of variables when referenced. Table 5-1 summarizes them.
| Inside ${ … } | Action taken |
|---|---|
| name:offset:length | Return a substring of name, starting at offset, of up to length characters |
| #name | Return length of the string in name |
| name#pattern | Remove (shortest) front-anchored pattern |
| name##pattern | Remove (longest) front-anchored pattern |
| name%pattern | Remove (shortest) rear-anchored pattern |
| name%%pattern | Remove (longest) rear-anchored pattern |
| name/pattern/string | Replace first occurrence |
| name//pattern/string | Replace all occurrences |
Try them all. They are very handy.
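Here is a quick tour of most of them against one sample value, with the results shown in comments:

```shell
FILE="/usr/local/bin/mycmd.sh"
echo "${#FILE}"           # 23 -- length of the string
echo "${FILE:5:5}"        # local -- substring: offset 5, length 5
echo "${FILE#*/}"         # usr/local/bin/mycmd.sh -- shortest front match removed
echo "${FILE##*/}"        # mycmd.sh -- longest front match removed
echo "${FILE%/*}"         # /usr/local/bin -- shortest rear match removed
echo "${FILE%%/*}"        # (empty line) -- longest rear match removed
echo "${FILE/bin/sbin}"   # /usr/local/sbin/mycmd.sh -- first occurrence replaced
echo "${FILE//l/L}"       # /usr/LocaL/bin/mycmd.sh -- all occurrences replaced
```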
Use string manipulation:
${MYVAR#-}
This is simple string manipulation. The # searches from the front of the string, looking for, in this case, the minus sign (-). If found, it will remove it. If no minus is found, it simply results in the original value. Either way, that leaves the value without a leading minus, which gives us its magnitude; i.e., its absolute value.
You could use if/then/else logic as a mathematically oriented approach:
# why bother?
if (( MYVAR < 0 ))
then
    let MYVAR=MYVAR*-1
fi
but as the comment says, why bother? The string manipulation technique is short and sweet. You may want to comment it for readability, though:
MYVAR=${MYVAR#-}    # ABS(MYVAR)
Yes, bash can strip the directory path from a shell variable string and leave just the last part of the path (the filename). Where you may want to write:
FILE=$(basename $FULLPATHTOFILE)
instead you need only write:
FILE=${FULLPATHTOFILE##*/}
The big difference between the first and second examples is the parentheses versus the braces.
The first example, using parentheses, will launch a subshell to run the executable basename with the argument that is the value of $FULLPATHTOFILE (the old way of writing this command substitution was with backquotes). The second example uses curly braces, which is just part of the syntax for evaluating a shell variable—no subshell, no executable file. It looks for, and removes from the front of the string (because of the #), the longest match (because of the double ##) of the pattern described by the asterisk and the slash (*/). The asterisk matches any number of characters and the slash is just a literal slash. In the string /usr/local/bin/mycmd, that pattern will match (and thus remove) the /usr/local/bin/ part of the string, leaving mycmd as the value to be assigned into the variable $FILE.
The basename command will ignore a trailing slash in the path, so $(basename /usr/local/bin/) returns bin whereas our bash version would return an empty string (since the largest pattern to end in a slash is the whole string). To be compatible, we should remove any trailing slash first before the other substitutions.
The real basename command can also take a suffix to be removed as a second argument. In bash we can do that, too, but would need to do it in a separate step. So, a more complete replacement for:
FILE=$(basename $MYIMAGEFILE .jpg)
would be:
FILE=${MYIMAGEFILE%/}    # remove a trailing slash
FILE=${FILE##*/}         # remove all chars up to last /
FILE=${FILE%.jpg}        # remove .jpg suffix if present
Yes. Use a string manipulation operator to remove the filename—the last part of a path in a string—leaving as much of the directory path to that filename as was in the string:
DIR=${MYPATHTOFILE%/*}
If the variable holds /usr/local/bin/mycmd, we want the result of this manipulation to give us just /usr/local/bin and drop the last part (the filename). Since each piece of the path is separated by a slash, we just remove from the righthand side (because of the %) the shortest string (because there is only one %, not two) that matches the pattern “a slash followed by any number of characters” (/*).
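One subtle difference from the real dirname command is worth a quick sketch: if the string contains no slash at all, the pattern doesn't match, so the value comes back unchanged, whereas dirname answers with a dot (meaning the current directory):

```shell
P="mycmd"          # no slash anywhere in the string
echo "${P%/*}"     # prints mycmd -- pattern /* didn't match, value unchanged
dirname "$P"       # prints .     -- dirname's answer for "current directory"
```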
man dirname for other options and subtle differences
If you write LIST="${LIST},${NEWVAL}" inside a loop to build up the list, then the first time (when LIST is null) you’ll end up with a leading comma. You could special-case the initialization of LIST so that it gets the first element before entering the loop, but if that’s not practical, or to avoid duplicate code (for getting a new value), you can instead use the ${:+} syntax in bash:
LIST="${LIST}${LIST:+,}${NEWVAL}"
If ${LIST} is null or unset, then both references to $LIST expand to nothing.
That means that the first time through the loop LIST will be assigned NEWVAL’s value and nothing more. When LIST is not null, the second expression (${LIST:+,}) is replaced with a comma, separating the previous value from the new value.
Here is an example code segment for reading and constructing a CSV list:
#
# read names one at a time
# and build a comma-separated list
#
while read NEWVAL
do
    LIST="${LIST}${LIST:+,}${NEWVAL}"
done
echo $LIST
Yes. bash has an array syntax for single-dimension arrays.
Arrays are easy to initialize if you know the values as you write the script. The format is simple:
MYRA=(first second third home)
Each element of the array is a separate word in the list enclosed in parentheses. Then you can refer to each this way:
echo runners on ${MYRA[0]} and ${MYRA[2]}
This output is the result:
runners on first and third
If you write only $MYRA, you will get only the first element, just as if you had written ${MYRA[0]}.
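A few more array references worth knowing, in a quick sketch (results shown in comments):

```shell
MYRA=(first second third home)
echo "${#MYRA[@]}"        # 4 -- number of elements in the array
echo "${MYRA[@]}"         # first second third home -- all elements at once
echo "${#MYRA[1]}"        # 6 -- length of the element "second"
for BASE in "${MYRA[@]}"  # loop over each element, one per iteration
do
    echo "$BASE"
done
```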
Learning the bash Shell, 3rd Edition, by Cameron Newham (O’Reilly), pages 157–161, for more information about arrays
Recipe 7.15, “Counting String Values with bash” for another type of array in bash, associative arrays
As of bash 4.0 there are a few operators to do case conversion when referencing a variable name. If $FN is the variable in which you put a filename (i.e., string) that you want converted to lowercase, then ${FN,,} will return that string in all lowercase. Similarly, ${FN^^} will return the string in all uppercase. There is even the ${FN~~} operator to swap case, changing all lower- to upper- and all upper- to lowercase characters (but why would you want to do that?).
Here is a for loop that will rename all the .JPG files to lowercase names:
for FN in *.JPG
do
    mv "$FN" "${FN,,}"
done
or as a one-liner:
for FN in *.JPG; do mv "$FN" "${FN,,}"; done
There is another approach, also available in version 4 of bash or newer: you can declare your variable to be a type that is always lowercase. Any text assigned to it will be converted to lowercase. Using that approach our for loop to rename files just does a simple assignment rather than requiring a string substitution operator:
declare -l lcfn    # contents will be converted to lowercase
for FN in *.JPG
do
    lcfn="$FN"
    mv "$FN" "$lcfn"
done
There are similar declarations for variables that change the case of all letters or only the first letter. Here’s a simple demonstration program to show how they work:
declare -u UP    # all UPPERCASE
declare -l dn    # all lowercase
declare -c Ca    # only the first Uppercase
while read TXT
do
    UP="${TXT}"
    dn="${TXT}"
    Ca="${TXT}"
    echo $TXT $UP $dn $Ca
done
In the case of the variable declared with -c, only the first letter is capitalized even if there are multiple words in the string. Try running it and see how it works.
man rename
The parentheses around $TXT cause it to be treated as array initialization. Whitespace separating the words in the text delineates the array elements. The [@] notation references all the elements of the array at once (individually), and the ^ operator converts the first character (of each element) to uppercase.