These recipes cover tasks that come up in the course of using or administering computers. They are presented here because they don’t fit well anywhere else in the book.
We presented a simple loop to change file extensions in Recipe 5.18; see that recipe for more details. Here is a for loop example:
for FN in *.bad
do
    mv "${FN}" "${FN%bad}bash"
done
What about more arbitrary changes? For example, say you are writing a book and want the chapter filenames to follow a certain format, but the publisher has a conflicting format. You could name the files like chNN=Title=Author.odt, then use a simple for loop and cut in a command substitution to rename them:
for i in *.odt; do mv "$i" "$(echo "$i" | cut -d'=' -f1,3)"; done
You should always use quotes around file arguments in case there’s a space. While testing the code in the solution we also used echo and angle brackets to make it very clear what the arguments are (using set -x is also helpful). Once we were very sure our command worked, we removed the angle brackets and replaced echo with mv:
# Testing
$ for i in *.odt; do echo "<$i>" "<$(echo "$i" | cut -d'=' -f1,3)>"; done
<ch01=Beginning Shell Scripting=JP.odt> <ch01=JP.odt>
<ch02=Standard Output=CA.odt> <ch02=CA.odt>
<ch03=Standard Input=CA.odt> <ch03=CA.odt>
<ch04=Executing Commands=CA.odt> <ch04=CA.odt>
[...]

# Even more testing
$ set -x

$ for i in *.odt; do echo "<$i>" "<$(echo "$i" | cut -d'=' -f1,3)>"; done
++xtrace 1: echo ch01=Beginning Shell Scripting=JP.odt
++xtrace 1: cut -d= -f1,3
+xtrace 535: echo '<ch01=Beginning Shell Scripting=JP.odt>' '<ch01=JP.odt>'
<ch01=Beginning Shell Scripting=JP.odt> <ch01=JP.odt>
++xtrace 1: echo ch02=Standard Output=CA.odt
++xtrace 1: cut -d= -f1,3
+xtrace 535: echo '<ch02=Standard Output=CA.odt>' '<ch02=CA.odt>'
<ch02=Standard Output=CA.odt> <ch02=CA.odt>
++xtrace 1: echo ch03=Standard Input=CA.odt
++xtrace 1: cut -d= -f1,3
+xtrace 535: echo '<ch03=Standard Input=CA.odt>' '<ch03=CA.odt>'
<ch03=Standard Input=CA.odt> <ch03=CA.odt>
++xtrace 1: echo ch04=Executing Commands=CA.odt
++xtrace 1: cut -d= -f1,3
+xtrace 535: echo '<ch04=Executing Commands=CA.odt>' '<ch04=CA.odt>'
<ch04=Executing Commands=CA.odt> <ch04=CA.odt>

$ set +x
+xtrace 536: set +x
We have for loops like this throughout the book since they’re so handy. The trick here is plugging the right values into the arguments to mv, or cp, or whatever. In this case we’d already used the = as a delimiter, and all we cared about were the first and third fields, so it was pretty easy.
To figure out the values you need, use the ls (or find) command to list the files you are working on and pipe them into whatever toolchain seems appropriate—often cut, awk, or sed. bash parameter expansion (Recipe 5.18) is also very handy here:
ls *.odt | cut -d'=' -f1
Hopefully, a recipe somewhere in the book will give you the details you need to come up with the right values for the arguments; then you can just plug all the pieces in and go. Be sure to test using echo first and watch out for spaces or other odd characters in filenames: they’ll get you every time.
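If you’d rather avoid the echo and cut subshell entirely, bash parameter expansion can do the same split. Here is a sketch using the chNN=Title=Author.odt layout from above (as always, test with echo first):

# A sketch using only parameter expansion; the filename layout is the
# chNN=Title=Author.odt example from above
for i in *.odt; do
    base=${i%%=*}      # everything before the first '=' (chNN)
    author=${i##*=}    # everything after the last '=' (Author.odt)
    echo mv "$i" "${base}=${author}"    # remove the echo once satisfied
done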
Don’t name your script rename. We are aware of at least two different rename commands in major Linux flavors, and there are certainly many others. Red Hat’s util-linux package includes a rename from_string to_string file_name tool. Debian and derivatives include Larry Wall’s Perl-based rename in their Perl packages, and have a related renameutils package. And Solaris, HP-UX, and some BSDs document a rename system call, though that is not easily end user–accessible. Try the rename manpage on your system and see what you get.
Pipe the info command into a useful pager, such as less, but note you will lose info’s link navigation features:
info bash | less
info is basically a standalone version of the Emacs info reader, so if you are an Emacs fan, maybe it will make sense to you. However, piping it into less is a quick and simple way to view the documentation using a tool with which you’re already familiar.
The idea behind Texinfo is good: generate various output formats from a single source. It’s not new, since many other markup languages exist to do the same thing; we even talk about one in Recipe 5.2. But if that’s the case, why isn’t there a Texinfo-to-man output filter? Perhaps because manpages follow a standard, structured, and time-tested format while Texinfo is more free-form.
There are other Texinfo viewers and converters if you don’t like info, such as pinfo, info2www, tkman, and even info2man (which cheats and converts to POD and then to manpage format).
Put the pattern in single quotes, because unlike most other Unix commands, unzip handles file globbing patterns itself:
unzip '*.zip'
You could also use a loop to unzip each file:
for x in /path/to/date*/name/*.zip; do unzip "$x"; done
or:
for x in $(ls /path/to/date*/name/*.zip 2>/dev/null); do unzip $x; done
Unlike many Unix commands (e.g., gzip and bzip2), the last argument to unzip isn’t an arbitrarily long list of files. To process the command unzip *.zip, the shell expands the wildcard, so (assuming you have files named zipfile1.zip to zipfile4.zip) unzip *.zip expands to unzip zipfile1.zip zipfile2.zip zipfile3.zip zipfile4.zip. This command attempts to extract zipfile2.zip, zipfile3.zip, and zipfile4.zip from zipfile1.zip. The command will fail unless zipfile1.zip actually contains files with those names.
The first method in the Solution section prevents the shell from expanding the wildcard by using single quotes. However, that only works if there is only one wildcard. The second and third methods work around that by running an explicit unzip command for each ZIP file found when the shell expands the wildcards, or returns the result of the ls command.
The ls version is used because the default behavior of bash (and sh) is to return unmatched patterns unchanged. That means you would be trying to unzip a file called /path/to/date*/name/*.zip if no files matched the wildcard pattern. ls will simply return null on STDOUT, and an error that we throw away on STDERR. You can set the shopt -s nullglob option to cause filename patterns that match no files to expand to a null string, rather than themselves.
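For example, a minimal sketch of the nullglob approach:

# With nullglob set, an unmatched pattern expands to nothing and the loop
# body simply never runs, so there is no bogus 'unzip /path/.../*.zip' call
shopt -s nullglob
for x in /path/to/date*/name/*.zip; do
    unzip "$x"
done
shopt -u nullglob    # restore the default if later code depends on it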
You run long processes over SSH, perhaps over the WAN, and when you get disconnected you lose a lot of work. Or perhaps you started a long job from work, but need to go home and be able to check on the job later; you could run your process using nohup, but then you won’t be able to reattach to it when your connection comes back or you get home.
Using screen is very simple. Type screen or screen -a. The -a option includes all of screen’s capabilities, at the expense of some redraw (thus bandwidth) efficiency. Honestly, we use -a but have never noticed a difference.
When you do this, it will look like nothing happened, but you are now running inside a screen. echo $SHLVL should return a number greater than one if this worked (see also $SHLVL in Recipe 16.2). To test it, do an ls -la, then kill your terminal (do not exit cleanly, as you will exit screen as well). Log back into the machine and type screen -r to reconnect to screen. If that doesn’t put you back where you left off, try screen -d -r. If that doesn’t work, try ps auwx | grep [s]creen to see if screen is still running, and then try man screen for troubleshooting information—but it should just work. If you run into problems with that ps command on a system other than Linux, see Recipe 17.21.
Starting screen with something like the following will make it easier to figure out what session to reattach to later if necessary:
screen -aS "$(whoami).$(date '+%Y-%m-%d_%H:%M:%S%z')"
See the run_screen script in Recipe 16.22.
To exit out of screen and your session, keep typing exit until all the sessions are gone. You can also type Ctrl-A Ctrl-\ or Ctrl-A quit to exit screen itself (assuming you haven’t changed the default meta key of Ctrl-A yet).
According to the screen website:
Screen is a full-screen window manager that multiplexes a physical terminal between several processes (typically interactive shells). Each virtual terminal provides the functions of the DEC VT100 terminal and, in addition, several control functions from the ANSI X3.64 (ISO 6429) and ISO 2022 standards (e.g., insert/delete line and support for multiple character sets). There is a scrollback history buffer for each virtual terminal and a copy-and-paste mechanism that allows the user to move text regions between windows.
That means you can have more than one session in a single SSH terminal (think DESQview on i286/386). But it also allows you to SSH into a machine, start a process, disconnect your terminal and go home, then reconnect and pick up—not where you left off, but where the process has continued to. And it allows multiple people to share a single session for training, troubleshooting, or collaboration (see Recipe 17.5).
screen is often installed by default on Linux, but rarely on other systems. The screen binary must run as SUID root so it can write to the appropriate /dev pseudoterminals (PTYs). If screen doesn’t work, this is a likely reason why (to fix it, run the command chmod u+s /usr/bin/screen as root).
Also, screen interferes with inline transfer protocols like zmodem. Newer versions of screen have configuration settings that deal with this; see the manpages.
The default Emacs mode of bash command-line editing uses Ctrl-A to go to the start of the line. That’s also the screen command mode, or meta key, so if you use Ctrl-A a lot (like we do), you may want to add the following to your ~/.screenrc file:
# Sample settings for ~/.screenrc

# Change the C-a default to C-n (use C-n n to send literal ^N)
escape ^Nn

# Yes annoying audible bell, please
vbell off

# Detach on hangup
autodetach on

# Make the shell in every window a login shell
shell -$SHELL
Use GNU screen in multiuser mode. The following assumes that you have not changed the default meta key from Ctrl-A, as described in Recipe 17.4. If you have, then use your new meta key (e.g., Ctrl-N) instead.
As the host, do the following:
Enter screen -S session_name (no spaces allowed); e.g., screen -S training.
Type Ctrl-A addacl usernames, listing the accounts (comma-delimited, no spaces!) that may access the display; e.g., Ctrl-A addacl alice,bob,carol. Note this allows full read/write access.
Use the Ctrl-A chacl usernames permbits list command to refine permissions if needed.
Turn on multiuser mode with Ctrl-A multiuser on.
As the viewer, do this:
Use screen -x user/name to connect to a shared screen; e.g., screen -x host/training.
Hit Ctrl-A K to kill the window and end the session.
See Recipe 17.4 for necessary details.
For multiuser mode, /tmp/screens must exist and be world-readable and executable.
screen versions 3.9.15-8 to 4.0.1-1 from Red Hat (i.e., RHEL3) are broken and should not be used if you want multiuser mode to work. Version 4.0.2-5 or later should work; for example, http://bit.ly/2y9ufL4 (or later) works even on RHEL3. Once you start using the new version of screen, existing screen sockets in $HOME/.screen are not found and are thus orphaned and unusable. Log out of all sessions, and use the new version to create new sockets in /tmp/screens/S-$USER, then remove the $HOME/.screen directory.
You need to capture all the output from an entire session or a long batch job.
There are many ways to solve this problem, depending on your needs and environment.
The simplest solution is to turn on logging to memory or disk in your terminal program. The problems with that are that your terminal program may not allow it, and when it gets disconnected you lose your log.
The next simplest solution is to modify the job to log itself, or redirect the entire thing to tee or a file. For example, one of the following might work:
long_noisy_job >& log_file
long_noisy_job 2>&1 | tee log_file
(long_noisy_job) >& log_file
(long_noisy_job) 2>&1 | tee log_file
The problems here are that you may not be able to modify the job, or the job itself may do something that precludes these solutions (e.g., if it requires user input, it could get stuck asking for the input before the prompt is actually displayed). That can happen because STDOUT is buffered, so the prompt could be in the buffer waiting to be displayed when more data comes in, but no more data will come in since the program is waiting for input.
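If output buffering is the obstacle and your system has GNU coreutils, stdbuf may help; this is a sketch, not something every platform supports:

# Force line-buffered stdout for the job so prompts appear promptly
# (GNU coreutils only; ineffective for programs that manage their own buffers)
stdbuf -oL long_noisy_job 2>&1 | tee log_file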
The third solution is to use an interesting program called script that exists for this very purpose, and it’s probably already on your system. You run script, and it logs everything that happens to the logfile (called a typescript) you’ve given it, which is OK if you want to log the entire session—just start script, then run your job. But if you only want to capture part of the session, there is no way to have your code start script, run something to log it, then stop script again. You can’t script script because once you run it, you’re in a subshell at a prompt (i.e., you can’t do something like script file_to_log_to some_command_to_run).
Our final solution uses the terminal multiplexer screen. With screen, you can turn whole session logging on or off from inside your script. Once you are already running screen, do the following in your script:
# Set a logfile and turn on logging
screen -X logfile /path/to/logfile && screen -X log on

# Your commands here

# Turn logging back off
screen -X logfile 1    # Set buffer to 1 sec
sleep 3                # Wait to avoid file truncation...
screen -X log off
We suggest you try the solutions in order, and use the first one that meets your needs. Unless you have very specific needs, script will probably work. But just in case, it can be handy to know about the screen option.
man script
man screen
Put the clear command in your ~/.bash_logout (Example 17-1, reproduced from Recipe 16.22).
# cookbook filename: bash_logout
# settings/bash_logout: execute on shell logout
# Clear the screen on logout to prevent information leaks, if not already
# set as an exit trap elsewhere
[ -n "$PS1" ] && clear
Or set a trap to run clear on shell termination:
# Trap to clear the screen on exit from the shell to prevent
# information leaks, if not already set in ~/.bash_logout
trap ' [ -n "$PS1" ] && clear ' 0
Note that if you are connecting remotely and your client has a scrollback buffer, whatever you were working on may still be in there. clear also has no effect on your shell’s command history.
Setting a trap to clear the screen is probably overkill, but could conceivably cover an error situation in which ~/.bash_logout is not executed. If you are really paranoid you can set both, but in that case you may also wish to look into TEMPEST and Faraday cages.
If you skip the test to determine whether the shell is interactive, you’ll get errors like these under some circumstances:
# e.g., from tput
No value for $TERM and no -T specified

# e.g., from clear
TERM environment variable not set.
You want to create a list of files and details about them for archive purposes; for example, to verify backups, recreate directories, etc. Or maybe you are about to do a large chmod -R and need a backout plan, or perhaps you keep /etc/* in a revision control system that does not preserve permissions or ownership.
Use GNU find with some printf formats, as seen in Example 17-2.
#!/usr/bin/env bash
# cookbook filename: archive_meta-data

printf "%b" "Mode\tUser\tGroup\tBytes\tModified\tFileSpec\n" > archive_file

find / \( -path /proc -o -path /mnt -o -path /tmp -o -path /var/tmp \
    -o -path /var/cache -o -path /var/spool \) -prune  \
    -o -type d -printf 'd%m\t%u\t%g\t%s\t%t\t%p/\n'     \
    -o -type l -printf 'l%m\t%u\t%g\t%s\t%t\t%p -> %l\n' \
    -o         -printf '%m\t%u\t%g\t%s\t%t\t%p\n' >> archive_file
Note that the -printf expression is available only in the GNU version of find.
The (-path /proc -o -path…) -prune part removes various directories you probably don’t want to bother with. -type d is for directories. The printf format is prefixed with a d, then uses an octal mode, user, group, and so forth. -type l is for symbolic links and also shows you where each link points. With the contents of this file and some additional scripting, you can determine at a high level if anything has changed, or recreate mangled ownership or permissions. Note that this does not take the place of more security-oriented programs like Tripwire, AIDE, or Samhain.
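As one sketch of such “additional scripting,” assuming you keep the previous run under a name like archive_file.old (our invented convention), a simple diff surfaces anything that changed:

# Show metadata that changed between two runs of the archive script;
# archive_file.old is a hypothetical copy of the previous snapshot
diff archive_file.old archive_file | less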
Use the find command in conjunction with head, grep, or other commands that can parse out comments or summary information from each file.
For example, if the second line of all your shell scripts follows the format “name — description” then this example will create a nice index:
for i in $(grep -El '#![[:space:]]?/bin/sh' *); do head -2 $i | tail -1; done
As noted, this technique depends on each file having some kind of summary information, such as comments, that may be parsed out. We then look for a way to identify the type of file, in this case a shell script, and grab the second line of each file.
If the files do not have easily parsed summary information, you can try something like this and manually work through the output to create an index:
for dir in $(find . -type d); do head -15 $dir/*; done
Watch out for binary files!
man find
man grep
man head
man tail
If you are creating a simple patch for a single file, use:
$ diff -u original_file modified_file > your_patch
$
If you are creating a patch for multiple files in parallel directory structures, use:
$ cp -pR original_dirs/ modified_dirs/

$ # Make changes here

$ diff -Nru original_dirs/ modified_dirs/ > your_comprehensive_patch
$
To be especially careful, force diff to treat all files as ASCII using -a, and set your language and time zone to the universal defaults as shown:
$ LC_ALL=C TZ=UTC diff -aNru original_dirs/ modified_dirs/ \
> > your_comprehensive_patch
$

$ LC_ALL=C TZ=UTC diff -aNru original_dirs/ modified_dirs/
diff -aNru original_dirs/changed_file modified_dirs/changed_file
--- original_dirs/changed_file  2006-11-23 01:04:07.000000000 +0000
+++ modified_dirs/changed_file  2006-11-23 01:04:35.000000000 +0000
@@ -1,2 +1,2 @@
 This file is common to both dirs.
-But it changes from one to the other.
+But it changes from 1 to the other.
diff -aNru original_dirs/only_in_mods modified_dirs/only_in_mods
--- original_dirs/only_in_mods  1970-01-01 00:00:00.000000000 +0000
+++ modified_dirs/only_in_mods  2006-11-23 01:05:58.000000000 +0000
@@ -0,0 +1,2 @@
+While this file is only in the modified dirs.
+It also has two lines, this is the last.
diff -aNru original_dirs/only_in_orig modified_dirs/only_in_orig
--- original_dirs/only_in_orig  2006-11-23 01:05:18.000000000 +0000
+++ modified_dirs/only_in_orig  1970-01-01 00:00:00.000000000 +0000
@@ -1,2 +0,0 @@
-This file is only in the original dirs.
-It has two lines, this is the last.
To apply a patch file, cd to the directory of the single file or to the parent of the directory tree and use the patch command:
$ cd /path/to/files

$ patch -Np1 < your_patch
The -N argument to patch prevents it from reversing patches or reapplying patches that have already been made. -p number removes number of leading directories to allow for differences in directory structure between whoever created the patch and whoever is applying it. Using -p1 will often work; if not, experiment with -p0, then -p2, etc. It’ll either work or complain and ask you what to do, in which case you cancel and try something else unless you really know what you are doing.
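To make the -p levels concrete, suppose the patch header names the file a/src/prog/prog.c (a path we invented for illustration); then:

# -p0 leaves the path alone:         a/src/prog/prog.c
# -p1 strips one component, giving:    src/prog/prog.c
# -p2 strips two components, giving:       prog/prog.c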
diff can produce output in various forms, some of which are more useful than others. Unified output, using -u, is generally considered the best because it is both reasonably human-readable and very robust when used with patch. It provides three lines of context around the change, which allows a human reader to get oriented, and allows the patch command to work correctly even if the file to be patched is different from the one used to create the patch. As long as the context lines are intact, patch can usually figure it out. Context output, using -c, is similar to -u output but is more redundant and not quite as easy to read. The ed format, using -e, produces a script suitable for use with the ancient ed editor. Finally, the default output is similar to the ed output, with a little more human-readable context:
# Unified format (preferred)
$ diff -u original_file modified_file
--- original_file  2006-11-22 19:29:07.000000000 -0500
+++ modified_file  2006-11-22 19:29:47.000000000 -0500
@@ -1,9 +1,9 @@
-This is original_file, and this line is different.
+This is modified_file, and this line is different.
 This line is the same.
 So is this one.
 And this one.
 Ditto.
-But this one is different.
+But this 1 is different.
 However, not this line.
 And this is the last same, same, same.

# Context format
$ diff -c original_file modified_file
*** original_file  Wed Nov 22 19:29:07 2006
--- modified_file  Wed Nov 22 19:29:47 2006
***************
*** 1,9 ****
! This is original_file, and this line is different.
  This line is the same.
  So is this one.
  And this one.
  Ditto.
! But this one is different.
  However, not this line.
  And this is the last same, same, same.
--- 1,9 ----
! This is modified_file, and this line is different.
  This line is the same.
  So is this one.
  And this one.
  Ditto.
! But this 1 is different.
  However, not this line.
  And this is the last same, same, same.

# 'ed' format
$ diff -e original_file modified_file
6c
But this 1 is different.
.
1c
This is modified_file, and this line is different.
.

# Normal format
$ diff original_file modified_file
1c1
< This is original_file, and this line is different.
---
> This is modified_file, and this line is different.
6c6
< But this one is different.
---
> But this 1 is different.
The -r and -N arguments to diff are simple yet powerful. -r means, as usual, recursive operation through the directory structure, while -N causes diff to pretend that any file found in one directory structure also exists in the other as an empty file. In theory, that has the effect of creating or removing files as needed; however, in practice -N is not supported on all systems (notably Solaris) and it may end up leaving zero-byte files lying around on others. Some versions of patch default to using -b, which leaves lots of .orig files lying around, and some versions (notably Linux) are less chatty than others (notably BSD). Many versions (not Solaris) of diff also support the -p argument, which tries to show which C function the patch affects.
Resist the urge to do something like diff -u prog.c.orig prog.c. This has the potential to cause all kinds of confusion since patch may also create .orig files. Also resist the urge to do something like diff -u prog/prog.c new/prog/prog.c, since patch will get very confused about the unequal number of directory names in the paths.
man diff
man patch
man cmp
http://furius.ca/xxdiff/ for a great GUI diff (and more) tool
Count the hunks (i.e., sections of changed data) in diff’s output:
$ diff -C0 original_file modified_file | grep -c "^\*\*\*\*\*"
2

$ diff -C0 original_file modified_file
*** original_file  Fri Nov 24 12:48:35 2006
--- modified_file  Fri Nov 24 12:48:43 2006
***************
*** 1 ****
! This is original_file, and this line is different.
--- 1 ----
! This is modified_file, and this line is different.
***************
*** 6 ****
! But this one is different.
--- 6 ----
! But this 1 is different.
If you only need to know whether the files are different and not how many differences there are, use cmp. It will exit at the first difference, which can save time on large files. Like diff, it is silent if the files are identical, but it reports the location of the first difference if not:
$ cmp original_file modified_file
original_file modified_file differ: char 9, line 1
Hunk is actually the technical term, though we’ve also seen hunks referred to as chunks in some places. Note that it is possible, in theory, to get slightly different results for the same files across different machines or versions of diff, since the number of hunks is a result of the algorithm diff uses. You will certainly get different answers when using different diff output formats, as demonstrated in the following examples.
We find a zero-context contextual diff to be the easiest to use for this purpose, and using -C0 instead of -c creates fewer lines for grep to have to search. A unified diff tends to combine more changes than expected into one hunk, leading to fewer differences being reported:
$ diff -u original_file modified_file | grep -c "^@@"
1

$ diff -u original_file modified_file
--- original_file  2006-11-24 12:48:35.000000000 -0500
+++ modified_file  2006-11-24 12:48:43.000000000 -0500
@@ -1,8 +1,8 @@
-This is original_file, and this line is different.
+This is modified_file, and this line is different.
 This line is the same.
 So is this one.
 And this one.
 Ditto.
-But this one is different.
+But this 1 is different.
 However, not this line.
 And this is the last same, same, same.
A normal or ed-style diff works too, but the grep pattern is more complicated. Though not shown in this example, a multiline change in normal diff output might look like 2,3c2,3, thus requiring character classes and more typing than is the case using -C0:
$ diff -e original_file modified_file | egrep -c '^[[:digit:],]+[[:alpha:]]+'
2

$ diff original_file modified_file | egrep -c '^[[:digit:],]+[[:alpha:]]+'
2

$ diff original_file modified_file
1c1
< This is original_file, and this line is different.
---
> This is modified_file, and this line is different.
6c6
< But this one is different.
---
> But this 1 is different.
man diff
man cmp
man grep
You need to remove or rename a file that was created with a special character that causes rm or mv to behave in unexpected ways. The canonical example of this is any file starting with a dash, such as -f or --help, which will cause any command you try to use to interpret the filename as an argument.
If the filename begins with a dash, use -- to signal the end of arguments to the command, or use a full (/tmp/-f) or relative (./-f) path. If the file contains other special characters that are interpreted by the shell, such as a space or asterisk, use shell quoting. If you use filename completion (the Tab key by default), it will automatically quote special characters for you. You can also use single quotes around the troublesome name:
$ ls
--help this is a *crazy* file name!
$ mv --help help
mv: unknown option -- -
usage: mv [-fiv] source target
mv [-fiv] source ... directory
$ mv -- --help my_help
$ mv this\ is\ a\ \*crazy\*\ file\ name\! this_is_a_better_name
$ ls
my_help this_is_a_better_name
To understand what is actually being executed after shell expansion, preface your command with echo:
$ rm *
rm: unknown option -- -
usage: rm [-f|-i] [-dPRrvW] file ...

$ echo rm *
rm --help this is a *crazy* file name!
You can also create a file named -i in a directory to prevent rm * from deleting all the files without asking first:
$ mkdir del-test ; cd $_

$ > -i

$ touch important_file

$ ll
total 0
-rw-r--r--  1 jp  jp  0 Jun 12 22:28 -i
-rw-r--r--  1 jp  jp  0 Jun 12 22:28 important_file

$ rm *
rm: remove regular empty file 'important_file'? n
Question 11 in the GNU Core Utilities FAQ
Sections 2.1 and 2.2 of the Unix FAQs
Use cat in a subshell:
temp_file="temp.$RANDOM$RANDOM$$"

(echo 'static header line1'; cat data_file) > $temp_file \
    && cat $temp_file > data_file
rm $temp_file
unset temp_file
You could also use sed, the streaming editor. To prepend static text, note that backslash escape sequences are expanded in GNU sed but not in some other versions. Also, under some shells the trailing backslashes may need to be doubled:
# Any sed, e.g., Solaris 10 /usr/bin/sed
$ sed -e '1i\
> static header line1
> ' data_file
static header line1
1 foo
2 bar
3 baz

$ sed -e '1i\
> static header line1\
> static header line2
> ' data_file
static header line1
static header line2
1 foo
2 bar
3 baz

# GNU sed
$ sed -e '1istatic header line1\nstatic header line2' data_file
static header line1
static header line2
1 foo
2 bar
3 baz
To prepend an existing file:
$ sed -e '$r data_file' header_file
Header Line1
Header Line2
1 foo
2 bar
3 baz
This one seems to be a love/hate kind of thing. People either love the cat solution or love the sed solution, but not both. The cat version is probably faster and simpler; the sed solution is arguably more flexible.
You can also store a sed script in a file, instead of leaving it on the command line. Of course, you would usually redirect the output into a new file, like sed -e '$r data' header > new_file, but note that will change the file’s inode and may change other attributes, such as permissions or ownership. To preserve everything but the inode, use -i for in-place editing if your version of sed supports that. Don’t use -i with the reversed header file prepend form shown previously, though, or you will edit your header file! Also note that Perl has a similar -i option that also writes a new file, though Perl itself works rather differently than sed for this example:
# Show inode
$ ls -i data_file
509951 data_file

$ sed -i -e '1istatic header line1\nstatic header line2' data_file

$ cat data_file
static header line1
static header line2
1 foo
2 bar
3 baz

# Verify inode has changed
$ ls -i data_file
509954 data_file
To preserve everything (or if your sed does not have -i or you want to use the prepend file method mentioned earlier):
# Show inode
$ ls -i data_file
509951 data_file

# $RANDOM is bash-only; you can use mktemp on other systems
$ temp_file=$RANDOM$RANDOM

$ sed -e '$r data_file' header_file > $temp_file

# Only cat if the source exists and is not empty!
$ [ -s "$temp_file" ] && cat $temp_file > data_file

$ unset temp_file

$ cat data_file
Header Line1
Header Line2
1 foo
2 bar
3 baz

# Verify inode has NOT changed
$ ls -i data_file
509951 data_file
Prepending a header file to a datafile is interesting because it’s rather counterintuitive. If you try to read the header_file file into the data_file file at line one, you get this:
$ sed -e '1r header_file' data_file
1 foo
Header Line1
Header Line2
2 bar
3 baz
So instead, we simply append the data to the header file and write the output to another file. Again, don’t try to use sed -i or you will edit your header file.
Another way to prepend data is to use cat reading from STDIN with a here-document or a here-string. Note that here-strings are only available in bash 2.05b or newer, and they don’t do backslash escape sequence expansion, but they avoid all the sed version issues:
# Using a here-document
$ cat - data_file <<EoH
> Header line1
> Header line2
> EoH
Header line1
Header line2
1 foo
2 bar
3 baz

# Using a here-string in bash-2.05b+, no backslash escape sequence expansion
$ cat - data_file <<< 'Header Line1'
Header Line1
1 foo
2 bar
3 baz
This is trickier than it sounds because many tools you might ordinarily use, such as sed, will write to a new file (thus changing the inode) even if they go out of their way to preserve other attributes.
The obvious solution is to simply edit the file and make your updates. However, we admit that that may be of limited use in a scripting situation. Or is it?
In Recipe 17.13, you saw that sed writes a brand new file one way or another; however, there is an ancestor of sed that doesn’t do that. It’s called, anticlimactically, ed, and it is just as ubiquitous as its other famous descendant, vi. And interestingly, ed is scriptable. So here is our “prepend a header” example again, this time using ed:
# Show inode
$ ls -i data_file
306189 data_file

# Use printf "%b" to avoid issues with 'echo -e' or not
$ printf "%b" '1i\nHeader Line1\nHeader Line2\n.\nw\nq\n' | ed -s data_file
1 foo

$ cat data_file
Header Line1
Header Line2
1 foo
2 bar
3 baz

# Verify inode has NOT changed
$ ls -i data_file
306189 data_file
Of course you can store an ed script in a file, just as you can with sed. In this case, it might be useful to see what that file looks like, to explain the mechanics of the ed script:
$ cat ed_script
1i
Header Line1
Header Line2
.
w
q

$ ed -s data_file < ed_script
1 foo

$ cat data_file
Header Line1
Header Line2
1 foo
2 bar
3 baz
The 1i in the ed script means to go to the first line and then go into insert mode, and the next two lines are literal. A single . all by itself on a line exits insert mode, w writes the file, and q quits. The -s suppresses diagnostic output, specifically for use in scripts.
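As a further sketch, the same scripted-ed technique handles in-place substitutions too; the pattern here is ours, chosen to match the sample data_file:

# Replace 'foo' with 'FOO' on every line (',' addresses the whole file),
# then write and quit; like the insert example, this keeps the inode intact
printf "%b" ',s/foo/FOO/g\nw\nq\n' | ed -s data_file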
One disadvantage to ed is that there isn’t that much documentation for it anymore. It’s been around since the beginning of Unix, but it’s not commonly used anymore even though it exists on every system we checked. Since both vi (via ex) and sed (spiritually at least) are descended from ed, however, you should be able to figure out anything you might want to do. Note that ex is a symbolic link to vi or a variant on many systems, while ed is just ed.
Another way to accomplish the same effect is to use sed or some other tool, write the changed file into a new file, then cat it back into the original file. This is obviously inefficient. It is also easier to say than to do safely because if the change fails for any reason you could end up writing nothing back over the original file (see the example in Recipe 17.13).
man ed
man ex
ls -l $(which ex)
Use sudo to run a subshell in which you may group your commands and use pipelines and redirection:
sudo bash -c 'command1 && command2 || command3'
This requires the ability to run a shell as root. If you can’t, have your system administrator write a quick script and add it to your sudo privilege specification.
If you try something like sudo command1 && command2 || command3, you’ll find that command2 and command3 are running as you, not as root. That’s because sudo’s influence only extends to the first command: the shell sees the && and || operators as command separators, and runs the subsequent commands itself, as you.
Note the use of the -c argument to bash, which causes it to just execute the given commands and exit. Without that you will just end up running a new interactive root shell, which is probably not what you wanted. With -c you are still running a noninteractive root shell, and you need to have the sudo rights to do that. macOS and some Linux distributions, such as Ubuntu, actually disable the root user to encourage you to only log in as a normal user and sudo as needed (the Mac hides this better) for administration. If you are using an OS like that, or have rolled your own sudo setup, you should be fine. However, if you are running a locked-down environment, this recipe may not work for you.
To learn whether you may use sudo and what you are and are not allowed to do, use sudo -l. Almost any other use of sudo will probably trigger a security message to your administrator tattling on you. You can try using sudo sudo -V | less as a regular user or just sudo -V | less if you are already root to get a lot of information about how sudo is compiled and configured on your system.
Sort the files and isolate the data of interest using cut or awk if necessary, and then use comm, diff, grep, or uniq depending on your needs.
comm is designed for just this type of problem:
$ cat left
record_01
record_02.left only
record_03
record_05.differ
record_06
record_07
record_08
record_09
record_10

$ cat right
record_01
record_02
record_04
record_05
record_06.differ
record_07
record_08
record_09.right only
record_10

# Only show lines in the left file
$ comm -23 left right
record_02.left only
record_03
record_05.differ
record_06
record_09

# Only show lines in the right file
$ comm -13 left right
record_02
record_04
record_05
record_06.differ
record_09.right only

# Only show lines common to both files
$ comm -12 left right
record_01
record_07
record_08
record_10
diff will quickly show you all the differences from both files, but its output is not terribly pretty and you may not need to know all the differences. GNU diff’s -y and -W options can be handy for readability, but you can get used to the regular output as well:
$ diff -y -W 60 left right
record_01                    record_01
record_02.left only        | record_02
record_03                  | record_04
record_05.differ           | record_05
record_06                  | record_06.differ
record_07                    record_07
record_08                    record_08
record_09                  | record_09.right only
record_10                    record_10

$ diff -y -W 60 --suppress-common-lines left right
record_02.left only        | record_02
record_03                  | record_04
record_05.differ           | record_05
record_06                  | record_06.differ
record_09                  | record_09.right only

$ diff left right
2,5c2,5
< record_02.left only
< record_03
< record_05.differ
< record_06
---
> record_02
> record_04
> record_05
> record_06.differ
8c8
< record_09
---
> record_09.right only
Some systems (e.g., Solaris) may use sdiff instead of diff -y or have a separate binary such as bdiff to process very large files.
grep can show you when lines exist only in one file and not the other, and you can figure out which file if necessary. But since it’s doing regular expression matches, it will not be able to handle differences within the line unless you edit the file that becomes the pattern file, and it will also get very slow as the file sizes grow.
This example shows all the lines that exist in the file left but not in the file right:
$ grep -vf right left
record_03
record_06
record_09
Note that only “record_03” is really missing; the other two lines are simply different. If you need to detect such variations, you’ll need to use diff. If you need to ignore them, use cut or awk as necessary to isolate the parts you need into temporary files.
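For example, here is a sketch of isolating just the record IDs (the part before the first dot in our sample data) into temporary files, so the comparison ignores the variations:

# Keep only the record ID; sort, since comm requires sorted input
cut -d'.' -f1 left  | sort > /tmp/left.$$
cut -d'.' -f1 right | sort > /tmp/right.$$
comm -3 /tmp/left.$$ /tmp/right.$$    # lines not common to both files
rm -f /tmp/left.$$ /tmp/right.$$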
uniq -u can show you only lines that are unique in the files, but it will not tell you which file the line came from (if you need to know that, use one of the previous solutions). uniq -d will show you only lines that exist in both files:
$ sort right left | uniq -u
record_02
record_02.left only
record_03
record_04
record_05
record_05.differ
record_06
record_06.differ
record_09
record_09.right only

$ sort right left | uniq -d
record_01
record_07
record_08
record_10
man cmp
man diff
man grep
man uniq
Create an ordered list of the objects, pass them as arguments to a function, shift the arguments by N, and return the remainder, as shown in Example 17-3.
# cookbook filename: func_shift_by

# Pop a given number of items from the top of a stack,
# such that you can then perform an action on whatever is left.
# Called like: shift_by <# to keep> <ls command, or whatever>
# Returns: the remainder of the stack or list
#
# For example, list some objects, then keep only the top 10.
#
# It is CRITICAL that you pass the items in order with the objects to
# be removed at the top (or front) of the list, since all this function
# does is remove (pop) the number of entries you specify from the top
# of the list.
#
# You should experiment with echo before using rm!
#
# For example:
# rm -rf $(shift_by $MAX_BUILD_DIRS_TO_KEEP $(ls -rd backup.2006*))
#
function shift_by {
    # If $1 is zero or greater than $#, the positional parameters are
    # not changed. In this case that is a BAD THING!
    if (( $1 == 0 || $1 > ( $# - 1 ) )); then
        echo ''
    else
        # Remove the given number of objects (plus 1) from the list.
        shift $(( $1 + 1 ))

        # Return whatever is left.
        echo "$*"
    fi
}
If you try to shift the positional parameters by zero or by more than the total number of positional parameters ($#), shift will do nothing. If you are using shift to process a list then delete what it returns, that will result in you deleting everything. Make sure to test the argument to shift to make sure that it’s not zero and it is greater than the number of positional parameters. Our shift_by function does this.
For example:
$ source shift_by
$ touch {1..9}
$ ls ?
1 2 3 4 5 6 7 8 9
$ shift_by 3 $(ls ?)
4 5 6 7 8 9
$ shift_by 5 $(ls ?)
6 7 8 9
$ shift_by 5 $(ls -r ?)
4 3 2 1
$ shift_by 7 $(ls ?)
8 9
$ shift_by 9 $(ls ?)
# Keep only the last 5 objects
$ echo "rm -rf $(shift_by 5 $(ls ?))"
rm -rf 6 7 8 9
# In production we'd test this first! See discussion.
$ rm -rf $(shift_by 5 $(ls ?))
$ ls ?
1 2 3 4 5
Make sure you fully test both the argument returned and what you intend to do with it. For example, if you are deleting old data, use echo to test the command that would be performed before doing it live. Also test that you have a value at all, or else you could end up doing rm -rf and getting an error. Never do something like rm -rf /$variable, because if $variable is ever null you will start deleting the root directory, which is particularly bad if you are running as root!
Using the function in the solution to delete files in production might look like this:
$ files_to_nuke=$(shift_by 5 $(ls ?))
$ [ -n "$files_to_nuke" ] && rm -rf $files_to_nuke

Note the quoting: $files_to_nuke is quoted in the test so an empty result is handled correctly, but left unquoted in the rm so that each filename is passed as a separate argument (this assumes the names contain no whitespace).
This recipe takes advantage of the fact that arguments to a function are affected by the shift command inside that function, which makes it trivial to pop objects off the stack (otherwise we’d have to do some fancy substring or for loop operations). We must shift by n+1 because the first argument ($1) is actually the count of the items to shift, leaving $2..N as the number of objects in the stack. We could also write it more verbosely this way:
function shift_by {
    shift_count=$1
    shift
    shift $shift_count
    echo "$*"
}
It’s possible you may run afoul of your system’s ARG_MAX (see Recipe 15.13 for details) if the paths to the objects are very long or you have a very large number of objects to handle. In the former case, you may be able to create some breathing room by changing directories closer to the objects to shorten the paths, or by using symbolic links. In the latter case, you can use this more complicated for loop:
objects_to_keep=5
counter=1

for file in /path/with/many/many/files/*e*; do
    if [ $counter -gt $objects_to_keep ]; then
        remainder="$remainder $file"
    fi
    ((counter++))
done

[ -n "$remainder" ] && echo "rm -rf $remainder"
A common method of doing a similar operation is a trickle-down scheme such as the following:
rm -rf backup.3/
mv backup.2/ backup.3/
mv backup.1/ backup.2/
cp -al backup.0/ backup.1/
This works very well in many cases, especially when combined with hard links to conserve space while allowing multiple backups—see Hack #42 in Rob Flickenger’s Linux Server Hacks (O’Reilly). However, if the number of existing objects fluctuates or is not known in advance, this method won’t work.
help for
help shift
Linux Server Hacks by Rob Flickenger (O’Reilly), Hack #42
Recipe 15.13, “Working Around ‘Argument list too long’ Errors”
Write the data into a circular set of files or directories, such as days of the week or month, or months. You also need to have a way to clear the old data when you circle around again.
This will only work if you have some well-defined series that can be circular, such as hours of the day, days of the week, days of the month, or months. But it turns out that those cover a lot of ground.
It helps to start with an example, so circular days of the week logfiles might look like this:
1_Mon.log 2_Tue.log 3_Wed.log 4_Thu.log 5_Fri.log 6_Sat.log 7_Sun.log
We use the slightly odd strftime format %u_%a to make the files sort in a human-readable way (yes, sort can handle days of the week, but ls can’t). Then all of Monday’s log messages go into 1_Mon.log, and so on, and on Sunday at midnight we wrap around to Monday again.
Typical formats include:
$ printf "%(%u_%a)T"    # day of week
2_Tue

$ printf "%(%d)T"       # day of month
06

$ printf "%(%m_%b)T"    # month
12_Dec
The only tricky part is clearing out the data from last Monday before you start writing data for this Monday. If you have a log statement that is always the first to run on a new day, then have that statement truncate the output file using > instead of the >> you need to use to append everywhere else. But watch out for race conditions—it really has to be guaranteed to be the very first log line of the correct day. Perhaps a safer way is to use a cron job to delete tomorrow’s data a few minutes before midnight. There’s no race condition there, since you know the last time you wrote to that file was a week (or whatever period) ago, but there is a risk that if the cron job fails to run correctly that data will not be purged.
Another way to do it is to have every call to the logging function delete the data for tomorrow. This is robust but inefficient, since most of the time there will be nothing to delete. It also reduces the window to N–1, since “tomorrow” is always deleted.
For example:
function mylog {
    local today tomorrow

    # Log for today
    printf -v today "%(%u_%a)T"                  # e.g., 1_Mon
    echo "$*" >> $HOME/weekly_logs/$today.txt

    # Purge data from tomorrow
    tomorrow=$(date -d 'tomorrow' '+%u_%a')
    rm -f $HOME/weekly_logs/$tomorrow.txt
}
Note how we use both the bash builtin printf %(strftime format)T and the GNU date command with its very useful -d (--date) argument, here given 'tomorrow'. Using printf is more efficient since bash already knows what time it is and there is no need for a subshell and external program, but printf can’t tell you what tomorrow will be.
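Side by side, under the assumptions above (bash 4.2+ for the %(fmt)T format, GNU date for -d):

# Today via the builtin: no subshell, no external program
printf '%(%u_%a)T\n' -1       # e.g., 2_Tue

# Tomorrow requires GNU date's -d/--date option
date -d tomorrow '+%u_%a'     # e.g., 3_Wed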
Here are some example cron entries for a script that just keeps an eye on something:
# Keep an eye on whatever it is every hour...
06 * * * * /home/user/report/keep-an-eye-on-it.sh

# Keep weekly reports
02 00 * * Mon ln -fs "queue-report_$(date '+\%F').txt" /home/user/report/keep-an-eye-on-it.txt

# Start the day fresh (which means rolling 6-7 days...)
03 00 * * * rm -f /home/user/report/$(date '+\%u_\%a')/*

Run the script every hour.

Create a symlink like keep-an-eye-on-it.txt → keep-an-eye-on-it_2017-10-09.txt so when the script writes to keep-an-eye-on-it.txt output actually goes to a weekly keep-an-eye-on-it_2017-10-09.txt report you can archive. %F is a shortcut in some versions of date for %Y-%m-%d.

Remove the contents of “tomorrow’s” directory, just before midnight. Note that in some versions of cron (e.g., Vixie-cron) you must escape % signs or you will get an error like “Syntax error: EOF in backquote substitution.”
Write the backups into a circular set of files or directories, such as days of the week or month, or months. You also need to have a way to clear the old data when you circle around again.
We’ve found that every once in a while Firefox will lose its session restore feature, so we have a simple script to back up and restore that (Example 17-4).
#!/usr/bin/env bash
# cookbook filename: ff-sessions
# Save/Restore FF sessions
# Run from cron like:
# 45 03,15 * * * opt/bin/ff-sess.sh qsave

FF_DIR="$HOME/.mozilla/firefox"
date=$(date '+%u_%a_%H')    # e.g.: 3_Wed_15

case "$1" in
    qsave)    # Quiet save
        cd $FF_DIR
        rm -f ff_sessions_$date.zip
        zip -9qr ff_sessions_$date.zip */session*
    ;;

    save)     # Noisy save (calls qsave)
        echo "SAVING '$FF_DIR/*/session*' data into '$date' file"
        $0 qsave
    ;;

    restore)
        [ -z "$2" ] && { echo "Need a date to restore from!"; exit 1; }
        date="$2"
        echo "Restoring session data from '$date' file"
        cd $FF_DIR
        unzip -o ff_sessions_$date.zip
    ;;

    *)
        echo 'Save/Restore FF sessions'
        echo "$0 save"
        echo "$0 restore <date>"
        echo "e.g., $0 restore 3_Wed_15"
    ;;
esac

Run from cron with a line like in the comment, in this case twice a day at 3:45 a.m. and 3:45 p.m.

As in Recipe 17.18, we prefix the human-readable day of the week with a number to make it sort correctly, then we add the hour at which the job ran.

zip will normally append to a ZIP file, so we remove any existing file just in case you have added or removed a profile. The -f (force) option will prevent rm from generating an error if the file does not exist.

We use -9 for maximum compression, -q for quiet, and -r for recursive zip operation, then we back up anything in the Firefox profile directories that starts with session.

The “save” argument will display a message about what it’s doing.

Then it will call the “quiet” save. Normally for cron jobs you only want output if something went wrong; otherwise you get an email every time the job runs.

We’ve compressed what might otherwise be several lines into one line here because, while the sanity check is important, we don’t want to distract from the main point of the block.

We assign $2 to $date for later code clarity. This may seem silly in so small a block, but it’s generally a good practice to follow and it’s better to be consistent and not waste time thinking, “Should I?”

We use -o for unzip to overwrite the existing files, if any, so we’re not prompted about that.

Finally, if we provide no options or the wrong ones, we get a helpful reminder about usage.
This script can easily be extended to save weekly, monthly, and yearly backups by either adding more options or changing the script to take an argument instead of hardcoding “now” as we did, then adding more cron jobs with the appropriate arguments. Note that in some versions of cron (e.g., Vixie-cron) you must escape % signs or you will get an error like “Syntax error: EOF in backquote substitution.”
man zip
man unzip
http://kb.mozillazine.org/Session_Restore (“Troubleshooting”)
Change the pattern you are looking for so that it is a valid regular expression that will not match the literal text that ps will display:
$ ps aux | grep 'ssh'
root    366  0.0  1.2  340  1588  ??  Is   20Oct06   0:00.68 /usr/sbin/sshd
root  25358  0.0  1.9  472  2404  ??  Ss   Wed07PM   0:02.16 sshd: root@ttyp0
jp    27579  0.0  0.4  152   540  p0  S+    3:24PM   0:00.04 grep ssh

$ ps aux | grep '[s]sh'
root    366  0.0  1.2  340  1588  ??  Is   20Oct06   0:00.68 /usr/sbin/sshd
root  25358  0.0  1.9  472  2404  ??  Ss   Wed07PM   0:02.17 sshd: root@ttyp0
$
This works because [s] is a regular expression character class containing a single lowercase letter s, meaning that [s]sh will match ssh but not the literal string grep [s]sh that ps will display.
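The trick generalizes to any fixed string: wrap any single character of the pattern in brackets. For instance (the process name here is invented for illustration):

# Matches 'my-daemon' processes but never this grep itself, because the
# literal text 'grep [m]y-daemon' does not contain the string 'my-daemon'
ps aux | grep '[m]y-daemon'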
The other (less efficient and more clunky) solution you might see is something like this:
ps aux | grep 'ssh' | grep -v grep
man ps
man pgrep
man grep
If you don’t already have a PID, grep the output of the ps command to see if the program you are looking for is running (see Recipe 17.20 for details on why our pattern is [s]sh):
ps -ef | grep -q 'bin/[s]shd' && echo 'ssh is running' || echo 'ssh not running'
That’s nice, but you know it’s not going to be that easy, right? Right. It’s difficult because ps can be wildly different from system to system.
Example 17-5 is a script you can use to find out if a process is running if you don’t have a PID.
# cookbook filename: is_process_running

# Can you believe this?!?
case `uname` in
    Linux|AIX) PS_ARGS='-ewwo pid,args'    ;;
    SunOS)     PS_ARGS='-eo pid,args'      ;;
    *BSD)      PS_ARGS='axwwo pid,args'    ;;
    Darwin)    PS_ARGS='Awwo pid,command'  ;;
esac

if ps $PS_ARGS | grep -q 'bin/[s]shd'; then
    echo 'sshd is running'
else
    echo 'sshd not running'
fi
If you do have a PID, say from a lockfile or an environment variable, just search for it (be careful to match the PID up with some other recognizable string so that you don’t have a collision where some other random process just happens to have a stale PID that matches the one you are using). Use the PID in the grep or in a -p argument to ps:
# Linux
$ ps -wwo pid,args -p 1394 | grep 'bin/sshd'
 1394 /usr/sbin/sshd

# BSD
$ ps ww -p 366 | grep 'bin/sshd'
  366 ??  Is     0:00.76 /usr/sbin/sshd
If your system has pgrep installed, you can use that too. It has many options, but we’re only using -f to search the full command line instead of just the process name, and -a to display the full command line:
$ pgrep -fa 'bin/[s]shd' ; echo $?
1278 /usr/sbin/sshd -D
0
The grep -q in the first solution deserves a little explanation: it suppresses grep’s output, so the command chain is driven purely by grep’s exit status, which is true (zero) when a match is found and false otherwise. That is what selects between the && and || branches. You just have to make sure your ps and greps do exactly what you want.
Unfortunately, the ps command is one of the most fragmented in all of Unix. It seems like every flavor of Unix and Linux has different arguments and processes them in different ways. All we can tell you is that you’ll need to thoroughly test against all systems on which your script will be running.
You can easily search for anything you can express as a regular expression, but make sure your expressions are specific enough not to match anything else. That’s why we used bin/[s]shd instead of just [s]shd, which would also match user connections (see Recipe 17.20). At the same time, /usr/sbin/[s]shd might be bad in case some crazy system doesn’t use that location. There is often a fine line between too much and not enough specificity. For example, you may have a program that can run multiple instances using different configuration files, so make sure you search for the config file as well if you need to isolate the correct instance. The same thing may apply to users, if you are running with enough rights to see other users’ processes.
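For example, here is a sketch of isolating one instance by its config file; the daemon name and path are invented for illustration:

# Two instances of the same daemon, distinguished by their -f argument;
# the bracket trick still keeps this grep out of the results
ps -ewwo pid,args | grep '[m]ydaemon -f /etc/mydaemon/site2.conf'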
man ps
man grep
Recipe 17.20, “Grepping ps Output Without Also Getting the grep Process Itself”
man pgrep
man pidof
man killall
Pipe the appropriate data into a while read loop and printf as needed. For example, this prints the $HOSTNAME, followed by a tab, followed by any nonblank lines of output from the last command:
last | while read i; do [[ -n "$i" ]] && printf "%b" "$HOSTNAME\t$i\n"; done
Or you can use awk to add text to each line:
last | awk "BEGIN { OFS=\"\t\" } ! /^\$/ { print \"$HOSTNAME\", \$0}"
Or, to write a new logfile, use:
last | while read i; do [[ -n "$i" ]] && printf "%b" "$HOSTNAME\t$i\n"; \
  done > last_$HOSTNAME.log
or:
last | awk "BEGIN { OFS=\"\t\" } ! /^\$/ { print \"$HOSTNAME\", \$0}" \
> last_$HOSTNAME.log
We use [[ -n "$i" ]] to remove any blank lines from the last output, and then we use printf to display the data. Quoting for this method is simpler, but it uses more steps (last, while, and read, as opposed to just last and awk). You may find one method easier to remember, more readable, or faster than the other, depending on your needs.
There is a trick to the awk command we used here. Often you will see single quotes surrounding awk commands to prevent the shell from interpreting awk variables as shell variables. However, in this case we want the shell to interpolate $HOSTNAME, so we surround the command with double quotes. That requires us to use backslash escapes on the elements of the command that we do not want the shell to handle, namely the internal double quotes, the $ end-of-line anchor, and the awk $0 variable, which contains the current line.
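An alternative that sidesteps most of the escaping is to pass the value in with awk’s -v option and keep the program in single quotes; a sketch:

# host is an awk variable set from the shell; no double-quote escapes needed
last | awk -v host="$HOSTNAME" 'BEGIN { OFS="\t" } ! /^$/ { print host, $0 }'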
For a suffix, simply move the $0 variable:
last | while read i; do [[ -n "$i" ]] && printf "%b" "$i\t$HOSTNAME\n"; done
or with awk:
last | awk "BEGIN { OFS=\"\t\" } ! /^\$/ { print \$0, \"$HOSTNAME\" }"
or with Perl:

last | perl -ne "print qq($HOSTNAME\t\$_) if ! /^\s*$/;"
or sed (note the → denotes a literal tab character, typed by pressing Ctrl-V then Ctrl-I):
last | sed "s/./$HOSTNAME→&/; /^$/d"
In the Perl command, we use qq() instead of double quotes to avoid having to escape the parts of the command we don’t want the shell to interpret. The last part is a regular expression that matches a line containing either nothing or only whitespace, and $_ is the Perl idiom for the current line. In the sed command we replace any line containing at least one character with the prefix and the character that matched (&), then delete any blank lines.
Effective awk Programming, 4th Edition, by Arnold Robbins
sed & awk, 2nd Edition, by Arnold Robbins and Dale Dougherty
Thanks to Michael Wang for contributing the following shell-only implementation and reminding us about cat -n. Note that our sample file named lines has a trailing blank line:
$ i=0; while IFS= read -r line; do (( i++ )); echo "$i $line"; done < lines
1 Line 1
2 Line 2
3
4 Line 4
5 Line 5
6
$ cat -n lines
1 Line 1
2 Line 2
3
4 Line 4
5 Line 5
6
$ cat -b lines
1 Line 1
2 Line 2
3 Line 4
4 Line 5
If you only need to display the line numbers on the screen, you can use less -N:
$ /usr/bin/less -N filename
1 Line 1
2 Line 2
3
4 Line 4
5 Line 5
6
lines (END)
Line numbers are broken in old versions of less on some obsolete Red Hat systems. Check your version with less -V. Version 358+iso254 (e.g., Red Hat 7.3 & 8.0) is known to be bad. Version 378+iso254 (e.g., RHEL3) and version 382 (RHEL4, Debian Sarge) are known to be good; we did not test other versions. The problem is subtle and may be related to an older iso256 patch. You can easily compare the last line numbers, since the vi and Perl examples are correct.
You can also use vi (or view, which is read-only vi) with the :set nu! command:
$ vi filename
1 Line 1
2 Line 2
3
4 Line 4
5 Line 5
6
~
:set nu!
vi has many options, so you can start vi by doing things like vi +3 -c 'set nu!' filename to turn on line numbering and place your cursor on line 3. If you’d like more control over how the numbers are displayed, you can also use nl, awk, or perl:
$ nl lines
1 Line 1
2 Line 2
3 Line 4
4 Line 5
$ nl -ba lines
1 Line 1
2 Line 2
3
4 Line 4
5 Line 5
6
$ awk '{ print NR, $0 }' filename
1 Line 1
2 Line 2
3
4 Line 4
5 Line 5
6
$ perl -ne 'print qq($.\t$_);' filename
1 → Line 1
2 → Line 2
3 →
4 → Line 4
5 → Line 5
6 →
NR and $. are the line number in the current input file in awk and Perl respectively, so it’s easy to use them to print the line number. Note that we are using a → to denote a tab character in the Perl output, while awk uses a space by default.
man cat
man nl
man awk
man less
man vi
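The discussion that follows refers to an awk-based way of emitting a sequence; a minimal sketch of that approach (our reconstruction, so the exact command may differ) looks like this:

$ awk 'END { for (i=1; i<=5; i++) print i, "text" }' /dev/null
1 text
2 text
3 text
4 text
5 text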
On some systems, notably Solaris, awk will hang waiting for a file unless you give it one, such as /dev/null. This has no effect on other systems, so it’s fine to use everywhere.
Note that the variable in the print statement is i, not $i. If you accidentally use $i it will be interpolated as a field from the current line being processed. Since we’re processing nothing, that’s what you’ll get if you use $i by accident (i.e., nothing).
The BEGIN and END patterns allow for startup or cleanup operations when actually processing files. Since we’re not processing a file, we need to use one of them so that awk knows to actually do something even though it has no normal input. In this case, it doesn’t matter which we use.
There is a GNU utility called seq that does exactly what this recipe calls for, but it does not exist by default on many systems (for example, Solaris and older macOS and BSDs). It offers some useful formatting options and is numeric only, but be aware that you may find differences between the BSD and GNU versions.
Thankfully, as of bash 2.04 and later, you can do integer arithmetic in for loops:
# Bash 2.04+ only, integer only
$ for ((i=1; i<=5; i++)); do echo "$i text"; done
1 text
2 text
3 text
4 text
5 text
As of bash 3.0 there is also the {x..y} brace expansion, which allows integers or single characters:
# Bash 3.0+ only, integer or single character only
$ printf "%s text\n" {1..5}
1 text
2 text
3 text
4 text
5 text
$ printf "%s text\n" {a..e}
a text
b text
c text
d text
e text
In bash 4.0 and later, you may use leading zeros in the {x..y} brace expansion:
# Bash 4.0+ only, optional leading zeros with integers
$ for num in {01..16}; do echo sshserver$num; done
sshserver01
sshserver02
sshserver03
...
sshserver14
sshserver15
sshserver16
man seq
man awk
To do that, use the read -n1 -p command in a function:
pause () {
    read -n1 -p 'Press any key when ready...'
}
-n was introduced in bash 2.04. If you must omit the -n1 (though really, if you’re using a bash that old you should upgrade it), then the prompt as shown is not correct, because you must end the input by hitting the Enter key. You should use something like this instead: read -p 'Press the ENTER key when ready...'.
The -n nchars option will return after reading nchars characters, or a newline. So, -n1 returns after (wait for it…) any key. The -p option followed by a string argument prints the string before reading input. In this case the string is the same as the DOS pause command’s output.
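A usage sketch; note that read -n1 returns without consuming a newline, so you may want to emit one yourself afterward:

pause
echo    # move to a fresh line after the keypress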
help read
Depending on your system and configuration, you may be able to use printf’s format flag with a suitable locale. Thanks to Chet Ramey for this solution, which is by far the easiest if it works:
$ LC_NUMERIC=en_US.UTF-8 printf "%'d\n" 123456789
123,456,789

$ LC_NUMERIC=en_US.UTF-8 printf "%'f\n" 123456789.987
123,456,789.987000
$
Thanks to Michael Wang for contributing the shell-only implementation and the relevant discussion.
# cookbook filename: func_commify

function commify {
    typeset text=${1}
    typeset bdot=${text%%.*}
    typeset adot=${text#${bdot}}
    typeset i commified

    (( i = ${#bdot} - 1 ))
    while (( i >= 3 )) && [[ ${bdot:i-3:1} == [0-9] ]]; do
        commified=",${bdot:i-2:3}${commified}"
        (( i -= 3 ))
    done

    echo "${bdot:0:i+1}${commified}${adot}"
}
Or you can try one of the sed solutions from the sed FAQ. For example:
sed ':a;s/\B[0-9]\{3\}\>/,&/;ta' /path/to/file # GNU sed
sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta' /path/to/file # other seds
The shell function is written to follow the same logical process as a person using a pencil and paper. First you examine the string and find the decimal point, if any. You ignore everything after the dot, and work on the string before the dot.
The shell function saves the string before the dot in $bdot, and after the dot (including the dot) in $adot. If there is no dot, then everything is in $bdot, and $adot is empty. Next, a person would move from right to left in the part before the dot and insert a comma when these two conditions are met:
There are four or more characters left.
The character before the comma is a number.
The function implements this logic in the while loop.
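A quick usage sketch, assuming the function file from above has been sourced:

$ source func_commify

$ commify 1234567.89
1,234,567.89

$ commify 123        # too short to need a comma
123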
Recipe 2.16 in Tom Christiansen and Nathan Torkington’s Perl Cookbook, 2nd Edition (O’Reilly) also provides a string processing solution, reproduced in Example 17-7.
# cookbook filename: perl_sub_commify

#+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
# Add comma thousands separator to numbers
# Returns: input string, with any numbers commified
# From Perl Cookbook2 2.16, pg 84
sub commify {
    @_ == 1 or carp('Sub usage: $withcomma = commify($somenumber);');
    # From _Perl_Cookbook_1 2.17, pg 64, or _Perl_Cookbook_2 2.16, pg 84
    my $text = reverse $_[0];
    $text =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/g;
    return scalar reverse $text;
}
Perl Cookbook, 2nd Edition, Recipe 2.16, by Tom Christiansen and Nathan Torkington (O’Reilly)