Chapter 9. Finding Files: find, locate, slocate

How easy is it for you to search for files throughout your filesystem?

For the first few files that you created, it was probably easy enough just to remember their names and where you kept them. Then when you got more files, you created subdirectories (or folders in GUI-speak) to clump your files into related groups. Soon there were subdirectories inside of subdirectories, and now you are having trouble remembering where you put things. And of course, with larger and larger disks it is getting easier to just keep creating and never deleting any files (and for some of us, this getting older thing isn’t helping either).

But how do you find that file you were just editing last week? Or the attachment that you saved in a subdirectory (which seemed such a logical choice at the time)? Or maybe your filesystem has become cluttered with MP3 files scattered all over it, and you want to collect them all up.

Various attempts have been made to provide graphical interfaces to help you search for files, which is all well and good—but how do you use the results from a GUI-style search as input to other commands?

bash and the GNU tools can help. They provide some very powerful search capabilities that enable you to search by filename, dates of creation or modification, even content. They send the results to standard output, perfect for use in other commands or scripts.

So stop your wondering—here’s the information you need.

9.1 Finding All Your MP3 Files

Problem

You have MP3 audio files scattered all over your filesystem. You’d like to move them all into a single location so that you can organize them and then copy them onto a music player.

Solution

The find utility can locate all of those files and then execute a command to move them where you want. For example:

find . -name '*.mp3' -print -exec mv '{}' ~/songs \;

Discussion

The syntax for the find utility is unlike that of other Unix tools. It doesn’t use options in the typical way, with dash and single-letter collections up front followed by several words of arguments. Rather, the options look like short words, and are ordered in a logical sequence describing the logic of which files are to be found, and what to do with them, if anything, when they are found. These word-like options are often called predicates.

A find command’s first arguments are the directory or directories in which to search. A typical use is simply (.) for the current directory, but you can provide a whole list of directories, or even search the entire filesystem (permissions allowing) by specifying the root of the filesystem (/) as the starting point.

In our example the first option (the -name predicate) specifies the pattern we will search for. Its syntax is like the bash pattern-matching syntax, so *.mp3 will match all filenames that end in the characters “.mp3”. Any file that matches this pattern is considered to return true and is passed along to the next predicate of the command.

Think of it this way: find will climb around in the filesystem, and each filename that it finds it will present to this gauntlet of predicates that must be run. Any predicate that is true is passed. Encounter a false, and that filename’s turn is immediately over, and the next filename is processed.

The -print predicate is easy. It is always true and it has the side effect of printing the name to standard output, so any file that has made it this far in the sequence of predicates will have its name printed.

The -exec is a bit odd. Any filename making it this far will become part of a command that is executed. The remainder of the line, up to the \;, is the command to be executed. The {} is replaced by the name of the file that was found. So in our example, if find encounters a file named mhsr.mp3 in the ./music/jazz subdirectory, then the command that will be executed will be:

mv ./music/jazz/mhsr.mp3 ~/songs

The command will be issued for each file that matches the pattern. If lots and lots of matching files are found, lots and lots of commands will be issued. Sometimes this is too demanding of system resources, and it can be a better idea to use find just to find the files and print the filenames into a datafile, and issue fewer commands by consolidating arguments several to a line. (But with machines getting faster all the time, this is less and less of an issue. It might even be something worthwhile for your dual-core or quad-core processor to do.)
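One way to issue fewer commands without an intermediate datafile is find's "-exec ... {} +" form, which passes matches to the command in batches. The following is only a sketch under some assumptions: the directory names are made up for the demo, mktemp -d is assumed to be available, and a small sh wrapper is used because mv wants the destination as its last argument.

```shell
# Build a throwaway tree so the sketch is safe to run (all names illustrative).
demo=$(mktemp -d) || exit 1
mkdir -p "$demo/music/jazz" "$demo/music/rock" "$demo/songs"
touch "$demo/music/jazz/mhsr.mp3" "$demo/music/rock/a.mp3" "$demo/music/rock/notes.txt"

# "{} +" makes find append as many pathnames as fit onto one command line.
# The wrapper receives the destination as $0 and the batch of files as "$@",
# so each mv invocation is: mv file1 file2 ... destination
find "$demo/music" -name '*.mp3' \
    -exec sh -c 'mv "$@" "$0"' "$demo/songs" {} +

# Count what was moved and what was left behind.
moved=$(find "$demo/songs" -name '*.mp3' | wc -l | tr -d ' ')
left=$(find "$demo/music" -name '*.mp3' | wc -l | tr -d ' ')
echo "moved: $moved, left behind: $left"

rm -rf "$demo"
```

Note that the destination sits outside the directory being searched; otherwise find could rediscover files it has just moved.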

9.2 Handling Filenames Containing Odd Characters

Problem

You used a find command like the one in Recipe 9.1, but the results were not what you intended because many of your filenames contain odd characters.

Solution

First, understand that to Unix folks, odd means “anything not a lowercase letter, or maybe a number.” So uppercase letters, spaces, punctuation, and character accents are all odd, but you’ll find all of those and more in the names of many songs and bands.

Depending on the oddness of the characters and your system, tools, and goal, it might be enough to simply quote the replacement string (i.e., put single quotes around the {}, as in '{}'). You did test your command first, right?

If that’s no good, try using the -print0 argument to find and the -0 argument to xargs. -print0 tells find to use the null character (\0) instead of a newline as the output delimiter between pathnames found. -0 then tells xargs to expect the same delimiter on its input. Because a null character can never appear in a pathname, this combination works for any filename, but the two options are not supported on every system.

The xargs command takes whitespace-delimited (except when using -0) pathnames from standard input and executes a specified command on as many of them as possible (up to a bit less than the system’s ARG_MAX value; see Recipe 15.13). Since there is a lot of overhead associated with calling other commands, using xargs can drastically speed up operations because you are calling the other command as few times as possible, rather than each time a pathname is found.

So, to rewrite the solution from Recipe 9.1 to handle odd characters:

find . -name '*.mp3' -print0 | xargs -i -0 mv '{}' ~/songs

Here is a similar example demonstrating how to use xargs to work around spaces in a path or filename when locating and then copying files:

locate P1100087.JPG PC220010.JPG PA310075.JPG PA310076.JPG | xargs -i cp '{}' .

Discussion

There are two problems with this approach. One is that not all versions of xargs support the -i option, and the other is that the -i option eliminates argument grouping, thus negating the speed increase we were hoping for. The problem is that the mv command needs the destination directory as the final argument, but traditional xargs will simply take its input and tack it onto the end of the given command until it runs out of space or input. The results of that behavior applied to an mv command would be very, very ugly. Some versions of xargs provide a -i switch that defaults to using {} (like find), but using -i results in the command being run repeatedly, once for each argument. So the only benefit over using find’s -exec is the odd-character handling.

The xargs utility is most effective when used in conjunction with find and a command like chmod that just wants a list of arguments to process. You can really see a vast speed improvement when handling large numbers of pathnames. For example:

find some_directory -type f -print0 | xargs -0 chmod 0644
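To see the null-delimited approach cope with odd names, here is a small sketch. It assumes a find and xargs that support -print0 and -0 (GNU and the BSDs do), and it uses -I {}, the POSIX spelling of the replacement-string option, in place of -i. The filenames and directories are invented for the demo.

```shell
# Throwaway tree with spaces and a quote in the names (illustrative only).
demo=$(mktemp -d) || exit 1
mkdir -p "$demo/in" "$demo/songs"
touch "$demo/in/Take Five.mp3" "$demo/in/It's Alright.mp3" "$demo/in/plain.mp3"

# Null-delimited names survive spaces and quotes. Like -i, -I {} runs mv
# once per pathname, so this buys correctness rather than speed.
find "$demo/in" -name '*.mp3' -print0 | xargs -0 -I {} mv {} "$demo/songs"

moved=$(ls "$demo/songs" | wc -l | tr -d ' ')
quoted_ok=no
[ -f "$demo/songs/It's Alright.mp3" ] && quoted_ok=yes
echo "moved: $moved, quoted name intact: $quoted_ok"

rm -rf "$demo"
```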

9.3 Speeding Up Operations on Found Files

Problem

You used a find command like the one in Recipe 9.1, but the resulting operations took a long time because you found a lot of files. You want to speed it up.

Solution

See the discussion on xargs in Recipe 9.2.

9.5 Finding Files Irrespective of Case

Problem

Some of your MP3 files end with .MP3 rather than .mp3. How do you find those?

Solution

Use the -iname predicate (if your version of find supports it) to run a case-insensitive search, rather than just -name. For example:

find . -follow -iname '*.mp3' -print0 | xargs -i -0 mv '{}' ~/songs

Discussion

Sometimes you care about the case of the filename and sometimes you don’t. Use the -iname option when you don’t care; i.e., in situations like this, where .mp3 and .MP3 both indicate that the file is probably an MP3 file. (We say probably because on Unix-like systems you can name a file anything that you want. It isn’t forced to have a particular extension.)

One of the most common places where you’ll see the upper- and lowercase issue is when dealing with Microsoft Windows–compatible filesystems, especially older or “lowest common denominator” filesystems. A digital camera that we use stores its files with filenames like PICT001.JPG, incrementing the number with each picture. If you were to try:

find . -name '*.jpg' -print

you wouldn’t find many pictures. In this case you could also try:

find . -name '*.[Jj][Pp][Gg]' -print

since that pattern will match either letter in each pair of brackets, but that isn’t as easy to type, especially if the pattern that you want to match is much longer. In practice, using -iname is an easier choice. The catch is that not every version of find supports the -iname predicate. If your system doesn’t support it, you could try tricky bracket patterns as shown here, use multiple -name options with the case variations you expect, or install the GNU version of find.
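The alternatives can be compared side by side on a scratch directory. This is a sketch only: the filenames are made up, and the -iname line assumes a find that supports that predicate (GNU and BSD find do).

```shell
# Scratch directory with mixed-case extensions (illustrative names).
demo=$(mktemp -d) || exit 1
touch "$demo/PICT001.JPG" "$demo/pict002.jpg" "$demo/Pict003.Jpg" "$demo/notes.txt"

# Case-sensitive match: only the all-lowercase extension.
exact=$(find "$demo" -name '*.jpg' | wc -l | tr -d ' ')

# Bracket pattern: every case combination, portable to any find.
brackets=$(find "$demo" -name '*.[Jj][Pp][Gg]' | wc -l | tr -d ' ')

# Multiple -name tests: only the variations you thought to list.
either=$(find "$demo" \( -name '*.jpg' -o -name '*.JPG' \) | wc -l | tr -d ' ')

# -iname: same result as the brackets, where supported.
iname=$(find "$demo" -iname '*.jpg' | wc -l | tr -d ' ')

echo "exact=$exact brackets=$brackets either=$either iname=$iname"
rm -rf "$demo"
```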

See Also

  • man find

9.6 Finding Files by Date

Problem

Someone sent you a JPEG image file that you saved on your filesystem a few months ago. Now you don’t remember where you put it. How can you find it?

Solution

Use a find command with the -mtime predicate, which checks the date of last modification. For example:

find . -name '*.jpg' -mtime +90 -print

Discussion

The -mtime predicate takes an argument to specify the time frame for the search. The 90 stands for 90 days. By using a plus sign on the number (+90) we indicate that we’re looking for a file modified more than 90 days ago. Write -90 (using a minus sign) for less than 90 days. Use neither a plus nor a minus to mean exactly 90 days.

There are several predicates for searching based on file modification times, and each takes a quantity argument. Using a plus, minus, or no sign indicates greater than, less than, or equal to, respectively, for all of those predicates.

The find utility also has logical AND, OR, and NOT constructs, so if you know that the file was at least one week (7 days) but not more than 14 days old, you can combine the predicates like this:

find . -mtime +7 -a -mtime -14 -print

You can get even more complicated, using OR as well as AND and even NOT to combine predicates, as in:

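find . \( -mtime +14 -name '*.text' -o -mtime -14 -name '*.txt' \) -print

This will print out the names of files ending in .text that are older than 14 days, as well as those that are newer than 14 days but have .txt as their last 4 characters. The outer parentheses matter: without them, -print would be ANDed only with the predicates that follow the -o, so the .text matches would be found but never printed.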

You will likely need parentheses to get the precedence right. Two predicates in sequence are like a logical AND, which binds tighter than an OR (in find as in most languages). Use parentheses as much as you need to make it unambiguous.

Parentheses have a special meaning to bash, so we need to escape that meaning and write them as \( and \) or inside of single quotes as '(' and ')'. You cannot use single quotes around the entire expression though, as that will confuse the find command. It wants each predicate as its own word.
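If you want to experiment with these date tests, you can fake file ages on a scratch directory. The sketch below assumes GNU touch, whose -d option accepts relative dates such as '10 days ago' (other versions of touch may not); the filenames are invented.

```shell
# Scratch files with controlled modification times (illustrative names).
demo=$(mktemp -d) || exit 1
touch "$demo/today.txt" "$demo/today.text"
touch -d '10 days ago' "$demo/week.txt"                   # GNU touch assumed
touch -d '20 days ago' "$demo/old.txt" "$demo/old.text"   # GNU touch assumed

# Between one and two weeks old: both tests must be true (implicit AND).
window=$(find "$demo" -type f -mtime +7 -mtime -14 | wc -l | tr -d ' ')

# Old .text files OR recent .txt files. The escaped outer parentheses
# make -print apply to both branches of the -o.
grouped=$(find "$demo" -type f \
    \( -mtime +14 -name '*.text' -o -mtime -14 -name '*.txt' \) -print |
    wc -l | tr -d ' ')

echo "in the 7-14 day window: $window, matched by the OR: $grouped"
rm -rf "$demo"
```

Here only week.txt falls in the window, while the OR matches old.text, today.txt, and week.txt.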

See Also

  • man find

9.7 Finding Files by Type

Problem

You are looking for a directory with the word “java” in its name. When you tried:

find . -name '*java*' -print

you got way too many files—including all the Java source files in your part of the filesystem.

Solution

Use the -type predicate to select only directories:

find . -type d -name '*java*' -print

Discussion

We put the -type d first, followed by the -name '*java*'. Either order would have found the same set of files, but putting the -type d first in the list of predicates makes the search slightly more efficient: as each file is encountered, the test will be made to see if it is a directory and then only directories will have their names checked against the pattern. All files have names; relatively few are directories. So, this ordering eliminates most files from further consideration before we ever do the string comparison. Is it a big deal? With processors getting faster all the time, it matters less. With disk sizes getting bigger all the time, it matters more.

There are several types of files for which you can check, not just directories. Table 9-1 lists the single characters used to find these types of files.

Table 9-1. Characters used by find’s -type predicate

Key   Meaning
b     Block special file
c     Character special file
d     Directory
p     Pipe (or “fifo”)
f     Plain ol’ file
l     Symbolic link
s     Socket
D     Door (Solaris only)
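A quick way to get a feel for -type is to run it against a scratch tree containing a few different kinds of files. This is a minimal sketch; all of the names are made up for the demo.

```shell
# Scratch tree with a directory, two plain files, and a symbolic link.
demo=$(mktemp -d) || exit 1
mkdir -p "$demo/src/java_tools"
touch "$demo/src/Main.java" "$demo/src/java_tools/Helper.java"
ln -s "$demo/src/Main.java" "$demo/src/link_to_java"

# Name test alone: directories, plain files, and links all match.
any=$(find "$demo/src" -name '*java*' | wc -l | tr -d ' ')

# Add -type d: only the directory survives.
dir_name=$(basename "$(find "$demo/src" -type d -name '*java*')")

# Add -type l: only the symbolic link survives.
links=$(find "$demo/src" -type l -name '*java*' | wc -l | tr -d ' ')

echo "any=$any directory=$dir_name links=$links"
rm -rf "$demo"
```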

See Also

  • man find

9.8 Finding Files by Size

Problem

You want to do a little housecleaning, and to get the most out of your effort you are going to start by finding your largest files and deciding if you need to keep them around. But how do you find your largest files?

Solution

Use the -size predicate in the find command to select files above, below, or of exactly a certain size. For example:

find . -size +3000k -print

Discussion

Like the numeric argument to -mtime, the -size predicate’s numeric argument can be preceded by a minus sign, a plus sign, or no sign at all to indicate less than, greater than, or exactly equal to the numeric argument. In our example, we’re looking for files that are greater than the size indicated.

The size indicated includes a unit of k for kilobytes. If you use c for the unit, that means just bytes (or characters). If you use b, or don’t put any unit, that indicates a size in blocks. (The block is a 512-byte block, historically a common unit in Unix systems.) So, we’re looking for files that are larger than roughly 3 MB (3,000 KB).
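For the housecleaning job itself, it can help to list the survivors of the -size test with the largest first. The sketch below creates its own test files with dd; the k suffix is assumed to be supported by your find (it is in GNU and BSD find, though POSIX only specifies c), and the filenames are invented.

```shell
# Scratch directory with files of known sizes (illustrative names).
demo=$(mktemp -d) || exit 1
dd if=/dev/zero of="$demo/big.bin" bs=1024 count=4000 2>/dev/null
dd if=/dev/zero of="$demo/medium.bin" bs=1024 count=3500 2>/dev/null
printf 'tiny\n' > "$demo/small.txt"

# How many plain files are larger than 3000 KB?
big=$(find "$demo" -type f -size +3000k | wc -l | tr -d ' ')

# Hand the matches to ls -S, which sorts by size, largest first.
# (With a very long list, find may run ls more than once, and each
# batch is then sorted separately.)
largest=$(basename "$(find "$demo" -type f -size +3000k -exec ls -S {} + | head -n 1)")

echo "files over 3000k: $big, largest: $largest"
rm -rf "$demo"
```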

Tip

If you want to delete the files and are using a version of find that supports it, the -delete action is much easier than trying to use rm or xargs rm.

See Also

  • man find

9.9 Finding Files by Content

Problem

You wrote an important letter and saved it as a text file, putting .txt on the end of the filename, but you’ve forgotten the rest of the name. Beyond that, the only thing you remember about the content of the letter is that you used the word “portend.” How do you find a file with some known content?

Solution

If you are in the vicinity of that file, say within the current directory, you can start with a simple grep:

grep -i portend *.txt

With the -i option, grep will ignore upper- and lowercase differences. This command may not be sufficient to find what you’re looking for, but start simply. Of course, if you think the file might be in one of your many subdirectories, you can try to reach all the files that are in subdirectories of the current directory with this command:

grep -i portend */*.txt

Let’s face it, though, that’s not a very thorough search.

If that doesn’t do it, let’s use a more complete solution: the find command. Use the -exec option on find so that if the predicates are true up to that point, it will execute a command for each file it finds. You can invoke grep or other utilities like this:

find . -name '*.txt' -exec grep -Hi portend '{}' \;

Discussion

We use the -name '*.txt' construct to help narrow down the search. Any such test will help, since having to run a separate executable for each file the command finds is costly in time and CPU horsepower. Maybe you have a rough idea of how old the file is (e.g., -mtime -5 or some such); if so, add that too.

The '{}' is where the filename is put when executing the command. The \; indicates the end of the command, in case you want to continue with more predicates. Both the braces and the semicolon need to be escaped, so we quote one and use the backslash for the other. It doesn’t matter which way we escape them, only that we do escape them so that bash doesn’t misinterpret them.

On some systems, the -H option will print the name of the file if grep finds something. Normally, with only one filename on the command, grep won’t bother to name the file; it just prints out the matching line that it finds. Since we’re searching through many files, we need to know which file was grepped.

If you’re running a version of grep that doesn’t have the -H option, then just put /dev/null as one of the filenames on the grep command. The grep command will then have more than one file to open, and will print out the filename if it finds the text.
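Both ideas can be tried on a scratch directory. This sketch shows the /dev/null trick described above, plus grep's -l option, which prints only the names of files that match; the file contents and names are made up for the demo.

```shell
# Scratch tree with one letter that contains the word (illustrative names).
demo=$(mktemp -d) || exit 1
mkdir -p "$demo/letters/2023"
printf 'Dark clouds PORTEND a storm.\n' > "$demo/letters/2023/to-bob.txt"
printf 'Nothing to see here.\n' > "$demo/letters/other.txt"
printf 'portend\n' > "$demo/letters/skip.doc"

# /dev/null gives grep a second file operand, so it names the file
# in front of each matching line even without -H.
lines=$(find "$demo" -name '*.txt' -exec grep -i portend /dev/null {} \;)

# -l prints just the names of matching files; "{} +" lets find pass
# many files to each grep instead of running grep once per file.
hits=$(find "$demo" -name '*.txt' -exec grep -il portend {} +)

printf '%s\n' "$lines" "$hits"
rm -rf "$demo"
```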

See Also

  • man find

9.10 Finding Existing Files and Content Fast

Problem

You’d like to be able to find files without having to wait for a long find command to complete, or you need to find a file with some specific content.

Solution

If your system has locate, slocate, Beagle, Spotlight, or some other indexer, you are already set. If not, look into them.

As we discussed in Recipe 1.5, locate and slocate consult database files about the system (usually compiled and updated by a cron job) to find file or command names almost instantly. The location of the actual database files, what is indexed therein, and how often may vary from system to system. Consult your system’s manpages for details. Here’s an example:

$ locate apropos
/usr/bin/apropos
/usr/share/man/de/man1/apropos.1.gz
/usr/share/man/es/man1/apropos.1.gz
/usr/share/man/it/man1/apropos.1.gz
/usr/share/man/ja/man1/apropos.1.gz
/usr/share/man/man1/apropos.1.gz

locate and slocate don’t index content, though, so see Recipe 9.9 for that.

Most modern graphical operating systems now include local search tools that use an indexer to crawl, parse, and index the names and contents of all of the files (and usually email messages) in your personal file space; i.e., your home directory on a Unix or Linux system. This information is then almost instantly available to you when you look for it. These tools are usually very configurable, graphical, and operate on a per-user basis.

Discussion

slocate stores permission information (in addition to filenames and paths), so it will not list programs to which the user does not have access. On most Linux systems locate is a symbolic link to slocate; other systems may have separate programs, or may not have slocate at all. Both of these are command-line tools that crawl and index the entire filesystem, more or less, but they only contain filenames and locations.

9.11 Finding a File Using a List of Possible Locations

Problem

You need to execute, source, or read a file, but it could be located in a number of different places in or outside of the $PATH.

Solution

If you are going to source the file and it’s located somewhere on the $PATH, just source it. bash’s builtin source command (also known by the shorter-to-type but harder-to-read POSIX name .) will search the $PATH if the sourcepath shell option is set, which it is by default:

source myfile

If you want to execute a file only if you know it exists in the $PATH and is executable, and you have bash version 2.05b or higher, use type -P to search the $PATH. Unlike the which command, type -P only produces output when it finds the file, which makes it much easier to use in this case:

LS=$(type -P ls)
[ -x "$LS" ] && $LS

# --OR--

LS=$(type -P ls)
if [ -x "$LS" ]; then
    : commands involving $LS here
fi

If you need to look in a variety of locations, possibly including the $PATH, use a for loop. To search each of the elements of the $PATH, use the variable substitution operator ${variable//pattern/replacement} to replace all of the : separators with a space, thereby rendering them as separate words, and then use for as usual to iterate over a list of words. To search the $PATH and other possible locations, just list them in the for statement as in these examples:

for path in ${PATH//:/ }; do
    [ -x "$path/ls" ] && $path/ls
done

# --OR--

for path in ${PATH//:/ } /opt/foo/bin /opt/bar/bin; do
    [ -x "$path/ls" ] && $path/ls
done

If the file is not in the $PATH but could be in a list of other locations, possibly even under different names, list the full paths for each:

for file in /usr/local/bin/inputrc /etc/inputrc ~/.inputrc; do
    [ -f "$file" ] && bind -f "$file" && break # Use the first one found
done

Perform any additional tests as needed. For example, you may wish to use screen when logging in if it’s present on the system:

for path in ${PATH//:/ }; do
    if [ -x "$path/screen" ]; then
        # If screen(1) exists and is executable:
        for file in /opt/bin/settings/run_screen ~/settings/run_screen; do
            [ -x "$file" ] && $file && break # Execute the first one found
        done
    fi
done

See Recipe 16.22 for more details on this code fragment.

Discussion

Using for to iterate through each possible location may seem like overkill, but it’s actually very flexible and allows you to search wherever you need to, apply whatever other tests are appropriate, and then do whatever you want with the file if found. By replacing each : with a space in the $PATH, we turn it into the kind of space-delimited list for expects (but as we also saw, any space-delimited list will work). Adapting this technique as needed will allow you to write some very flexible and portable shell scripts that can be highly tolerant of file locations.

You may be tempted to set IFS=':' to directly parse the $PATH, rather than preparsing it into $path. That will work, but involves extra work with variables and isn’t as flexible.
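For comparison, here is a minimal sketch of the IFS approach. The function name first_in_path is hypothetical, and the subshell is the extra bookkeeping: it keeps the changes to IFS and globbing from leaking into the rest of the script. One thing in its favor is that it copes with spaces inside $PATH elements.

```shell
# Hypothetical helper: print the first executable file named $1 found
# in $PATH, or return nonzero if there is none.
first_in_path() {
    (
        IFS=:      # split $PATH on colons only
        set -f     # no filename expansion on the unquoted $PATH
        for dir in $PATH; do
            [ -z "$dir" ] && dir=.    # an empty element means "current directory"
            if [ -x "$dir/$1" ] && [ ! -d "$dir/$1" ]; then
                printf '%s\n' "$dir/$1"
                exit 0
            fi
        done
        exit 1
    )
}

# Demo: a directory with a space in its name, added to a private copy of $PATH.
demo=$(mktemp -d) || exit 1
mkdir -p "$demo/my tools"
printf '#!/bin/sh\necho hi\n' > "$demo/my tools/hello_demo"
chmod +x "$demo/my tools/hello_demo"

# The assignment happens inside $( ), so the real $PATH is untouched.
found=$(PATH="$demo/my tools:$PATH"; first_in_path hello_demo)
echo "found: $found"

rm -rf "$demo"
```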

You may also be tempted to do something like the following:

[ -n "$(which myfile)" ] && bind -f $(which myfile)

The problem here is not when the file exists, but when it doesn’t. The which utility behaves differently on different systems. The Red Hat which is aliased to provide details when the argument is an alias and to set various command-line switches, and it returns a not found message (while which on Debian or FreeBSD does not). But if you try that line on NetBSD, you could end up trying to bind no myfile in /sbin /usr/sbin /bin /usr/bin /usr/pkg/sbin /usr/pkg/bin /usr/X11R6/bin /usr/local/sbin /usr/local/bin, which is not what you meant.

The command command is also interesting in this context. It’s been around longer than type -P and may be useful under some circumstances.

Red Hat Enterprise Linux 4.x behaves like this:

$ alias which
alias which='alias | /usr/bin/which --tty-only --read-alias --show-dot --show-tilde'

$ which rd
alias rd='rmdir'
        /bin/rmdir

$ which ls
alias ls='ls --color=auto -F -h'
        /bin/ls

$ which cat
/bin/cat

$ which cattt
/usr/bin/which: no cattt in (/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/home/jp/bin)

$ command -v rd
alias rd='rmdir'

$ command -v ls
alias ls='ls --color=auto -F -h'

$ command -v cat
/bin/cat

Debian and FreeBSD (but not NetBSD or OpenBSD) behave like this:

$ alias which
-bash3: alias: which: not found

$ which rd

$ which ls
/bin/ls

$ which cat
/bin/cat

$ which cattt

$ command -v rd
-bash: command: rd: not found

$ command -v ls
/bin/ls

$ command -v cat
/bin/cat

$ command -v ll
alias ll='ls -l'
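Given those differences, one defensive pattern is to rely on the exit status of command -v and to accept its output only when it looks like an absolute path. The following is a sketch of that idea; the function name resolve_cmd is hypothetical and not part of bash.

```shell
# Hypothetical helper: print the full path of $1 only if it resolves to an
# executable file on disk. Aliases, functions, builtins, and keywords are
# rejected because command -v does not report them as absolute paths.
resolve_cmd() {
    p=$(command -v "$1" 2>/dev/null) || return 1
    case $p in
        /*) [ -x "$p" ] && printf '%s\n' "$p" ;;
        *)  return 1 ;;
    esac
}

# Usage: test the exit status rather than parsing any "not found" text.
if cat_path=$(resolve_cmd cat); then
    echo "cat is $cat_path"
fi

if ! resolve_cmd cattt >/dev/null; then
    echo "cattt is not an external command on this system"
fi
```

Redirecting standard error covers shells that print a not found message, as in the transcripts above.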