This is the second installment of my quest to explore and better understand some common UNIX commands I thought I already knew. I had planned on covering awk in this installment, but it turns out that tools like awk are more complex than I thought: they are not simply command line utilities, they are powerful text manipulation tools with languages of their own. Perhaps I will review them in the future, but for the time being I cannot give them the attention a thorough treatment deserves.
The touch command is designed to update the timestamps associated with files. Running touch on a file or set of files updates the access and modification timestamps to the current time. If you try to touch a file that does not exist, the command creates an empty file with that name for you.
This is a very simple command (borderline trivial), and in the past I have used it almost exclusively for creating new empty files. However, updating file timestamps can be useful in a variety of contexts, and knowing how to do this programmatically is handy. For example, in many supercomputing environments, some disk partitions are checked regularly and any file that has not been accessed or modified in the last 7 days is deleted. If you have important data files on a scratch disk that you're not currently using but don't want to lose (you're going to use them soon), then you need to update the timestamps on those files. Of course, you could open each file in nano, but this becomes ridiculous if you have a lot of data files. Instead, simply run touch on the files to update their timestamps.
Here are a few useful options I just learned.
-a: change only the access time, not the modification time
-c: only update timestamps for existing files, do not create any new files
-d STRING: instead of using the current time, set the timestamp to STRING; the format of STRING is flexible, accepting values such as Sun, 29 Feb 2004 16:21:42 -0800
-m: change only the modification time, not the access time
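A quick sketch of these options in a throwaway directory (all of the file names below are made up for illustration):

```shell
# Work in a temporary directory so nothing real is touched.
cd "$(mktemp -d)"

# The classic use: create a new empty file.
touch results.txt

# -c: update timestamps only for existing files; this neither
# errors out nor creates missing.txt.
touch -c missing.txt

# -d: set an explicit timestamp instead of the current time
# (this date format works with GNU touch).
touch -d "2004-02-29 16:21:42" old-results.txt

# Refresh every file under a scratch directory so a 7-day
# cleanup policy leaves them alone (hypothetical layout).
mkdir -p scratch && touch scratch/a.dat scratch/b.dat
find scratch -type f -exec touch {} +
```

The find/touch combination at the end is the programmatic version of "open every file in nano": it updates the timestamps of every file in the tree, however many there are.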
The ls command is one of the first any UNIX user learns. It is used to list the contents of the current working directory. In my UNIX working environments, I alias ls to ls -lhp. There are a few other options for sorting and otherwise managing the output of this command.
-a/-A: list all directory contents, including hidden files beginning with ., the current working directory ., and the parent directory .. (the -A option excludes these last two)
-d: treat files normally, but for directories, list the directory itself instead of directory contents
-h: make output more human readable (such as file sizes when used with -l)
-l: print using a detailed listing format
-p: append a / indicator to directories
-r: reverse the order of sorting
-R: list subdirectories recursively
-S: sort by file size
-t: sort by modification time
-X: sort by file extension
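The sorting options are easiest to see on files whose sizes differ; here is a small sketch (file names invented):

```shell
# Work in a temporary directory with two files of known size.
cd "$(mktemp -d)"
printf 'xxxxxxxxxxxxxxxx' > big.txt   # 16 bytes
printf 'x' > small.txt                # 1 byte

ls -S     # sort by size: big.txt listed before small.txt
ls -Sr    # -r reverses the sort: small.txt first
ls -lht   # detailed listing, human-readable sizes, newest first
```

Combining a sort option with -r is often handy: ls -ltr, for instance, puts the most recently modified files at the bottom of the listing, right above your next prompt.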
The xargs command is used to dynamically build and execute commands on the command line. Typically, the output of other commands or programs is piped into xargs, which then runs commands based on that output.
The basic usage of xargs is as follows.
somecommand | xargs -I % someothercommand -arg1 value1 -arg2 % sometext
The -I option indicates which character(s) in the following command should be replaced by the xargs input (in this example I chose the % character, but you've got some flexibility there). For example, if the somecommand command generates two lines of output (foo and bar), then the commands executed by xargs would be as follows.
someothercommand -arg1 value1 -arg2 foo sometext
someothercommand -arg1 value1 -arg2 bar sometext
Here is a non-trivial example.
find /data/runs -mindepth 4 -type d | xargs -I % mv % /data/backup
This command will start in the /data/runs directory, look for any directories that are nested 4 levels deep, and then move them to the /data/backup directory.

There are a few options that allow you to make slight modifications to the default behavior of xargs.
-a file: read input from the given file rather than from the standard input
-d delim: use the given delimiter instead of the default newline character
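Here is a small runnable sketch of the -I substitution and the -d delimiter option (the input strings are made up; note that -d is a GNU xargs extension and is not available in BSD xargs):

```shell
# -I: each input line is substituted for % in the command.
printf 'foo\nbar\n' | xargs -I % echo prefix-%-suffix

# -d (GNU xargs): split input on commas instead of newlines,
# running echo once per token thanks to -n 1.
printf 'a,b,c' | xargs -d , -n 1 echo item:
```

The first pipeline runs echo twice, producing prefix-foo-suffix and prefix-bar-suffix; the second runs it three times, once per comma-separated token.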
The cut command is designed to process data files (especially files in tabular format) and extract the relevant data. For example, if you have a tab- or comma-delimited file with several columns, the cut command can be used to cut particular columns out of that file. This is a very useful command in bioinformatics, despite being pretty simple and straightforward. However, the manual did teach me a few options that I wasn't aware of before. Here are some helpful options.
-d delim: use the given delimiter instead of the default tab character
-f FIELDS: extract the given fields (columns) from the file (separate field/column numbers with commas)
--complement: extract the complement of the fields specified by -f
-s: only process lines that contain the delimiter; this can be useful for files that contain comments or other types of metadata that you don’t want to process
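A quick sketch on a tiny comma-delimited table (the data is invented; --complement is a GNU cut extension):

```shell
# Build a small hypothetical CSV in a temporary directory.
cd "$(mktemp -d)"
printf 'name,score,team\nalice,10,red\nbob,7,blue\n' > table.csv

cut -d , -f 1,3 table.csv              # keep columns 1 and 3
cut -d , --complement -f 2 table.csv   # same result: drop column 2
```

Both commands print name,team / alice,red / bob,blue; --complement is convenient when it is easier to name the columns you want to discard than the ones you want to keep.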
The sort command will (you guessed it!) sort the lines of input. Looking at the sort manual didn't reveal any spectacularly interesting options for this command, but it does provide a variety of different ways to sort the input (in contrast to the default ASCII-cographical order). Knowing these options is helpful.
-d: sort by dictionary order, only considering blanks and alphanumeric characters
-f: case-insensitive sort
-h: sort by human-readable number value (the human-readable values generated by other UNIX commands, such as 2K or 1G)
-n: numeric sort
-R: random sort
-r: reverse the natural order of the sort
-o FILE: write output to FILE instead of the standard output
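The difference between the default order and the numeric variants is easiest to see side by side (note that -h is a GNU sort extension):

```shell
# Default ASCII-cographical order: "10" sorts before "2".
printf '10\n9\n2\n' | sort      # 10, 2, 9

# -n: numeric order.
printf '10\n9\n2\n' | sort -n   # 2, 9, 10

# -h: understands the size suffixes produced by du -h and friends.
printf '1G\n2K\n500M\n' | sort -h   # 2K, 500M, 1G
```

The first pipeline is the classic gotcha: because sort compares character by character, 10 lands before 2, which is almost never what you want for numeric columns.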
The uniq command is useful for reporting and counting duplicated lines of input. This command expects the input to be sorted ASCII-cographically, so it is often used in conjunction with sort. By default, uniq will print all of the lines of input, removing any duplicates. However, there are a few options that let you adapt this default behavior.
-c: along with each line, print the number of occurrences of that line in the input
-d: only print lines with more than one occurrence in the input
-i: ignore case differences when comparing lines
-s N: skip the first N characters when comparing lines
-u: only print lines that occur once in the input
-w N: compare no more than the first N characters when comparing lines
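The sort | uniq pairing in action, on made-up input:

```shell
# -c: count occurrences of each line (input must be sorted first).
printf 'bar\nfoo\nbar\nbar\nfoo\n' | sort | uniq -c   # 3 bar, 2 foo

# -d: print only lines that are duplicated in the input.
printf 'bar\nfoo\nbar\n' | sort | uniq -d   # bar

# -u: print only lines that occur exactly once.
printf 'bar\nfoo\nbar\n' | sort | uniq -u   # foo
```

The sort | uniq -c | sort -rn idiom, which produces a frequency table with the most common lines first, is one of the most useful one-liners built from these commands.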