File management tools include those for splitting, comparing, and compressing files, making backup archives, and tracking file revisions. Other management tools exist for determining the contents of a file, and for changing its timestamp.
Determining File Type and Format
When we speak of a file's type, we are referring to the kind of data it contains, which may include text, executable commands, or some other data; this data is organized in a particular way in the file, and this organization is called its format. For example, an image file might contain data in the JPEG image format, or a text file might contain unformatted text in the English language or text formatted in the TeX markup language.
The file tool analyzes files and indicates their type and, if known, the format of the data they contain. Supply the name of a file as an argument to file, and it outputs the name of the file followed by a description of its format and type.
To determine the type and format of the file `/usr/doc/HOWTO/example.gz', type:
$ file /usr/doc/HOWTO/example.gz [Enter]
/usr/doc/HOWTO/example.gz: gzip compressed data, deflated, original filename, last modified: Sun Apr 26 02:51:48 1998, os: UNIX
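The following is a minimal sketch of the same recipe on files you create yourself, assuming the file and gzip tools are installed. The names `note.txt' and `mystery' are made up for illustration, and the exact wording of the descriptions varies between versions of file.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A plain text file and a gzip-compressed copy of it.
printf 'Hello, world.\n' > note.txt
gzip -c note.txt > note.txt.gz   # -c writes to standard output, leaving note.txt in place

# file looks at the contents, not the name: this copy has no `.gz' extension.
cp note.txt.gz mystery

# Several file names may be given at once; each gets its own line of output.
file note.txt note.txt.gz mystery
```

Both `note.txt.gz' and `mystery' should be reported as gzip compressed data, since file examines the data itself rather than the file name.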
Changing File Modification Time
Use touch to change a file's timestamp without modifying its contents. Give the name of the file to be changed as an argument. The default action is to change the timestamp to the current time.
To change the timestamp of file `example' to the current date and time, type:
$ touch example [ENTER]
To specify a timestamp other than the current system time, use the `-d' option, followed by the date and time that should be used enclosed in quote characters. You can specify just the date, just the time, or both.
To change the timestamp of file `example' to `17 May 1999 14:16', type:
$ touch -d '17 May 1999 14:16' example [ENTER]
To change the timestamp of file `example' to `14 May', type:
$ touch -d '14 May' example [ENTER]
To change the timestamp of file `example' to `14:16', type:
$ touch -d '14:16' example [ENTER]
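To confirm that a timestamp was changed as intended, you can read it back. This sketch assumes the GNU versions of touch and date (the `-r' option of GNU date prints a file's modification time), and it works on a throwaway file in a scratch directory.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create an empty file, then set its timestamp explicitly.
touch example
touch -d '17 May 1999 14:16' example

# Read the modification time back in a fixed format (GNU date extension).
stamp=$(date -r example '+%Y-%m-%d %H:%M')
echo "$stamp"
```

The same check can be made by eye with `ls -l example'.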
Splitting a File into Smaller Ones
It's sometimes necessary to split one file into a number of smaller ones. For example, suppose you have a very large sound file in the near-CD-quality MPEG Audio Layer III ("MP3") format. Your file, `example.mp3', is 4,394,422 bytes in size, and you want to transfer it from your desktop to your laptop, but your laptop and desktop are not connected on a network -- the only way to transfer files between them is by floppy disk. Because this file is much too large to fit on one floppy, you use split.
The split tool copies a file, chopping up the copy into separate files of a specified size. It takes as optional arguments the name of the input file (using standard input if none is given) and the file name prefix to use when writing the output files (using `x' if none is given). The output files' names will consist of the file prefix followed by a group of letters: `aa', `ab', `ac', and so on -- the default output file names would be `xaa', `xab', and so on.
Specify the number of lines to put in each output file with the `-l' option, or use the `-b' option to specify the number of bytes to put in each output file. To specify the output files' sizes in kilobytes or megabytes, use the `-b' option and append `k' or `m', respectively, to the value you supply. If neither `-l' nor `-b' is used, split defaults to using 1,000 lines per output file.
To split `example.mp3' into separate files of one megabyte each, whose names begin with `example.mp3.', type:
$ split -b1m example.mp3 example.mp3. [Enter]
This command creates five new files whose names begin with `example.mp3.'. The first four files are one megabyte in size, while the last file is 200,118 bytes -- the remaining portion of the original file. No alteration is made to `example.mp3'.
You could then copy these five files onto four floppies (the last file fits on a floppy with one of the larger files), copy them all to your laptop, and then reconstruct the original file with cat (see Concatenating Text).
To reconstruct the original file from the split files, type:
$ cat example.mp3.* > example.mp3 [Enter]
To delete all of the split files after the original file has been reconstructed, use the rm tool:
$ rm example.mp3.* [Enter]
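The whole round trip can be rehearsed on a small stand-in file before you trust it with real data. This is a sketch only: `example.bin' and `rebuilt.bin' are hypothetical names, and the 2,500,000-byte size is chosen just so that the split produces more than one piece.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for the large file: 2,500,000 bytes of zeros.
head -c 2500000 /dev/zero > example.bin

# Split into one-megabyte pieces named example.bin.aa, example.bin.ab, ...
# Note the trailing dot on the prefix.
split -b 1m example.bin example.bin.

# Count the pieces (the pattern does not match example.bin itself).
set -- example.bin.*
pieces=$#
echo "pieces: $pieces"

# The shell expands the pattern in alphabetical order, so the pieces
# are concatenated in the order in which split wrote them.
cat example.bin.* > rebuilt.bin

# cmp prints nothing and exits successfully when the files are identical.
cmp example.bin rebuilt.bin && echo "rebuilt file is identical"
```

Rebuilding under a different name keeps the original available for the comparison; once cmp reports no difference, the pieces can be removed safely.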
Comparing Files
There are a number of tools for comparing the contents of files in different ways; these recipes show how to use some of them. These tools are especially useful for comparing passages of text in files, but that's not the only way you can use them.
- Cmp: Comparing two files to see if they differ.
- Diff: Showing the differences between files.
- Patch: Applying a difference report to a file.
Determining Whether Two Files Differ
Use cmp to determine whether or not two text files differ. It takes the names of two files as arguments, and if the files contain the same data, cmp outputs nothing. If, however, the files differ, cmp outputs the byte position and line number in the files where the first difference occurs.
To determine whether the files `test' and `example' differ, type:
$ cmp test example [ENTER]
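Because cmp also reports its result through its exit status, it is convenient in scripts. The sketch below assumes three small files created only for this illustration; the `-s' option suppresses all output so that only the exit status is used.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'one\ntwo\nthree\n' > test
printf 'one\ntwo\nthree\n' > same      # identical to test
printf 'one\nTWO\nthree\n' > example   # differs on the second line

# With -s, cmp is silent: exit status 0 means identical, 1 means different.
if cmp -s test same; then result_same="identical"; else result_same="different"; fi
if cmp -s test example; then result_diff="identical"; else result_diff="different"; fi
echo "test vs same: $result_same"
echo "test vs example: $result_diff"

# Without -s, cmp reports where the first difference occurs
# (for these files, byte 5, which is on line 2).
cmp test example
```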
Finding the Differences between Files
Use diff to compare two files and output a difference report (sometimes called a "diff") containing the text that differs between two files. The difference report is formatted so that other tools (namely, patch; see Patching a File with a Difference Report) can use it to make a file identical to the one it was compared with.
To compare two files and output a difference report, give their names as arguments to diff.
To compare the files `exscript.old' and `exscript.new', type:
$ diff exscript.old exscript.new [ENTER]
The difference report is output to standard output; to save it to a file, redirect the output to the file to save to:
$ diff exscript.old exscript.new > exscript.diff [ENTER]
In the preceding example, the difference report is saved to a file called `exscript.diff'.
The difference report is meant to be used with commands such as patch, in order to apply the differences to a file. See Info file `diff.info', node `Top', for more information on diff and the format of its output.
To better see the difference between two files, use sdiff instead of diff; instead of giving a difference report, it outputs the files in two columns, side by side, separated by spaces. Lines that differ in the files are separated by `|'; lines that appear only in the first file end with a `<', and lines that appear only in the second file are preceded with a `>'.
To peruse the files `test' and `example1' side by side on the screen, with any differences indicated between columns, type:
$ sdiff test example1 | less [ENTER]
Patching a File with a Difference Report
To apply the differences in a difference report to the original file compared in the report, use patch. It takes as arguments the name of the file to be patched and the name of the difference report file (or "patchfile"). It then applies the changes specified in the patchfile to the original file. This is especially useful for distributing different versions of a file -- small patchfiles can be sent across networks more easily than large source files.
To update the original file `exscript.old' with the patchfile `exscript.diff', so that its contents match `exscript.new', type:
$ patch exscript.old exscript.diff [ENTER]
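The diff and patch recipes fit together as one cycle: compare two versions, save the report, and apply the report to the older version. The following sketch assumes the patch tool is installed and uses short made-up file contents; it patches a copy named `exscript.patched' (a hypothetical name) so that the old file is left alone.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'alpha\nbeta\ngamma\n' > exscript.old
printf 'alpha\nBETA\ngamma\ndelta\n' > exscript.new

# Save the difference report. diff exits with status 1 when the files
# differ; that is its normal way of saying "differences found".
diff exscript.old exscript.new > exscript.diff

# Apply the report to a copy of the OLD file.
cp exscript.old exscript.patched
patch exscript.patched exscript.diff

# The patched copy should now be identical to the new version.
cmp exscript.patched exscript.new && echo "patched copy matches exscript.new"
```

Note that the report is applied to the older of the two files that were compared; applying it to the newer one would make patch report a reversed or previously applied patch.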
Compressed Files
File compression is useful for storing or transferring large files. When you compress a file, you shrink it and save disk space. File compression uses an algorithm to change the data in the file; to use the data in a compressed file, you must first uncompress it to restore the original data (and original file size).
The following recipes explain how to compress and uncompress files.
- Compressing Files: Making files smaller.
- Expanding Files: Making files bigger.
Compressing a File
Use the gzip ("GNU zip") tool to compress files. It takes as an argument the name of the file or files to be compressed; it writes a compressed version of the specified files, appends a `.gz' extension to their file names, and then deletes the original files.
To compress the file `test', type:
$ gzip test [ENTER]
This command compresses the file `test', putting it in a new file named `test.gz'; gzip then deletes the original file, `test'.
Decompressing a File
To access the contents of a compressed file, use gunzip to decompress (or "uncompress") it.
Like gzip, gunzip takes as an argument the name of the file or files to work on. It expands the specified files, writing the output to new files without the `.gz' extensions, and then deletes the compressed files.
To expand the file `test.gz', type:
$ gunzip test.gz [ENTER]
This command expands the file `test.gz' and puts it in a new file called `test'; gunzip then deletes the compressed file, `test.gz'.
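A compress-and-expand round trip makes the behavior of both tools easy to see. This sketch builds a deliberately repetitive 100,000-byte text file, which compresses very well; `test.orig' is a hypothetical reference copy kept only for the final comparison.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 5,000 identical 20-byte lines: highly repetitive, so it compresses well.
yes 'the quick brown fox' | head -n 5000 > test
cp test test.orig
before=$(wc -c < test)

# gzip replaces `test' with `test.gz'.
gzip test
after=$(wc -c < test.gz)
echo "$before bytes -> $after bytes"

# gunzip replaces `test.gz' with `test' again.
gunzip test.gz

# The restored file should match the reference copy byte for byte.
cmp test test.orig && echo "restored file is identical"
```

How much a file shrinks depends entirely on its contents; data that is already compressed, such as an MP3 file, usually shrinks very little.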
File Archives
An archive is a single file that contains a collection of other files, and often directories. Archives are usually used to transfer or make a backup copy of a collection of files and directories -- this way, you can work with only one file instead of many. This single file can be easily compressed as explained in the previous section, and the files in the archive retain the structure and permissions of the original files.
Use the tar tool to create, list, and extract files from archives. Archives made with tar are sometimes called "tar files," "tar archives," or, because all the archived files are rolled into one, "tarballs."
Two common options used with all three of these operations are `-f' and `-v': to specify the name of the archive file, use `-f' followed by the file name; use the `-v' ("verbose") option to have tar output the names of files as they are processed. While the `-v' option is not necessary, it lets you observe the progress of your tar operation.
NOTE: The name of this tool comes from "tape archive," because it was originally made to write the archives directly to a magnetic tape device. It is still used for this purpose, but today, archives are almost always saved to a file on disk.
- Creating Archives: Creating an archive of files.
- Listing Archives: Listing the contents of an archive.
- Extracting Archives: Extracting the files from an archive.
Creating a File Archive
To create an archive with tar, use the `-c' ("create") option, and specify the name of the archive file to create with the `-f' option. It's common practice to use a name with a `.tar' extension, such as `example.tar'.
Give as arguments the names of the files to be archived; to create an archive of a directory and all of the files and subdirectories it contains, give the directory's name as an argument.
To create an archive called `example.tar' from the contents of the `example' directory, type:
$ tar -cvf example.tar example [Enter]
This command creates an archive file called `example.tar' containing the `example' directory and all of its contents. The original `example' directory remains unchanged.
Use the `-z' option to compress the archive as it is being written. This yields the same output as creating an uncompressed archive and then using gzip to compress it, but it eliminates the extra step.
To create a compressed archive called `example.tar.gz' from the contents of the `example' directory, type:
$ tar -zcvf example.tar.gz example [Enter]
This command creates a compressed archive file, `example.tar.gz', containing the `example' directory and all of its contents. The original `example' directory remains unchanged.
NOTE: When you use the `-z' option, you should specify the archive name with a `.tar.gz' extension and not a `.tar' extension, so the file name shows that the archive is compressed. This is not a requirement, but it serves as a reminder and is the standard practice.
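To see what the `-z' option buys you, create the same archive both ways and compare the sizes. The directory contents below are invented for the illustration, and the `-v' option is left out to keep the output short.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A small directory tree to archive.
mkdir -p example/notes
yes 'sample line of text' | head -n 2000 > example/notes/a.txt
echo 'hello' > example/b.txt

# The same directory archived without and with compression.
tar -cf example.tar example
tar -zcf example.tar.gz example

plain=$(wc -c < example.tar)
packed=$(wc -c < example.tar.gz)
echo "example.tar: $plain bytes, example.tar.gz: $packed bytes"
```

The `example' directory is still present afterward, since tar copies files into the archive rather than moving them.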
Listing the Contents of an Archive
To list the contents of a tar archive without extracting them, use tar with the `-t' option.
To list the contents of an archive called `example.tar', type:
$ tar -tvf example.tar [Enter]
This command lists the contents of the `example.tar' archive. Using the `-v' option along with the `-t' option causes tar to output the permissions and modification time of each file, along with its file name -- the same format used by the ls command with the `-l' option (see Listing File Attributes).
Include the `-z' option to list the contents of a compressed archive. To list the contents of a compressed archive called `example.tar.gz', type:
$ tar -ztvf example.tar.gz [Enter]
Extracting Files from an Archive
To extract (or unpack) the contents of a tar archive, use tar with the `-x' ("extract") option.
To extract the contents of an archive called `example.tar', type:
$ tar -xvf example.tar [Enter]
This command extracts the contents of the `example.tar' archive into the current directory. If an archive is compressed, which usually means it will have a `.tar.gz' or `.tgz' extension, include the `-z' option.
To extract the contents of a compressed archive called `example.tar.gz', type:
$ tar -zxvf example.tar.gz [Enter]
NOTE: If there are files or subdirectories in the current directory with the same name as any of those in the archive, those files will be overwritten when the archive is extracted. If you don't know what files are included in an archive, consider listing the contents of the archive first.
Another reason to list the contents of an archive before extracting them is to determine whether the files in the archive are contained in a directory. If not, and the current directory contains many unrelated files, you might confuse them with the files extracted from the archive.
To extract the files into a directory of their own, make a new directory, move the archive to that directory, and change to that directory, where you can then extract the files from the archive.
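The list-first habit and the separate directory can be combined in a few commands. This sketch uses a made-up archive, `loose.tar.gz', whose files are not contained in a directory. Instead of moving the archive, it uses the `-C' option of tar to name the directory to extract into; this is an alternative to the steps described above, not the method the text describes.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build an archive whose files are NOT inside a top-level directory.
echo 'first' > one.txt
echo 'second' > two.txt
tar -zcf loose.tar.gz one.txt two.txt

# List before extracting: no directory prefix appears on the names.
listing=$(tar -ztf loose.tar.gz)
echo "$listing"

# Extract into a directory of its own; -C makes tar change to that
# directory before it writes any files.
mkdir unpacked
tar -zxf loose.tar.gz -C unpacked
ls unpacked
```

Either approach keeps the extracted files apart from whatever else is in the current directory.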
Reference: Linux Cookbook