diff from the Linux command line by OnWorks

diff

Like the comm program, diff is used to detect the differences between files. However, diff is a much more complex tool, supporting many output formats and the ability to process large collections of text files at once. diff is often used by software developers to examine changes between different versions of program source code, and thus has the ability to recursively examine directories of source code, often referred to as source trees. One common use for diff is the creation of diff files or patches that are used by programs such as patch (which we’ll discuss shortly) to convert one version of a file (or files) to another version.
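As a rough illustration of the recursive mode mentioned above, the following sketch builds two tiny directory trees and compares them with the -r option. The directory names v1 and v2 and the files inside them are made up for this example; they are not part of the book's sample files.

```shell
# Build two small, hypothetical "source trees" in a throwaway directory.
workdir=$(mktemp -d)
mkdir "$workdir/v1" "$workdir/v2"
printf '%s\n' 'hello' > "$workdir/v1/main.txt"
printf '%s\n' 'hello, world' > "$workdir/v2/main.txt"
printf '%s\n' 'notes' > "$workdir/v2/notes.txt"   # exists only in v2

# -r makes diff descend into the directories and compare matching files.
# diff exits with status 1 when it finds differences, so "|| true" keeps
# the script going if it is run under "set -e".
tree_output=$(diff -r "$workdir/v1" "$workdir/v2" || true)
printf '%s\n' "$tree_output"
```

With GNU diff, the result should contain a change group for main.txt and a line beginning with "Only in" for notes.txt, which exists in just one of the trees.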

If we use diff to look at our previous example files:

[me@linuxbox ~]$ diff file1.txt file2.txt
1d0
< a
4a4
> e

we see its default style of output: a terse description of the differences between the two files. In the default format, each group of changes is preceded by a change command in the form of range operation range to describe the positions and types of changes required to convert the first file to the second file:

Table 20-4: diff Change Commands

Change Description

r1ar2 Append the lines at position r2 in the second file to position r1 in the first file.

r1cr2 Change (replace) the lines at position r1 with the lines at position r2 in the second file.

r1dr2 Delete the lines in the first file at position r1, which would have appeared at range r2 in the second file.

In this format, a range is a comma-separated list of the starting line and the ending line. While this format is the default (mostly for POSIX compliance and backward compatibility with traditional Unix versions of diff), it is not as widely used as other, optional formats. Two of the more popular formats are the context format and the unified format.
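To see a change command that uses ranges, here is a minimal sketch with two hypothetical files, old.txt and new.txt (not the book's file1.txt and file2.txt), in which lines 2 and 3 are replaced:

```shell
# Create two hypothetical sample files in a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' a b c d > "$workdir/old.txt"
printf '%s\n' a X Y d > "$workdir/new.txt"

# Default ("normal") output format. diff exits with status 1 when the
# files differ, so "|| true" keeps the script going under "set -e".
normal_output=$(diff "$workdir/old.txt" "$workdir/new.txt" || true)
printf '%s\n' "$normal_output"
```

The first line of the output is the change command 2,3c2,3: lines 2 through 3 of the first file are replaced by lines 2 through 3 of the second. The old lines follow, each marked with <, then a --- separator, then the new lines marked with >.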

When viewed using the context format (the -c option), we will see this:

[me@linuxbox ~]$ diff -c file1.txt file2.txt
*** file1.txt 2008-12-23 06:40:13.000000000 -0500
--- file2.txt 2008-12-23 06:40:34.000000000 -0500
***************
*** 1,4 ****
- a
  b
  c
  d
--- 1,4 ----
  b
  c
  d
+ e

The output begins with the names of the two files and their timestamps. The first file is marked with asterisks and the second file is marked with dashes. Throughout the remainder of the listing, these markers will signify their respective files. Next, we see groups of changes, including the default number of surrounding context lines. In the first group, we see:

*** 1,4 ****

which indicates lines 1 through 4 in the first file. Later we see:

--- 1,4 ----

which indicates lines 1 through 4 in the second file. Within a change group, lines begin with one of four indicators:

Table 20-5: diff Context Format Change Indicators

Indicator Meaning

blank A line shown for context. It does not indicate a difference between the two files.

- A line deleted. This line will appear in the first file but not in the second file.

+ A line added. This line will appear in the second file but not in the first file.

! A line changed. The two versions of the line will be displayed, each in its respective section of the change group.
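The example files above never produce the ! indicator, since they differ only by one deleted and one added line. As a small sketch, assuming two hypothetical files old.txt and new.txt that differ in a single line, we can see it like this:

```shell
# Create two hypothetical sample files that differ only on line 2.
workdir=$(mktemp -d)
printf '%s\n' a b c d > "$workdir/old.txt"
printf '%s\n' a X c d > "$workdir/new.txt"

# Context format. diff exits with status 1 when the files differ, so
# "|| true" keeps the script going under "set -e".
context_output=$(diff -c "$workdir/old.txt" "$workdir/new.txt" || true)

# Drop the two header lines, whose paths and timestamps vary per run.
context_body=$(printf '%s\n' "$context_output" | tail -n +3)
printf '%s\n' "$context_body"
```

In the resulting change group, the line reading b appears with a ! in the first file's section, and the line reading X appears with a ! in the second file's section. The unchanged lines around them are shown as context.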

The unified format is similar to the context format but is more concise. It is specified with the -u option:

[me@linuxbox ~]$ diff -u file1.txt file2.txt
--- file1.txt 2008-12-23 06:40:13.000000000 -0500
+++ file2.txt 2008-12-23 06:40:34.000000000 -0500
@@ -1,4 +1,4 @@
-a
 b
 c
 d
+e

The most notable difference between the context and unified formats is the elimination of the duplicated lines of context, making the results of the unified format shorter than those of the context format. In our example above, we see file timestamps like those of the context format, followed by the string @@ -1,4 +1,4 @@. This indicates the lines in the first file and the lines in the second file described in the change group. Following this are the lines themselves, with the default three lines of context. Each line starts with one of three possible characters:

Table 20-6: diff Unified Format Change Indicators