CLI Magic: Patching the differences

Author: Shashank Sharma

Working with free and open source software, one frequently hears terms such as bugs, updates, and patches. When developers come across shortcomings in their software, instead of repackaging the software with the changes, they can provide a patchfile that contains details of all required changes. Two important tools used in the process are diff, which creates a patchfile, and patch, which applies it. You can use both tools with text or HTML files.

User Level: Intermediate

As the name suggests, diff documents differences between two files. diff compares files line by line. Running the diff old_file new_file command displays the differences between files on your screen. The -u switch creates output in the unified diff format, which displays each difference with a few unchanged context lines above and below the change. A unified diff file can help you determine where changes have been made.

To create a unified diff format patchfile, run the command diff -u old_file new_file > patchfile. Here is a quick example to illustrate how diff works. We have files named first.txt and second.txt with inventory lists of sports equipment:

File first.txt

1 ball
2 bats
3 nets
2 caps
5 clubs
1 golf ball

File second.txt

3 balls
2 bats
3 nets
2 caps
4 clubs
1 golf ball

When we compare these two files with diff -u first.txt second.txt > patchfile, the patchfile contains the following:

--- first.txt     2006-01-21 16:20:40.271039432 +0530
+++ second.txt    2006-01-21 16:21:00.538958240 +0530
@@ -1,6 +1,6 @@
-1 ball
+3 balls
 2 bats
 3 nets
 2 caps
-5 clubs
+4 clubs
 1 golf ball

The --- line shows the name of the first file, which has the original inventory list. The +++ line shows the name of the second file, which contains the updated inventory list. The @@ line is the hunk header, and together with the lines below it, it forms a hunk. The header -1,6 +1,6 says that the hunk covers six lines starting at line 1 of the original file and six lines starting at line 1 of the new file. The rest of the hunk shows the actual changes between the two files. A large diff file will have several hunks, each with its own header.

In the hunk, the lines that are not preceded by a - or + symbol are the context lines. Lines starting with - indicate a line that was in the original file but not in the new file. Conversely, lines starting with + indicate a line that is in the new file but not in the original file. Because diff works line by line, a changed line shows up as a removal followed by an addition. In our example, -1 ball followed by +3 balls means that the line 1 ball in the original file was replaced by 3 balls in the new file, and -5 clubs followed by +4 clubs records the second change in the same way.
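If you want to follow along, the example above can be reproduced with a short script. This is only a sketch: it assumes a POSIX shell with mktemp available, and the scratch directory is our own addition, not part of the original example. Note that diff reports its result through its exit status as well: 0 when the files match, 1 when they differ, and 2 when something went wrong.

```shell
#!/bin/sh
# Recreate the two inventory lists in a scratch directory (our assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' '1 ball' '2 bats' '3 nets' '2 caps' '5 clubs' '1 golf ball' > first.txt
printf '%s\n' '3 balls' '2 bats' '3 nets' '2 caps' '4 clubs' '1 golf ball' > second.txt

# A status of 1 is the expected result here: it means "differences found".
status=0
diff -u first.txt second.txt > patchfile || status=$?
echo "diff exit status: $status"

# Skip the two file header lines (their timestamps vary) and show the hunk.
tail -n +3 patchfile
```

The timestamps in the --- and +++ lines will differ from those shown in the article, but the hunk itself should match.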

To determine whether two files differ, use the -q switch. For example, the command diff -q first.txt second.txt will display the string Files first.txt and second.txt differ.
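The -q switch is handy in scripts, where you usually care only about whether two files differ. The sketch below assumes the same two inventory files, plus a hypothetical copy.txt that we create just to have an identical pair to compare.

```shell
#!/bin/sh
# Scratch directory and copy.txt are illustrative assumptions.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' '1 ball' '2 bats' '3 nets' '2 caps' '5 clubs' '1 golf ball' > first.txt
printf '%s\n' '3 balls' '2 bats' '3 nets' '2 caps' '4 clubs' '1 golf ball' > second.txt
cp first.txt copy.txt

# With -q, diff prints a single line when the files differ.
# diff exits with 1 in that case, which is expected here.
report=$(diff -q first.txt second.txt) || [ $? -eq 1 ]
echo "$report"

# For identical files, diff -q prints nothing and exits with 0,
# so it can be used directly as an if condition.
if diff -q first.txt copy.txt > /dev/null; then
    same=yes
else
    same=no
fi
echo "first.txt and copy.txt identical: $same"
```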

Once you know the differences between two files, you can create a patchfile, which is applied using the patch tool.

Working with patch

In workgroups, many people work on the same software, documentation, and text files. If you want to apply the same changes to every copy of a file, you can distribute a patchfile and let each person apply it with the patch command. For example, to bring the inventory list in first.txt up to date, we can apply the patchfile we created earlier with the command patch first.txt < patchfile.

The filename is optional. Running patch < patchfile also works, because patch reads the --- and +++ lines of the patchfile to determine the name of the file to patch. This works in most cases because the filenames are the same on the machine where the patchfile is generated and on the machine where it is applied.
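Both ways of calling patch can be tried side by side. In this sketch, inventory.txt and the recipient directory are hypothetical names of our own: the first shows that naming the target lets you patch a file whose name does not appear in the patchfile, and the second imitates a colleague's machine that has only the old file and the patchfile.

```shell
#!/bin/sh
# Scratch directory, inventory.txt, and recipient/ are illustrative assumptions.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' '1 ball' '2 bats' '3 nets' '2 caps' '5 clubs' '1 golf ball' > first.txt
printf '%s\n' '3 balls' '2 bats' '3 nets' '2 caps' '4 clubs' '1 golf ball' > second.txt

# diff exits with 1 when the files differ, which is expected here.
diff -u first.txt second.txt > patchfile || [ $? -eq 1 ]

# Form 1: name the file to patch on the command line.
cp first.txt inventory.txt
patch inventory.txt < patchfile

# Form 2: give no filename and let patch take it from the patchfile.
mkdir recipient
cp first.txt patchfile recipient/
cd recipient || exit 1
patch < patchfile

# The patched file should now have the same content as second.txt.
cat first.txt
```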

Sometimes, a file has been modified before a patch is applied. In that case, patch uses the context lines that diff -u stored in the patchfile to locate the lines that need to be changed, even if other parts of the file no longer match the original.
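Here is a small sketch of that situation. The extra line 6 tees is an invented local edit, added to first.txt after the patchfile was created. Because the edit leaves the lines covered by the hunk alone, the patch should still apply and the local addition should survive. An edit to the very lines the patch changes is a different matter, and patch would be unable to apply the hunk.

```shell
#!/bin/sh
# Scratch directory and the "6 tees" edit are illustrative assumptions.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' '1 ball' '2 bats' '3 nets' '2 caps' '5 clubs' '1 golf ball' > first.txt
printf '%s\n' '3 balls' '2 bats' '3 nets' '2 caps' '4 clubs' '1 golf ball' > second.txt

# diff exits with 1 when the files differ, which is expected here.
diff -u first.txt second.txt > patchfile || [ $? -eq 1 ]

# Simulate a local change made before the patch arrives.
echo '6 tees' >> first.txt

# The hunk still matches the six lines it describes, so it applies.
patch first.txt < patchfile

# The result has both the patched lines and the local addition.
cat first.txt
```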

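By default, patch replaces the original file with the patched file. If you wish to keep a copy of the original, use the -b switch, which saves a backup before patching; GNU patch names the backup by adding an .orig suffix to the original filename.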

If you are unsure about whether to apply the changes, use the --dry-run switch, which displays the results of applying the patch without actually changing the file.
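A cautious workflow combines the two switches: test the patch first, then apply it with a backup. The sketch below assumes GNU patch, whose default backup name is the original filename plus .orig; other implementations may name the backup differently.

```shell
#!/bin/sh
# Scratch directory is an illustrative assumption; GNU patch is assumed.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' '1 ball' '2 bats' '3 nets' '2 caps' '5 clubs' '1 golf ball' > first.txt
printf '%s\n' '3 balls' '2 bats' '3 nets' '2 caps' '4 clubs' '1 golf ball' > second.txt

# diff exits with 1 when the files differ, which is expected here.
diff -u first.txt second.txt > patchfile || [ $? -eq 1 ]

# Step 1: rehearse the patch. Nothing on disk is changed.
patch --dry-run first.txt < patchfile
after_dry_run=$(head -n 1 first.txt)
echo "first line after dry run: $after_dry_run"

# Step 2: apply for real, keeping a backup of the original.
patch -b first.txt < patchfile

# The directory should now hold first.txt (patched) and first.txt.orig.
ls
```

If the result is not what you wanted, copying first.txt.orig back over first.txt restores the original list.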

The man page is an excellent resource for reviewing the many options available with the patch command.

Conclusion

As is the case with all free and open source software, there are plenty of similar tools to choose from. For example, you can use vimdiff, a comparatively modern alternative to diff that uses vim to highlight the differences between two or three files, making comparison easy. You can also use diff3, a companion to diff that compares three files. cmp is another tool that can be used to compare two files; it works by inspecting the files byte by byte. No project can do without diff and patch, however, for making quick changes across files in multiple locations.

Shashank Sharma is studying for a degree in computer science. He specializes in writing about free and open source software for new users.