tr

The tr program is used to transliterate characters. We can think of this as a sort of character-based search-and-replace operation. Transliteration is the process of changing characters from one alphabet to another. For example, converting characters from lowercase to uppercase is transliteration. We can perform such a conversion with tr as follows:


[me@linuxbox ~]$ echo "lowercase letters" | tr a-z A-Z

LOWERCASE LETTERS


As we can see, tr operates on standard input, and outputs its results on standard output. tr accepts two arguments: a set of characters to convert from and a corresponding set of characters to convert to. Character sets may be expressed in one of three ways:

1. An enumerated list. For example, ABCDEFGHIJKLMNOPQRSTUVWXYZ

2. A character range. For example, A-Z. Note that ranges depend on the locale's collation order, the same issue that affects ranges in other commands, and thus should be used with caution.

3. POSIX character classes. For example, [:upper:].
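As a brief sketch, the three notations above can be used interchangeably for the lowercase-to-uppercase conversion. The variable names are illustrative only, and `LC_ALL=C` is set on the range form as an assumption, to pin the range to plain ASCII order regardless of the current locale. The sets are quoted so the shell passes them to tr unchanged.

```shell
# Enumerated lists: every character is spelled out.
enumerated=$(echo "lowercase letters" | tr 'abcdefghijklmnopqrstuvwxyz' 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')

# Character ranges: LC_ALL=C applies to this one tr invocation only.
ranged=$(echo "lowercase letters" | LC_ALL=C tr 'a-z' 'A-Z')

# POSIX character classes.
classes=$(echo "lowercase letters" | tr '[:lower:]' '[:upper:]')

# Each of the three lines printed is LOWERCASE LETTERS.
printf '%s\n' "$enumerated" "$ranged" "$classes"
```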

In most cases, both character sets should be of equal length; however, it is possible for the first set to be larger than the second, particularly if we wish to convert multiple characters to a single character:



[me@linuxbox ~]$ echo "lowercase letters" | tr '[:lower:]' A

AAAAAAAAA AAAAAAA
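A small sketch of what happens when the first set is longer than the second, assuming the behavior of GNU tr: the second set is extended by repeating its last character until the lengths match. Other implementations may differ, so this is worth checking on the system at hand. The variable names are illustrative.

```shell
# set1 has five characters, set2 only two. GNU tr pads set2 with its
# last character, so the effective mapping is a->x and b,c,d,e->y.
padded=$(echo "abcde" | tr 'a-e' 'xy')
echo "$padded"    # xyyyy

# The same padding lets a single character stand in for a whole set.
masked=$(echo "lowercase letters" | tr 'aeiou' '*')
echo "$masked"    # l*w*rc*s* l*tt*rs
```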


In addition to transliteration, tr allows characters to simply be deleted from the input stream. Earlier in this chapter, we discussed the problem of converting MS-DOS text files to Unix-style text. To perform this conversion, carriage return characters need to be removed from the end of each line. This can be performed with tr as follows:

tr -d '\r' < dos_file > unix_file


where dos_file is the file to be converted and unix_file is the result. This form of the command uses the escape sequence \r to represent the carriage return character. To see a complete list of the sequences and character classes tr supports, try:


[me@linuxbox ~]$ tr --help
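To try the carriage-return deletion described above without a real MS-DOS file, a sample can be generated with printf. This is only a sketch: the temporary directory and the two-line sample text are assumptions made for illustration, while the file names dos_file and unix_file follow the text.

```shell
# Work in a scratch directory so no existing files are touched.
tmpdir=$(mktemp -d)

# printf interprets \r and \n, so each line ends in CR LF (MS-DOS style).
printf 'line one\r\nline two\r\n' > "$tmpdir/dos_file"

# Delete every carriage return; the line feeds are left alone.
tr -d '\r' < "$tmpdir/dos_file" > "$tmpdir/unix_file"

# The converted file is two bytes smaller: one CR removed per line.
wc -c < "$tmpdir/dos_file"     # 20
wc -c < "$tmpdir/unix_file"    # 18
```

The scratch directory can be removed afterward with rm -r "$tmpdir".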


ROT13: The Not-So-Secret Decoder Ring

One amusing use of tr is to perform ROT13 encoding of text. ROT13 is a trivial type of encryption based on a simple substitution cipher. Calling ROT13 “encryption” is being generous; “text obfuscation” is more accurate. It is used sometimes on text to obscure potentially offensive content. The method simply moves each character 13 places up the alphabet. Since this is halfway up the possible 26 characters, performing the algorithm a second time on the text restores it to its original form. To perform this encoding with tr:

echo "secret text" | tr a-zA-Z n-za-mN-ZA-M

frperg grkg

Performing the same procedure a second time results in the translation:

echo "frperg grkg" | tr a-zA-Z n-za-mN-ZA-M

secret text
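Because the same command both encodes and decodes, it can be wrapped in a small shell function. The sketch below uses a hypothetical function name, rot13, and illustrative variable names; it quotes the sets so the shell passes them to tr unchanged.

```shell
# rot13 (hypothetical helper): reads standard input and writes the
# text with each letter moved 13 places along the alphabet.
rot13() {
    tr 'a-zA-Z' 'n-za-mN-ZA-M'
}

encoded=$(echo "secret text" | rot13)
decoded=$(echo "$encoded" | rot13)

echo "$encoded"    # frperg grkg
echo "$decoded"    # secret text
```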

A number of email programs and Usenet news readers support ROT13 encoding. Wikipedia contains a good article on the subject:

http://en.wikipedia.org/wiki/ROT13


tr can perform another trick, too. Using the -s option, tr can “squeeze” (delete) repeated instances of a character:



[me@linuxbox ~]$ echo "aaabbbccc" | tr -s ab

abccc


Here we have a string containing repeated characters. By specifying the set “ab” to tr, we eliminate the repeated instances of the letters in the set, while leaving the character that is missing from the set (“c”) unchanged. Note that the repeating characters must be adjoining. If they are not:


[me@linuxbox ~]$ echo "abcabcabc" | tr -s ab

abcabcabc


the squeezing will have no effect.
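A common practical use of squeezing is collapsing runs of spaces. The sketch below also shows -s combined with two sets, in which case tr translates first and then squeezes repeats of characters in the second set. The sample strings and variable names are illustrative.

```shell
# Each run of adjoining spaces is reduced to a single space.
squeezed=$(echo "too    many     spaces" | tr -s ' ')
echo "$squeezed"    # too many spaces

# With two sets, translation happens first (a-c becomes A-C), then
# repeated characters from the second set are squeezed.
both=$(echo "aaabbbccc" | tr -s 'a-c' 'A-C')
echo "$both"        # ABC
```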

