Linux Potpourri

This post is being continuously updated.

Shell

NOTE: All the content below is based on the bash shell.

Shortcuts

  • Ctrl + A: Move to the beginning of the line.
  • Ctrl + E: Move to the end of the line.
  • Ctrl + U: Delete from the cursor to the beginning of the line.
  • Ctrl + K: Delete from the cursor to the end of the line.
  • Ctrl + W: Delete the word before the cursor.
  • Ctrl + L: Clear the screen.
  • Ctrl + Z: Send SIGTSTP to the foreground process (suspend).
  • Ctrl + C: Send SIGINT to the foreground process (terminate).
  • Ctrl + \: Send SIGQUIT to the foreground process (terminate and create core dump).
  • Ctrl + D: Send EOF (end of file) through standard input to the foreground process.

Wildcards

Wildcards are symbols that match multiple files or directories. They are expanded by the shell itself, so they can be used with any command.

Different shells may support different wildcard symbols; here are the ones in bash:

  • *: matches any number of characters (including 0).
  • **: matches all files and directories recursively.
  • **/: matches all directories recursively.
  • ?: matches a single character.
  • []: matches any character in the brackets. For example, [abc] matches a, b, or c.
  • [!] or [^]: matches any character not in the brackets. For example, [!abc] matches any character except a, b, and c.

NOTE: ** and **/ are only available when globstar shell option is enabled (use shopt | grep globstar to check). You can use shopt -s globstar to enable it.

In the square brackets, you can use - to represent a range, for example, [0-9] matches any digit, and [a-z] matches any lowercase letter.
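These basic wildcards are easy to try in a scratch directory; a minimal sketch (the file names below are made up for illustration):

```shell
# Run in a throwaway directory so the globs only see these files
cd "$(mktemp -d)"
touch a.txt b.txt c1.log c2.log readme

echo *.txt       # a.txt b.txt
echo c?.log      # c1.log c2.log
echo [ab].txt    # a.txt b.txt
echo [!a]*.txt   # b.txt
echo c[0-9].log  # c1.log c2.log
```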

There is a symbol {} which is not a wildcard but is often used alongside them. It represents multiple strings: a{b,c}d expands to abd acd, so mv file_{old,new} expands to mv file_old file_new. Curly brackets can be nested: echo a{b{c,d},e}f expands to echo abcf abdf aef, which works like the distributive law in mathematics. .. represents a range: echo {a..z} expands to echo a b c ... z.

Note: these wildcards do not all mean the same thing as in regex. In regex, * means 0 or more repetitions of the previous item, ? means 0 or 1 repetition, and ^ also anchors the beginning of a line.

When extglob is on (use shopt | grep extglob to check; enable it with shopt -s extglob), the following extended patterns are supported:

  • ?(pattern-list): Matches zero or one occurrence of the given patterns.
  • *(pattern-list): Matches zero or more occurrences of the given patterns.
  • +(pattern-list): Matches one or more occurrences of the given patterns.
  • @(pattern-list): Matches one of the given patterns.
  • !(pattern-list): Matches anything except one of the given patterns.
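A minimal sketch of these extended patterns with [[ ... ]] (the file names are made up; extglob must be enabled before any line using the patterns is parsed):

```shell
#!/usr/bin/env bash
shopt -s extglob   # must come before any line using extended patterns

[[ "photo.jpeg" == *.@(jpg|jpeg) ]] && echo "image"         # image
[[ "notes.txt" == !(*.jpg|*.jpeg) ]] && echo "not a jpeg"   # not a jpeg
[[ "aaab" == +(a)b ]] && echo "one or more a's"             # one or more a's
[[ "b" == ?(a)b ]] && echo "optional a"                     # optional a
```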

POSIX Character Classes

  • [:alnum:]: Letter or digit.
  • [:alpha:]: Letter.
  • [:blank:]: Space or tab.
  • [:cntrl:]: Control character.
  • [:digit:]: Digit.
  • [:xdigit:]: Hexadecimal digit (0-9, a-f, A-F).
  • [:graph:]: Printable character except space.
  • [:lower:]: Lowercase letter.
  • [:upper:]: Uppercase letter.
  • [:print:]: Printable character including space.
  • [:punct:]: Punctuation character.
  • [:space:]: Space, tab, newline, carriage return, form feed, or vertical tab.
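These classes are used inside a bracket expression, i.e., [[:digit:]] rather than [:digit:]. A quick sketch in bash pattern matching and in grep:

```shell
#!/usr/bin/env bash
[[ "7" == [[:digit:]] ]] && echo "digit"                    # digit
[[ "a" == [[:lower:]] ]] && echo "lowercase"                # lowercase
[[ "3F" == [[:xdigit:]][[:xdigit:]] ]] && echo "hex pair"   # hex pair

# The same classes work in regular expressions:
printf 'abc\n123\n' | grep '^[[:digit:]]*$'                 # 123
```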

Quoting

Quoting removes the special meaning of certain characters in the shell. For example, |, &, ;, (, ), <, >, space, tab, newline and ! are special characters; you must quote them if you want to use them as ordinary characters.

Escape Character

\ is the escape character in the shell. It can be used to cancel the special meaning of the next character.

It is worth noting that if the next character is a newline, the newline will be treated as a line continuation. We use this to avoid writing long commands in one line.

Single Quotes

Characters enclosed in single quotes are preserved literally.

NOTE: You cannot put a single quote between enclosing single quotes, even if preceded by a backslash. You can put a double quote between enclosing single quotes directly.

Double Quotes

Characters enclosed in double quotes are preserved literally, except for $, `, \, and ! (when history expansion is enabled).

NOTE: When the shell is in POSIX mode, the ! has no special meaning within double quotes, even when history expansion is enabled.

$ and ` retain their special meaning within double quotes. For example:

# $variable will be replaced by the value of the variable
echo "Hello, $USER" # Hello, kaiser
# `command` will be replaced by the output of the command
echo "Today is `date +%A`" # Today is Monday
# \ will be preserved literally unless followed by $ or ` or " or \ or newline
echo "Hello, \$USER" # Hello, $USER
echo "Hello, \"" # Hello, "
echo "Hello, \ not used" # Hello, \ not used
echo "Hello, \
World" # Hello, World

When history expansion is enabled, ! will be interpreted as a history expansion character:

ls
# !!: the last command
echo "!!" # ls

You can use $"string" to translate the string according to the current locale. If no translation is available, the $ is ignored and the string is treated as if it were in plain double quotes.

Variables

You can use name=[value] (the square brackets mean that the value is optional) to define a variable and set its value. If the value is omitted, the empty string is assigned to it.

NOTE: You cannot put a space between name and =. For example, name = value is invalid.

You can also use the declare command to define a variable and set its value. The available attribute options are:

Option Meaning
-a indexed array
-A associative array
-i integer
-l lowercase value automatically
-u uppercase value automatically
-r readonly
-x export
-n name reference to another variable
-t trace, rarely used

NOTE: You can use + instead of - to unset an attribute.

If you use declare -i to set the integer attribute of a variable, the value will be evaluated as an arithmetic expression automatically, for example:

a=1+2
echo $a # 1+2
# You can use $((...)) to evaluate an arithmetic expression
echo $((a)) # 3
declare -i a=1+2
echo $a # 3

You can use += to append to a variable:

declare -i num=5
num+=3
echo $num # 8
num+=2+5
echo $num # 15

str="Hello"
str+=" World"
echo "$str" # Hello World

arr=("apple" "banana")
arr+=("cherry" "date")
echo "${arr[@]}" # apple banana cherry date

arr=([0]="a" [2]="b") # Non-continuous index
arr+=("c") # Appends at index 3 (next max index + 1)
echo "${!arr[@]}" # 0 2 3 (indexes of the array)

# Must use declare -A aarr to define aarr as an associative array,
# which is similar to a dictionary or map
declare -A aarr
aarr=([name]="Alice" [age]=30)
aarr+=([city]="Paris" [job]="Engineer") # Add new key-value pairs
echo "${aarr[@]}" # Alice 30 Paris Engineer (unordered)

The * in the value of a variable is not expanded, but is treated as an ordinary character. If you want the glob expanded at assignment time, use an array assignment such as files=(*.txt), which expands to all .txt files.

NOTE: The variable can be unset by the unset command.

Positional Parameters

A positional parameter is a parameter denoted by one or more digits, other than the single digit 0.

Positional parameters are assigned from the shell’s arguments when it is invoked, and may be reassigned using the set builtin command. Positional parameters may not be assigned to with assignment statements. The positional parameters are temporarily replaced when a shell function is executed.

When a positional parameter consisting of more than a single digit is expanded, it must be enclosed in braces.

You can update positional parameters with the set command, for example:

# Set all the positional parameters with 1 2 3 4
set -- 1 2 3 4

args=("$@") # Copy positional args into an array
args[1]="mango" # Change the second element (index 1)
set -- "${args[@]}" # Reset positional arguments

set -- "$@" "bird" # Append "bird"

set -- "fish" "$@" # Prepend "fish"

# Remove the first positional parameter
# After this, $1 holds the value of the original $2
shift # or shift 1

args=("$@")
unset 'args[2]' # Remove the third positional parameter (index 2)
set -- "${args[@]}"

NOTE: 0 is not a positional parameter, it is a special parameter, which will be expanded to the name of the shell or shell script.

Special Parameters

  • $*: All positional parameters, each of which expands to a separate word.
  • "$*": A single string with all positional parameters separated by the first character of the IFS variable. If IFS is unset, the parameters are separated by spaces. If IFS is null, the parameters are joined without intervening separators.
  • $@: All positional parameters, each of which expands to a separate word.
  • "$@": Equivalent to "$1" "$2" ... "$N" (where N is the number of positional parameters). prefix"$@"suffix will be parsed as prefix"$1" "$2" ... "$N"suffix.
  • $#: The number of positional parameters in decimal.
  • $?: The exit status of the most recently executed foreground pipeline.
  • $$: The process ID of the shell. In a sub-shell, it expands to the process ID of the current shell, not the sub-shell.
  • $!: The process ID of the job most recently placed into the background.
  • $0: The name of the shell or shell script. If bash is invoked with a file of commands, $0 is set to the name of that file. If bash is started with the -c option, then $0 is set to the first argument after the string to be executed, if one is present. Otherwise, it is set to the filename used to invoke bash, as given by argument zero.
  • $_: The last argument of the previous command or script path.
  • $-: The current option flags as specified upon invocation, by the set builtin command, or those set by the shell itself (such as the -i option).

The characters of $- and their meanings:

Flag Meaning
h hashall (remembers command locations in $PATH)
i interactive shell
m monitor mode (job control enabled)
H history expansion enabled (e.g., !! expands to the last command)
B brace expansion enabled (e.g., {a,b} expands to a b)
s commands are read from stdin (e.g., bash -s)
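A short sketch of the most common special parameters in action (the fruit names are made up):

```shell
#!/usr/bin/env bash
set -- apple banana cherry   # set the positional parameters

echo "$#"   # 3  (number of positional parameters)
echo "$1"   # apple
echo "$*"   # apple banana cherry
echo "$0"   # the name of the shell or script

! true      # negate: the exit status becomes 1
echo "$?"   # 1  (exit status of the most recent command)

sleep 0.1 &
echo "$!"   # process ID of the background sleep
wait "$!"   # wait for it to finish
```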

Arrays

An indexed array is created automatically if any variable is assigned to using the syntax name[subscript]=value. The subscript is treated as an arithmetic expression that must evaluate to a number.

For an indexed array, you can use negative index, which will count back from the end of the array. For example, a[-1] means the last element, and a[-2] means the second to last.

You can reference an element of any array with ${name[subscript]}. You cannot omit the braces.

You can use ${name[@]} or ${name[*]} to get all assigned values, and ${!name[@]} or ${!name[*]} to get all assigned indices. The difference between them when they are double-quoted is similar to the one between "$@" and "$*".

${#name[subscript]} expands to the length of ${name[subscript]}. If subscript is * or @, the expansion is the number of elements in the array.

$name is equivalent to ${name[0]}.

unset 'name[0]' destroys the first element of the array (quote the subscript; see the note on pathname expansion below).

unset name, unset 'name[@]', or unset 'name[*]' removes the entire array.

You can use declare -a name to create an array called name.

You can use declare -A name to create an associative array called name.

declare -a -A name is equivalent to declare -A name.

For an associative array, both name=( key1 value1 key2 value2 ... ) and name=( [key1]=value1 [key2]=value2 ... ) assignment forms work (the bracket-less key-value form requires bash 5.1 or later). But you cannot mix the two forms, as in myarr=( key1 value1 [key2]=value2 ). If you leave off a value at the end, it is treated as the empty string: in declare -A myarr=( key1 value1 key2 ), myarr[key2] is an empty string.

When using a variable name with a subscript as an argument to a command, such as with unset (unset arr[i]), without the word expansion syntax described above (unset ${arr[i]}), the argument is subject to pathname expansion: arr[i] is a glob that could match, e.g., a file named arri. If pathname expansion is not desired, quote the argument: unset 'arr[i]' or unset "arr[i]".
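Putting the quoting rule and element removal together, a small sketch:

```shell
#!/usr/bin/env bash
arr=(alpha beta gamma)

unset 'arr[1]'        # quoted, so arr[1] cannot be glob-expanded by the shell

echo "${!arr[@]}"     # 0 2   (index 1 is gone; the rest keep their indices)
echo "${arr[@]}"      # alpha gamma
echo "${#arr[@]}"     # 2

unset arr             # removes the entire array
echo "${arr[@]-empty}"   # empty
```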

Expansion

The order of expansions is:

  • brace expansion;
  • tilde expansion, parameter and variable expansion, arithmetic expansion, command substitution (done in a left-to-right fashion) and process substitution (if the system supports it);
  • word splitting;
  • pathname expansion.

After these expansions are performed, quote characters present in the original word are removed unless they have been quoted themselves (quote removal).

Brace Expansion

This is the expansion of {}; see Wildcards.

Tilde Expansion

  • ~: Current user’s home ($HOME).
  • ~username: Home directory of username.
  • ~+: Current working directory ($PWD).
  • ~-: Previous working directory ($OLDPWD).
  • "~" or '~': Literal ~ (no expansion).
  • ~+number: Equivalent to the directory shown by dirs +number (counting from the left, 0-indexed).
  • ~-number: Equivalent to the directory shown by dirs -number (counting from the right, 0-indexed).
  • ~number: Same as ~+number.

NOTE: You can use pushd to add directory to dirs and popd to remove directory from dirs.
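A short sketch of the directory stack and the ~+number form (using /usr and /, which should exist on any Linux system):

```shell
#!/usr/bin/env bash
cd /
pushd /usr > /dev/null   # dirs is now: /usr /
echo "$PWD"              # /usr
echo ~+0                 # /usr  (entry 0 of dirs: the current directory)
echo ~+1                 # /     (entry 1 of dirs)
popd > /dev/null         # pop /usr off the stack
echo "$PWD"              # /
```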

Parameter and Variable Expansion

  • ${parameter}: The value of parameter is substituted. Sometimes, the braces can be omitted.
  • ${!parameter}: Expands to the value of the variable named by parameter. For example:
foo='Hello'
bar='foo'
echo "${!bar}" # Hello
  • ${!nameref}: Expands to the referenced name, for example:
declare -n nameref_var="target_var"  # nameref_var is a reference to target_var
target_var="Hello"

echo "$nameref_var" # Hello (dereferences automatically)
echo "${!nameref_var}" # target_var (returns the referenced name)
  • ${!prefix*} or ${!prefix@}: Expands to the names of variables whose names begin with prefix. For example:
a=1
aa=11
aaa=111
echo "${!a*}" # a aa aaa (the names of all variables starting with a)
echo "${!a@}" # a aa aaa (the names of all variables starting with a)
  • ${parameter:offset}: Expands to the substring of the value of parameter starting at the character specified by offset.
  • ${parameter:offset:length}: Expands to the substring of the value of parameter starting at the character specified by offset and extending for length characters. For example:
# Basic usage
str="Hello, World!"
echo "${str:7}" # World! (substring starting at index 7)
echo "${str:7:5}" # World (substring starting at index 7 and length 5)
echo "${str:7:-1}" # World (substring starting at index 7 and end at -1)
# The space between `:` and `-` is required to avoid confusion with the `:-` expansion
echo "${str: -6}" # World! (substring starting at index -6)
echo "${str: -6:5}" # World (substring starting at index -6 and length 5)
echo "${str: -6:-1}" # World (substring starting at index -6 and end at -1)

# For @
set -- A B C D E
echo "${@:0:1}" # the name of the script or the shell
echo "${@:2}" # B C D E (substring starting at index 2)
echo "${@:2:3}" # B C D (substring starting at index 2 and length 3)
echo "${@: -3}" # C D E (substring starting at index -3)
echo "${@: -3:2}" # C D (substring starting at index -3 and length 2)
# echo "${@:2:-1}" # Error, the length can not be negative for @

# For indexed array
arr=(A B C D E)
echo "${arr[@]:2}" # C D E (substring starting at index 2)
echo "${arr[@]:2:3}" # C D E (substring starting at index 2 and length 3)
echo "${arr[@]: -3}" # C D E (substring starting at index -3)
echo "${arr[@]: -3:2}" # C D (substring starting at index -3 and length 2)
# echo "${arr[@]:2:-1}" # Error, the length can not be negative for an indexed array

# Undefined results for associative array
  • ${!array[@]} or ${!array[*]}: Expands to the indices of the array array.
  • ${#parameter}: The length of the value of parameter is substituted.
  • ${#*} or ${#@}: Same as $#: the number of positional parameters.

For those below, you can remove : to make it only work for unset variables:

  • ${parameter:-word}: Expands to word if parameter is unset or null; otherwise, it expands to the value of parameter.
  • ${parameter:=word}: Assigns word to parameter and expands to word if parameter is unset or null; otherwise, it expands to the value of parameter. You cannot use this with positional and special parameters.
  • ${parameter:?word}: If parameter is unset or null, word is written to standard error and, if the shell is not interactive, the shell exits; otherwise, it expands to the value of parameter.
  • ${parameter:+word}: Nothing is substituted if parameter is null or unset; otherwise, the expansion of word is substituted.

For those below, if parameter is @ or * or an array subscripted with @ or *, the pattern removal operation is applied to each element in turn, and the expansion is the resultant list:

  • ${parameter#word}: Removes the shortest match of word from the beginning of parameter. Wildcards are allowed in word. See Wildcards.
  • ${parameter##word}: Removes the longest match of word from the beginning of parameter. Wildcards are allowed in word. See Wildcards.
  • ${parameter%word}: Similar to ${parameter#word} but removes the suffix instead of the prefix.
  • ${parameter%%word}: Similar to ${parameter##word} but removes the suffix instead of the prefix.
  • ${parameter@U}: Converts the value of parameter to uppercase.
  • ${parameter@u}: Converts the first character of the value of parameter to uppercase.
  • ${parameter@L}: Converts the value of parameter to lowercase.
  • ${parameter@a}: Expands to the attributes of parameter.
  • ${parameter@E}: Expands to a string with all the escaped characters expanded (such as \n -> newline).
  • ${parameter@A}: Expands to a string whose value, if evaluated, will recreate parameter with its attributes and value. If used for array variables, you should use ${a[@]@A} to get the string.
  • ${parameter@Q}: Expands to a single-quoted string with any special characters (such as \n, \t, etc.) escaped. For example:
a='Hello World'
b=('Hello' 'World')
declare -A c=([first]='Hello' [second]='World')
echo "${a@Q}" # 'Hello World'
echo "${b[@]@Q}" # 'Hello' 'World'
echo "${c[@]@Q}" # 'World' 'Hello' (unordered)
  • ${parameter@K}: Similar to ${parameter@Q}, but it prints the values of indexed and associative arrays as a sequence of quoted key-value pairs. For example:
a='Hello World'
b=('Hello' 'World')
declare -A c=([first]='Hello' [second]='World')
echo "${a@K}" # 'Hello World'
echo "${b[@]@K}" # 0 "Hello" 1 "World"
echo "${c[@]@K}" # first "Hello" second "World"
  • ${parameter@P}: Expands the value as if it were a prompt string. For example:
PS1='\u@\h:\w\$ '
echo "${PS1@P}" # user@host:/path$  (expands prompt codes)
  • ${parameter/pattern/string}: Replace the longest match of pattern with string. If pattern begins with /, all matches of pattern are replaced with string. If pattern begins with #, it must match at the beginning of the expanded value of parameter. If pattern begins with %, it must match at the end of the expanded value of parameter. If string is null, matches of pattern are deleted and the / following pattern may be omitted. If the nocasematch shell option is enabled, the match is performed without regard to the case of alphabetic characters.
  • ${parameter^pattern}: Convert the first match of pattern to uppercase. The pattern can only match one character. If pattern is omitted, it is treated like a ?, which matches every character. Wildcards are allowed in pattern. See Wildcards.
  • ${parameter^^pattern}: Convert all matches of pattern to uppercase. The pattern can only match one character. If pattern is omitted, it is treated like a ?, which matches every character. Wildcards are allowed in pattern. See Wildcards.
  • ${parameter,pattern}: Convert the first match of pattern to lowercase. The pattern can only match one character. If pattern is omitted, it is treated like a ?, which matches every character. Wildcards are allowed in pattern. See Wildcards.
  • ${parameter,,pattern}: Convert all matches of pattern to lowercase. The pattern can only match one character. If pattern is omitted, it is treated like a ?, which matches every character. Wildcards are allowed in pattern. See Wildcards.
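The prefix/suffix removal and case conversion forms cover many everyday basename/dirname-style tasks; a quick sketch (the path is made up):

```shell
#!/usr/bin/env bash
path="/home/user/archive.tar.gz"

echo "${path##*/}"    # archive.tar.gz  (longest */ prefix removed: like basename)
echo "${path%/*}"     # /home/user      (shortest /* suffix removed: like dirname)

file="${path##*/}"
echo "${file%%.*}"    # archive         (longest .* suffix removed)
echo "${file#*.}"     # tar.gz          (shortest *. prefix removed)

word="hello"
echo "${word^}"       # Hello
echo "${word^^}"      # HELLO
echo "${word^^[lo]}"  # heLLO  (only characters matching [lo] are uppercased)
```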

Arithmetic Expansion

  • $((expression)): The value of expression is substituted.

Command Substitution

  • $(command) or `command`: The standard output of command is substituted with trailing newlines deleted.

Process Substitution

  • <(command): Provides the output of command as a file that can be read from.
  • >(command): Provides a file that, when written to, becomes the input for command.
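A small sketch: <(command) expands to a path such as /dev/fd/63, and a common use is feeding a while read loop without a subshell so variable changes survive the loop:

```shell
#!/usr/bin/env bash
# Read the output of a command as if it were a file
wc -l < <(printf 'a\nb\nc\n')    # 3

# Because the loop is NOT in a subshell (unlike cmd | while ...),
# count keeps its value after the loop ends
count=0
while read -r line; do
    count=$((count + 1))
done < <(printf 'one\ntwo\n')
echo "$count"                    # 2
```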

Word Splitting

This part depends on the IFS variable.

The shell scans the results of parameter expansion, command substitution, and arithmetic expansion that did not occur within double quotes for word splitting.

You can put variables in double quotes to prevent word splitting, for example:

a='1 2   3'
echo $a # 1 2 3
echo "$a" # 1 2   3

set -- $a
echo $# # 3
set -- "$a"
echo $# # 1

IFS defaults to space, tab, and newline.
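A sketch of controlling word splitting via IFS (the /etc/passwd-style line is made up):

```shell
#!/usr/bin/env bash
line="root:x:0:0:root:/root:/bin/bash"

# Set IFS only for the read command: split the line on ':'
IFS=: read -r user pass uid rest <<< "$line"
echo "$user"   # root
echo "$uid"    # 0

# IFS also controls how "$*" joins the positional parameters
set -- a b c
IFS=,
echo "$*"      # a,b,c
unset IFS      # back to the default space/tab/newline behavior
```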

Pathname Expansion

See Wildcards.

Quote Removal

After the preceding expansions, all unquoted occurrences of the characters \, ', and " that did not result from one of the above expansions are removed.

Environment Variables

Locality

Variable Description
LC_ALL Overrides all locale settings
LC_CTYPE Character classification and case conversion
LC_COLLATE String collation order
LC_MESSAGES Language for system messages
LC_TIME Date and time formatting
LC_NUMERIC Number formatting
LC_MONETARY Currency formatting
LC_PAPER Paper size and format
LC_NAME Name formatting
LC_ADDRESS Address formatting
LC_TELEPHONE Telephone number formatting
LC_MEASUREMENT Measurement units
LC_IDENTIFICATION Locale identification
LANG Default locale setting

The priority of locale settings:

LC_ALL > LC_* (specific category) > LANG

NOTE: There is another variable called LANGUAGE which is used to specify the language priority list for messages.

Commands

awk

Option Description
-F Specify the input field separator
-f Specify the file containing the awk script

Digression: awk comes from the initials of its three creators: Alfred Aho, Peter Weinberger, and Brian Kernighan.

Column Variables

  • $0: the whole line.
  • $1: the first column.
  • ...
  • $NF: the last column (where NF is the number of fields in the current record).

For example, awk '{print $1, $2, $NF}' file will print the first column, second column, and last column of each line in file.

Separators

You can specify the output field separator using the OFS variable. For example, awk '{print $1,$2,$NF}' OFS="," file will print the first column, second column, and last column of each line in file, separated by commas.

It is also possible to specify OFS in a BEGIN block, for example: awk 'BEGIN {OFS=","} {print $1,$2,$NF}' file does the same thing.

Besides, FS is used to specify the input field separator.

If you want to change the separators, you might write awk '{print}' FS=':' OFS=' ', expecting it to change : to a space, but this does not work as expected: awk only applies the new OFS when it rebuilds the record, which happens when multiple fields are printed or a field is modified. We can therefore use awk '($1=$1) || 1' FS=':' OFS=' ', which assigns the first field to itself to force a rebuild. The || 1 makes the pattern true even when the assignment result is falsy (e.g., an empty first field), so empty lines are still printed.

NOTE: The parentheses around $1=$1 are necessary, because without them, awk would interpret it as $1=($1 || 1), which would assign 1 to $1 instead of keeping its original value.
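Both behaviors are easy to see with a one-line input:

```shell
# FS alone changes how fields are split, but $0 is printed unchanged
printf 'a:b:c\n' | awk '{print}' FS=':' OFS='-'        # a:b:c
# Assigning $1=$1 forces awk to rebuild $0 with the new OFS
printf 'a:b:c\n' | awk '($1=$1) || 1' FS=':' OFS='-'   # a-b-c
```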

Predefined Variables

OFS, FS are predefined variables in awk, which are used to specify the output and input field separators respectively.

And there are some other predefined variables in awk:

  • NR: current record number starting from 1.
  • NF: number of fields in the current record.
  • RS: input record separator, default is newline.
  • ORS: output record separator, default is newline.
  • FILENAME: the name of the current input file.
  • FNR: current record number in the current file.

We can use awk 'NR > 1' to print all lines except the first line, and awk 'NF > 0' to print non-empty lines (i.e., remove empty lines).

We can use awk 'NR == 1, NR == 4' or awk 'NR >= 1 && NR <= 4' to print the first four lines.

Patterns

BEGIN and END:

  • BEGIN: executed before processing any input.
  • END: executed after processing all input.

For example, you can use awk 'BEGIN {print "Start"} {print} END {print "End"}' to print the contents of a file with Start and End messages.

In awk, you can use patterns to filter lines, for example, awk '$1 > 10' means to print only the lines where the first column is greater than 10 (when no action is specified, the default action is to print the whole line). awk '/pattern/' means to print only the lines that contain pattern, and the search pattern supports regular expressions. awk '/pattern1/,/pattern2/' means to print all lines from the first line that matches pattern1 to the first line that matches pattern2.

You can use ~ and !~ to search for patterns in a column. For example, awk '$1 ~ /pattern/' means to print only the lines that match the regular expression pattern in the first column. awk '$1 !~ /pattern/' means to print only the lines that do not match the regular expression pattern in the first column.
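A sketch of these pattern filters on some made-up input:

```shell
# Print lines where the second column is greater than 18
printf 'tom 20\nann 35\nbob 15\n' | awk '$2 > 18'
# tom 20
# ann 35

# Print the first column of lines whose first column starts with "a"
printf 'tom 20\nann 35\nbob 15\n' | awk '$1 ~ /^a/ {print $1}'   # ann
```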

awk Scripts

For some complex tasks, you can write awk scripts. Here is a simple awk script that counts the words in a file:

#! /usr/bin/awk -f

BEGIN {
    # Update the input and output field separators
    FS=":"
    OFS=" "
    tot_count = 0
}
{
    for (i = 1; i <= NF; i++) {
        words[$i]++
        tot_count++
    }
}
END {
    print "Total words:", tot_count
    for (word in words) {
        print word, words[word]
    }
}

NOTE: awk file is not the right way to run an awk script; we must use the -f option to specify the script file.

cat

Option Description
-n Show line numbers
-b Number non-empty lines only
-s Suppress repeated empty lines
-v Use ^ and M- to display non-printable characters, except for tab and newline
-E Display $ at the end of each line
-T Display ^I for tab characters
-A Equivalent to -vET
-e Equivalent to -vE
-t Equivalent to -vT

You can use cat to read from standard input then output to a file. For example, cat > file will read from standard input and write to file (use ^D to send EOF); cat >> file can be used to append content to file.

You can put - as a file name to read from standard input. For example, cat file1 - file2 will output the content of file1, then the content of standard input, and finally the content of file2. With cat file1 - file2 - file3, standard input is read twice: use ^D to end the first input, then type the second input and end it with ^D again.
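The - placeholder is easy to demonstrate non-interactively by piping into cat (the file names are made up):

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
printf 'first\n' > "$tmp/file1"
printf 'third\n' > "$tmp/file2"

# stdin (here coming from the pipe) is spliced in where the - appears
printf 'second\n' | cat "$tmp/file1" - "$tmp/file2"
# first
# second
# third
```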

There is another command zcat, which is similar to cat but used for compressed files.

NOTE: tac is a command that is similar to cat, but it outputs the contents of files in reverse order.

Digression: The name cat comes from concatenate.

grep

Option Description
-A 2 Show matching lines and the next two lines
-B 2 Show matching lines and the previous two lines
-C 2 Show matching lines and two lines before and after
-r Recursively search directories
-n Show line numbers
-i Ignore case
-v Invert match
--include "*.py" Search only in files matching the pattern
--exclude "test*" Skip files matching the pattern
--exclude-dir "test*" Skip directories matching the pattern
-c Show the count of matches in each file instead of the matching lines
-o Only show the matching part
-l Show the names of files with matches instead of the matching lines
-L Show the names of files without matches instead of the matching lines
-e Specify a pattern explicitly
-w Only match whole words
-x Only match whole lines
-F Interpret the pattern as a fixed string, not a regex
-H Show file names in output (default when multiple files are searched)
-h Do not show file names in output (default when a single file is searched)
-m Stop after N matches for each file
--color={auto,always,never} Control highlighting of matches
-f Read patterns from a file, one pattern per line
-E Interpret the pattern as an extended regular expression (ERE)

The difference between extended regular expressions (ERE) and basic regular expressions (BRE) is that in ERE, ?, +, {, |, (, and ) are special characters, while in BRE, they are not special characters unless escaped with a backslash (\).

NOTE: -e is useful when you want to specify multiple patterns or the pattern starts with a hyphen (-).

NOTE: We usually use egrep as an alias for grep -E, and fgrep as an alias for grep -F.

NOTE: Word-constituent characters are letters, digits, and underscores.

NOTE: In shell command, you usually need to quote the pattern to prevent pathname expansion.

Digression: grep comes from the ed command g/re/p, where g stands for global, re stands for regular expression, and p stands for print.

sed

Option Description
-i[SUFFIX] Edit files in place, optionally with a backup suffix
-e Add a script to the commands to be executed
-f Add a script file to the commands to be executed
-E Use extended regular expressions
-n Suppress automatic printing of pattern space

NOTE: With -n, sed will only print lines explicitly requested with the p, P, or w commands.

Pattern Space and Hold Space

sed uses two spaces: the pattern space and the hold space.

The pattern space is where the current line is processed. sed will read one line at a time from the input into the pattern space.

The hold space is a temporary storage area that can be used to store data between commands. It is empty until you explicitly use some commands to store data in it. And content in the hold space persists across multiple lines.
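A classic use of the hold space is reversing a file's lines (what tac does); this is a well-known sed idiom:

```shell
#   1!G  on every line except the first, append the hold space to the pattern space
#   h    copy the (accumulated) pattern space into the hold space
#   $p   on the last line, print the pattern space
printf 'a\nb\nc\n' | sed -n '1!G;h;$p'
# c
# b
# a
```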

Commands

Command Description
s Substitute a pattern in the pattern space
p Print the pattern space
P Print the first part of the pattern space (up to the first newline)
d Delete the pattern space
D Delete the first part of the pattern space (up to the first newline)
q Quit sed immediately
h Copy the pattern space to the hold space
H Append the pattern space to the hold space
g Copy the hold space to the pattern space
G Append the hold space to the pattern space
n Read the next line into the pattern space
N Append the next line to the pattern space
x Exchange the pattern space and the hold space
= Print current line number
w Write the pattern space to a file
W Write the first part of the pattern space (up to the first newline) to a file
a\ Append text after the current line (use \ to continue on the next line)
c\ Change the current line to the specified text (use \ to continue on the next line)
i\ Insert text before the current line (use \ to continue on the next line)

NOTE: The one-line forms of a, c, and i (with the text on the same line, without \) are GNU sed extensions, and they only work for a single line of text.

Examples for s command

# Replace the first occurrence of "old" with "new" in each line
sed 's/old/new/' file.txt
# Replace all occurrences of "old" with "new" in each line
sed 's/old/new/g' file.txt
# Replace all occurrences of "old" with "new" in each line, ignoring case
sed 's/old/new/gi' file.txt
# Replace the second occurrence of "old" with "new" in each line
sed 's/old/new/2' file.txt
# Replace "old" with "new" in each line from the second occurrence to the end of the line
sed 's/old/new/2g' file.txt
# Replace the first occurrence of "old" with "new" on lines 1, 11, 21, ... (GNU step address)
sed '1~10s/old/new/' file.txt
# Use | as the delimiter instead of /
# Almost any character can serve as the delimiter, e.g., |, #, @, !, or +
sed 's|/var/log|/var/logs|g' file.txt
# Replace the first occurrence of "old" with "new" in the 5th line
sed '5s/old/new/' file.txt
# Replace the first occurrence of "old" with "new" in lines 5 to 10
sed '5,10s/old/new/' file.txt
# Replace all occurrences of "old" with "new" in lines matching "pattern"
sed '/pattern/s/old/new/g' file.txt
# Replace all occurrences of "old" with "new" in lines matching "pattern" and all following lines
sed '/pattern/,$s/old/new/g' file.txt
# Replace all occurrences of "old" with "new" in lines between "start_pattern" and "end_pattern"
sed '/start_pattern/,/end_pattern/s/old/new/g' file.txt

find

Option Description
-type Specify the type of file to search for
-readable Search for files or directories that are readable by the current user
-writable Search for files or directories that are writable by the current user
-executable Search for files or directories that are executable by the current user
-name Specify the file name to search for. Can use wildcards
-path Specify the path to search for. Can use wildcards
-iname Similar to -name, but ignores case
-ipath Similar to -path, but ignores case
-empty Search for empty files or directories
-size Specify the size of the file or directory to search for
-exec Execute a command on the found files or directories
-perm Specify the permissions to search for
-user Specify the owner of the file or directory
-group Specify the group of the file or directory
-maxdepth Specify the maximum depth to search
-mindepth Specify the minimum depth to search
-depth Process each directory’s contents before the directory itself
-delete Delete the found files or directories
-and Logic and
-or Logic or
-not Logic not
-regex Use a regular expression
-iregex Use a case-insensitive regular expression
-print0 Print the found files or directories, separated by a null character
-samefile Search for files that are hard links to the specified file
-links Search for files with a specific number of hard links
-P Never follow symbolic links (default)
-L Follow symbolic links
-H Follow symlinks only for the starting directories explicitly passed as arguments
-mtime Specify the modification time of the file or directory, the time is in days
-atime Specify the access time of the file or directory, the time is in days
-ctime Specify the status change time (inode change, not creation) of the file or directory, the time is in days
-mmin Specify the modification time of the file or directory, the time is in minutes
-amin Specify the access time of the file or directory, the time is in minutes
-cmin Specify the status change time (inode change, not creation) of the file or directory, the time is in minutes

The options for -type:

  • b: block device file
  • c: character device file
  • p: pipe file
  • s: socket file
  • f: regular file
  • d: directory
  • l: soft link file

NOTE: When you use -regex, the pattern is matched against the entire file name, which is a little bit different from grep. If you want to match a specific part of the file name, you need to use .* to match any characters before and after the pattern. For example, find . -regex ".*pattern.*" will find files that contain pattern in their names.

The units for -size:

  • b: 512-byte blocks (the default unit when no suffix is given)
  • c: bytes
  • k: kilobytes (1024 bytes)
  • M: megabytes (1024 kilobytes)
  • G: gigabytes (1024 megabytes)

You can use find . -size 1M to find files that are exactly 1 megabyte in size; use find . -size +1M to find files larger than 1 megabyte; use find . -size -1M to find files smaller than 1 megabyte; use find . -size +1M -size -2M to find files larger than 1 megabyte but smaller than 2 megabytes.

You can use -exec to execute a command on the found files or directories.

For example, find . -name "*.txt" -exec ls -l {} \; will find all txt files in the current directory and execute ls -l on each of them. {} will be replaced by the found file name, \; means the end of the command, and the semicolon needs to be escaped to prevent it from being interpreted by the shell.
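Beyond the \; form, two common variations are worth knowing (the directory and file names below are illustrative): -exec … + batches many matches into a single command invocation, and -print0 paired with xargs -0 handles names containing spaces or newlines safely.

```shell
# Scratch directory with a few files
dir=$(mktemp -d) && cd "$dir"
touch a.txt b.txt c.log

# One ls invocation per file ({} is the file name, \; ends the command)
find . -name '*.txt' -exec ls {} \;

# Batch all matches into a single ls invocation (fewer process forks)
find . -name '*.txt' -exec ls {} +

# Null-separated output is safe for names with spaces or newlines
find . -name '*.txt' -print0 | xargs -0 ls
```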

ln

Option Description
-s Soft link
-f Force the creation of the link, removing existing files if necessary
-t Specify the target directory for the link

NOTE: When using ln target link_name, if link_name is an existing directory, the link will be created inside that directory, named after the base name of target.

NOTE: Both rm and unlink can delete symbolic links and hard links, but unlink can only delete one file at a time.

You can use ln to create hard links and soft links (symbolic links) in Linux. Hard links are files that point to the same inode as the original file, while soft links are files that point to the original file by its path. Besides, hard links and soft links have the following differences:

  • Soft links can point to files in different file systems, while hard links can only point to files in the same file system.
  • Soft links can point to directories, while hard links cannot point to directories.
  • When the source file is deleted, hard links will still be valid, while soft links will become invalid.

In Linux, you can use the ls -l command to view the number of hard links to a file.
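The differences above can be observed directly in a scratch directory (file names are illustrative):

```shell
dir=$(mktemp -d) && cd "$dir"
echo hello > original.txt
ln original.txt hard.txt       # hard link: shares the inode
ln -s original.txt soft.txt    # soft link: stores the path
ls -l original.txt             # the hard-link count is now 2

rm original.txt
cat hard.txt                   # still prints "hello": the inode survives
cat soft.txt                   # fails: the symlink now dangles
```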

sort

Option Description
-r Sort in reverse order
-n Sort numerically
-k Sort by a specific key (column)
-u Unique sort
-t Specify the field separator
-c Check if the input is already sorted
-f Ignore case when sorting
-h Sort by human-readable numbers (e.g., 1K, 2M)
-M Sort by month names (e.g., Jan, Feb)
--files0-from=- Read from standard input with NUL as the file name separator
--files0-from=filename Read from a file with NUL as the file name separator

NOTE: You can use -k to specify multiple keys for sorting. For example, -k2,2 -k1,1 means sort by the second column first, then by the first column. And you can specify parts of a column for sorting, such as -k2.2,2.3 to sort by the second column from the second character to the third character. Besides, you can specify the type of the column for sorting, such as -k2n,2 to sort the second column numerically and -k2r,2 to sort the second column in reverse order.
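A small example of multi-key sorting on colon-separated records (the data is made up): sort numerically on the second field, breaking ties with the first field.

```shell
printf 'bob:30\nann:25\ncid:25\n' | sort -t: -k2,2n -k1,1
# prints:
#   ann:25
#   cid:25
#   bob:30
```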

tar

Option Description
-c Create a new tar archive
-f Specify the name of the tar archive file
-v Verbose mode
-z Use gzip compression or decompression
-j Use bzip2 compression or decompression
-J Use xz compression or decompression
-x Extract files from a tar archive
-C Change to a directory before performing operations
-t List the contents of a tar archive
--wildcards Enable wildcard matching for file names
--delete Delete files from a tar archive (only works with uncompressed archives)
--exclude Exclude files or directories from the tar archive
-r Append files to a tar archive
-A Append another tar archive to the current one
-u Update files in a tar archive, only adding newer files

NOTE: When using --exclude, you must use = to attach the value, and --exclude must appear before the files or directories to be packed. For example, tar -cf a.tar --exclude=*.txt . is correct, but tar -cf a.tar . --exclude=*.txt is not.

NOTE: When using -u, tar does not overwrite old files but appends the newer ones, after which the archive may contain multiple entries with the same name. Extracting yields the newest version; I have not found a way to extract the old ones.
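A round trip with the options above (paths are illustrative): create a gzip archive while excluding some files, list it, then extract into another directory with -C.

```shell
dir=$(mktemp -d) && cd "$dir"
mkdir src && echo data > src/a.txt && echo note > src/b.md

# Create a gzip-compressed archive, excluding .md files
tar -czf src.tar.gz --exclude='*.md' src

# List the contents without extracting
tar -tzf src.tar.gz

# Extract into another directory
mkdir out && tar -xzf src.tar.gz -C out
cat out/src/a.txt    # prints "data"
```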

Makefile

Variables

You can define variables in Makefile and use the variables through $(variable):

objects = obj1.o obj2.o obj3.o
all: $(objects)
	$(CC) $(objects) -o main

If you want to express the literal $, you can use $$.

If you want to define a variable whose value is a single space, you can use:

nullstring :=
space := $(nullstring) # end of line

The nullstring is an empty string, and space is a single space.

NOTE: The # and the single space before it are necessary: the comment ends the value, so exactly the one space between $(nullstring) and # is assigned. Without it, invisible trailing whitespace could end up in the variable.

You can nest $ to get value:

x = y
y = z
z = u
# $(x) is y
# $($(x)) is $(y), and $(y) is z
# $($($(x))) is $(z), and $(z) is u
a := $($($(x)))

Target Variables

You can set variables only valid for the specified target:

prog: CFLAGS = -g
prog: prog.o foo.o bar.o
	$(CC) $(CFLAGS) prog.o foo.o bar.o
prog.o: prog.c
	$(CC) $(CFLAGS) prog.c
foo.o: foo.c
	$(CC) $(CFLAGS) foo.c
bar.o: bar.c
	$(CC) $(CFLAGS) bar.c

In the example above, only the prog’s CFLAGS is -g.

= := += and ?=

There are several assignment operators in make, and they differ from each other:

  • =: lazy (recursive) assignment. This works like a reference in C/C++: with a = $(b), if b’s value changes later, a changes too.
  • :=: immediate (simple) assignment, similar to = in C/C++.
  • +=: append new values to a variable.
  • ?=: assign only if the variable has not been assigned before; otherwise do nothing. When it does assign, it behaves like =.

For the +=:

  • If the variable has not been defined before, += behaves like =.
  • If the variable has been defined, += follows the previous assignment operator: after = it behaves like =, and after := it behaves like :=.
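The rules above can be sketched as follows (variable names are illustrative):

```make
a = one
a += two    # a was defined with =, so += stays recursive: a expands to "one two"

b := one
b += two    # b was defined with :=, so += expands the new text immediately

c += two    # c was never defined, so this acts like c = two
```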

override

When a variable is defined on the make command line, for example make a=12 defines a variable a whose value is 12, it replaces the variable defined and assigned in your Makefile. If you don’t want the variable to be replaced, you can use override:

# you can use other equal signs
override a := 0

Multi Line Variables

You can define multi-line variables with define; the name after define is the variable’s name, and endef ends the definition. Since commands in a recipe must start with a tab, lines that do not start with a tab are treated as part of the multi-line variable’s value. There is an example below:

define two-lines
echo foo
echo $(bar)
endef

NOTE: $(bar) will be replaced by the value of bar.

$@ $< $* $% $? $^ and $+

  • $@: A variable in make, whose value is the target. For example, if $@ appears in commands following main.o: main.cpp, $@ will be main.o exactly.
  • $<: A variable in make, whose value is the first dependency. For example, if $< appears in commands following main.o: main.cpp, $< will be main.cpp exactly.
  • $*: A variable in make, whose value is the stem of the target. For example, using $* following pre_%.o: pre_%.c, and the target is pre_foo.o, $* will be foo.
  • $%: When the target is in an archive (like foo.a), this variable is the names of members. For example, if a target is foo.a(bar.o) the $% will be bar.o and the $@ will be foo.a.
  • $?: A variable in make, whose value is all the dependencies that are newer than the target.
  • $^: A variable in make, whose value is all the dependencies of the target. This will have only one copy if the target depends on the same file more than once.
  • $+: A variable similar with $^ but will store the repetitive files.
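A sketch that prints these automatic variables side by side (the file names are illustrative, and main.o is deliberately listed twice to show $^ versus $+; it assumes main.o and util.o can be built, e.g. from main.c and util.c):

```make
app: main.o util.o main.o
	@echo 'target $$@ =' $@     # app
	@echo 'first  $$< =' $<     # main.o
	@echo 'newer  $$? =' $?     # only the prerequisites newer than app
	@echo 'unique $$^ =' $^     # main.o util.o (duplicates removed)
	@echo 'all    $$+ =' $+     # main.o util.o main.o (duplicates kept)
```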

Auto Deduction

You can use Makefile with auto deduction. Auto deduction looks like this:

%.o: %.c

The rule above lets make deduce that main.o depends on main.c, and the command $(CC) $(CFLAGS) -c $< -o $@ will be added automatically too.

Implicit Rules

Implicit rules are similar to auto deduction; in fact, auto deduction relies on implicit rules.

There are different implicit rules for different files. I’ll give the implicit rules of C and C++ files:

  • C: *.o will be deduced from *.c, and the build command is $(CC) -c $(CPPFLAGS) $(CFLAGS).
  • C++: *.o will be deduced from *.C, *.cc or *.cpp, and the build command is $(CXX) -c $(CPPFLAGS) $(CXXFLAGS).

PHONY

.PHONY is to specify the target is a pseudo target. This is usually used for clean and all. .PHONY will let make not treat the target as a file:

.PHONY: clean

include

include works much like include in C/C++: make reads the listed files’ contents and inserts them where the include directive appears:

# this will read the contents of config.make and put it there.
include config.make

make can also use -I to specify where to find the files used by the include directive. For example, make -I./include lets include search the ./include directory.

VPATH and vpath

VPATH is a variable in make, which is used to specify the directories where make will look for files when they are not found in the current directory. The value of VPATH is a colon-separated list of directories.

# this will specify two directories to be used to find files
VPATH = src:../include

vpath is a directive of make, and it can also be used to tell make where to find files:

# % matches any nonempty string, similar to .* in regular expressions
# look for header files in ../include
vpath %.h ../include

# look for C files in ./src
vpath %.c ./src

Multi Targets

You can write more than one target in one line:

bigoutput littleoutput : text.g
	-generate text.g $(subst output,,$@) > $@

$(subst output,,$@) means: substitute output in $@ with the empty string. The leading hyphen means ignore errors: make checks the return value after each command and stops when it is non-zero, and the hyphen tells make not to check this command’s return value.

Put more straightforwardly, the rule above is the same as:

bigoutput : text.g
	-generate text.g big > bigoutput
littleoutput : text.g
	-generate text.g little > littleoutput

Static Pattern Rules

In make, you can use static pattern rules to simplify the Makefile:

objects = foo.o bar.o
all: $(objects)
$(objects): %.o: %.c
	$(CC) -c $(CFLAGS) $< -o $@

$(objects): %.o: %.c takes foo.o and bar.o from $(objects), matches each of them against the %.o pattern, and then derives the corresponding prerequisites foo.c and bar.c.

There is another example to use filter and static pattern rules:

files = foo.elc bar.o lose.o
# $(filter %.o,$(files)) will get all the .o files from $(files)
$(filter %.o,$(files)): %.o: %.c
	$(CC) -c $(CFLAGS) $< -o $@
$(filter %.elc,$(files)): %.elc: %.el
	emacs -f batch-byte-compile $<

Generate Dependencies Automatically

gcc -MM *.c prints the header dependencies of C files (unlike -M, it omits system headers). For example, the output of gcc -MM test.c will look like this:

test.o: test.c header1 header2 header3

We can use the command to automatically add dependencies for a target:

# Adding @ before a command keeps make from printing the command itself
# set -e makes the shell stop as soon as a command fails
%.d: %.c
	@set -e; \
	$(CC) -MM $< > $@.; \
	sed -e 's/\($*\)\.o[ : ]*/\1.o $@ :/g' < $@. > $@; \
	rm -f $@.

We use $(CC) -MM $< > $@. to write the dependency output into a temporary file (the target name with a trailing dot, e.g. test.d. for test.d). Then we use sed to substitute the %.o target with %.o %.d and write the result to the %.d file. Finally we remove the temporary %.d. file.

After this, we can include the %.d files:

sources = a.c b.c
# substitute .c with .d in sources
include $(sources:.c=.d)

After the include, we’ll have rules like a.o a.d: a.c header in our Makefile, and make will deduce the build commands automatically.

Nested Makefiles

If you use third-party libraries, you may want to build them with make as well:

subsystem:
	$(MAKE) -C subdir

You can pass the variables of the current Makefile to the sub-Makefile through export:

export variable = value
# pass all variables
export

define

In make, you can use define to define macros:

# note that there is no semicolon after each command
define name
	command1
	command2
	...
endef

If you want to use the macro, you just need to use $(macro_name) to call it.

Conditional Structures

This part is simple, so I just post some examples.

The example using ifeq:

libs_for_gcc = -lgnu
normal_libs =
foo: $(objects)
ifeq ($(CC),gcc)
	$(CC) -o foo $(objects) $(libs_for_gcc)
else
	$(CC) -o foo $(objects) $(normal_libs)
endif

You can use ifneq, ifdef and ifndef, too.

Functions

You can use functions in make through $(func_name args).

There are some functions related to strings:

  • $(subst from,to,text): substitute from with to in text.
  • $(patsubst pattern,replacement,text): substitute pattern with replacement in text.
  • $(strip text): remove all the leading and trailing spaces in text.
  • $(findstring target,text): find target in text; if found, return target, otherwise return the empty string.
  • $(filter pattern...,text): filter the contents matching the pattern... from text.
  • $(filter-out pattern...,text): filter out the contents matching the pattern... from text.
  • $(sort list): sort the contents in list lexicographically. Note that sort will unique the contents, too.
  • $(word i,text): get the i-th word from text (indexing starts at 1).
  • $(wordlist l,r,text): get the words whose index is in [l,r] in sequence.
  • $(firstword text): get the first word of text.
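A few of these in action (the values are illustrative; the expected results are shown in the comments):

```make
sources := foo.c bar.c baz.h

objects := $(patsubst %.c,%.o,$(sources))   # foo.o bar.o baz.h (non-matching words kept)
c_only  := $(filter %.c,$(sources))         # foo.c bar.c
letters := $(sort c b a b)                  # a b c (sorted and deduplicated)
second  := $(word 2,$(sources))             # bar.c
first2  := $(wordlist 1,2,$(sources))       # foo.c bar.c
```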

There are some functions related to files:

  • $(dir name...): get the directory part from name.
  • $(notdir name...): get the non-directory part from name.
  • $(suffix name...): get the suffixes of files from name.
  • $(basename name...): get the base name (files’ name without extension) of files from name.
  • $(addsuffix suffix,name): add suffix for name.
  • $(addprefix prefix,name): add prefix for name.
  • $(join list1,list2): join two lists. This will append words in list2 to list1. For example $(join aaa bbb,111 222 333) will get aaa111 bbb222 333.

If I want to add a suffix to every item in a variable, I can do it with foreach:

names := a b c d
files := $(foreach item,$(names),$(item).o)

  • $(if condition,then-part,else-part): if the condition is a non-empty string, it will return the then-part, otherwise it will return the else-part.

If you want a function that swaps its two parameters, you can define it like this:

reverse = $(2) $(1)
reversedItemList = $(call reverse,item1,item2)

After that, reversedItemList will be item2 item1. Of course, you can add different suffixes for items:

addTargetAndDependency = $(1).o : $(1).c
result = $(call addTargetAndDependency,main)

result will be main.o : main.c.


$(origin variablename) will tell you where the variablename comes from. The return values are explained below:

  • undefined: never defined before.
  • environment: the variable comes from environment variables.
  • default: a default variable of make, such as CC.
  • file: the variable is defined in a Makefile.
  • command line: the variable is defined by command lines (when you type make a=1, a is defined by command lines).
  • override: the variable is defined by override.
  • automatic: the variable is defined by make automatically.

This is to execute a command in shell:

contents := $(shell cat foo)
files := $(shell echo *.c)

The commands above will get the output of a shell command.


  • $(error text): output text and stop make.
  • $(warning text): output text but don’t stop make.

Return Value of Make

  • 0: success.
  • 1: when you use -q (-q does not run the commands, but reports through the exit status whether the targets are up to date) and some targets are not up to date.
  • 2: make encountered errors.

Specify Target

If you run make, make will build the first target. But you can specify target by make targetname.

There are some rules you should obey for naming a target when you write Makefile:

  • all: build all targets.
  • clean: remove all the files created by make.
  • install: install the targets; for C/C++ this typically moves the binaries to /usr/bin and the header files to /usr/include.
  • print: print the files having been updated.
  • tar: pack the source files into a tar file.
  • dist: create a compressed file including all source files.

Check Make Syntax

When you want to check the syntax of your Makefile rather than actually run it, you can use these options: -n, --just-print, --dry-run, --recon. The four options are synonyms.

Other Options in make

Option Description
-j Use the specified number of jobs (cores) to build
-q Question mode: do not run commands, just use the exit status to report whether the targets are up to date
-W Act as if the specified file had just been modified (“what if”)
-o Treat the specified file as very old and do not remake it
-B Always re-build all targets, even if they are up to date
-C Change working directory to the specified directory
-t Touch all the targets
--debug Print debug info
-e Environment overrides
-f Use the specified file as Makefile
-i Ignore errors
-I Include directory
-k Keep going even if there are errors
-S Stop when an error occurs (cancels a previous -k)
-p Print the data base
-r Do not use built-in rules
-R Do not use built-in variables
-s Silent mode, do not print commands
-w Print the working directory before and after processing
--no-print-directory Do not print the working directory
--warn-undefined-variables Warn when a variable is undefined

--debug=options will print the debug info of make, the available options are:

  • a: print all info.
  • b: print basic info.
  • v: verbose.
  • i: print implicit rules.
  • j: print jobs’ info including PID, return value and so on.
  • m: this is for debugging when remaking makefiles.
  • If you just use -d, it is the same as --debug=a.

Now that you have learned a lot about make, why not read the Makefile of the Linux kernel?

git

.gitignore

You can use .gitignore to specify files or directories that do not need to be tracked by git.

Most wildcards in .gitignore are similar to those in bash. Check Wildcards in Linux for more information about wildcards.

By default, the items in .gitignore will be ignored recursively. If you don’t want to ignore recursively, you can add / before the item to indicate that it only takes effect in the current directory. For example, /foo means to ignore only the foo file or directory in the current directory, while foo means to ignore all foo files or directories.

By default, the items in .gitignore match both directories and files. If you only want to match directories, you can add / at the end of the item to indicate that it only matches directories. For example, foo/ means to match only the directory foo, while foo means to match all foo files or directories.
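A small illustrative .gitignore combining these rules (the file names are made up; git also supports negating an earlier pattern with !):

```gitignore
# ignore any file or directory named "build", at any depth
build

# ignore "notes.txt" only in the directory containing this .gitignore
/notes.txt

# ignore only directories named "cache", not files
cache/

# ignore all .log files, but re-include important.log
*.log
!important.log
```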

You can configure the ignore rules in ~/.config/git/ignore to ignore files globally.

There can be a .gitignore file in any directory in a repository, and this file will take effect on files and directories in the current directory. The priority of .gitignore files is from subdirectory to parent directory, and the global ignore file has the lowest priority.
