Instructions

The traditional way of using a Unix or Linux system is the command line. That is what you will be presented with when you log in to a computer via SSH (Secure Shell), for example. For skilled users, using the command line is generally very efficient compared to most GUIs. A basic level of proficiency with the command line can be considered a necessary part of general IT competency. This guide is meant to be a basic shell tutorial and explains commands and key concepts. It does not cover advanced topics such as shell scripting or administration.

Shell

The program that handles user input and interacts with programs and applications is called the shell. It is a command line interpreter that executes programs, expands wildcards, and provides variables, conditional and loop statements, and process management. There are many different shells that, even though they may differ in syntax or appearance, provide the same core functionality.

There are two families of shells, Bourne-type shells and C-shell derivatives. They differ in syntax and some features, but the functionality provided is very similar. This guide assumes for the most part a Bourne-type shell, but the C-shell syntax is also presented where necessary.

At Aalto, the default shell for new users is bash (Bourne-Again SHell). Before 2018 the default shell was zsh (Z-shell), and older users will still have that if they have not changed their shell. The two are mostly compatible. An even older, historical default shell was tcsh (TENEX C-shell). Users can change their shell using the chsh (CHange SHell) command. Of these shells, bash and zsh are Bourne shell variants, and tcsh is a C-shell variant.

Prompt and commands

bash:

tteekkari@kosh:~$

zsh:

tteekkari@kosh ~ %

tcsh:

kosh:~>

The shell presents the user with a command prompt and then waits for user input. The prompt is customisable in most shells, but the default prompts in the Aalto environment are the three shown above.

The prompt relays information: by default, it shows the username, the computer hostname, the current directory and the end sign. In the Bourne prompts above, the username is "tteekkari", the hostname is "kosh", the current directory is "~" and the end sign is "$" or "%". The C-shell prompt does not display the username and uses ">" as the end sign. The end sign can also in some circumstances be "#", which is usually reserved for root shells.

Commands are typed after the prompt and executed using the Enter key.

A command typically consists of the program to be executed and its options and arguments.

The program part is the name of the program file to be executed. The options are special arguments that alter what the program does. The options are usually prepended with a dash ("-") or, in their longer form, with two dashes ("--"). The rest of the arguments are typically some kind of target, instruction or filename for the program to work on.

Consider this command:

tteekkari@kosh:~$ ls -l foo.txt

Here, after the prompt, ls is the program part, -l is an option to toggle long-style output, and foo.txt is a filename argument.

Keyboard shortcuts and command-line editing

The commands typed after the prompt form the command line.

Most shells provide keybindings, shortcuts and command line editing features. The keybindings are usually customisable by the user, so they are not set in stone, but the ones presented here work in the default Aalto environment.

The command line can, in most shells, be edited much like one would do in a text editor. The cursor can be moved using the left and right arrow keys and characters can be added by typing them in, or deleted using the Delete or Backspace keys. The Insert key toggles whether characters are overwritten or inserted at the cursor. There are also key combinations for different functions. Some of these key combinations are presented in the table below:

Key combination   Action
Ctrl-C            Terminate program
Ctrl-Z            Stop program
Ctrl-L            Clear screen
Ctrl-S            Freeze screen
Ctrl-Q            Unfreeze screen
Ctrl-A            Go to the start of the line (Home)
Ctrl-E            Go to the end of the line (End)
Ctrl-H            Backspace
Ctrl-M            Enter
Ctrl-P            Previous command in history
Ctrl-N            Next command in history
Ctrl-W            Cut previous word
Ctrl-Y            Paste, yank
Ctrl-D            End-of-file, logout

The key combinations are entered by holding the modifier key (in the cases above, Ctrl) and then pressing the other key.

For information on how to modify the keybindings, please see your shell's man page.

Command history

Shells usually store the command history. The history is also usually written into a file in the user's home directory. The filename depends on the shell (and its configuration), but the default filenames are ~/.bash_history for bash, ~/.zsh_history for zsh and ~/.history for tcsh.

The command history can be reviewed with the command history. The previous commands can also be recalled by using the up and down arrow keys or Ctrl-P and Ctrl-N. In bash and zsh, the history can also be searched by using Ctrl-R. This presents the user with a separate prompt that searches the command history.

The command history can also be recalled using the exclamation mark ("!"). In history substitution, the commands can be referenced either by their numbers, relative numbers ("the command I did X commands ago") or the beginning of their command line.

Consider the following command history:

tteekkari@kosh:~$ history
 1000  history
 1001  echo foo
 1002  echo bar
 1003  history
tteekkari@kosh:~$

To recall a command by its relative number ('The command 3 commands ago'):

tteekkari@kosh:~$ !-3
echo foo
foo
tteekkari@kosh:~$

To recall a command by its command line:

tteekkari@kosh:~$ !ec
echo foo
foo
tteekkari@kosh:~$

To recall a command by its absolute number:

tteekkari@kosh:~$ !1002
echo bar
bar
tteekkari@kosh:~$

More complex substitutions are also possible:

tteekkari@kosh:~$ echo !1002 !1001
echo echo bar echo foo
echo bar echo foo
tteekkari@kosh:~$ 

The command history can be cleared with the command history -c.

Commands: builtins, functions and executables

A shell can run three types of things: shell builtins, functions and regular executables. Most commands are implemented as separate executables (as per the Unix philosophy), but the shell usually has some of the functionality built in. This includes program flow statements used for scripting, the cd (change directory) command and the kill command.

Shells also support functions that are essentially shell scripts (or grouped commands) that can be executed using the function name. Functions can be made by the user, but writing them is not in the scope of this guide.

Anything not implemented as a builtin or a function is a separate executable, which is located somewhere in the file system.

When a command is executed, the shell first checks whether it is a builtin or a function. If it is either of these, that is what runs. If not, the shell searches the paths given in the environment variable PATH in order, and runs the first appropriately named executable it finds. If no executable is found this way, the shell prints an error message.

The type of command (whether it is a builtin or not) can be checked by using type (bash and zsh) or where (tcsh).

To check whether a command is a builtin or not (bash, zsh):

tteekkari@kosh:~$ type -a kill
kill is a shell builtin
kill is /bin/kill
tteekkari@kosh:~$

And the same in tcsh:

kosh:~> where kill
kill is a shell built-in
/bin/kill
kosh:~> 

Here we can see that kill is implemented both as a builtin and as a separate executable. If we try to run these, we see that they are different:

tteekkari@kosh:~$ kill
kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
tteekkari@kosh:~$ /bin/kill

Usage:
 kill [options] <pid> [...]

Options:
 <pid> [...]            send signal to every <pid> listed
 -<signal>, -s, --signal <signal>
                        specify the <signal> to be sent
 -l, --list=[<signal>]  list all signal names, or convert one to a name
 -L, --table            list all signal names in a nice table

 -h, --help     display this help and exit
 -V, --version  output version information and exit

For more details see kill(1).
tteekkari@kosh:~$ 

Another command that shows which executable would be run is which. In bash it is a separate executable, so it does not see shell builtins.

Using which:

tteekkari@kosh:~$ which kill
/bin/kill
tteekkari@kosh:~$
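The lookup order described in this section can also be checked from a script. The following is a minimal sketch that uses the POSIX builtin command -v, which this guide does not otherwise cover. The function name greet and the name no_such_command_xyz are made up for the illustration.

```shell
# A hypothetical shell function, defined only for this illustration.
greet() { echo "hello from a function"; }

# command -v prints the bare name for functions and builtins,
# and a full path for executables found via PATH.
for name in cd greet sh no_such_command_xyz; do
    if resolved=$(command -v "$name"); then
        echo "$name resolves to: $resolved"
    else
        echo "$name: not found as a builtin, a function or in PATH"
    fi
done

# Keep a few results in variables for later inspection.
greet_lookup=$(command -v greet)
sh_lookup=$(command -v sh)
if command -v no_such_command_xyz >/dev/null 2>&1; then
    missing_found=yes
else
    missing_found=no
fi
```

Unlike which, command -v is part of the shell itself, so it also knows about builtins and functions.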

Pipes, streams and redirection

When the shell runs programs, every process has by default three standard streams it can use to communicate with the user or with other programs. These are called standard input, standard output and standard error and are also referred to by their shorter name and/or their file descriptor numbers stdin (0), stdout (1) and stderr (2). By default, standard output and standard error are printed on the terminal and stdin is read from the terminal. The shell can be used to redirect these streams so that, for example, the standard output of one program is used as the standard input of another program. This is called piping. The streams can also be directed to and/or read from files. This is called redirection.

Redirection into or from a file

A program's output can be redirected into a file using a greater-than sign (>). The file given after the sign is overwritten and the output of the program will be redirected into it.

Redirecting output into a file:

tteekkari@kosh:~$ echo "foo" > foo.txt
tteekkari@kosh:~$ cat foo.txt
foo
tteekkari@kosh:~$ echo "bar" > foo.txt
tteekkari@kosh:~$ cat foo.txt
bar
tteekkari@kosh:~$

The cat (conCATenate) command used above prints the contents of the files given as arguments, one after another. When it is used with only one argument, it just prints the contents of the single file.

By using two greater-than signs (>>), it is possible to append to a file instead of overwriting it.

Appending into a file:

tteekkari@kosh:~$ echo "foo" > foo.txt
tteekkari@kosh:~$ cat foo.txt
foo
tteekkari@kosh:~$ echo "bar" >> foo.txt
tteekkari@kosh:~$ cat foo.txt
foo
bar
tteekkari@kosh:~$

When used on its own, this kind of redirection only redirects the standard output stream. It is also possible to redirect the standard error stream. To do that, the stream number (2) must be placed directly before the greater-than sign. It can often be beneficial to suppress error messages. To accomplish that, the standard error stream can be redirected to /dev/null, which is a special device file that discards anything written into it.

Separating the standard output and the standard error:

tteekkari@kosh:~$ curl http://www.aalto.fi > foo2.txt 2> foo3.txt
tteekkari@kosh:~$ cat foo2.txt
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://www.aalto.fi/fi/">here</a>.</p>
<hr>
<address>Apache/2.4.7 (Ubuntu) Server at www.aalto.fi Port 80</address>
</body></html>
tteekkari@kosh:~$ cat foo3.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   308  100   308    0     0   1036      0 --:--:-- --:--:-- --:--:--  1037
tteekkari@kosh:~$

The curl command used above downloads a URL (Uniform Resource Locator, a 'web address'). It prints the contents of the URL on the standard output and a status display on the standard error.
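Suppressing error messages with /dev/null can be sketched without network access as follows. The function report is a hypothetical stand-in for any program that writes to both streams.

```shell
# A hypothetical stand-in for a program that writes to both streams.
report() {
    echo "result line"          # written to standard output
    echo "warning line" >&2     # written to standard error
}

# Discard standard error; only standard output is left to capture.
only_stdout=$(report 2>/dev/null)
echo "$only_stdout"
```

The same 2>/dev/null could be appended to the curl command to hide its status display.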

It is also possible to combine the standard output and standard error streams. The syntax for combining the streams differs a bit between the shell flavours.

Combining stdout and stderr, bash and zsh:

tteekkari@kosh:~$ curl http://www.aalto.fi >foo2.txt 2>&1
tteekkari@kosh:~$ cat foo2.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   308  100   308    0     0    955      0 --:--:-- --:--:-- --:--:--   953
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://www.aalto.fi/fi/">here</a>.</p>
<hr>
<address>Apache/2.4.7 (Ubuntu) Server at www.aalto.fi Port 80</address>
</body></html>

and tcsh:

kosh:~> curl http://www.aalto.fi >&foo2.txt
kosh:~> cat foo2.txt 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   308  100   308    0     0   1789      0 --:--:-- --:--:-- --:--:--  1790
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://www.aalto.fi/fi/">here</a>.</p>
<hr>
<address>Apache/2.4.7 (Ubuntu) Server at www.aalto.fi Port 80</address>
</body></html>
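In the Bourne-type syntax, the order of the redirections matters, because they are processed from left to right. The sketch below illustrates this with a hypothetical function report and temporary files; it assumes the mktemp command is available.

```shell
# A hypothetical command that writes one line to each stream.
report() { echo "to stdout"; echo "to stderr" >&2; }

tmpdir=$(mktemp -d)

# stdout is sent to the file first, then stderr is pointed at the same place.
report > "$tmpdir/both.txt" 2>&1
both_lines=$(grep -c '' "$tmpdir/both.txt")   # number of lines in the file

# Reversed order: stderr is pointed at the current stdout (here the
# command substitution), and only afterwards is stdout sent to the file.
leaked=$(report 2>&1 > "$tmpdir/stdout_only.txt")
file_content=$(cat "$tmpdir/stdout_only.txt")

echo "lines in both.txt: $both_lines"
echo "captured outside the file: $leaked"
echo "stdout_only.txt contains: $file_content"

rm -r "$tmpdir"
```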

Standard input for a program can also be read from a file. This can be done by using the less-than sign (<) followed by a filename. The input for the program is then read from the file given.

tteekkari@kosh:~$ cat foo.txt
foo
tteekkari@kosh:~$ tr a-z A-Z <foo.txt
FOO
tteekkari@kosh:~$

In this example the tr (TRanslate) command translates the character set given in the first argument into the character set given in the second argument, which in this case means it changes small letters into capital letters.

Here documents

A here document is a special kind of redirection in that the text to be redirected is literally typed 'here'. Here documents can be used for example in scripts to include multiple lines of text in the script without having to put it in a separate file. The syntax for a here document is two less-than signs followed by a delimiter. Any text after this is treated as standard input for the command until a line containing only the delimiter is found.

An example of a here document:

tteekkari@kosh:~$ tr a-z A-Z << EOF
> don't
> panic
> EOF
DON'T
PANIC
tteekkari@kosh:~$
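One detail worth knowing when using here documents in scripts is that variables are expanded inside them unless the delimiter is quoted. A small sketch, with a made-up variable called name:

```shell
name="world"

# Unquoted delimiter: $name is expanded inside the here document.
expanded=$(cat << EOF
Hello, $name
EOF
)

# Quoted delimiter: the text is passed through literally.
literal=$(cat << 'EOF'
Hello, $name
EOF
)

echo "$expanded"
echo "$literal"
```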

Piping commands

As mentioned earlier, the shell can be used to redirect the output of one program to be used as input of another program. This can be done with the pipe character (|). As per the Unix philosophy, programs usually read and print plain text, which means that using programs to modify other programs' output is a pretty common thing to do. Perhaps the most useful tool that depends on this functionality is grep, which searches its input for a pattern given as an argument and prints the matching lines.

Piping output:

tteekkari@kosh:~$ history
 1000  history
 1001  echo foo
 1002  echo bar
 1003  history
tteekkari@kosh:~$ history | grep foo
 1001  echo foo
 1004  history | grep foo
tteekkari@kosh:~$

Here the output of history is piped to the grep command which prints the lines matching the pattern foo.

The tr example in the redirection section could also be made using a pipe:

tteekkari@kosh:~$ cat foo.txt
foo
tteekkari@kosh:~$ cat foo.txt | tr a-z A-Z
FOO
tteekkari@kosh:~$
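Pipes can also be chained, so that each program works on the output of the previous one. A minimal sketch combining the grep and tr commands used in this section:

```shell
# printf produces three lines; grep keeps those containing "foo";
# tr then converts the surviving lines to upper case.
result=$(printf 'foo\nbar\nfoo bar\n' | grep foo | tr a-z A-Z)
echo "$result"
```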

Environment variables

Every process has an environment which consists of environment variables. The environment of a process is inherited by its subprocesses. The shell can set, modify and unset environment variables, and they can be temporary or permanent. The variables usually affect the shell and/or program(s) and they are one way of configuring the shell or an application.

In bash and zsh, variables can be set using the syntax VARIABLE=value and in tcsh by using setenv VARIABLE value. In bash and zsh, a variable also has to be exported, which is done using the export command. If the variable is not exported, it remains an internal variable of the shell and is not inherited by subprocesses (programs executed by the shell).

Setting an environment variable, bash and zsh:

tteekkari@kosh:~$ VISUAL=nano
tteekkari@kosh:~$ export VISUAL
tteekkari@kosh:~$

or directly

tteekkari@kosh:~$ export VISUAL=nano
tteekkari@kosh:~$

Setting an environment variable, tcsh:

kosh:~> setenv VISUAL nano

The environment variables can then be referenced in the command line as $VARIABLE or ${VARIABLE}. The entire environment can be printed using env or printenv. The environment of a running process can be viewed by looking at the file /proc/<pid>/environ. The value of a single environment variable can be checked using either printenv or just echo.

Printing the value of an environment variable:

tteekkari@kosh:~$ export VISUAL=nano
tteekkari@kosh:~$ printenv VISUAL
nano
tteekkari@kosh:~$ echo ${VISUAL}
nano
tteekkari@kosh:~$
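The difference between exported and non-exported variables can be seen by asking a child process what it received. This is only a sketch; the variable names starting with DEMO_ are made up for the illustration.

```shell
unset DEMO_LOCAL DEMO_EXPORTED DEMO_ONCE   # hypothetical names, start clean

DEMO_LOCAL="shell only"            # plain shell variable, not exported
export DEMO_EXPORTED="inherited"   # exported, becomes part of the environment

# A child shell only sees the exported variable.
child_view=$(sh -c 'echo "local=${DEMO_LOCAL:-unset} exported=${DEMO_EXPORTED:-unset}"')
echo "$child_view"

# A variable set in front of a command goes into that command's environment only.
once_view=$(DEMO_ONCE="temporary" sh -c 'echo "$DEMO_ONCE"')
after_view=${DEMO_ONCE:-unset}
echo "$once_view, afterwards: $after_view"
```

The last form is a convenient way to set a variable temporarily for a single command in bash and zsh.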

Some useful environment variables:

Variable          Meaning
HOME              Path to the user's home directory
PATH              A list of paths the shell will search for executables
DISPLAY           Tells any graphical applications the address of the windowing server
VISUAL, EDITOR    Determines the editor used when a script or a program starts an editor
LD_LIBRARY_PATH   A list of paths containing shared libraries used by programs
PS1               Used to set the prompt string (bash, zsh)
LANG, LC_*        Variables used to set the locale
TERM              The terminal (emulator) type

Wildcards

The shell can complete partial file names using wildcards. Wildcards can be used as a shorthand for typing multiple file names, and they also work in situations where the exact file name is not known or cannot even be determined. The most common wildcard characters are the asterisk (*) and the question mark (?). The asterisk matches any string (even an empty one) and the question mark matches any single character.

Consider the following list of files:

tteekkari@kosh:~$ ls
a  ab  abcd  b  c  cd  d
tteekkari@kosh:~$

Now the wildcards can be used as follows:

tteekkari@kosh:~$ ls a*
a  ab  abcd
tteekkari@kosh:~$ ls a?
ab
tteekkari@kosh:~$ ls ??
ab  cd
tteekkari@kosh:~$ ls *d
abcd  cd  d
tteekkari@kosh:~$ ls a*?d
abcd

In the first example a* expands to a, ab and abcd. The asterisk matches an empty string in a, 'b' in ab and 'bcd' in abcd. In the second example a? only matches ab, since the question mark matches exactly one character. In the third example ?? matches ab and cd, as they are the only file names with exactly two characters. In the fourth example *d matches abcd, cd and d, much like in the first example. The wildcards can also be used in the middle of a filename, or multiple times, as the final example illustrates. There, a*?d matches abcd: the asterisk matches b and the question mark matches c.

Note that the wildcards are expanded by the shell, so in these examples ls might get one or multiple arguments, depending on how many files the wildcards apply to.
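That the expansion is done by the shell, and not by the program, can be seen with echo, which simply prints the arguments it is given. The sketch below recreates the file list of this section in a temporary directory; it assumes the mktemp command is available.

```shell
tmpdir=$(mktemp -d)
# Recreate the file list used in this section in a scratch directory.
( cd "$tmpdir" && touch a ab abcd b c cd d )

# echo prints whatever arguments the shell hands to it after expansion.
star=$(cd "$tmpdir" && echo a*)
two_chars=$(cd "$tmpdir" && echo ??)
quoted=$(cd "$tmpdir" && echo 'a*')   # quoting disables the expansion

# set -- stores the expanded words as positional parameters; $# counts them.
arg_count=$(cd "$tmpdir" && set -- *d && echo $#)

echo "a*   -> $star"
echo "??   -> $two_chars"
echo "'a*' -> $quoted"
echo "*d expands to $arg_count arguments"

rm -r "$tmpdir"
```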

Aliases

Shells usually allow aliases, which can be used to replace text in the beginning of commands. An alias could for example be used to always set certain options for certain commands.

The syntax for setting and removing aliases is different in bash and tcsh. After they have been set, they can be used just like normal commands.

Setting, using and removing an alias, bash and zsh:

tteekkari@kosh:~$ ls
a  b
tteekkari@kosh:~$ alias rm='rm -i'
tteekkari@kosh:~$ rm a
rm: remove regular empty file 'a'? y
tteekkari@kosh:~$ unalias rm
tteekkari@kosh:~$ rm b
tteekkari@kosh:~$

Setting, using and removing an alias, tcsh:

kosh:~> ls
a  b
kosh:~> alias rm 'rm -i'
kosh:~> rm a
rm: remove regular empty file 'a'? y
kosh:~> unalias rm
kosh:~> rm b
kosh:~>

Here the command rm (ReMove) is used to delete files. With the option -i the rm command asks for extra confirmation for each argument; without -i the file is just removed silently. This is a somewhat common alias to set on an administrator (root) account where an accidental file deletion can cause major damage to the operating system.

When using aliases, any arguments given on the command line are simply appended after the text of the alias. If an argument needs to be placed somewhere in the middle of the command, an alias will not work and a shell function or a script has to be used instead.
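Although writing functions is outside the scope of this guide, a very small sketch shows the kind of case an alias cannot handle. The function name backup is hypothetical, and the example works on a temporary file created with mktemp.

```shell
# backup is a hypothetical helper name. The file name is needed twice,
# once in the middle of the command, which an alias cannot express.
backup() {
    cp -- "$1" "$1.bak"
}

tmpdir=$(mktemp -d)
echo "data" > "$tmpdir/notes.txt"
backup "$tmpdir/notes.txt"
copied=$(cat "$tmpdir/notes.txt.bak")
echo "backup contains: $copied"
rm -r "$tmpdir"
```

Inside the function, $1 refers to the first argument given to it.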

Arguments and escaping a string

As stated earlier, commands are often given one or more arguments or options. The command receives the arguments as a list consisting of all the arguments of the command line split at spaces. This can in certain cases behave counterintuitively.

Specifically, if an argument contains a space, it needs to be escaped. The same applies to wildcard characters if they are to be used verbatim and to some other characters that have a special meaning for the shell interpreter. Strings can be escaped in three ways. One of the ways involves escaping every special character separately by prepending it with a backslash (\) character. A backslash can itself be escaped (\\). The other two ways are to enclose the string in quotation marks, either using double quotes (") or single quotes ('), which behave differently.

When using single quotes, the string can contain any characters except the single quotes themselves, and everything is passed verbatim without expanding any variables, etc.

If the string is enclosed in double quotes, the special characters $ (dollar sign), \ (backslash) and often ! (exclamation mark) have special meanings. The backslash can be used to escape characters, the exclamation mark to reference command history and the dollar sign to reference variables.

Consider the following echo statements:

tteekkari@kosh:~$ echo Hello World
Hello World
tteekkari@kosh:~$ echo Hello      World
Hello World
tteekkari@kosh:~$ echo "Hello World"
Hello World
tteekkari@kosh:~$ echo "Hello      World"
Hello      World
tteekkari@kosh:~$ echo Hello\ World
Hello World
tteekkari@kosh:~$ echo Hello\ \ \ \ \ \ World
Hello      World

Without quotes or any escaping, echo receives two arguments, Hello and World, no matter how many spaces there are in between; it then prints the arguments separated by a single space. When quotes are used, echo receives one argument, which is the string Hello World with one space in the first case and the same words separated by six spaces in the second. Similarly, backslashes can be used to escape the spaces, in which case echo only receives one argument.
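The number of arguments a command receives can be made visible with a small helper. In this sketch count_args is a hypothetical function that prints how many arguments it was given. The last two lines assume bash or another sh-style shell; zsh does not split unquoted variables by default.

```shell
# count_args is a hypothetical helper that reports how many arguments it received.
count_args() { echo "$#"; }

unquoted=$(count_args Hello      World)     # two words, two arguments
quoted=$(count_args "Hello      World")     # one quoted string, one argument
escaped=$(count_args Hello\ World)          # escaped space, one argument
echo "unquoted=$unquoted quoted=$quoted escaped=$escaped"

# The same splitting applies to an unquoted variable containing a space.
file="my file.txt"
var_unquoted=$(count_args $file)
var_quoted=$(count_args "$file")
echo "var_unquoted=$var_unquoted var_quoted=$var_quoted"
```

This is why variables holding file names are usually written inside double quotes.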

The difference between single and double quotes:

tteekkari@kosh:~$ echo ${LANG}
en_US.UTF-8
tteekkari@kosh:~$ echo *
file1 file2 file3
tteekkari@kosh:~$ echo \*
*
tteekkari@kosh:~$ echo '${LANG}'
${LANG}
tteekkari@kosh:~$ echo '\${LANG}'
\${LANG}
tteekkari@kosh:~$ echo '"${LANG}"'
"${LANG}"
tteekkari@kosh:~$ echo "*"
*
tteekkari@kosh:~$ echo "${LANG}"
en_US.UTF-8
tteekkari@kosh:~$ echo "\${LANG}"
${LANG}
tteekkari@kosh:~$ echo "\\\${LANG}"
\${LANG}
tteekkari@kosh:~$ echo "\"${LANG}\""
"en_US.UTF-8"
tteekkari@kosh:~$ echo "\"\${LANG}\""
"${LANG}"
tteekkari@kosh:~$ echo '*'
*

When a string is enclosed in double quotes, any environment variables are substituted and the backslash can be used to escape characters. Wildcards are not substituted. When using single quotes, all characters are treated literally and no substitutions or escapes are done. This also means that when using single quotes, the single quote character cannot be present in the string.
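Because adjacent quoted pieces are joined into one word, the different quoting styles can be mixed to work around their limitations. A short sketch, with made-up variable names:

```shell
# End the single-quoted string, add an escaped quote, start a new string.
pieced='don'\''t panic'
# Or use double quotes, where a single quote needs no escaping.
double="don't panic"

# A double-quoted part (variable expanded) followed directly by a
# single-quoted part (dollar sign kept literally) forms one string.
animal="llama"
mixed="a $animal costs "'$5'

echo "$pieced"
echo "$double"
echo "$mixed"
```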

Processes

A process is a running program. As a concept, a process comprises the program code, memory areas used by the program, the call stack (which includes the function calls, memory and parameters), operating system resources (file descriptors, environment, etc.), process ownership information, process permissions and the processor state, also known as the context. In short, a process contains all the run-time information on a running program.

As Linux is a multitasking operating system, there are always multiple processes running. From the point of view of the operating system, a process can be in one of three states. A process is either running, waiting or blocked. These states are directly related to whether the process is being run on the CPU at any given time. The transitions between states are controlled by the operating system's scheduler.

Processes can be listed with the command ps.

tteekkari@kosh:~$ ps
  PID TTY          TIME CMD
22521 pts/129  00:00:00 bash
22629 pts/129  00:00:00 ps
tteekkari@kosh:~$ ps ux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
tteekka+ 22500  0.7  0.0  78268  9300 ?        Ss   20:02   0:00 /lib/systemd/sy
tteekka+ 22501  0.0  0.0 300284  8436 ?        S    20:02   0:00 (sd-pam)
tteekka+ 22520  0.0  0.0 142880  3044 ?        R    20:02   0:00 sshd: tteekkari
tteekka+ 22521  1.0  0.0  28568  5640 pts/129  Ss   20:02   0:00 -bash
tteekka+ 22578  0.0  0.0  24376  2496 ?        Ss   20:02   0:00 krenew -b -K 60
tteekka+ 22632  0.0  0.0  43172  3604 pts/129  R+   20:02   0:00 ps ux
tteekkari@kosh:~$ 

The columns in the listing are as follows:

USER         The username that owns the process
PID          Process ID number
%CPU, %MEM   Percentage of CPU time and memory used by the process
VSZ          Virtual memory size in kB
RSS          Resident set size, non-swapped physical memory used in kB
TTY          The terminal device assigned for the process
STAT         Status of the process (S = sleeping, R = running, D = blocked)
START, TIME  When the process was started and how much CPU time it has used
COMMAND      Command line of the process

There are also a few other statuses for the STAT field and there are a lot more fields one can choose from. The best resource for information on those is the ps(1) man page.

For continuous monitoring of processes there are the process monitors top and htop. Of these, htop is newer and more colourful, but the older top is also perfectly workable.

[Figure: the output of top]

The output of top includes some general information on the computer load, uptime and memory usage. In addition to that, there is a periodically updated process listing that contains pretty much the same fields seen in ps output. The new fields are the PR and NI fields, which are the PRiority of the process and a NIce value. Processes with a higher nice value yield to other processes and it is encouraged to nice any long-term CPU-intensive processes to take other users into account.

Processes can be started nice'd with the command nice. The nice value of a running process can be altered with the command renice. Users can only renice their own processes.
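The effect of nice can be observed with nice itself. The sketch below assumes the GNU coreutils version of nice, which prints the current nice value when run without arguments; other implementations may behave differently.

```shell
# With no arguments, GNU nice prints the nice value it is running with.
base=$(nice)
# Start a command with the nice value raised by 5; here the command is nice itself.
raised=$(nice -n 5 nice)
echo "current nice value: $base, inside 'nice -n 5': $raised"
```

In real use the inner nice would be replaced by the long-running command to be started.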

Background and foreground processes, job control

A shell can run multiple processes at once. There can be one foreground process, which receives the keyboard input from the terminal. The rest of the processes are suspended or in the background. Background processes can still print to the terminal, but receive no keyboard input.

A shell groups the processes running in it into jobs. Running jobs can be listed using the command jobs. The job numbers are not visible outside the single shell session and will only work in the same shell session.

A process can be started in the background by following its command line with the ampersand symbol (&).

When a process is started in the background, its job number and PID (process ID) are printed. Jobs can be brought to the foreground using the command fg (foreground) and a stopped process can be sent to the background using the bg command (background).

A foreground process can be stopped by pressing Ctrl-Z. Its execution is halted and control returns to the shell. The execution can then be resumed either by sending the process to the background using bg or by bringing it back to the foreground using fg. The command fg can also be used to bring a running background process straight to the foreground.

tteekkari@kosh:~$ firefox &
[1] 31087
tteekkari@kosh:~$ jobs
[1]+  Running                 firefox &
tteekkari@kosh:~$ fg
firefox
^Z
[1]+  Stopped                 firefox
tteekkari@kosh:~$ jobs
[1]+  Stopped                 firefox
tteekkari@kosh:~$ bg
[1]+ firefox &
tteekkari@kosh:~$ fg
firefox
^C
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.

tteekkari@kosh:~$ 

Here Firefox is first started in background using the ampersand. It gets the job number 1 and PID 31087. Then when jobs are listed using jobs, the running Firefox process is listed. The process is then brought to foreground using fg and then stopped using Ctrl-Z. Stopping the job will print the job number, its status (Stopped) and the process command line. The jobs listing will reflect this also. The process is then resumed in the background using bg and then brought to foreground using fg and finally terminated using Ctrl-C.

The commands bg and fg as well as the shell builtin version of kill can also take the job number as an argument. The job number can be given in the form %n where n is the job number. The Firefox process in the previous example could thus have been killed using kill %1.
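The commands jobs, fg and bg are meant for interactive use. In a script, a background process is usually tracked through its PID instead. The following sketch uses the special variable $! and the builtin wait, neither of which is otherwise covered in this guide.

```shell
# Start a subshell in the background; it sleeps briefly and exits with status 3.
( sleep 1; exit 3 ) &
bg_pid=$!     # PID of the most recently started background process
echo "started background process $bg_pid"

# The script continues immediately; wait blocks until the process has
# finished and returns the exit status of that process.
bg_status=0
wait "$bg_pid" || bg_status=$?
echo "background process exited with status $bg_status"
```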

Signals

Processes can be controlled using signals. Signals are operating system messages that programs must handle when received. If the program doesn't have a custom signal handler, a default action is taken. For most signals, the default action is to terminate the process. Signals can also be sent by users with the command kill even though most signals originate from the operating system.

The kill command is usually a shell builtin so that the shell does not need to spawn a new process for sending the signal. This can be very beneficial in a situation where the operating system process table is full and new processes cannot be spawned. It is also available as a separate executable that can be called directly as /bin/kill. The separate executable can handily list the available signals.

tteekkari@kosh:~$ /bin/kill --table
 1 HUP      2 INT      3 QUIT     4 ILL      5 TRAP     6 ABRT     7 BUS
 8 FPE      9 KILL    10 USR1    11 SEGV    12 USR2    13 PIPE    14 ALRM
15 TERM    16 STKFLT  17 CHLD    18 CONT    19 STOP    20 TSTP    21 TTIN
22 TTOU    23 URG     24 XCPU    25 XFSZ    26 VTALRM  27 PROF    28 WINCH
29 POLL    30 PWR     31 SYS     
tteekkari@kosh:~$ 

There are 31 standard signals and they are numbered. The signals are usually referred to with their names prepended with "SIG". Some of the more useful signals to a normal user are SIGHUP, SIGINT, SIGKILL, SIGUSR1, SIGUSR2, SIGTERM, SIGCONT, SIGSTOP and SIGTSTP. All the signals are listed and described on the signal(7) man page.

With this information, we can understand some of the keyboard shortcuts better. For example, when Ctrl-C is pressed, the foreground process receives the signal SIGINT (2, Interrupt from keyboard), which by default ends the program. Ctrl-Z sends the foreground process the signal SIGTSTP (20, Stop typed at terminal), which will by default stop the program. Then when the process is fg'd or bg'd, it receives the signal SIGCONT (Continue if stopped), and continues execution.

A program can process signals using a signal handler. Programs can also block or ignore signals if the programmer so chooses. There are, however, two signals that cannot be caught, blocked or ignored by a program. They are SIGKILL (9) and SIGSTOP (19). SIGKILL always kills the program right away (this should usually be used as a last resort because programs do not get a chance to exit in a controlled manner and might skip cleanup steps, for example). SIGSTOP stops the program execution.
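In a shell script, a signal handler can be installed with the builtin trap, which is not otherwise covered in this guide. The sketch below starts small child shells that send a signal to themselves ($$ is the PID of the shell itself), first with the default action and then with a handler. An exit status of 128 plus the signal number is how the shell reports a process that was terminated by a signal.

```shell
# Default action: the child shell is terminated by its own SIGTERM,
# so its echo is never reached.
default_status=0
default_out=$(sh -c 'kill -TERM $$; echo "still running"') || default_status=$?

# Custom handler: trap installs a handler, so SIGTERM no longer terminates the shell.
handled_status=0
handled_out=$(sh -c 'trap "echo caught TERM" TERM; kill -TERM $$; echo "still running"') || handled_status=$?

# SIGKILL always terminates the process.
kill_status=0
kill_out=$(sh -c 'kill -KILL $$; echo "still running"') || kill_status=$?

echo "default: status $default_status, output '$default_out'"
echo "handled: status $handled_status, output '$handled_out'"
echo "SIGKILL: status $kill_status, output '$kill_out'"
```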

Sometimes if the process is waiting for some OS resource or system call (in uninterruptible sleep or blocking), it will not be terminated even by SIGKILL until it gets that resource. Unfortunately, there is no magic way of fixing these situations; sometimes if a process is in a really bad state, the only thing that helps is a reboot.

The most common use for signals is killing processes. This is usually accomplished by sending a process a SIGTERM signal, which is the default signal kill will send if no signal is specified. The process can be specified either by its PID or its job number if it is running in the same shell.

Killing processes:

tteekkari@kosh:~$ cat &
[1] 18779
tteekkari@kosh:~$ kill %1
tteekkari@kosh:~$ 
[1]+  Terminated              cat

Killing from another terminal:

tteekkari@kosh:~$ kill <pid>
tteekkari@kosh:~$

In the terminal with the process being killed, this will look like:

tteekkari@kosh:~$ cat
Terminated
tteekkari@kosh:~$

To send other signals, the signal can be specified by using either the name or the number, thus

$ kill -9 <pid>
$ kill -KILL <pid>

are equivalent.
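The effect of a signal can also be observed through the exit status of the killed process. This is a small sketch that assumes bash or dash, where a process terminated by a signal is reported as 128 plus the signal number; the sleep command is just a stand-in for any long-running program.

```shell
#!/bin/sh
# Start a long-running background process; $! holds its PID.
sleep 60 &
pid=$!

# Send SIGTERM by name (kill -15 "$pid" would do the same).
kill -TERM "$pid"

# wait collects the exit status of the background process.
wait "$pid"
status=$?
echo "exit status: $status"   # 128 + 15 (SIGTERM) in bash and dash
```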

If -1 is used for the PID (or one of the PIDs, as kill also works for a list of multiple PIDs), kill will send the signal to all of the processes it is allowed to signal. For a normal user, this means all the processes running as the user.

Processes and the /proc file system

In Linux, processes can be examined via the /proc file system. This is also how the tools described earlier (ps, top, etc.) actually work. It is a pseudo file system, generated 'on the fly' by the operating system kernel, so the files are not stored on any disk.

Every process has a directory under /proc. The name of the directory is the process's PID.

tteekkari@kosh:~$ touch foo.txt
tteekkari@kosh:~$ tail -f foo.txt &
[1] 5866
tteekkari@kosh:~$ cd /proc/5866
tteekkari@kosh:/proc/5866$ ls
attr             exe        mounts         projid_map    status
autogroup        fd         mountstats     root          syscall
auxv             fdinfo     net            sched         task
cgroup           gid_map    ns             schedstat     timers
clear_refs       io         numa_maps      sessionid     timerslack_ns
cmdline          limits     oom_adj        setgroups     uid_map
comm             loginuid   oom_score      smaps         wchan
coredump_filter  map_files  oom_score_adj  smaps_rollup
cpuset           maps       pagemap        stack
cwd              mem        patch_state    stat
environ          mountinfo  personality    statm
tteekkari@kosh:/proc/5866$ ls -go fd/
total 0
lrwx------ 1 64 Dec  5 11:16 0 -> /dev/pts/304
lrwx------ 1 64 Dec  5 11:16 1 -> /dev/pts/304
lrwx------ 1 64 Dec  5 11:16 2 -> /dev/pts/304
lr-x------ 1 64 Dec  5 11:16 3 -> /m/home/home6/62/tteekkari/unix/foo.txt
tteekkari@kosh:/proc/5866$ cat cmdline 
tail-ffoo.txttteekkari@kosh:/proc/5866$ kill %1

The /proc entries contain a lot of information, but some of the more useful ones are: the cmdline file, which shows the command line of the process (the arguments are separated by null characters and there is no trailing newline, which is why the output above appears run together with the next prompt); the directory fd, which lists the open file descriptors (just think files) of the process; and the environ file, which lists the environment of the process. The cwd symlink points to the process's CWD (current working directory). Even though a user can see most of the information for their own processes, not all of it is available for processes owned by other users. Access to the files and directories is controlled by regular file permissions.
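The files mentioned here can be made more readable with a few standard tools. The sketch below is Linux-specific and inspects the running shell itself ($$ is its PID); the variable names are arbitrary.

```shell
#!/bin/sh
# Linux only: /proc/$$ is the directory of the shell running this script.
proc_dir="/proc/$$"

# cmdline separates the arguments with null bytes; turn them into spaces.
cmdline=$(tr '\0' ' ' < "$proc_dir/cmdline")
echo "command line: $cmdline"

# cwd is a symbolic link to the current working directory.
cwd=$(readlink "$proc_dir/cwd")
echo "working directory: $cwd"

# environ uses the same null-separated format; count its entries.
env_count=$(tr '\0' '\n' < "$proc_dir/environ" | wc -l)
echo "environment variables: $env_count"
```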

Manual pages

Most of the shell commands come with a corresponding manual page. Manual pages can be viewed with the man command. The syntax is man [section] page_name. For example, the manual page of man can be viewed with the command man man. In texts and on the pages themselves, pages are referred to as page_name(#), where page_name is the name of the page and # is the number of the section the page is in.

There are seven traditional sections of manual pages, which date back to the original Unix Programmer's Manual from 1971 (most current systems also have an eighth section for system administration commands). The section matters when a page with the same name exists in more than one section. One example is printf, which is an executable in /usr/bin/printf as well as a C function. man printf would open the page printf(1), which describes the shell command. man 3 printf would open the page printf(3), which describes the C function.

Man page sections:

Number  Description    Examples
1       Commands       pwd, ls, tr
2       System calls   fork, socket, chdir
3       Subroutines    scanf, sin, strtok
4       Special files  null, mem, rtc
5       File formats   hosts, shadow, fstab
6       Games          nethack
7       Miscellaneous  signal, capabilities, cgroups

Man pages can be searched using the command apropos or man -k. They both do the same thing: they print the names of the man pages whose name or short description contains the search term. This is a good way of finding a command you can't remember: searching the man pages for what the command does.

For example, to find all the man pages that deal with permissions:

tteekkari@kosh:~$ apropos permissions
access (2)           - check user's permissions for a file
chmod (2)            - change permissions of a file
dh_fixperms (1)      - fix permissions of files in package build directories
dh_testroot (1)      - ensure that a package is built with necessary level of...
eaccess (3)          - check effective user's permissions for a file
euidaccess (3)       - check effective user's permissions for a file
faccessat (2)        - check user's permissions for a file
faked (1)            - daemon that remembers fake ownership/permissions of fi...
faked-sysv (1)       - daemon that remembers fake ownership/permissions of fi...
faked-tcp (1)        - daemon that remembers fake ownership/permissions of fi...
fchmod (2)           - change permissions of a file
fchmodat (2)         - change permissions of a file
ioperm (2)           - set port input/output permissions
WWW::RobotRules (3pm) - database of robots.txt-derived permissions
XF86VidModeGetPermissions (3) - Extension library for the XFree86-VidMode X e...
tteekkari@kosh:~$

As many pieces of software introduce changes between versions, or there might be several (mostly) interchangeable flavours of any given program, it is usually a good idea to consult the local manual page instead of searching for one on the WWW. The locally installed manual pages come with the software and correspond to the version and flavour that is actually present on the computer.

Useful commands

This is a partial (but long) list of useful commands grouped by categories.

HELP!
man (manual) Command manual pages, 'man <command>'
info Slightly longer manual pages for some programs, 'info <command>'
apropos Search for a string in manual pages

Handling files
ls (list) Lists directory contents or individual files
cd (change directory) Changes the current working directory
pwd (print working directory) Prints the current working directory
ln (link) Creates a link
cat (concatenate) Prints contents of files
more Prints file contents paginated
less Less is more. Improved version of more
mkdir (make directory) Creates a directory
cp (copy) Copies files
mv (move) Moves and renames files
rm (remove) Deletes files
rmdir (remove directory) Deletes empty directories
chmod (change mode) Changes files' permission bits
chown (change owner) Changes files' owner and group
chgrp (change group) Changes files' group
dd Copies and converts data block by block
scp (secure copy) Copies files over SSH connection
rsync Copies files and directory trees locally and/or between computers

Tools
wc (word count) Counts characters, words and/or lines
cut Selects parts of lines
paste Combines lines from multiple files
head Prints the beginning of input/files
tail Prints the end of input/files
fmt (format) Formats input for different text widths
grep Prints matching lines
tr (translate) Translates characters or deletes them
sort Sorts lines
uniq (unique lines) Reports or omits repeated lines
diff (differences) Finds differences between files
sed (stream editor) Filters and transforms text streams
tee Writes input to both files and standard output
od (octal dump) Dumps files in octal and other formats
xxd Dumps input in hexadecimal or reverses dumps
bc (basic calculator) Calculator/mathematical programming language

Identity, information
logname, whoami (login name) Prints username
id Prints user and group IDs
groups Prints the groups user is in
hostname Prints the computer hostname
uname (unix name) Displays information about the computer, OS and OS version
getent (get entries) Queries local databases
lsb_release (Linux Standard Base release) Displays information about the Linux distribution
dmesg (driver message) Prints the kernel ring buffer
sysctl Displays or changes settings on the running kernel

Archiving files, file compression
ar (archive) Handles ar archives used for example in Debian packages
tar (tape archiver) Archives multiple files into one
gzip, gunzip Compresses or expands gz archives
bzip2, bunzip2 Compresses or expands bz2 archives
xz, unxz Compresses or expands xz archives
zip, unzip Compresses or expands zip archives

Network tools
ip Displays/changes routing and network interface settings
ifconfig Old tool for displaying/changing network interface settings
route Old tool for displaying/changing routes
arp Displays or manipulates the kernel ARP cache
ping An ICMP diagnostic tool
traceroute Prints the route packets take to a network host
nc (netcat) Creates TCP and UDP connections and listens

File systems and disk usage
df (disk free) Prints the amount of free space on file systems
du (disk usage) Estimates file space usage
quota Displays quotas

Scripting and programming languages
awk (Aho, Weinberger and Kernighan) The AWK programming language
perl (Practical Extraction and Report Language) The Perl programming language
python The Python programming language

File system

Everything is a file.

or, as Linus Torvalds put it,

Everything is a file descriptor or process.

The 'Everything is a file' paradigm is one of the key concepts of Unix (and Linux). Almost everything is presented as files in directories and the file system offers a common namespace for all system resources. This means that everything can be manipulated using the same set of tools created for manipulating files.

A file is a discretely stored collection of data or records. Files are arranged into directories, which are special files that contain a list of other files. These listed files are located in the directory. All directories except for the root directory are also contained in another directory.

In Unix and Unix-based operating systems (or, more accurately, on most Unix/Linux file systems) filenames are case sensitive. The only forbidden characters in filenames are the forward slash (/), which is used as the directory separator, and the null character (\0), which is used as the string terminator. On most Linux file systems the maximum length of a filename is 255 bytes.

In addition to these limitations, the special filenames . (a single dot) and .. (two dots) are reserved and refer to the current directory and the upper-level directory, respectively.

As a special case, files with a name beginning with a dot (.) are left out of file listings by default and are considered hidden files.

There are several types of files in addition to regular files and directories, for example, device files and symbolic links. There are several other types as well, but they are encountered a bit more rarely in normal interactive use.

Structure

All files are arranged in a single tree-like structure which is called the root file system. The tree can contain many separate file systems, but they are all presented as part of this tree. Adding a file system to the tree is called mounting, and the directory that hosts the root directory of the mounted file system is called the mount point. By default, only the root user can mount file systems but normal users can still use the mount command to list the mounted file systems.

At the base of the file system tree is the root directory /. All other files and file systems are located in subdirectories of the root directory. The structure of the file system is pretty well-established, and much of it is standardised as the FHS (Filesystem Hierarchy Standard), which can be found at https://refspecs.linuxfoundation.org/fhs.shtml.

Here are some of the top-level directories and a brief description:

Directory  Description
/bin       The essential user command binaries
/boot      Boot loader files
/dev       The device tree
/etc       System configuration files
/home      User home directories (Note: Not at Aalto)
/lib       Essential shared libraries and kernel modules
/media     Mount point for removable media
/mnt       Mount point for temporary file systems
/opt       Additional application software
/proc      Processes exposed as files
/root      Home directory of the root user
/run       Run-time data
/sbin      Essential system binaries
/tmp       Temporary files
/usr       Non-essential user and system binaries and applications
/var       Variable data files

Absolute and relative path

Files are referred to using their path. Every process has a working directory, which is usually the directory the process was started in. Files can be referred to either in relation to this (relative path) or using the absolute path beginning from the root directory.

Consider the following directory structure:

/
├── a
│   └── x.txt
└── b
    ├── c
    │   └── y.txt
    └── z.txt

If the current working directory is b, b can be referred to using . and the upper-level directory / using .. (two dots). To refer to y.txt, one can use either c/y.txt (relative path) or /b/c/y.txt (absolute path). To refer to x.txt, one can use either ../a/x.txt (relative path) or /a/x.txt (absolute path). A good thing to know is that the upper-level directory of the root directory is the root directory itself, so ../../../../a/x.txt would work just as well here. The file z.txt can be referred to as z.txt or ./z.txt (relative path) or /b/z.txt (absolute path).
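The same tree can be recreated in a scratch location to try the paths out. In this sketch a temporary directory (the variable root) stands in for /, so the absolute paths start with that directory rather than a bare slash.

```shell
#!/bin/sh
# Build the example tree under a temporary directory that stands in for /.
root=$(mktemp -d)
mkdir -p "$root/a" "$root/b/c"
touch "$root/a/x.txt" "$root/b/c/y.txt" "$root/b/z.txt"

cd "$root/b" || exit 1

# Relative paths are resolved from the current working directory (b).
ls c/y.txt ../a/x.txt ./z.txt

# -ef is true when both paths name the same file.
if [ ../a/x.txt -ef "$root/a/x.txt" ]; then
    same=yes
else
    same=no
fi
echo "relative and absolute path name the same file: $same"

# Clean up the scratch tree.
cd / && rm -rf "$root"
```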

For file names containing odd characters and how to escape them, see Arguments and escaping a string.

Home directory and working directory

Every process, including the shell, has a current working directory (CWD). For processes this is usually the directory where the process was started.

As the example in Absolute and relative path shows, when referring to files using a relative path, the path always starts at the current working directory.

The current working directory can be printed using the command pwd.

tteekkari@kosh:~$ pwd
/u/62/tteekkari/unix
tteekkari@kosh:~$

Another notable directory is the user's home directory. The home directory can be referred to using the tilde sign (~). The home directory is a directory where the user has write permission and in which private data can be stored.
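How the shell treats the tilde can be checked with a few echo commands. This is a small sketch; notes.txt is a hypothetical file name and nothing is created.

```shell
#!/bin/sh
# An unquoted ~ at the start of a word expands to the home directory.
home_from_tilde=~
echo "tilde expands to: $home_from_tilde"
echo "HOME is:          $HOME"

# ~/notes.txt is shorthand for a path inside the home directory.
echo ~/notes.txt

# Inside quotes the tilde is not expanded and stays a literal character.
echo "~/notes.txt"
```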

Moving around in the file system, interacting with files

The current working directory can be changed using the cd command. The directory to be changed to is given as an argument. When used without arguments, cd will change to the home directory.

Utilities such as ls (list files) default to the current working directory when run without arguments. It is also useful to be able to refer to files using just their filename instead of the full (or partial) path.

Consider the directory structure presented earlier:

/
├── a
│   └── x.txt
└── b
    ├── c
    │   └── y.txt
    └── z.txt

The current working directory is b.

$ pwd
/b
$ ls
c/   z.txt
$ cd /a
$ ls
x.txt
$ cd ..
$ pwd
/
$ cd b/c
$ pwd
/b/c
$ ls
y.txt
$ rm y.txt

First, the current directory is /b. ls without arguments lists the files in the current directory and now shows that it contains the directory c/ and z.txt.

Then we change to the directory /a using the absolute path. Here, ls shows the file x.txt. Then we change to the root directory using the relative path .. (the upper-level directory). Then we change to the directory /b/c using the relative path b/c. Here, ls shows the file y.txt, which is then removed using the relative path.

We end up with the following file tree, where the file y.txt has been removed:

/
├── a
│   └── x.txt
└── b
    ├── c
    └── z.txt

Listing files and properties of a file

The contents of a directory can be listed using the command ls like in the previous example. Without arguments, ls gives a listing of files in the current directory. When one or more file or directory names are given as arguments, only they will be listed. There are multiple options that modify the output, one of the more common ones being -l (long list).

The output of ls -l and its fields: [Figure: ls -l]

The first character on each row denotes the type of the file: - for a regular file, d for a directory and l for a symbolic link. There are also other types, such as p for a named pipe, c for a character device, b for a block device, etc., but these more specialised types will not be covered here. The full list of types can be found on the 'ls' info page.

The next nine characters are the file permissions. They are listed in three groups of three different permissions. The permissions are covered in more detail in File permissions.

The second field shows the number of links referring to the file. For a directory, this is in practice the number of subdirectories the directory contains, including the special directories . and .. that every directory has. For any directory the number is therefore at least two. For a normal file, the number is the number of hard links to the file, which is in practice how many times the contents of the file are referred to in the file system. The internal structure of the file system and hard links will not be covered here though.
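The link count can also be read directly. The sketch below assumes GNU stat, whose %h format prints the link count, and uses made-up file names in a scratch directory. The directory figure depends on the file system, so it is printed without a claim about its value.

```shell
#!/bin/sh
# Work in a scratch directory so nothing else is touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# A new regular file has exactly one link (its own name).
touch file1
links_before=$(stat -c %h file1)

# A second hard link to the same contents raises the count to two.
ln file1 file1-again
links_after=$(stat -c %h file1)
echo "file1 links: $links_before -> $links_after"

# Many (not all) file systems report 2 + the number of subdirectories.
mkdir -p dir/sub1 dir/sub2
echo "dir links: $(stat -c %h dir)"

cd / && rm -rf "$scratch"
```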

The third field shows the username of the file's owner and the fourth the group of the file. Every file is owned by a user and a group, each of which can be given different permissions to the file. The permissions are, again, covered in more detail elsewhere (see File permissions).

The fifth field shows the file size in bytes. It is also possible to use the option -h to display the size in human-readable format.

The sixth field shows the modification time of the file. This time is updated whenever the file is changed.

The last field shows the file name. For symbolic links, ls -l also shows where the link is pointing.

The command ls also has many other options, which can be used, for example, to sort the files in various ways. There is one more important option, -a (all files), which lists all files, including the ones that have a file name beginning with a dot.

Listing 'hidden' files:

tteekkari@kosh:~/files$ ls
file1  file2
tteekkari@kosh:~/files$ ls -a
.  ..  .dotfile1  file1  file2
tteekkari@kosh:~/files$

All the options for ls can be found on its man page.

Creating and editing files

Files can be created with many programs and applications, or via the shell by redirecting output into a file. An empty file can be created using the touch command, but it is often more meaningful to use a text editor, for example.

An editor is a program for interactively modifying text files. The most popular editors are likely still emacs and vi (or the vi-based vim), even though there are some rather popular graphical ones as well. Both emacs and vim are very versatile, extensible and extremely efficient in the hands of a skilled user. Both also have their strong advocates, and arguments over the preferred editor have escalated to almost religious proportions in the past. Here it probably makes more sense to introduce the lightweight editor GNU nano, though. It is both lighter and more intuitive to use than the two giants. Regarding emacs and vim, it is useful to know how to exit them: in emacs the key combination is Ctrl-X Ctrl-C, and in vim <ESC>:q!<ENTER> (which quits without saving changes).
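Creating files directly from the shell can be sketched as follows. The file names are arbitrary and everything happens in a temporary directory.

```shell
#!/bin/sh
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# touch creates an empty file (or updates the timestamp of an existing one).
touch empty.txt
empty_size=$(wc -c < empty.txt)

# > sends the output of a command into a file, replacing any old contents.
echo "first line" > notes.txt

# >> appends to the file instead of overwriting it.
echo "second line" >> notes.txt

line_count=$(wc -l < notes.txt)
cat notes.txt
echo "empty.txt: $empty_size bytes, notes.txt: $line_count lines"

cd / && rm -rf "$scratch"
```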

A file can be opened by giving it to nano as an argument. The editor should look something like this: [Figure: nano]

The cursor can be moved using the arrow keys, and the editor works in rather intuitive fashion. The keyboard shortcuts are listed at the bottom of the screen and the shortcut Ctrl-G brings up a help page. In the shortcuts, the caret symbol ("^") means the Control key.

A file can be saved using the key combination Ctrl-O, after which the editor will prompt for a filename. To exit the editor, press Ctrl-X.

Removing files

Files can be removed using the command rm (remove files). The files to be removed are given as arguments. By default, rm only removes files, but it can also remove directories and directory trees with the option -r (recursive).

Creating an empty file using touch and then removing it with rm:

tteekkari@kosh:~$ touch testfile1
tteekkari@kosh:~$ ls testfile1
testfile1
tteekkari@kosh:~$ rm testfile1
tteekkari@kosh:~$ ls testfile1
ls: cannot access 'testfile1': No such file or directory
tteekkari@kosh:~$

Please note that many commands — rm included — accept options and arguments in any order. This means that especially with rm, one needs to be careful when handling files with names beginning with a dash (-) as they can be interpreted as options. This can be prevented by giving rm two dashes (--) as the last option before the arguments. It indicates that anything that comes afterward is not an option but an argument and works with quite a few other commands as well.

An example of a problematic filename:

tteekkari@kosh:~/testdir$ ls
directory1/  -r  testfile1
tteekkari@kosh:~/testdir$ rm *
tteekkari@kosh:~/testdir$ ls
-r
tteekkari@kosh:~/testdir$ 

As seen here, both the directory directory1 and the file testfile1 get deleted with the asterisk wildcard, but the file -r stays as it was treated as an option (which also caused the directory to get deleted). The file -r can be deleted like this:

tteekkari@kosh:~/testdir$ ls
-r
tteekkari@kosh:~/testdir$ rm -- -r
tteekkari@kosh:~/testdir$ ls 
tteekkari@kosh:~/testdir$

The example as it should have been done:

tteekkari@kosh:~/testdir$ ls
directory1/  -r  testfile1
tteekkari@kosh:~/testdir$ rm -- *
rm: cannot remove 'directory1': Is a directory
tteekkari@kosh:~/testdir$ ls
directory1/
tteekkari@kosh:~/testdir$ 
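Another common way to handle such a name is to prefix it with ./ so that it no longer begins with a dash. This also works with commands that do not understand --. A minimal sketch in a scratch directory:

```shell
#!/bin/sh
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Create a file literally named -r; -- keeps touch from reading it as an option.
touch -- -r

# ./-r begins with a dot, not a dash, so rm cannot mistake it for an option.
rm ./-r

remaining=$(ls -A | wc -l)
echo "files left: $remaining"

cd / && rm -rf "$scratch"
```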

Creating and removing directories

Directories can be created using the command mkdir (make directory). Empty directories can be removed using rmdir (remove directory). Both take the directory name(s) as argument(s). As seen in the previous example, the command rm can also be used to remove entire directory trees, empty or not, but it should be used with care: without the option -i (interactive) it does not ask for confirmation before deleting an entire directory tree.

Creating and deleting directories:

tteekkari@kosh:~$ mkdir directory
tteekkari@kosh:~$ cd directory/
tteekkari@kosh:~/directory$ ls
tteekkari@kosh:~/directory$ cd ..
tteekkari@kosh:~$ rmdir directory/
tteekkari@kosh:~$ mkdir directory
tteekkari@kosh:~$ cd directory/
tteekkari@kosh:~/directory$ touch testfile1
tteekkari@kosh:~/directory$ ls
testfile1
tteekkari@kosh:~/directory$ cd ..
tteekkari@kosh:~$ rmdir directory/
rmdir: failed to remove 'directory/': Directory not empty
tteekkari@kosh:~$ rm -r directory/

First, a directory called directory was created. Then cd was used to go into that directory and list its contents. We changed back out from the directory and deleted it using rmdir which succeeded. We then created the directory again and this time created a file called testfile1 inside the directory. Then removing the directory using rmdir failed, but it could still be removed using rm -r.

Renaming, moving and copying files

Files can be renamed and moved using the command mv (move). This command accepts two or more arguments. When using two arguments, the file given as the first argument will be renamed as indicated in the second argument or, if the second argument is a directory, it will be moved into that directory. If a file with the same name already exists, it will be overwritten. If more than two arguments are used, the last argument has to be a directory, and all the files given as previous arguments will be moved into that directory.

An example of moving and renaming files:

tteekkari@kosh:~/testdir$ ls
directory1  file1  file2
tteekkari@kosh:~/testdir$ mv file1 file3
tteekkari@kosh:~/testdir$ ls
directory1  file2  file3
tteekkari@kosh:~/testdir$ mv file2 file3 directory1/
tteekkari@kosh:~/testdir$ ls
directory1
tteekkari@kosh:~/testdir$ ls directory1/
file2  file3
tteekkari@kosh:~/testdir$

First, the current directory contains an empty directory directory1 and two files, file1 and file2. The file file1 is first renamed as file3 and then both files file2 and file3 are moved into directory1.
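Copying is not shown in the transcript above. The cp (copy) command takes its arguments in the same pattern as mv but leaves the original in place. The following is a minimal sketch that reuses the file names of the example inside a scratch directory; directory2 is a made-up name.

```shell
#!/bin/sh
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir directory1
echo "some text" > file1

# Two arguments: copy file1 to a new file called file2.
cp file1 file2

# Last argument is a directory: copy both files into it.
cp file1 file2 directory1/

# -r (recursive) is needed to copy a directory and everything in it.
cp -r directory1 directory2

# List the copy on one line; the originals are all still in place.
copied=$(ls directory2 | tr '\n' ' ')
echo "directory2 contains: $copied"

cd / && rm -rf "$scratch"
```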

Reading the contents of a file

The contents of a file can be printed on the terminal using the command cat (concatenate). As the name implies, the command was originally meant for printing out multiple files in succession to combine them, but can just as well be used to print the contents of a single file. The files to be printed are given as arguments.

Viewing a file using cat:

tteekkari@kosh:~$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.3 LTS"
tteekkari@kosh:~$

Files are often long and can't fit on the screen as is. With long files, it is often better to use the command more, which prompts the user for input after every full screen before printing out the next line or screenful. The name of the command comes from its prompt, which is --More--.

An alternative to more is less (less is more), which offers more functionality, including scrolling in both directions and performing searches in the open file. Whereas more is a paginator, less is actually more like a file viewer. Less uses a vi-like interface which will not be covered in depth here, but the manual page lists all the commands and shortcuts. To exit less, press q.

Symbolic links

A symbolic link is a file that is either a relative or absolute reference to another place in the file system. Unlike regular files and directories, it does not have its own contents. All operations touching the contents are performed on the target file.

The file or directory that a symbolic link points to is called the target. In most cases, symbolic links work fully transparently — that is to say a program opening the link file sees the file or directory the link is pointing to. A symbolic link is a separate file from the target file in that if the target file is, for example, removed, the symbolic link is not. A symbolic link can thus also point to a non-existent target.

Symbolic links can be created using the command ln -s. Note also that a symbolic link can point to one of its parent directories in the file system tree. This will turn the tree (https://en.wikipedia.org/wiki/Tree_(data_structure)) into a directed graph (https://en.wikipedia.org/wiki/Directed_graph) that contains a loop, and manual traversing of the tree might not always work intuitively. The purpose of symbolic links is usually to make one file or path visible in multiple places in the file system.

Creating symbolic links using relative and absolute paths:

tteekkari@kosh:~$ ls foo.txt
foo.txt
tteekkari@kosh:~$ ln -s foo.txt foo2.txt
tteekkari@kosh:~$ ls -l foo*.txt
lrwxrwxrwx 1 tteekkari domain users 7 Dec 12 17:41 foo2.txt -> foo.txt
-rw-r--r-- 1 tteekkari domain users 0 Dec 12 17:40 foo.txt
tteekkari@kosh:~$ ln -s /usr/share/dict/words words
tteekkari@kosh:~$ ls -l words
lrwxrwxrwx 1 tteekkari domain users 21 Dec 12 17:41 words -> /usr/share/dict/words
tteekkari@kosh:~$
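That a link and its target are separate files can be checked with the shell's test operators. This is a sketch with made-up file names in a scratch directory.

```shell
#!/bin/sh
scratch=$(mktemp -d)
cd "$scratch" || exit 1

echo "hello" > target.txt
ln -s target.txt link.txt

# Reading through the link returns the contents of the target.
via_link=$(cat link.txt)

# readlink prints where the link itself points.
points_to=$(readlink link.txt)

# Removing the target leaves the link in place, now pointing at nothing:
# -L tests for a symbolic link, -e follows the link and tests the target.
rm target.txt
if [ -L link.txt ] && [ ! -e link.txt ]; then
    dangling=yes
else
    dangling=no
fi
echo "read via link: $via_link; points to: $points_to; dangling: $dangling"

cd / && rm -rf "$scratch"
```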

Without the -s option, ln creates hard links. Hard links will not be covered here and generally should be avoided in normal use.

File permissions

The traditional Unix-style file permissions consist of three categories of users who can be given up to three types of permissions. These permissions can also be referred to as the mode of the file. The three categories are user, group and others. Every file has two owners: a user and a group.

The three different permissions are read permission, write permission and execute permission. Read permission allows reading the file contents, write permission allows modifying the file contents and execute permission allows executing the file.

When attempting to access a file, the permissions are checked and access may be denied, when necessary. First, the operating system checks if the user is the owner of the file, and if so, the 'user' category permissions are applied. If the user is not the owner, the operating system checks if the user belongs to the group that has ownership of the file. If so, the 'group' category permissions are applied. Otherwise, the permissions in the 'others' category are applied.
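The order of these checks can be sketched as a small shell function. perm_class is a hypothetical helper, not an existing command; it assumes GNU stat and ignores special cases such as the root user or access control lists.

```shell
#!/bin/sh
# perm_class FILE: print which permission category (user, group or others)
# applies to the current user. Hypothetical helper, GNU stat assumed.
perm_class() {
    file_uid=$(stat -c %u "$1")
    file_gid=$(stat -c %g "$1")
    # First check: is the current user the owner of the file?
    if [ "$(id -u)" -eq "$file_uid" ]; then
        echo user
        return
    fi
    # Second check: id -G lists the numeric IDs of all the user's groups.
    for gid in $(id -G); do
        if [ "$gid" -eq "$file_gid" ]; then
            echo group
            return
        fi
    done
    # Neither matched, so the 'others' permissions apply.
    echo others
}

scratch_file=$(mktemp)
perm_class "$scratch_file"    # the creator owns the file
rm -f "$scratch_file"
```

Note that the first match decides: an owner is always judged by the 'user' permissions, even if the group or others permissions would allow more.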

Permissions can be expressed in two ways. One is the symbolic notation seen in the output of ls -l. The other is in the form of a three or four-digit octal number.

The symbolic notation as seen in ls -l:

[Figure: symbolic notation]

In symbolic notation the permissions are expressed in three groups of three permissions. The first group is the user permissions, the second group is the group permissions and the third is the others permissions. The read permission is shown as r, write as w and execute as x.

The other way to express file permissions is as an octal number. The entire set of file permissions, i.e. the mode of the file, can be expressed as a four-digit octal number. The first digit is a set of special permissions and is usually 0. The next three digits are the user, group and others permissions. Each digit is the sum of permissions for each category. The read permission corresponds to 4, write to 2 and execute to 1. Thus the full rwxrwxrwx in octal would be 0777: the first digit is 0 and the rest are 7 (r+w+x = 4+2+1). The octal number 0644, for example, would correspond to rw-r--r--, and 0600 to rw------- in symbolic notation.
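The correspondence between the two notations can be checked on a scratch file. The sketch assumes GNU stat, where the format code %a prints the octal mode and %A the symbolic one.

```shell
#!/bin/sh
scratch_file=$(mktemp)

# Each digit is a sum: read (4) + write (2) + execute (1).
echo "rwx = $((4 + 2 + 1)), rw- = $((4 + 2)), r-- = 4"

# Set a mode in octal and print it back in both notations.
chmod 0644 "$scratch_file"
mode_644=$(stat -c '%a %A' "$scratch_file")
chmod 0600 "$scratch_file"
mode_600=$(stat -c '%a %A' "$scratch_file")
echo "$mode_644"
echo "$mode_600"

rm -f "$scratch_file"
```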

The effect of each mode bit (permission) depends somewhat on the type of the file.

Permission   Regular file (-)                Directory (d)                                          Symbolic link (l)
Read (r)     The file contents can be read.  The directory contents can be listed.                  Ignored. Target file permissions apply.
Write (w)    The file can be written into.   Files can be removed or added to the directory.        Ignored. Target file permissions apply.
Execute (x)  The file can be executed.       The directory can be traversed and files can be used.  Ignored. Target file permissions apply.

As the table shows, permissions on directories and symbolic links behave a bit differently from regular files. The read permission on a directory allows reading the directory contents, which in practice means the names of the files contained in the directory. To be able to read the contents of the files, the execute permission is also needed (and naturally the file itself needs to be readable). The execute bit on a directory allows entering that directory and accessing the contents. The write permission on a directory allows creating and removing files in it. Even users with write permission to a file cannot remove the file unless they also have write permission to the directory the file resides in.

For symbolic links, all the permissions on the symbolic link itself are ignored and the permissions of the target file are used instead.

In addition to these three permission bits, there are also three special modes, which are more like file attributes than access permissions. In octal presentation they form the first digit. The special modes are set user ID (setuid, SUID), set group ID (setgid, SGID) and the sticky bit.

The setuid bit, when set, allows a user to execute a file as the owner of the file instead of as themselves. Thus, a user may run a setuid binary owned by root as the root user and bypass all permission checks. Allowing users to run binaries as other users is inherently dangerous and therefore the setuid bit should normally not be used. External file systems are also usually mounted with the setuid functionality disabled.

The setgid bit works much like the setuid bit, except that instead of changing the user ID when executed, it changes the effective group ID and the file is executed as though the user was in the group owning the file. This is also dangerous and should not be used on regular files.

On directories, the setgid bit causes new files and subdirectories created in the directory to inherit the group of the directory. This can be very useful on shared directories to make sure that files can be read by other users (in a group).

The sticky bit on a directory changes the behaviour of the write permission so that users cannot move, rename or delete files that they do not own. This permission is in use, for example, in /tmp/, a shared directory that potentially contains files owned by many different users, where it prevents the users from removing each other's files.

In symbolic notation, the setuid/setgid special modes are shown -- should a file or directory have them set -- in the place of the relevant execute permission (user for setuid, group for setgid). An execute permission of 'x' is replaced by 's' and a '-' is replaced by 'S'. The sticky bit is shown in the place of the execute permission of the others category. An 'x' is transformed into a 't' and a '-' into a capital 'T'.

In octal notation, the special permissions form the first digit of the four-digit octal representation. The setuid bit corresponds to 4, the setgid bit to 2 and the sticky bit to 1.

For example, the /tmp/ directory usually has the mode drwxrwxrwt, which would correspond to 1777 octal. A directory with setgid and sticky bits set could have a mode of drwxrws--T, which would correspond to 3770 octal. The first digit of 3 corresponds to 2 (setgid) + 1 (sticky) and the user and group permissions to 7 (r (4) + w (2) + x (1) = rwx (7)). In the /tmp/ example the first digit corresponds to the sticky bit (1) and the rest of the permissions are rwx as above.
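The arithmetic behind these two examples can be checked directly in the shell. The following is a minimal sketch; the variable names are only illustrative and are not part of any command.

```shell
# Each octal digit is the sum of the values of the bits that are set.

# drwxrws--T: setgid + sticky, rwx for user and group, nothing for others
special=$((2 + 1))       # setgid (2) + sticky (1)
user=$((4 + 2 + 1))      # r (4) + w (2) + x (1)
group=$((4 + 2 + 1))     # r (4) + w (2) + x (1)
others=0                 # no permission bits set
mode="${special}${user}${group}${others}"
echo "$mode"             # prints 3770

# drwxrwxrwt: sticky only, rwx for user, group and others
rwx=$((4 + 2 + 1))
tmp_mode="1${rwx}${rwx}${rwx}"
echo "$tmp_mode"         # prints 1777
```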

Setting file mode/permissions

File permissions can be set using the command chmod (change mode). The command can parse both octal and symbolic permissions. The octal notation can be used to set the file mode and the symbolic notation to set individual permission bits.

When using the symbolic notation, the user permissions are referred to using the letter 'u', group permissions using 'g' and others using 'o'. There is also a shorthand for all permissions: 'a' (this is the same as 'ugo'). The permission bits themselves are represented by 'r', 'w' and 'x' (and 's' for setuid/setgid and 't' for the sticky bit).

Different invocations of chmod:

tteekkari@kosh:~/testdir$ ls -l
total 0
-rw-r--r-- 1 tteekkari domain users 0 Dec 18 15:53 file1
tteekkari@kosh:~/testdir$ chmod g+w file1
tteekkari@kosh:~/testdir$ ls -l
total 0
-rw-rw-r-- 1 tteekkari domain users 0 Dec 18 15:53 file1
tteekkari@kosh:~/testdir$ chmod 0644 file1
tteekkari@kosh:~/testdir$ ls -l
total 0
-rw-r--r-- 1 tteekkari domain users 0 Dec 18 15:53 file1
tteekkari@kosh:~/testdir$ chmod go-r file1
tteekkari@kosh:~/testdir$ ls -l
total 0
-rw------- 1 tteekkari domain users 0 Dec 18 15:53 file1
tteekkari@kosh:~/testdir$ chmod a+x file1
tteekkari@kosh:~/testdir$ ls -l
total 0
-rwx--x--x 1 tteekkari domain users 0 Dec 18 15:53 file1
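
The special modes can be set with chmod in the same way. The following is a minimal sketch of a shared directory with the setgid and sticky bits, run in a temporary directory. It assumes a Linux system with GNU coreutils (the stat -c option is GNU-specific), and the directory names "shared" and "sub" are made up for the example.

```shell
umask 022                          # fixed umask so the modes below are predictable
dir=$(mktemp -d)                   # scratch directory, removed at the end
mkdir "$dir/shared"

chmod 0770 "$dir/shared"           # rwx for user and group, nothing for others
chmod g+s "$dir/shared"            # add the setgid bit
shared_setgid=$(stat -c '%a %A' "$dir/shared")
echo "$shared_setgid"              # prints 2770 drwxrws---

chmod +t "$dir/shared"             # add the sticky bit
shared_both=$(stat -c '%a %A' "$dir/shared")
echo "$shared_both"                # prints 3770 drwxrws--T

# On Linux, a new subdirectory inherits the group and the setgid bit.
mkdir "$dir/shared/sub"
sub_mode=$(stat -c '%A' "$dir/shared/sub")
echo "$sub_mode"                   # prints drwxr-sr-x

rm -rf "$dir"
```

The group inheritance itself is only visible when the group of the shared directory differs from the primary group of the user creating the files.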

Users, groups and permissions

As Unixes (and Linux) are multi-user operating systems, there can be multiple users logged on to a single system. For every user, the operating system stores seven different data fields: username, password, numerical user ID (UID), numerical group ID (GID), the full name of the user, the home directory path of the user and the shell of the user.

The username is the name that is used to log in to the system and that is shown in various places where a human-readable name is required. The password is (a hash of) the password used to authenticate the user. The numerical user and group IDs are the internal representation of the user and the primary group of the user. The full name and shell are pretty self-explanatory. The home directory path is the path to the user's home directory, which is where any shell initialisation files are read from and where the session begins.
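
On many systems these seven fields are stored or presented as a single colon-separated line, which is also what the command getent passwd prints where it is available. The following sketch splits such a line into its fields. The sample entry is hypothetical: the full name, home directory and shell path are made up for illustration.

```shell
# Hypothetical sample entry. An 'x' in the password field commonly means
# that the actual hash is stored elsewhere.
entry='tteekkari:x:6666666:70000:Teemu Teekkari:/home/tteekkari:/bin/bash'

# Split the entry on ':' into the seven fields.
IFS=: read -r username password uid gid fullname home shell <<EOF
$entry
EOF

echo "username:       $username"
echo "password:       $password"
echo "UID:            $uid"
echo "GID:            $gid"
echo "full name:      $fullname"
echo "home directory: $home"
echo "shell:          $shell"
```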

Users

Users' information can be examined by using the finger command. The users who are logged on to a computer can be listed using w, who or users.

tteekkari@kuikka:~$ w
 11:14:45 up 33 days, 16:00,  1 user,  load average: 0.10, 0.04, 0.01
USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
tteekkar pts/0    130.233.224.196  11:14    0.00s  0.04s  0.00s w
tteekkari@kuikka:~$ who
tteekkari pts/0        2019-12-18 11:14 (130.233.224.196)
tteekkari@kuikka:~$ users
tteekkari
tteekkari@kuikka:~$

Changing user info

Users can change their login shell using the command chsh. The full name (and a few other miscellaneous fields) can be changed using the command chfn. Of these, only chsh is available on Aalto computers.

Changing password

Users can change their password using the command passwd. The command prompts once for the old password and, if it is entered correctly, twice for the new password. If the new passwords match and fulfil all security requirements, the password is changed.

For changing passwords in the Aalto environment, please see Changing passwords.

Groups

Users can be bundled into user groups. A group has three properties: a name, a numerical group ID and a list of members. A user can be a member of multiple groups, and groups are usually used for access control. If a group has permission to do something, then any group member has that same permission.

The groups that a user is a member of can be listed using the commands id (which also prints the numeric group/user IDs) and groups.

tteekkari@kosh:~$ id
uid=6666666(tteekkari) gid=70000(domain users) groups=70000(domain users)
tteekkari@kosh:~$ groups
domain users
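
The id command can also print single values, which is handy when only one of them is needed. A minimal sketch using the standard -u, -g and -G options follows; the variable names are only illustrative.

```shell
my_uid=$(id -u)         # numeric user ID
primary_gid=$(id -g)    # numeric ID of the primary group
all_gids=$(id -G)       # numeric IDs of all groups, separated by spaces

echo "UID:           $my_uid"
echo "primary group: $primary_gid"
echo "all groups:    $all_gids"
```

Adding the option -n (for example id -un or id -Gn) prints names instead of numbers.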

A user has a primary group, which is used as the default group owner for any new files created by the user. In addition to the primary group, a user can have auxiliary (also called supplementary) groups, which are just additional groups the user is a member of. On modern Linux systems, the limit is 65536 such groups.

A user's groups are usually only updated at the beginning of the session. If a user is added to a new group during the session, the new group membership will not come into effect unless the user logs out and then back in.

The primary group of the current session can be changed using the command newgrp, which starts a new shell with the given group as the primary group. This is very rarely needed, but can, for example, be used to introduce a new group into an old session.

File ownership

All files are always owned by both a user and a group. Both the user and the group can be assigned different permissions to the file. The owner of a file can be changed using the command chown (change owner, can change both user and group) or chgrp (change group, only changes the group ownership).

Both of these commands will accept either the user or the group names or their numeric IDs.

Users can normally change only the group ownership of their files, and they can only choose groups they are themselves members of.
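
As a minimal sketch, the following changes the group of a scratch file using numeric IDs. To stay runnable for any user, it only assigns the user's own primary group, so the ownership does not actually change; in practice the argument would be another group the user is a member of. The stat -c option assumes GNU coreutils, and the file name is made up for the example.

```shell
dir=$(mktemp -d)                 # scratch directory, removed at the end
touch "$dir/file1"

# chgrp changes only the group; a numeric ID is accepted as well as a name.
chgrp "$(id -g)" "$dir/file1"

# chown can take user:group. Keeping the owner unchanged is allowed
# for regular users.
chown "$(id -u):$(id -g)" "$dir/file1"

owner_ids=$(stat -c '%u:%g' "$dir/file1")
echo "$owner_ids"                # numeric user and group ID of the file

rm -rf "$dir"
```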