Topics in Systematics and Evolution: Bioinformatics for Evolutionary Biology

Topic 2: Commandline Introduction

In this tutorial we will go through parts of several Software Carpentry workshops. For now, just code along with me; after the lesson you can work through the tutorial on your own to clear up anything you did not understand.

Accompanying material

Other Tutorials

Here are some good tutorials if you’re interested in learning a programming language

Getting help

Most programs come with a manual page that explains all of their options. You can get help on individual commands in the following ways:

  1. for command structure, variables, and shell rules: man bash
  2. bash builtins: help, and help <cmd>. For example, help for covers for loops, help if covers conditionals, and help cd covers changing directory.
  3. for help on external programs, like ls, grep, sed, awk you have to look at their manual pages: e.g. man sed
  4. for viewing the contents of a file on screen, use less
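
To decide between help and man for a given command, it helps to know whether the command is a builtin or an external program. A minimal sketch, assuming a bash shell, using the type builtin to check:

```shell
# 'type' reports how the shell interprets a name.
type cd     # reported as a shell builtin, so read: help cd
type ls     # reported with a path (external program), so read: man ls

# Show just the first few lines of the builtin help for cd.
help cd | head -n 3
```

For external programs the full manual (for example man ls) opens in a pager; press q to leave it.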

Programs like man and less show an on-screen navigation system:

Editing

We’ll have to edit files often in the course. You can edit files locally on your computer and copy them over (we show you how to copy files to the server in this topic). If you don’t have an editor on your laptop, we suggest Sublime Text or Visual Studio Code (VS Code). A simple text editor built into your OS, e.g. WordPad or gedit, will also do. Avoid Notepad and Word.

We also have several editors that you can run directly on the server. Editing directly on the server is usually faster, because debugging is iterative and you avoid copying the file over after every change.

Reference: Creating a script

For now you will be asked to type commands interactively, but in later topics you will be asked to create scripts. Here is an example of creating a bash script, whose name by convention ends with .sh.

# here we use nano, but you could use any other editor of choice
nano my_first_script.sh

If the file doesn’t exist, you will see an empty file. You can then type content (i.e. a series of bash commands) in the file. Example:
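
A minimal sketch of what such a file might contain. The commands are illustrative choices for this example and are not taken from the lesson:

```shell
# my_first_script.sh: a few bash commands, run from top to bottom
echo "Hello from my first script"

# print the numbers 1 to 3, one per line
seq 1 3

# count the entries in the current directory
ls | wc -l
```

Lines starting with # are comments and are ignored by bash.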

Save the file, and exit. You can then run this script with:

bash my_first_script.sh

If you add the special line #!/bin/bash (aka “hashbang”) at the top of your script, and mark the script executable (chmod +x my_first_script.sh), then you will be able to run it more easily:

./my_first_script.sh
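
The whole sequence can be sketched end to end as follows. This assumes bash is installed at /bin/bash and writes the script with a here-document instead of an editor so that the example is self-contained; the scratch directory and the echoed message are made up for the illustration:

```shell
# Work in a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Write a two-line script; the first line is the hashbang.
cat > my_first_script.sh <<'EOF'
#!/bin/bash
echo "running as an executable"
EOF

# Mark the script executable, then run it by path.
chmod +x my_first_script.sh
output=$(./my_first_script.sh)
echo "$output"
```

The leading ./ is needed because the current directory is normally not on the shell's search path for programs.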

If you have X11 Forwarding enabled, you can use graphical editors installed on the server:

 # emacs supports both terminal based and window (x11) based
 emacs my_first_script.sh

If you see a window come up, then your X forwarding is configured correctly. Otherwise the terminal version will come up. Graphical emacs looks like this (hit q to remove the welcome screen):

Reference: Copying files between servers (or between your computer and the server)

You can use cp to copy files within the same computer. To copy files across computers, you have to rely on networking tools. We have collected information on copying files in Copying across machines.
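
As a rough sketch of the difference: cp runs entirely on one machine, while a networking tool such as scp takes the same source-then-destination order but accepts remote paths. The user name, host name, and file names below are hypothetical, and the scp lines are left as comments because they need a real server:

```shell
# Create a small file in a scratch directory.
workdir=$(mktemp -d)
echo "sample data" > "$workdir/notes.txt"

# Same computer: cp SOURCE DESTINATION
cp "$workdir/notes.txt" "$workdir/notes_backup.txt"
ls "$workdir"

# Across computers (not run here): remote paths are written user@host:path.
# scp notes.txt your_user@server.example.org:~/
# scp your_user@server.example.org:~/results.txt .
```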

Pipes and redirection

A key feature of command line use is piping the output of one command into the input of another. This means that large files can be processed by several programs in sequence without writing intermediate results to disk.
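
A minimal sketch of both ideas, using only standard tools. The file name last_five.txt is made up for this example:

```shell
workdir=$(mktemp -d)

# Pipe: seq prints 1..100 and wc -l counts the lines it receives.
seq 1 100 | wc -l

# Chain several commands: keep the last five numbers, then sort them in reverse.
seq 1 100 | tail -n 5 | sort -rn

# Redirect: > writes the final output to a file instead of the screen.
seq 1 100 | tail -n 5 > "$workdir/last_five.txt"

# >> appends to the file rather than overwriting it.
echo "done" >> "$workdir/last_five.txt"
cat "$workdir/last_five.txt"
```

Only the last command in each pipeline writes to the screen or to a file; the intermediate results stay in memory.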

Key terms
sed

Stream editor. It parses and transforms text using regular expressions. Very powerful, but most easily used to reformat files based on patterns. Examples:
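
A few common sed patterns, as a sketch. The input strings are invented for illustration:

```shell
# Substitute the first match on each line: s/PATTERN/REPLACEMENT/
echo "chr1 chr2" | sed 's/chr/chromosome_/'

# Add the g flag to substitute every match on the line.
echo "chr1 chr2" | sed 's/chr/chromosome_/g'

# $ matches the end of the line, so this appends text to every line.
seq 1 3 | sed 's/$/ bp/'

# Delete lines that match a pattern (here: lines starting with #).
printf 'keep\n# comment\nkeep too\n' | sed '/^#/d'
```

By default sed prints the transformed text and leaves the input untouched.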

grep

Search using regular expression (regex). This command searches for patterns and prints lines that match your pattern. Examples:
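
A few common grep invocations, as a sketch. The file names.txt and its contents are hypothetical:

```shell
# Build a small input file of made-up sample names.
workdir=$(mktemp -d)
printf 'sample_A1\nsample_B2\ncontrol_A3\nsample_A10\n' > "$workdir/names.txt"

# Print lines containing "sample".
grep 'sample' "$workdir/names.txt"

# -v inverts the match: print lines that do NOT contain "sample".
grep -v 'sample' "$workdir/names.txt"

# -c counts matching lines; ^ anchors the pattern to the start of the line.
grep -c '^sample' "$workdir/names.txt"

# A followed by exactly one digit at the end of the line.
grep 'A[0-9]$' "$workdir/names.txt"
```

grep can also read from a pipe, so the file argument can be replaced by the output of another command.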

Exercise 1 – build a pipeline that:

Answer 1
seq 2 2 100 | grep -v 0 | sed 's/2$/2!/g' | grep '!\|3' > exercise_3.txt
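
One way to check a pipeline like this is to run it one stage at a time and look at the first few lines of each result. The sketch below does that for the answer above. It assumes GNU grep, since \| for "or" in a basic pattern is a GNU extension, and it counts lines at the end instead of writing a file:

```shell
# Stage 1: even numbers from 2 to 100 (seq FIRST STEP LAST).
seq 2 2 100 | head -n 5

# Stage 2: drop every line that contains a 0.
seq 2 2 100 | grep -v 0 | head -n 5

# Stage 3: append ! to lines that end in 2.
seq 2 2 100 | grep -v 0 | sed 's/2$/2!/g' | head -n 5

# Stage 4: keep only lines that contain ! or 3, then count them.
n=$(seq 2 2 100 | grep -v 0 | sed 's/2$/2!/g' | grep '!\|3' | wc -l)
echo "lines kept: $n"
```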

Running commands in background

Often you will run commands that take hours or days to finish. If you run them normally, your connection must stay open the whole time, which is often impractical. screen, tmux, and byobu let you keep a session running while you are logged out and reconnect to it later without losing any data.

byobu is a layer of veneer on top of screen/tmux. screen and tmux are equally powerful, but can be unintuitive to use.
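
For reference, the same idea can be sketched with plain tmux. This assumes tmux is installed and skips the demonstration otherwise; the session name long_job and the sleep command are made up for the example:

```shell
if command -v tmux >/dev/null 2>&1; then
    # Start a detached session running a command that outlives this terminal.
    tmux new-session -d -s long_job 'sleep 60'

    # List sessions; long_job should be listed even though nothing is attached.
    tmux list-sessions

    # Interactively you would reconnect with: tmux attach -t long_job
    # and detach again with Ctrl-b followed by d. Here we only clean up.
    tmux kill-session -t long_job
    tmux_checked=yes
else
    echo "tmux not found; skipping"
    tmux_checked=no
fi
```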

To cancel a command, press Ctrl-C. This stops a script or program that is currently running. Example:

> seq 1000000
ctrl-c to cancel

Byobu:

Guide to Byobu

Byobu can create multiple levels.

Commands in Byobu

We also provide the underlying command which performs the same action in tmux, in case you experience difficulties with your terminal and function keys.

Troubleshoot: Function keys broken: Byobu is tailored to Linux terminal emulators (especially gnome-terminal). If you find that the function keys don’t behave as expected when you’re logged in to the server, you might have to configure your terminal parameters to pass the correct escape codes. This is covered in Topic 1: finalize tool config.

Troubleshoot: Strange characters appear: The font in your terminal emulator needs to support Unicode characters. The font Ubuntu Mono is known to work well. If you find the lower bar distracting, you can run the command byobu-quiet. This can be undone with byobu-quiet --undo.

Exercise 2:

Answer 2
   > byobu 
   F2
   > seq 10000000
   F3
   F3
   > exit
   F6
   > byobu
   > exit

Daily Assignments

  1. What is one task for which you’d rather use an R script than a shell script? Why? What is one task for which you’d rather use a shell script than an R script? Why?
  2. Why is piping directly between programs faster than writing each consecutive output to the disk? Explain using information about computer hardware.