Working in CommandLine Interface, Shell, and Bash
This is the script for the talk given at IPM Observational Astronomy Meeting [^1]. It is based on the IAC SMACK talks [^2].
[^1] https://ipm-oam.github.io/presentations/2022/Working_in_Command-Line_Interface/ [^2] https://gitlab.com/makhlaghi/smack-talks-iac
Original author:
Pedram Ashofteh-Ardakani <pedramardakani@pm.me>
Contributing author(s):
Mohammad Akhlaghi <mohammad@akhlaghi.org>
Copyright (C) 2022 Free Software Foundation, Inc.
The content and the scripts in this work are free work: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This file is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this file. If not, see http://www.gnu.org/licenses/.
info bash
The _C_ommand-_L_ine _I_nterface (CLI) works with the keyboard, where you can get exactly what you need, manipulate it the way you want, and automate the procedure using scripts.
With a _G_raphical _U_ser _I_nterface (GUI), you are bound to what “the designers” have given you. You’re not in control of what you see and can’t automate the procedure. See how hard it is to test a GUI program (I feel you, “front-end developers”). There are sophisticated tools created just for testing GUIs, and still it is not easy.
Not to mention how slow you can be when chasing the mouse around the screen. Sure, “seeing” the commands can be convenient, but only for when you’re new to the ecosystem and don’t know the program’s capabilities.
One more thing: a GUI changes from one program, or even one version, to another. With one upgrade, you might completely lose your way around. A well-designed CLI, however, has a backward-compatible nature. Which means, after an upgrade you are still on track while enjoying the new features.
Finally, in the CLI you can pass the results of your work from one program to another. This is called “piping”: an awesome and extremely useful feature that is impossible by design in a GUI.
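For a first taste of piping, here is a one-liner chaining four small programs; /etc/passwd is used only because it exists on virtually every GNU/Linux system:

```shell
# List how many distinct login shells are configured on this system:
# 'cut' keeps the 7th ':'-separated field of /etc/passwd,
# 'sort' orders the lines, 'uniq' collapses the duplicates,
# and 'wc -l' counts what is left.
cut -d: -f7 /etc/passwd | sort | uniq | wc -l
```

Each program does one small job; the pipe hands its output to the next.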
info bash
Bash is just one form of shell, hence the name Bourne-Again SHell (pun intended in the naming). There are older shells such as sh, and more recent ones such as zsh (pronounced zee-shell).
The purpose of the shell is to link different programs using the unified standard input (or input files) and standard output (or output files). Therefore, unlike common “programming” languages like C or Python, numbers for example are just strings of characters (which happen to be digits).
Of course, it also has structures similar to C or Python (like if or for); but in the shell these are higher-level constructs to make it easy to conditionally run programs or run them in a loop.
Learning about the basic capabilities such as loops and conditionals goes a long way. But you’re not limited to that. There are many GNU programs and other CLI tools that come to your aid. Without them, you might need thousands of lines of code to accomplish the simplest tasks.
Bash enables you to unleash the power of modularity. That is, you have access to many programs that are specialized to do one and only one job the best way possible. Combining these tools will liberate you from using various libraries and plug-ins for the programming language you use.
Python has its own merits. It’s great for quickly developing solutions. However, it evolves quickly and sometimes breaks backward compatibility. Your Python script might not work on an older computer, or even on your own computer a few months (or two years at most) from now!
The transition from Python 2 to Python 3 has not been easy for software that depended on it. For instance, have a look at the hardship the Large Synoptic Survey Telescope (LSST) pipeline went through [1]. It took almost a decade and a lot of extra engineering to make the update without crashing the whole thing while it was running on real data from the Subaru Telescope’s Hyper Suprime-Cam, which was using the LSST pipeline in those years (and still is!).
[1] https://arxiv.org/abs/1712.00461
In contrast, it is most likely that your bash script executes just fine on a computer as old as 1998! This backward compatibility is a key backbone of reproducibility, an important element that is missing in many scientific endeavors these days.
So be nice to yourself, and use more stable and reproducible programming languages for the long run.
Another issue with Python is that most of its modules are themselves written in Python, needing many layers of interpretation before your command reaches the CPU. On the other hand, bash mostly runs compiled programs that talk directly to the kernel.
Compiled programs are in general faster than interpreted programs, since they have already been translated to the specific machine you’re using. With this, you are able to fine-tune your programs and follow the best practices that allow for extremely useful optimizations. For example, in the Warp program of Gnuastro, we are dealing with tens of millions of pixels, and hundreds of millions of operations on them. Even 1 microsecond of delay at this huge scale could result in at least one extra minute, while 1 millisecond means 18 extra hours of runtime!
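As a rough, machine-dependent illustration of the interpreted-versus-compiled gap, you can time the same million-element sum done by one compiled tool (AWK) and by a pure shell loop; the exact figures will differ on every computer, but the ordering won’t:

```shell
# Sum 1..1000000 inside AWK: one compiled process does the whole loop.
time seq 1000000 | awk '{s += $1} END {print s}'

# The same sum as an interpreted shell loop: every iteration goes
# through the shell's parser, so expect it to be much slower.
time { s=0; for i in $(seq 1000000); do s=$((s+i)); done; echo $s; }
```

Both print 500000500000; only the elapsed times differ.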
Bash is a layer of abstraction over a very powerful set of tools. Don’t be fooled by its easy looks and syntax. As soon as you get to know your way around the manual, it can do wonders for you.
With that said, let’s dive in.
# Later I want to show the convenience of using 'alias'
unalias ls ll
https://archive.stsci.edu/hlsp/uvudf
- Where are we?
# Kernel name (Linux, Darwin, etc.)
uname --kernel-name
# Operating system (GNU/Linux, macOS, etc.)
uname --operating-system
# Current location (i.e. the present working directory)
pwd
- Who goes there?!
ls
ls --help
ls --color
ls -l
ls -ltrha --color
- But this is a lot of options to remember, and a lot to type, so let’s set an alias for ls and ll:
alias ls="ls --color"
alias ll="ls -ltrha --color"
- Change directory
cd w/bash-tutorial
- Check download URL
cat url.txt
- Create directory
mkdir dataset
ls
- Download the Hubble Space Telescope (HST) UVUDF survey catalog.
# The UVUDF survey catalog: https://archive.stsci.edu/prepds/uvudf
curl https://archive.stsci.edu/missions/hlsp/uvudf/v2.0/hlsp_uvudf_hst_wfc3-uvis_udf-epoch3_multi_v2.0_cat.fits
# We could have given it the output name in the first place by passing the '-o' option
curl -o catalog-raw.fits \
     https://archive.stsci.edu/missions/hlsp/uvudf/v2.0/hlsp_uvudf_hst_wfc3-uvis_udf-epoch3_multi_v2.0_cat.fits
- Copy the catalogue file with a better name
cd ~/w/bash-tutorial
cp dataset/hlsp_uvudf_hst_wfc3-uvis_udf-epoch3_multi_v2.0_cat.fits catalog-raw.fits
- Convert to text
# Just bear with me, we're creating a human readable file from a binary
# FITS format using Gnuastro's Table program. You'll learn about it in
# future sessions.
asttable catalog-raw.fits --txtf64format fixed -o catalog-raw.txt
- Inspect the file with less
less catalog-raw.txt
- Print the first 97 rows
head -97 catalog-raw.txt
- They all start with ‘#’, so we can get them with grep as well (no need to speculate)
# Contains the word 'Column' (case sensitive)
grep Column catalog-raw.txt
# Use the --color option to see the matches grep --color Column catalog-raw.txt
# Or make it case insensitive grep -i column catalog-raw.txt
Note that simply writing # would return an error, since the pound sign has a special meaning: “comment”. Comments are parts of a line that the shell ignores: bash throws away whatever comes right after the pound sign. To avoid that, we sandwich the ‘#’ between single quotes. This comes up whenever you’re looking for non-alphabetic characters, as they might have special meanings. Be careful and sandwich them between ‘single quotes’.
# Bad form
grep # catalog-raw.txt
# Correct form
grep -e '#' catalog-raw.txt
# [Advanced] use regex to match lines that start with the pound sign '#'
# (read more about regular expressions in the grep manual).
grep -e '^#' catalog-raw.txt
- Now write that to a new file, and write the body to another file as well
grep -e '^#' catalog-raw.txt > header.txt
grep -ve '^#' catalog-raw.txt > data.txt
- Let’s check the header again, this time with more and cat
cat header.txt
more header.txt
less header.txt
Note that if we don’t add the .txt extension, nothing bad happens! The computer doesn’t care: it knows what these files contain. Extensions are only for us humans, and they can be helpful when categorizing files. Wanna try? See:
file header.txt
file catalog-raw.fits
- The data has many occurrences of -99 and 99, which are intended as placeholders for values that are not actually available. But leaving them as numbers can ruin our statistics without failing (a logical error, the nastiest kind of error). So let’s replace them with nan, as in ‘Not a Number’:
# See that the -99 are replaced with nan
sed -e's/ -99 / nan /g' data.txt
# But we need to store this data somewhere; also, we need to replace
# 99 and the floating point -99.0000000000000 (and the positive number)
# as well! So let's combine all of these criteria inside one 'sed' call
sed -e's/ -\?99 / nan /g' -e's/ -\?99.0*0 / nan /g' \
    data.txt > catalog.txt
- Now, let’s say we need to extract the spectroscopic redshifts, denoted by SPECZ, from the raw catalog. First, we’d have to figure out the column number. But instead of scrolling through the 97 columns, let’s just grep it!
# Note that the order of the options could matter; in this case, it doesn't.
grep SPECZ header.txt
# Let's put it in a new file
grep -i 'specz ' header.txt > select.txt
# Check available filters
grep -i mag_ header.txt
# Let's get the 435 filter as well
grep -i mag_f435w header.txt
# Suppose there's a lot of them and we can't just remember them. Let's
# put it in a new file for later reference:
grep -i mag_f435w header.txt > select.txt
# BUT WAIT! It just overwrites the file! So we'd have to append with >>
rm select.txt
grep -i ' id ' header.txt > select.txt
grep -i ' specz ' header.txt >> select.txt
grep -i 'mag_f435w' header.txt >> select.txt
grep -i 'mag_f606w' header.txt >> select.txt
grep -i 'mag_f775w' header.txt >> select.txt
How can we show them at the same time? Use the pipe | character as “or” in the regular expression. Since grep’s basic regex treats it as an ordinary character, we need to escape it with a backslash \ to get the special “or” meaning:
grep -i -e'mag_f435\|mag_f606' header.txt
Feeling bad about all the new information? You can get all of the information from here:
info grep
- How about putting some columns in a separate file?
Even better, let’s do some arithmetic over them simultaneously!
awk '{print $1}' catalog.txt
# [Advanced] We actually didn't need to put the data in a separate file
# just to use AWK more easily. AWK takes regex as well. For example
# (writing to a new file, so we don't clobber the full catalog.txt):
awk '!/^#/{print $1}' catalog-raw.txt > ids.txt
less ids.txt
See how the regex seems similar in both grep and awk? This happens a lot: when you learn a concept, it usually applies to other programs as well, especially in the GNU family.
# Get the ID, SPECZ, F435W, F606W, F775W. We want the ID so we can
# identify the final results for later use
cat select.txt
awk '{print $1, $94, $10, $11, $12}' catalog.txt
# But I don't want to see all of them; just the last line would
# suffice. How can we use "tail" here? Use the pipe "|"!
awk '{print $1, $94, $10, $11, $12}' catalog.txt | tail -1
# Let's calculate F435W-F775W to estimate "color"
awk '{print $1, $94, $10, $11, $12, $10-$12}' catalog.txt | tail -1
awk '{print $1, $94, $10, $11, $12, $10-$12}' catalog.txt > magnitudes.txt
- Now select the reddest objects
# We're saying: where the 6th column is greater than 3, print it
# (the default behavior)
awk '$6>3' magnitudes.txt
# Explicitly print all columns (that's $0)
awk '$6>3 {print $0}' magnitudes.txt
# Only their ID and SPECZ
awk '$6>3 {print $1, $2}' magnitudes.txt
# Save them in a file
awk '$6>3' magnitudes.txt > reddest.txt
But it has lots of ‘nan’ values, let’s filter them out as well:
# Add conditions. Also, "nan" is a string, so sandwich it between
# double quotes. In AWK, single quotes have a special meaning: they
# mark the start and end of the commands, so let's be nice and not
# confuse it.
awk '$6>3 && $2!="nan"' magnitudes.txt
It is OK, let’s put it in another catalog:
awk '$6>3 && $2!="nan"' magnitudes.txt > reddest-with-z.txt
- Count how many objects we’ve got so far:
# Use word count
wc reddest-with-z.txt
# Also, open the help and check the options
wc --help
# Now check lines, characters, etc. for the demo
wc -l reddest-with-z.txt
# Compare with the previous catalog
wc -l magnitudes.txt
- Now let’s sort by SPECZ in ascending order
sort -nk2 reddest-with-z.txt
- How do we get the object with the max redshift?
sort -nk2 reddest-with-z.txt | tail -1
- What is its value?
sort -nk2 reddest-with-z.txt | tail -1 | awk '{print $2}'
- We only need 3 decimals:
sort -nk2 reddest-with-z.txt | tail -1 | awk '{printf "%.3f\n", $2}'
- Sneak peak at Gnuastro’s Table program:
# Bug in Table's range! I used grep since '--range=SPECZ,-98,98'
# printed the '99' values as well!
asttable catalog-raw.fits -cID,SPECZ,10,11,12,'arith $10 $12 -' --sort=SPECZ \
    | asttable --range=6,3:inf --txtf64format fixed \
    | grep -ve' -\?99.0*0 '
- Let’s say we’d want a random floating point number as the last column when we’re creating mock galaxies, etc.
How do we create random numbers?
First we’d need to learn about regular and special variables: how do we get or set them? There are rules for that:
- Names start with letters (case sensitive), never with a digit; to split words, use the underscore “_” character:
foo=1
Foo=2
echo $foo
echo $Foo
2a=5   # We get an error here!
response="YAY!"
echo $response
echo "$USER: is this fun?"
echo "audience: $response"
- Simple arithmetic only works with integers, NOT floating points!
echo $(( 5+12 ))
echo $(( $foo+$Foo ))
# Put this into another variable
bar=3
baz=17
foo=$(( $bar+$baz ))
echo $foo
echo "Variable foo is: $foo"
- How do I deal with floating point arithmetic you say? Use AWK ;-)
echo | awk '{print 1.2 * 10}'
- Random numbers
echo $RANDOM
- How do I know this? Cheating of course:
# Go to the 'Shell Variables' section and find RANDOM; note the bounds:
# the range is from '0' up to '32767'
info bash
Notice that the internal variables are in all caps. Using ALLCAPS variable names is discouraged, since you might accidentally overwrite a critical shell variable! So please just use lower-case variable names.
echo $PWD
echo $USER
echo $PATH
echo $PS1
PS1="\[\033[01;35m\]OAM$ \[\033[00m\]"
# Also, you can check all the exported variables using 'export'
export
- Random number up to 100
echo $(( $RANDOM%100 ))
- Now let’s use awk to add a column of random numbers
awk '{print $0}' reddest-with-z.txt
awk '{print $0, rand()}' reddest-with-z.txt
# If we run it again, you can see that the random numbers are actually
# the same! This is because AWK uses the same random seed, to make
# random numbers 'reproducible'. If you want to actually change the
# random numbers on every execution, you must change the random seed
awk '{print $0, rand()}' reddest-with-z.txt
awk 'BEGIN{srand('$RANDOM')}{print $0, rand()}' reddest-with-z.txt
# Now test it again
awk 'BEGIN{srand('$RANDOM')}{print $0, rand()}' reddest-with-z.txt
awk 'BEGIN{srand('$RANDOM')}{print $0, rand()}' reddest-with-z.txt
awk 'BEGIN{srand('$RANDOM')}{print $0, rand()}' reddest-with-z.txt
# It changes! Now let's format the numbers so we can read them
# easily. Let's say we are only interested in ID, SPECZ, and the random
# number
awk 'BEGIN{srand('$RANDOM')} \
     {printf "%-8d%-10.3f%-10.3f\n", $1, $2, rand()}' \
    reddest-with-z.txt
- The holy if
# Simple
if [ 5 -gt 2 ]; then echo "Duh"; else echo "Seriously?"; fi
# Now use a variable
x=$RANDOM
if [ $x -gt 16000 ]; then echo "TOPHALF :-D $x"; else echo "BOTTOMHALF :-( $x"; fi
# You could also check whether a file exists, a string is matched,
# etc. Where to get the info? The info! Open bash and search for
# 'conditional constructs'.
info bash
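For instance, here is a sketch of the two tests just mentioned: a file-existence check with '-f' and a string comparison with '=' (header.txt is the file we created earlier):

```shell
# Does the file exist?
if [ -f header.txt ]; then echo "header.txt is here"; else echo "no header yet"; fi

# Does a string match? Quote the variable so empty values don't break the test.
shell="bash"
if [ "$shell" = "bash" ]; then echo "Hello, $shell"; fi
```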
- The while loop
# Just print the ID and spectroscopic redshift
cat magnitudes.txt | while read -r id z rest_of_line; \
    do echo "Object $id redshift $z"; done
# Now put each value in its own file!
mkdir sample
ls
cat magnitudes.txt | while read -r id z rest; \
    do echo "$id $z" > sample/$id.txt; done
# Similarly you can achieve the same with AWK
rm sample/*
ls sample/
awk '{print $1, $2 > "sample/"$1".txt"}' magnitudes.txt
- The for loop: set the index and the iterable:
# My Very Educated Mom Just Served Us Nine Pizzas
for planet in Mars Venus Earth Mercury; do echo "Hi $planet"; done
# Or even list the files here
for f in $(ls); do echo "file: $f"; done
# BEWARE of white space in filenames as well! It's good practice to
# use a dash '-' instead of white space.
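To see why the warning matters, here is a demonstration with a hypothetical file name that contains a space: the $(ls) form splits it into two words, while a plain glob (with "$f" quoted) keeps the name whole:

```shell
# Create a demonstration file whose name contains a space:
touch 'spaced name.txt'

# Word splitting breaks $(ls): the name arrives as two separate items
for f in $(ls); do echo "item: $f"; done

# A glob keeps each name whole; always quote "$f" when using it
for f in *; do echo "file: $f"; done

# Clean up the demonstration file
rm 'spaced name.txt'
```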
Now let’s print a sequence, using … seq!
seq 5
seq 10
seq 5 10
seq 5 0.5 10
Again, in the for loop:
for i in $(seq 5); do echo "Galaxy $i"; done
Now check for some ids in the samples
for i in $(seq 20); do echo "Sample $i" ; cat sample/$i.txt ; done
Some samples did not exist! Let’s check for their existence first, and then print the details
for i in $(seq 20); do if [ -f sample/$i.txt ]; then echo "Sample $i" ; cat sample/$i.txt ; fi; done
Let’s say you’ve now done some analysis and you’d like to archive it or send it to a colleague. Instead of sending it in its full size, you can compress it to avoid wasting space on the disk!
# Check the initial size
ls -lh catalog-raw.txt
du -h catalog-raw.txt
# Compress and check again
gzip catalog-raw.txt
ls -lh
# De-compress
gunzip catalog-raw.txt.gz
How about all the files we just created? Let’s put them into a tarball so it becomes a single file
tar -cvf my-discovery.tar *.txt
mkdir unpack
cd unpack
tar -xf ../my-discovery.tar
As you’ve already guessed, this can be compressed as well
cd ..
file my-discovery.tar
gzip my-discovery.tar
file my-discovery.tar.gz
ls -lh *.gz
Or all in one command
# Remove the previous compressed tarball
rm my-discovery.tar.gz
# Create a new compressed tarball in one command ('-c' creates, and
# '-a' picks the compression from the file name suffix)
tar -cvaf my-discovery.tar.gz *.txt
ls -lh *.gz
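If you just want to peek inside a tarball without unpacking it, 't' lists the contents ('f' names the archive, as before):

```shell
# List the files inside the compressed tarball without extracting them
tar -tf my-discovery.tar.gz
```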
Now this is how bash figures out what the last command was:
history
Now check how many times we’ve called awk
history | grep awk
history | grep awk | wc -l
history | grep awk > hist-awk.txt
You can even search inside your history when you’re on the CLI, using Ctrl+r
Ctrl+r <part of the command>
Ctrl+r asttable
man awk
info awk
awk --help
If you’ve learned nothing, it doesn’t matter, take your time and watch the video, or even look for other tutorials.
Beware of “why should I care!? I’m not a programmer!”. If you’re writing a program, you’re doing a programmer’s work.
Two idioms:
- One who has a hammer sees everything as nails.
- Do not reinvent the wheel.
Physicists are famous for solving complex problems. They break the problem down into smaller, solvable chunks.
For instance, you get to a point where you must calculate an irregular area. The physicist’s art is done. Now you must figure out the answer with mathematics. An expert has already invented a solution. You know how to calculate the area of a simple rectangle! Divide the irregular shape into infinitesimal parts and integrate over them! Remember: you’re not a mathematician, and probably not one who can derive all the formulas from scratch anyway. But you’re using the tools.
- Clean coding