LICENSING

Working in Command-Line Interface, Shell, and Bash

This is the script for the talk given at the IPM Observational Astronomy Meeting [^1]. It is based on the IAC SMACK talks [^2].

[^1] https://ipm-oam.github.io/presentations/2022/Working_in_Command-Line_Interface/
[^2] https://gitlab.com/makhlaghi/smack-talks-iac

Original author:

Pedram Ashofteh-Ardakani <pedramardakani@pm.me>

Contributing author(s):

Mohammad Akhlaghi <mohammad@akhlaghi.org>

Copyright (C) 2022 Free Software Foundation, Inc.

The content and the scripts in this work are free: you can redistribute them and/or modify them under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This file is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this file. If not, see http://www.gnu.org/licenses/.

Command-line interface and Shell

info bash

Command-Line Interface (CLI) works with the keyboard: you can get exactly what you need, manipulate it the way you want, and automate the procedure using scripts.

With a Graphical User Interface (GUI), you are bound to what “the designers” have given you. You’re not in control of what you see and can’t automate the procedure. See how hard it is to test a GUI program (I feel you, “front-end developers”). There are sophisticated tools created just for testing GUIs, and still, they are not easy to use.

Not to mention how slow you can be when chasing the mouse around the screen. Sure, “seeing” the commands can be convenient, but only when you’re new to the ecosystem and don’t know the program’s capabilities.

One more thing: a GUI changes from one program, or even one version, to another. With one upgrade, you might completely lose your way around. A well-designed CLI, however, is backward compatible, which means that after upgrades you are still on track while enjoying the new features.

Finally, in the CLI you can pass the results of your work from one program to another. This is called “piping”: an awesome and extremely useful feature that is impossible by design in a GUI.
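
As a minimal taste (assuming you are in a directory with a few files in it), here is how the output of one program becomes the input of the next:

# 'ls' writes the file names to its standard output; the pipe '|'
# feeds them into 'wc -l', which counts the lines it receives
ls | wc -l

# Chain as many programs as you like: sort the names, show the last 3
ls | sort | tail -3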

Bash

info bash

Bash is just one kind of shell; hence the name Bourne-Again SHell (pun intended). There are older shells such as sh and more recent ones such as zsh (pronounced zee-shell).

The purpose of the shell is to link different programs using the unified standard input (or input files) and standard output (or output files). Therefore, unlike common “programming” languages like C or Python, numbers, for example, are just strings of characters (which happen to be digits).
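
A tiny demonstration of that: putting two variables next to each other concatenates their characters, it does not add them.

a=5
b=6
# Prints '56' (string concatenation), not 11
echo $a$b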

Of course, it also has structures similar to C or Python (like if or for); but in the shell, these are higher-level constructs that make it easy to run programs conditionally or in a loop.
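
A quick preview, using hypothetical file names (both constructs are demonstrated in detail later in this script):

# Run a program conditionally: report only when a file exists
if [ -f catalog.txt ]; then echo "catalog.txt is here"; fi

# Run a program in a loop, once per text file
for f in *.txt; do wc -l "$f"; done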

Learning the basic capabilities such as loops and conditionals goes a long way. But you’re not limited to that: there are many GNU programs and other CLI tools that come to your aid. Without them, you might need thousands of lines of code to accomplish the simplest tasks.

Bash enables you to unleash the power of modularity. That is, you have access to many programs, each specialized to do one and only one job in the best way possible. Combining these tools liberates you from hunting for libraries and plug-ins for whatever programming language you use.

I already know Python, so why should I care?

Python has its own merits. It’s great for quickly developing solutions. However, it evolves quickly and sometimes breaks backward compatibility. Your Python script might not work on an older computer, or even on your own computer a few months (or two years at most) from now!

The transition from Python 2 to Python 3 has not been easy for software that depended on it. For instance, have a look at the hardship the Large Synoptic Survey Telescope pipeline went through [1]. It took almost a decade and a lot of extra engineering effort to update it without crashing the whole thing while it was running on real data: the Subaru Telescope’s Hyper Suprime-Cam was using the LSST pipeline in those years, and it still is!

[1] https://arxiv.org/abs/1712.00461

In contrast, your bash script will most likely run just fine even on a computer from 1998! This backward compatibility is a key backbone of reproducibility, an important element that is missing in many of the scientific endeavors done these days.

So be nice to yourself, and use more stable and reproducible programming languages for the long run.

Another issue with Python is that most of its modules are themselves written in Python, needing many layers of interpretation before your command reaches the CPU. Bash, on the other hand, mostly runs compiled programs that talk directly to the kernel.

Compiled programs are generally faster than interpreted ones, since their source code has already been translated for the specific machine you’re using. This also lets developers fine-tune their programs and follow the best practices that allow for extremely useful optimizations. For example, in Gnuastro’s Warp program we deal with tens of millions of pixels, and hundreds of millions of operations on them. At this scale, even 1 microsecond of delay per operation adds at least one extra minute, and 1 millisecond means 18 extra hours of runtime!
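
You can get a feel for this with the time built-in. A rough sketch (it assumes python3 is installed, and uses the catalog-raw.txt file produced later in this tutorial, so come back to it after the analysis):

# Count the comment lines with a compiled C program (grep)...
time grep -c '^#' catalog-raw.txt

# ...and with an interpreted one-liner doing the same job
time python3 -c "print(sum(1 for l in open('catalog-raw.txt') if l.startswith('#')))"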

Bash is a layer of abstraction over a very powerful set of tools. Don’t be fooled by its easy looks and syntax. As soon as you get to know your way around the manual, it can do wonders for you.

With that said, let’s dive in.

Get the terminal ready for presentation

# Later I want to show the convenience of using 'alias'
unalias ls ll

Goal

Check the HST UVUDF data release page: https://archive.stsci.edu/hlsp/uvudf

Part zero, moving around the command line

  1. Where are we?
    # Kernel name (Linux, Darwin, etc.)
    uname --kernel-name
        
    # Operating system (GNU/Linux, macOS, etc.)
    uname --operating-system
        
    # Current location (i.e. the present working directory)
    pwd
        
  2. Who goes there?!
    ls
    ls --help
    ls --color
    ls -l
    ls -ltrha --color
        
  3. But that’s a lot of options to remember, and a lot to type, so let’s define ls and ll as aliases
    alias ls="ls --color"
    alias ll="ls -ltrha --color"
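
    # To keep these aliases for future sessions, you can append them to
    # your ~/.bashrc, which bash reads at the start of every
    # interactive session. For example:
    echo 'alias ll="ls -ltrha --color"' >> ~/.bashrc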
        
  4. Change directory
    cd w/bash-tutorial
        
  5. Check download URL
    cat url.txt
        
  6. Create directory
    mkdir dataset
    ls
        
  7. Download the Hubble Space Telescope (HST) UVUDF survey catalog.
    # The UVUDF survey catalog: https://archive.stsci.edu/prepds/uvudf
    # Note that without an output option, curl streams the download (a
    # binary file!) straight to your terminal's standard output
    curl https://archive.stsci.edu/missions/hlsp/uvudf/v2.0/hlsp_uvudf_hst_wfc3-uvis_udf-epoch3_multi_v2.0_cat.fits

    # Give it an output name with the '-o' option; here we save it
    # under the 'dataset' directory with its original name
    curl -o dataset/hlsp_uvudf_hst_wfc3-uvis_udf-epoch3_multi_v2.0_cat.fits \
         https://archive.stsci.edu/missions/hlsp/uvudf/v2.0/hlsp_uvudf_hst_wfc3-uvis_udf-epoch3_multi_v2.0_cat.fits
        

The analysis

  1. Copy the catalog file to a better name
    cd ~/w/bash-tutorial
    cp dataset/hlsp_uvudf_hst_wfc3-uvis_udf-epoch3_multi_v2.0_cat.fits catalog-raw.fits
        
  2. Convert to text
    # Just bear with me: we're creating a human-readable file from the
    # binary FITS format using Gnuastro's Table program. You'll learn
    # about it in future sessions.
    asttable catalog-raw.fits --txtf64format fixed -o catalog-raw.txt
        
  3. Inspect the file with less
    less catalog-raw.txt
        
  4. Print the first 97 rows
    head -97 catalog-raw.txt
        
  5. They all start with ‘#’, so we can get them with grep as well (no need to guess how many there are)
    # Contains the word 'Column' (case sensitive)
    grep Column catalog-raw.txt
        
    # Use the --color option to see the matches
    grep --color Column catalog-raw.txt
        
    # Or make it case insensitive
    grep -i column catalog-raw.txt
        

    Note that simply writing # would return an error, since the pound sign has a special meaning: “comment”. Comments are the parts of a line that the shell ignores; bash throws away whatever comes after the pound sign. To avoid that, we sandwich the ‘#’ in single quotes. The same issue comes up whenever you look for non-alphabetic characters, since they may have special meanings: be careful and sandwich them between ‘single quotes’.

    # Bad form
    grep # catalog-raw.txt
        
    # Correct form
    grep -e '#' catalog-raw.txt
        
    # [Advanced] use regex to say lines that start with the pound sign '#'
    # (read more about Regular expressions in grep manual).
    grep -e '^#' catalog-raw.txt
        
  6. Now write that to a new file, and write the body to another file as well
    grep -e  '^#' catalog-raw.txt > header.txt
    grep -ve '^#' catalog-raw.txt > data.txt
        
  7. Let’s check the header again, this time with cat, more, and less
    cat header.txt
    more header.txt
    less header.txt
        

    Note that if we don’t add the .txt extension, nothing bad happens! The computer doesn’t care: it knows what these files contain. Extensions are only for us humans, and they can be helpful when categorizing files. Wanna try? See:

    file header.txt
    file catalog-raw.fits
        
  8. The data has many occurrences of -99 and 99, which are intended to mark values that are not actually available. But having them as numbers can ruin our statistics without any visible failure (a logical error, the nastiest kind of error). So let’s replace them with nan, as in ‘Not a Number’:
    # See that the -99 are replaced with nan
    sed -e's/ -99 / nan /g' data.txt
        
    # But we need to store this data somewhere. Also, we need to
    # replace 99 and the floating-point -99.0000000000000 (and its
    # positive counterpart) as well! So let's combine all of these
    # criteria in one 'sed' call
    sed -e's/ -\?99 / nan /g' -e's/ -\?99.0*0 / nan /g' \
        data.txt > catalog.txt
        
  9. Now, let’s say we need to extract the spectroscopic redshifts, denoted by SPECZ, from the raw catalog. First, we have to figure out the column number. But instead of scrolling through the 97 columns, let’s just grep it!
    # Note that the order of the options can matter; in this case, it doesn't.
    grep SPECZ header.txt
        
    # Let's put it in a new file
    grep -i 'specz ' header.txt > select.txt
        
    # Check available filters
    grep -i mag_ header.txt
        
    # Let's get the 435 filter as well
    grep -i mag_f435w header.txt
        
    # Suppose there are a lot of them and we can't just remember them
    # all. Let's put them in a new file for later reference:
    grep -i mag_f435w header.txt > select.txt
        
    # BUT WAIT! That just overwrites the file! To append instead, use >>
    rm select.txt
    grep -i ' id '      header.txt >  select.txt
    grep -i ' specz '   header.txt >> select.txt
    grep -i 'mag_f435w' header.txt >> select.txt
    grep -i 'mag_f606w' header.txt >> select.txt
    grep -i 'mag_f775w' header.txt >> select.txt
        

    How can we match both at the same time? Use the pipe | character as an “or”. In grep’s basic regular expressions, the pipe is not special on its own, so we must prefix it with a backslash \ to turn it into the alternation operator:

    grep -i -e'mag_f435\|mag_f606' header.txt
        

    Feeling overwhelmed by all the new information? You can get all of it from here:

    info grep
        
  10. How about putting some columns in a separate file? Even better, let’s do some arithmetic on them at the same time!
    awk '{print $1}' catalog.txt
        
    # [Advanced] We actually didn't need to put the data in a separate
    # file just to use AWK more easily: AWK takes regular expressions
    # as well. For example, skip the comment lines and print the first
    # column. (We only page it with 'less' here; redirecting it into
    # catalog.txt would clobber the nan-replaced catalog from the
    # previous step!)
    awk '!/^#/{print $1}' catalog-raw.txt | less
        

    See how the regex looks similar in both grep and awk? This happens a lot: when you learn a concept, it usually applies to other programs as well, especially in the GNU family.
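
    A quick side-by-side (both commands below select the same lines):

    grep 'SPECZ'   header.txt
    awk  '/SPECZ/' header.txt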

    # Get the ID, SPECZ, F435W, F606W, F775W. We want ID so we can identify
    # the final results for later use
    cat select.txt
    awk '{print $1, $94, $10, $11, $12}' catalog.txt
        
    # But I don't want to see all of them, just the last line would
    # suffice. How can we use "tail" here? Use the pipe "|"!
    awk '{print $1, $94, $10, $11, $12}' catalog.txt | tail -1
        
    # Let's calculate F435W-F775W to estimate "color"
    awk '{print $1, $94, $10, $11, $12, $10-$12}' catalog.txt | tail -1
    awk '{print $1, $94, $10, $11, $12, $10-$12}' catalog.txt > magnitudes.txt
        
  11. Now select the reddest objects
    # We're saying: where the 6th column is greater than 3, print the
    # line (the default action)
    awk '$6>3' magnitudes.txt
        
    # Explicitly print the whole line (that's $0)
    awk '$6>3 {print $0}' magnitudes.txt
        
    # Only their ID and SPECZ
    awk '$6>3 {print $1, $2}' magnitudes.txt
        
    # Save them in a file
    awk '$6>3' magnitudes.txt > reddest.txt
        

    But it has lots of ‘nan’ values, so let’s filter those out as well:

    # Add more conditions. Also, "nan" is a string, so sandwich it
    # between double quotes; the single quotes mark the start and end
    # of the AWK program for the shell, so let's be nice and not
    # confuse it.
    awk '$6>3 && $2!="nan"' magnitudes.txt
        

    That looks right, so let’s put it in another catalog:

    awk '$6>3 && $2!="nan"' magnitudes.txt > reddest-with-z.txt
        
  12. Count how many objects we’ve got so far:
    # Use word count
    wc reddest-with-z.txt
        
    # Also, open the help and check the options
    wc --help
        
    # Now check lines, characters, etc. for demo
    wc -l reddest-with-z.txt
        
    # Compare with previous catalog
    wc -l magnitudes.txt
        
  13. Now let’s sort by SPECZ in ascending order
    sort -nk2 reddest-with-z.txt
        
  14. How do we get the object with the max redshift?
    sort -nk2 reddest-with-z.txt | tail -1
        
  15. What is its value?
    sort -nk2 reddest-with-z.txt | tail -1 | awk '{print $2}'
        
  16. We only need 3 decimals:
    sort -nk2 reddest-with-z.txt | tail -1 | awk '{printf "%.3f\n", $2}'
        
  17. Sneak peek at Gnuastro’s Table program:
    # Bug in Table's range option! I used grep, since
    # '--range=SPECZ,-98,98' printed the '99' values as well!
    asttable catalog-raw.fits -cID,SPECZ,10,11,12,'arith $10 $12 -' --sort=SPECZ \
    | asttable  --range=6,3:inf --txtf64format fixed \
    | grep -ve' -\?99.0*0 '
        

Variables

  1. Let’s say we want a random floating-point number as the last column when we’re creating mock galaxies, etc. How do we create random numbers?

    First, we need to learn about regular and special variables: how do we get or set them? There are rules for that:

    • Names start with letters (case sensitive); to separate words, use the underscore “_” character:
      foo=1
      Foo=2
      echo $foo
      echo $Foo
      2a=5
      # We get an error here!
      response="YAY!"
      echo $response
      echo "$USER: is this fun?"
      echo "audience: $response"
              
  2. Simple arithmetic; this only works with integers, NOT floating point!
    echo $(( 5+12 ))
    echo $(( $foo+$Foo ))
        
    # Put this into another variable
    bar=3
    baz=17
    foo=$(( $bar+$baz ))
    echo $foo
    echo "Variable foo is: $foo"
        
  3. How do I deal with floating-point arithmetic, you say? Use AWK ;-)
    echo | awk '{print 1.2 * 10}'
        
  4. Random numbers
    echo $RANDOM
        
  5. How do I know this? By cheating, of course:
    # Go to the 'Shell Variables' section and find RANDOM; it gives the
    # bounds: the range is from '0' up to '32767'
    info bash
        

    Notice that the internal variables are in all caps. Using ALL-CAPS names for your own variables is discouraged, since you might accidentally overwrite a critical shell variable! So please just use lower-case variable names.

    echo $PWD
    echo $USER
    echo $PATH
    echo $PS1
    PS1="\[\033[01;35m\]OAM$ \[\033[00m\]"
        
    # Also, you can list all the exported variables using 'export'
    export
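
    # [Danger demo] This is why ALL-CAPS names are risky: clobbering
    # PATH makes the shell unable to find any program. If you try this,
    # do it in a throw-away terminal, and restore it right after!
    old_path=$PATH
    PATH=""
    ls              # -> bash: ls: No such file or directory
    PATH=$old_path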
        
  6. Random number below 100 (the modulo maps it into 0 to 99)
    echo $(( $RANDOM%100 ))
        
  7. Now let’s use awk to add a column of random numbers
    awk '{print $0}' reddest-with-z.txt
    awk '{print $0, rand()}' reddest-with-z.txt
        
    # If we run it again, you can see that the random numbers are
    # actually the same! This is because AWK uses the same random seed
    # every time, to make the random numbers 'reproducible'. If you
    # want them to actually change on every execution, you must change
    # the random seed
    awk '{print $0, rand()}' reddest-with-z.txt
    awk 'BEGIN{srand('$RANDOM')}{print $0, rand()}' reddest-with-z.txt
        
    # Now test it again
    awk 'BEGIN{srand('$RANDOM')}{print $0, rand()}' reddest-with-z.txt
    awk 'BEGIN{srand('$RANDOM')}{print $0, rand()}' reddest-with-z.txt
    awk 'BEGIN{srand('$RANDOM')}{print $0, rand()}' reddest-with-z.txt
        
    # It changes! Now let's format the numbers so we can read them
    # easily. Say we are only interested in the ID, SPECZ, and the
    # random number
    awk 'BEGIN{srand('$RANDOM')} \
         {printf "%-8d%-10.3f%-10.3f\n", $1, $2, rand()}' \
        reddest-with-z.txt
        

Conditional

  1. The holy if
    # Simple
    if [ 5 -gt 2 ]; then echo "Duh"; else echo "Seriously?"; fi
        
    # Now use a variable
    x=$RANDOM; if [ $x -gt 16000 ]; then echo "TOPHALF :-D $x"; else echo "BOTTOMHALF :-( $x"; fi
        
    # You can also check whether a file exists, whether a string
    # matches, etc. Where to get the info? The info manual! Open bash's
    # page and search for 'conditional constructs'.
    info bash
        

Loop

  1. The while loop
    # Just print the ID and Spectroscopic redshift
    cat magnitudes.txt | while read -r id z rest_of_line ; \
                               do echo "Object $id redshift $z"; done
        
    # Now put each value in its own file!
    mkdir sample
    ls
    cat magnitudes.txt | while read -r id z rest; \
                               do echo "$id $z" > sample/$id.txt; done
        
    # Similarly you can achieve the same with AWK
    rm sample/*
    ls sample/
    awk '{print $1, $2 > "sample/"$1".txt"}' magnitudes.txt
        
  2. The for loop

    Set the index and the iterable:

    # My Very Educated Mom Just Served Us Nine Pizzas
    for planet in Mars Venus Earth Mercury; do echo "Hi $planet"; done
        
    # Or even list the files here
    for f in $(ls); do echo "file: $f"; done

    # BEWARE of white space in file names: $(ls) splits on it! Prefer
    # the shell's own globbing, and it's good practice to use dashes
    # '-' instead of white space in names anyway.
    for f in *; do echo "file: $f"; done
        

    Now let’s print a sequence, using … seq!

    seq 5
    seq 10
    seq 5 10
    seq 5 0.5 10
        

    Again, in the for loop:

    for i in $(seq 5); do echo "Galaxy $i"; done
        

    Now check for some IDs in the samples

    for i in $(seq 20); do echo "Sample $i" ; cat sample/$i.txt ; done
        

    Some samples did not exist! Let’s check for their existence first, and only then print the details

    for i in $(seq 20); do if [ -f sample/$i.txt ]; then echo "Sample $i" ; cat sample/$i.txt ; fi; done
        

Package

Let’s say you’ve now done some analysis and you’d like to archive it or send it to a colleague. Instead of sending it at its full size, you can compress it to avoid wasting space on disk!

# Check the initial size
ls -lh catalog-raw.txt
du -h catalog-raw.txt
# Compress and check again
gzip catalog-raw.txt
ls -lh
# De-compress
gunzip catalog-raw.txt.gz

How about all the files we just created? Let’s put them into a tarball so they become a single file

tar -cvf my-discovery.tar *.txt
mkdir unpack
cd unpack
tar -xf ../my-discovery.tar

As you’ve already guessed, this can be compressed as well

cd ..
file my-discovery.tar
gzip my-discovery.tar
file my-discovery.tar.gz
ls -lh *.gz

Or all in one command

# Remove the previous compressed tarball
rm my-discovery.tar.gz
# Create a new compressed tarball in one command (the '-a' option picks
# the compression from the '.gz' suffix)
tar -cvaf my-discovery.tar.gz *.txt
ls -lh *.gz

History

Now, this is how bash figures out what the last command was!

history

Now check how many times we’ve called awk

history | grep awk
history | grep awk | wc -l
history | grep awk > hist-awk.txt

You can even search your history interactively from the CLI using Ctrl+r

Ctrl+r <part of the command>
Ctrl+r asttable

Where to get the documentation?

man awk
info awk
awk --help

Outro

If you’ve learned nothing, it doesn’t matter: take your time, watch the video, or look for other tutorials.

Beware of “why should I care!? I’m not a programmer!” If you’re writing a program, you’re doing a programmer’s work.

Two idioms:

  1. One who has a hammer sees everything as nails.
  2. Do not reinvent the wheel.

Physicists are famous for solving complex problems: they break the problem down into smaller, solvable chunks.

For instance, you get to a point where you must calculate an irregular area. The physicist’s art is done; now you must figure out the answer with mathematics. An expert has already invented a solution: you know how to calculate the area of a simple rectangle, so divide the shape into infinitesimal parts and integrate over them! Remember, you’re not a mathematician, and probably not one who can derive all the formulas from scratch anyway; but you can use the tools.
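
In the same spirit, the tools from this tutorial can do that numerical integration for you. A toy sketch: approximate the area under y = x^2 between 0 and 1 by summing thin rectangles (the exact answer is 1/3).

# 'seq' generates the x values in steps of 0.001; awk sums up the
# rectangle areas x^2 * 0.001. Prints roughly 0.334.
seq 0 0.001 1 | awk '{sum += $1*$1*0.001} END {print sum}'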

Next steps

  • Clean coding