Notes On Shell
Intro
The following is a compilation of commands and notes on using the command line. Specifically bash
. Bash commands, or the corresponding ELF executables, are available on Linux/Unix based systems like Ubuntu and Mac OSX.
While they are not easily available on Windows, it is possible to run some commands via console emulators like Cmder with Git Bash, or more recently the Windows Subsystem for Linux which gives full access to ELF binaries.
For an introduction on the command line, there is a great practical book by Jeroen Janssens called Data Science at the Command Line, which can be read online here.
Generally speaking, the notes provided here are practical data science examples on the collection of tools exposed via the Unix command line (i.e. bash, shell, console). This can include the defaults or commonly used extensions.
Advantages
Unix commands execute specific scripts or tools which have been designed to do a single task, and do it as efficiently as possible without loading all the data in memory. In this case, if available, it is much faster to read/manipulate/process data using these tools than trying to do the same thing in another programming language.
Commands
Below is just a short list of notable commands I have found useful, and want to remember. On all commands, use --help
flag for more info and usage instructions.
echo
writes to outputprintf
prints with formattingtouch
creates an empty file (this command will NOT overwrite or delete contents of an existing file)>
also known as redirect, outputs text to a file, creating it if it does not exist.>
will overwrite an existing file, while>>
will append to an existing filehead
preview first 10 lines of filesort
sort a filegrep
search a file for a regular expression pattern.|
also known as pipe operator, combines multiple commandsfind
will find a filecat
will concatenate files or outputtruncate
with-s 0
option will delete contents of a fileuniq
will deduplicate data, however the duplicates must be adjacent (usesort
first). Use flag-c
to count the number of duplications. Also pass-d
flag to show duplicated data only.wc
command on a file by default will return the number of lines, words, and characters.
Text Editors
Nano
nano file.txt
Save a file with Ctrl
+o
, exit a file with Ctrl
+x
For Loops
A one line for loop looks like for i in {1..10}; do echo "Hello World!"; done
Examples
Basic File Manipulation
The is an example of how to use practical commands in sequence to explore file operations. Run the one by one in order.
echo Hello World!
touch data.txt
for i in {1..10}
do
echo "$i Hello World!" >> data.txt
done
head data.txt
sort data.txt -r
grep [3-7] data.txt
for i in {1..15}; do echo "$i Hello World!" >> data.txt; done
wc data.txt
sort data.txt | uniq -c
Curl
curl is used in command lines or scripts to transfer data. - website
jq
jq is a lightweight and flexible command-line JSON processor. - website
You may need to download it here
Here is an example of how to get the html
key from a returned JSON object, using the open Instagram oembed endpoint for one of my posts.
curl -s "https://api.instagram.com/oembed/?url=http://instagr.am/p/BjPSQKJFUYa" | jq -r ".html"
Scripting
.sh
shell scripts can be used for absolutely anything.
Example curl API, redirect JSON key to file
This below is an example of writing a shell script to automatically and dynamically redirect API data to a file in one command.
Fetch Latest Instagram Post Embed HTML
I recently created a very simple shell script to fulfill a very simple task for this website.
My problem was I wanted to embed my latest Instagram post on the about page, yet Instagram does not allow to get this dynamically. In this case I have to manually update the embed html for the latest post of interest. This is a hassle, and like any good developer I considered how to automate this.
After finding that Instagram has an open API for getting a posts embed html, it is possible to simply make a GET request, and parse the HTML from the returned JSON. So I explored using curl
to do the HTTP request, and jq
to process the JSON, and finally redirect this into a file. The following command works well.
curl -s "https://api.instagram.com/oembed/?url=http://instagr.am/p/BjPSQKJFUYa" | jq -r ".html"
Now, before I continue I must admit at this point I found someone who implemented this exact thing and already wrapped it up into a shell script. So credit goes to Wouter Bulten for the informational post on this topic already. My simplified and custom solution meeting my specific needs is continued below.
Now to just finalize the script, I add two dynamic variables. One for the path of the output file, hardcoded in the script. And the other is the required shortcode of the post of interest. The shortcode can be supplied as a command line argument or entered during execution when prompted.
I saved the file under the name, fetch_latest_instagram.sh
. Running the code therefore looks like fetch_latest_instagram.sh BjPSQKJFUYa
, and then for me the embed code is written to the correct file in my _includes
folder which I pick up on my about page.
Note that I added an additional prompt before closing because on Windows the shell script opens in a new window, which automatically closes after execution, making it impossible to read if there was an error.
if [ $# -eq 0 ]
then
echo "Enter post shortcode:"
read shortcode
else
shortcode="$1"
fi
path="_includes\integrations\instagram_latest_photo.html"
echo "Fetching embed HTML of given post"
echo "Writing to $path"
curl -s "https://api.instagram.com/oembed/?url=http://instagr.am/p/$shortcode" | jq -r ".html" > "$path"
echo "Successfully wrote embed code to instagram_latest_photo.html"
echo "Type any key to close:"
read dummmy
echo "Closing"