NASA Astronomy Picture of the Day bot for GNU Social

Published by Arun Isaac on June 11, 2016

Tags: gnusocial, astronomy, project, software, bash

A bash script, run periodically by a cron job, to post the day’s NASA Astronomy Picture of the Day to GNU Social. The script uses GNU Social’s Twitter compatible API to publish notices.

This project is now maintained in its own git repo at https://git.systemreboot.net/nasa-apod-gnu-social-bot. This page will not be kept updated. Please find the latest version there.

I wrote a bash script to extract the day’s NASA Astronomy Picture of the Day (NASA APOD) and post it to GNU Social. The script is run periodically by a cron job. The bot may be found at https://social.systemreboot.net/apod.

Facebook, Google+, Twitter, etc. have NASA APOD bot accounts, but as far as I know, GNU Social does not. So, I decided to write one. Besides, it was fun learning to use the GNU Social API.

The following is a line by line description of the script. There is not much GNU Social documentation out there on the Internet, nor many tutorials, blog posts or projects. So, I am sharing this here in the hope that it might help somebody.

Shebang

We start with the usual bash shebang. Normally, bash would continue executing the script even if some of the commands fail. The -e flag tells bash to abort the script if any command fails. This is useful because if any of the network operations fail due to lack of connectivity, we want to safely abort the script and not blindly plough on.

#! /usr/bin/bash -e
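The effect of -e can be seen with a small experiment. The following is only an illustrative sketch and not part of the bot. It runs the same three commands through bash with and without -e, with `false` standing in for a failed network operation. The variable names are made up for this demo.

```shell
# Illustrative only: `false` stands in for a command that fails,
# such as curl without network connectivity.
DEMO='echo "step 1"; false; echo "step 2"'

# Without -e, bash carries on after the failure and runs both echo commands.
WITHOUT_E=$(bash -c "$DEMO")

# With -e, bash exits at the failing command, so "step 2" never runs.
# The trailing `|| true` only keeps this demo itself from failing.
WITH_E=$(bash -e -c "$DEMO" || true)

printf 'without -e:\n%s\n' "$WITHOUT_E"
printf 'with -e:\n%s\n' "$WITH_E"
```

Writing `set -e` as the first command of the script has the same effect, and it also applies when the script is started as `bash apod-bot.sh`, in which case the shebang line is ignored.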

Some settings

Then, we have some settings put into bash variables. SOCIAL_API_URL is the base API URL of the GNU Social instance. This can be obtained using Really Simple Discovery. The API uses HTTP Basic Authentication, for which we have the username and password in BOT_USERNAME and BOT_PASSWORD respectively. And finally, we have APOD_BASE_URL which we reference several times later in the script.

SOCIAL_API_URL=https://social.systemreboot.net/api
BOT_USERNAME=apod
BOT_PASSWORD=secret-password-here

APOD_BASE_URL=https://apod.nasa.gov/apod
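For anyone curious about what HTTP Basic Authentication does with these two settings, here is a rough sketch. The username and password are joined with a colon, base64 encoded, and sent in an Authorization header. The function name `basic_auth_header` is my own invention for this example. In the script itself, curl builds this header for us.

```shell
# Build the Authorization header used by HTTP Basic Authentication:
# the word "Basic" followed by base64("username:password").
basic_auth_header() {
    printf 'Authorization: Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64)"
}

# Placeholder credentials, same as in the settings above.
basic_auth_header apod secret-password-here
```

Note that base64 is an encoding and not encryption. The password is protected only because the API URL uses HTTPS.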

Downloading the HTML page, parsing it and extracting required information

We download the HTML page for the picture of the day and capture it in a variable APOD_HTML. I didn’t want to make a mess downloading the HTML page into a file, and thought capturing it in a variable was cleaner.

Then, we process the downloaded HTML to extract the information we need – namely the title of the image, a link to the image, the date on which the image was posted, and a permanent link to the page itself. Sometimes, NASA posts a YouTube video instead of an image. That must also be handled properly. Extracting information from HTML pages is yucky, disgusting business, and is not at all robust. And NASA APOD with its ancient HTML, absent even CSS classes or IDs to select, does not help matters. There is a NASA APOD RSS feed, however. RSS, being XML, is more machine readable and might have been a good alternative. But the NASA APOD RSS feed only has links to scaled down thumbnails, and not the full images. So, it is useless for our purposes.

Parsing of the downloaded HTML was done using pup. The “Copy Unique Selector” feature in the DOM and Style Inspector of Icecat (one of the Firefox family of browsers) came in handy in constructing the CSS selectors required by pup.

APOD_HTML=$(curl $APOD_BASE_URL/astropix.html)

TITLE=$(echo "$APOD_HTML" | pup 'center:nth-child(2) > b:nth-child(1) text{}' \
                 | sed -e 's/^ *//' -e 's/ *$//')
IMAGE_LINK=$APOD_BASE_URL/$(echo "$APOD_HTML" | pup -p 'img attr{src}')
YOUTUBE_VIDEO_ID=$(echo "$APOD_HTML" | pup 'iframe attr{src}' | awk -F/ '{print $5}' | awk -F'?' '{print $1}')
YOUTUBE_LINK="https://www.youtube.com/watch?v=$YOUTUBE_VIDEO_ID"
if [[ -z "$YOUTUBE_VIDEO_ID" ]]
then
    MEDIA_LINK=$IMAGE_LINK
else
    MEDIA_LINK=$YOUTUBE_LINK
fi
DATE=$(echo "$APOD_HTML" | pup ':contains("Discuss") attr{href}' | awk -F= '{print $2}')
PAGE_LINK=$APOD_BASE_URL/ap$DATE.html
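The awk and sed steps above can be tried out on their own, without pup or a network connection. The following is a small sketch along those lines. The sample URLs and title are made up for illustration and only assume the general shape of a YouTube embed link and a "Discuss" link. The function names are also hypothetical.

```shell
# Sample values, invented for illustration; real pages will differ.
SAMPLE_IFRAME_SRC='https://www.youtube.com/embed/abc123XYZ?rel=0'
SAMPLE_DISCUSS_HREF='https://asterisk.apod.com/discuss_apod.php?date=160611'
SAMPLE_TITLE='   An Example Title  '

# Splitting on "/" makes the video ID (plus any query string) field 5.
# Splitting that on "?" then drops the query string.
youtube_video_id() {
    printf '%s\n' "$1" | awk -F/ '{print $5}' | awk -F'?' '{print $1}'
}

# Splitting on "=" leaves the date as field 2.
discuss_date() {
    printf '%s\n' "$1" | awk -F= '{print $2}'
}

# Strip leading and trailing spaces, as done for the title.
trim_spaces() {
    printf '%s\n' "$1" | sed -e 's/^ *//' -e 's/ *$//'
}

echo "video id: $(youtube_video_id "$SAMPLE_IFRAME_SRC")"
echo "date: $(discuss_date "$SAMPLE_DISCUSS_HREF")"
echo "title: $(trim_spaces "$SAMPLE_TITLE")"

# With no iframe on the page, the input is empty and so is the ID,
# which is what lets the -z test fall back to the image link.
echo "video id without an iframe: '$(youtube_video_id "")'"
```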

Constructing the GNU Social notice

Once we have extracted all required information, we construct the GNU Social notice in a variable.

NOTICE="$TITLE $PAGE_LINK $MEDIA_LINK"

Sanity checks

Finally, before publishing, we do a couple of sanity checks.

Make sure we are not publishing a duplicate notice

We make sure we are not publishing a duplicate of our previous notice. First, we get the previous notice using the GET statuses/home_timeline ¹ API call. The text of the most recent notice is extracted from the JSON response using jq, the command line JSON processor. If the notice we are about to publish is the same as the notice previously published, the script aborts.

If the script is run more than once a day, this duplicate notice check ensures that the same picture does not get posted several times. Also, polling NASA APOD several times means that we are more likely to get the image as soon as NASA publishes it.

The Twitter API claims to detect duplicate notices, and not publish them. But, GNU Social does not seem to do that (at least, as yet). Hence I had to implement this on my own.

PREVIOUS_NOTICE=$(curl -u "$BOT_USERNAME:$BOT_PASSWORD" $SOCIAL_API_URL/statuses/home_timeline.json \
                           | jq -r '.[0] | .text')
if [[ "$NOTICE" = "$PREVIOUS_NOTICE" ]]
then
    echo "Notice \"$NOTICE\" already published. Aborting..." >&2
    exit 1
fi
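To see the duplicate check in isolation, here is a sketch that feeds a made-up timeline response through the same jq filter. The sample JSON is trimmed down to the one field the check reads, and the function names are hypothetical. A real response from the API carries many more fields.

```shell
# Made-up timeline response, trimmed to the one field the check reads.
SAMPLE_TIMELINE='[{"text": "Newest notice"}, {"text": "Older notice"}]'

# Same jq filter as in the script: first array element, then its text.
latest_notice_text() {
    jq -r '.[0] | .text'
}

# Succeeds (exit status 0) when the new notice matches the latest one.
is_duplicate() {
    [ "$1" = "$2" ]
}

PREVIOUS=$(printf '%s' "$SAMPLE_TIMELINE" | latest_notice_text)
if is_duplicate "Newest notice" "$PREVIOUS"
then
    echo "duplicate, would abort"
fi

# An empty timeline makes jq print the string "null", which will not
# match a real notice, so a first ever post still goes through.
printf '%s' '[]' | latest_notice_text
```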

Check image and page links exist

We also make sure we have extracted the links properly and that they actually exist. The --spider flag makes wget only check for the existence of the link and not waste bandwidth downloading it. If the links do not exist, wget will exit with an error, and bash will abort the script.

wget --spider $MEDIA_LINK
wget --spider $PAGE_LINK

Publishing the notice

And, at last, if everything went right up to this point, we publish the notice with the POST statuses/update ¹ API call. From the response JSON, which contains a lot of things, the text of the notice and the publishing time are extracted and printed.

curl -u "$BOT_USERNAME:$BOT_PASSWORD" --data-urlencode "status=$NOTICE" \
     $SOCIAL_API_URL/statuses/update.json \
    | jq -r '"Published notice \"\(.text)\" on \(.created_at)"'
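The jq filter at the end can be tried on a made-up response, as in the following sketch. The sample JSON contains only the two fields the filter reads and its values are invented. The function name `format_confirmation` is hypothetical.

```shell
# Made-up response, trimmed to the two fields the filter reads.
SAMPLE_RESPONSE='{"text": "An Example Title https://example.org/page", "created_at": "Sat Jun 11 04:30:02 +0000 2016"}'

# \(...) inside a jq string interpolates the value of that expression.
# -r prints the resulting string raw, without JSON quoting.
format_confirmation() {
    jq -r '"Published notice \"\(.text)\" on \(.created_at)"'
}

printf '%s' "$SAMPLE_RESPONSE" | format_confirmation
```

On the sending side, --data-urlencode makes curl percent-encode the notice text, so spaces and other special characters in the title survive the trip in the POST body.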

Downloads

apod-bot.sh – Complete GNU Social NASA APOD bot script

Footnotes:

¹ I link to the Twitter API documentation because it is clearer, with examples showing how to use it. But note that not all features of the Twitter API are supported by the GNU Social Twitter compatible API. Always check with the GNU Social API documentation, which for my instance is at https://social.systemreboot.net/doc/twitterapi
