Thursday, March 14, 2013

Monitoring a web page for changes using bash

There's a conference I'd like to attend, and I've heard it's hard to get into.  So far, its site doesn't have any new info.

Rather than checking the site every day, I'd like to have it monitored and be alerted when something new DOES appear on it.

Now I know there are services like ChangeDetection.com that can monitor it for me, but I wanted to cobble something together with the tools I already have.  I'd also like to be able to customize what it considers "a change" when/if I need to.

To that end, I threw together the following bash script.  It monitors a URL and, if it detects a change, sends an email to my Gmail account letting me know.

Hope you find it useful.  BTW, I'm using a program called sendEmail to send the email notification.  It's in apt if you're using a Debian/Ubuntu-like distribution.

#!/bin/bash

# monitor.sh - Monitors a web page for changes
# sends an email notification if the page changes

USERNAME="me@gmail.com"
PASSWORD="itzasecret"
URL="http://thepage.com/that/I/want/to/monitor"

for (( ; ; )); do
    mv new.html old.html 2> /dev/null
    curl "$URL" -L --compressed -s > new.html
    DIFF_OUTPUT="$(diff new.html old.html)"
    if [ "0" != "${#DIFF_OUTPUT}" ]; then
        sendEmail -f "$USERNAME" -s smtp.gmail.com:587 \
            -xu "$USERNAME" -xp "$PASSWORD" -t "$USERNAME" \
            -o tls=yes -u "Web page changed" \
            -m "Visit it at $URL"
    fi
    # Pause on every pass, not only after a change,
    # so the loop does not hammer the site
    sleep 10
done

Then from a bash prompt I run it with the following command:

nohup ./monitor.sh &

Using nohup and throwing it in the background allows me to log out and have the script continue to run.
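
As for customizing what counts as "a change": one way might be to filter both copies of the page before comparing them, so lines that differ on every request (timestamps, session tokens) don't set off an email.  Here's a minimal sketch, not part of the script above.  The names filter_page and has_changed and the IGNORE_PATTERN value are made-up placeholders that you'd adapt to the page you're watching.

```shell
#!/bin/bash

# Hypothetical pattern: lines matching it are ignored when comparing.
# These two alternatives are placeholders, not taken from any real page.
IGNORE_PATTERN='csrf_token|Generated at'

# filter_page: reads a page on stdin, drops the lines we want to ignore
filter_page() {
    grep -E -v "$IGNORE_PATTERN"
}

# has_changed OLD NEW: succeeds (exit 0) only if both files exist
# and still differ after filtering
has_changed() {
    [ -f "$1" ] && [ -f "$2" ] || return 1
    [ "$(filter_page < "$1")" != "$(filter_page < "$2")" ]
}
```

In the loop, `if has_changed old.html new.html; then` could stand in for the check on the length of the diff output.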

24 comments:

  1. It's exactly the script I was looking for. Thank you, Steve, for making it!

  2. Cheers Steve, very handy script

  3. Is it possible to send a tweet (private message) instead of an email?

  4. I found this tool for posting tweets from the CLI:
    A command-line power tool for Twitter.
    http://sferik.github.com/t

  5. I edited this to use PushBullet to alert instead of email

    #!/bin/bash

    # monitor.sh - Monitors a web page for changes
    # checks every 10 min and sends a PushBullet push alert on change

    PBTOKEN="PB ACCESS TOKEN"
    URL="PUT URL HERE"

    for (( ; ; )); do
        mv new.html old.html 2> /dev/null
        curl "$URL" -L --compressed -s > new.html
        DIFF_OUTPUT="$(diff new.html old.html)"
        if [ "0" != "${#DIFF_OUTPUT}" ]; then
            curl -u "$PBTOKEN:" https://api.pushbullet.com/v2/pushes \
                -d type=note -d title="Site Changed" \
                -d body="Visit it here $URL"
        fi
        # Wait 10 minutes between checks, whether or not the page changed
        sleep 600
    done

  6. Hi,
    Thanks a lot for your script, very helpful
    Is it possible to check only a specific part of the page (text, image, ...) rather than the whole page, since a page can contain dynamic content?
    Regards

    AyGitci

    Replies
    1. You could... possibly using grep and its extended regex, but I suspect you'd be opening a can of worms by trying to parse the HTML DOM that way.

  7. you need a sleep 10 after fi too , otherwise you will cause loads of problems

    sleep 10
    fi
    sleep 10
    done

    Replies
    1. +1
      If there are no changes, the script spins in a loop without any delay, causing significant performance issues.

    2. It did look like it was spamming the request non-stop; adding this extra sleep 10 after the if block fixed it.

  8. Since you're pulling the diff.. you may as well add the diff to the body of the email.

  9. I'm getting this error on Ubuntu 14.04 LTS:

    check.sh: 10: check.sh: Syntax error: Bad for loop variable

    help me

  10. Good post....thanks for sharing.. very useful for me i will bookmark this for my future needs. Thanks.

  11. Hi Steve, I'm looking for a script as simple as yours, but one that checks a page where you need to log in first. Any ideas?

    thx

    Replies
    1. Have a look at curl's '-u' argument in the man page.

  12. Thank you so much! I've been looking for a script like this. The only problem I'm running into is that it keeps saying that sendEmail, new.html, and old.html do not exist... I do have these but for some reason I keep getting these messages.

  13. Hi Steve,

    Thanks for the script.

    Is it possible to monitor multiple websites? What triggers the on/off button for this script? How would I have the script continually run every 1 second?

    Replies
    1. Assuming you're running on a Linux system, you'd probably want to use cron (type 'man crontab' from a bash prompt). Cron can run the script whenever and however often you'd like. As for monitoring multiple sites, just run multiple copies of the script.

  15. Hi there! I found your script very, very useful. I modified it to send a DM on Twitter using oysttyer and it works great.
    Thank you very much.

  16. PLEASE fix the initial script to move the "sleep 10" outside of the email loop. This is obviously not the right place for it. It belongs *after* the fi, or it will HAMMER whatever URL you are pointing it at repeatedly, as fast as it can, and only pausing for 10 seconds after it detects a change and sends an email. Obviously not the behavior you intended.
