Keywords: Linux; bash; website; mutt; cron; grep; sed; awk

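Sometimes we care a lot about a few web pages that are not updated very often. For example, there is a site called phdcomics, whose comics about PhD students and their advisors I am fond of; another is a tech blog called Tecmint. They update irregularly, and opening the pages again and again to check for updates is not the geek way. What a waste of time. What I want is to be notified as soon as they update, so that I can kill a little time in the breaks when I am tired of programming or writing papers.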

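So I wrote the script below. The main idea is to fetch the html text with wget (new_*.html) and compare it with the previously saved html file (old_*.html). If diff shows that something has changed, an email is sent to a mailbox through mutt. If you are interested in how to set up mutt, see this blog post.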

  1. The bash script

First, create a script with touch in /usr/local/bin, for example webUpdateCheck, and make it executable.

sudo touch /usr/local/bin/webUpdateCheck

sudo chmod a+x /usr/local/bin/webUpdateCheck

Then put the following into it:

#!/bin/bash

# monitor web pages for changes
# wget fetches the html file into /tmp; continue only if the download
# succeeded and diff reports a difference from the old file
# (a missing old file, as on the first run, also counts as a change)
if wget -q -O /tmp/new_comics.html www.phdcomics.com &&
   ! diff -q /tmp/new_comics.html /tmp/old_comics.html > /dev/null 2>&1; then
    # extract the URL of the updated comic
    grep 'www.phdcomics.com/comics/archive/' /tmp/new_comics.html | grep 'src=http' | sed '1!d' | sed 's/^.*http:\/\///' | awk '{print $1}' > /tmp/temp.txt
    # download the gif into /tmp rather than the current directory
    wget -q -P /tmp "$(cat /tmp/temp.txt)"
    # send the email with the gif attached; mutt expects "--" between the attachments and the recipient
    echo -e "Hello! \n phdcomics has a new comic \n Please visit www.phdcomics.com" | mutt -s "phd comics" -a /tmp/*.gif -- example@gmail.com
    # update the old html file from the copy that was just fetched
    cp /tmp/new_comics.html /tmp/old_comics.html
    # remove the downloaded figures
    rm -f /tmp/*.gif
fi

# the same check for another website, without an attachment
if wget -q -O /tmp/new_tecmint.html www.tecmint.com &&
   ! diff -q /tmp/new_tecmint.html /tmp/old_tecmint.html > /dev/null 2>&1; then
    echo -e "Hello! \n Tecmint has an updated page \n Please visit www.tecmint.com" | mutt -s "Tecmint" kingdomhql@gmail.com
    # update the old html file from the copy that was just fetched
    cp /tmp/new_tecmint.html /tmp/old_tecmint.html
fi

# the same check for my wordpress website
if wget -q -O /tmp/new_wordpress.html kingdomhe.wordpress.com &&
   ! diff -q /tmp/new_wordpress.html /tmp/old_wordpress.html > /dev/null 2>&1; then
    echo -e "Hello! \n Kingdomhe's wordpress has an updated page \n Please visit kingdomhe.wordpress.com" | mutt -s "Kingdomhe wordpress" kingdomhql@gmail.com
    # update the old html file from the copy that was just fetched
    cp /tmp/new_wordpress.html /tmp/old_wordpress.html
fi
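
The script repeats the same fetch, compare, update steps for every site. As a rough sketch only (not part of the script I run), those steps could be pulled into one function. The names check_page, fetch_with_wget, FETCH and WORKDIR are made up for this illustration; FETCH is a hook so that the download step can be replaced, for example when trying the logic without network access.

```shell
#!/bin/bash
# Sketch: one reusable check instead of one copied block per site.
# FETCH and WORKDIR are hypothetical knobs, not part of the original script.
FETCH=${FETCH:-fetch_with_wget}
WORKDIR=${WORKDIR:-/tmp}

# fetch_with_wget URL OUTFILE: download URL quietly into OUTFILE
fetch_with_wget() {
    wget -q -O "$2" "$1"
}

# check_page NAME URL
# Returns 0 if the page changed (or was never seen before),
# 1 if it is unchanged, 2 if the download failed.
check_page() {
    local name=$1 url=$2
    local new="$WORKDIR/new_$name.html" old="$WORKDIR/old_$name.html"

    "$FETCH" "$url" "$new" || return 2

    # unchanged only if an old copy exists and is byte-identical
    if [ -f "$old" ] && cmp -s "$new" "$old"; then
        return 1
    fi

    # remember the new version for the next run
    cp "$new" "$old"
    return 0
}

# Example usage (the mutt call mirrors the script above):
# if check_page tecmint www.tecmint.com; then
#     echo "Tecmint has an updated page" | mutt -s "Tecmint" example@gmail.com
# fi
```

With this layout, adding another site is one more call to check_page, and a failed download is kept apart from a real change, so it does not trigger an email.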

2. cron setup

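Then create a cron job that runs the script at a regular interval. If you need to know how to set this up, please refer to this blog post. It looks roughly like this (the script runs every 30 minutes; note that */30 in the first field means every 30 minutes, whereas a plain 30 would mean once an hour, at minute 30):

*/30 * * * * /usr/local/bin/webUpdateCheck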

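Notes:

I wrote this script in one go and tested it on my server; it basically works without problems. One part I particularly like is that the updated image is downloaded and sent to my mailbox as an attachment. That way I do not need to open the web page at all and can see the comic right in my inbox. When I feel like it, I can also share it to WeChat Moments.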
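
The image extraction step of the script (the grep, sed, awk chain) can be tried offline on a single line of html. This is only a sketch: the sample line below is made up for illustration, and the real markup of the phdcomics page may look different, in which case the patterns need adjusting. The function name extract_comic_url is also just for this example.

```shell
#!/bin/bash
# Hypothetical sample line; the real page markup may differ.
sample='<img id=comic src=http://www.phdcomics.com/comics/archive/phd0101s.gif border=0>'

# Same filter chain as in the script, reading from stdin instead of a file:
#   grep  keeps lines that mention the comics archive and carry a src=http attribute
#   sed   '1!d' keeps only the first matching line
#   sed   's/^.*http:\/\///' drops everything up to and including "http://"
#   awk   prints the first whitespace-separated field, i.e. the bare URL
extract_comic_url() {
    grep 'www.phdcomics.com/comics/archive/' | grep 'src=http' | sed '1!d' | sed 's/^.*http:\/\///' | awk '{print $1}'
}

printf '%s\n' "$sample" | extract_comic_url
# prints: www.phdcomics.com/comics/archive/phd0101s.gif
```

The result has no http:// prefix, which is fine for wget because it assumes http when no scheme is given.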
