Recently I had a problem: I was trying to figure out how to monitor a site for new content or for changes to existing content. The site didn’t have an RSS or Atom feed, or anything similar that a simple PHP script could watch automatically, so I had to come up with my own way to monitor it for changes.
As I sat and thought about it, I realized there were a couple of ways to achieve the level of monitoring I was looking for. One method I came up with was to grab the entire contents of the page and cache it in a text file, then periodically grab the page again and compare the data to the cached version. This solution would work just fine, but it was a little more trouble and effort than I wanted.
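For reference, the caching approach could look something like the sketch below. This is only a minimal illustration, not code I actually used: the function name `pageDiffersFromCache` and the cache file path are made up for the example, and the first run (when no cache exists yet) is reported as a change.

```php
<?php
// Hypothetical helper: compares freshly fetched content with a cached copy on disk.
// Returns true when the content differs from the cache (or when no cache exists yet)
// and refreshes the cache whenever a difference is found.
function pageDiffersFromCache(string $content, string $cacheFile): bool
{
    // Read the previous copy if there is one; null means "nothing cached yet".
    $cached = is_file($cacheFile) ? file_get_contents($cacheFile) : null;

    $changed = ($cached !== $content);
    if ($changed) {
        // Store the new version so the next run compares against it.
        file_put_contents($cacheFile, $content);
    }
    return $changed;
}

// Example wiring (uncomment and adjust; the URL and path are placeholders):
// $page = file_get_contents('http://example.com/');
// if ($page !== false && pageDiffersFromCache($page, __DIR__ . '/page.cache')) {
//     echo "Page content HAS changed, go check it out!";
// }
```

The upside of keeping the full copy is that you could diff the old and new versions; the downside is the extra file handling and storage.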
The other method, and the one I’m going to illustrate in this tutorial, is similar to the one above, except we don’t cache the content of the page. Instead, we hash it and compare the page’s hash to a stored hash to see whether there have been any changes. This method works perfectly for what I need: I need to be alerted when the page changes, but I don’t need to know what the specific changes were.
So let’s get this tutorial started! What we are going to do is grab the page that we are monitoring and hash it using md5, then compare that hash to a stored hash of the page taken in the state we want to compare against. That way, even if a single letter or character has changed, the hash will be different and will show us that a change has been made. It is really simpler than it sounds.
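To see why a single changed character is enough, here is a small sketch that hashes two nearly identical strings. The sample strings are made up for illustration; any two inputs that differ would do.

```php
<?php
// Two sample strings that differ by a single character.
$before = 'Welcome to my site';
$after  = 'Welcome to my site!';

$hashBefore = md5($before);
$hashAfter  = md5($after);

// Each md5 hash is a 32-character hexadecimal string.
echo $hashBefore, PHP_EOL;
echo $hashAfter, PHP_EOL;

// Comparing the two short hashes stands in for comparing the full contents.
echo ($hashBefore === $hashAfter) ? 'same' : 'different', PHP_EOL;
```

The two hashes share nothing obvious in common, which is exactly what makes a simple equality check enough for change detection.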
Step 1
Since we need a preset hash for our script to compare later hashes against, we are going to make a script that gives us a preliminary hash of the page.
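<?php
$url = "URL OF THE PAGE YOU'RE MONITORING";
$page = file_get_contents($url);
if ($page === false) {
    exit("Could not fetch the page.");
}
// Print the baseline hash to paste into the Step 2 script.
echo md5($page);
?>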
Now save that script and execute it. It will give you the md5 hash that you need to insert into the next script in order for it to function properly.
Step 2
Now that we have the md5 hash that we need, plug it into the following script.
<?php
$url = "URL OF PAGE YOU'RE MONITORING";
$page = file_get_contents($url);
if ($page === false) {
    exit("Could not fetch the page.");
}
// HASH YOU GOT FROM THE PREVIOUS SCRIPT (STEP 1)
// (the value below is only a placeholder: it is the md5 of an empty string)
$md5_orig = "d41d8cd98f00b204e9800998ecf8427e";
$md5 = md5($page);
if ($md5 === $md5_orig) {
    echo "Page content has NOT changed yet...";
} else {
    echo "Page content HAS changed, go check it out!";
}
?>
This script goes out and grabs the page content, hashes it, and compares that to the hash of the old content. If the hash has changed, it tells you to go see the changes; otherwise it tells you there have been no changes to the page.
I understand that this is a fairly simple and primitive approach to the problem, but hey, it works! Now all you have to do is change the values where marked, save the script to your server, and you’re good to go. The script checks the page every time you load it. An alternative is to set up a cron job to run the script and have it save the results to a file or email them to you. This is far more convenient and I highly recommend it!
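A cron-friendly version might look roughly like the sketch below. Treat it as one possible setup under a few assumptions, not the exact script I run: the function name `checkForChange`, the file names, the email address, and the 15-minute schedule are all placeholders. Note that, unlike the Step 2 script, it stores the newest hash after each run, so it only reports a change once per update.

```php
<?php
// Hypothetical helper for a cron-driven check: compares the md5 of $content with
// the hash stored in $stateFile, appends a status line to $logFile, and saves the
// new hash so the next run compares against the latest version.
// The very first run (no stored hash yet) is not reported as a change.
function checkForChange(string $content, string $stateFile, string $logFile): bool
{
    $newHash = md5($content);
    $oldHash = is_file($stateFile) ? trim((string) file_get_contents($stateFile)) : '';
    $changed = ($oldHash !== '' && $oldHash !== $newHash);

    // Save the result to a log file, one line per run.
    $status = $changed ? 'CHANGED' : 'unchanged';
    file_put_contents($logFile, date('c') . " $status $newHash" . PHP_EOL, FILE_APPEND);

    // Remember the current hash as the new baseline.
    file_put_contents($stateFile, $newHash);
    return $changed;
}

// Example wiring (uncomment and adjust; URL, paths and address are placeholders):
// $page = file_get_contents('http://example.com/');
// if ($page !== false && checkForChange($page, __DIR__ . '/last.md5', __DIR__ . '/monitor.log')) {
//     mail('you@example.com', 'Page changed', 'Page content HAS changed, go check it out!');
// }
//
// Example crontab entry to run the script every 15 minutes:
// */15 * * * * php /path/to/monitor.php
```

If you would rather keep comparing against one fixed baseline, as in Step 2, simply skip the line that rewrites the state file.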
Well, that pretty much concludes this tutorial. Go try it out and leave a comment with your experience or any questions you have, and I’ll try to address them.
Great idea, but it still doesn’t replace RSS or Atom. I usually just parse the page with preg_replace; it’s more useful because I can actually see the content.
It’s a nice way to know if the page has changed, but not as efficient compared to RSS or Atom. Thanks for sharing it.
The solution here was not meant to be as efficient as RSS or Atom; that’s why RSS and Atom were created.
The script was created as a solution for sites that don’t have RSS or Atom feeds. I originally created it to track a page on GoDaddy for changes so I would know when they updated it, since it is a static page and isn’t in the mainstream RSS feeds.
@Robkid - I have also used preg_match() in conjunction with this script so I could display the content that was changed in the update. I may post that script here in the next couple of days.
Finally. I love this explanation. I’ve been looking for something like this. Thanks.