Monday, 1 February 2010

Track changes to a webpage by feed or email - Google's "follow changes"






How can you monitor or track changes to a web page (e.g. a webpage listing new publications, what's new, press releases or announcements, news) if the website doesn't provide a feed you can subscribe to? (For non-techies: see my introduction to feeds.)

If there's no newsfeed, which is the case with too many websites, most of us wouldn't bother to keep checking back to see if the page has been updated.

Google's "Follow changes"

Luckily Google have introduced a very helpful new feature for their online feed reader Google Reader, so you can still subscribe to changes for a webpage on a site which doesn't provide a feed or proper auto-discovery, and get alerts.

How do you track changes to a webpage?

You can "add" the webpage's normal URL (web address) to Google Reader (button on top left of Google Reader):

Paste in the webpage's URL (as an example I've used http://www.consumerdirect.gov.uk/news/press_releases/london/ which doesn't provide a newsfeed), and click Add:

If the site doesn't offer a feed or doesn't provide proper auto-discovery, Google offers to generate its own "feedified" version of that page for you, thus enabling you to follow changes made to that page:

Note that it will do so only if the page is in normal HTML in English, and not a Flash page or in a frame (though hopefully there will be improvements in future e.g. supporting more languages). And it won't track changes to a page if the website owner has chosen to block Google Reader follows by adding certain code to their page.

Changes to the page show up as a new feed item in your Googlified feed for that webpage, see illustrations below. You can add the new feed to a folder, rename it etc in the usual way.

You click on the item title (i.e. the "Generated feed for…" link) to go to the changed webpage.

Example: the UK's Office of Fair Trading, who have a lot of consumer-related responsibilities, don't provide an RSS or Atom feed for their press releases page. Using Follow Changes, I can be informed through my feed reader whenever new press releases are issued:

What's more, I can get those changes by email too, if I want - see below for a tip on how to do that.

What changes does Google track?

From my testing (through editing this web page on different days), the feed item won't display the entire webpage tracked. It shows:

  • additions


  • amended sentences (with the changed bits shown in bold in the feed item), so you can see what's changed.


However:

  • it doesn't flag deletions (just additions - e.g. the screenshot below doesn't note the deleted line, just a new line I added at a different location to mention the deletion)


  • it only shows snippets (cut off) if there have been more than just a few changes.
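
For the technically curious, here's a minimal Python sketch of an "additions only" comparison between two snapshots of a page's text, which matches the behaviour I saw. To be clear, this is not Google's actual method (I don't know how they do it); the function name `added_lines` and the sample snapshots are made up for illustration, and it just uses Python's standard `difflib` module.

```python
import difflib

def added_lines(old_text: str, new_text: str) -> list[str]:
    """Return lines that are new or amended in new_text; deletions are ignored."""
    old = old_text.splitlines()
    new = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" covers brand-new lines, "replace" covers amended lines.
        # "delete" and "equal" are skipped, so removals go unreported.
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])
    return added

# Made-up snapshots: one line added, one line deleted.
old_page = "Press releases\nJanuary: price review\nOld notice"
new_page = "Press releases\nFebruary: new guidance\nJanuary: price review"
print(added_lines(old_page, new_page))
```

With these snapshots only the added February line is reported; the deleted "Old notice" line is silently dropped, just like in my test.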

What if someone else has already added the webpage?

Then Google doesn't offer to create the feed for you; it just adds it automatically to your Google Reader. That's pretty efficient: it must generate just one feed for every webpage requested, and serve it up to all users of Google Reader who ask for it.

Similarly, with a "normal" feed, Google must effectively fetch it just once for all subscribed Google Reader users (though they must store several copies on their own servers), rather than fetching it afresh for every user.

How often does Google Reader check for updates?

As far as I can tell from my limited testing, it's checked at least twice a day, at about 9 am GMT and noon GMT - so perhaps it's every 3 hours?

How to get changes to a page by email

Here's a little trick for using Google Reader's new "follow changes" feature to receive your tracked webpage changes by email, rather than feed.

The advantage of email of course is that it comes to you, whereas with a feed you have to remember to go and check it - unless you have your feed reader open all the time, but then you still need to go and look at it. (Maybe an add-on to make it sound a "Ping!" when a new feed item arrives? Myself, I use a Chrome extension, Google Reader Checker, that shows the number of unread feed items in the Chrome toolbar.)

Anyway, here's how to get Google Reader follow changes by email, free:

  1. Subscribe to the page so Google creates a feed for it, as mentioned above.
  2. While viewing the generated feed (go to it by clicking the "from … Google feed by Google" link in the feed item, if you're not already there), click the "Show details" link on the right:


  3. Copy the Feed URL link http://whatever (it doesn't look like a link, but it is, so you can right-click it and copy):


  4. Now go to FeedMyInbox (a free service) and paste the feed URL in the upper box, and the email address you want notifications sent to in the lower box, and hit Submit.
  5. You'll get an email asking you to click a confirmation link, so do so.
  6. You can also create an account with FeedMyInbox by clicking another link in their first email, so you can confirm new subscriptions direct without having to do the confirmation link thang. More convenient.

And that's it, you can now get your Follow Changes notices by email.
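
If you'd rather see roughly what a feed-to-email service has to do, here's a rough Python sketch that turns the items in a feed into email messages. It's only an illustration under stated assumptions: I'm assuming the generated feed is in Atom format, the sample feed and addresses are invented, the function name `entries_to_emails` is my own, and I have no idea how FeedMyInbox actually works internally. It builds the messages but doesn't send them.

```python
import xml.etree.ElementTree as ET
from email.message import EmailMessage

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace

def entries_to_emails(feed_xml: str, to_addr: str, from_addr: str) -> list[EmailMessage]:
    """Build one email message per <entry> in an Atom feed (nothing is sent)."""
    root = ET.fromstring(feed_xml)
    messages = []
    for entry in root.iter(f"{ATOM}entry"):
        title = entry.findtext(f"{ATOM}title", default="(untitled)")
        summary = entry.findtext(f"{ATOM}summary", default="")
        link = entry.find(f"{ATOM}link")
        href = link.get("href", "") if link is not None else ""
        msg = EmailMessage()
        msg["To"] = to_addr
        msg["From"] = from_addr
        msg["Subject"] = title
        msg.set_content(f"{summary}\n\n{href}")
        messages.append(msg)
    return messages

# Invented sample feed, standing in for the copied Feed URL's contents.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Generated feed for an example page</title>
  <entry>
    <title>Page changed</title>
    <link href="http://example.com/news/"/>
    <summary>New press release added.</summary>
  </entry>
</feed>"""

for message in entries_to_emails(SAMPLE_FEED, "me@example.com", "alerts@example.com"):
    print(message["Subject"])
    print(message.get_content())
```

Actually sending the messages would need a mail server (e.g. via Python's `smtplib`), plus something to fetch the feed on a schedule, which is exactly the hassle a service like FeedMyInbox saves you.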

How could Google Reader's "follow" feature be improved?

Before Google Reader's new feature was introduced, I used veteran tracker service WatchThatPage (see their FAQ), which also lets you subscribe to webpages and get email notifications of changes (or view the changes on a single webpage). I chose email because otherwise, again, you have to remember to go to the webpage to see the changes.

That service is free, though if you use it for business or ask them to watch a lot of pages they say they want a fee. I hear that lots of organisations use it for business current awareness. WTP don't seem very commercially minded though, as I know enterprises who've requested info on fees in order to pay them, and they've never responded. (Personally I think they should just put up a list of fees based on the number of times they have to check a week for a particular user (number of pages x frequency of checking) and provide Paypal or other payment details, and I think they'd pull in a decent amount. But there we go.)

There are some very helpful aspects of the WTP service, which monitors any http (but not https) web pages:

  • the user can choose exactly how often and when a particular page should be checked, and use "channels" to get separate emails for selected groups of webpages
  • it helps eliminate false positives - web pages that seem to have changed, but not in substance - by letting users report "page problems". E.g. where the only change on a page is the current date. (I've not used Google's Follow Changes long enough to know if it's smart enough to ignore those kinds of changes yet)
  • similarly, users can report if the alert says everything has changed when only part of a page has changed, and they adjust things accordingly.
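
To illustrate the "only the date changed" kind of false positive, here's a small Python sketch that blanks out dates before comparing two snapshots of a page. This is just one possible approach, not WatchThatPage's actual method (which I don't know); the date pattern, the function names `fingerprint` and `has_real_change`, and the sample text are all made up for the example, and the pattern only covers a couple of date styles.

```python
import hashlib
import re

# Hypothetical pattern: matches dates such as "1 February 2010" or "01/02/2010".
DATE_PATTERN = re.compile(
    r"\b\d{1,2}\s+(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{4}\b"
    r"|\b\d{1,2}/\d{1,2}/\d{4}\b"
)

def fingerprint(page_text: str) -> str:
    """Hash the page text with dates blanked out and whitespace collapsed."""
    without_dates = DATE_PATTERN.sub("<DATE>", page_text)
    normalised = " ".join(without_dates.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def has_real_change(old_text: str, new_text: str) -> bool:
    """True only if something other than dates or spacing differs."""
    return fingerprint(old_text) != fingerprint(new_text)

# Made-up snapshots of the same page on different days.
monday = "Today is 1 February 2010. No new announcements."
tuesday = "Today is 2 February 2010. No new announcements."
updated = "Today is 2 February 2010. New consultation published."

print(has_real_change(monday, tuesday))  # only the date differs
print(has_real_change(monday, updated))  # the announcement text differs
```

The first comparison counts as "no change" because only the date differs, while the second is flagged as a real change.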

Opportunities for Google to appeal to enterprise users

Google Reader could be really valuable for enterprise users to keep up to date with business-related information, what information managers and librarians call "current awareness", but oddly enough Google don't try to tout its use for that. They should, especially in conjunction with the ability to get Google Alerts as a feed.

Google should include Google Reader in Google Apps, which it markets to organisations, and develop Google Reader's features with the aim of making it more enterprise-friendly.

For example, with paying enterprises:

  • timing - let users choose if they want Google to check certain followed pages more often than the standard, e.g. 5 times a day at specific times set by the user, like WatchThatPage do
  • timezones and date stamps - let users set timezones for Reader, so the times shown on feed items are correct - and in addition clarify how timestamping on feeds works; it should show the time the feed item was received by Google (or the time the page was updated), rather than the (variable) time the individual gets round to opening the feed item. In business it's often useful or indeed essential to know the timing of things
  • adapting to problems - provide a feature like WatchThatPage's to enable false positives to be eliminated e.g. ignoring changes in the current date, and similarly false negatives
  • follow WatchThatPage! - basically just see how WatchThatPage does it (they've had years of experience), and do it at least as well. Buy them out, whatever...

I have other thoughts on how Google Reader (and other Google products) can be made more enterprise-friendly, but I'll leave that to another blog post.

3 comments:

jordisan said...

Hi;
I'm trying to develop an application similar to what you describe in this post. It's www.favoritious.com, and I'm trying to integrate different systems (including Delicious).

You're invited to use it (remember it's an alpha version); any feedback will be welcome :)

Anonymous said...

Does Google Reader still offer this? I just tried and didn't get the "we can create a feed for you" message, instead seeing "your search did not match any feeds"

Anonymous said...

As of November 2010 Google removed this feature. It's frustrating because this is exactly what I needed.