Your blog and Google (Sitemaps)
Tuesday, August 30, 2005
[Edited 31 August 2005:] For the short version, see this post!
If people search Google and your blog post is spot on what they're looking for, of course you want them to be able to find it (and they'd want to find it too!). But that depends on Google indexing your blog in the first place, and also how comprehensive, accurate and up to date its index of your blog is.
Getting on Google: the basic way is to submit your blog URL to Google - but only your top level URL, not individual posts; and don't get impatient and keep resubmitting it, they won't like it! Also if others link to you, and they're crawled by Google, then you're more likely to end up being indexed by Google. Posting on forums with your blog URL prominently in your sig (as text, not a graphic) also might help!
Until recently, there wasn't an easy way to "ping" Google to let them know your blog had been updated with new posts, and to ask them to re-crawl it. You were dependent entirely on Google getting round to it, and I certainly found it was recrawling my blog maybe once every week or two.
When did Google's spider last visit your blog? To find out, go to Google and in the search box type (without the quotes) "cache:yoururl" where yoururl is the URL of your blog, e.g. I would type "cache:http://consumingexperience.blogspot.com", and run the search. The results should say at the top when Google's bot last visited your blog, or rather, the date of the latest version of your main blog page in Google's cache, which stores their indexed pages. Annoyingly, the Blogger navbar covers this info after a split second, but you can find it by viewing source and searching for "as retrieved on" (without the quotes); the date comes just after that phrase.
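If you'd rather not squint at the page source, the same search can be done with a few lines of Python on a saved copy of the cache page. This is only a rough sketch: the function name and the sample text are made up for illustration, and it assumes the cache banner contains the phrase "as retrieved on" followed by a date that ends at a full stop, which may not match what Google actually serves.

```python
import re

def find_retrieved_date(page_source):
    """Return the text following 'as retrieved on', or None if the phrase is absent."""
    # Drop HTML tags first so markup inside the banner doesn't get in the way.
    text = re.sub(r"<[^>]+>", "", page_source)
    # Assumption: the date runs from the phrase up to the next full stop or line end.
    match = re.search(r"as retrieved on\s+([^.\n]+)", text)
    return match.group(1).strip() if match else None

# Hypothetical banner text, made up for this example.
sample = "<font>This is the cache of your blog as retrieved on 25 Aug 2005 03:12:45 GMT.</font>"
print(find_retrieved_date(sample))
```

The function returns None when the phrase isn't there, so you can tell "no cached copy" apart from an unexpected banner format.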
Well last month Google introduced Sitemaps (in beta), which enables webmasters and bloggers to inform Google's crawlers when their sites have been updated. As Google say in their help, "By using Sitemaps to inform and direct our crawlers, we hope to expand our coverage of the web and speed up the discovery and addition of pages to our index. If your site has dynamic content or pages that aren't easily discovered by following links, you can use a Sitemap file to provide information about the pages on your site. This helps the spiders know what URLs are available on your site and about how often they change."
How to submit your blog sitemap to Google
At first sight this all seems a real palaver: you have to create a sitemap in a supported format and upload it to your site's server, then submit the link to Google, and update the map whenever your site changes. Look at the stuff on Google about creating a sitemap, and your eyes cross. Or at least, mine do...
But there's an easy way to benefit from sitemaps if your blog or site has a newsfeed. It so happens that RSS 2.0 and Atom 0.3 feeds are, while not their preferred format, still accepted by Google. So, all you have to do is submit your feed's URL.
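If you're not sure which format your feed is in, you can check the feed's root element before submitting it. The Python sketch below is just an illustration: the function name is hypothetical, and it relies only on the usual markers for the two formats mentioned above (an rss root with version="2.0", or a feed root in the Atom 0.3 namespace with version="0.3"). It says nothing about what else Google checks.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom 0.3 feeds.
ATOM_03_NS = "http://purl.org/atom/ns#"

def feed_format(feed_xml):
    """Return 'RSS 2.0', 'Atom 0.3', or None for anything else."""
    root = ET.fromstring(feed_xml)
    # RSS 2.0: un-namespaced <rss> root carrying version="2.0".
    if root.tag == "rss" and root.get("version") == "2.0":
        return "RSS 2.0"
    # Atom 0.3: <feed> root in the Atom 0.3 namespace carrying version="0.3".
    if root.tag == "{%s}feed" % ATOM_03_NS and root.get("version") == "0.3":
        return "Atom 0.3"
    return None

# Tiny made-up feeds, just enough to show the check.
print(feed_format('<rss version="2.0"><channel/></rss>'))
print(feed_format('<feed version="0.3" xmlns="http://purl.org/atom/ns#"/>'))
```

To try it on a real feed, save the feed to a file and pass the file's contents to feed_format.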
Here's how to do it:
1. Go to https://www.google.com/webmasters/sitemaps/ and log in - if you have a Gmail account for your blog, use that; if you don't, then obviously you need to register first.
2. Under Add Site, enter the URL of your blog and click OK. Against your blog URL, click Add a Sitemap, then choose the type Add General Web Sitemap. In the list that appears, tick all the boxes:
- I have created a Sitemap in a supported format.
- I have uploaded my Sitemap to the highest-level directory to which I have access.
- My Sitemap URL is:
Your feed URL: if you're on Blogger it will be http://yourblogurl/atom.xml, e.g. mine is http://consumingexperience.blogspot.com/atom.xml (I use Feedburner but didn't know how Sitemaps would react to their smartfeed so I thought my raw Blogger feed, which is Feedburner's source feed anyway, would be safer).
That should be it. As Google say, "When you initially submit a Sitemap, the status [that's the Sitemap Status column] displays as Pending. Once Google processes your Sitemap (which may take several hours), this status will change to either OK or to an error. If you receive an error, click on it to view additional information." My page looks like this:
Verifying your blog
[Edited 1 Sept 2005:] After it's done the first re-crawl, under your site name there's a link labelled "verify" which asks you to upload a verification file to your server. If you can't upload anything (which is the case with us Blogspot users), don't worry about it.
Verification just gives you access to more detailed Google stats; the lack of it won't stop Google from re-indexing your blog at all - see the Sitemaps help. (But consider lobbying Blogger (the Suggest New Feature box at the bottom) and also Google Sitemaps to allow what we need - tell 'em you want Blogspot users to be able to have full Sitemaps functionality including verification!) [Added 28 April 2006:] As of 26 April 2006 you can verify your Blogspot blog via a meta tag - see this post for a howto and why you might want to verify.
(An oddity: Sitemaps says my blog was last downloaded 3 hours ago. Yet the cache says it was retrieved 25 August i.e. nearly a week ago. What gives? Maybe the cache takes a while to be updated with the download. Also odd, it's been saying "3 hours ago" for the last few hours!)
When you update your blog or site
You can log back in to Sitemaps and in the Sitemaps tab tick the box against the feed of the blog which has been updated, then click Resubmit selected, to get Google to recrawl your blog.
Or, you can ping Sitemaps. One way is to enter the following URL into your browser address bar and hit enter, changing the URL to your own and making sure you change the colon (:) to %3A and any forward slashes (/) to %2F (it looks like gobbledygook but this URL encoding is what Google request). (To make it easier for Blogger/Blogspot users I've filled in most of the details for you so you only need to change "yourblog" to e.g. truckspy, or chickybabe, or whatever is the right one for your blog):
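If you don't fancy doing the %3A and %2F swaps by hand, Python's standard library can do the encoding for you. The sketch below is illustrative only: the function names are hypothetical, and the ping address in PING_ENDPOINT is an assumption based on my recollection of Google's Sitemaps documentation from the time, so check it against Google's own help before relying on it.

```python
from urllib.parse import quote

# ASSUMPTION: ping address as I recall it from the 2005 Sitemaps docs; verify before use.
PING_ENDPOINT = "http://www.google.com/webmasters/sitemaps/ping"

def blogger_feed_url(blog_name):
    """Build the raw Blogspot feed URL, e.g. 'yourblog' -> http://yourblog.blogspot.com/atom.xml."""
    return "http://%s.blogspot.com/atom.xml" % blog_name

def encode_feed_url(feed_url):
    """Percent-encode the feed URL so ':' becomes %3A and '/' becomes %2F."""
    # safe="" tells quote() to encode '/' as well, which it leaves alone by default.
    return quote(feed_url, safe="")

def sitemap_ping_url(feed_url, endpoint=PING_ENDPOINT):
    """Join the (assumed) ping endpoint and the encoded feed URL."""
    return endpoint + "?sitemap=" + encode_feed_url(feed_url)

feed = blogger_feed_url("yourblog")
print(encode_feed_url(feed))   # http%3A%2F%2Fyourblog.blogspot.com%2Fatom.xml
print(sitemap_ping_url(feed))
```

The last line prints a complete address to paste into your browser; swap "yourblog" for your own blog's name first.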
Or enter your feed URL in this box:
Or else use this bookmarklet (on what bookmarklets/favelets are and how to use them and edit them, see this post, scroll down a bit), again making sure you edit it first by changing "yourblog" (or more if necessary) so that it refers to your own blog feed's URL:
(PS. I've written this post for those of us who don't have access to our blog servers to upload any files apart from our posts. Non-Blogspot users and users of other blogging platforms who host on a separate server can by definition probably figure it all out themselves anyway! But Blogger users who host on their own servers rather than Blogspot may want to take a look at Stephen Newton's hack.)