Communicating with the Googlebots: Go Fetch!
This post’s title kind of makes it seem like we’re going to talk about communicating with and domesticating aliens or some other type of foreign creature. But no, we’re not talking about some frightening, otherworldly species. We’re talking about a creature that is far more complex, far more fundamental to our well-being as a virtual people and to the success of our businesses, yet far less creepy-crawly than your average critter: we’re talking about the elusive Google spiders, also known as Googlebots.
Fetch as Googlebot
Yesterday, Google announced a new feature for communicating with the Googlebots called “Fetch as Googlebot.” While the ability to direct the Google spiders to crawl a URL is not new, this updated version allows you to skip the beginning of the crawling process and jump right to requesting indexing.
Image courtesy of the Google Webmaster Central blog.

The “Fetch” feature is located in Google Webmaster Tools and will be useful for webmasters who need to submit new or updated URLs for indexing (though not every URL is guaranteed to be indexed). Once a webmaster fetches a URL, he or she has the option to submit it to the Google index – then, Google will quickly crawl and consider the URL for indexing.
When to Fetch
This new method may be useful, but it is also limited. Google will still use all of its existing processes and methods for finding URLs and considering them for its index. The fetch feature is best used to submit a URL that needs near-immediate indexing. Suppose, for example, a site is advertising an event and the page promoting it contains incorrect information that must be corrected as soon as possible. This is a perfect situation for the Fetch as Googlebot feature, because Google says it will work to crawl and evaluate URLs submitted for indexing via fetch within a day.
What Happens When Google Plans to Crawl a URL?
Once Google finds a URL or one is submitted for crawling, the URL is added to a crawl scheduling program. Google then orders the URLs by priority (based on such factors as PageRank and how often content is updated or changed) and crawls pages according to that priority list. With millions upon millions of sites out there, Google cannot crawl and/or index every URL that is submitted or found. Where do these URLs come from? They are discovered via the following methods:
- Submission of XML Sitemaps
- Creation of Requests
- Plain Old Discovery
- Usage of the new Fetch as Googlebot feature
These are the ways that Google becomes aware of new or updated pages. Before we discuss these a little more in-depth, let’s discuss crawling and indexing.
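To make the priority-based scheduling described above more concrete, here is a toy sketch in Python. It is purely illustrative (Google’s actual scheduler is not public), and the scoring formula combining an importance estimate and change frequency is an assumption for demonstration only:

```python
import heapq

def crawl_order(urls):
    """Order URLs by a toy priority score.

    urls: list of (url, importance, changes_per_week) tuples.
    This is NOT Google's algorithm -- just an illustration of
    crawling from a priority queue.
    """
    heap = []
    for url, importance, changes_per_week in urls:
        # Pages that are more important and change more often
        # get a higher score; heapq is a min-heap, so negate it.
        score = importance * (1 + changes_per_week)
        heapq.heappush(heap, (-score, url))
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

order = crawl_order([
    ("example.com/about", 0.2, 0.1),  # static, minor page
    ("example.com/news", 0.5, 7.0),   # updated daily
    ("example.com/", 0.9, 1.0),       # important home page
])
# The frequently updated news page is crawled first,
# the rarely changing about page last.
```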
Crawling is defined by Google as the “process by which Googlebot discovers new and updated pages to be added to the Google index.” Using an algorithm, Google’s computer programs determine which sites Googlebot is to crawl, how many pages it is to fetch, and how often it should do so. Googlebot crawls sites to keep the search engine’s index updated and to ensure that there are no issues that may affect a site’s ranking, such as duplicate content; in extreme cases, Googlebot can find content that must be removed from the Google index altogether (this can happen if a site doesn’t follow Google’s quality guidelines, which can be found in Google’s Webmaster Tools Help section).
Google describes indexing as the process by which Googlebot compiles a “massive index of all the words it sees and their location on each page.” Google adds, “In addition, we process information included in key content tags and attributes.”
Now that we have covered what purposes crawling and indexing actually serve, let’s get back to exactly how Googlebot becomes aware of URLs (other than the new fetch feature, which we discussed above).
Submission of XML Sitemaps
Submitting a sitemap is the classic way that webmasters can submit site pages for crawling. The XML file lists a site’s URLs and metadata (update date, change frequency, importance) about them.
This is where you can submit your XML sitemap via Google Webmaster Tools.
XML sitemaps give Googlebots the information they need to crawl pages (this does not guarantee indexing, but it does allow for more efficient or “intelligent” crawling).
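As an illustration, a minimal XML sitemap following the sitemaps.org protocol might look like the following (the URL and values are placeholders, not from a real site):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/events/</loc>
    <lastmod>2011-08-03</lastmod>      <!-- last update date -->
    <changefreq>weekly</changefreq>    <!-- expected change frequency -->
    <priority>0.8</priority>           <!-- relative importance, 0.0-1.0 -->
  </url>
</urlset>
```

The `lastmod`, `changefreq`, and `priority` elements are the metadata mentioned above; they are hints to the crawler, not commands.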
Creation of Requests

Anyone can request that a URL be crawled via a public form now called “Crawl URL”; up to 50 requests per week are allowed per Google account. These public requests don’t require you to be a verified site owner, which makes them handy for reporting a page on someone else’s site that is broken or missing, but they may not be your best option if you are a webmaster.
With this in mind, Google may not be as inclined to carry out these requests as quickly as those that require verified site ownership (like the fetch feature or XML sitemaps).
Plain Old Discovery

When Googlebot crawls one page, it also adds any other URLs found on that page to its crawling list. In 2009, Google also added RSS/Atom feeds to its discovery process. Since Google is constantly looking for new content, this type of discovery works well for the search engine. Webmasters can allow Google to crawl their RSS or Atom feeds (by ensuring the feeds are not disallowed by their sites’ robots.txt), and therefore allow Google to quickly find new content on their sites. This speeds up the crawling (and hopefully the indexing) process.
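For example, a robots.txt file like the sketch below would keep a feed crawlable while blocking a private directory (the `/feed/` and `/private/` paths are hypothetical; check your own site’s actual feed URL):

```
# robots.txt -- placed at the root of the site
User-agent: *
Disallow: /private/
# No Disallow rule covers /feed/, so crawlers such as
# Googlebot remain free to fetch the RSS/Atom feed there.
```

A common mistake is a broad rule like `Disallow: /feed` that unintentionally blocks feed discovery; anything not matched by a Disallow rule is allowed by default.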
Hopefully, in addition to the traditional methods of URL submission, the new Fetch as Googlebot feature will speed up the crawling and indexing processes for sites with an urgent need to update outdated content or get new content out there.
EverSpark Interactive is an Atlanta SEO agency. For more Google and SEO related news, check out our blog for daily updates, follow us on Twitter @EverSparkSEO or visit our Facebook page.