Google has now added new details that explain the three categories its Google crawlers fall into, they include Googlebot, special-case crawlers and user-triggered fetchers.
In addition, Google now lists a JSON formatted file containing the list of IP addresses each of these different crawler types use.
Types of Google crawlers. At the top of this Googlebot page, Google listed these three crawler types:
- Googlebot – The main crawler for Google’s search products. Google says this crawler always respects robots.txt rules.
- Special-case crawlers – Crawlers that perform specific functions (such as AdsBot), which may or may not respect robots.txt rules.
- User-triggered fetchers – Tools and product functions where the end-user triggers a fetch. For example, Google Site Verifier acts on the request of a user or some Google Search Console tools will send Google to fetch the page based on an action a user takes.
IP addresses. Google also listed the IP address ranges and reverse DNS mask for each type:
- Googlebot – googlebot.json (crawl-–––.googlebot.com or geo-crawl-–––.geo.googlebot.com)
- Special-case crawlers – special-crawlers.json (rate-limited-proxy-–––.google.com)
- User-triggered fetchers – user-triggered-fetchers.json (–––.gae.googleusercontent.com)
What is new. Here is the section of the page that was updated; the rest of the page is mostly unchanged.
Why we care. I believe Google made this change after they saw some of the reactions to the GoogleOther robot they announced the other day. This now explains how Google crawlers act, when they respect the robots.txt and how to identify them better.
Now, if you want not to block Google’s main crawler, Googlebot, but you decide to block the others, you can better identify those crawlers more accurately.
The post Google explains the use cases for its different crawler types appeared first on Search Engine Land.