Of course, we all know that to be indexed in Google Search (which is a precursor to being ranked), your site needs to be crawlable by Googlebot.
There are a few factors that determine the likelihood that Googlebot will crawl a sufficient number of pages, and those factors make up something called crawl budget.
During the English Google SEO Office-Hours from October 22, 2021, John Mueller, who’s a Search Advocate at Google, addressed a question about crawling.
A participant asked John (and I’m paraphrasing):
“Recently, the crawl requests on my company’s site have dropped by nearly 90%. We checked all the aspects covered in the official Google documentation, like robots.txt…and we also [checked] other possible technical factors that [could have caused a] sudden drop in crawl requests.
“What else do you recommend we also check?”
The video below is cued to the ~7:25 mark, which is when this question was asked.
Crawl Demand and Crawl Capacity
John started his response (which I’ve paraphrased and cleaned up for clarity):
“So…it sounds to me like…our systems have trouble accessing your content quickly enough, so when it comes to the number of requests that we make on a website, we have two kinds of things that we balance.
“On the one hand, the crawl demand, which is how much we want to crawl from a website, and assuming this is a reasonable website then the crawl demand usually stays pretty stable. It can go up if we see a lot of new content; it can go down if we see…very little content, but usually, these changes are very kind of slow over time.
“And the other side is the crawl capacity. This is how much we think the server can support from crawling without causing any problems, and this is something that we evaluate on a daily basis, and it can react quite quickly if we think that there’s a critical problem on the website.
“So, for critical problems, we think of things like server errors. If we see a lot of server errors, if we can’t access the website properly, if the server speed goes down significantly (so, not the time to render a page, but the time to access the HTML files directly).

“And those are kind of the three aspects that play into that and, if, for example, the speed goes down significantly, you would see that in the Crawl Stats report in Search Console…”
And, If Google Thinks That It’s Contributing to Your Site’s Issues…
So, to be sure, I don’t think Googlebot alone causes these issues on your site. Instead, the seed of the issue is likely some limitation on your site’s side (like your server getting close to its capacity), and then Googlebot’s requests push it past that limit, resulting in a slowdown.
But with that in mind, John said, “Yeah, and that’s something where…if we think we cause problems from crawling too much, we will scale that back fairly quickly.”
(I think he’s talking about scaling back the crawl frequency, which makes sense if your server is having trouble responding.)
The person who asked the question then followed up with a confirmation:
“Oh, I see. So the response time is highly relevant, and it is highly related to the crawl requests.”
John replied: “Right. Yes, correct.”
4XX (400-Level) and 5XX (500-Level) Errors
The Q&A continued. John was asked, “Do you think 400- and 500-level response codes may reduce the crawl rate?”
John’s response was:
“500-level errors, definitely. Those are server errors, which we would see as being potentially problematic. 400 errors are less problematic, because it’s basically content [that] doesn’t exist, so we can crawl normally.
“So if a page disappears, that’s no problem. If it has a server error, that is a problem.”
Or, to restate my understanding of John’s responses: if there’s a 400-level issue, that’s no problem (as far as crawling is concerned); if there’s a 500-level issue, that is a problem (as far as crawling is concerned).
A few more questions followed, going further into this topic. You can hear them in the video mentioned above.
If you’d like an additional resource, there’s a Google document titled Large Site Owner’s Guide to Managing Your Crawl Budget, which contains much of the key information that this post is based on.