“What Do I Need to Know About API Calls and Crawling?” Google Responds

Application Programming Interface (API) calls that a page makes during rendering count toward your crawl budget. Here are two things you should know.
SIA Team
October 15, 2021

NOTE: This article does not cover everything you may need to know about APIs and crawling, particularly for your specific situation. It relays what Google’s John Mueller said when he was asked about the topic.

Google regularly hosts Office Hours, which are recorded Q&A sessions. Webmasters, SEOs, and entrepreneurs are welcome to join and ask questions about various aspects of Google Search.

During the English Google SEO Office-Hours from October 8, 2021, there was a question about API calls:

“I have a website that connects to APIs on the client-side to get data. Are those URLs being included in the crawling budget? If you disallow those URLs, would that create any issues?”


The First Thing to Consider: Are APIs Included When Your Page Is Rendered?

John’s response was: 

“So I think there are two things here. On the one hand, if these APIs are included when a page is rendered, then yes, they would be included in the crawling, and they would count towards your crawl budget, essentially, because we have to crawl those URLs to render the page. 

You Can Block Them

“You can block them by robots.txt if you prefer that they’re not crawled or not used during rendering.

“Totally up to you if you prefer doing that. Especially if you have an API that is kind of costly to maintain or takes a lot of resources, then sometimes, that makes sense.
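As a rough illustration of the blocking option, the sketch below shows a robots.txt rule and a small check of which paths it would cover. The `/api/` path, the `robotsTxt` constant, and the `isDisallowed` helper are all assumptions made for this example; they are not from the Office Hours session. The rule uses `User-agent: *`, so it would apply to every crawler that honors robots.txt, not only Googlebot. The matcher is deliberately simplified (plain path prefixes, a single `User-agent: *` group, no wildcards, no Allow rules) and is not how Google's own parser works.

```typescript
// Hypothetical robots.txt for a site whose pages call /api/ from the browser.
// The /api/ path is an assumption; replace it with your real endpoint path.
const robotsTxt = `
User-agent: *
Disallow: /api/
`;

// Simplified check: does a "User-agent: *" group disallow this path?
// Handles plain path prefixes only. Real crawlers implement the full standard.
function isDisallowed(robots: string, path: string): boolean {
  let inStarGroup = false;
  for (const rawLine of robots.split("\n")) {
    // Drop comments and surrounding whitespace.
    const line = rawLine.replace(/#.*$/, "").trim();
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      inStarGroup = value === "*";
    } else if (
      field === "disallow" &&
      inStarGroup &&
      value !== "" &&
      path.indexOf(value) === 0
    ) {
      return true;
    }
  }
  return false;
}
```

With this rule, `isDisallowed(robotsTxt, "/api/products?id=1")` is true while `isDisallowed(robotsTxt, "/products/1")` is false, so regular pages stay crawlable and only the API endpoint is blocked.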

The Second Thing to Consider: If You Disallow Crawling

“The tricky part, I guess, is if you disallow crawling of your API endpoint, we won’t be able to use any data that the API returns for indexing. So if your page’s content comes purely from the API, and you disallow crawling of the API, we won’t have that content. That’s kind of the one aspect there.

“If the API just does something supplementary to the page, like maybe draws a map or, I don’t know, a graphic of a numeric table that you have on a page, or something like that, then maybe it doesn’t matter if that content isn’t included in indexing.

How Does the Page Function When the API Is Blocked?

“The other thing is that sometimes, it’s non-trivial how a page functions when the API is blocked.

“In particular, if you use JavaScript and the API calls are blocked because of robots.txt, then you have to handle that exception somehow. And depending on how you embed the JavaScript on the page, what you do with the API, you need to make sure that it still works.

“So if that API call doesn’t work, and then the rest of the page’s rendering breaks completely, then we can’t index much because there’s nothing left to render.

“However, if the API call breaks, and we can still index the rest of your page, then that might be perfectly fine. So those are kind of the options there. 
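One way to picture the "page still works" case is a page that always renders its static content and treats the API data as optional. The sketch below is a minimal illustration under stated assumptions, not a recommended implementation: the `Product` shape, the `loadProducts` and `renderPage` functions, and the fallback message are all hypothetical. The fetching function is passed in as a parameter so the sketch does not depend on browser typings.

```typescript
// Hypothetical shape of one item returned by the API (an assumption).
interface Product {
  name: string;
  price: number;
}

// Minimal response shape, so the sketch does not depend on browser typings.
interface ApiResponse {
  ok: boolean;
  json(): Promise<unknown>;
}
type Fetcher = (url: string) => Promise<ApiResponse>;

// Returns the API data, or null if the call fails or returns an error status.
async function loadProducts(
  url: string,
  fetcher: Fetcher
): Promise<Product[] | null> {
  try {
    const response = await fetcher(url);
    if (!response.ok) return null;
    return (await response.json()) as Product[];
  } catch {
    // The request failed, for example because the URL could not be fetched
    // during rendering. Signal "no data" instead of letting the error escape.
    return null;
  }
}

// Always renders the static content; the API-driven part is optional.
function renderPage(staticHtml: string, products: Product[] | null): string {
  if (products === null) {
    return staticHtml + "<p>Live prices are unavailable right now.</p>";
  }
  const items = products.map((p) => `<li>${p.name}: ${p.price}</li>`).join("");
  return staticHtml + `<ul>${items}</ul>`;
}
```

In a browser, the fetcher could be `(url) => fetch(url)`. Because the error is caught inside `loadProducts`, a blocked or failing API call leaves the static part of the page intact, which is the situation described above where the rest of the page can still be indexed.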

“I think it’s trickier if you run an API for other people, because if you disallow crawling then, you kind of have this second-order effect that someone else’s website might be dependent on your API. And depending on what your API does, then suddenly, their website doesn’t have indexable content.

“And they might not even notice because they weren’t aware that suddenly, you added a disallow there.

“And that might cause indirect effects. But that’s, ultimately, all up to you.”

So, those are a few things to consider: how API calls affect your crawl budget, the potential consequences of blocking Googlebot from your API, and the knock-on effects if you run an API that other websites depend on.

Source: Google Search Central YouTube channel