Before I get into this topic, I think I should make a distinction: there’s a difference between crawling and indexing. A page that’s crawled may not necessarily be indexed. That is, Google may come to a web page and crawl it, but that web page may not be indexed in Google’s search results.
All indexed pages have been crawled.
With that, let’s dive in.
Google routinely holds Q&A sessions where a Google representative, usually John Mueller (Search Advocate), answers questions that webmasters, business owners, and SEO professionals have about Google Search and website performance.
In one session, a participant asked: “I’ve been waiting for my pages to get crawled for more than one month. I’ve tried to improve my overall pages outside of those non-crawled pages. What would your advice be to improve crawling on a website?”
John replied, “It’s really hard to say without knowing the website itself. Usually, this is something that falls roughly into the category of crawl budget.”
Crawl budget is based on the idea that the number of pages on the Internet is practically limitless; the number is so large that Google can’t possibly index all of them.
So, Google has to limit what it crawls, and for each website, particularly large websites (for example, over 1,000,000 pages), Google might not crawl as many pages as the website owner would prefer.
The article, What Crawl Budget Means for Googlebot, and the guide, Large Site Owner’s Guide to Managing Your Crawl Budget, are two resources you can consult to learn more about crawl budget.
Crawl Capacity: One Part of Crawl Budget
Getting into aspects of crawl budget, John continued, “…essentially, there are two sides that are always at play there. On the one hand, [there’s] the capacity for crawling, like how much can Google crawl. If this is a very small website, then probably, we can crawl everything.”
Crawl Demand: The Second Part of Crawl Budget
About crawl demand, John said, “…and then there’s a crawl demand: how much does Google want to crawl, and the crawl demand is something that you can help us with. [You can use] things like internal linking to let us know about the relative importance of [your] pages.”
Also, Google has a way of determining which pages of your site may be the most important. On this, John said that it’s “something that we can pick up over time by recognizing, ‘Well there’s a lot of really good and important content here, so we should perhaps be investing more time and more crawling here.’”
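To make the internal-linking point concrete, here is a minimal Python sketch. The page paths and link graph below are made up for illustration; the idea is simply to count inbound internal links per page, since pages with few or no inbound links send the weakest “this page matters” signal to Googlebot.

```python
from collections import Counter

# Hypothetical internal-link graph: each page mapped to the pages it links to.
# These paths are made-up examples, not from any real site.
links = {
    "/": ["/products", "/blog", "/about"],
    "/blog": ["/blog/post-1", "/blog/post-2"],
    "/products": ["/products/widget"],
    "/about": [],
    "/blog/post-1": ["/products/widget"],
    "/blog/post-2": [],
    "/products/widget": [],
}

# Count inbound internal links for every page.
inbound = Counter(target for targets in links.values() for target in targets)

# List pages from least-linked to most-linked; the least-linked pages are
# the ones a crawler is least likely to discover and prioritize.
for page in sorted(links, key=lambda p: inbound.get(p, 0)):
    print(f"{inbound.get(page, 0):2d} inbound  {page}")
```

On a real site you’d build this graph from a crawl or your CMS, but even a toy version like this makes orphaned or under-linked pages obvious at a glance.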
Can You Be Creating Too Many Pages?
John brought up something that was interesting: “If you’re looking at a bunch of pages that you’ve created and they’re just not being indexed, my hunch is that maybe you’re creating too many pages, and you should be focusing on fewer pages and making sure that they’re better first, and then at some point, when you realize that the pages that you create…actually get indexed fairly quickly, then go off and create more pages. So first, improve the quality and then as a second step, go off and kind of like grow your website step-by-step.”
Another Thing to Keep In Mind
It’s obvious, but can easily be overlooked: be sure that you’ve allowed search bots to crawl your page.
In WordPress, with some plugins, there’s an option to not allow bots to crawl individual pages/posts. This option can be useful while drafting a post, so that if the post somehow gets published before it’s ready, it doesn’t get crawled.
However, I’m sure some have made this mistake: the post looks great, it’s ready to go, and they click Publish. But they’ve forgotten to uncheck the box that disallows search bots, so when Google comes to that page, it won’t crawl or index it.
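If you want to spot-check this yourself, Python’s standard library is enough. The sketch below uses made-up robots.txt contents, URLs, and HTML; it parses a robots.txt file to see whether Googlebot is allowed to fetch a URL, and scans a page’s HTML for a robots noindex meta tag.

```python
import urllib.robotparser
from html.parser import HTMLParser

# Made-up robots.txt contents for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A published post should be fetchable; a draft under /drafts/ should not be.
print(rp.can_fetch("Googlebot", "https://example.com/blog/my-post"))    # True
print(rp.can_fetch("Googlebot", "https://example.com/drafts/my-post"))  # False


class NoindexChecker(HTMLParser):
    """Flags pages whose <meta name="robots"> includes a noindex directive."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if (tag == "meta"
                and attr.get("name", "").lower() == "robots"
                and "noindex" in attr.get("content", "").lower()):
            self.noindex = True


# Made-up HTML resembling what a "discourage search engines" option emits.
html = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
checker = NoindexChecker()
checker.feed(html)
print(checker.noindex)  # True: this page asks search engines not to index it
```

Note that the two mechanisms differ: a robots.txt disallow stops crawling, while a noindex meta tag lets Google crawl the page but asks it not to index it, which is exactly the crawling-versus-indexing distinction from the start of this article.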
In conclusion, if you find that your pages aren’t being crawled: focus on quality content, use internal linking to point to your important pages that haven’t been crawled yet, and make sure there are no technical obstacles in Googlebot’s way (such as a robots.txt disallow rule that blocks crawling, or a noindex tag that blocks indexing).