During Google’s most recent Q&A-style session, titled English Google SEO office-hours from August 13, 2021, Google’s John Mueller fielded a number of questions, both from in-person attendees and people who had previously submitted theirs.
At the ~32:41 mark, a submitted question basically asked, “Which is better: 404s, or home page redirects?”
Let me give you some background.
Apparently, the person who submitted that question had a site that had suffered from a malware attack. Their full question was:
“What would be a better strategy: having a 404 page for pages that don’t exist, or redirecting any non-existent page to the home page? Our website is recovering from a malware attack in which tens of thousands of pages that were made were redirecting to some shady website. We’re fixing it, but now we probably have 150,000 pages with 404 errors in Search Console.” [The bolding is mine for emphasis. I’ll get to that below.]
That’s a lot of pages, a colossal hassle, and quite a bit to consider.
Mueller’s response was: “In a situation like that, I don’t think there’s any big difference with regards to doing a 404 page to redirecting to the home page.”
He went on to say that a redirect to the home page would be considered a ‘soft 404’ page, and would be treated similarly to a 404 page.
One of the Most Important Things to Be Sure of
After a brief pause, Mueller also said, “What I would try to do in a case like this is focus on the most important pages of your website and make sure that all of those work really well and make sure that those are updated in search…”
404s Are Probably Easier than Redirecting
Mueller continued, “…and all the rest…probably 404 is the easiest approach where if you remove those pages it returns 404 by default.”
If you’re a webmaster or know about building web pages, you can see why returning 404s (just removing the unwanted pages) is easier than setting up redirects, which would mean redirecting each page individually, whether by hand or with software.
But wait…what about for pages that already existed?
Above, I relayed that Mueller said he’d focus on his most important pages and making sure that those were working. And that makes sense.
But, for a site that’s over 100K pages in size, that might be a lot of work that will take time.
What can be done in the interim?
Pages That Were Made Vs Pre-Existing Pages
Above, I mentioned that I bolded a part of the original question that said “pages that were made”.
From that, I gather that the malware created pages on the site that weren’t there before.
That seems to be the case, but my question is: what about the pre-existing pages? Did those get taken over by malware as well, and were effectively transformed into malware pages?
Ideally, for the pages that existed before the malware attack, you want to get all of those up and running, beginning with your most important pages.
Before I go on, I should mention an assumption I made: I’m assuming (and yes, I know that you shouldn’t assume things) that this site was still a rather large one before the malware attack. Nothing in the original question states how many pages existed before the attack, or how many of the 150K URLs now returning 404 errors were created by the malware.
If the site was, say, only 25 pages before the attack, and 150K after, then for sure, 404 on the created pages might be the way to go.
But let’s say that the site was thousands or tens of thousands of pages in size before the attack, and had a large number of good pages indexed…so many that it’d be hard to rebuild all of them in a short time frame.
In that case, as a temporary measure, for the pre-existing/pre-attack pages (or URLs) that I didn’t have time to rebuild or that weren’t my most important pages, I’d just do 301 redirects to either the home page or a more appropriate page that I had rebuilt (like a category or parent page). (While 301 redirects are defined as permanent, I believe that, for SEO, they’re preferable to, say, 302 redirects, but the choice is yours.)
So, in my view, the 404-vs-redirect issue isn’t necessarily an either-or proposition that applies to every single page of a post-attack site. For pre-existing pages that are yet to be rebuilt, redirects may be preferred. For pages/URLs that were created by the malware, simple page/URL removal or 404s may be preferred.
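Here’s a rough sketch of that per-URL decision in Python. The function name, the sample paths, and the two lookup structures (rebuilt_paths and legacy_redirects) are all made up for illustration. On a real site, this logic would more likely live in your server or CMS configuration.

```python
from typing import Dict, Optional, Set, Tuple


def decide_response(
    path: str,
    rebuilt_paths: Set[str],
    legacy_redirects: Dict[str, str],
) -> Tuple[int, Optional[str]]:
    """Return (status_code, redirect_target) for a requested path.

    rebuilt_paths: pre-attack pages that are live again (hypothetical data).
    legacy_redirects: pre-attack pages not yet rebuilt, mapped to the
        closest rebuilt page, such as a category page or the home page.
    Anything else (for example, URLs the malware created) gets a 404.
    """
    if path in rebuilt_paths:
        return 200, None
    if path in legacy_redirects:
        return 301, legacy_redirects[path]
    return 404, None


# Hypothetical sample data for a small shop site.
rebuilt_paths = {"/", "/shoes/"}
legacy_redirects = {"/shoes/red-sneaker": "/shoes/", "/old-about": "/"}

for path in ["/shoes/", "/shoes/red-sneaker", "/cheap-pills-12345"]:
    print(path, decide_response(path, rebuilt_paths, legacy_redirects))
```

The ordering matters: rebuilt pages are checked first, so a page stops redirecting once it’s back. Unknown URLs fall through to a 404 without having to be listed anywhere, which is part of why 404s are the easier option for the malware-created pages.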
“…Hopefully You Can Lock Things Down to Avoid This Kind of Situation in the Future…”
That was something Mueller said in the earlier part of his response.
And that got me thinking…
How could this have been prevented?
Did the hosting company have routine back-ups?
Now, I can understand how, for a large, dynamic site, restoring from a backup might not be as simple as it would be for a smaller, simpler site.
I wonder what protective measures were in place, if any, prior to the attack.
This is why, if you have or plan to have a large site (or a site of any size, really), you should ensure that there’s some type of protective and/or restorative measure in place.
I know of at least one hosting company that does daily backups, and keeps each backup for something like…28 days or so.
Or, maybe you can use software or a plugin to make backups of your site.
If you do use backups, be sure you’re well-rehearsed on the site recovery process. Ideally, practice restoring your site from a backup on a sandbox or practice (non-live) version of your site, so that in a real emergency, you’re not doing it for the first time in a panic.
There are probably many solutions, but just be sure to use a proven one, because I can only imagine how large an undertaking it must be to try to restore a very large site.
I’m reminded of the 80/20 Rule, or the Pareto principle: in this context, 20% of your pages account for 80% of your page views. (Of course, the split might not be exactly that: it could be 10/90, 5/95, or what have you.) A small portion of your pages might account for a majority of the views/visits.
So, first focus on restoring as many of those as you can, as fast as you can. That way, most of your visitors will never notice that the other pages are missing. That will give you time to use what you’ve learned from this article to make decisions about what to do next.
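If you have page-view numbers from your analytics, here’s one way you might find that small set of high-traffic pages. This is only a sketch: the function name, the 80% default, and the sample view counts are my own assumptions, not anything Mueller suggested.

```python
from typing import Dict, List


def pages_to_restore_first(views: Dict[str, int], share: float = 0.8) -> List[str]:
    """Return the most-viewed pages that together reach `share` of all views."""
    total = sum(views.values())
    if total == 0:
        return []
    # Most-viewed first; ties are broken by path so the order is stable.
    ranked = sorted(views.items(), key=lambda item: (-item[1], item[0]))
    selected: List[str] = []
    running = 0
    for path, count in ranked:
        selected.append(path)
        running += count
        if running >= share * total:
            break
    return selected


# Hypothetical view counts, for illustration only.
sample_views = {
    "/": 500,
    "/pricing": 320,
    "/blog/a": 100,
    "/blog/b": 50,
    "/about": 30,
}

print(pages_to_restore_first(sample_views))
```

In this made-up sample, the home page and the pricing page together account for 820 of the 1,000 views, so those two would be restored first.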