For years (seemingly, ages in Internet time), there’s been a recurring question: is duplicate content a penalty or a filter?
This question seems simple, but it provokes a lot of misunderstanding and incorrect answers.
So, I’m going to try to settle things here.
I’ll start by defining duplicate content.
One definition of duplicate content is this: having 2 or more pages (or URLs) on a website (or domain) that have, overall, the same content. That is, they’re duplicates.
(I know that 2 or more pages can have only partially duplicate content. That is, content that largely overlaps but isn’t 100% identical. While I don’t want to get into extraneous detail about what percentage of duplication is a benchmark for concern, we have found that there’s a 51% rule.)
Why You May Want to Be Certain About How Duplicate Content Is Handled On Your Site
Some website platforms, such as WordPress, may generate duplicate content, depending on how an individual installation is configured.
One example is tag pages. On some WordPress-powered sites, tag pages are created automatically, and each of these pages contains excerpts from the posts that carry that tag.
(Tag pages shouldn’t be confused with category pages, although in this context, the same problem, duplicate content, can arise.)
If your tag pages are set to noindex (which can be done in wp-admin), you should be fine.
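If you want to confirm that a tag page really is set to noindex, one option is to look for a robots meta tag in its HTML. The snippet below is only a rough sketch in Python: the two sample pages are made-up HTML strings, the names RobotsMetaFinder and is_noindexed are my own, and it only checks the meta tag (not, say, an X-Robots-Tag HTTP header).

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Records the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, follow".
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def is_noindexed(html):
    """Return True if the page's robots meta tag includes noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


# Made-up sample pages (assumptions, not real WordPress output).
tag_page = (
    '<html><head><meta name="robots" content="noindex, follow"></head>'
    "<body>Tag archive</body></html>"
)
post_page = "<html><head><title>A post</title></head><body>Post</body></html>"

print(is_noindexed(tag_page))
print(is_noindexed(post_page))
```

In practice, you’d feed it the HTML of one of your own tag pages rather than these sample strings.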
Alright, with that in mind, let’s move on to the real subject of this news item.
Last week, Google held one of its regular Q&A sessions: the English Google SEO Office-Hours from October 8, 2021.
During this particular session, Google’s John Mueller was asked,
“Does it negatively affect our SEO performance in terms of…these are duplicate pages but, canonicalized…?”
This video is cued to roughly the 2-minute, 35-second mark, which is approximately where the question was asked.
John’s response was:
“No, no… it’s perfectly fine. It’s purely a matter of…the crawling side of things. If we see that they’re duplicate, then we will try to treat them as duplicates and we will just focus on one canonical URL for them.”
Duplicate Content Doesn’t Negatively Affect SEO…at Least, Not If…
…Not if the duplicate pages carry a canonical tag pointing to the version you want indexed.
But…what is canonicalization?
A Short Introduction to Canonicalization
In short, let’s say that you have web pages (or URLs) A, B, and C, all of which are entirely identical.
For canonicalization, what you’d need to do is designate one of those pages as the original, or the one you want indexed. Let’s say you choose A.
So, you put a canonical tag on B and C, referring to A. You’re basically saying, “I want A to be the page that’s seen as the reference for this content.”
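To make the A, B, and C example concrete, here’s a small sketch in Python. It builds the canonical tag that would go in the head of B and C, then reads the tags back to show that all three URLs resolve to A. The example.com URLs, the page HTML, and the function names are all my own placeholders, and treating a page with no canonical tag as its own canonical is a simplification for this sketch, not a description of how Google actually decides.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attr.get("href")


def canonical_tag(target_url):
    """Build the tag that goes in the <head> of a duplicate page."""
    return f'<link rel="canonical" href="{target_url}">'


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical URLs standing in for pages A, B, and C.
PAGE_A = "https://example.com/page-a/"
PAGE_B = "https://example.com/page-b/"
PAGE_C = "https://example.com/page-c/"

pages = {
    PAGE_A: "<html><head><title>A</title></head><body>Same text</body></html>",
    PAGE_B: f"<html><head>{canonical_tag(PAGE_A)}</head><body>Same text</body></html>",
    PAGE_C: f"<html><head>{canonical_tag(PAGE_A)}</head><body>Same text</body></html>",
}

# Simplification: a page with no canonical tag counts as its own canonical.
resolved = {url: find_canonical(html) or url for url, html in pages.items()}

for url, canonical in resolved.items():
    print(url, "->", canonical)
```

Every URL in the group points at A, which is the "one canonical URL" idea from John’s answer, in miniature.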
Duplicate Content Isn’t a Penalty Per Se; It’s More of a Filter
And that makes sense, right? Google isn’t going to penalize your site for duplicates, especially if you canonicalize or noindex them. Instead, as described in the canonicalization example above, it’s more likely to apply a filter: it looks for canonical tags (or noindex) and tries to settle on the one version you want indexed.