In the Google SEO Office Hours on February 18, a user asked whether having a PDF and an HTML blog article with the same content on the same site is considered duplicate content. Will having the same content delivered in two different formats have a negative impact on the site?
John Mueller responded that Google would not see the PDF file and the HTML blog post as duplicate content because they are different types of content.
One is an HTML page while the other is a PDF.
Even though the primary piece of content there is the same, the whole thing around it is different.
With regard to any negative effect, the worst that can happen is that both show up in the search results at the same time. Whether you want that to happen or not is more of a strategic question.
Mueller does not see it as a negative when it comes to SEO, but the site owner may have strategic reasons to have only the PDF or only the HTML page show up in search.
They can compete with each other in the search results though. For the most part, PDFs are less visible just because they are less tied in with the rest of the website.
It is really up to the site owner whether the PDF or the HTML page is the preferred version to get indexed and ranked, or whether both should be.
You have the option to set a canonical pointing to the page that you would prefer to rank, and also to noindex the page that you would not like to rank.
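A PDF has no HTML head, so it cannot carry a canonical link tag or a robots meta tag. For PDFs, these signals are sent as HTTP response headers instead: a Link header with rel="canonical", or an X-Robots-Tag header with noindex. The snippet below is a minimal sketch of building those headers in Python. The function name and the example URL are hypothetical, and how the headers get attached to the response depends on your server or framework.

```python
def pdf_response_headers(canonical_url=None, noindex=False):
    """Build extra HTTP response headers for a PDF file.

    Hypothetical helper for illustration; attach the returned headers
    to the PDF response in whatever server or framework you use.
    """
    headers = {"Content-Type": "application/pdf"}
    if canonical_url:
        # Point search engines at the preferred (HTML) version.
        headers["Link"] = f'<{canonical_url}>; rel="canonical"'
    if noindex:
        # Ask search engines not to index this PDF at all.
        headers["X-Robots-Tag"] = "noindex"
    return headers


# Example: prefer the HTML blog post over the PDF (URL is a placeholder).
canonical_headers = pdf_response_headers(
    canonical_url="https://example.com/blog/my-article"
)

# Example: keep the PDF out of the index entirely.
noindex_headers = pdf_response_headers(noindex=True)
```

Which header to send, if any, is the strategic choice described above. If you are happy for both versions to appear in search, neither is needed.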
Great insights on different content types and duplicate content. Perhaps SEOs can come up with a way to use this to take up more space in the SERPs. What do you think?
For more details, check out the SEO Office Hours episode at: