Duplicate content is content that appears in more than one place on the internet, whether on different pages of your own site or on a completely different website.
Yet duplicate content is not always about exact copies; slightly different content can also count as duplicate.
This means you cannot simply swap out a product name, company name, or location and call the result unique content.
For example, these two sentences can be considered identical pieces of content by search engine crawlers:
- Water evaporates at 212° F.
- Water vaporizes at 212° F.
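To get a feel for why a crawler might treat these as the same page, here is a quick sketch using Python's standard `difflib`. This is only an illustration of textual similarity, not how Google actually compares pages:

```python
from difflib import SequenceMatcher

a = "Water evaporates at 212° F."
b = "Water vaporizes at 212° F."

# ratio() returns the fraction of matching characters between the two
# strings: 1.0 means identical, values near 1.0 mean near-duplicates.
similarity = SequenceMatcher(None, a, b).ratio()
print(round(similarity, 2))  # well above 0.8 for these two sentences
```

A single swapped word barely moves the score, which is exactly why "slightly different" content still reads as duplicate.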
In this guide, I’ll walk you through:
- Why duplicate content matters
- The impact of duplicate content
- How to prevent duplicate content
Why Duplicate Content Matters
Here we have two players: search engine crawlers and webmasters.
If you do not know already, here is the first step of the ranking process.
Search engine crawlers visit sites and look for new pages to index; indexing is what gets your site shown in the SERP.
When you have duplicate content, the crawler cannot tell which page should be indexed and which should not, so it does not know which result to show on the search engine results page.
Google tries to cope with duplicate content, but it is not easy. Here is what they state:
“Google tries hard to index and show pages with distinct information. This filtering means, for instance, that if your site has a “regular” and “printer” version of each article, and neither of these is blocked with a no-index tag, we’ll choose one of them to list.”
In simple terms, Google will show only one result for the same page, and it will pick that page more or less at random unless you tell it which one to choose (bear with me, I will show you how).
As a result, pages that were not intended to rank might appear in search results.
The Impact of Duplicate Content
Duplicate content can have mild to severe consequences. Below are some of them:
Let’s imagine that you have three pages on your site with the same content. The crawler is not sure which one to show in the SERP. As a result, all three pages will suffer, resulting in traffic loss.
The crawler might also randomly pick the one of the three pages with the worst user experience (Murphy’s Law :D). This will result in an increased bounce rate, as people will abandon the website because of the poor experience.
Duplicate content can also affect the ranking of your key pages: when much of your content is similar, Google may judge the quality of your whole site differently (think E-A-T).
Crawlers cannot index millions of pages from a single site; every site has a crawl budget, and you should be careful not to waste it. If some of your site’s key pages are not getting indexed, duplicate content may be eating into your crawl budget.
Now for the severe consequences. These are not common and do not happen to every site, but sometimes Google can penalize your site for duplicate content.
Think of Google as your professor or plagiarism detector in this scenario. If you copy and paste a lot, then you will probably fail the assignment. Here is what Google has to say:
“As a result, the ranking of the site may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results.”
How to Prevent Duplicate Content
There are many ways you can remove duplicate content from your site. Here is what we use:
The most important weapon in the fight against duplicate content is the canonical tag.
The rel="canonical" attribute is a snippet of HTML code that tells Google which URL is the original, preferred version of a piece of content.
Google will treat the URL that the canonical tag points to as the main page, and that is the version shown in Google’s search results.
In other words, you are telling Google that there are multiple pages with the same content and which one of them is the main version.
You should add the rel="canonical" attribute to the HTML head of every duplicate page.
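For illustration, here is what that looks like on a hypothetical duplicate page (say, a printer-friendly version) pointing back to the original; the URLs are made up:

```html
<!-- In the <head> of https://example.com/article?print=1 (the duplicate) -->
<link rel="canonical" href="https://example.com/article" />
```

The main page itself can also carry a self-referencing canonical tag pointing to its own URL.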
Google also recommends canonical tags as a preferred way of dealing with duplicate content.
Another way to eliminate duplicate content is redirection.
Duplicate pages can be redirected to the main page so that the main page receives all the benefits.
If possible, always use 301 redirects. A 301 redirect signals to search engine crawlers that the page has permanently moved to another URL.
And if several of your duplicate pages are performing, redirect them all to the highest-performing one so it receives the consolidated benefit.
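As a sketch of what this looks like in practice, here is a 301 redirect in an Apache .htaccess file; the path and domain are hypothetical:

```apache
# .htaccess: permanently (301) redirect a duplicate page to the main one
Redirect 301 /printer-version-article https://example.com/article
```

Other servers (nginx, IIS) and most CMS redirect plugins offer the same permanent-redirect behavior under different syntax.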
If you are running a blog on your site, chances are high that you will create duplicate or near-duplicate content.
For example, if you are writing about SEO, you might be tempted to create checklists for the upcoming year. Instead of having separate checklists for the three pillars of SEO (off-page, on-site, technical), have one checklist consolidating all the pillars, like the one we created for 2021.
Meta Robot Tags
When it comes to technical SEO, meta robots tags are also helpful for handling duplicate content.
A meta robots tag tells Google which pages to index and which to skip.
Add a “noindex” meta robots tag to the HTML head of the duplicate page; essentially, you are telling Google not to show that page in the SERPs.
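As an illustration, the tag sits in the page’s head; the "noindex, follow" value tells crawlers not to index the page while still following the links on it:

```html
<!-- In the <head> of the duplicate page you want kept out of the index -->
<meta name="robots" content="noindex, follow" />
```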
Some duplicate content is created unintentionally. This is mostly due to structural URL elements, because they affect how search engines perceive URLs.
To a crawler, a different URL usually means a distinct page.
Below are the most common duplicate URL varieties:
- HTTP vs. HTTPS (http://example.com vs. https://example.com)
- www vs. non-www (www.example.com vs. example.com)
- Trailing slash vs. no trailing slash (/page/ vs. /page)
- URL parameters (/page vs. /page?utm_source=newsletter)
All you have to do is check which version of the page performs best and redirect the other variants to it.
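For instance, assuming an Apache server and a preference for the https, non-www version, a hypothetical .htaccess rule set that collapses the host and protocol variants into one URL might look like this:

```apache
RewriteEngine On
# Send both the http:// and the www. variants to the single canonical host
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} ^www\. [NC]
RewriteRule ^ https://example.com%{REQUEST_URI} [R=301,L]
```

Trailing-slash and parameter variants are usually better handled by your CMS settings or canonical tags rather than blanket rewrite rules.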
Take Care of Duplicate Content Now
Duplicate content is a big no-no. It can pose risks to your site and, in severe cases, even get your site penalized or removed from Google’s index entirely. Make cleaning up duplicate content your next action item to prevent future trouble.
Comment below how you deal with duplicate content!