What exactly is Duplicate Content?
To put it briefly: duplicate content is content that appears in more than one place online, meaning it can be accessed via multiple URLs.
Why is it bad?
Duplicate content makes it hard for search engines to determine which page is the best match for a specific search query. If several pages carry the same or very similar content, Google will not show all of them in its result pages. Instead, it tries to determine which page is the original source and grants that page higher relevance. This can be bad news for a website, as all pages with duplicate content have to share the relevance score and consequently lose significance and ranking positions.
When the Googlebot visits a website, it indexes all content visible to it, just as it should. As long as duplicate content elements only appear on your own site, the bot will try to figure out whether it is dealing with spam or with, for example, legally required information in a web shop that has to be placed on every single page. In the latter case, no duplicate content penalty is to be expected. On the other hand, there will almost certainly be negative effects on a page's rankings if the bot finds content that has knowingly been copied from external sources or is placed across multiple pages in a spam-like manner.
Different Types of Duplicate Content
Duplicate content can have its origins internally, on your own website, or externally, on a different site.
Common causes for internal duplicate content:
- URL parameters: Parameters for click tracking, analytics codes and the like that are appended to a URL can in some cases trigger duplicate content warnings.
- Session IDs: A frequent cause of duplicate content penalties, occurring when each visitor to a website is assigned a session ID that is visible in the URL.
- Print versions of a website that are not excluded from indexing.
- When a website is accessible both with and without www, via both http and https, or via different domains (e.g. a dedicated mobile site).
- After moving a domain, while the old website is still indexed.
- Different regional versions of a website that use the same language but are not correctly marked as such (e.g. US, UK, CA).
- Outdated SEO strategies, such as creating a near-identical website for every city district, all offering the same content and services.
A common reason for external duplicate content warnings is copying content from other websites. This phenomenon ranges from web shops copying product descriptions from the makers' websites to plain plagiarism.
How do I find out if my website has a duplicate content problem?
A free and efficient tool to check your website for duplicate content is www.siteliner.com. The tool points out all content elements that appear multiple times across a website.
The Solution to Duplicate Content
There are various ways to mark up a page that uses duplicate content so that it won’t see any negative effects in search engine rankings.
- 301 redirect via .htaccess: A 301 (permanent) redirect leads from a duplicate URL (multiple URLs for a single page) to the original one. This way there is no competition between pages, and the original source is awarded the full relevance, which has a positive effect on rankings (see the .htaccess sketch after this list).
- rel="canonical": Another way to solve the problem is the rel="canonical" tag. It passes roughly the same amount of link juice to the original source as a 301 redirect while being easier to implement. The tag is added to the <head> section of the duplicate page and tells search engines where to find the original material. We also recommend adding a self-referencing canonical tag to the original page, just to be 100% on the safe side (see the markup sketch after this list).
- Noindex, follow: This robots meta tag can be added to pages you want excluded from the search engine index. It lets the search engine bot crawl all of the page's links without indexing its content, thus avoiding duplicate content (also shown in the markup sketch).
- mod_rewrite: By adding rewrite rules to the .htaccess file, you can make sure your website is only reachable either with or without www (also shown in the .htaccess sketch).
- Multi-regional and multilingual versions: Each language or regional version is marked with an hreflang attribute in the <head> section, making it distinguishable and avoiding duplicate content (see the hreflang sketch below).
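To illustrate the redirect-based options, here is a minimal .htaccess sketch. It assumes an Apache server with mod_rewrite and mod_alias enabled; example.com and the page paths are placeholders for your own URLs.

```apache
# Redirect the www variant to the bare domain, so the site is only
# reachable under a single hostname.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ https://example.com/$1 [R=301,L]

# Permanently redirect an individual duplicate URL to the original page.
Redirect 301 /duplicate-page.html https://example.com/original-page.html
```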
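The canonical and noindex solutions are plain tags in the HTML <head>. The following is a rough sketch with placeholder URLs; the canonical tag goes on the duplicate page (and, self-referencing, on the original), while the robots tag goes on pages that should stay out of the index altogether, such as print versions.

```html
<!-- In the <head> of the duplicate page: point search engines to the original URL. -->
<link rel="canonical" href="https://example.com/original-page/">

<!-- In the <head> of the original page: an optional self-referencing canonical. -->
<link rel="canonical" href="https://example.com/original-page/">

<!-- On a page that should be crawled but not indexed (e.g. a print version). -->
<meta name="robots" content="noindex, follow">
```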
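For regional versions that share the same language, each variant lists itself and all of its alternates in its <head>. The language-region codes and URLs below are just an illustration of the pattern.

```html
<link rel="alternate" hreflang="en-us" href="https://example.com/us/">
<link rel="alternate" hreflang="en-gb" href="https://example.com/uk/">
<link rel="alternate" hreflang="en-ca" href="https://example.com/ca/">
<!-- Fallback for visitors whose language or region is not listed above. -->
<link rel="alternate" hreflang="x-default" href="https://example.com/">
```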
When it comes to external duplicate content, there is not much to say: "adopting" and plagiarizing content that isn't originally yours is, for obvious reasons, always a bad idea.
As you can see, there is a solution for every kind of duplicate content, which should help you get the rankings your pages and their content deserve.