
If you have even a passing interest in SEO, from a business perspective or a technical viewpoint, you have probably heard of duplicate content and it being an issue in gaining good search engine rankings.
And this is perfectly true: a poorly structured website can create its own internal architecture issues and deter search engines from crawling, indexing and, ultimately, ranking your webpages highly for a target term.
In this article I am going to venture down the duplicate content rabbit hole and show how a normal website, with completely original content, can create horrendous internal duplicate content issues through some simple technical mistakes.
So come with me and Alice as we follow the white rabbit down the weird and wonderful rabbit hole of duplicate content…
Canonicalisation
Canonicalisation issues arise when a website can be accessed on both the www and the non-www version of its domain. This is usually a site-wide problem that affects every single page of the website.
- http://domain.com
- http://www.domain.com
Without canonicalisation in place, as in the examples above, we immediately have two versions of the homepage available for search engines to find.
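The usual fix is a site-wide 301 redirect to whichever hostname you choose as canonical. A minimal sketch for an Apache server's .htaccess file, assuming mod_rewrite is available and the www version has been chosen (domain.com stands in for your own domain):

```apache
RewriteEngine On
# Permanently redirect the non-www hostname to the www version
RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
```

The 301 status tells search engines the move is permanent, so link equity pointing at the non-www version is consolidated onto the canonical one.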
HTTP secure
The Hypertext Transfer Protocol Secure (HTTPS) provides encrypted communication and secure identification of a website. It is most commonly used for payment transactions.
- http://domain.com
- http://www.domain.com
- https://domain.com
- https://www.domain.com
Site-wide use of HTTPS can, once again, create another two copies of the entire website. In the above example we now have four versions of the homepage.
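The remedy is the same redirect approach applied to the scheme. A hedged Apache sketch, assuming the site has an SSL certificate and you have decided to serve everything over HTTPS on the www hostname:

```apache
RewriteEngine On
# Send any plain-HTTP request to the HTTPS version of the same URL
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://www.domain.com/$1 [R=301,L]
```

If only part of the site needs encryption (a checkout, say), the same pattern can be inverted to push non-secure pages back to HTTP instead.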
Index.php
When you visit the root of a domain (or the root of a folder) the server serves a default file. These files usually have a name like index.html, index.php, index.asp, or default.aspx.
Often, though, an internal linking structure links to both versions of the page: the root version and the index version. This issue is particularly prevalent on the homepages of websites.
- http://domain.com
- http://www.domain.com
- https://domain.com
- https://www.domain.com
- http://domain.com/index.php
- http://www.domain.com/index.php
- https://domain.com/index.php
- https://www.domain.com/index.php
We’ve now burgeoned out into eight different versions of the homepage.
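One common fix is another 301 redirect that strips the default filename. A sketch for Apache, assuming the default document is index.php; the THE_REQUEST condition matches only requests that explicitly name index.php, which avoids a redirect loop with the server's own DirectoryIndex handling:

```apache
RewriteEngine On
# Only act when the visitor explicitly requested index.php
RewriteCond %{THE_REQUEST} /index\.php [NC]
# Redirect /index.php (in any folder) to the folder root
RewriteRule ^(.*/)?index\.php$ /$1 [R=301,L]
```

Fixing the internal links so they all point at the root version is still worthwhile; the redirect is a safety net for the links you miss.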
Parameter Usage
Parameters add an extra piece of information to URLs. This information is often used to change the content being shown on the page, for tracking, or to save users’ preferences.
- http://domain.com/index.php?para=4589
- http://www.domain.com/index.php?para=4589
- https://domain.com/index.php?para=4589
- https://www.domain.com/index.php?para=4589
Some larger websites use multiple parameters throughout the site, meaning many different combinations of URLs, with and without these various parameters, can be generated. The above examples only begin to show the tip of the iceberg of the additional pages that can be created.
At the very least we have now added another four homepages into the duplicate mix.
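Where parameters genuinely need to exist (tracking codes, sort orders and the like), a rel="canonical" tag in the page’s head tells search engines which version should receive the credit. A minimal example; the href shown is simply whichever URL you have chosen as canonical:

```html
<head>
  <!-- Every parameterised variant of this page points search engines back to one URL -->
  <link rel="canonical" href="http://www.domain.com/" />
</head>
```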
Session IDs
Whilst session IDs are similar to parameters, in that they append a variable to the end of a URL, they have the potential to cause even more issues for search engines.
Session IDs assign a unique identifier to each visitor to a website. Search engine bots often receive a completely new session ID every single time they visit.
- http://www.domain.com/index.php?PHPSESSID=01elm211kprftyhcliulbgekf5cbys6
If session IDs are assigned to search engines then a search engine sees a different URL every single time it visits the website: endless duplication.
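If the site runs PHP, the usual remedy is to keep sessions in cookies and stop PHP rewriting URLs. A sketch of the relevant settings as .htaccess overrides (they can equally be set in php.ini), assuming the host permits php_flag directives:

```apache
# Keep the session ID in a cookie rather than in the URL
php_flag session.use_only_cookies on
# Stop PHP appending PHPSESSID to links and forms
php_flag session.use_trans_sid off
```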
Uppercase and lowercase URLs
Some websites serve the same page on both the uppercase and the lowercase version of a URL. This is something that often occurs on Microsoft-based servers.
- http://www.domain.com/category/product.html
- http://www.domain.com/Category/Product.html
Sloppy internal linking, as well as external links, means an almost endless number of duplicates have the potential to be generated.
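On Apache, one way to mop this up is a RewriteMap that lowercases any URL containing capitals before issuing a 301. Note that RewriteMap must be declared in the main server or virtual-host configuration, not in .htaccess; this is a sketch rather than a drop-in rule:

```apache
# In the virtual-host config: a map using Apache's built-in tolower function
RewriteMap lc int:tolower
# 301-redirect any request whose path contains an uppercase letter
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.*) ${lc:$1} [R=301,L]
```

On IIS, the URL Rewrite module offers an equivalent lowercase rule.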
Absolute and Relative Links
These duplication issues can often be compounded by the type of internal linking a website uses.
It is often the case that these duplicate content issues are caused by a single rogue link, for example pointing to http://www.domain.com/index.html or to http://domain.com.
If a website uses absolute links then the issue can stop there. An absolute link explicitly declares the full address of the destination page, for example ‘http://www.domain.com/cat/product.html’.
If a website uses relative links then a single rogue link can cause a search engine to see an entire duplicate of the website. Relative links only declare part of the destination address, for example ‘/cat/product.html’. The web browser, or search bot, resolves this relative URL against the address of the current page. So if the page being crawled is the HTTPS version of an address, the search engine will go on and index the entire website with HTTPS at the beginning of each URL.
Relative links are used extensively by many websites.
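This resolution behaviour is easy to demonstrate. The sketch below uses Python’s standard-library urljoin to show the same relative link producing two different absolute URLs depending on which copy of the page the crawler happens to be on (domain.com is a placeholder):

```python
from urllib.parse import urljoin

# The same relative link, as it might appear in the site's navigation
relative_link = "/cat/product.html"

# Two copies of the same page, reachable because of the issues above
http_page = "http://www.domain.com/index.php"
https_page = "https://www.domain.com/index.php"

# The crawler resolves the link against whichever page it is currently on
print(urljoin(http_page, relative_link))   # http://www.domain.com/cat/product.html
print(urljoin(https_page, relative_link))  # https://www.domain.com/cat/product.html
```

Every relative link on the duplicate copy inherits the wrong scheme or hostname, which is how one rogue link snowballs into a full duplicate of the site.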
Wasted link equity and wasted opportunities
As well as damaging the performance of your website in search engines, this duplication also dilutes your incoming link equity; links are crucial to gaining top ranking positions.
- Exactly how many external links do you think are out there pointing at the incorrect URLs within your website?
- Exactly how much search engine authority are you wasting?
- Where could your rankings be if you resolved these issues?
These are all questions marketing managers and web developers need to be asking themselves and addressing today if they want to compete in search engines.
So, how many homepages do you have?
So, does your website suffer from any of the above problems? If so, these issues are likely affecting the ability and willingness of search engines to crawl your website and rank it highly for your target terms.
Your rankings, traffic and ultimately the number of conversions your website generates are being affected by YOUR duplicate content issues. Not to mention the WASTED link equity.
Need more help or advice on this or any other SEO issue? Get in touch with Blueclaw today. 0113 234 3300 [email protected]
The causes of duplicate content outlined within this document are just those I can think of off the top of my head. Do know any others? Disagree with anything I’ve said in this article? Let me know via the comments form below.

