Duplicate content is a rankings killer. Publishing duplicate content confuses search engines about which page should appear in the rankings, and it introduces link dilution: different people may point links to different pages that actually feature the same content. In this lesson you will learn how to make sure that Google rewards you for your original content and how to better control the paths that users take to find and interact with your content.


Duplicate Scraped or Aggregated Content

Another big duplicate content problem that can harm your website arises when your site features content that is duplicated around the web. This can happen when:

  1. Your site automatically aggregates content from external sources.
  2. Your site allows users to publish content that may have previously been published elsewhere.
  3. Your content gets scraped or aggregated by another website, and their website gets indexed by Google before yours, making them look like the originators.

The first thing you must do to address this problem is become aware of it. Use http://www.copyscape.com to find out whether the content on your pages also appears elsewhere online. Simply go to CopyScape.com, enter the URL of your page, and it will list pages from around the web that duplicate its content. If no results come up, that is a good sign that your content is original, at least as far as the tool can tell.
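If you already have the text of two pages in hand, you can also get a rough local estimate of how much they overlap. The sketch below is not how Copyscape works; it swaps in a simple, well-known technique (Jaccard similarity over word "shingles"). The function names and the shingle size of 5 words are assumptions chosen for illustration.

```python
import re

def shingles(text, size=5):
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity(text_a, text_b, size=5):
    """Jaccard similarity of the two shingle sets: 0.0 (no overlap) to 1.0 (identical)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A score close to 1.0 suggests the two texts are largely the same; what threshold counts as "duplicate" is a judgment call for your own site.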

If you are scraping and aggregating content without adding significant value, you may want to rethink this aspect of your web strategy. Google has introduced numerous updates over the past few years aimed specifically at hurting sites that engage in this practice. Consider adding your own editorial opinion instead of just aggregating other people's information.

Also, make sure that Google crawls your site consistently by submitting and updating your XML sitemap and by using Fetch as Googlebot in Webmaster Tools:

https://www.google.com/webmasters/tools/googlebot-fetch

This prompts Google to crawl your pages right away, which usually gets them indexed quickly.
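If your platform does not generate an XML sitemap for you, one can be built with a few lines of code. The following is a minimal sketch that follows the sitemaps.org format; the function name and the example URL and date are assumptions, and a real site would feed in its own list of pages.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build sitemap XML from (url, last_modified_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = lastmod  # W3C date, e.g. YYYY-MM-DD
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical example page; replace with your own URLs and dates.
print(build_sitemap([("http://www.example.com/", "2024-01-15")]))
```

Save the output as sitemap.xml in your site root and regenerate it whenever you publish or update pages.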

Duplicate Page Titles or Descriptions

Duplicate titles and descriptions are very confusing to search engines, especially when the same title or description is used on pages that feature different content. This typically means that either a templating issue on your site is creating the problem automatically, or you have been making sloppy mistakes.

The first key here is finding the duplicate titles and descriptions. Google Webmaster Tools and Moz.com Analytics are both great solutions for finding these issues on your site.
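For a small site you can also run a quick check yourself. The sketch below assumes you have already collected each page's HTML into a dictionary keyed by URL (how you fetch the pages is up to you); the class and function names are illustrative, not part of any tool mentioned above.

```python
from collections import defaultdict
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element of a page."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def find_duplicate_titles(pages):
    """pages maps URL -> HTML source. Returns {title: [urls]} for titles used more than once."""
    by_title = defaultdict(list)
    for url, html in pages.items():
        parser = TitleParser()
        parser.feed(html)
        # Compare titles case-insensitively and ignore surrounding whitespace.
        by_title[parser.title.strip().lower()].append(url)
    return {t: urls for t, urls in by_title.items() if t and len(urls) > 1}
```

The same approach can be adapted to meta descriptions by reading the content attribute of the description meta tag instead of the title text.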

Duplicate URL Structures (Canonical Issues)

One of the most common types of duplicate content is when every page on a site is published multiple times under slightly different URLs. For instance:

  • http://example.com
  • http://www.example.com
  • http://www.example.com/index.php
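To see why these count as duplicates, it helps to reduce each variant to a single preferred form. The sketch below is one way to do that in code; the preferred host, the list of index file names, and the function name are all assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url, preferred_host="www.example.com",
                  index_names=("index.php", "index.html")):
    """Map www/non-www and index-file variants of a URL to one preferred form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare = host[4:] if host.startswith("www.") else host
    preferred_bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    if bare == preferred_bare:
        host = preferred_host
    path = parts.path or "/"
    last = path.rsplit("/", 1)[-1]
    if last in index_names:
        # Drop the index file name so /index.php becomes /
        path = path[: len(path) - len(last)]
    # The fragment (#...) is dropped; the query string is kept as-is.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))
```

With these defaults, all three URLs in the list above reduce to http://www.example.com/, which is the kind of consolidation the fixes below are meant to achieve on the live site.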

How To Fix Duplicate URL Structures:

The first thing to do is make sure that you are using a rel="canonical" link in the head section of every webpage. This tag should point to the desired version of the URL. The canonical tag looks like this:

<link href="http://www.example.com/canonical-version-of-page/" rel="canonical" />
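Once the tag is in place, it is worth confirming that each page really declares the canonical URL you expect. The following sketch reads a page's HTML and returns the href of its first canonical link; the names are illustrative, and fetching the HTML is left to you.

```python
from html.parser import HTMLParser

class CanonicalParser(HTMLParser):
    """Record the href of the first <link rel="canonical"> found."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # rel may hold several space-separated values; attribute values may be None.
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href")

def find_canonical(html):
    """Return the canonical URL declared in html, or None if there is none."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical
```

A page that returns None, or a URL other than the version you chose, is a candidate for fixing.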

While the canonical tag is a good first line of defense against these types of issues, the best way to fix this problem depends on the platform the site is built upon. Most CMSs have well-documented solutions for it. Typically the issue is best addressed by setting up a rewrite rule in the .htaccess file of your site. Many CMSs offer a simple setting that the site admin can configure, but it is also easy to do by hand: add (or even just uncomment) the appropriate rewrite rule for the URL version that you want.

For PHP-based sites running on Apache, this can typically be addressed by creating a .htaccess file in your domain root and adding the following. (To choose one option, simply uncomment the RewriteCond and RewriteRule lines in the chosen option.)

Rewrite www. to Non-www. Version or Vice Versa

Whichever version you choose, use the canonical link element in the "head" section of every page to point to it, so that the chosen URL gets all traffic and backlink credit. Then add the following rules to your .htaccess file so that requests for the other version are redirected to the chosen one:

  # These rules require mod_rewrite, and the rewrite engine must be switched on:
  RewriteEngine On
  #
  # If your site can be accessed both with and without the 'www.' prefix, you
  # can use one of the following settings to redirect users to your preferred
  # URL, either WITH or WITHOUT the 'www.' prefix. Choose ONLY one option:
  #
  # To redirect all users to access the site WITH the 'www.' prefix,
  # (http://example.com/... will be redirected to http://www.example.com/...)
  # adapt and uncomment the following:
  # RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
  # RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
  #
  # To redirect all users to access the site WITHOUT the 'www.' prefix,
  # (http://www.example.com/... will be redirected to http://example.com/...)
  # adapt and uncomment the following:
  # RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
  # RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]
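If the rewrite syntax is unfamiliar, it may help to see the first option (redirect to the 'www.' version) expressed as ordinary code. This is only a sketch that mimics what the two lines do; it is not how Apache is implemented, and the function name is made up for illustration.

```python
import re

def www_redirect(host, path):
    # Mimics:  RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
    #          RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
    # Returns (status, location) when the rule fires, or None when it does not.

    # RewriteCond: the rule only applies when the Host header is the bare
    # domain; [NC] makes the comparison case-insensitive.
    if not re.match(r"^example\.com$", host, re.IGNORECASE):
        return None

    # In .htaccess context the leading slash is removed before the pattern
    # is matched, so $1 holds the path without it.
    relative = path[1:] if path.startswith("/") else path
    captured = re.match(r"^(.*)$", relative).group(1)

    # R=301 sends a permanent redirect; L stops processing further rules.
    return 301, "http://www.example.com/" + captured
```

For example, a request for /about/ on example.com is sent to http://www.example.com/about/ with a 301 status, while a request that already uses www.example.com is left alone.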



The Canonical Link Makes Your Page Stand Out As Original Content...

That Is Super Important!


In the next lesson we will take a look at a lot of awesome URL rewrite rules that you can start using by adding code snippets to your .htaccess file. These are great