Duplicate Content Solutions & The Canonical Tag

June 2, 2009
By Lisa Barone in Internet Marketing Conferences SEO

Now that we’ve gotten the bugs out, it’s time to get technical! Watch as my head spins. Actually, Stephan Spencer is on this panel which means I may actually throw up from the knowledge overload he’s famous for throwing out. Get those motion sickness bags ready, this should be fun!

Alex Bennert is moderating a panel with some of my personal search favorites, including Adam Audette, Nathan Buggia, Jordan Glogau, Maile Ohye and Stephan Spencer. They’re turning down the Wallflowers, so let’s hop in. Also, Adam Audette has a fresh haircut. Which means he’s like about eight and a half years old. And I want to be his best friend.

Up first is Jordan Glogau.

What is the Canonical Link Element?  It’s used to flag duplicate content. It’s an HTML tag embedded in the head section of a Web page. It’s treated as an internal 301 redirect. It’s supported by the four major search engines.
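For anyone who wants to see the tag rather than just hear about it, here is a minimal Python sketch that pulls the canonical URL out of a page’s head. The sample page and the example.com URL are made up for illustration, and the helper names (`CanonicalFinder`, `find_canonical`) are mine, not anything shown in the session.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only <link> tags matter, and only the first canonical one.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page: a duplicate URL that points at its preferred version.
page = """<html><head>
<title>Roses</title>
<link rel="canonical" href="http://www.example.com/roses" />
</head><body>...</body></html>"""

print(find_canonical(page))
```

Run against the sample page, this prints the preferred URL. A page with no canonical tag returns `None`.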

What is it good for? Its best purpose is to revitalize internal discussions about site architecture. Uh, great. An HTML tag to make people talk! Once implemented, it’s a confidence builder to truly correct site design flaws.

Case study: 1800flowers.com

Background: It’s a very old Web site. They have a redirect which takes you to canonical URLs and use level one load balancing. They don’t have a single URL. The robots redirected to the canonical URL and leaked PageRank. No one knows the code anymore, so they’re stuck in their current process.

Six weeks before Mother’s Day they added the canonical tag. It was an easy fix because the canonical URL was the same as the one in the CMS. They just had to swipe the code from the robot farm. It helped them with long tail traffic.

Results: By Mother’s Day they had seen a 20 percent increase YOY in organic search. It firmed up the commitment in-house to move from level one load balancing to level three load balancing, and it directed traffic to product and category pages.

Case study: Eyeglasses.com

They rushed a revised Web site and simply cloned their old site, and therefore, all of their old duplicate content problems. Obviously, that wasn’t the best way to do it. They went in and added the canonical tag.

Over the past week they’ve begun seeing some very positive results. Their brand URLs are ranking better, they’re getting the Google indent and they saw a 15-20 percent increase in rankings for a number of the brands.

Up next is Adam Audette.

It’s exciting times, he says! There’s a new gun in town: the canonical tag. He starts humming the theme to Bonanza and we all giggle. He comments that this is his first attempt at adding humor into his presentations. Hee! So far so good, Adam!

The link canonical tag can be used in a lot of ways. So many that it’s a tad confusing. It’s great for duplicate content but has the potential to break things, too.

  • Good: Easy to implement, appears to work
  • Bad: It’s the “poor man’s 301”. Powerful + New/Untested can equal Epic SEO Fail. The canonical tag is NOT a replacement for fundamental URL structure. It doesn’t solve duplicate content. It may be better to redirect.

Adam goes over a few ways sites can use the new tag:

Zappos: They have an internal tool called the ZFC (Zappos Flux Capacitor) that does fancy stuff with URLs. Adam suggests deploying the link canonical tag on plurals, faceted, sorted URLs and subdomains.  [He shows us a page of socks as an example, only we all giggled cause it sounded like he said “a page that sucks”. There’s lots of sock/sucks banter for the rest of his time. Search marketers are hilarious. :) ]

Ticketmaster: They have a lot of issues with URLs. They can use the canonical tag on Artist pages, main category pages, etc.  Google has lots of versions of Ticketmaster’s page on Madonna (as they SHOULD). They need to consolidate the PR. It’s better to 301 all those to the master, but while that’s happening — add the canonical tag.  Same applies for their [baseball tickets] page.

Google Directory: Google has four different versions of the Google Directory. Adam suggests they use the canonical tag there to redirect to the main one.

Next up is Stephan Spencer. I’m crying already. Stephan is known for killing livebloggers.

The canonical tag includes your sitelinks in Google. It’s about recovering leaked PageRank. If the page is allowed but meta robots noindexed, it also passes PR. Thankfully, when obeyed, the canonical tag aggregates PageRank.

Tools for Collapsing Duplicates

  1. The Canonical Tag: Great new addition to the SEO’s arsenal, but not your best weapon. The tag works best when it’s used in concert with other signals. It’s a hint to Google. Don’t rely on it.
  2. 301 Redirect: More absolute. Nofollowed links aren’t even used for discovery by Google.

PageRank Leakage Scenarios

  • Robots.txt disallows the duplicate page = PageRank is leaked to the duplicate and it can show up in the SERPs.
  • Meta robots noindex the duplicate page = PageRank is leaked but won’t show up in the SERPs.
  • Rel=nofollow on the links to the duplicate = PageRank can still accumulate through other links and it can still be indexed.
  • Meta robots nofollow the duplicate = PageRank that accumulates on that page can’t be passed on.
  • XML Sitemaps file only includes the canonical version = only used as a hint, dups can still be indexed.
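One way to make sense of the robots.txt scenario in the list above: a disallowed URL never gets fetched, so a well-behaved crawler can’t read a meta robots tag or a canonical tag sitting on that page. Here is a small Python sketch of that check using the standard library’s robots.txt parser. The rules, the `/print/` path and the example.com URLs are hypothetical, and this is my reading of the slide, not Stephan’s code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks printer-friendly duplicates.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def crawler_can_read(url, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if the given robots.txt rules let this agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The duplicate is blocked, so on-page hints there are never seen.
print(crawler_can_read("http://www.example.com/print/socks"))
# The canonical version is crawlable.
print(crawler_can_read("http://www.example.com/socks"))
```

The blocked duplicate can still collect links from elsewhere, which would explain why it keeps soaking up PageRank even though nobody can crawl it.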

[I’m going to hope that all made sense to you…because it definitely did not to me. Thanks.]

Stephan spends some time talking about the limitations of the canonical tag. He notes that it doesn’t work across domains, though subdomains are supported. [He also shows a bunch of examples of duplicate content and the canonical tag in action…but I am dumb. And it all goes way over my head. The examples are also pictures, not text. Liveblogging Fail.]

Duplicate Content Issues and Fixes

Pagination

Excessive pagination dilutes “crawl equity”, causing numerous pages of product listings to not get crawled. Reduce the number of pages in the pagination system to improve. Consider disallowing “view all” links and forcing spiders through subcat pages. Display as many products per page as possible within the 150k file size. Fewer products per subcat = fewer pagination pages to crawl at subcat level for max product indexation.
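The arithmetic behind that advice is simple enough to sketch in Python. The product counts and page sizes below are invented numbers, not figures from the session.

```python
from math import ceil


def pagination_pages(products, per_page):
    """Number of paginated listing pages a spider has to crawl."""
    return ceil(products / per_page)


# Hypothetical subcategory with 1,200 products.
print(pagination_pages(1200, 20))   # 20 per page
print(pagination_pages(1200, 100))  # 100 per page
```

With these made-up numbers, going from 20 to 100 products per page cuts the listing pages from 60 to 12.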

Faceted Navigation

Faceted navigation provides clickable product inventory breakdowns by brand, color, price, etc. Doing so also creates a huge number of permutations for the spiders to follow. Problem is exacerbated with clickable, resorted column headings. Nofollow all links leading to low value facets.
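To get a feel for how fast those permutations pile up, here is a rough Python sketch. It assumes each facet is either left unset or set to exactly one value, and that every sort order produces its own URL. The facet sizes are hypothetical.

```python
from math import prod


def count_facet_urls(facets, sort_orders=1):
    """Count distinct listing URLs for one category.

    facets maps a facet name to how many values it has. Each facet can be
    unset or set to one value, hence the +1 per facet.
    """
    combinations = prod(count + 1 for count in facets.values())
    return combinations * sort_orders


# Hypothetical category: 20 brands, 10 colors, 5 price bands.
facets = {"brand": 20, "color": 10, "price": 5}

print(count_facet_urls(facets))
print(count_facet_urls(facets, sort_orders=4))
```

Three modest facets already give 1,386 URLs for a single category, and four clickable sort orders push that to 5,544.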

Affiliate URLs

Rarely do they help your SEO because they’re 302, not 301. Run affiliate programs in-house, use 301s and/or canonical tags. Third party affiliate solutions have a vested interest in not playing ball.

Click-Tracked URLs

He offers up how to 301 static URLs with a tracking parameter appended to their canonical equivalents. And if you think I know how to blog code….you must be new here.
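Since I didn’t capture the code from the slide, here is a stand-in sketch in Python of the general idea: strip the tracking parameter and 301 to what’s left. This is not Stephan’s code. The parameter names in `TRACKING_PARAMS` and the example.com URLs are assumptions for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracking parameter names; swap in whatever your site uses.
TRACKING_PARAMS = {"source", "ref"}


def canonical_redirect(url):
    """Return (301, clean_url) if tracking parameters were found, else None."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if key not in TRACKING_PARAMS]
    if len(kept) == len(pairs):
        return None  # Nothing to strip, so serve the page as-is.
    clean = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
    return 301, clean


print(canonical_redirect("http://www.example.com/socks?source=email"))
print(canonical_redirect("http://www.example.com/socks?color=red&ref=aff12"))
print(canonical_redirect("http://www.example.com/socks"))
```

You would log the click before issuing the redirect, so the tracking data survives even though the parameter doesn’t.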

Distance yourself from the thin affiliates. Augment with a substantial amount of unique, valuable content. Use customer reviews. Don’t use mashups with Wikipedia, Twitter and the usual suspects.

“Uniquify” content. It’s not sufficient to shuffle the page’s content around. Think about overlapping “shingles”. Do NOT use the same titles and meta descriptions!
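For the curious, “shingles” are overlapping runs of consecutive words, and comparing the sets of shingles from two pages is one common way to measure near-duplication. Here is a minimal Python sketch using three-word shingles and Jaccard similarity. The shingle width and the sample copy are my own choices, and this is not a claim about how any search engine actually does it.

```python
def shingles(text, width=3):
    """Return the set of overlapping word runs of the given width."""
    words = text.lower().split()
    return {tuple(words[i:i + width]) for i in range(len(words) - width + 1)}


def jaccard(a, b):
    """Share of shingles two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


original = "red wool socks for cold winter days"
shuffled = "for cold winter days red wool socks"
rewritten = "thick merino hiking socks that keep toes warm"

# Shuffling keeps 3 shared shingles out of 7 distinct ones.
print(jaccard(shingles(original), shingles(shuffled)))
# Genuinely new copy shares none.
print(jaccard(shingles(original), shingles(rewritten)))
```

Which is the point of the slide: moving the same phrases around still leaves a big overlap, while actually rewriting the copy does not.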

Legacy URLs

What would the lookup table for the above look like?
1001  /products/canon-g10-digital-camera
1002  /products/128-gig-ipod-classic
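Reading each row of that table as a legacy product ID followed by its new keyword-rich path (my interpretation of the slide), a lookup-driven redirect could be sketched in Python like this. The legacy URL format `/product.php?id=...` is hypothetical; the two table rows are the ones from the slide.

```python
from urllib.parse import parse_qs, urlsplit

# Lookup table from the slide: legacy product ID -> new path.
LEGACY_LOOKUP = {
    "1001": "/products/canon-g10-digital-camera",
    "1002": "/products/128-gig-ipod-classic",
}


def redirect_legacy(url):
    """Map a legacy URL like /product.php?id=1001 (assumed format) to a 301."""
    query = parse_qs(urlsplit(url).query)
    ids = query.get("id", [])
    target = LEGACY_LOOKUP.get(ids[0]) if ids else None
    if target is None:
        return 404, None  # Unknown ID: nothing to redirect to.
    return 301, target


print(redirect_legacy("/product.php?id=1001"))
print(redirect_legacy("/product.php?id=9999"))
```

The known ID gets a 301 to its new path, and the unknown one falls through to a 404.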

Nathan and Maile didn’t present but were on hand for the Q&A.

SEO Drama Alert: A debate broke out mid-session when Matt Cutts got involved about whether or not nofollow is still effective. Of course, as soon as it got hot, all search representatives got very tight-lipped about who said what and what they really meant. As far as I could tell, Matt Cutts did NOT say that they ignore nofollow, but he DID hint that it is less effective today than it used to be. Nathan from Microsoft also mentioned off the cuff that if you use nofollow as a way of PR sculpting and they feel it’s not beneficial to the user, they’ll adjust the algorithm.

Very, very interesting words. If you were in the session, I’d love to hear your thoughts on what went down, how you interpreted it and what you think is or is not true. Let’s get some debate going.

On another note: That session melted my brain.
