Duplication, Aggregation, Syndication, Affiliates, Scraping, and Information Architecture

Hello, friends. Ready for some more hot-and-heavy liveblogging? I hope so because we’ve got a LOT more coming your way. Up next is a more technical session that I will do my very best to keep up with for ya. Speaking, we have Brian Cosgrove, Vanessa Fox, and Matt Heist. I have a feeling this is gonna be a doozy so let’s just hop right in.

Vanessa starts off by saying good morning.  Then she spots Barry Schwartz in the audience and asks if he’s liveblogging.  He asks why he’d be here if he wasn’t. Hee. Ouch. Those Third Door kids are so crazy. Also, hi, Vanessa, I’m here, also. It’s cool. I don’t need any recognition. Really.

Annnyway. First up is Brian. Worth noting: it smells like onions in here. I half expect my mother to walk out with a meatloaf any minute now.

Brian’s up.

  1. Thin Content
  2. Low Quality
  3. PageRank dilution

He calls these three very big problems on the Web today. There are a lot of reasons they happen, but his belief is that the cause isn’t necessarily a lack of technical know-how; it’s process.

The Challenges

Feeds from suppliers

  • Products
  • Real Estate Listings
  • Travel Listings
  • Deals
  • Other Feeds

Too many categories

  • Men’s Hats
  • Brown Hats
  • Men’s Brown Hats
  • Cowboy Hats
  • Men’s Brown Cowboy Hats
  • Straw Hats
  • Men’s Brown Cowboy Straw hats

Too many similar items:

Nearly identical items are better arranged as options on one page.
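For the code-minded, here’s a minimal sketch of what "options on one page" could look like when you’re processing a supplier feed. This is my illustration, not something Brian showed; the `name` and `color` fields and the sample items are made up, and a real feed would need a smarter grouping key.

```python
from collections import defaultdict

def group_variants(items):
    """Group feed items that differ only by an option under one page."""
    pages = defaultdict(list)
    for item in items:
        # Assumption: items sharing a product name belong on the same page,
        # and the color is the option that varies.
        pages[item["name"]].append(item["color"])
    return {name: sorted(colors) for name, colors in pages.items()}

# Hypothetical feed rows, for illustration only.
feed = [
    {"name": "Straw Cowboy Hat", "color": "brown"},
    {"name": "Straw Cowboy Hat", "color": "black"},
    {"name": "Felt Fedora", "color": "gray"},
]

pages = group_variants(feed)
```

Three feed rows become two pages, with the colors listed as options instead of living on near-duplicate URLs.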

All of this leads to mindless content across the Web. Everyone has seen it. Copies and copies of these listings that are written in all different voices, they’re not consistent, they’re not clear, they don’t speak to the site that they’re on or that site’s audience.

Writing unique content is the cost of entry for SEO. He understands that a lot of people find that challenging, but what he recommends is that you focus on the category pages those feeds flow into and work on developing clean and unique content there.

He’s going to present us with a Toolset for Success.

SEO Strategy

  • Define SEO opportunity related to business goals.
  • Categories of terms and relative volumes
  • Competitive analysis
  • Paid search and social coordination plans
  • High level tactical plans: link acquisition

Keyword Mapping

A keyword map ties URLs to keywords and keywords to URLs. It can be a bit daunting, but start building out your map. It will make your life a lot easier and provide clarity in your SEO direction.
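A keyword map doesn’t need to be fancy. Here’s a rough sketch, assuming you start from a URL-to-keywords table (the URLs and keywords below are invented) and want the reverse lookup plus a list of keywords that more than one URL is chasing.

```python
from collections import defaultdict

# Hypothetical starting point: each URL and the keywords it targets.
url_to_keywords = {
    "/hats/cowboy": ["cowboy hats", "western hats"],
    "/hats/straw": ["straw hats"],
    "/hats/mens": ["men's hats", "cowboy hats"],
}

def invert(mapping):
    """Build the keyword -> URLs side of the map."""
    keyword_to_urls = defaultdict(list)
    for url, keywords in mapping.items():
        for keyword in keywords:
            keyword_to_urls[keyword].append(url)
    return dict(keyword_to_urls)

keyword_to_urls = invert(url_to_keywords)

# Keywords targeted by more than one URL are worth a second look.
conflicts = {kw: urls for kw, urls in keyword_to_urls.items() if len(urls) > 1}
```

Here `conflicts` flags "cowboy hats" as being targeted by two pages, which is exactly the kind of overlap the map is supposed to surface.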

Content Strategy

  • Define content needs as they relate to the business
  • Should have integrated SEO content needs
  • Describe categories of content needed
  • Quantify the amount of needed content
  • Provide timelines and goals for content production
  • Define the teams and roles involved

Workflow

Define your workflow for the content development process.  For him it looks something like:

SEO Research – Deliver Creative Brief – Research and Write Article – Reviews (SEO, legal, editorial) – Publish

Style Guide

  • Reiterate brand values and site values
  • Define the Web site’s Voice, Tone and Style
  • List out quality guidelines
  • List out Legal guidelines and considerations
  • List out general SEO considerations
  • Add amendments and updates often
  • Keep this document ALIVE

Brian said the fact that they have this document is the only reason they’re able to outsource content. It makes sure that when they bring a new writer on board, the writer knows exactly what to expect. He considers it a contract or an agreement.

Content Calendar

  • Is a prioritized queue of content being developed
  • Includes exact dates for assignment through publication
  • Contains a description, length and timing for content being assigned

He talks about sites with clear voice and branding and mentions Woot, who always has pretty awesome product descriptions. They’re not rich from an SEO perspective, but they’re interesting.

Next up is Matt. He’s going to talk about his personal experiences with duplicate content. I wonder if there will be tissues involved. GOD, I HOPE SO!

Matt starts off saying he’s the CEO of HighGear Media. And that CEO means “overhead”.  Heh. Word, Matt. Word.

His company is one of the leading content publishers focusing on the automotive vertical. They do it by having 7 full time writers on staff and working with freelancers. They produce 1,200 pieces of premium content monthly.   They own and operate 4 of the leading “in market” and segment focused sites (as well as 3 opinion blogs).

Pre-Project Focus

Founding Thesis: 100s of sites with niche, passionate communities AND original content contributors. Potential content contributors want full service sites.

Execution:

  • Launch each one with thousands of pages typically found on full service sites.
  • Have it all be original content: 7 full time writers, 20-30 freelancers
  • Syndication: On-Network (small sites receive original HGM content from more established sites). Off-Network (license HGM content to branded media publishers looking for automotive content).

They had all this great content but they were careless with how they distributed it. That was one big mistake they made.

Out of that, they had 107 sites, many of which looked the same. They were essentially competing against themselves in search.

The Good:

  • 25 sites (out of 105+) with solid original content
  • Passionate audience around ~10 of their sites. Traffic concentrated on key sites.

The Bad:

  • Too many sites without clear audience segments
  • Undifferentiated look and feel between sites.
  • Over-sharing of our own original content on our sites

The Ugly:

  • They were hit with the Panda update.

What did they do? Basic blocking and tackling

  • Eliminated (301 redirected) non-core sites (105+) down to 7 sites (4 core sites and 3 blogs) built around specific target audience segments
  • Properly canonicalized HGM product duplicate content
  • Aggregated content with strong user engagement was kept BUT no-indexed – notwithstanding revenue impact
  • Unique expert reviews (on top of traditional expert news, analysis, opinion) written for each target segment site
  • Challenging from a P&L perspective
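If you want to picture that blocking and tackling in code, here’s a minimal sketch. It is not HGM’s actual setup: the page kinds, field names, and example.com hosts are all assumptions I’m making for illustration. The `noindex` robots meta tag and the `rel="canonical"` link are the standard HTML mechanisms; everything else is hypothetical.

```python
# Hypothetical map of retired hosts to the core site that absorbs them.
RETIRED_HOSTS = {
    "old-niche-site.example.com": "https://core-site.example.com",
}

def redirect_for(host, path):
    """Return (301, target URL) for a retired host, or None for a live one."""
    target = RETIRED_HOSTS.get(host)
    if target is None:
        return None
    return (301, target + path)

def head_tags(page):
    """Pick <head> tags based on where the page's content came from."""
    if page["kind"] == "aggregated":
        # Keep the page for users, but ask search engines not to index it.
        return ['<meta name="robots" content="noindex">']
    if page["kind"] == "syndicated_copy":
        # Point search engines at the original version of the article.
        return ['<link rel="canonical" href="%s">' % page["original_url"]]
    return []  # Original content needs no special handling in this sketch.

redirect = redirect_for("old-niche-site.example.com", "/reviews/some-car")
tags = head_tags({
    "kind": "syndicated_copy",
    "original_url": "https://core-site.example.com/reviews/some-car",
})
```

The point of splitting it this way: retired sites get handled at the host level with a 301, while pages that stay live get a per-page decision between canonical, noindex, or nothing.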

Redesign

Balance content between news/opinion/analysis and hardcore reviews. Audience based around differentiation; create engagement around content; reviews written with voice for targeted segment.

FamilyCarGuide – Family and Safety oriented news and reviews

MotorAuthority – Luxury, Performance

Summary Learnings

  • Fewer/larger sites helpful on several fronts. Journalists like writing for larger brands and advertisers want to speak with larger audiences
  • Differentiation around content AND design matters; design is not “fluff”
  • While costly, original content was HGM’s asset all along BUT must be disciplined redistributing content on owned and operated properties. Tough decisions around costs are required; keep original content/cut elsewhere.
  • Knock on wood: search traffic trends are positive
  • With the evolution of social, premium content that is authoritative and fresh will flourish.

Next up is Vanessa.

A Task Force has been set up in the federal government to clean up the 24,000 Web sites they own. Vanessa is working with this task force for a few months. Because that’s how Vanessa rolls.  With the government of the United States!  But it’s a mess. Because everyone wants to launch new Web sites for new policies. They’re not thinking about their audience and what they want to accomplish. Instead, they’re creating a giant mess.

One of the teams she worked with was the Department of Education. You would think it’d be super easy for students to find out about student aid since it’s all online, but there are 14 different Web sites. Because every time there’s a new policy about it, someone creates a new Web site. She flips through all the different sites to show how absolutely confusing it is. You have all these different sites and the gov’t didn’t know what to do.

How do you fix this problem?

She told them about personas. Think about your audience and what tasks they need to accomplish. When you start to think like that, you realize what site architecture you should have and what gaps you might have. What you want is an information architecture where you land on the home page and can move through a nice structure anywhere on the site: up, down, and across to peer pages.

She shows an example of About.com. They have 60,000 results about [counting calories]. That signals a duplicate content problem. You probably don’t have THAT many unique pages on counting calories.   That’s when you start to cluster what pages you have and what’s going on. At that point, you can look at what the quality is. The mapping will also show you where the gaps are.

That’s the methodology that she used with the Department of Education. They did a site search, looked at all the different things they had, and clustered the pages by topic. She used a combination of keyword research and the assets already on the site and mapped it to make it more manageable. It doesn’t mean it’s an easy process.
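To make the clustering idea concrete, here’s a toy sketch. It is not Vanessa’s actual process; the page titles, URLs, and topic list are invented, and real clustering would lean on keyword research rather than simple substring matching. It just shows how mapping pages to topics surfaces both the pile-ups and the gaps.

```python
from collections import defaultdict

def cluster_by_topic(pages, topics):
    """Group (url, title) pairs under every topic phrase found in the title."""
    clusters = defaultdict(list)
    for url, title in pages:
        matched = [t for t in topics if t in title.lower()]
        # Pages matching no topic go in their own bucket for manual review.
        for topic in matched or ["(unmapped)"]:
            clusters[topic].append(url)
    return dict(clusters)

# Hypothetical site inventory and topic list.
pages = [
    ("/a", "Apply for Student Aid"),
    ("/b", "Student aid eligibility"),
    ("/c", "Loan repayment plans"),
    ("/d", "Contact us"),
]
topics = ["student aid", "loan repayment", "grants"]

clusters = cluster_by_topic(pages, topics)

# Topics with no pages at all are the content gaps.
gaps = [t for t in topics if t not in clusters]
```

Two pages land under "student aid" (candidates for consolidation), and "grants" shows up as a gap with nothing mapped to it.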

If you take syndicated content from others and it’s in a subdomain separated from unique content, does this negatively reflect on your unique content?

Yes. The thing with Panda is that, typically, things have been assessed on a page-by-page basis. With Panda, though, your Panda score is a site-wide evaluation. They’re looking at a bunch of different signals site-wide, such as how much content is unique vs. duplicate. If, based on those signals, the overall quality of the site is found to be below what they want, it will affect the whole site.
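Nobody outside Google knows how that site-wide evaluation is actually computed, so please don’t read the following as the Panda algorithm. It’s just a back-of-the-napkin way to audit your own unique-vs-duplicate split, assuming you have the text of your pages and the text of the syndicated sources. It only catches exact copies after whitespace and case normalization.

```python
import hashlib

def fingerprint(text):
    """Hash the text after collapsing whitespace and lowercasing."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def unique_share(own_pages, seen_elsewhere):
    """Fraction of a site's pages that are not exact copies of outside text."""
    if not own_pages:
        return 0.0
    known = {fingerprint(text) for text in seen_elsewhere}
    unique = sum(1 for text in own_pages if fingerprint(text) not in known)
    return unique / len(own_pages)

# Hypothetical page bodies, for illustration only.
own = [
    "Original review of the car.",
    "Syndicated  news STORY",
    "Another original piece.",
]
elsewhere = ["syndicated news story"]

share = unique_share(own, elsewhere)
```

In this made-up example, two of the three pages are unique. The number itself means nothing to a search engine; it’s only useful for tracking your own site over time.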

About the Author

Lisa Barone

Lisa Barone co-founded Outspoken Media in 2009 and served as Chief Branding Officer until April 2012.
