The Spam Police

March 10, 2011
By Lisa Barone in Internet Marketing Conferences

Hey, hey, it’s time to talk about spam!  Maybe we’ll even break out the Kevlar vests and get into the copying/bullying scandal that erupted between Google and Bing not so long ago. We’ll have to see.  Right now up on stage we have Matt Cutts, Sasi Parthasarathy, and Rich Skrenta. Let’s go.

Ooor not. Danny just about starts the session when One Of Us starts playing in the session room.  We all “ooo!” and decide to start the session 3.5 minutes late so we can enjoy it. Hee.  I love this song. Danny says maybe we’ll transition it into what if Google was one of us. Search people are crazy.

Danny starts off saying if you’re asking whether or not you’re spamming Google, you’re probably not. Heh. You probably don’t even understand the depth of spam. True words.

Up first is Sasi.

What is spam? A spam page is one that uses one or more spam techniques to inflate its rankings in the search results, in a way that adds no value to the user. At Bing they try to neutralize the effect of spam on a search ranking so that all the available content is on a level playing field. At Bing, we take both manual and algorithmic action.

Search engines, at a very high level, care about two things when ranking:

  1. Content of the page
  2. Links pointing to the page

Spammers target the content or the links to gain an unfair advantage.

High Level classifications

  • Page-level spam
  • Link-level spam

Spam Techniques

Page Level

  • Keyword stuffing
  • Parked domain: Some of them are harmless. But a lot of sneaky parked domains actually have typos of popular sites and fill them with ads.
  • hidden content: You think search engines are really stupid, but they’re not. Hiding content filled with keywords to promote the page is bad and THEY SEE YOU. It can be hidden text or hidden links. Hidden links on a page can also be the result of a hack.  People do it via tiny font or invisible text.
  • machine generated content: content is stitched with keywords taken from query logs to rank high for those words. No logical meaning when trying to read the content on the page.
  • redirect spam: Search engine crawler sees content filled with keywords while the users are redirected to a different page from search results. Can be hacked.
  • Hijacked sites: Part of the site is hijacked to host adult content or sell products. EDU sites and social discussion components are the ones commonly hit.
  • leading to Scareware: Target spiky queries that follow popular events. Hijack part of popular sites to hack and get popularity for free. Stuff the hijacked page with keywords targeting the latest events – search engines see this. Redirect to scareware pages when clicked as a search result.
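
To make the keyword stuffing item above a bit more concrete, here is a minimal sketch of a naive density check. It is not how Bing (or anyone on the panel) detects spam; the function names and the 0.2 threshold are assumptions made up for illustration, and a real check would at least need to ignore common words like "the".

```python
import re
from collections import Counter


def keyword_density(text: str, top_n: int = 3) -> list[tuple[str, float]]:
    """Return the top_n most frequent words with their share of all words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return []
    counts = Counter(words)
    total = len(words)
    return [(word, count / total) for word, count in counts.most_common(top_n)]


def looks_stuffed(text: str, threshold: float = 0.2) -> bool:
    """Flag text when one word's share exceeds the (hypothetical) threshold."""
    top = keyword_density(text, top_n=1)
    return bool(top) and top[0][1] > threshold


stuffed = "cheap hotels cheap hotels best cheap hotels cheap cheap hotels deals"
normal = "We are a small family hotel near the lake with twelve rooms and a garden"
print(looks_stuffed(stuffed), looks_stuffed(normal))
```

A single ratio like this is easy to fool and easy to trip by accident on short pages, so treat it as a toy for spotting the obvious cases only.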

Link Farms

Pages with little useful content that link out to other sites, and usually those sites link back to the farm page.

Paid Links and Link Exchanges

A spam-oriented link exchange occurs when completely unrelated web sites reciprocally exchange links. This is done to inflate ranking. All links intended to manipulate search engine results will not be counted.
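
As a rough illustration of the reciprocal pattern described above, the sketch below finds pairs of sites that link to each other in a small link graph. The graph format (a dict of site to the set of sites it links to) and the example domains are assumptions for illustration. Mutual links alone do not make something spam; the point in the talk was about unrelated sites trading links, so this only surfaces candidates.

```python
def reciprocal_pairs(links: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Return unordered pairs of sites that link to each other."""
    pairs = set()
    for source, targets in links.items():
        for target in targets:
            # A pair is reciprocal when the target also links back to the source.
            if source != target and source in links.get(target, set()):
                pairs.add(tuple(sorted((source, target))))
    return pairs


# Hypothetical link graph: one farm page trading links with two sites.
links = {
    "farm.example": {"a.example", "b.example"},
    "a.example": {"farm.example"},
    "b.example": {"farm.example", "c.example"},
    "c.example": set(),
}
print(reciprocal_pairs(links))
```

A farm page tends to show up in many such pairs at once, which is one reason the pattern stands out.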

Re-inclusion in Bing

Use content inclusion request option from the above.  He shared a bunch of Bing blog posts on spam. I can’t copy them down fast enough. Sorry. Use Google. Er, I mean Bing. Use Bing.

Next up is Matt Cutts.

We’re going to walk through a rogues’ gallery of spam. Oh goodie. This’ll be fun to blog.

He shows us the home page for a small hotel with a giant block of white space at the bottom of the site.  Yeah. Turns out it’s all hidden content. Matt says it’s okay to put content ON THE PAGE. You don’t have to hide it.  We talk a bit about cloaking, which is when you show different information to users than you show to Googlebot.   He shows some gibberish pages and some pages that use Markov text, which, basically, is just more gibberish.
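
For anyone wondering why Markov text reads like gibberish, here is a minimal sketch of a first-order word chain. Matt did not show any code; the function names and the sample sentence are assumptions for illustration. Each word is picked only from the words that followed the previous word in the source text, so short runs look plausible but the whole thing has no meaning.

```python
import random
from collections import defaultdict


def build_chain(text: str) -> dict[str, list[str]]:
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain: dict[str, list[str]], start: str, length: int, seed: int = 0) -> str:
    """Walk the chain from start, choosing a random follower at each step."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break  # Dead end: this word never had a follower in the source.
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


source = "the hotel has a pool and the hotel has a bar and the bar has a view"
chain = build_chain(source)
print(generate(chain, "the", 12))
```

Every adjacent pair of words in the output appeared somewhere in the source, which is why it can look like real text at a glance.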

He gets a lot of link exchange requests which he says is a bit like asking a cop where you can get some good drugs. Hee.  If you’re sending a link exchange email, have the decency to customize it a little bit. You may as well say, “I am an annoying person”.

Google talks about paid links a little more than some search engines, but if you ask anyone on this panel, they’ll say paid links are something they disapprove of.  The best links are the ones that are freely given, that are editorially given.  Come up with angles, hooks and stories that make people want to link to you. Don’t just pay because then you’re trying to skew the playing field and it’s not level for anybody. [Beeecause it was before? UNICORNS!]

Hacked sites were the bane of Google’s existence in 2010. They put a lot of resources into that area.  The best thing you can do is make sure your server is patched. A few days’ delay in patching your software can cost you an incredible amount of headache.

Panda Feedback

  • User feedback: extremely positive
  • Feedback from an SEO:  “Oh and congrats on the Content Farmer update. In general, I think it was fantastic. We finally saw a site of ours that had been making revenue off syndicated articles lose some serious traffic, which it rightfully deserved to lose.” Matt congratulated the very self-aware SEO. Heh.

He talks about the new Chrome extension that lets you block sites. Matt says more than 120k people have installed it.

RIGHT NOW, you can also block sites that you don’t want to see. If you click on a site, don’t like it, and hit back – you’ll get an option to block that site so you never see it again.

Webspam Team

There are engineers who write algorithms and people who handle manual review.

More transparency re: parked domains, link buyer, link seller, etc. Register your site with Google Webmaster Tools because if they think your site is spamming, they’ll send you a message to give you a heads up.

If you need to file a reconsideration request, go to google.com/webmasters.  Once you send it in, if your site is only affected by an algorithm, then that’s not something they’ve taken manual action on, so the request is closed out automatically.  If you’ve been affected by manual action, that’s when the reconsideration process kicks in. Typically it’s processed in about a week. They’re thinking about whether there are ways they can send more information back: for example, that there was never a problem with your site, or whether or not you’ve corrected the problem.

Next up is Rich.

They have a feature called slashtags which lets you put modifiers on search queries. Blekko created hundreds before launch, and users have created another thousand. Yesterday they blocked 1.1 million sites from their index.  They came up with an algorithmic technique that pairs thin content with aggressive behavior in online ad networks. People asked if they’re censoring the Web. Well, search is editorial. People make algorithms and they change them on a daily basis. Translation: Blekko has iron balls. I love it.

For them, spam isn’t just hacking. It’s about the quality of content. There’s spam in the real world. People in San Fran would tell you the phone book is spam. If you go to Las Vegas, they hand you spam (and DISEASES) on the street.

What is a search engine? It’s a top ten list on anything you ask it about.  Millions of people are trying to rank. It’s like a race.

  • Winners: 3
  • Top 10
  • Unmentioned: Nearly everyone else.
  • Disqualified?

There are rules. You can get ejected from the game if you ‘bulk up’ inappropriately. What can get you disqualified?

  • Pay offs/undisclosed relationship
  • non-experts
  • sweatshop labor
  • too slow
  • too aggressive promotion
  • bad conduct
  • unpopular

They reserve the right to refuse service to anyone. They’re doing this to clean up the Web.  GET SOME!

Danny: One of the challenges of saying we want to get rid of things not written by experts is that you’d have to get rid of everything written by journalists everywhere. That doesn’t seem like a good definition of spam. Good content doesn’t have to be written by experts.

Rich: He was talking with Claire from the New York Times (NAMEDROP! ;)) and he told her there were a handful of exceptions. About.com actually has guides written by subject matter enthusiasts. Journalists have a code. When you break the code, you lose your job. There aren’t that many of them. You can make a list of all the About.coms and you don’t get to one million. There are maybe thousands of that caliber and then there are hundreds of millions of things that you really don’t want to send anyone to.

Danny: Why did you need the NYT to do a paid link story on JC Penney? Couldn’t you figure that out by yourself?

Matt: What the NYT article said was here’s a case where someone had repeatedly bought links. JC Penney says it wasn’t them, but there was certainly some link buying going on. It had showed up on their radar 2-3 times. After they’ve taken corrective action, then maybe you escalate and take stronger action.  It is interesting that over the past few months, regular reporters are realizing that if they talk about SEO they’ll get a lot of attention.

Do you recommend big brands hire SEOs so they can throw them to the wind when they’re caught buying links? Hee!

Matt: No. We recommend you do awesome things.

Aaaand on that note, we’re done. Go grab something to eat. I’ll see you back here a little later. If you’re not hungry, maybe go give Blekko a whirl.  They may be the small guy on the market, but you have to respect the chutzpah.  Maybe they’ll help clean up the Web. Or at least give Google a reason to get off its ass and make it happen.
