What’s Really Important For Technical SEO

March 9, 2011
By Lisa Barone in Internet Marketing Conferences

Welcome back, my friends. It’s time for a session on technical search engine optimization, which means if you hear whining from where you are, don’t worry, that’s just me trying to keep up and understand what they’re saying. Sigh. Why do I do this to myself? I have no idea.

Up on stage we have Vanessa Fox moderating friends Greg Boser, Jonathan Hochman, Todd Nemet, and Brian Ussery. Hello, all. Oh, it seems Vanessa is also presenting. And she’s deciding she’s going first. She brings up her slides and notices she’s spelled her own name wrong on her Twitter account. Technology, 1; Vanessa, 0.

We start.

The Importance of Technical SEO: A Case Study

We’re going back to a site Vanessa analyzed at SMX East back in October. There’s lots of code showing Googlebot unsuccessfully trying to fetch site maps. A large percentage of their crawl allotment was spent trying to crawl site maps so there wasn’t much time left over to crawl the URLs. There was no way to find this out other than through the server logs. Because their site has a dynamic set up, every one of their site maps changed all the time because you’re always adding new listings, causing the bots to keep fetching. To fix this they built static sitemaps for everything that stayed the same.

A new site analysis was done yesterday and found that Google can now fetch the actual listings on the site. Yey. Progress. In October 2010, Google spent 27 percent of their time crawling site maps. In March 2011, they spent just .05 percent. Their traffic has doubled. This is why SEOs and Vanessa are awesome.
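For anyone who wants to run the same check on their own logs, here is a minimal sketch of measuring how much of a bot's crawl goes to sitemap files. The session did not show the log format, so the Apache "combined" format, the "Googlebot" user-agent token and the idea that sitemap URLs contain the word "sitemap" are all assumptions.

```python
import re

# Assumed Apache "combined" log format; the real site's format was not shown.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def sitemap_crawl_share(lines, bot_token="Googlebot", marker="sitemap"):
    """Return the fraction of the bot's requests whose path contains `marker`."""
    bot_hits = 0
    sitemap_hits = 0
    for line in lines:
        match = LOG_LINE.match(line)
        # Skip lines that do not parse or that come from other user agents.
        if not match or bot_token not in match.group("agent"):
            continue
        bot_hits += 1
        if marker in match.group("path").lower():
            sitemap_hits += 1
    return sitemap_hits / bot_hits if bot_hits else 0.0
```

Feed it the lines of an access log (for example, an open file object) and compare the result before and after a fix.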

Next up is Greg. He heads up organic strategy at BlueGlass. This is the first time I’ve ever seen Greg Boser deliver a PowerPoint presentation. I can’t even process this. He swears he’s done it before. I’m not buying it.

Technical SEO: Understanding CPR

Core Concepts You Need to Understand

Proper Prioritization is the key to success – every site has tons of things that could/should be fixed. Know where to start.
Proper top-level analysis is critical – in order to properly prioritize, you need to have a thorough understanding of the “big picture” before you start.
Google is no longer “page” focused – The days of Google determining what will or won’t rank primarily based on page-level analysis are gone. Overall content performance is the key. Google is looking at your site as a whole, so YOU need to start looking at your site that way, too.

Content Performance Ratios (CPR)

Taking the time to understand how your content is performing will help you determine where to start.

Questions you need to answer:

What is the ratio between total pages indexed and the total number of pages generating current organic traffic?
How do those numbers break down based on landing page type and content topic?

If Google was his engine, Greg would think less of a site if they continually fed him a high percentage of garbage they’d never show. We have to assume Google is doing the same thing.

Break Down Your CPR

Total URLs indexed: 537,000
Total visits: 215k
Total URLs: 17,445
CPR= 3 percent

Google has decided to actually show somebody 3 percent of the total pages that have been indexed on the site. That is terrible and is a very bad thing. If you look at some of the footprints of the sites that got hit in the Panda update they probably fit this characteristic.
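The arithmetic behind that 3 percent can be sketched as follows. This reads "Total URLs: 17,445" as the number of URLs that actually received organic visits, which is an interpretation of the slide rather than something Greg spelled out.

```python
def content_performance_ratio(urls_with_visits, urls_indexed):
    """Share of indexed URLs that received at least one organic visit."""
    if urls_indexed <= 0:
        raise ValueError("urls_indexed must be positive")
    return urls_with_visits / urls_indexed


# Figures from the slide: 17,445 URLs with visits out of 537,000 indexed.
cpr = content_performance_ratio(17_445, 537_000)
print(f"CPR: {cpr:.1%}")  # prints "CPR: 3.2%"
```

By the same arithmetic, getting to a 30 percent CPR with the same 17,445 performing URLs would mean trimming the index to roughly 58,150 URLs.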

Break it down further

Browse URLs: 16,676
Browse Visits: 67,884

Category URLs: 109
Category Visits: 6,754

Product Detail URLs: 12k
Product Detail Visits: 90k

Pagination URLs: 490
Pagination Visits: 2,421

These are the kinds of things you need to know before you start making technical tweaks so you don’t make things worse.

Follow up with linking metrics:

Total URLs indexed: 537,000
Total Visits: 215,273
Total URLs: 17,445
Total URLs with External Links: 3,365

Those numbers show a pretty poor distribution and they’re going to tie into the other numbers.

What the Data Tells Me

Definite duplicate content issue
We’re irritating Google – making them work too hard to find the content
Poor architecture focus – not enough torso or head traffic
Possible poor external link support for torso/head

This data maps out his plan for how he’s going to use canonical or noindex/nofollow to sculpt the architecture and make it clear to Google where the pages are that are most important. He can also see there’s going to be an issue with links at the top category level. And he can get someone working on that while they’re figuring out the best way to do others. When you walk through that process it will pinpoint where you need to go.

Where We’re Going to Focus First

Trim down the total indexed content – shooting for an initial goal of a 30 percent CPR
Exploring external backlink structure
Further analyze site structure to determine effectiveness on top-level category support
Build an initial action plan based on those three items
Deal with page-level items after this work is done

Use the information, hammer out an action plan and don’t move on to the page-level stuff until a plan is in place. If you take the time to do that, painfully obvious things will jump out at you that you’d never see when you just go through the site thinking up ideas. Make it about analytics.

Vanessa congratulates herself on getting Greg to make PowerPoint slides. Greg tries to redeem himself, saying it wasn’t him who made them and that they have a graphic designer for that stuff. Hee! :)

Next up is Todd.

Evaluating Technical Architecture

What can we evaluate about the technical architecture? You can analyze your network, looking at the GSLB, local load balancing, DNS, etc. There are also Web access logs, HTTP headers, domain registration, etc.

Network analysts are very confident. You have to ask them questions.

Interviewing a network engineer

Is your local load balancing round robin?
How does the server do health checks? Are there any reverse proxies in your configuration?
Do you do any URL rewriting?
May I have a sample of your web access log files?

We can ask those questions, but we can also check ourselves. We don’t have to wait for him (or her!) to do it.

He looks at server latency. Do 10 real quick grabs of the home page and time it.
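The slides had their own commands for this. As a stand-in, here is a minimal sketch of the "10 quick grabs" idea. The function names, the 10-second timeout and the injectable `fetch` and `clock` parameters are all assumptions made so the timing logic can be tried without touching a real site.

```python
import statistics
import time
import urllib.request


def time_fetches(url, attempts=10, fetch=None, clock=time.perf_counter):
    """Fetch `url` `attempts` times and return each elapsed time in seconds."""
    if fetch is None:
        def fetch(target):
            # Download the whole body so the timing covers the full response.
            with urllib.request.urlopen(target, timeout=10) as response:
                response.read()
    timings = []
    for _ in range(attempts):
        start = clock()
        fetch(url)
        timings.append(clock() - start)
    return timings


def summarize(timings):
    """Reduce a list of timings to min, median and max."""
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }
```

Calling `summarize(time_fetches("http://www.example.com/"))` would time ten real requests. Note that this measures server and network time together, so it is only the first step before isolating network latency.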

Isolate the network latency. They can see a slow network, packet loss. You’ll want to talk to a network engineer.

[He’s showing all the codes on how to do this but, um, yeah, I’m not a robot, people.]

Check for duplicate sites and shut them down to clean things up.

Go to Robtex.com – they mine DNS information.

Web Access Logs Analysis

If a browser goes to a Web site, that gets written in a log file. He shows a typical log file. Yup, looks like keyboard mashing to me.

Figure out when and how often you’re being crawled
Referers: What links are bringing actual visitors
URL Path: Where is this crawler spending its time
HTTP Response: Am I redirecting correctly? Errors?
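Those questions can be roughed out in a few lines. The sketch below is not the log file Todd showed. It assumes Apache "combined" format and a "Googlebot" user-agent token, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Assumed Apache "combined" log format.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<day>[^:\]]+):[^\]]+\] "\S+ (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize_bot_activity(lines, bot_token="Googlebot"):
    """Count a bot's requests per day, per HTTP status and per top-level section."""
    by_day = Counter()
    by_status = Counter()
    by_section = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or bot_token not in match.group("agent"):
            continue
        by_day[match.group("day")] += 1
        by_status[match.group("status")] += 1
        # Drop the query string, then keep only the first path segment.
        path = match.group("path").split("?", 1)[0]
        by_section["/" + path.strip("/").split("/", 1)[0]] += 1
    return by_day, by_status, by_section
```

The per-day counts answer "when and how often", the sections show where the crawler spends its time, and a pile of 404s or 302s in the status counts is the cue to go check the redirects.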

Nine By Blue Web Log Parser

Bot activity
Hierarchical View
Query Parameters
Reverse DNS
HTTP Response codes
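The reverse DNS item matters because anyone can fake a Googlebot user agent. A minimal sketch of the usual check follows: reverse-resolve the IP, check the domain, then confirm with a forward lookup. This is not the Nine By Blue parser's code. The accepted domain suffixes follow Google's published guidance as best I know it and should be treated as an assumption, and the lookup functions are injectable so the logic can be tried offline.

```python
import socket


def is_verified_googlebot(ip,
                          reverse_lookup=socket.gethostbyaddr,
                          forward_lookup=socket.gethostbyname_ex):
    """Return True if `ip` reverse-resolves to a Google host that resolves back to it."""
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False  # No PTR record or lookup failure.
    # Assumed suffixes for genuine Googlebot hostnames.
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False
    # The forward lookup must return the original IP, or the PTR was spoofed.
    return ip in addresses
```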

He shows a lot of screen shots of actual logs…but again…real person, not robot. This is where a ticket to SMX actually comes in handy.

Many more areas to investigate

Cache control headers
Domain health
Page level analysis

Next up is Brian.

Types of SEO

Architectural
Linguistic
Reputational

Brian says that’s not true because the other two don’t exist if you don’t have architectural. Take THAT!

Path to Indexing

URL discovered via links/sitemap
Time allocated for crawling URL
Accessible unique content
Don’t block the bots and remove obstacles

Hosting Issues

Hosting can actually have a big impact. Most hosts don’t know much about search engine optimization.

Hosting Obstacles: 403 Errors

He says Google Webmaster Tools is a great place to go for information.

Hosting Obstacles: Robots.txt

The host can actually block access to your Web site.
User-Agent Switchers don’t switch IPs

Hosting: Speed

Crawl efficiency is very important to search engines and to you. The more efficient your page is, the less time it takes to crawl, and the more of your pages get crawled. You can find this information in the Crawl Stats section of Webmaster Tools.

DNS: This is the time taken for the DNS lookup of the hostname
Connect: This is the first phase of the http GET request, when the TCP/IP connection is set up by the remote server
First byte: This is the time from when the last byte of the http GET request is sent until the first byte of the response header is received
Total: The time from when the http GET request is started until the last byte of data is received
Server efficiency: Compress files and be sure your server supports If-Modified-Since
Response time: Be sure your server responds quickly
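On the If-Modified-Since point, the decision a server makes can be sketched as below. This is an illustration of the logic only, not any particular server's implementation, and the function name is made up. Most Web servers already do this for static files; the sketch is mainly relevant to dynamically generated pages.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the page is unchanged since the crawler's copy, else 200.

    `if_modified_since` is the raw request header value (or None).
    `last_modified` is a timezone-aware datetime for the page.
    """
    if not if_modified_since:
        return 200
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # Unparseable header: ignore it and send the full page.
    if client_time.tzinfo is None:
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    unchanged = last_modified.replace(microsecond=0) <= client_time
    return 304 if unchanged else 200
```

A 304 has no body, so the crawler spends almost no time on a page that has not changed and can move on to the ones that have.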

How? Netcraft goes into detail every month and has complete ratings. He calls it a very helpful resource when looking for a host.

Unique Content vs Duplicate Content

http://site.com
http://www.site.com
http://www.site.com/default.aspx
http://site.com/default.aspx

He uses the Google store as an example

If you go to http://googlestore.com it redirects you to the .aspx page instead of the root. If you look at the Google Store in a text browser you’ll find two links – one has the pound sign. The Google Store is actually two different Web sites. The US site is okay; the UK site [http://google-store.com/], however, is a mess. He shows a canonical issue between different versions of the same site that Google doesn’t seem to have figured out yet.
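The four site.com variants listed above should all collapse to one address. Here is a minimal sketch of that mapping. The choice of the "www" host as the preferred one and the list of default documents are assumptions, and on a real site the fix would be a 301 redirect or a canonical tag driven by a rule like this.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical policy: default documents are treated as the directory root.
DEFAULT_DOCUMENTS = {"default.aspx", "index.html", "index.php"}


def canonical_url(url, preferred_host="www.site.com", aliases=("site.com",)):
    """Map host aliases and default documents onto one canonical URL."""
    scheme, host, path, query, _fragment = urlsplit(url)
    host = host.lower()
    if host in aliases:
        host = preferred_host
    segments = path.split("/")
    if segments[-1].lower() in DEFAULT_DOCUMENTS:
        segments[-1] = ""
    path = "/".join(segments) or "/"
    # The fragment (the "pound" part) is dropped on purpose.
    return urlunsplit((scheme.lower(), host, path, query, ""))


variants = [
    "http://site.com",
    "http://www.site.com",
    "http://www.site.com/default.aspx",
    "http://site.com/default.aspx",
]
print({canonical_url(v) for v in variants})  # prints {'http://www.site.com/'}
```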

Front End Speed

80 percent of load time comes from the front end. He shows a waterfall chart. There are a lot of great tools you can use. Google measures page speed and site performance differently. Page speed is the amount of time it takes the site to load. Site performance is the time it takes for the page to load plus redirects. Forty percent of people will leave a site for good if it takes more than two minutes to load. Dude, who’s sticking around for two minutes? I’m not.

Technical SEO Checklist

Host access
Host crawl efficiency
Provide a clear path
Robots.txt
Unique content
Use Google Webmaster Tools

Next up is Jonathan. He says his presentation is going to be dirty because there’s lots of details and code. I’m basically never blogging a technical session ever again. I’ll just nap instead.

Why Details Matter

Staging Server Mischief

Be careful when copying files from staging to live. On one site, the staging file’s contents sat in the live robots.txt for five years. Oops! Never put a robots.txt file on your staging server because it may go live by accident. A better way to protect a staging server is to require a password via .htpasswd on Apache.
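A cheap safety net is to check the live robots.txt after every deploy. The sketch below uses Python's standard robots.txt parser. It assumes the accident in question is a blanket "Disallow: /", and the function name and probe URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def blocks_everything(robots_txt, user_agent="Googlebot",
                      probe_url="http://www.example.com/"):
    """Return True if this robots.txt text stops `user_agent` fetching the home page."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, probe_url)


staging_style = "User-agent: *\nDisallow: /\n"
live_style = "User-agent: *\nDisallow: /cart/\n"
print(blocks_everything(staging_style))  # prints True
print(blocks_everything(live_style))     # prints False
```

It only probes the home page, so it catches the blanket block and nothing subtler.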

Duplicate content in CMS and ecommerce systems

Examples:

Some osCommerce configs have funky session IDs in the URL parameters. You can download a module that fixes them.
For WordPress, use the All in One SEO Pack.

Running out of crawl time or storage

If your site has millions of pages, code optimization should be a high priority strategy to get more pages indexed. Good code is often five times shorter than average code.
Watch out for infinite URL spaces, such as calendars.
If all your pages are indexed, this tactic might improve the frequency of indexing
Submit sitemap.xml to Google/Bing to get accurate feedback on how many of your pages are indexed

Sitemap.xml file

<500 pages, see XML-Sitemaps.com
>499 pages, download GSiteCrawler.
A sitemap won’t help indexed pages rank better. However, it may help pages that aren’t indexed or help identify duplicate content.
Inspect sitemap.xml top to bottom and make sure each page is listed and unique.
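The "unique" half of that inspection is easy to automate. Here is a minimal sketch that assumes a standard sitemaps.org urlset file. The function name is made up, and it only catches exact duplicates, not two different URLs that serve the same page.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def duplicate_locs(sitemap_xml):
    """Return the sorted list of URLs that appear more than once in a sitemap."""
    root = ET.fromstring(sitemap_xml)
    locs = [(el.text or "").strip() for el in root.iter(f"{SITEMAP_NS}loc")]
    counts = Counter(locs)
    return sorted(url for url, seen in counts.items() if seen > 1)
```

Pass it the contents of the file, for example the bytes read from sitemap.xml on disk.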

Unique Titles and Meta Descriptions

The same meta data on multiple pages is a sign of low quality, generates less clickable search listings and makes pages more likely to be considered duplicates.
Best to have some code that provides acceptable titles and descriptions by default.
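What that default code might look like is sketched below. The page fields ("title", "name", "section", "url") and the site name are hypothetical, since Jonathan did not tie this to any particular CMS.

```python
from collections import defaultdict


def effective_title(page, site_name="Example Store"):
    """Use the page's own title if set, else build one from its name and section."""
    custom = (page.get("title") or "").strip()
    if custom:
        return custom
    parts = [page.get("name"), page.get("section"), site_name]
    return " | ".join(part.strip() for part in parts if part and part.strip())


def duplicate_titles(pages, site_name="Example Store"):
    """Map each title shared by more than one page to the URLs that use it."""
    seen = defaultdict(list)
    for page in pages:
        seen[effective_title(page, site_name)].append(page["url"])
    return {title: urls for title, urls in seen.items() if len(urls) > 1}
```

The fallback keeps every page from shipping with a blank or boilerplate title, and the duplicate report shows where a hand-written title is still worth the effort. The same pattern would work for meta descriptions.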

Spelling and Typos

Why does my listing in Google have a spelling error? Why don’t I rank? It has happened numerous times that pages didn’t rank because of typos in critical places such as title tags and anchor text.

Broken Links

Use Xenu’s Link Sleuth whenever Google Webmaster Tools reports broken internal links, and whenever you do a major overhaul. Dead links are bad for user experience and a waste of link juice.

Hacking & Malware

If your site gets hacked, traffic will tank. Scanning for malware is weak; the best scans only detect 30 percent of threats. Real security requires regular software upgrades, file integrity monitoring, version control and strong access controls. The top reason for hacks is failure to patch the CMS. Jonathan mentions how the IMCharityParty Web site was hacked, which ruined their marketing efforts for the event that took place last night. They saw far fewer attendees and donations than normal because the warning deterred people from going to the site.
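File integrity monitoring sounds fancier than it is. A minimal sketch of the idea: hash every file under the Web root, keep that snapshot somewhere safe, and compare it with a fresh one later. The function names are made up, and a real setup would store the baseline off the server and run the comparison on a schedule.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file's path (relative to `root`) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_snapshots(baseline, current):
    """List files added, removed or changed between two snapshots."""
    shared = baseline.keys() & current.keys()
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "changed": sorted(p for p in shared if baseline[p] != current[p]),
    }
```

Anything in "added" or "changed" that does not match a known deploy is worth a look.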

Code Validations

People love to argue about whether code validation is worth the trouble.
Validation increases the chance of cross platform/browser compatibility. Not a magic SEO strategy. Don’t expect rankings to instantly improve, they won’t.
Validation helps you check for errors automatically. It is easier to clear all errors and warnings than to pick and choose.
Search engines can parse messed up code, but sometimes bad code confuses spiders.
If you look in Google Webmaster tools and see a code snippet appearing in the most common keywords, that may be a symptom of missing or malformed HTML tags.

SEO Intangibles

Happy visitors generate referrals, tweets, bookmarks and links. Unhappy visitors don’t. Happy visitors are more likely to trust you and convert.
What tends to make a happy visitor? Sites that load correctly, quickly and smoothly on any browser, any computer and any mobile device. It doesn’t matter if you have the perfect keywords when your site is slow or won’t render.
Some people like to print Web pages. Do you have a print media stylesheet? For larger ticket items or B2Bs, printing may be important.
Do you still have those obnoxious messages chastising your visitors if they have the wrong browser?

The Big Picture

Technical SEO won’t magically lift your rankings, but correcting errors may help.
Don’t think about technical SEO only in terms of ranking signals. User behaviors are a ranking signal; when users react favorably to a Web site, search engines eventually notice.

And that’s it. Technical SEO hurts my brain. We’ll see you after lunch, kids.

About the Author
Lisa Barone

Lisa Barone co-founded Outspoken Media in 2009 and served as Chief Branding Officer until April 2012.
