Darrin Ward Blog

Darrin Ward's Internet & SEO Blog

Interim Update: Madrid, Spain Office Coming Soon!

October 26 2009, 10:56pm in Random Stuff
0 Comments

As always, work has been so hectic that I've been neglecting the blog. It's ironic... we often advise clients to start a blog only if they can commit to posting frequent updates, lest a neglected blog reflect poor organization on the company. And here I am making that very same mistake.

Anyway. Part of what's keeping me busy is that we are in the early stages of taking Darrin Ward's professional SEO services to the Spanish market. The decision has been made to open an office in Madrid, Spain, and I am spending a lot of time working through the logistics. Fortunately, we already have some very talented Spanish-speaking SEOs on the team, so this will be a very natural extension, and I am extremely excited.

If you are interested in becoming a client in Spain or know someone who may be interested, please contact me. If you just want to keep an eye on how things are progressing, www.darrinward.es will be the Website address.

There is no definitive launch date for Darrin Ward's operations in Spain yet, but things look to be on track for Q2 2010.

As they say... "Watch This Space". There are a few other major announcements that I will be making over the next 3-6 months.

- Darrin Ward

Google Penalty for No rel=nofollow on Affiliate Links

September 24 2009, 10:11am in SEO
2 Comments

Barry Schwartz over at the Search Engine Roundtable reminds us today that you should use rel=nofollow on your affiliate links, or else you may receive a Google penalty.

The inherent illogic of stuff like this makes my blood boil sometimes... Why would (or does) Google stop content from ranking just because links to affiliates or other sites don't use rel=nofollow? Either the content on the page is useful and deserves to rank, or it doesn't. I don't see why links lacking rel=nofollow should, on their own, be a determining factor in that decision. Using rel=nofollow is a technicality.

If Google determines that the links on a page are against their paid-linking policy, then they should just discount any "link juice" that might be passed on through them. That's something they could do transparently in the background, without forcing Webmasters to worry about this ridiculous rel=nofollow attribute, and without depriving searchers of valuable content (assuming Google would otherwise consider that content valuable, were it not for the affiliate links lacking rel=nofollow).

Alas, the Google insidiousness continues, and we continue to begrudgingly comply forthwith so that we may get some rankings love! The whole thing does remind me of the Pied Piper sometimes, though :)

Why Doesn't Google Have a Dictionary? It Still Links to Answers.com

September 21 2009, 10:34am in Google
1 Comments

I sometimes use Google as a dictionary replacement, as I suspect a lot of people do. I search for the word on Google and then click on the "definition" link beside the word in the horizontal blue information bar. Google links to answers.com, which gives the definition:

Google's Answers.com Definition Link

What I don't understand is why Google hasn't licensed the content from the Oxford dictionary or some other dictionary and made their own dictionary function. Probably a better idea would be to license the content from multiple dictionaries to make sure they have all of the right definition variants, including those that are regionally specific.

Granted, Google does have the "define:keyword" operator, which attempts to define words by scraping content from pages across the Web. But anyone who has experimented with this function to any degree will tell you that it can be horridly inaccurate. I've often seen it pull definitions from adjacent words on pages, yielding a completely irrelevant definition.

It should be noted that, according to compete.com, Google is responsible for 61.19% of answers.com's traffic:

Answers.com Referrals

It's not clear how much of this comes from the definition links and how much comes from regular organic listings. Either way, that's a pretty significant share of the traffic.

Hey, Google... If I set up a dictionary site, will you link to me instead?!

Google Stepping Up to Counter Bing's Growth?

September 18 2009, 11:01am in Search News
0 Comments

First, sorry for the lapses between blog posts. The unfortunate reality is that the blog takes something of a backseat to servicing clients, business development and all of the interesting things that are going on with the De Ward Group right now.

Is it just me, or has Google really stepped up their efforts over the last few weeks and months? They've made many changes to their services, from minor UI tweaks and experiments with different ad and organic listing formats, right up to introducing entirely new products such as Fast Flip in Google Labs.

Granted, Google has always been experimental, but one has to wonder if perhaps they are pushing things a bit harder now to counter the fact that Microsoft's search solution, now known as Bing, has crossed 10% market share for the first time in as long as anyone cares to remember. The Nielsen ratings only came out the other day, but it was obvious that Bing had been gaining some traction. And knowing that Microsoft may very well soon pick up a ton of traffic from Yahoo (assuming the deal gets regulatory approval) can only be increasing the pressure on Google.

These are very interesting times for us here in the world of search engines. I very nearly exited the SEM industry entirely back in 2003 when I sold the SEOChat.com company. Even though I made a conscious decision to become a less public figure, I'm very glad that I chose to stay within the industry. We've got one hell of a roller coaster ride ahead of us in the next two years, and I would hate to miss it.

Bing still has a long way to go. But as much as I love Google, I have to hope that Bing can rise to the challenge and give Google a run for its money, because competition is good for you, me and everybody.

One-Way Folder Syncing: Mac to Blackberry Folder Sync

August 27 2009, 5:55pm in Random Stuff
0 Comments

I use a BlackBerry 8820. I've got an iPhone, used to have a Sony Ericsson Xperia X1 (Windows Mobile) and have tried a plethora of other phones (including other BlackBerrys), but the BlackBerry 8820 is the one for me.

However, one thing that used to irritate me about the phone was that it wasn't very easy to sync my iTunes music, podcasts and some business documents from my Mac to the SD card in my BlackBerry. So I wrote a simple shell script that takes care of those things for me. The script shown here should also work with other BlackBerrys.

The script uses rsync to overwrite folders on my BlackBerry with folders on my Mac (like my iTunes folder). I saved the following code into a file named bb-sync.sh in my ~/ folder:

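```shell
#!/bin/sh
# Mirror each local folder/file onto the BlackBerry SD card (mounted as /Volumes/BB).
# --delete removes files on the card that no longer exist on the Mac.
rsync -u -v -I -r --delete "/Users/DWard/Music/iTunes/iTunes Music/Podcasts/" "/Volumes/BB/iTunes/Podcasts/" &&
rsync -u -v -I -r --delete "/Users/DWard/Desktop/WalkMusic/" "/Volumes/BB/iTunes/Music/" &&
rsync -u -v -I -r --delete "/Users/DWard/Documents/Passwords.kdb" "/Volumes/BB/Documents/Passwords.kdb"
```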

The code above may look odd because of line wrapping, so you can also download bb-sync.sh.

These are three separate rsync commands because I am syncing two folders and one file. On each line, the first path is the local one on my Mac, and it overwrites the second path, which is on my BlackBerry SD card (those all start with "/Volumes/BB/").

My BlackBerry SD card mounts as a volume named "BB". Yours will probably mount as something else, but you can check by running "ls /Volumes/" in Terminal while your device is connected to see what name it uses when it mounts. You may need to plug it in and out to see the difference between the mounted and unmounted states. The volume name will probably also show up on the OS X Desktop as a drive when your BlackBerry SD card mounts. Replace BB with the name of your BlackBerry SD card volume, and change the directories to the ones you want to sync.

When my BlackBerry SD card mounts, I sync by opening up Terminal and typing "sh bb-sync.sh". It prints out a report of the files it's deleting and the new files it's uploading.

Two last things: 1) this technique will work for any mounted volume, it's not specific to BlackBerry; and 2) you can fiddle with the rsync flags and options to approximate a two-way sync or get some other functionality. But I'm not going to bother with that. See the man page for rsync if you want to do something other than what I have described here.
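If you want a little more safety, here is a minimal sketch of one way to extend the script. It assumes a POSIX shell and keeps the same rsync flags as above. The three commands are wrapped in a hypothetical bb_sync function that refuses to run when the card isn't mounted. Any extra arguments go straight through to rsync, so "bb_sync --dry-run" lists what would be copied or deleted without touching the card. The VOLUME variable, and the use of $HOME in place of /Users/DWard, are assumptions; adjust them for your own setup.

```shell
#!/bin/sh
# Hypothetical variant of bb-sync.sh (a sketch, not the original script).
# VOLUME is an assumption: set it to wherever your SD card mounts.
VOLUME="${VOLUME:-/Volumes/BB}"

bb_sync() {
  # Simple mount check: if the volume directory isn't there, don't sync at all.
  if [ ! -d "$VOLUME" ]; then
    echo "bb-sync: $VOLUME is not mounted" >&2
    return 1
  fi

  # "$@" forwards any extra rsync options, e.g. --dry-run to preview changes.
  rsync -u -v -I -r --delete "$@" "$HOME/Music/iTunes/iTunes Music/Podcasts/" "$VOLUME/iTunes/Podcasts/" &&
  rsync -u -v -I -r --delete "$@" "$HOME/Desktop/WalkMusic/" "$VOLUME/iTunes/Music/" &&
  rsync -u -v -I -r --delete "$@" "$HOME/Documents/Passwords.kdb" "$VOLUME/Documents/Passwords.kdb"
}
```

To use it, source the file and call the function: ". ~/bb-sync.sh && bb_sync --dry-run" to preview, then "bb_sync" to sync for real. A plain directory check can be fooled if a folder with the card's name exists under /Volumes while the card isn't mounted. Running "mount" will tell you for sure.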

Now if you'll excuse me... I am going to listen to some recently synced podcasts on my BlackBerry while I go for my evening walk. :)

Google Changes Homepage: Preferences Now Search Settings

August 27 2009, 11:13am in Google
5 Comments

I change the number of results-per-page setting quite a lot, and I just noticed that the "Preferences" link on the Google homepage that I normally use to change my results-per-page setting is missing. Instead, it is now located in a "Settings" drop-down menu up at the top, named "Search Settings".

Before, the "Preferences" link was to the right of the search box, I believe under "Language Tools". In the words of Stewie Griffin... "I don't like change".

Take a peek:

Google Preferences Link Moved to Search Settings

Edit: The "Settings" link at the top becomes a drop-down containing both "Search Settings" and "Account Settings" if one is logged into an account. If one is not logged in, the link reads "Search Settings" and goes directly to the settings/preferences page.

Google Showing Sitelinks for AdWords Sponsored Links

August 25 2009, 6:10pm in Google
5 Comments

I'm not sure if this is a new thing or not, but I just searched on Google for "staples.com" and I noticed that the Staples paid listing at the top was sporting sitelinks... something that I don't recall seeing before. Upon refresh, they were gone.

Take a look:

Google Sponsored Link With Sitelinks

Google Street View, Storefront Barcodes & Extended Store Details

July 27 2009, 6:01pm in Fun Stuff
1 Comments

The BBC has a cool article and video today on the potential future of barcodes: Bokodes. Bokodes are basically very small but versatile "version 2.0" barcodes that can hold a lot more information than traditional barcodes.

What's very interesting, however, is the mention of integration with Google Street View: small Bokodes posted on storefronts could be read by the Street View camera as it drives by. The Bokodes could contain information such as menus, hours of operation, etc. This is pretty amazing stuff, and it really is a glimpse into the future, taking digitization to the next level. I wholly encourage you to check it out!

BBC Article: Barcode replacement shown off.

West Coast Airfares Rising Faster than East Coast Airfares?

July 27 2009, 1:32pm in Random Stuff
0 Comments

I absolutely loathe flying. I'm not scared of flying (in fact I find takeoff and landing to be quite exciting); I just find the whole experience of public air travel to be utterly deplorable and, frankly, disgusting! Airports are congested, people on planes have zero personal hygiene, and getting the shakedown at airport security is an invasion of my personal space that I'd rather not endure. That's why I only fly when it's absolutely essential.

However, there are a couple of upcoming projects for which I may have to travel, so I was recently looking at some airfares. That's why I was very interested to see a recent post on the Bing Travel blog about airfares rising faster on the West Coast than on the East Coast. Cross-country fares have also risen by a whopping 23% over a 4-week period.

Anyway, I just thought it was interesting enough to share. Personally, I think I'm just going to line up as many conference calls as I can over a 2-day period and drive where I need to go. That way I can still get work done without having to fly. Unfortunately, I was hoping to make an international trip, and I may just have to concede and fly. I can't drive there, and I can't afford to charter a large yacht (though I would rather do that than fly, if I could).

Bing / Microsoft Ramp Up Usage of msnbot/2.0b

July 20 2009, 12:03pm in Search News
0 Comments

Rick DeJarnette posted on the Bing blog on Friday that we should expect to see an increasing number of visits from msnbot/2.0b, a second (or third?) generation of the crawler used to power Bing.

This really isn't all that helpful to us Webmasters. In reality, if they hadn't told us that they were testing a new crawler and hadn't changed the UA string (from 1.1 to 2.0b), we probably would never have noticed.

Why do they still call it msnbot, though? Shouldn't it be BingBot? I wonder if Microsoft has an identity crisis.
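That said, if you do want to see the new crawler for yourself, your server's access logs are the place to look. Below is a minimal sketch that counts requests per msnbot version. It assumes Apache-style logs where the user-agent string (e.g. "msnbot/2.0b" or "msnbot/1.1") appears on each line. The function name msnbot_versions is hypothetical. It relies on "grep -o", which GNU and BSD grep support even though it isn't strictly POSIX.

```shell
#!/bin/sh
# Count requests per msnbot user-agent version in an access log read on stdin.
# Output: one "version count" line per version, sorted by version string.
msnbot_versions() {
  # Pull out just the "msnbot/<version>" token from each matching line.
  grep -o 'msnbot/[0-9][0-9a-z.]*' | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Point it at your own log, e.g. "msnbot_versions < access.log" (the log's location varies by server). A rising msnbot/2.0b count relative to msnbot/1.1 is the ramp-up described above.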

Facebook Consumes Most of Americans' Online Time

July 16 2009, 12:16pm in
0 Comments

PC World underscores Nielsen's report this week that Americans spend more of their time on Facebook than on any other Website. Quite an achievement for Facebook, but the really interesting part of the Facebook phenomenon is that they are penetrating the 55+ age group more successfully than other social media sites.

Facebook is a great site and I used to be an avid status updater, but for me it became boring pretty quickly. There has been a lot of recent talk about Facebook's direction and whether or not they can become profitable. That includes a recent comment by Facebook board member Marc Andreessen that the site would be posting billions of dollars in revenue in the coming years.

It will be really interesting to see how things pan out, but I can't help but think that all of this is just hype. Few social sites have come anywhere near meeting their lofty expectations.

How Important is an ODP/DMOZ Link for SEO?

July 6 2009, 2:33pm in SEO
3 Comments

If you've been in the Internet Marketing industry for any length of time, then you will have heard of the "ODP" or "DMOZ", the Open Directory Project that resides at www.dmoz.org. The ODP is a large general Web directory edited by volunteers. And for years it was considered almost the holy grail for inbound link developers. Some still consider it to be so.

A member at WebmasterWorld asks "Is DMOZ still relevant in 2009?". The responses are interesting.

As part of our SEO campaigns, we do perform directory submissions to a select number of top-tier general directories and a small number of niche directories (the number depends on the niche). The ODP is still in the top 3 of our most desirable general directory link targets. But it's certainly not a holy grail of any sort.

The ODP certainly has its problems. It's very slow to get anything listed in the ODP because there are too few editors/volunteers for the volume of submissions they receive. Internet users seem to be moving away from directory-type Websites and converging on social/search-type sites. And the ODP hasn't done anything even remotely innovative in years (in fact, I don't know if they've ever done anything innovative).

But the ODP still gets used in countless places across the Web, so a listing/link in the ODP inherently means links from many other places. The ODP link itself probably carries more weight than all of the subsequent links combined, but those extra links are still a positive.

Yep - for me submitting to the ODP is still relevant in 2009. Not as much as it used to be, certainly. But it's still relevant. I do however recommend that you read my insights on submitting to directories for SEO.

Putting multiple Lat/Long Points on a Google Map

June 26 2009, 5:52pm in Fun Stuff
6 Comments

I was having a tough time finding a tool to map lat / long coordinates for a project. I needed to just copy and paste a bunch of coordinates and have points show up on a map, but I couldn't find a tool to do that, so I put a simple lat/long point mapping tool together.

Simply paste your geocoded coordinates into the text box and hit submit. The page will reload and your points will be mapped. However, you must be sure to provide the latitude and longitude coordinates in the right format (this was just for internal use, so I didn't bother with formatting or error checking). The right format is ONE lat,long pair per line. Separate the latitude and longitude with a comma and don't include any spaces.
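If you're preparing a long list, it can help to check the format before pasting. Here's a minimal sketch, assuming a POSIX shell with awk. The function name validate_coords is hypothetical and has nothing to do with how the tool itself parses input. It accepts exactly the format described above: one lat,long pair per line, comma-separated, no spaces. It also flags latitudes outside -90..90 and longitudes outside -180..180.

```shell
#!/bin/sh
# Read coordinates on stdin, one "lat,long" pair per line with no spaces.
# Print every line that doesn't fit (with its line number); return 1 if any didn't.
validate_coords() {
  awk '
    BEGIN { bad = 0 }
    /^-?[0-9]+(\.[0-9]+)?,-?[0-9]+(\.[0-9]+)?$/ {
      split($0, p, ",")
      lat = p[1] + 0; lon = p[2] + 0
      # Well-formed and in range: nothing to report for this line.
      if (lat >= -90 && lat <= 90 && lon >= -180 && lon <= 180) next
    }
    # Anything reaching this rule is malformed or out of range.
    { print "line " NR ": " $0; bad = 1 }
    END { exit bad }
  '
}
```

For example, "validate_coords < points.txt && echo ok" (points.txt being your own list) prints nothing but "ok" when every line is in the expected format. A line like "40.4168, -3.7038" is reported because of the space after the comma.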

Latitude Longitude Coordinates / Points Mapping Tool.

Changing from 'Remember Me on this Computer' to 'Stay signed in'

June 24 2009, 9:12pm in Google
1 Comments

Here's a very small but interesting change. Google has changed the text label beside the checkbox on the Google account login form that keeps the user signed into their Google account. It's changed from "Remember me on this computer" to "Stay signed in".

I wonder if the previous label was confusing people. Good to see they're experimenting with little usability things.

Here's how the login form looks now:

New Google Account Login Form Checkbox

And here's what it used to look like:

Old Google Account Login Form Checkbox

Start your persistent cookies, get set. GO!

Reports of New Google PageRank (PR) Update Already (June 2009)

June 24 2009, 4:08am in Google
0 Comments

A thread over at WebmasterWorld has some reports of a PageRank (PR) update going on with many people seeing new PR values, just one month after Google did their last PR update.

It will be interesting to see how this plays out during the course of the day... is this an incremental update or a full PR update? Are Google returning to their monthly PR update cycle from many years ago? It's not impossible... as more new content is generated, people want to see the PR values for those pages. Updating PR only every 6 months or so leaves a huge number of pages without PR values, and that can reflect poorly on Google.

Updated SEOmoz SEO Best Practices / Policies

June 23 2009, 2:23pm in SEO
0 Comments

SEOmoz has published some updated SEO best practices guidelines. The guidelines are apparently based on "correlation data", which means that they looked at rankings and analyzed the different components on the ranking pages.

The list of SEO best practice items gives recommendations for:

  • Title Tag Format
  • The Usefulness of H1 Tags
  • The Usefulness of Nofollow
  • The Usefulness of the Canonical Tag
  • The Use of Alt text with Images
  • The Use of the Meta Keywords tag
  • The Use of Parameter Driven URLs
  • The Usefulness of Footer Links
  • The Use of Javascript and Flash on Websites
  • The Use of 301 Redirects
  • Blocking pages from Search Engines
  • Google Search Wiki's Effect on Rankings
  • The Effect of Negative Links from "Bad Link Neighborhoods"
  • The Importance of Traffic on Rankings

This is great stuff, but as with everything in the "SEO" world, it needs to be taken with a pinch of salt. Each element that gets analyzed essentially introduces another unknown variable into a simultaneous equation.

One of the most interesting items is that H1 tags have been reduced to having nearly no importance in search engines. What I'm wondering is whether SEOmoz also looked at the CSS styling of the H1s to determine if H1s styled with a smaller font carry less weight, or if the reduced importance applies to all H1s across the board. We know that Google looks at CSS and JavaScript.

Google Showing Product Results Under Sponsored Links on Right

June 22 2009, 8:29pm in Google
0 Comments

Google Blogoscoped has some screenshots of Google showing product results on the right side of the SERP underneath the Sponsored Links. This is almost certainly related to Google experimenting with commission-based advertising. I had seen screenshots of the product links up top, but not at the side.

I should note that Google has experimented with CPA (Cost Per Action/Acquisition) advertising before with AdSense Publishers back in 2006, but they stopped that program. My thinking at the time was that it turned out to be ineffective because the ads weren't targeted enough, so CPC probably yielded a larger income for Google and their publishers. But with direct products integrated into search pages, I'm thinking that they should actually work quite well. It will be interesting to see where this goes.

A Quick Look at Google Voice (with Screenshots)

June 22 2009, 12:25pm in Google
2 Comments

There has been a lot of talk this past week about how Google Voice is set to open up to the public soon. At the moment it's only available to Grand Central users, Grand Central being the name the service has gone by for the last few years. Fortunately, I've had a Grand Central account since the very beginning, so I'm now testing out the new Google Voice system. Here's a screenshot of the homepage/inbox when you first join:

Google Voice after First Joining

As you can see, this layout is very similar to Gmail. You've got an Inbox, Starred, Spam, Trash and some other items specific to Google Voice like Voicemail, SMS, Recorded, Placed, Received and Missed.

The first thing I wanted to do was check out the transcript feature for voicemails, so I called my Google Voice number to leave a voicemail. This is what I said:

Hi, this is Darrin Ward i'm testing out my beautiful new google voice account. Please leave your message after the tone and I will get back to you as soon as possible. Thank you

And here is what Google delivered to my Inbox:

Google Voice Voicemail Transcript

Google also delivered a transcript of the message to my email Inbox:

Google voice Transcript Email

Obviously, this is not a perfect transcript, but I have to say that it is still pretty accurate. I was disappointed however that Google didn't deliver a .wav file of the message like Vonage does.

Clicking on the "SMS" button in the top left will allow you to quickly send an SMS text message:

Google Voice Send SMS

This message came through on my phone (BlackBerry) very quickly, but what I liked most was that it appeared to have come directly from my Google Voice number and the message did not have any labels or advertising indicating that the message was from Google Voice. It looks just like it came from another mobile phone:

Google Voice SMS on BlackBerry

When I replied to the text message from my mobile phone, Google alerted me that I had 1 new message. This was a link that I had to click to see the reply in my Inbox:

Google Voice SMS Reply

I never use SMS messages, but I do use the BlackBerry PIN service, so I'm a little disappointed to find that this is not an option on Google Voice. I frankly don't understand why people use SMS. There's a character limit (or your text gets broken up into multiple messages and you get charged multiple times). BlackBerry PIN is free with an internet connection and there is no limit, or at least none that I have ever run into.

Google Voice is painfully slow for me right now. Page loads are taking upwards of 15 seconds. However, this problem may be on my end. We have a new fiber connection in the building that doesn't have reverse DNS yet, and this seems to cause huge delays with sites that perform reverse DNS lookups.

Google recently acquired 1 million new phone numbers from Level 3, so they are obviously expecting a surge in users soon. I think Google Voice is a cool service, but it doesn't add anything that I don't already have. You will still need another phone to make outgoing calls, and the SMS functionality is useless to me (though I might use SMS more if I could proxy it through email or BlackBerry PIN so I don't have to pay).

Please share your thoughts on Google Voice!

Google Truncates Some URLs to One Line

June 19 2009, 9:41am in Google
0 Comments

Barry Schwartz at the Search Engine Roundtable has some screenshots of Google SERP listings where some of the long URLs are truncated to be only one line.

It doesn't seem to be happening for all URLs on all SERPS, but it's definitely happening for some of them. A quick look seems to indicate that Google are more likely to truncate part of a URL path (the folder names) instead of the filenames. They always seem to keep the beginning of the filename (whether it's located in the root or in a folder/directory), but they sometimes truncate the end of the filename.

Gotta love Google with incremental changes!

China Disables Part of Google Functionality for Porn Material

June 19 2009, 9:22am in Google
0 Comments

China is at it again. They have disabled part of the Google search engine that they claim displays "pornographic and vulgar content", according to the New York Times:

On Friday evening, it appeared that the associative-word feature of the Web site had been disabled. That is the function that displays a drop-down menu of words related to a search word that is typed into the search engine.

I'm not clear on whether Google disabled it themselves voluntarily or if China disabled it by force in some way.

Google Adds Persian Translation to Its Translate Tool

June 19 2009, 7:40am in Google
0 Comments

Amid the ongoing Iranian election madness, Google has added Persian (Farsi) as an available language to their Google Translate tool, according to the Google Blog. The language is currently in "Alpha" status, meaning in the very early stages of testing (pre-beta).

Today, we added Persian (Farsi) to Google Translate. This means you can now translate any text from Persian into English and from English into Persian ... The service is available free at http://translate.google.com ... We feel that launching Persian is particularly important now, given ongoing events in Iran

Freshness Optimization - Optimizing for Google Fresh Rankings

June 18 2009, 6:57pm in SEO
0 Comments

Bob Heyman today on Search Engine Land notes that the Google freshness factor may have big implications for retailers. He notes that the EVP of ice.com, a large Internet retailer, is making proactive changes to their site. The trigger is the search "options" functionality recently introduced by Google, which allows searchers to select "recency" as a criterion.

They are indeed correct. When you search on Google you will see a "Show Options" link at the top of the SERPs. When you click this link, you will see the "recency modifiers" options of "Any time", "Recent results", "Past 24 hours", "Past week" and "Past year". These allow searchers to refine the search results based on how recently the pages were updated.

If you sell products online, then you probably don't update your product pages very often. That will hurt your traffic levels if many people adopt Google's recency modifiers, because pages that haven't been updated in a long time won't get listed in SERPs that require recently modified pages.

So, what can you do about it?

The first few things that come into my mind are: "daily changes", "Last-Modified", "checksum" and "page size". If you can keep all of these in mind and know how they relate to each other, then you should be able to engineer yourself into always having fresh content.

Google are looking for pages that have been recently modified, so the best way to meet that criterion is to actually add new content to pages daily. Keep in mind, though, that they are probably looking for pages that exceed some threshold of new content before a page is considered changed or updated. So just adding or changing 1 sentence on a page with 100 sentences probably isn't going to cut it. I don't know what the threshold is, but I would be comfortable recommending a guideline minimum of 10-20%. That means 1 or 2 new stories every day for a page that normally features 10 stories.
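Nobody outside Google knows how it measures the amount of change (or even whether it does), so treat the following purely as a rough self-check. It's a minimal sketch, assuming you save a plain-text copy of the page each day with one story or item per line. percent_changed is a hypothetical helper name, and counting changed lines is just one crude stand-in for "amount of new content". It uses diff to count the lines that are new in the later copy and prints that as a whole-number percentage of the page. You can then hold that number up against the 10-20% guideline above.

```shell
#!/bin/sh
# Rough estimate of how much a page changed between two saved copies.
# $1 = older copy, $2 = newer copy (one story/item per line).
# Prints the percentage (rounded down) of lines in the newer copy that are new or changed.
percent_changed() {
  total=$(wc -l < "$2" | tr -d ' ')
  if [ "$total" -eq 0 ]; then
    echo 0
    return
  fi
  # diff prefixes lines that appear only in the newer file with ">".
  added=$(diff "$1" "$2" | grep -c '^>')
  echo $(( added * 100 / total ))
}
```

For a page of 10 stories, one per line, swapping in 2 new stories gives 20.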

I know what you're thinking... I'll add some random content and every time a search engine sees the page it will be different. I generally advise against this because if Google find that your content is completely random, then they will be a lot less confident sending traffic to you for a specific keyword, given that the relevant content that was on the page at the time they spidered it will likely be gone when a user goes to see the page. Frequent change = good. Random = bad.

So. Commit to making a few changes throughout the day and you should always be there for a "Past 24 Hours" search.

"Last-Modified" is an HTTP header that a web server sends with its response to a request. It tells the client (the search engine spider, in this case) when the page was last modified. It's very likely that Google and other search engines wanting to determine freshness will look at this header. However, they won't rely on it completely, because it can be "faked" to whatever date the Webmaster wants, so search engines will still look for actual content changes. Always sending the current time is a bad idea: it claims the page has changed when it hasn't.

It's important to note that the Last-Modified header is not always sent by default. It is usually sent with static pages, but dynamic sites generally don't send it by default because of the complexity of calculating the true time of last modification. If you're selecting a CMS, this may be worth considering. Incidentally, there is also the related "If-Modified-Since" request header. A client can send it to ask the server for the page only if it has changed since a given date. If the page hasn't changed, the server can reply with "304 Not Modified" instead. It's worth looking into.
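To see what your own server sends, you can inspect the response headers with curl. Below is a minimal sketch, assuming curl is installed. Both function names are hypothetical:

- last_modified pulls the Last-Modified value out of a header dump, such as the output of "curl -sI".
- check_not_modified sends an If-Modified-Since request header and prints only the HTTP status code. 304 (Not Modified) means the server considers the page unchanged since that date.

Keep in mind that some servers ignore If-Modified-Since and always answer 200.

```shell
#!/bin/sh
# last_modified: read raw HTTP response headers on stdin (e.g. from `curl -sI URL`)
# and print the Last-Modified value, if present. Header names are case-insensitive.
last_modified() {
  tr -d '\r' | awk 'tolower($0) ~ /^last-modified:/ { sub(/^[^:]*: */, ""); print; exit }'
}

# check_not_modified: send a conditional GET and print only the HTTP status code.
# $1 = URL, $2 = HTTP date such as "Tue, 16 Jun 2009 08:00:00 GMT".
# 304 = unchanged since $2; 200 = full page sent (changed, or header ignored).
check_not_modified() {
  curl -s -o /dev/null -w '%{http_code}' -H "If-Modified-Since: $2" "$1"
}
```

For example, "curl -sI http://www.example.com/ | last_modified" shows the date your server reports (substitute your own URL). If nothing is printed, the header isn't being sent at all.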

Finally, a quick and dirty way to check for changes to a page would be to compare the checksum and file size against previous versions of the document. I won't go into much detail here because I'm not sure that Google are using these methods. But if 2 versions of the same file pulled at different times have exactly the same size, then there is at least a small probability that they are identical.

The checksum method is more accurate, but still not perfect. A checksum comparison compares the checksums of 2 versions of the same document. If the checksums are identical, then there is a good chance that the documents themselves are identical. This method gives a pretty accurate yes-or-no answer as to whether the 2 documents are identical. It does not measure the degree to which the documents' contents differ (the percentage of content that is different).
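Both checks are easy to try yourself. Here's a minimal sketch, assuming two saved copies of the same page and the standard wc and cksum utilities. cksum computes a CRC checksum; an MD5 or SHA tool would serve the same purpose. The function name compare_snapshots is hypothetical. It compares sizes first, since a size difference already proves the copies differ, and only then compares checksums. As noted above, neither check tells you how much changed.

```shell
#!/bin/sh
# compare_snapshots: compare two saved copies of the same page.
# $1 = earlier copy, $2 = later copy.
compare_snapshots() {
  # A different size proves the copies differ; no need to go further.
  size1=$(wc -c < "$1" | tr -d ' ')
  size2=$(wc -c < "$2" | tr -d ' ')
  if [ "$size1" != "$size2" ]; then
    echo "changed (size $size1 -> $size2 bytes)"
    return 0
  fi
  # Same size: compare CRC checksums (the first field of cksum's output).
  sum1=$(cksum < "$1" | cut -d ' ' -f 1)
  sum2=$(cksum < "$2" | cut -d ' ' -f 1)
  if [ "$sum1" != "$sum2" ]; then
    echo "changed (same size, different checksum)"
  else
    # Matching size and checksum: very likely, though not certainly, identical.
    echo "probably unchanged (same size and checksum)"
  fi
}
```

For example, save two copies of a page a day apart (say, monday.html and tuesday.html) and run "compare_snapshots monday.html tuesday.html".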

I hope this helps to at least get you thinking about this important issue. I know that I'm using the recency modifiers quite a bit, but I don't know what the adoption numbers are; hopefully Google tells us at some point. Submit a comment or get in touch if you have something to say!

Analytics: Referring Search Keywords & Keyphrases. What's the difference?

June 18 2009, 12:14pm in Random Stuff
0 Comments

Many analytics programs allow you to see referring search engine "keywords" and "keyphrases" as two separate reports, and for SEO or PPC it's important to understand the difference between them. ("Keyphrase" is probably more correctly written as "key-phrase", but my spelling has never been perfect, so why start now!)

A Quick Overview For The Impatient

A "keyphrase" report will show you the exact referring search phrases, usually sorted by volume/hits. A "keyword" report will show you the hit count for each unique keyword across all of the referring keyphrases.

Keyphrases

Keyphrases are pretty simple. The analytics program will track each exact referring phrase from each search engine, and each time it sees a new hit for a keyphrase, it will increment the count. For example, if 10 different people find your site by searching for "download music", then that keyphrase will have 10 hits. If another 10 people find your site by searching for "music download" (the same words reversed), then this phrase will also have 10 hits, and the keyphrase report will be:

  • download music: 10
  • music download: 10
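A keyphrase report boils down to counting exact phrases. A minimal Python sketch, assuming the referring phrases have already been extracted from your logs; lower-casing and trimming whitespace so that "Download Music" and "download music" count together is an assumption, since some tools keep them separate. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def keyphrase_report(referring_phrases: Iterable[str]) -> Counter:
    """Count hits per exact referring search phrase."""
    # Normalise case and surrounding whitespace before counting.
    return Counter(p.strip().lower() for p in referring_phrases)


if __name__ == "__main__":
    hits = ["download music"] * 10 + ["music download"] * 10
    for phrase, count in keyphrase_report(hits).most_common():
        print(f"{phrase}: {count}")
```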

Keywords

A keyword report will separate each individual keyword from its keyphrase and find the hit count for that keyword across all search keyphrases. So, given the same 20 referring search keyphrase hits from the example above (10 for "download music" and 10 for "music download"), a keyword report would show the following:

  • download: 20
  • music: 20

This is because the words "download" and "music" appear a total of 20 times each.
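The same 20 hits run through a sketch of a keyword report make the difference concrete. This is a minimal Python illustration with a hypothetical function name; splitting on whitespace is a simplifying assumption, and a word repeated within one phrase is counted twice here, which real tools may or may not do.

```python
from collections import Counter
from typing import Iterable


def keyword_report(referring_phrases: Iterable[str]) -> Counter:
    """Count hits per individual word across all referring phrases."""
    counts: Counter = Counter()
    for phrase in referring_phrases:
        # Each hit adds one to every word that appears in its phrase.
        counts.update(phrase.lower().split())
    return counts


if __name__ == "__main__":
    hits = ["download music"] * 10 + ["music download"] * 10
    report = keyword_report(hits)
    print(dict(report))
    print("actual visits:", len(hits))
    print("sum of keyword counts:", sum(report.values()))
```

Both words come out at 20, and the keyword counts add up to 40 even though there were only 20 visits.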

Why is This Important?

First and foremost, the keyword report does not give you a clear idea of exactly how people are finding your site. Instead, it gives you a very broad overview of the main keywords that are being used to find your site, but not the exact phrases, which may have a "long tail". In our example above, none of the traffic came directly from searches for "download" or "music", yet both of these show 20 hits. Looking at the keyphrase report will tell you the exact keyphrases that drove traffic.

The total hit count for a keyword report will also be inflated. In our example, the total traffic was 20 (10+10), yet our keyword report gives the impression that we received 40 (20+20) hits.

All-in-all, both keyword and keyphrase reports have their place in SEO and PPC, but you need to know what you are looking for. Most often you'll really want a keyphrase report rather than the keyword report.

It's interesting to note that Google Analytics has a keyword report, but it's actually a keyphrase report. Google Analytics doesn't have a true keyword report.

Early comScore & Compete.com Bing Search Traffic Numbers

June 17 2009, 2:04pm in Search News
0 Comments

Danny Sullivan for Search Engine Land runs through some search engine usage numbers, looking at how traffic is going for Bing in these early stages.

It's tough looking at these early numbers because it's very difficult to determine any potential long-term changes when you look at short-term numbers. comScore sees an increase in usage for Bing, but that doesn't seem to be reflected by compete.com's numbers. Danny does a great job of going through the numbers and explaining what's really being measured.

Getting people to change their habits is hard, and I'm still not sure how I feel about Bing... What do you think? Is it great or not? No doubt some people will want to use it more than Google just out of principle. Why? Because It's Not Google!

Oh, another Microsoft thing that I thought was funny today was that they are insulting some browsers as part of an Australian promotion called "Ten Grand Is Buried Here". See tarnished Chrome, boring Safari and old Firefox.

Interesting Points on Google's rel=nofollow / PageRank Change

June 16 2009, 12:49pm in Google
0 Comments

Rand Fishkin of SEOmoz recently posted on the issue of rel=nofollow changes at Google, which prevent Pagerank being saved and sent through to the other links on the page (sculpting). The post comes after a blog entry by Matt Cutts on the topic.
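To make the change concrete, here is a deliberately simplified model in Python. It ignores the damping factor and everything else in the real PageRank calculation, and the numbers (10 units of PageRank to pass on, 5 links, 2 of them nofollowed) are made up for illustration. Before the change, nofollowed links were effectively left out, so the followed links split everything; after it, PageRank is divided across all links and the nofollowed share is not passed on to anyone. The function name is hypothetical.

```python
def pr_per_followed_link(
    pr_to_pass: float, total_links: int, nofollow_links: int, after_change: bool
) -> float:
    """PageRank each followed link receives under a simplified model."""
    followed_links = total_links - nofollow_links
    if followed_links <= 0:
        return 0.0
    if after_change:
        # Every link takes a share; the nofollowed shares simply aren't passed on.
        return pr_to_pass / total_links
    # Old behaviour: nofollowed links were left out of the division entirely.
    return pr_to_pass / followed_links


if __name__ == "__main__":
    before = pr_per_followed_link(10, 5, 2, after_change=False)
    after = pr_per_followed_link(10, 5, 2, after_change=True)
    print(f"before: {before:.2f} per followed link, after: {after:.2f}")
```

With these numbers each followed link drops from about 3.33 to 2.00, and the 4 units that would have gone to the two nofollowed links go nowhere, which is why nofollow no longer "saves" PageRank for the other links.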

I agree that it can be frustrating when Google reverses direction, but I've always maintained my belief that using rel=nofollow to sculpt Pagerank was a bad idea and counter to the idea of Pagerank itself. I just plain don't like rel=nofollow, unless it is used as a purely protective measure for potentially spammy UGC (User Generated Content).

Rand says one or two things that I disagree with:

"I'm saddened to say that given this change, we, as SEOs, are going to have to also recommend the best practice that comments (in all forms of UGC) no longer accept links ... Comments that contain links, unfortunately, will actively detract from a site's ability to get pages indexed (as they'll pull away link juice from the places that need it)"

The web has changed, and the notion of preserving Pagerank that existed in the early 2000s is gone. What matters today is not so much the Pagerank of pages, but how pages are intertwined, themed or fused with pages on the same topic or closely related topics.

A blanket statement that links in UGC should be ignored is a terrible idea. UGC often introduces some quality links that can actually add to the ranking potential of the page. Granted however, there is a risk of spam links causing a negative impact on rankings, but sites should be sufficiently moderated to prevent such links. If quality control and moderating isn't in place, then how trustworthy is the site anyway? Besides, rel=nofollow can still be used. Worrying about saving the Pagerank is not a big thing in my view.

"From now on, if you wish to sculpt PageRank, you'll want to use one of the following classic PR sculpting methodologies:"

And Rand goes on to list some of the old-school methodologies to prevent links being displayed directly on the page. There are a couple of problems with these trickery ideas:

  • They are specifically intended to deceptively hide links from search engines... isn't this against best practice?
  • We all know that Googlebot has come a long way. Google crawl CSS, JavaScript and Flash. I'm sure they can either find most of the links that would be hidden via such methods today, or would in the very near future anyway.

I agree with Rand in that this is a pretty big shift in how Google handles PR flow, but I would certainly say that it was a very predictable move. In hindsight, I'm very glad that my team has never adopted the notion of PR sculpting with rel=nofollow. We have other more practical methods.

PS. My blog CMS is a custom solution and I haven't gotten around to implementing links in comments yet, but I do plan on doing it. When I do, the links probably won't use rel=nofollow (I haven't thought much about it yet), but I will be removing spam links, so I'm not too worried.

Optimizing Inbound Link Anchor Text Through Diversity for SEO

June 15 2009, 4:17pm in SEO
1 Comments

A poster at WebmasterWorld.com asks about using different anchor/link text to point to the same page for SEO.

This is a good question and I thought I'd share some quick insight into how I normally approach link development in terms of anchor-text diversification.

The short answer is that diversity in inbound links is a good thing, because it shows that: a) the links are less likely to be auto-generated or copy/pasted everywhere, and; b) the page is relevant within a variety of slightly different contexts (presuming the same general topic).

The longer answer is that although some diversity in anchor text is a good thing, you need to be careful not to overdo it in case you dilute ranking potential for the real keywords. If a page has 100 inbound links but none of them are the same or they don't even contain the same keyword, then how will a search engine know which keywords are most relevant (besides looking at on-page content!)? The best thing to do is to make sure that at least 50-60% of inbound links contain the root keyword (or one of its closely stemmed variants or synonyms) in combination with other words. The rest of the links can be whatever, but ideally there would be some consistent phrase usage too.
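If you have a list of inbound anchor texts, checking them against that rule of thumb is easy to script. A minimal Python sketch with a hypothetical function name; you supply the root keyword and its variants or synonyms yourself, and whole-word matching is a crude stand-in for real stemming. The 50-60% figure is the rule of thumb above, not a published search engine threshold.

```python
import re
from typing import Iterable, Sequence


def root_keyword_share(anchors: Sequence[str], root_variants: Iterable[str]) -> float:
    """Fraction of anchor texts containing at least one root-keyword variant."""
    if not anchors:
        return 0.0
    # Whole-word, case-insensitive matching for each supplied variant.
    patterns = [re.compile(r"\b" + re.escape(v.lower()) + r"\b") for v in root_variants]
    matching = sum(1 for a in anchors if any(p.search(a.lower()) for p in patterns))
    return matching / len(anchors)


if __name__ == "__main__":
    anchors = ["cheap blue widgets", "blue widget reviews", "click here",
               "widgets for sale", "homepage"]
    share = root_keyword_share(anchors, ["widget", "widgets"])
    print(f"{share:.0%} of anchors contain the root keyword")
    if share < 0.5:
        print("Consider working the root keyword into more anchors.")
```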

The other thing to consider is the positioning of the link itself, and whether or not it's an internal link or a link from an external site. If the link is from a global navigation menu, then it's not practical (or good) to make that link different on every page just for SEO (plus, UX people would scream at you). Also, if the links are internal, then I think the tolerance to lower diversity in link text is higher (the same people will probably use the same descriptions multiple times). Considering links from multiple external sites, it makes sense to think that there should be higher diversity in link text because there are probably multiple writers involved, and no 2 writers will do exactly the same thing.

Webmaster Jam Session 2007 Slides

September 24 2007, 9:51pm in SEO
0 Comments

by Darrin J. Ward:

A big thank you to everyone that came to the Webmaster Jam Session this year in Dallas, TX. Although I got a lot less rest than I would have liked, I can truly say that I had a fantastic time.

I had one or two requests for the slides from this year's Search Engine Strategies, so without any further delay, click the link below for the PDF file. Please contact me if you have any questions.

Search Engine Strategies Slides (September 22nd 2007)

Vital Academic Papers/Articles for SEO (Search Engine Optimization)

August 4 2007, 2:21am in SEO
0 Comments

by Darrin J. Ward:

I've always been greatly interested in mathematics. Well, not always, but I did come to have a lot of respect for applied mathematics and physics during my latter years of school and college. Now, I have to also admit that I don't understand as much as I'd like, because it would simply take far too much time to learn it all. The deep stuff is beyond me and I admit that. Nonetheless, I remain fascinated by the sheer logic in math and the fact that it transcends race, time, spoken language, etc. It's a universal language.

Ever since I learned about Fermat's Last Theorem, I've been absolutely engrossed by the notion that a simple-looking and simple-sounding statement could boggle the minds of the world's greatest mathematicians for over 350 years. The theorem states, simply, that x^n + y^n = z^n has no solutions where x, y and z are integers greater than zero and n is an integer of value 3 or greater. You'll note that n=2 would be the Pythagorean theorem!

So, where is all of this going and how does it relate to SEO? Well, in reading the amazingly complicated Proof of Fermat's Last Theorem [PDF] by Andrew Wiles (and yes, I've actually had a printed copy in my office for the last few years), I've been forced to learn a little bit about some intriguing areas of mathematics. One such thing was eigenvectors. In doing further research on these I came across a wonderful paper entitled "The $25,000,000,000 Eigenvector - The Linear Algebra Behind Google" by Kurt Bryan & Tanya Leise, which is basically about Google's PageRank (an eigenvector).
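For anyone curious what "PageRank is an eigenvector" means in practice: PageRank is the stationary vector of a damped "random surfer" matrix, and the simplest way to find it is power iteration, repeatedly applying the matrix until the scores stop changing. The Python below is a minimal sketch of the normalised formulation (scores sum to 1), assuming the commonly cited damping factor of 0.85 and spreading the score of pages with no outlinks evenly across all pages, which is one common convention. It is an illustration, not Google's implementation.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Power iteration: PR(p) = (1 - d)/N + d * sum(PR(q) / outlinks(q))."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform vector
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its score evenly over every page.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"A": ["B"], "B": ["A"], "C": ["A"]}
    for page, score in sorted(pagerank(graph).items()):
        print(page, round(score, 4))
```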

I've read quite a lot of academic papers that theorize on various things, but I had not come across this particular one before, so it was a pleasure to look through it. I mostly use academic papers as a source of inspiration rather than a solid foundation for an SEO campaign. They are wonderful at provoking me to think about abstract things, which eventually helps me get ahead in the SEO world.

The fact of the matter is that search engines are nothing more than big calculators (though with an arguable component of manual reviewing, à la Google's Patent # 7096214). If you know how they work and understand the steps that they take in performing their calculations, then you have a significant competitive advantage. Looking at what's being proposed in these academic papers therefore makes a lot of sense, as they are a great source of the latest thinking on strategies.

So, here are some of the papers that I usually recommend to people wanting to learn more. Some of them contain a lot of mathematics, but you can usually get some good info even without understanding everything (I will update this list every so often; contact me with additional suggestions):

Authoritative Sources in a Hyperlinked Environment
-- by Jon M. Kleinberg

Site Level Noise Removal for Search Engines
-- by Andre Luiz da Costa Carvalho, Paul-Alexandru Chirita, Edleno Silva de Moura, Pavel Calado, Wolfgang Nejdl (2006)

The Anatomy of a Large-Scale Hypertextual Web Search Engine
-- by Sergey Brin and Lawrence Page

A Survey of Eigenvector Methods For Web Information Retrieval
-- by Amy N. Langville & Carl D. Meyer

ParaSite: Mining Structural Information on the Web
-- by Ellen Spertus

The $25,000,000,000 Eigenvector - The Linear Algebra Behind Google
-- by Kurt Bryan & Tanya Leise

Preventing Bad Bots / Scraping / Email Harvesting

July 19 2007, 10:12am in SEO
0 Comments

Every serious webmaster that I know, and especially every SEO, has complained at least once about their content being stolen (scraping). Computer software known as "spiders", "robots" or "bots" regularly crawls the internet and visits our websites. Search engines use this spidering software to visit your website and include your pages in their indices. Unfortunately, people who are out to steal your content also use this software to download websites in huge volumes and then republish the content elsewhere, thus detracting from the uniqueness of the "scraped" content. Other robots just relentlessly rip through pages in an attempt to harvest as many email addresses as possible, so that they can spam them later. The way to stop either these scrapers or email harvesters is the same. Update: SpyderTrax is a tool that automatically bans bad robots and tracks good ones.

Obviously, we don't mind the search engines accessing our website because they are "good" bots, but how do we prevent the content-robbing "bad bots" from accessing it?

Many people will tell you about something called mod_rewrite in the ".htaccess" file, which is an Apache directives file. Most of the directives/code that you will find on the internet use a very simple filtering system to prevent known-bad robots from accessing your content. The problem is that almost all of them rely on the spider-supplied "User-Agent" field. Some others rely on blocking known-bad IP address blocks. But none of them allow you as a webmaster to dynamically detect and block these bad robots.

First, let me give you a brief summary of all of the information that we will have available to us in order to make a sound judgement.

  • IP Address: The IP Address is a numerical identification of the computer making the request.
  • Requested URL: This is simply the page being requested by the robot.
  • User Agent: This is how the spider identifies itself. This value can be easily faked, so it's very untrustworthy.
Through analysis of these 3 parameters, it is possible to design an accurate system which will prevent unauthorized robots from accessing (scraping) our content.

I have been using a proprietary script for some time that attempts to solve the bad robots problem. Today, I'll share with you the logic that I have implemented, which has demonstrated great success. I know of a few other people who also use the following method with some variation. In fact, you may be able to find a script or service that does this for you.

A brief overview of the steps that I take to detect scraping "bad bots" is as follows (also see the additional considerations):

  1. See if client/spider follows a blank URL.
  2. If it does, get the IP address and perform a reverse DNS lookup.
  3. If the DNS lookup resolves to an untrusted domain, block access in .htaccess (or httpd.conf).
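The three steps can be sketched in Python as a single decision function. This is an illustration under assumptions, not the proprietary script mentioned above: the names `handle_trap_hit`, `reverse_lookup` and `ban` are hypothetical, the lookup is passed in as a function so the logic can be tested without touching DNS, and the trusted-suffix list is a placeholder you would have to compile yourself. The example IPs and hostnames are the ones from this post.

```python
from typing import Callable, Iterable, Optional


def handle_trap_hit(
    ip: str,
    reverse_lookup: Callable[[str], Optional[str]],
    trusted_suffixes: Iterable[str],
    ban: Callable[[str], None],
) -> str:
    """Decide what to do with a client that followed the hidden link.

    Step 1 (following the blank URL) has already happened when this is
    called. Step 2 is the reverse DNS lookup; step 3 bans any IP whose
    hostname is missing or not under a trusted domain.
    """
    hostname = reverse_lookup(ip)
    if hostname:
        # DNS names are case-insensitive and may carry a trailing dot.
        host = hostname.lower().rstrip(".")
        # Suffixes start with "." so "evilgooglebot.com" won't match ".googlebot.com".
        if any(host.endswith(suffix) for suffix in trusted_suffixes):
            return "trusted"
    ban(ip)
    return "banned"


if __name__ == "__main__":
    fake_dns = {
        "66.249.72.12": "crawl-66-249-72-12.googlebot.com",
        "207.234.130.25": "207-234-130-25.ptr.primarydns.com",
    }
    banned = []
    for ip in fake_dns:
        print(ip, handle_trap_hit(ip, fake_dns.get, [".googlebot.com"], banned.append))
    print("banned:", banned)
```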

So, here is some more explanation about the procedure:

I have coded a hidden link into all of my web pages. Why? Because robots/spiders follow almost all links, and I want those spiders to follow this special hidden link, not ordinary users, so I made it invisible. By invisible, I mean that it has no anchor text, so there is nothing on the page for a human to see or click. Note that the link points to a php file and that I have used the rel=nofollow attribute (to prevent it from being followed by spiders that understand that command).

It is logical to assume that the majority of hits to this php file will be culpable, though we cannot yet accurately form an opinion as to whether the hit is from a good robot, a bad "bot" or some other source (such as a browser pre-fetch). However, we can use the php file to try and help us out, since only automated requests should be made to this file.
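The php file itself only needs to record who requested it. Here is a minimal Python stand-in written as a WSGI application, assuming hits are kept in an in-memory list (`TRAP_HITS`, hypothetical) where a real setup would write to a database. It captures exactly the three values listed earlier in this post: IP address, requested URL and User-Agent. It is not the php script described here.

```python
import time
from typing import Callable, Dict, Iterable, List

# In-memory store for illustration only; a real setup would use a database.
TRAP_HITS: List[Dict[str, object]] = []


def trap_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """WSGI app for the hidden trap URL: log the hit and return an empty page."""
    TRAP_HITS.append(
        {
            "ip": environ.get("REMOTE_ADDR", ""),
            "url": environ.get("PATH_INFO", ""),
            # Logged for reference only: the client supplies it, so don't trust it.
            "user_agent": environ.get("HTTP_USER_AGENT", ""),
            "time": time.time(),
        }
    )
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b""]
```

To try it locally you could serve it with `wsgiref.simple_server.make_server("", 8000, trap_app).serve_forever()`. Behind a reverse proxy, `REMOTE_ADDR` may be the proxy's address rather than the client's, so check that before banning anything.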

So, again, what information do we get from the hit to the php file? Remember, for each hit we have an IP address, requested URL and User-Agent. We can't really use the User-Agent field, because it is supplied by the requesting agent, and thus potentially mendacious. We can trust the requested URL because it is logged on our side (but we've already exhausted this to our advantage by detecting the hit to robots.php). We can also trust the IP address.

Yes, IPs can be faked/spoofed, but the majority of scrapers are small time and don't have the resources or know-how to perform IP spoofing (which is actually very difficult to do over TCP/IP to websites).

OK, so let's look at the IP. If we suspect the request (which we do, since it followed the invisible link to the php file), then we need to determine whether or not the IP can be trusted, i.e. whether it's a good or bad bot, and that can be done via a reverse DNS lookup. A reverse DNS lookup tries to translate an IP address into the domain name that it belongs to. Here are some example IP addresses (taken directly from my log file) and the domain names to which they correspond:

  • 66.249.72.12: crawl-66-249-72-12.googlebot.com
  • 207.234.130.25: 207-234-130-25.ptr.primarydns.com (I found this masquerading with User-Agent "Googlebot")
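In Python the reverse lookup is a single standard-library call, `socket.gethostbyaddr`. The sketch below also does a forward lookup on the returned hostname and checks that it maps back to the same IP. That extra step goes beyond the procedure described in this post: whoever controls an IP block also controls its reverse DNS, so a scraper could make its PTR record claim a googlebot.com name, but it can't make Google's forward DNS point back at its own IP. The function names are hypothetical, and this handles IPv4 only because `gethostbyname_ex` is IPv4-only.

```python
import socket
from typing import Optional


def reverse_dns(ip: str) -> Optional[str]:
    """Return the PTR hostname for an IPv4 address, or None if there isn't one."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None
    return hostname


def verified_hostname(ip: str) -> Optional[str]:
    """Reverse-resolve ip, then confirm the hostname resolves back to that ip."""
    hostname = reverse_dns(ip)
    if hostname is None:
        return None
    try:
        _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        return None
    return hostname if ip in addresses else None
```

The result is only as good as your trusted-domain list: a verified hostname still has to end in a domain you trust before the visitor should be let through.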

So, we found that the first IP address does actually belong to Google, but the second one doesn't. We immediately ban the second IP address by adding it as a "deny from" in the .htaccess file, because it is crawling our site and downloading our content, as evidenced by its request to the php file in the blank link.

We don't know what that domain is and we don't want to trust it with unrestricted access to our content. When I add an IP address to my list, I regenerate my .htaccess ban file right away, so the ban takes effect immediately.
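Regenerating the ban file can be as simple as rewriting it from the current list of banned IPs. A minimal Python sketch, assuming Apache 2.2-style access control (`Order`/`Allow`/`Deny`, matching the "deny from" approach described here) and IPv4 addresses; on Apache 2.4 the equivalent is `Require not ip` inside a `<RequireAll>` block. The function names are hypothetical. The file is written to a temporary file and then swapped in with `os.replace`, so Apache never reads a half-written file. This overwrites the whole target, so point it at a file that holds nothing but the ban rules (for example, one pulled into httpd.conf with `Include`).

```python
import ipaddress
import os
import tempfile
from typing import Iterable


def build_ban_rules(banned_ips: Iterable[str]) -> str:
    """Render Apache 2.2-style deny rules for a set of IPv4 addresses."""
    # ip_address() rejects malformed entries and gives a numeric sort order.
    unique = sorted({ipaddress.ip_address(ip) for ip in banned_ips})
    lines = ["Order Allow,Deny", "Allow from all"]
    lines += [f"Deny from {ip}" for ip in unique]
    return "\n".join(lines) + "\n"


def write_ban_file(path: str, banned_ips: Iterable[str]) -> None:
    """Atomically replace the ban file at `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(build_ban_rules(banned_ips))
    # mkstemp creates the file as 0600; make it readable by the web server.
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, path)
```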

At this point, you should realize that you will need to compile a list of trusted domain names. You should also thoroughly test your script so that you know it won't erroneously block a trusted domain.

I'm going to leave the logistics of the unban up to you. I present my banned users with a captcha challenge to see if they are really human. I also automatically unban IP addresses after a set time frame. Please read the additional considerations below.
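For the automatic unban, storing when each IP was banned is enough. A minimal in-memory Python sketch, assuming a ban of roughly one month (the figure given in the considerations below) and a hypothetical `BanList` class; the captcha itself is out of scope here, and `unban()` is what you would call once a visitor passes it. Timestamps can be passed in explicitly, which makes the expiry logic easy to test.

```python
import time
from typing import Dict, List, Optional

ONE_MONTH = 30 * 24 * 60 * 60  # seconds; "about a month" is a judgment call


class BanList:
    """Tracks banned IPs and lets bans lapse after a fixed period."""

    def __init__(self, ban_seconds: float = ONE_MONTH) -> None:
        self.ban_seconds = ban_seconds
        self._banned_at: Dict[str, float] = {}

    @staticmethod
    def _now(now: Optional[float]) -> float:
        return time.time() if now is None else now

    def ban(self, ip: str, now: Optional[float] = None) -> None:
        self._banned_at[ip] = self._now(now)

    def unban(self, ip: str) -> None:
        """Lift a ban early, e.g. after the visitor solves a captcha."""
        self._banned_at.pop(ip, None)

    def is_banned(self, ip: str, now: Optional[float] = None) -> bool:
        banned_at = self._banned_at.get(ip)
        if banned_at is None:
            return False
        if self._now(now) - banned_at >= self.ban_seconds:
            del self._banned_at[ip]  # expired: forget the ban
            return False
        return True

    def active(self, now: Optional[float] = None) -> List[str]:
        """IPs still banned; this is the list to write into the ban file."""
        current = self._now(now)
        return [ip for ip in list(self._banned_at) if self.is_banned(ip, current)]
```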

I'm not going to talk about the script too much other than how it operates. Many people will use different platforms and programming languages. The system described above could be coded for almost any platform, in almost any language, so I won't talk about the actual code. You should, however, consider the following.

Additional Considerations:
I'm not going to talk too much about these, but you should be aware of the following if you choose to use the system described above:

  1. IMPORTANT: You simply must have some kind of feature on the script that will let a user unban themselves. Inevitably, a minute percentage of false positives will occur, and you don't want that visitor to be turned away forever. The way to accomplish this is by having some kind of a challenge response if the user's IP does get banned. I use a custom Captcha system which allows the user to unban their IP. Human users can perform the unban and continue surfing whereas the automated bad bots cannot.
  2. Some users will visit your site through multi-IP proxies (AOL is a good example), so you need to account for hits from multiple IPs, otherwise your ban program may get confused.
  3. Browser pre-fetch mechanisms are likely to give a false positive on this system, since they will follow the blank URL, so looking for the pre-fetch header is something you might want to do. However, be aware that this header could also be sent by a bad bot, so don't trust it too much. It's a little more trustworthy than the User-Agent field.
  4. There can be freak occurrences where a hit is made to the php script by a legitimate visitor. So, you may want to think about running multiple instances of the system in parallel, wherein a ban only occurs if hits are made to all of them consecutively. If you do choose to work on a system like this, one thing to look at might be the number of seconds between the hits, because it's only automated systems that will make multiple hits in very short time-frames (1 or 2 seconds).
  5. Spam bots change IP addresses every now and then because they do eventually get blacklisted. When that happens, the old IP address gets reassigned to other web surfers that are not necessarily bad, so only block IP addresses for a maximum amount of time (I go for 1 month).
  6. It is also possible to perform an "IP Whois" on an IP address to find the NetBlock owner, which would give you some extra details, in addition to a simple reverse DNS lookup. This could be helpful for making a ban decision.
  7. Sometimes search engines have IPs that resolve to domain names other than what would seem intuitive. You need to be aware of these so that they don't get banned. These are usually found through trial and error.
  8. You may also wish to consider adding the entire C-Class IP block to the ban list, since it is possible that the bad bot will use multiple IPs from within the same range, and you certainly want to prevent those from grabbing your content too.
  9. You will need to use a database to keep track of all the bans, IPs, unbans, times, requested and referring URLs, etc. I use MySQL with about 15 fields (yes, there are that many parameters to track, when you think about it!).
  10. You must keep the script under close observation during its test run. You will need to be "tailing" and "grepping" log files to keep an eye on your script and what it is doing.
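Two of these considerations lend themselves to small helpers. The Python sketch assumes you record, per IP, which trap instance was hit and when; `hits_look_automated` then covers item 4 (ban only if every trap was hit and consecutive hits were no more than a couple of seconds apart), and `class_c_block` turns an IP into the /24 network that "C-Class" corresponds to in CIDR terms, for item 8. The names and the 2-second default are illustrative assumptions.

```python
import ipaddress
from typing import Dict, Iterable, Set, Tuple


def hits_look_automated(
    hits: Iterable[Tuple[str, float]],
    trap_ids: Set[str],
    max_gap_seconds: float = 2.0,
) -> bool:
    """True if one IP hit every trap, with small gaps between consecutive hits.

    `hits` is a sequence of (trap_id, unix_time) pairs for a single IP.
    """
    latest: Dict[str, float] = {}
    for trap_id, when in hits:
        if trap_id in trap_ids:
            latest[trap_id] = max(when, latest.get(trap_id, when))
    if set(latest) != set(trap_ids):
        return False  # at least one trap was never touched
    times = sorted(latest.values())
    return all(later - earlier <= max_gap_seconds
               for earlier, later in zip(times, times[1:]))


def class_c_block(ip: str) -> str:
    """Return the /24 ("C-Class") network containing an IPv4 address."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))
```

Apache's `Deny from` accepts CIDR notation, so the /24 string can go straight into the ban file; just remember that a whole block will also catch innocent neighbours, which makes the unban path even more important.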

This all sounds like a relatively easy process, and for the most part it is, but it is very, very tedious. You simply must try to think outside the box when you are implementing the logistics of this program. I haven't quite told you everything, but you do now have enough information to detect and block bad bots.

I wanted to make the script that I use available to the public (for free, of course). The unfortunate reality is that it would take too much effort to tidy up the code and write documentation, so I'm going to have to refrain from doing that for now, unless I get enough requests for it.

I hope that this sheds some light on how to block bad bots, because you really need to protect your content from thieves and email harvesters alike! Your thoughts and feedback are welcome.

Googlebot Does Look at CSS and Javascript Include Files!

July 18 2007, 2:19pm in Google
0 Comments

by Darrin J. Ward:

Although this is not a new topic, we still see a lot of people attempting to trick search engine robots by using JavaScript "include" files to perform nasty redirects, or by setting a particular element to display:none or visibility:hidden (via JS or CSS), thus making it invisible to users but visible to search engine robots. Similarly, we see a lot of people that use CSS to set the H (H1, H2, etc.) family of elements to a much smaller font size than that of their natural appearance.

The fundamental premise of such implementations is normally that the search engines do not actively look at these include files, thus the "tricks" will remain uncovered. Such an assumption would be incorrect.

It's not exactly "new news" that search engine crawlers do indeed look at these files. In fact, I distinctly remember posting about Google's crawler making hits on .js (JavaScript) and .css (style sheet) files literally years ago. To prove it, here are some hits taken from the raw access log files for this very blog (which has only been on this new domain for a few days):

66.249.72.20 - - [16/Jul/2007:18:27:44 -0400] "GET /inc/js/share-this.js HTTP/1.1" 200 1178 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.72.20 - - [16/Jul/2007:18:28:40 -0400] "GET /inc/js/prototype.js HTTP/1.1" 200 14471 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.72.20 - - [15/Jul/2007:19:15:47 -0400] "GET /inc/css/styles.css HTTP/1.1" 200 2180 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.72.20 - - [17/Jul/2007:22:22:24 -0400] "GET /inc/css/styles.css HTTP/1.1" 200 2264 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

How do I know that these are actually from Google and not a fake? Simple. I take the IP 66.249.72.20 over to the ARIN IP Whois tool and see who owns that IP. OK, so how did I actually get to see these entries? Well, even though SpyderTrax is a great tool for checking on robot activity at the page level, it doesn't show details on hits to these .css and .js files. So, I logged into my server via SSH and ran the following command on my access log:

grep 'Googlebot' FILENAME | grep -E '\.(js|css)'

This command shows me all of the hits that contain "Googlebot" along with either ".js" or ".css". If I only wanted to see one or the other, then I would use only '\.js' or '\.css' for that last part (the backslash escapes the dot, which would otherwise match any character).
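If you would rather do the same filtering in a script (for example, to feed the results into other reports), here is a minimal Python sketch. It assumes Apache's "combined" log format, which is what the log lines in this post are in; the regular expression and function name are illustrative, not part of any standard tool. Like the grep, it only checks what the User-Agent claims, so the IP still needs verifying as described above.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
COMBINED_LOG = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_asset_hits(
    lines: Iterable[str], extensions: Tuple[str, ...] = (".js", ".css")
) -> Iterator[Dict[str, str]]:
    """Yield parsed log entries where the agent claims Googlebot and a JS/CSS file was fetched."""
    for line in lines:
        match = COMBINED_LOG.match(line)
        if match is None:
            continue  # not in combined format; skip rather than guess
        path = match.group("path").split("?", 1)[0]  # ignore any query string
        if "Googlebot" in match.group("agent") and path.endswith(extensions):
            yield match.groupdict()
```

Usage might look like `with open("access.log") as log: for hit in googlebot_asset_hits(log): print(hit["time"], hit["ip"], hit["path"])`, where `access.log` is a placeholder for your own log file.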

So, in writing this, I was wondering when all of this activity actually started. I know it's been going on for years. Lucky for me, I'm a fanatical Analytics fan (not Google Analytics), and I know the value of being able to retroactively look at Key Performance Indicators - So I have log files dating back to the start of 2003. And, not just on one site - but on enough sites to actually take a peep and see when Google's activity might have started. So I did.

64.68.89.138 - - [23/Mar/2004:22:49:08 -0500] "GET /includes/js/nav/menu_com.js HTTP/1.1" 200 21960 "-" "Googlebot/Test"
64.68.89.138 - - [24/Mar/2004:01:05:20 -0500] "GET /includes/js/functions.js HTTP/1.1" 200 1085 "-" "Googlebot/Test"
64.68.89.167 - - [25/Mar/2004:08:32:50 -0500] "GET /includes/js/nav/exmplmenu_var.js HTTP/1.1" 200 3406 "-" "Googlebot/Test"
64.68.89.182 - - [25/Mar/2004:21:31:06 -0500] "GET /includes/js/nav/menu_com.js HTTP/1.1" 200 21960 "-" "Googlebot/Test"
64.68.89.182 - - [26/Mar/2004:04:09:04 -0500] "GET /includes/js/functions.js HTTP/1.1" 200 1085 "-" "Googlebot/Test"

These are the first hits that I have tracked from Googlebot. Admittedly, I only looked at 2004, because I started with that year and the log files were so large they took forever to process. Obviously, you can see that they were using the "Googlebot/Test" User-agent then. But an IP Whois confirms that it's Google's IP block. So - it would appear as though there was a three day test or so going on at that point. One week before my birthday.

I had intentions of stripping out all valid hits from Googlebot over the last few years and plotting a graph to show activity levels, but there's work to be done and I'm not sure something like that would have all that much value, even if it is super-interesting.

So what is the take-away from all of this nonsense? Simple. Take a look through your JavaScript and CSS files to make sure that they validate and that there are no functions that might accidentally perform redirects in what might be considered a sneaky way. I'm not necessarily worried about re-styling H1 or H2 tags with CSS; I do that myself. However, I wouldn't ever have them 100 pixels off screen or invisibly small, because that's obviously very easy to detect. Of course, only the big G knows what they do with those files for sure!

© Copyright Darrin Ward / De Ward Group, LLC. - All Rights Reserved | Website Design by LinkShape