Google Chaos

December 13th, 2008

What happens when your website falls out of Google’s index?  Most people react with panic.  But after seven (7) years of reading forum threads whose contributors have suffered similar fates, we know panic is the last thing you should do.

From the outside looking in, Google seems like a well-behaved giant.  Users rarely see Google errors.  And Google is not in the habit of issuing press releases when it does have a technology issue.  However, those who follow Google closely know better.

Google has a history of technology goofs; many of Google’s updates don’t go as planned.  And for a few unlucky souls whose livelihoods are tied to the ‘giant’, the world comes crashing down when Google goofs.  When Google Chaos happens:

  • Websites, for no apparent reason, lose significant rankings.
  • Pages appear to be de-indexed.
  • And no matter what the Webmaster does to try to reverse the condition, in the short term, Google chaos persists.

Self Inflicted Wounds

To be fair, not all of these circumstances are Google’s fault.  Sometimes, Webmasters inadvertently do something that creates problems for Google.  Here is a short list that covers some fatal moves:

  1. A Webmaster decides to create a new home page and changes the URL to www.domain.com/home.  Furthermore, the Webmaster uses a 302 (temporary) server re-direct from www.domain.com to www.domain.com/home.  And, finally, the Webmaster strips the content off the old /index.html page.  All works perfectly in a browser, but Google sees something entirely different: Google no longer sees content on www.domain.com and does not necessarily follow a 302 server re-direct.  As a consequence, /home never inherits the SEO value associated with www.domain.com.
  2. A Webmaster accidentally makes an adjustment to the robots.txt file that disallows a primary directory.  Google still knows about the pages but stops ranking all pages in that directory.
  3. A Webmaster makes an adjustment and adds a new, slick-looking JavaScript menu.  Google does not typically read JavaScript and no longer follows the links in the menu.  As a consequence, ranked pages disappear from Google’s index.
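
The robots.txt mistake in item 2 is easy to catch before a crawler sees it. The following is a minimal sketch, not a definitive tool: it uses Python's standard urllib.robotparser module, and the robots.txt content, the list of key paths, and the helper name blocked_paths are all hypothetical values chosen for illustration.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the paths that robots_txt disallows for the given user agent."""
    parser = RobotFileParser()
    # parse() accepts the file content as a list of lines, so no network
    # request is needed for this check.
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


# Hypothetical robots.txt with an accidental Disallow on a primary directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /products/
"""

# Hypothetical list of pages whose rankings matter to the site.
KEY_PATHS = ["/", "/products/widget.html", "/about.html"]

if __name__ == "__main__":
    for path in blocked_paths(ROBOTS_TXT, KEY_PATHS):
        print("Blocked for Googlebot:", path)
```

Running a check like this against the live robots.txt after every site change would flag a disallowed primary directory the same day it is introduced.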

Google Volatility

In other cases, Webmasters report wide swings in rankings.  This symptom is not unusual.  In May 2008, Matt Cutts, an engineer on the Google Spam team, went on record saying that the Search Engine Titan was conducting a major experiment, code named “Dewey.”  Webmasters in the SEO community have observed that pages with little or no PageRank, which have never shown up in the top 100 SERPs (Search Engine Result Pages), are now displacing other sites that had ‘page one’ rankings for more than five (5) years. Others observe that site rankings fluctuate +/- 30 positions at different times of the day, and some have reported ranking fluctuation of more than 50 positions within the same day.

Position Research has been observing these conditions for many months. We speculate that Google is performing live testing similar to the tests described in a recent white paper titled “Search Engines that Learn from Implicit Feedback.”  The premise of the paper is that search engines can determine website relevance by analyzing what listings users do not click.  This testing requires that Google bring websites that may not otherwise deserve high ranking into a top position for a short period of time in order to record user behavior.

Data Loss

Many times, something happens within the Google system and data gets lost.  You might think: “How could data get lost?”  The answer has to do with understanding Google’s spidering and reporting network. Based on latest reports, Google maintains over 200,000 spidering servers.  These servers are constantly crawling website pages.  When you consider that over 100,000 new website pages are added daily, and that the current number of website pages is estimated at over 80 BILLION, you begin to understand the enormity of the task.  Other servers are consolidating and synchronizing this information so that it can be compiled into ranking data.

There is another set of servers dedicated to serving search results to the public.  These servers are clustered in datacenters spread throughout the world.  At last count, Google has over 40 datacenters with more than 750 IP addresses, each comprising several servers.  Check out http://www.seocritique.com/datacentertool/ for a humbling view of Google datacenters.

Now we all know that (data) electrons are obedient most of the time, but not all the time.  Hard drives crash.  Data packets get lost during transit from one location to another.  And sometimes hardware fails during critical transmissions.  You can begin to understand how enormous Google’s task is and how easy it might be to lose data during all the consolidation / synchronization / compiling steps involved.

So what happens when data is lost?  That depends on what data is lost.  If the lost data is compiled data, then Google simply recompiles.  But if the data is original data, then Google must re-gather and then re-compile.  This can take time - weeks if not months.

Filter Traps

Google filters are another story.  In part, Google compiles page data to determine page attributes - and Google collects over 300 unique attributes for each page.  If Google determines that there is a combination of negative attributes sufficient to merit a ranking adjustment, then rankings decline.  But these attributes are only reasonably predictive when combined with other attributes.

Google filters are based on statistics - and in statistics, the larger the sample, the more reliable the correlation.  In a perfect world, Google would have all the page attributes it needs and unlimited computing power to reach very high correlation coefficients.  Under these circumstances, Google would be able to tell the bad websites from the good with 100% accuracy.  But in reality, Google doesn’t have enough attributes or computing power.  Therefore, its filters are less than perfect.  In other words, Google presumes that if ‘it’ walks like a duck, quacks like a duck, and smells like a duck, ‘it’ is probably a duck, but not certainly.  ‘It’ may be a goose.  So to some degree, Google’s filters are throwing some ‘baby’ out with the ‘bathwater’.

Fireworks really start flying when Google introduces a new filter in an effort to improve rankings.  Invariably, some website pages are collateral damage.  It gets more interesting when Google starts turning the dials on these filters.  Website pages pop in and out like popcorn.  The Webmaster’s hope is that Google engineers optimize their filter algorithms and minimize collateral damage.  But there are always some pages that get the ‘short end of the stick’.

Minimize Google Chaos

So what can you do to avoid Google Chaos?  First, recognize that Google offers rankings for free and as such is not obligated to be bug free.  Second, realize Google ‘love’ goes to those whom Google chooses (through its complex algorithms).  And third, benchmark, benchmark, benchmark.

Google bugs and Google ‘love’ are things you cannot control.  But benchmarking is something you can control, because when you record and log metrics (i.e. critical observations) you can better determine what course of action you should take.

Here is a list of metrics that you should record:

  • Keyword rankings - Track Google rankings on a daily basis - weekly is not good enough because you need to know if a poor ranking condition is temporary or permanent. You also need to know the exact date when rankings declined so that you can compare your date with that of other Webmasters who may have experienced a similar condition on the same date.
  • Google ‘cache’ query - Make sure Google is caching your pages and check cache dates.
  • Google ‘info:’ query - Make sure Google is reporting ‘info:’ query results. If your page does not show results for an ‘info:’ query, something is wrong.
  • Google ‘URL’ query - Make sure Google is reporting results when a page URL is entered. Your page URL should be at or near the top of the results.
  • Google Webmaster account - Record changes to and observations reported by Google.
  • Record and log all website navigation and infrastructure changes. This can include changes to robots.txt and sitemap.xml files, your Google Webmaster account, and server changes.

Unexpected Google results for any of the Google queries may be a Google glitch or may be indicative of something more serious.  The query checks may be performed on a weekly basis, or daily if you notice something unusual.  A Google Webmaster account should be checked on a monthly basis, or more frequently if any other metric is concerning.
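
A daily ranking log only helps if the date of a decline can be pulled out of it quickly. The sketch below shows one way this might be done with Python's standard csv module. The column layout (date, keyword, position), the 10-position threshold, the sample values, and the function names log_ranking and first_decline are all assumptions made for illustration; how the rankings themselves are collected is outside its scope.

```python
import csv
import io
from datetime import date


def log_ranking(writer, day, keyword, position):
    """Append one observation (ISO date, keyword, position) to the log."""
    writer.writerow([day.isoformat(), keyword, position])


def first_decline(rows, keyword, threshold=10):
    """Return the first logged date on which the keyword's position worsened
    by at least `threshold` places compared with the previous logged entry,
    or None if no such drop was recorded."""
    previous = None
    for day, logged_keyword, position in rows:
        if logged_keyword != keyword:
            continue
        position = int(position)
        if previous is not None and position - previous >= threshold:
            return day
        previous = position
    return None


if __name__ == "__main__":
    # In-memory stand-in for a log file; the values are made up.
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    log_ranking(writer, date(2008, 10, 29), "example keyword", 4)
    log_ranking(writer, date(2008, 10, 30), "example keyword", 5)
    log_ranking(writer, date(2008, 10, 31), "example keyword", 38)

    rows = list(csv.reader(io.StringIO(buffer.getvalue())))
    print("First decline:", first_decline(rows, "example keyword"))
```

The date returned is the one to compare against forum reports from other Webmasters. Writing to a real file instead of the in-memory buffer only requires passing an open file object to csv.writer.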

Making Sense of Google Chaos

If Google Chaos strikes your website and you have benchmarked Google metrics, you will be armed with the kind of information necessary to determine your next course of action. 

If your site loses rankings, the first thing to determine is whether your Google metrics have changed and if your experience correlates with that of other webmasters.  Check the forums to see if something unusual is happening.  If your observations seem to be isolated, then the problem is likely to be self-inflicted.  Check your logs and start reverting to known stable conditions.  Then allow Google to react to these changes, which may take weeks or months depending on what changes were made.  As a rule of thumb, the time Google takes to react is the time Google will need to react again.  Observing your Google metrics will help determine whether your actions are making a real difference.

If your experience is not isolated and others are reporting the same conditions, it is probably a Google error.  Don’t panic.  Most of the time, Google corrects its mistakes within 1-2 weeks. 

But if the reason for Google’s reaction is based on a new filter, hang on to your hat.  It may take much more time for Google to sort things out.  And even when it does, your site may be part of an ‘elite’ few that is considered acceptable collateral damage.

How can you tell if your site is part of collateral damage?  This is pretty tough - it is a process of elimination.  First, wait and make sure that a Google bug has not caused your situation.  The forums can help determine this condition.  Second, make sure Google filters are optimized and stable.  Again, forum activity will help determine this condition.  Only after both of these explanations have been ruled out should a more radical approach be considered.

If the forums are quiet and your site is still in Google Chaos, start an extensive research effort that considers any and all page attributes.  Check outbound links.  Check inbound links.  Check duplicate and near-duplicate content.  Check everything you can think of and start making site adjustments.  It just may be something really subtle that needs to be changed so that Google’s filters think your website is a ‘goose’ and not a ‘duck’.

Google Still Experimenting with Customized Search

November 3rd, 2008

Last week, Position Research discovered some more Google customized search conditions. Searching for the term “ira custodian” in Google.com produced a SERP (Search Engine Results Page) “Customized for San Diego metro area”.

Google geocentric customized results

Clicking on More details produced this page:

Google's description of geocentric customized results

A series of experiments revealed more observations.

  1. No matter how many times the search button (or return key) was pressed, the geocentric customized results stayed the same. There did not appear to be a way to attain ‘normal’ results.
  2. When cookies were turned off, ‘normal’ SERPs were shown.
  3. When Google.com/ie was used, ‘normal’ SERPs were shown.
  4. The keyword had low popularity but other keywords with similar popularity did not produce geocentric customized SERPs. So the reason for this condition does not seem to be associated with keyword popularity.
  5. Searching for keyword variations produced these results:
    Table showing different keywords and Google SERP type

  6. A view of Google maps for “ira custodian” does produce results in San Diego. A search for “self directed ira custodian” produced similar results, and the listing was highlighted as if it were a sponsored link - but it wasn’t.

Chasing this last condition a little further reveals that Google is reporting a San Diego based company with a snippet of text (shaded content below) from an article located on another website page with a link to the San Diego based company. Apparently the snippet of text comes from a referring website page.

Local listing with snippet of text from referring page

The study does suggest that Google is trying to produce geocentric customized results for some keywords. I am only speculating, but perhaps Google is attempting to produce geocentric customized results when a website page matches four filters:

  1. The keyword phrase has some element of geocentricity.  In other words, searchers using a particular term would be looking for local companies.
  2. There are website pages that would appear near or on page 1 of normal SERP results, and
  3. Some of these pages/websites are listed in Google maps for that keyword phrase, and
  4. These pages/websites are from the same geocentric area as the searcher.

Terms like “custodian” or “plumber” or “electrician” would be terms that are often used as part of a keyword phrase that implies geocentricity.  If Google presumes users of these terms are more interested in geocentric results, then Google could customize the SERPs when appropriate.  Google could blend the results from its Local business results with normal SERPs.

So, as an experiment, consider the term “plumbing”.  I searched for “plumbing san diego” as my computer is located near San Diego.  Google’s SERP included Local business results.  I found one that had a title that included the term “central”.  Now, combining the terms to create a keyword phrase of “central plumbing” should produce Google geocentric custom results for my computer.  My hypothesis was correct (see below).  Note, your results may be different as it is likely that you are located in a different geocentric area.

Google SERPs produce geocentric customized results

And, as predicted, the keyword phrase “central plumbing maryland” produces a normal result because the term “maryland” conflicts with my local IP address and cancels Google’s geocentric assumption. 

I concede that not all terms seem to follow this pattern. There may be more to this story. Further research is required to determine which keywords/terms may produce these kinds of results.

Google Halloween Goof!

November 3rd, 2008

Starting October 31, Google pushed new data to their ranking index.  It did not take long for the Webmaster community to take notice.  Almost immediately, many webmasters noticed a dramatic change in their rankings.  Their Google home page rankings vanished.  Rankings for interior pages were not affected; just home pages.

In a rather uncharacteristic move, Matt Cutts of Google made a contribution to the webmasterworld.com thread dedicated to these observations (http://www.webmasterworld.com/google/3777991.htm).  On November 1, 2008 (Saturday), Matt Cutts made this forum post:

“My concern is this one could also not be understood quickly and therefore last longer than it should before they fix it.”

“I think this was a short-term issue and things should be back to normal pretty soon (if not already).”

A few hours later, Matt Cutts again posted:

“b2net, I don’t consider those rankings indicative of anything coming in the future. Some data went into the index without all of our quality signals incorporated, and it should be mostly back to normal and continuing to get back to normal over the course of the day.”

As of Monday morning, many rankings that were filtered out on Friday are restored.  But the story is not over.  The forums are alive with speculation that Google is doing something and this ‘goof’ was just part of the story. 

Historically, Google makes very large changes 2 times a year: one in the spring/summer and another in the fall/winter.  The fall/winter updates usually coincide with an October/November time frame.  So it is not surprising that speculation continues that this Google ‘goof’ is just part of a bigger change that Google is just beginning to roll out.

UPDATE Dec. 1, 2008

After more than 30 days of observation, Google’s goof persists.  Forums continue to report anomalous conditions.  Some report missing home pages that had previously ranked well.  Now these pages are not showing up in a simple Google URL query.  And interior pages that had never ranked well before appear to be taking the place of home page rankings (just with lower rankings).

Others report that sitemap.xml pages are not getting indexed.  It seems that Google is deliberately excluding these pages.

Our study shows that many pages are responding properly to Google ‘info:’ and ‘cache:’ queries but not to ‘URL’ queries.  In the past, we have seen this condition when:

  1. Google is re-building portions of its index and the ‘cache’ and ‘info’ results show up prior to Google rankings, or
  2. Google is making new pages available for ranking.

Based on our observations and those of others, we believe Google is still rebuilding its index with data lost during the October 31, 2008 goof.

Google Uses Customized Search

September 29th, 2008

Starting about August 8th, Google.com started to incorporate custom results in their standard query Search Engine Results Pages (SERPs). Google provides a small notice in the upper right corner of the screen to notify users of this condition.

This is a shift for Google. Prior to this point, Google would only provide “Customized” results when users were signed in. But now, Google is customizing some results for the public.

Clicking on the “More details” link reveals this page.

Some simple testing reveals that Google is tracking browser sessions with cookies. Disabling cookies on a browser prevents Google from delivering “customized” results. We also noticed that Google started a new cookie session when the browser was launched again. And clicking the search button after “customized” results are shown returns the results to normal (“un-customized”) rankings.

We tested a few keywords to see how much change was associated with Google’s “customization”. Our tests show that Google changed rankings no more than 2 positions on page one. In one case, an indented listing was removed.

So what does this mean for Search Engine Optimization (SEO)? In a word, nothing! Although it is true that there will be some ranking fluctuation depending on whether Google shows “customized” rankings, the level of fluctuation is well within the normal “noise” levels associated with Google results.

Position Research will continue to log daily rankings based on Google.com non-customized results.

Keyword value in URL MYTH!

August 8th, 2008
Does Google Award Value for Keywords in a URL? The short answer is, technically: MAYBE. Practically: NO!

Many SEO companies preach the importance of placing keywords in a URL or domain.  And domains which contain keywords are a highly sought-after prize.  There is no shortage of SEO commentaries on the subject – all professing the value of keywords in a domain or URL.  But in reality, the story is quite different.  Here is the proof.

What does Google Say:

Matt Cutts of Google is reported to have commented on the subject here: http://www.seo.com/blog/google/matt-cutts-does-domain-roundtable/

He said keywords in the domain carry weight with users, and for this reason, Google also gives some weight to a keyword in the URL and/or domain name.

Based on this comment and the overwhelming support for this theory, one would suspect that the case is closed.  Not so fast...

A few simple experiments will reveal the reality of this theory.  Try this query in Google: “4192594” (without the quotation marks).  I know it is not much of a search word, but it does illustrate the point.  Now, among the nearly 300 reported results, find a listing that only includes the term 4192594 in the URL.  Then click to see Google’s Cached page and observe Google’s comment:

These terms only appear in links pointing to this page: 4192594

Does this mean that Google is ranking the page because the term was found in the URL?  NO.  According to Google, the page received rankings because the term was found in the anchor text.  Here is the quote from Google’s page (emphasis added): http://www.google.com/support/bin/answer.py?hl=en&answer=427.

Sometimes Google includes pages in your search results that don’t contain the word or phrase you searched for. This can occur even when you perform a phrase search. In evaluating the merit and relevance of a page, Google looks not only at the content of the page itself, but also at the anchor text of links that point to the page. If links pointing to the page contain the phrase you searched for, Google may return the page as a match for your query. When this occurs, our cached copy of the page displays the message “These terms only appear in links pointing to this page:”

In every case I checked, Google’s cached page had this line, indicating that the only reason the site was reported in the SERPs was that there were backlinks with the term in the anchor text.  Every page listed either:
  1. Had the term somewhere in the body or title, or
  2. Carried the qualification statement “These terms only appear in links pointing to this page: [term]”.

If Google were awarding value based solely on the presence of a term in a URL or domain, there would be at least one ranked page that met neither condition.  After examining many listings for different terms, I can draw only one conclusion: in and of itself, placing keyword terms in domains or URLs has no ranking value. Google may record them, and it may be awarding some value, but it does not appear Google is awarding ranking value.

Indirect Value:

But can so many SEOs be wrong?  Can Matt Cutts be wrong?  No – they are not wrong – they just are not telling the whole story.  A domain or URL that includes the keyword does have value – just not for the reasons most think.

Depending on the circumstances, some webmasters use the target href as the anchor text.  And if the target href happens to have the keyword in it, Google will see the keyword in the anchor text and count it as link reputation.  That is exactly what Google is reporting in the Cached pages for our “4192594” query.  Somewhere, some Webmaster decided to copy the href and use it as their anchor text.
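
The href-as-anchor-text effect can be seen by pulling the anchor text out of a referring page. The following is a rough sketch using Python's standard html.parser module. The sample HTML, the example.com URLs, and the names AnchorCollector and anchors_containing are hypothetical; this illustrates only the mechanism described above, not how Google itself processes links.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (href, anchor text) pairs from the <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an open <a href="..."> tag.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def anchors_containing(html, term):
    """Return the (href, text) pairs whose anchor text contains the term."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return [(href, text) for href, text in collector.anchors if term in text]


# Hypothetical referring page: the first link reuses the URL as its anchor
# text, the second links to the same URL with generic anchor text.
SAMPLE_HTML = (
    '<p><a href="http://www.example.com/item/4192594">'
    "http://www.example.com/item/4192594</a></p>"
    '<p><a href="http://www.example.com/item/4192594">click here</a></p>'
)

if __name__ == "__main__":
    for href, text in anchors_containing(SAMPLE_HTML, "4192594"):
        print("Term found in anchor text:", text)
```

Both links point to the same URL, but only the first carries the term in its anchor text, so only that link would contribute the term as link reputation in the sense described above.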

So it may be more correct to say that keywords in a URL or domain have indirect value.

Cuil.com - A New Search Engine

July 28th, 2008

A new search engine that promises to shake things up has entered the scene: www.cuil.com (pronounced “cool”).

Cuil has been designed and developed by ex-Google engineers and is backed by $33 million in venture capital. The new search engine boasts an index of 120,000,000,000 (120 billion) web pages and claims to have better relevancy than Google.

After testing the search engine, we found several shortcomings.

  1. The search engine index appears to be aging. The new Position Research website went live more than 6 weeks ago, but Cuil.com continues to index and rank website pages that no longer exist.
  2. The search engine Preferences seem to be rather primitive allowing for just a few user options: Safe Search and Typing Suggestions.
  3. Search results are displayed in either a 2 or 3 column presentation with a thumbnail graphic. The graphic may or may not be representative of the website. It appears that ranking order is across the page, then down.
  4. The length of each listing is about 12 lines of text. This means only 2-3 rows of listings are visible before the vertical scroll bar must be used to view more listings. This page layout may be uncomfortable for some users.
  5. When comparing page one rankings for a few keywords with Google, the results are completely different. There was no overlap at all. Further testing will be required to establish Cuil.com’s relevancy.

On a positive note, Cuil.com does have a nice “Explore by Category” option that appears for keyword phrases cuil.com believes are alternative or sub categories. The option allows users to choose alternative keyword phrases by category.

There is no doubt that cuil.com will continue to improve their index and rankings. With their strong pedigree and funding, we expect to hear more from cuil.com in the future.

Google announces improved indexing of Flash

July 1st, 2008

Google announced on July 1st, 2008 via the Google webmaster blog that it has improved the indexing of Adobe Flash files. This information is very intriguing, as many Webmasters/SEOs knew that Google could read into some Flash movies, but how far it could read remained a mystery. Here is what Ron Adler and Janis Stipins (software engineers on Google’s indexing team) had to say regarding the algorithm update.

“We’ve developed an algorithm that explores Flash files in the same way that a person would, by clicking buttons, entering input, and so on. Our algorithm remembers all of the text that it encounters along the way, and that content is then available to be indexed. We can’t tell you all of the proprietary details, but we can tell you that the algorithm’s effectiveness was improved by utilizing Adobe’s new Searchable SWF library.”

What effects will this have? Ultimately, this will mean that the size of Google’s index will increase. This may also mean that in some market spaces there will be sites in the index that formerly were not indexed. In rare cases, these Flash sites may compete for your market space’s keywords.
We will be watching the effects of this algorithm closely in the upcoming months and will keep our clients abreast of any changes needed to maintain top rankings.