Search engines use a page’s hash value (or “fingerprint”), computed at the time the page is crawled, to determine whether the page is unique. If a search engine crawls another website and finds a page with the same hash value, it concludes that the two pages (URLs) are equivalent. Grouping equivalent URLs and selecting one to represent them is called canonicalization.
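As a rough illustration of the fingerprint idea, the sketch below hashes page bodies and groups URLs that share a hash. The choice of SHA-256, the `example.com` URLs, and the helper names are assumptions made for the example; real search engines do not publish their fingerprinting methods.

```python
import hashlib

def fingerprint(page_body: str) -> str:
    """Return a hex digest of the page body (SHA-256 is an assumption)."""
    return hashlib.sha256(page_body.encode("utf-8")).hexdigest()

def group_by_fingerprint(pages: dict) -> dict:
    """Map each fingerprint to the list of URLs whose bodies produced it."""
    groups = {}
    for url, body in pages.items():
        groups.setdefault(fingerprint(body), []).append(url)
    return groups

# Hypothetical crawl results: URL -> fetched body
crawled = {
    "http://example.com/": "<html><body>Welcome</body></html>",
    "http://www.example.com/": "<html><body>Welcome</body></html>",
    "http://example.com/about": "<html><body>About us</body></html>",
}

# URLs that share a fingerprint are candidates for canonicalization
duplicates = [urls for urls in group_by_fingerprint(crawled).values() if len(urls) > 1]
print(duplicates)
```

Here the www and non-www home pages land in the same group, which is the situation the rest of this article is concerned with.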
Problems occur when search engines record the same page hash value at what they believe are two distinct locations. The locations could be on different IP addresses or on different hosts. In this case, the search engine may conclude that one or both websites have violated its terms of service by publishing duplicate websites.
The canonicalization process is further complicated when the search engine crawls two equivalent pages at separate times. If the page has been modified before the engine’s second crawl, the search engine will see different hash values for what would have been identical pages. This condition can result in a near-duplicate content penalty and a reduction in ranking.
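The timing problem can be seen with an exact hash: a single changed character produces a completely different value. This is only a sketch; SHA-256 and the page bodies are assumptions for the example.

```python
import hashlib

def fingerprint(body: str) -> str:
    """Exact-match fingerprint of a page body (SHA-256 is an assumption)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

# The same page served from two URLs, crawled before and after a small edit
first_crawl = "<html><body>Welcome to our store</body></html>"
second_crawl = "<html><body>Welcome to our store!</body></html>"  # one character added

unchanged = fingerprint(first_crawl) == fingerprint(second_crawl)
print(unchanged)  # False: the two crawls no longer match exactly
```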
Types of Redirects
Because search engines choose only one URL to rank, they must determine which of potentially many equivalent URLs is best. Where the signals are ambiguous, a search engine may not choose well. In worst-case scenarios, search engines may misapply ranking attributes or treat the URLs as spam.
The Hypertext Transfer Protocol (HTTP) permits several types of redirects. Many were designed for general use. But just because a redirect works in a browser does not mean it gives search engines the guidance they need.
The two most common ways to point one domain at another are an alias and a 301 redirect. An alias is defined at the DNS (Domain Name System) level controlled by the host. The 301 is defined at the host level. When an alias domain is entered in a browser’s address bar, the domain shown remains the same. However, if a domain is 301 redirected, the address bar changes to the target domain.
Although both work, the alias relies on the search engine to resolve the canonicalization issues and can result in undesirable conditions, including:
- Split PageRank between the different URLs
- Keyword rankings on different URLs
- Duplicate content penalties
Given sufficient time, search engines typically resolve these issues; unfortunately, damage to the website’s rankings may have already occurred.
301 permanent redirects, on the other hand, provide specific direction to the search engines. This redirect (not to be confused with a 302 temporary redirect) is considered best practice.
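To make the difference between these signals concrete, here is a minimal sketch that labels a response by its status code and `Location` header. The function name `classify` and the wording of the labels are hypothetical; only the status codes themselves (301, 302) come from HTTP.

```python
from http import HTTPStatus
from typing import Optional

def classify(status: int, location: Optional[str] = None) -> str:
    """Describe what a response tells a crawler about the requested URL."""
    if status == HTTPStatus.MOVED_PERMANENTLY and location:
        return f"permanent redirect to {location}"
    if status == HTTPStatus.FOUND and location:
        return f"temporary redirect to {location}"
    if 200 <= status < 300:
        # A DNS alias looks like this: content is served with no redirect signal
        return "served in place"
    return "other"

print(classify(301, "http://www.example.com/"))
print(classify(302, "http://www.example.com/"))
print(classify(200))
```

A request to an alias domain typically falls into the "served in place" case, which is why the search engine is left to work out the relationship on its own.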
www and non-www URLs are a special case. When a domain is provisioned at an internet host, DNS records are created. These DNS records can exist for the www hostname, the non-www hostname, or both. If both www and non-www DNS records are written, one is considered an alias of the other. In this case a 301 host-level redirect should be applied to the alias. For example, a non-www URL will redirect to a www URL. This best-practice method eliminates search engine confusion.
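The non-www to www rule can be sketched as a small function that decides whether a request needs a 301 and, if so, where it should point. The canonical host `www.example.com` and the name `redirect_for` are assumptions for illustration; in practice this rule usually lives in the web server's configuration rather than in application code, and this sketch ignores ports.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"  # hypothetical canonical hostname

def redirect_for(url: str):
    """Return (301, target_url) if the URL is not on the canonical host, else None."""
    parts = urlsplit(url)
    if parts.hostname == CANONICAL_HOST:
        return None  # already canonical: serve the page normally
    # Keep the scheme, path, and query string; swap only the host
    target = urlunsplit(
        (parts.scheme, CANONICAL_HOST, parts.path or "/", parts.query, parts.fragment)
    )
    return 301, target

print(redirect_for("http://example.com/products?id=7"))
print(redirect_for("http://www.example.com/products?id=7"))
```

Preserving the path and query string matters: redirecting every alias URL to the home page would discard the one-to-one mapping between equivalent pages.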
The only way to prevent search engines from “guessing” how to resolve a domain name, and to avoid the risk of website penalties, is to use a 301 redirect. The 301 redirect gives a search engine both information and guidance on how to resolve a domain name to the target location. If the 301 redirect is not implemented, the results may be undesired or potentially negative for a company’s website.