Thursday, May 19, 2011

The Google Panda Guide - Part 1: What It Is, How It Works, Collateral Damage

The Facts

MasterNewMedia is officially among the Panda victims.

MasterNewMedia was hit first by the new Google "Farmer" update, also known as "Panda", on February 24th, and then again, even more strongly, on April 11th, 2011.

MasterNewMedia lost key rankings for all of its most popular and economically relevant articles. These disappeared from the top Google search results and were often replaced by lower-quality results, including sites that had scraped our articles and republished them as their own without any credit, link back or attribution. In other words, Google is now listing some of this copied, low-quality content in place of MasterNewMedia's originals.


The loss of Google SERP rankings for three key guides published on MasterNewMedia: the first bar shows the original ranking before Google Panda, the second bar the ranking for the same content after the Panda update.

So, while the overall traffic loss was not immediately evident, the drop in Google-referred visitors to the MasterNewMedia guides, which provided our core advertising revenue channel, was truly dramatic.

On the AdSense front alone, revenues nosedived first after the February update, with a solid 25% drop, and then again after the April update, by another 30%. Overall, that revenue channel has lost more than 50% of its earnings in the span of two months.
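Two successive percentage drops can be combined in two ways, and the total depends on which baseline the second 30% is measured against. The "more than 50%" figure corresponds to measuring both drops against the pre-Panda baseline; if the April drop is instead 30% of the already-reduced level, the compounded loss is somewhat lower. A quick check of both readings (the function name is just for illustration):

```python
def combined_drop(first: float, second: float, compounded: bool) -> float:
    """Total fractional drop after two successive percentage drops.

    compounded=True:  the second drop applies to the already-reduced level.
    compounded=False: both drops are measured against the original baseline.
    """
    if compounded:
        remaining = (1 - first) * (1 - second)
    else:
        remaining = 1 - first - second
    return 1 - remaining


# Figures from the text: -25% after February, a further -30% after April.
print(f"Against the baseline: {combined_drop(0.25, 0.30, compounded=False):.1%}")
print(f"Compounded:           {combined_drop(0.25, 0.30, compounded=True):.1%}")
```

This prints 55.0% for the baseline reading and 47.5% for the compounded one, so it is worth noting which reading a reported "total drop" uses.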

Though I experienced a strong manual Google penalty in 2007, which brought my revenues to near zero for over two weeks (at the time 90% of my revenue came from AdSense; today, thankfully, it is only 60%), Panda has been by far the worst and longest Google penalization I have ever witnessed.

Here is a very interesting piece of data for web publishers who, like me, have multiple language editions on the same domain (I have the English, Italian and Spanish editions under the .org domain and the Brazilian edition on a separate domain): Panda will NOT distinguish between the different editions, and will cap everything you have under that domain, no matter what the language.

It follows that, if my experience is in some way representative, web sites that are primarily in languages other than English (and for which the Google Panda algorithm has not yet been released) may already be under a Panda penalization, simply because of the English-language portions of their site.

What is Google Panda

Panda is a new Google algorithm designed specifically to target web sites with "shallow or low quality content."

In the original words of Google on February 24th:

"...in the last day or so we launched a pretty big algorithmic improvement to our ranking--a change that noticeably impacts 11.8% of our queries--and we wanted to let people know what's going on.

This update is designed to reduce rankings for low-quality sites--sites which are low-value add for users, copy content from other websites or sites that are just not very useful.

At the same time, it will provide better rankings for high-quality sites--sites with original content and information such as research, in-depth reports, thoughtful analysis and so on.

We can't make a major improvement without affecting rankings for many sites. It has to be that some sites will go up and some will go down. Google depends on the high-quality content created by wonderful websites around the world, and we do have a responsibility to encourage a healthy web ecosystem.

Therefore, it is important for high-quality sites to be rewarded, and that's exactly what this change does."

Source: The Official Google Blog

If a web site is under a Google Panda penalization, it will lose its rankings in search engine result pages, and with them a significant portion of its search-engine-referred traffic.

If a web site hit by Panda relies on advertising, in particular Google AdSense, the negative economic impact may be quite large, since such ads depend heavily on search-engine-referred traffic.

In fact, in some cases you may only notice the impact of Panda through a sudden and very significant drop in AdSense revenues, together with lost rankings for specific keywords.

In the case of MasterNewMedia, for example, you cannot easily detect the Panda hit by simply looking at the traffic trendline, as you can see here.

Both Panda hits are well masked by other traffic. For example, the traffic peak around March 11 is due to the Japanese tsunami, which sent a flood of visitors to my old 2005 tsunami guide (I have since moved all of the tsunami content to a separate domain).

You really need to check the traffic, rankings and revenues of individual articles to see what is actually happening.
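A per-article check like this can be sketched in a few lines, assuming you have already exported Google-referred visits per URL from your analytics tool for two equal-length windows, one before and one after an update date. The `ArticleStats` structure, the `min_before` cut-off and all numbers below are hypothetical, for illustration only:

```python
from dataclasses import dataclass


@dataclass
class ArticleStats:
    url: str
    visits_before: int  # Google-referred visits in a window before the update
    visits_after: int   # same metric, same-length window after the update


def biggest_losers(stats: list[ArticleStats], min_before: int = 100,
                   top: int = 5) -> list[tuple[str, float]]:
    """Rank articles by relative change in Google-referred visits, worst first."""
    changes = []
    for s in stats:
        if s.visits_before < min_before:
            continue  # skip low-traffic pages: their percentages are mostly noise
        change = (s.visits_after - s.visits_before) / s.visits_before
        changes.append((s.url, change))
    changes.sort(key=lambda item: item[1])  # most negative change first
    return changes[:top]


# Hypothetical numbers, for illustration only.
sample = [
    ArticleStats("/guides/example-guide-a", 4000, 1200),
    ArticleStats("/guides/example-guide-b", 2500, 2400),
    ArticleStats("/news/example-short-item", 50, 10),
]

for url, change in biggest_losers(sample):
    print(f"{url}: {change:+.0%}")
```

Here the first guide shows a -70% change and the second -4%, while the low-traffic news item is skipped; sorting by relative change surfaces the individual losers that a sitewide trendline can hide.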

What is Google Panda Really After

In Google's own words, Panda is after "thin", shallow, useless content. But what do these words mean and, in practical terms, what type of content may actually trigger a Panda penalization on your site?

The jury is still out on this one, but there are enough indicators and visible patterns to allow some informed speculation about the type of content that may trigger excessive negative Panda points.

In general, I would expect Google to be after web sites that:

a) provide little valuable content, pushing out tens or hundreds of short blog posts and using heavy SEO tactics to bring in lots of traffic, which they then monetize through AdSense. These are the so-called MFA (Made for AdSense) sites.

b) produce, or have produced, lots of short news items, but are not classified by Google as official news sources.

c) have forgotten, shallow content pages on their sites that provide no value to readers.

d) have tens or hundreds of tag, archive or category pages that have never been curated, maintained or optimized for SEO, and that may carry identical titles and descriptions while, again, providing little or no value to the reader.

e) scrape, copy or replicate content from other sites without authorization or credit, and republish it as their own without adding any attribution or value.
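One symptom from point d), many pages carrying the same title and description, is easy to audit yourself. The sketch below assumes you have already crawled your site and collected each page's title and meta description (the crawling is not shown, and the URLs are hypothetical). It only flags duplicates for manual review; it says nothing about how Panda itself scores pages:

```python
from collections import defaultdict


def find_duplicate_metadata(
    pages: dict[str, tuple[str, str]],
) -> dict[tuple[str, str], list[str]]:
    """Group URLs that share the same (title, meta description) pair.

    Comparison is case-insensitive and ignores surrounding whitespace.
    Only groups with two or more URLs are returned.
    """
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for url, (title, description) in pages.items():
        key = (title.strip().lower(), description.strip().lower())
        groups[key].append(url)
    return {key: sorted(urls) for key, urls in groups.items() if len(urls) > 1}


# Hypothetical crawl results: URL -> (title, meta description).
pages = {
    "/tag/video/": ("Archive | Example Site", "Browse our posts."),
    "/tag/audio/": ("Archive | Example Site", "Browse our posts. "),
    "/guides/video-editing/": ("Video Editing Guide | Example Site", "A step-by-step guide."),
}

for (title, _description), urls in find_duplicate_metadata(pages).items():
    print(f"{title!r} is shared by {urls}")
```

The two tag pages are grouped together despite the trailing space, while the guide, which has its own title, is left out.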

How Does Google Panda Work

Apparently (there is no official way to find out from Google), Google Panda works by assigning "negative" points to any given web site and then applying a penalization "cap" over the whole site once a certain threshold is reached.

This happens even if the pages that triggered the Panda points are only a fraction of the site's total page count.
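To make the reported behavior concrete, here is a toy model of a site-wide cap. It is purely hypothetical: Google has not published how Panda scores sites, and the threshold, the demotion factor and the page counts below are invented for illustration. What it shows is the mechanism described above: once the share of flagged pages crosses a threshold, every page on the domain is demoted, good ones included, and pages in all language editions on that domain count toward the same total.

```python
def site_is_capped(page_flags: list[bool], threshold: float) -> bool:
    """Hypothetical model: the whole site is capped once the share of
    pages flagged as low quality reaches `threshold`."""
    if not page_flags:
        return False
    share = sum(page_flags) / len(page_flags)
    return share >= threshold


def effective_score(page_score: float, site_capped: bool,
                    demotion: float = 0.5) -> float:
    """On a capped site, every page is demoted, good pages included."""
    return page_score * demotion if site_capped else page_score


# 1,000 pages in total (all language editions on the domain count together),
# 150 of them flagged; the 10% threshold is chosen purely for illustration.
flags = [True] * 150 + [False] * 850
capped = site_is_capped(flags, threshold=0.10)
print(capped)                        # True
print(effective_score(0.9, capped))  # 0.45: a strong page is demoted anyway
```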

Read what SEOmoz reported on this:

"...We separated the content that gained the most traffic to compare against the content that had lost the most traffic, comparing signals & looking for trends.

The results seemed random. Very short video descriptions would rank quite well, while long, detailed original transcriptions and guides were suffering.

Every time we thought we'd found an influencing signal, we'd go on to find enough exceptions to negate it.

It became abundantly clear that Panda does not work by filtering out individual low quality content as was originally implied.

It works by punishing entire domain names if an undetermined percentage of the content on that site meets the undefined "low-quality" criteria.

Soon after we came to this realization, Google confirmed it in a statement to Search Engine Land, and an interview with WIRED."

In this light, the first key thing to understand, before you start firing off at Google for penalizing your site, is that the Panda penalty is triggered by a COMBINATION of factors, never by a single factor alone.

The other important thing to understand is that Panda does not operate in real time, at least for now.

So, if you are caught by it, even if you fix everything that needs to be corrected, you will have to wait until Google decides to rerun Panda and update its SERPs accordingly. Such an update has already happened at least twice: when Panda was first released on February 24th, and again on April 11th.

Nobody knows when the next major update is coming, or when Google Panda will hit non-English web sites, but it shouldn't be long before things shake up again.

Collateral Damage

An ironic video made by an independent SEO professional to vent his disappointment with the Google Panda after-effects.

As Google Panda has hit live web sites that didn't fit its criteria, two quite distinct camps have emerged. The first, made up of Google and a few scattered individual voices, claims that Google search results have indeed improved since Panda, with higher-quality sites now at the top. The second, made up largely of web publishers and SEO professionals, has been reporting surprising evidence that the collateral damage caused by Panda may not have been so marginal.

Here are a few reports from this second group:

Despite New Panda Guidelines, Google Still Burying Authoritative Results

Google Panda Algorithm Exploit Uncovered

When Google's Panda Rewards Content Theft

Hijacking Google Rankings Through Content Scraping and Cloaking in 2011

How Google's Panda Update is inadvertently encouraging even more content farms

If you have read these reports, the problem is quite evident. SEOmoz, again, frames it elegantly in a few simple words:

"With this signal hitting an entire site instead of just its individual low quality content, the results fundamentally oppose the stated goal of search quality and fairness in attribution.

The collateral damage results in Google burying the original source of high quality content, promoting those who steal, scrape, and republish above them."

This is why the key to winning this battle is to arm yourself with patience, time to study and analyze, and a deep willingness to question many of your existing web publishing assumptions. That includes accepting that there is a real possibility, and MasterNewMedia seems to be a case in point, that you will be hit by some of this substantial collateral damage.

Overall, the key criticisms publicly made against Google and Panda are:

1) Search results have not improved across the board, and many scrapers and spammy web sites are now outranking original content, creating a lot of user and webmaster mistrust in Google's actions.

2) Quality web sites, providing deep and valuable content, have been penalized as a consequence of Panda's collateral damage. For thousands of web sites, the business consequences have been disastrous.

3) Google's indications and suggestions on the new Panda algorithm have been very vague and difficult to apply, as they no longer refer to specific issues but directly address user-experience factors.

To give you an even better idea of what Google Panda looks like from the trenches, here are seven examples (out of hundreds of similar ones) of the type of feedback that many webmasters are providing inside the Google Webmaster Forum and elsewhere.

1) I am not getting what I want

smartboy - Level 1 - 5/2/11

I love using Google. But after the Google Panda Update 2.0, I'm not getting what I want in the search results and as a customer, I'm not satisfied with all these crap results that are coming up.

I have been patient but my latest experience for a search has irked me so much that I came here actually to share it with others.

I was searching for info about a recently launched RIM's smartphone. When I saw the results, (http://goo.gl/7RkIA), it is clearly that Google has just fetched me 2 pages (Gizmodo and BGR) and all others are copies of these two pages. What does that suppose to mean? I needed different opinions about the phones and all I end up getting is 2 pages of info and 8 pages of crap.

This is not what the customers want. I'm afraid Panda has never worked the way it should. Please check out the page http://goo.gl/7RkIA and this screenshot (http://twitpic.com/4sf9la/full).

2) Genetic resources site disappeared

thomasoniii - Level 1 - 5/3/11

Hi, I'm a contributor to http://www.gramene.org/, a source of plant genomics information. Towards the end of March, googlebot stopped hitting our site at all, so we're quite concerned that we got caught up in the content farm dragnet.

A good portion of our data is created internally, and the rest is enhanced mirroring of other biological data + additional content. We're an NSF funded site, so it's not like having a content farm with advertising would even benefit us in any way. We simply need to ensure that this information is available to the biological research community.

Since 3rd week of March, our google traffic has dropped by more than 50%. Of course, we can't be sure it's due to the algorithm change, but when googlebot stopped hitting us, we had made no recent changes to the site, so we're at a loss to explain it.

3) Spammy web sites outranking original content

thaman - Level 1 - 5/3/11

Spammy websites are outranking sites that have the original content. Spammers are using .tk domains on a mass scale and dominating SERPs. Frankly I haven't seen something like this since 1999. These guys are getting away with cloaking through old school Javascript redirects and Google is just eating it right up. Kind of embarrassing if you ask me.

Here's the specific case study:
References: [1] Hijacking Google Rankings Through Content Scraping and Cloaking in 2011 (Web)

4) Copycat sites steal our content and rank above us

mhafez - Level 1 - 5/4/11

...I run AppAdvice.com and something very strange is happening. First off, we are a quality site who publishes more than 30 original iOS application news and reviews per day. We receive more than 50,000 daily visitors OUTSIDE of search engine traffic.

Starting in April we saw our Google inbound traffic drop 60%. We started researching why, and we found that Google is not indexing any of our pages. The hierarchy still remains, indexes, and category pages, but the actual content is not being indexed.

If you take a unique title from one of our posts. Paste it into Google, you will get about 8 copycat sites that steal our content, and the ORIGINAL appadvice.com article is nowhere to be found.

...It is quite frustrating, as we spend 10's of thousands of dollars to produce this content, and Google is crediting sites who steal it from us and leaving us nowhere in search results.

5) I have no recourse to prevent the theft of my content

war3rd - Level 1 - 5/5/11

You know what really ticks me off about the latest update? My website ranked #1 for a particular keyword for years prior to the April Panda update. High quality content and thousands of legitimate organic backlinks. I don't do anything grey or black hat.

Some jerk violated my copyright, stole my content and posted it exactly as I had written it, and after the Panda update, he's now #1 and I'm #3 for that key phrase. I even filed a complaint with you guys (Google) and you responded that you weren't going to do anything about it. And this copyright infringing page only has a few inbound links, so how it could even get to slot #1 is a mystery to me.

What this tells me that not only is there collateral damage (me), but also that Google really doesn't care about copyright infringement. Presume he's generating the same income for you that I did with that placement, and it's a zero sum game for you.

I don't get the warm fuzzies when I point out that an illegal webpage has jumped the results of the original content that it replaced, which had been the leader for years, and Google said "we're not going to do anything about it." I'm venting, yes, but I feel as though I have no recourse to prevent the theft of my content and subsequent damage to my brand and image.

6) Google can't tell quality from scrapers and no-value aggregators

rt at sea - Level 1 - 5/5/11

In changing its algorithm to penalise low quality sites trying to exploit Google and users it seems that other, high quality, sites have been inadvertently and unfortunately penalised.

I run responsibletravel.com, which has been impacted negatively. We carefully select (screen) and market ethical holidays. The value we add is in editing a collection, screening them against ethical criteria, publishing 1000's of tourists reviews, and adding our own unique content. Like the vast majority of retailers/e-commerce sites we take some product copy from our suppliers sites. We then add our own unique content.

Google does not appear to be able to separate out legitimate, value added retail sites with some content from suppliers sites - like ours - from sites with duplicate content which add little or no value.

Penalising sites like ours directly contravenes the advice Google has always provided via its own webmaster tools...

7) Content thieves outranking original content

Brandon72 - Level 1 - 5/6/11

...Is Google doing anything to preserve the "priority" ranking for the original version of an article or blog post? I have been wrestling with this problem on a daily basis, and it all began with the Panda update. It is a huge problem for me. Here's an experiment I did recently:

Brandon's Plagiarism Experiment

1 -- I wrote a unique article for my website. This was brand-new content that had never existed before.

2 -- A few days later, Google had found and indexed the new article.

3 -- Once a day after the indexing, I would perform a Google search for a statistically improbable phrase from my article. I did this to see if the article had been plagiarized and posted elsewhere online.

4 -- For several days, I found no plagiarized versions of my article. So naturally, my article ranked first for the unique phrase.

5 -- But after about 30 days, I found there were nearly a dozen plagiarized versions of my article. Two of these articles outranked my own content for the unique phrase.

This never used to happen before the Panda update.
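Step 3 of Brandon's experiment relies on picking a "statistically improbable phrase" from the article. One rough way to choose such a phrase automatically is to score every three-word phrase by how rare its words are in some reference text and keep the rarest. This is a sketch under stated assumptions, not how Google or any plagiarism tool works: the toy `reference` corpus below is hypothetical, and a real check would need word counts from a much larger body of text. The chosen phrase would then be searched in quotes, as Brandon did.

```python
import math
import re
from collections import Counter


def improbable_phrases(article: str, reference: Counter, n: int = 3,
                       top: int = 3) -> list[str]:
    """Return the n-word phrases of `article` whose words are rarest in `reference`.

    Each phrase is scored by the summed surprisal (-log probability) of its words,
    using add-one smoothing so unseen words get a finite, high score.
    """
    words = re.findall(r"[a-z']+", article.lower())
    total = sum(reference.values())
    vocab = len(reference) + 1  # +1 slot for "any unseen word"

    def surprisal(word: str) -> float:
        p = (reference[word] + 1) / (total + vocab)
        return -math.log(p)

    scored: dict[str, float] = {}
    for i in range(len(words) - n + 1):
        phrase = " ".join(words[i:i + n])
        scored[phrase] = sum(surprisal(w) for w in words[i:i + n])
    return sorted(scored, key=scored.get, reverse=True)[:top]


# Toy reference corpus; a real check would need counts from a large body of text.
reference = Counter("the results of the search for the page in the index".split())
article = "the results of the search for the quokka migration almanac in the results"
print(improbable_phrases(article, reference, top=1))
```

With this toy corpus the top phrase is "quokka migration almanac", the only stretch made entirely of words the reference has never seen.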

In my opinion, Google should soon be able to fix these issues and correctly attribute original content to its actual author, as it clearly did, and much more effectively, before Panda.

It could be that the very characteristics of the new Panda algorithm somehow prevent Google, at least for the time being, from properly assessing and attributing original content, or that the new ranking and evaluation system needs more passes and time to fine-tune itself.

Whatever the case, you may need to wait.

Here too, the best strategy is to report your own experience and data publicly, on your web site, your blog or in the Google Webmaster Central forums, so as to give Google engineers more detail with which to fine-tune this new beast.

The Dawn of Automatic Algorithmic Penalization

With Panda, and given the impact it has had on web sites so far, Google has in fact inaugurated a new era: the age of automatic, algorithmic penalization.

Given the number of web sites out there and the rate at which new ones are created, it was only to be expected that Google would find a more scalable way to deal with the huge amount of fake and spammy content on the web.

What this entails is that any web site that fails to satisfy Google's new Panda mix IS a candidate for penalization, and it WILL BE penalized. There is no chance of being overlooked.

While Google will continue to rely on quality raters and user reports to apply manual penalizations to web sites that break its guidelines, Panda is designed to comprehensively filter out, through an automatic penalization algorithm, everything that doesn't pass its rapidly evolving, user-experience-focused filtering mix.

Even more significant may be the fact that Google has finally started to reduce the effective weight of some classic SEO factors that could be gamed or manipulated, while giving more weight to a new, data-based class of signals that better represent the actual user experience.

Conclusions - Part 1

My own situation aside, I have come to see Panda as the biggest change Google has ever made to how it ranks web sites in its SERPs.

I am in favor of Google taking a drastically different approach to ranking results, as it acknowledges how much the SERPs have fallen prey to sneaky SEO, link selling and other underground practices.

But, to be honest, given the effort I have made over the years to keep MasterNewMedia a high-quality content destination, I am not sure the results are yet what people really expect. It may be that I have made my fair share of mistakes in the past and overlooked variables that have affected the impression my site gives to search engine robots. But, however impartial I try to be as the publisher of this site, I find it hard to accept that MasterNewMedia belongs in the same category as the "thin" content sites that Panda was really meant to target.

An overzealous Panda? An algorithm still in need of refinement? Or mistakes on my end that, despite my focus on good, quality content, triggered Google's sensitive radar anyway?

I look forward to finding out very soon.

In the next part of this series on Google Panda, I will share with you the ideal mental and psychological attitude with which to confront Panda.

From what I have learned so far, in fact, you may be in for a lot of surprises if you approach Panda just like any past Google algorithm change. I may be wrong, but this looks to me like a completely different ball game.

What counts now is what your readers think of your site, as expressed through their actions online. To win on this front, you need to stop thinking in terms of on-page or off-page factors and start reasoning in terms of what makes your users happier, more satisfied and eager to come back for more.

This is in essence what Panda is all about.

Stop thinking about SEO and start thinking about your readers. Period.
