April 29, 2005
Future PageRank Helps Reputation And Trustworthiness Shine Over Artificially Inflated Search Results: Google TrustRank
In March 2005, Google registered the trademark 'TrustRank' with the U.S. Patent and Trademark Office (USPTO). What might this tell us about Google's forthcoming initiatives and how might this trademark's application, and its potential functionality, fit alongside the existing Google 'PageRank' feature?

Photo credit: Diego Sapriza
PageRank (PR) is at the very core of the Google search engine. It is a system of Web site measurement that Web publishers are typically obsessed with, in particular how high their own site's PR is. In very simple terms, PR evaluates and ranks Web sites according to a computed value determined by the number of other sites linking to them.
So, although Google PR determines the 'importance' of a Web site, it does not determine its value in terms of the trustworthiness of the site and of its content.
Indeed, spam merchants have been able to exploit this heavy dependence on the number of links to and from a Web site to artificially inflate, through various devious means, the Google PR of their own sites, thereby making them appear higher in search results.
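To make the inflation problem concrete, here is a minimal sketch of a simplified, textbook-style PageRank computation. It is not Google's production algorithm; the tiny link graphs, the page names and the `pagerank` function are all made up for illustration. It compares a spam page's score before and after a small "link farm" starts pointing at it.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of scores.

    `links` maps each page to the list of pages it links to.
    This is an illustrative sketch, not Google's implementation.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of the total score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped score evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outlinks spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical mini-web: three honest pages and one spam page nobody links to.
honest_web = {"a": ["b"], "b": ["c"], "c": ["a"], "spam": ["a"]}
before = pagerank(honest_web)

# The spammer adds five farm pages that all link to the spam page,
# and the spam page links back to them.
farm = ["farm%d" % i for i in range(1, 6)]
farmed_web = {"a": ["b"], "b": ["c"], "c": ["a"], "spam": list(farm)}
for farm_page in farm:
    farmed_web[farm_page] = ["spam"]
after = pagerank(farmed_web)

print("spam score before link farm:", round(before["spam"], 4))
print("spam score after link farm: ", round(after["spam"], 4))
print("honest page 'a' after:      ", round(after["a"], 4))
```

In this toy graph the spam page ends up scoring higher than any honest page purely because of links its owner controls, which is the weakness described above.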
This is where 'TrustRank' may come in. A paper called "Combating Web Spam with TrustRank" (.pdf), published last year by researchers at the Digital Library Technologies group at Stanford (alma mater of the Google co-founders) and recently made available on the Stanford server, may provide a clue as to what it is all about.
The paper is extremely technical and a full read-through is only really recommended for those who have a deep understanding of algorithms and computer science.
Here is a brief abstract for the rest of us:
Web spam pages use various techniques to achieve higher-than-deserved rankings in a search engine’s results. While human experts can identify spam, it is too expensive to manually evaluate a large number of pages. Instead, we propose techniques to semi-automatically separate reputable, good pages from spam.
We first select a small set of seed pages to be evaluated by an expert. Once we manually identify the reputable seed pages, we use the link structure of the web to discover other pages that are likely to be good.
In this paper we discuss possible ways to implement the seed selection and the discovery of good pages. We present results of experiments run on the World Wide Web indexed by AltaVista and evaluate the performance of our techniques.
Our results show that we can effectively filter out spam from a significant fraction of the web, based on a good seed set of less than 200 sites.
The paper then goes on to present the research methodology and findings in the following order:
1. We formalize the problem of web spam and spam detection algorithms.
2. We define metrics for assessing the efficacy of detection algorithms.
3. We present schemes for selecting seed sets of pages to be manually evaluated.
4. We introduce the TrustRank algorithm for determining the likelihood that pages are reputable.
5. We discuss the results of an extensive evaluation, based on 31 million sites crawled by the AltaVista search engine, and a manual examination of over 2,000 sites. We provide some interesting statistics on the type and frequency of encountered web contents, and we use our data for evaluating the proposed algorithms.
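For readers who prefer code to prose, the seed-and-propagate idea from the abstract can be sketched roughly as follows. This is a minimal illustration in the spirit of the paper, not the authors' implementation: the `trust_scores` function, the damping value, the iteration count and the toy link graph are assumptions made for the example. Trust starts at the hand-checked good seeds and flows along outgoing links, so pages that no trusted page links to, directly or indirectly, receive none.

```python
def trust_scores(links, good_seeds, damping=0.85, iterations=20):
    """Propagate trust from manually approved seed pages along links.

    `links` maps each page to the pages it links to; `good_seeds` is the
    small set of pages a human expert judged reputable. Illustrative only.
    """
    pages = set(links) | set(good_seeds)
    for targets in links.values():
        pages.update(targets)

    # All initial trust is split evenly among the good seeds.
    seed_weight = 1.0 / len(good_seeds)
    static = {p: (seed_weight if p in good_seeds else 0.0) for p in pages}
    trust = dict(static)

    for _ in range(iterations):
        # Each round, seeds are topped up with a fraction of their initial trust.
        new_trust = {p: (1.0 - damping) * static[p] for p in pages}
        for page, targets in links.items():
            if not targets:
                continue  # a page without outlinks passes no trust on
            # A page splits its damped trust evenly among the pages it links to.
            share = damping * trust[page] / len(targets)
            for target in targets:
                new_trust[target] += share
        trust = new_trust
    return trust


# Hypothetical mini-web: one expert-approved seed, pages reachable from it,
# and a spam cluster that only links to itself.
web = {
    "seed": ["news", "blog"],
    "news": ["blog"],
    "blog": ["shop"],
    "shop": [],
    "spam": ["farm"],
    "farm": ["spam"],
}
scores = trust_scores(web, good_seeds={"seed"})

for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

Note the contrast with plain link counting: the spam cluster's internal links earn it nothing here, because none of the trust that started at the seed ever reaches it. The sketch relies on reputable pages not linking to spam; a single careless link from a trusted page would leak some trust.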
As the free Web that we know today becomes increasingly chaotic, overpowering and untrustworthy, TrustRank may become an important factor in its long-term survival as a global information repository.
Update - Sunday April 30th 2005
New Scientist reports:
"Now Google, whose name has become synonymous with internet searching, plans to build a database that will compare the track record and credibility of all news sources around the world, and adjust the ranking of any search results accordingly.
The database will be built by continually monitoring the number of stories from all news sources, along with average story length, number with bylines, and number of the bureaux cited, along with how long they have been in business. Google's database will also keep track of the number of staff a news source employs, the volume of internet traffic to its website and the number of countries accessing the site.
Google will take all these parameters, weight them according to formulae it is constructing, and distil them down to create a single value. This number will then be used to rank the results of any news search."
Reference: [via SearchEngineWatch]
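The "weight and distil into a single value" step in the excerpt above can be pictured as a simple weighted average. The sketch below is purely illustrative: Google's actual formulae are not public, so the weights, the signal names, the assumption that each signal is pre-scaled to a 0 to 1 range, and the two example sources are all invented here.

```python
# Hypothetical weights for the parameters named in the New Scientist excerpt.
# The real weighting is unknown; these numbers are placeholders.
HYPOTHETICAL_WEIGHTS = {
    "story_count": 1.0,
    "avg_story_length": 1.0,
    "bylined_stories": 1.0,
    "bureaux": 1.0,
    "years_in_business": 2.0,
    "staff_size": 1.0,
    "traffic": 1.5,
    "countries_reached": 1.0,
}


def credibility_score(signals, weights=HYPOTHETICAL_WEIGHTS):
    """Combine signals (each assumed scaled to 0..1) into one weighted average."""
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * signals.get(name, 0.0) for name in weights)
    return weighted_sum / total_weight


def rank_sources(sources):
    """Return source names ordered from highest to lowest score."""
    return sorted(
        sources,
        key=lambda name: credibility_score(sources[name]),
        reverse=True,
    )


# Two made-up news sources with made-up, pre-scaled signal values.
sources = {
    "established_daily": {name: 0.9 for name in HYPOTHETICAL_WEIGHTS},
    "week_old_site": {name: 0.2 for name in HYPOTHETICAL_WEIGHTS},
}

for name in rank_sources(sources):
    print(name, round(credibility_score(sources[name]), 3))
```

A weighted average is only one way to collapse many parameters into a single number; the excerpt does not say which method Google intends to use.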
The name of that technology is brilliant even if they do not use it.
Marketing their search product with the concept of "uniquely democratic web..." and now with the concept of trust.
You can see how effective the marketing is by the post titles various blogs are using to describe the technology.