April 29, 2005



Future PageRank Helps Reputation And Trustworthiness Shine Over Artificially Inflated Search Results: Google TrustRank

 

In March 2005, Google registered the trademark 'TrustRank' with the U.S. Patent and Trademark Office (USPTO). What might this tell us about Google's forthcoming initiatives, and how might a 'TrustRank' feature fit alongside Google's existing 'PageRank'?

Photo credit: Diego Sapriza

PageRank (PR) is at the very core of the Google search engine. It is a system of Web page measurement that Web publishers typically obsess over, in particular how high their own site's PR is. In very simple terms, PR ranks Web pages according to a computed value based on the number of other pages linking to them, with links from pages that themselves have a high PR counting for more.
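As a rough illustration of that "computed value", here is a minimal power-iteration sketch in Python. It assumes a tiny hypothetical four-page link graph and the commonly used damping factor of 0.85; it is not Google's production algorithm, only the basic idea that a page's score depends both on how many pages link to it and on how highly those pages score themselves.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Basic PageRank by power iteration (illustrative sketch only).

    links maps each page to the list of pages it links to.
    Pages with no outgoing links spread their score evenly over all pages.
    """
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share from the "random jump".
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: distribute its score uniformly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: C is linked to by A, B and D; nothing links to D.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(web)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Page C, which three pages link to, comes out on top, while D, which nothing links to, keeps only the small baseline share every page receives from the random-jump term. This reliance on incoming links is exactly what link spammers try to exploit.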

So, although Google PR measures the 'importance' of a Web site, it does not measure its value in terms of the trustworthiness of the content on the site, or of the site overall.

Indeed, spam merchants have been able to exploit this heavy dependence on links, using various devious means to artificially inflate the Google PR of their own sites and thereby push them higher in search results.

This is where 'TrustRank' may come in. A paper called "Combating Web Spam with TrustRank" (.pdf), published last year by researchers at Stanford's Digital Library Technologies group (Stanford being the alma mater of the Google co-founders) and recently made available on the Stanford server, may provide a clue as to what it is all about.

The paper is extremely technical and a full read-through is only really recommended for those who have a deep understanding of algorithms and computer science.

Here is a brief abstract for the rest of us:


Web spam pages use various techniques to achieve higher-than-deserved rankings in a search engine’s results. While human experts can identify spam, it is too expensive to manually evaluate a large number of pages. Instead, we propose techniques to semi-automatically separate reputable, good pages from spam.

We first select a small set of seed pages to be evaluated by an expert. Once we manually identify the reputable seed pages, we use the link structure of the web to discover other pages that are likely to be good.

In this paper we discuss possible ways to implement the seed selection and the discovery of good pages. We present results of experiments run on the World Wide Web indexed by AltaVista and evaluate the performance of our techniques.

Our results show that we can effectively filter out spam from a significant fraction of the web, based on a good seed set of less than 200 sites.
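The core idea in the abstract, propagating trust outward from a small, hand-checked seed set along the link structure, can be sketched as a PageRank variant whose random jumps land only on the good seeds. The sketch below is our own simplified reading of that idea, not the paper's exact algorithm; the graph, page names, damping factor of 0.85 and 20 iterations are all assumptions for illustration.

```python
def trustrank(links, good_seeds, damping=0.85, iterations=20):
    """Biased PageRank: random jumps land only on trusted seed pages.

    links: page -> list of pages it links to (hypothetical toy graph).
    good_seeds: non-empty set of pages an expert judged reputable.
    """
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    # Static vector d: uniform over the good seeds, zero everywhere else.
    d = {p: (1.0 / len(good_seeds) if p in good_seeds else 0.0) for p in pages}
    trust = dict(d)
    for _ in range(iterations):
        new_trust = {p: (1.0 - damping) * d[p] for p in pages}
        for page in pages:
            targets = links.get(page, [])
            # A page splits its trust evenly among its outlinks;
            # pages without outlinks pass no trust on in this sketch.
            for t in targets:
                new_trust[t] += damping * trust[page] / len(targets)
        trust = new_trust
    return trust


web = {
    "seed": ["news", "blog"],
    "news": ["blog"],
    "blog": ["news"],
    # A link farm: three throwaway pages all pointing at "spam".
    "farm1": ["spam"], "farm2": ["spam"], "farm3": ["spam"],
    "spam": ["farm1", "farm2", "farm3"],
}
trust = trustrank(web, good_seeds={"seed"})
# With every page as a seed the jump is uniform, i.e. plain PageRank on this graph.
pagerank_like = trustrank(web, good_seeds=set(web))
for page in sorted(web):
    print(f"{page:6s} trust={trust[page]:.3f} pagerank={pagerank_like[page]:.3f}")
```

Because no trusted page links into the farm, `spam` and its three farm pages end with exactly zero trust, even though in the uniform-jump (plain PageRank) run `spam` scores higher than the legitimate `news` page.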

The paper then goes on to present the research methodology and findings in the following order:

1. We formalize the problem of web spam and spam detection algorithms.

2. We define metrics for assessing the efficacy of detection algorithms.

3. We present schemes for selecting seed sets of pages to be manually evaluated.

4. We introduce the TrustRank algorithm for determining the likelihood that pages are reputable.

5. We discuss the results of an extensive evaluation, based on 31 million sites crawled by the AltaVista search engine, and a manual examination of over 2,000 sites. We provide some interesting statistics on the type and frequency of encountered web contents, and we use our data for evaluating the proposed algorithms.
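On seed selection (point 3), one scheme of the kind the paper discusses is "inverse PageRank": run PageRank on the web graph with every link reversed, so that pages whose outlinks reach many other pages score highly and become the first candidates handed to a human reviewer. The version below is our own simplified sketch; the graph and page names are hypothetical.

```python
def inverse_pagerank(links, damping=0.85, iterations=20):
    """PageRank computed on the graph with every link reversed."""
    pages = sorted(set(links) | {p for ts in links.values() for p in ts})
    # Reverse every edge: in the reversed graph, a page links to its original inlinkers.
    reversed_links = {p: [] for p in pages}
    for src, targets in links.items():
        for dst in targets:
            reversed_links[dst].append(src)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_score = {p: (1.0 - damping) / n for p in pages}
        for page, targets in reversed_links.items():
            # Pages with no links in the reversed graph pass nothing on here.
            for t in targets:
                new_score[t] += damping * score[page] / len(targets)
        score = new_score
    return score


def candidate_seeds(links, k):
    """Return the k pages with the highest inverse PageRank."""
    scores = inverse_pagerank(links)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical graph: "directory" links out to every other page.
web = {"directory": ["a", "b", "c", "d"], "a": ["b"], "b": ["c"], "c": [], "d": ["a"]}
print(candidate_seeds(web, 2))
```

The hypothetical `directory` page, which links out to every other page, comes first, followed by `d`. In the approach described above, an expert would still review each candidate, and only those judged reputable would become trusted seeds.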

As the free Web we know today becomes increasingly chaotic, overwhelming and untrustworthy, TrustRank may become an important factor in its long-term survival as a global information repository.



Update - Sunday April 30th 2005

New Scientist reports:

"Now Google, whose name has become synonymous with internet searching, plans to build a database that will compare the track record and credibility of all news sources around the world, and adjust the ranking of any search results accordingly.

The database will be built by continually monitoring the number of stories from all news sources, along with average story length, number with bylines, and number of the bureaux cited, along with how long they have been in business. Google's database will also keep track of the number of staff a news source employs, the volume of internet traffic to its website and the number of countries accessing the site.

Google will take all these parameters, weight them according to formulae it is constructing, and distil them down to create a single value. This number will then be used to rank the results of any news search."
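The process New Scientist describes amounts to a weighted sum of normalized metrics. Google has not disclosed its formulae, so the sketch below is purely illustrative: the field names, the min-max normalization and every weight are our own assumptions, and the three sources are invented.

```python
from dataclasses import dataclass, fields


@dataclass
class NewsSource:
    """Per-source metrics of the kind listed in the New Scientist report."""
    name: str
    stories: float            # number of stories published
    avg_story_length: float   # average story length (e.g. in words)
    bylined_stories: float    # number of stories carrying a byline
    bureaux: float            # number of bureaux cited
    years_in_business: float
    staff: float
    traffic: float            # volume of traffic to the website
    countries: float          # number of countries accessing the site


# Hypothetical weights summing to 1.0; Google's real formulae were not published.
WEIGHTS = {
    "stories": 0.15, "avg_story_length": 0.10, "bylined_stories": 0.15,
    "bureaux": 0.10, "years_in_business": 0.15, "staff": 0.10,
    "traffic": 0.15, "countries": 0.10,
}


def credibility_scores(sources):
    """Distil each source's metrics into a single value in [0, 1]."""
    metrics = [f.name for f in fields(NewsSource) if f.name != "name"]
    scores = {s.name: 0.0 for s in sources}
    for metric in metrics:
        values = [getattr(s, metric) for s in sources]
        low, high = min(values), max(values)
        for s in sources:
            # Min-max normalization so metrics on very different scales are comparable.
            norm = (getattr(s, metric) - low) / (high - low) if high > low else 0.0
            scores[s.name] += WEIGHTS[metric] * norm
    return scores


sources = [
    NewsSource("Wire Service", 9000, 600, 4500, 120, 150, 2000, 9e6, 190),
    NewsSource("Regional Daily", 800, 450, 600, 3, 60, 150, 4e5, 40),
    NewsSource("Content Farm", 6000, 150, 50, 0, 1, 5, 2e6, 25),
]
scores = credibility_scores(sources)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```

With these made-up numbers the wire service scores 1.0 because it leads on every metric, while the high-volume content farm trails the smaller regional paper, since sheer story count is only one of eight weighted factors. Min-max normalization also makes each score relative to the set of sources being compared.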


Reference: via SearchEngineWatch
Readers' Comments    
2005-05-04 21:53:43

aaron wall

The name of that technology is brilliant even if they do not use it.

Marketing their search product with the concept of "uniquely democratic web..." and now with the concept of trust.

You can see how effective the marketing is by the post titles various blogs are using to describe the technology.



posted by on Friday, April 29 2005, updated on Tuesday, February 21 2006


 

 

 

 


 

This work is licensed under a Creative Commons License.






 
