Saturday, May 17, 2008

Semantic Search With Powerset: Transforming Publications Through Semantic Search Technology


by John Blossom

There are rocket scientists, then there are rocket scientists, and then there's Barney Pell: long-time Silicon Valley startup maven and currently the Founder and Chief Technology Officer at Powerset. Barney is one of those rare people who have been rocket scientists in both the NASA sense of the term and the software-industry sense, a background that has helped him assemble many teams over the years that have developed advanced search and language-processing technologies.

Powerset Search Overview

Powerset recently unveiled its first public release of a technology that delivers rich content from semantic searches. It offers an interesting look at how enhanced search technologies can completely reshape a content product.

Using Wikipedia and Freebase as its primary target content, Powerset's technology analyzes queries to return search results that match natural-language phrases as well as keywords.

Powerset Road Test

Since this is a very early-stage debut, some search targets work better than others. Overall, I'd say the technology does best with people and things as opposed to concepts.

For example, if you type in "Who is Bill Gates?" you get a screen similar to the top of the screen grab above. It includes a top deck of biographical information from the Freebase reference database, followed by Powerset's sets of semantic analyses, called "Factz," that focus on what the Wikipedia article says about this prominent figure. One of these sets, for example, tells us that Gates gave testimony, a speech, an address, a demo, a presentation and a deposition. You can click on any of these terms to get more details from the underlying article.

Below the initial bio and Factz information is a set of search results for the query, including the best-match article on Microsoft founder Bill Gates. This is in essence the straight Wikipedia article with its links mapped over to Powerset's version of the content, along with a handy visual presentation of the article's outline on the right, or alternatively a listing of key Factz organized within that outline. I like some of the inferences it drew from the Wikipedia definition of "content" that I contributed a while back: "information provides value; experiences provide value; content provides value." True enough.

I like how Powerset places federated content ahead of its organic search results, taking its best stab at answers on very focused topics so that people can obtain knowledge more quickly and effectively.

The automatically generated Factz, though, suffer from the same problem that most semantic tools run into when they examine a very small data set: spotty inferences. For example, among the Factz about Bill Gates, Powerset inferred that he founded Cher, an inference drawn from the fact that biographer Howard Johns was known for revealing the addresses of these and other celebrities. Hmm. I don't think I'd put that one on my "Final Jeopardy" slate.

I'm also not so crazy about the organic search results, which tend to lean too heavily on word proximity.

Again, with a relatively narrow data set such as Wikipedia, it's not always easy to tune content analysis so that it makes the most of the semantic text analysis capabilities of a search engine.

Big Picture

The big picture for this early release of Powerset is that it is a great demonstration of how a single source of content can be transformed through search and content federation technologies into an altogether different kind of publication.

These days I often describe search technologies as being similar to datafeed technologies, but in this instance it's important to recognize that search technologies are also end-publishing technologies in their own right. They can aggregate, filter and organize content in altogether new ways that enhance the value of one or more core publications.

Using free content from Wikipedia and Freebase, Powerset's technology demonstrates this concept simply and well, albeit with some early growing pains.

Strategic Advice for Online Publishers

Publishers wanting to stay at the forefront of content markets are turning in droves to content federation technologies to add value to existing product sets, so expect to hear more from technologies such as Powerset that help publishers add value rapidly.

Originally written by John Blossom for Shore and first published as "Powerset: Transforming Publications Through Semantic Search Technology" on May 13th 2008.

Reference: Shore