Mashups: What Are They? Technical And Social Challenges - Part 2
Mashups are full of technical challenges, which is why so many technologists are drawn to them. The challenge of finding new solutions to "old" technical issues while "inventing" new ways to mix and combine existing resources and tools is undoubtedly a strong motivator for any developer out there.
In this second part of Duane Merrill's guide to mashups (see Part 1), the focus is on the "Technical Challenges" and "Social Aspects" of this rapidly evolving field.
What to look out for, typical technical challenges, and obvious bottlenecks are examined and explained in an introductory, yet technically competent, way. The article also offers good considerations on accessibility, SEO, security, and other issues that are often underestimated during the initial planning of such a project.
The social aspects are no less important in the creation of online mashups. In particular, the tradeoff between protecting intellectual property and consumer privacy on one side, and fair use and the free flow of information on the other, is a notable one.
As always, plenty of links help you learn more about terms and technologies you may not yet be familiar with.
Intro by Robin Good
Photo credit: Sun - The Big Mashup
Mashups: The New Breed of Web App - Part 2
Mashup matrix - Photo credit: Vincent Thomé's blog
Like any other data integration domain, mashup development is replete with technical challenges that need to be addressed, especially as mashup applications become more feature- and functionality-rich.
This section touches on a handful of these challenges, some of which you can address and mitigate, while others are open issues.
Data Integration Challenges: Semantic Meaning and Data Quality
Qualitative surveys suggest that the number-one enterprise IT concern today is data integration within the enterprise virtual organization. (In this context, I use the term virtual organization to mean a composition of federated business units, each contained within its own administrative domain.)
Like many enterprise IT managers who find themselves charged with integrating legacy data sources (for example, to create corporate dashboards that reflect current business conditions), mashup developers face the analogous challenge of deriving shared semantic meaning across heterogeneous data sets. So, to get an idea of what mashup developers have in store, you need look no further than the storied integration challenges faced by enterprise IT.
For example, translation systems between data models must be designed.
When converting data into common forms, reasonable assumptions often have to be made when the mapping is incomplete (for example, one data source might have a model in which an address type contains a country field, whereas another does not). This is challenging enough on its own, and it is made harder because mashup developers might not be domain experts on the source data models, which are third-party to them, so these reasonable assumptions might not be intuitive or clear.
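To make the "reasonable assumption" problem concrete, here is a minimal Python sketch of translating two hypothetical address formats into one common model. The field names (`street`, `addr_line`, `town`, and so on) and the idea of recording which fields were assumed are illustrative choices, not part of any real source's schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CommonAddress:
    """Hypothetical common address model used inside the mashup."""
    street: str
    city: str
    country: Optional[str]
    # Names of fields whose values were assumed rather than read from the source.
    assumed: List[str] = field(default_factory=list)


def from_source_a(rec: dict) -> CommonAddress:
    # Hypothetical source A: its address records carry an explicit country field.
    return CommonAddress(street=rec["street"], city=rec["city"], country=rec["country"])


def from_source_b(rec: dict, default_country: Optional[str] = None) -> CommonAddress:
    # Hypothetical source B: no country field, so the mapping is incomplete.
    # Instead of guessing silently, take the default explicitly from the caller
    # and record that the value was assumed, so later stages can treat it with care.
    addr = CommonAddress(street=rec["addr_line"], city=rec["town"], country=default_country)
    if default_country is not None:
        addr.assumed.append("country")
    return addr
```

Calling `from_source_b({"addr_line": "2 Elm St", "town": "Springfield"}, default_country="US")` yields an address whose `assumed` list is `["country"]`, which a display layer could use to flag that record as less certain than one from source A.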
In addition to missing data or incomplete mappings, the mashup designer might discover that the data they wish to integrate is not suitable for automated processing and first needs cleansing.
For example, law enforcement arrest records might be entered inconsistently, using common abbreviations for place names (such as "mkt sqr" in one record and "Market Square" in another), making automated reasoning about equality difficult, even with good heuristics.
Legacy data sources are likely to require considerable human effort in analysis and data cleansing before they can be made available to semantic modeling technologies.
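A first cleansing pass often normalizes spelling before records are compared. The Python sketch below assumes a small, hand-built abbreviation table (hypothetical, as is every entry in it) and treats two place names as equal when their normalized forms match.

```python
import re

# Hypothetical abbreviation table. A real one would come out of human analysis
# of the source data, which is the effort the paragraph above describes.
ABBREVIATIONS = {
    "mkt": "market",
    "sq": "square",
    "sqr": "square",
    "st": "street",
    "ave": "avenue",
}


def normalize_place(name: str) -> str:
    """Lower-case, drop punctuation, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def same_place(a: str, b: str) -> bool:
    # Two entries are treated as the same place if their normalized forms match.
    return normalize_place(a) == normalize_place(b)
```

Even this tiny table shows why heuristics only go so far: "st" can mean "street" or "saint", so a real table needs context that only analysis of the source data can supply.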
Mashup developers might also have to contend with several issues that IT integration managers might not, one of which is data pollution.
As part of their application design, many mashups solicit public user input. As evidenced in the wiki application domain, this is a double-edged sword:
a) it can be quite powerful because it enables open contribution and best-of-breed data evolution,
b) yet it can be subject to inconsistent, incorrect, or intentionally misleading data entry. Such data can cast doubt on the trustworthiness of the data as a whole, which can ultimately compromise the value provided by the mashup.
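One common defense against data pollution is to validate each contribution and to accept a value only after several independent users agree on it. The following Python sketch assumes a hypothetical point-of-interest schema (`user`, `name`, `lat`, `lon`) and an arbitrary two-user threshold; neither comes from the original article.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def validate_submission(sub: dict) -> List[str]:
    """Return the problems found in one user-submitted point of interest (hypothetical schema)."""
    problems = []
    if not str(sub.get("name", "")).strip():
        problems.append("missing name")
    try:
        lat, lon = float(sub["lat"]), float(sub["lon"])
    except (KeyError, TypeError, ValueError):
        return problems + ["missing or non-numeric coordinates"]
    if not -90.0 <= lat <= 90.0:
        problems.append("latitude out of range")
    if not -180.0 <= lon <= 180.0:
        problems.append("longitude out of range")
    return problems


def corroborated(subs: List[dict], min_users: int = 2) -> Dict[str, Tuple[float, float]]:
    """Keep only entries reported identically by at least min_users distinct users."""
    votes: Dict[tuple, Set[str]] = defaultdict(set)  # (name, lat, lon) -> user ids
    for s in subs:
        if validate_submission(s):
            continue  # drop invalid entries outright
        key = (str(s["name"]).strip(), round(float(s["lat"]), 3), round(float(s["lon"]), 3))
        votes[key].add(s["user"])  # a set, so one user repeating an entry counts once
    # If two different locations for the same name both qualify, the later one wins.
    return {name: (lat, lon) for (name, lat, lon), users in votes.items() if len(users) >= min_users}
```

Corroboration raises the cost of injecting misleading data, but it also delays genuine new contributions, so the threshold is a tradeoff rather than a fix.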
Another host of integration issues arises when mashup developers must use screen scraping techniques for data acquisition.
As discussed in the previous section, deriving parsing and acquisition tools and data models requires significant reverse-engineering effort. Even in the best case, where these tools and models can be created, all it takes to break the integration process and cause the mashup application to fail is for the source site to refactor how it presents its content (or for the site to be mothballed or abandoned).
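Scrapers cannot prevent this kind of breakage, but they can detect it. This minimal sketch uses Python's standard `html.parser` module to pull the text of elements carrying a given CSS class, and raises an error when nothing matches, on the assumption that an empty result more likely means the page layout changed than that the data disappeared. The class name `price` and the `LayoutChangedError` type are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional


class LayoutChangedError(Exception):
    """Raised when the page no longer matches the structure the scraper expects."""


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element that carries a given CSS class."""

    def __init__(self, css_class: str):
        super().__init__()
        self.css_class = css_class
        self.results: List[str] = []
        self._tag: Optional[str] = None  # tag name of the element being captured
        self._depth = 0                  # nesting depth of that tag name
        self._buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self._tag is not None:
            if tag == self._tag:
                self._depth += 1  # nested element with the same tag name
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self._tag, self._depth, self._buf = tag, 1, []

    def handle_endtag(self, tag):
        if self._tag is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                self.results.append("".join(self._buf).strip())
                self._tag = None

    def handle_data(self, data):
        if self._tag is not None:
            self._buf.append(data)


def scrape_prices(html: str) -> List[str]:
    parser = ClassTextExtractor("price")  # hypothetical class name on the source site
    parser.feed(html)
    parser.close()
    if not parser.results:
        # Fail loudly: an empty result usually signals a markup change upstream.
        raise LayoutChangedError("no elements with class 'price' found")
    return parser.results
```

Failing loudly lets the mashup show a "data temporarily unavailable" notice and alert its developer, instead of quietly displaying an empty map or list.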
Component Challenges
The Ajax model of Web development can provide a much richer and more seamless user experience than the traditional full-page refresh, but it poses some difficulties as well.
At its core, Ajax uses the browser's client-side scripting capabilities together with its DOM to deliver content in a way that the browser's designers did not fully envision. (Perhaps this hack-like nature adds to Ajax's appeal.) However, it also exposes Ajax-based applications to the same browser compatibility issues that have plagued Web designers ever since Microsoft created Internet Explorer.
Because content is no longer necessarily linked to the URL in the browser's address bar, users might not get the behavior they normally expect from the browser's BACK button or BOOKMARK feature. And, although Ajax can reduce latency by requesting incremental content updates, poor designs can actually degrade the user experience, for example when updates are so fine-grained that their number and per-request overhead saturate the available resources.
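A common way to restore BOOKMARK and BACK behavior is to mirror the view state into the URL fragment (the part after `#`), which is saved with bookmarks and, in most browsers, recorded in history when it changes. In a real Ajax page this logic would run in client-side JavaScript against `location.hash`; the Python sketch below only shows the encode and decode round trip, using a hypothetical map state (`lat`, `lon`, `zoom`).

```python
from typing import Dict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def state_to_url(base_url: str, state: Dict[str, str]) -> str:
    """Encode the current view state into the URL fragment, replacing any old fragment."""
    fragment = urlencode(sorted(state.items()))  # sorted keys give a stable URL
    scheme, netloc, path, query, _old_fragment = urlsplit(base_url)
    return urlunsplit((scheme, netloc, path, query, fragment))


def url_to_state(url: str) -> Dict[str, str]:
    """Recover the view state from a bookmarked or shared URL."""
    return dict(parse_qsl(urlsplit(url).fragment))
```

For example, `state_to_url("http://example.com/map", {"zoom": "12", "lat": "38.03", "lon": "-78.48"})` produces `http://example.com/map#lat=38.03&lon=-78.48&zoom=12`. Sorting the keys means the same view always yields the same bookmarkable URL.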
Also, take care to support the user (for example, with visual feedback such as progress bars) while the interface loads or content is updated.
As with any distributed, cross-domain application, mashup developers and content providers alike will also need to address security concerns.
The notion of identity can prove to be a sticky subject, as the traditional Web is primarily built for anonymous access.
Single sign-on is a desirable feature, but there is a multitude of competing technologies (ranging from Microsoft Passport to the Liberty Alliance), which creates disjoint identity namespaces that you must integrate as well.
Content providers are likely to employ authentication and authorization schemes (which require the notion of secure identity or securely identifiable attributes) in their APIs to enforce business models that involve paid subscriptions or sensitive data.
Sensitive data is also likely to require confidentiality (that is, encryption), and you must take care not to put it at risk when you mash it up with other sources.
Identity will also be crucial for auditing and regulatory compliance. Additionally, with data integration happening on both the server and the client side, identity and credential delegation from the user to the mashup service might become a requirement.
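Credential delegation can be sketched as a scoped, expiring token that the content provider issues to the mashup once the user approves access, and then verifies on each request. The token format, the claim names (`sub`, `aud`, `scope`, `exp`), and the secret below are all hypothetical, and this is not an implementation of any particular standard; it simply uses Python's standard `hmac` module to make the token tamper-evident.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Secret known only to the content provider (hypothetical placeholder value).
PROVIDER_SECRET = b"replace-with-a-real-random-secret"


def _sign(body: str) -> str:
    return hmac.new(PROVIDER_SECRET, body.encode(), hashlib.sha256).hexdigest()


def issue_token(user: str, mashup_id: str, scope: str, ttl_s: int = 3600,
                now: Optional[float] = None) -> str:
    """Provider side: after the user approves, issue a scoped, expiring token to the mashup."""
    issued = time.time() if now is None else now
    claims = {"sub": user, "aud": mashup_id, "scope": scope, "exp": int(issued) + ttl_s}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode()).decode()
    return f"{body}.{_sign(body)}"


def verify_token(token: str, mashup_id: str, needed_scope: str,
                 now: Optional[float] = None) -> Optional[str]:
    """Provider side: return the user the mashup acts for, or None if the token is rejected."""
    try:
        body, sig = token.rsplit(".", 1)
    except ValueError:
        return None  # malformed token
    if not hmac.compare_digest(sig, _sign(body)):
        return None  # tampered or forged
    claims = json.loads(base64.urlsafe_b64decode(body.encode()))
    current = time.time() if now is None else now
    if claims["aud"] != mashup_id or claims["scope"] != needed_scope or current >= claims["exp"]:
        return None  # wrong mashup, wrong permission, or expired
    return claims["sub"]
```

The signature only prevents tampering; the claims are merely base64-encoded, so anything sensitive placed in them would still need encryption. Because each token names both the user and the mashup, the provider can also log who acted on whose behalf, which supports the auditing requirement mentioned above.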
Social Challenges
In addition to the technical challenges described in the previous section, social issues have surfaced (or will surface) as mashups become more popular.
Unwitting content providers (targets of screen scraping), and even content providers who expose APIs to facilitate data retrieval, might determine that their content is being used in a manner that they do not approve of.
(For a good review of Web aggregation and regulations, see the Resources section at the end of this article.)
The mashup Web application genre is still in its infancy, and many mashups are produced by hobbyist developers in their spare time.
These developers might not be cognizant of (or concerned with) issues such as security. Additionally, content providers are only beginning to see the value in providing APIs for machine-based content access, and many do not consider them a core business focus. This combination can yield poor software quality, as priorities such as testing and quality assurance take a back seat to proof-of-concept and innovation.
The community as a whole will have to work together to assemble open standards and reusable toolkits in order to facilitate mature software development processes.
Before mashups can make the transition from cool toys to sophisticated applications, much work will have to go into distilling robust standards, protocols, models, and toolkits.
For this to happen, major software development industry leaders, content providers, and entrepreneurs will have to find value in mashups, which means viable business models.
API providers will need to determine whether to charge for their content and, if so, how (for example, by subscription or per use). Perhaps they will provide varying levels of quality of service.
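Per-use charging and tiered quality of service could both be enforced by a small metering layer in front of an API. This Python sketch combines a token-bucket rate limiter, whose refill rate and burst size depend on a hypothetical `free` or `paid` tier, with a per-key counter of billable calls. The tier numbers are made up for illustration.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical service tiers: (sustained requests per second, burst size).
TIERS = {"free": (1.0, 5), "paid": (10.0, 50)}


@dataclass
class Bucket:
    rate: float      # tokens added per second
    capacity: float  # maximum burst
    tokens: float
    last: float      # time of the previous refill


class Meter:
    """Per-API-key rate limiting by tier, plus a per-use counter for billing."""

    def __init__(self) -> None:
        self.buckets: Dict[str, Bucket] = {}
        self.usage: Dict[str, int] = {}

    def allow(self, api_key: str, tier: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        rate, cap = TIERS[tier]
        # A key keeps the parameters of the tier it was first seen with.
        b = self.buckets.setdefault(api_key, Bucket(rate, cap, cap, now))
        # Refill for the time elapsed since the last call, up to capacity.
        b.tokens = min(b.capacity, b.tokens + (now - b.last) * b.rate)
        b.last = now
        if b.tokens < 1:
            return False  # over the tier's limit; not billed
        b.tokens -= 1
        self.usage[api_key] = self.usage.get(api_key, 0) + 1  # one billable use
        return True
```

Rejected calls are not counted as billable here. A production meter would also need to persist counts and handle keys that change tiers.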
Mashup developers might look for an ad-based revenue model, or perhaps build interesting mashup applications with the goal of being acquired.
End of Part 2
Resources
Learn
- Programmable Web: Stay up to date with the latest on mashups and the new Web 2.0 APIs.
- Considering Ajax, Part 1: Cut through the hype (Chris Laffra, developerWorks, May 2006): Consider this set of discussion points for every developer before you use Ajax techniques for a Web site.
- Ajax page: Visit this page sponsored by the Mozilla Developer Center.
- The Interplay of Web Aggregation and Regulations (LawTech): Be sure to read this good review of Web aggregation and regulations (PDF file).
- DB2 and open source: Put yourself on the map with Google Maps API, DB2/Informix, and PHP on Linux (Marty Lurie and Aron Y. Lurie, developerWorks, March 2006): Create an easy-to-use map with your data on it.
- Building Web service applications with the Google API (Nicholas Chase, developerWorks, May 2002): Learn to embed Google search results and other information in your Java applications in this tutorial.
- The ultimate mashup -- Web services and the semantic Web tutorial series: Take all the tutorials in this series and create a custom mashup.
- Second Generation Web Services: Read this XML.com article for coverage of the REST architecture.
- REST and the Real World: Read more on REST from XML.com.
- The W3C Semantic Web Activity site: Read about the Semantic Web.
- W3C RDF Activity: Visit this site for the latest on Resource Description Framework.
- Introduction to Jena: Use RDF models in your Java applications with the Jena Semantic Web Framework (Philip McCarthy, developerWorks, June 2004): Find out how to use the Jena Semantic Web Toolkit to exploit RDF data models in your Java applications.
- What is RSS?: From XML.com, learn about this syndication format for news, content, and personal weblogs.
- Atom Overview: Read about the XML-based Web content and metadata syndication format and application-level protocol from AtomEnabled.org.
- IBM XML 1.1 certification: Find out how you can become an IBM Certified Developer in XML 1.1 and related technologies.
- XML: See developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.
- developerWorks technical events and webcasts: Stay current with technology in these sessions.
Get products and technologies
- W3C SOAP Specification: Get the latest version.
- Scraping with style: scrAPI toolkit for Ruby: Try this technology for your mashups.
Discuss
- XML zone discussion forums: Participate in any of several XML-centered forums.
About the author
Duane Merrill has developed grid computing and distributed data integration platforms for over five years. He has been a contributor to the Legion Project at the University of Virginia and a core developer for Avaki Corporation's distributed enterprise information integration product, Avaki. He is currently pursuing his Ph.D. in Computer Science at the University of Virginia.
This article is copyright 2006 Backstop Media and has been republished with permission.
Duane Merrill
Reference: IBM