IBM posts UIMA as a standard

On August 8th 2005, IBM announced version 8.2.2 of its OmniFind search software (more formally known as WebSphere Information Integrator OmniFind Edition). This release externalizes the Unstructured Information Management Architecture (UIMA), an interchange framework that was embodied in earlier releases of the product. UIMA helps compliant software components work together on the task of extracting useful information from all kinds of corporate data, especially text.

Using the best

There are competing approaches to putting structure on a corpus of text documents and some suppliers’ products are better than others’ for specific types of analysis. IBM’s aim with UIMA is to make it straightforward for user organizations to use their preferred analysis software (IBM calls these products annotators). They might also wish to use more than one kind, in combination. UIMA makes these programs interoperable and extensible across the organization.

Different types of analysis software vary in how they represent and communicate their results. UIMA therefore defines a common analysis structure (CAS) to manage and store the software objects that these annotators produce and consume. These objects feed through to the search software, which can then produce the results as the user asks for them.
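The idea of a shared structure that several annotators read from and write to can be illustrated with a minimal Java sketch. Every name in it (CasSketch, AnalysisStructure, Annotation, Annotator, termAnnotator) is hypothetical and chosen for illustration only; it does not reproduce the API of IBM's UIMA SDK, and the term-matching annotators merely stand in for real analysis software.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: hypothetical names, not the UIMA SDK API.
final class CasSketch {

    /** One analysis result: a typed label over a character span of the text. */
    static final class Annotation {
        final String type;
        final int begin;
        final int end;

        Annotation(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Shared container: the document text plus every annotation added so far. */
    static final class AnalysisStructure {
        final String text;
        private final List<Annotation> annotations = new ArrayList<>();

        AnalysisStructure(String text) {
            this.text = text;
        }

        void add(Annotation a) {
            annotations.add(a);
        }

        List<Annotation> annotations() {
            return Collections.unmodifiableList(annotations);
        }

        /** Returns the stretch of text that an annotation covers. */
        String covered(Annotation a) {
            return text.substring(a.begin, a.end);
        }
    }

    /** An annotator reads the shared structure and writes its results back. */
    interface Annotator {
        void process(AnalysisStructure cas);
    }

    /** Toy annotator (assumption): marks every occurrence of a fixed term. */
    static Annotator termAnnotator(String type, String term) {
        if (term.isEmpty()) {
            throw new IllegalArgumentException("term must not be empty");
        }
        return cas -> {
            int from = cas.text.indexOf(term);
            while (from >= 0) {
                cas.add(new Annotation(type, from, from + term.length()));
                from = cas.text.indexOf(term, from + term.length());
            }
        };
    }

    /** Runs the annotators in order over one shared structure. */
    static AnalysisStructure analyze(String text, List<Annotator> annotators) {
        AnalysisStructure cas = new AnalysisStructure(text);
        for (Annotator a : annotators) {
            a.process(cas);
        }
        return cas;
    }

    public static void main(String[] args) {
        AnalysisStructure cas = analyze(
            "IBM announced OmniFind with UIMA.",
            Arrays.asList(
                termAnnotator("Company", "IBM"),
                termAnnotator("Product", "OmniFind")));
        for (Annotation a : cas.annotations()) {
            System.out.println(a.type + ": " + cas.covered(a));
        }
    }
}
```

Because both annotators write into the same structure, whatever consumes the results needs to understand only one representation, however many annotators contributed to it.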

The analysis results feed through a UIMA-compliant processing engine. The output from this can go to search software, a database or an application program. IBM’s OmniFind product combines a UIMA processing engine with a search engine. Technically competent user organizations or other software companies might choose to build a processing engine alone.
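As a rough illustration of one processing engine feeding several destinations, the following Java sketch sends each analysis result to every registered output. All names (EngineSketch, Finding, Sink, IndexSink, TableSink) are hypothetical, the two sinks are in-memory stand-ins for a search index and a database table, and the term lookup stands in for real annotators; none of this is taken from OmniFind or the UIMA SDK.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: hypothetical names, not OmniFind or UIMA SDK code.
final class EngineSketch {

    /** One extracted result, for example ("d1", "Company", "IBM"). */
    static final class Finding {
        final String docId;
        final String type;
        final String value;

        Finding(String docId, String type, String value) {
            this.docId = docId;
            this.type = type;
            this.value = value;
        }
    }

    /** Any destination for findings: search index, database or application. */
    interface Sink {
        void accept(Finding f);
    }

    /** Stand-in for a search index: maps a value to the ids of documents containing it. */
    static final class IndexSink implements Sink {
        final Map<String, Set<String>> postings = new HashMap<>();

        @Override
        public void accept(Finding f) {
            postings.computeIfAbsent(f.value, k -> new TreeSet<>()).add(f.docId);
        }
    }

    /** Stand-in for a database table: one row (docId, type, value) per finding. */
    static final class TableSink implements Sink {
        final List<String[]> rows = new ArrayList<>();

        @Override
        public void accept(Finding f) {
            rows.add(new String[] {f.docId, f.type, f.value});
        }
    }

    /** Toy analysis step (assumption): reports each known term present in the text. */
    static List<Finding> analyze(String docId, String text, Map<String, String> termTypes) {
        List<Finding> out = new ArrayList<>();
        for (Map.Entry<String, String> e : termTypes.entrySet()) {
            if (text.contains(e.getKey())) {
                out.add(new Finding(docId, e.getValue(), e.getKey()));
            }
        }
        return out;
    }

    /** Analyzes each document once and sends every finding to every sink. */
    static void run(Map<String, String> docs, Map<String, String> termTypes, List<Sink> sinks) {
        for (Map.Entry<String, String> d : docs.entrySet()) {
            for (Finding f : analyze(d.getKey(), d.getValue(), termTypes)) {
                for (Sink s : sinks) {
                    s.accept(f);
                }
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> docs = new HashMap<>();
        docs.put("d1", "IBM ships OmniFind");
        docs.put("d2", "OmniFind uses UIMA");
        Map<String, String> termTypes = new HashMap<>();
        termTypes.put("IBM", "Company");
        termTypes.put("OmniFind", "Product");

        IndexSink index = new IndexSink();
        TableSink table = new TableSink();
        List<Sink> sinks = new ArrayList<>();
        sinks.add(index);
        sinks.add(table);
        run(docs, termTypes, sinks);

        System.out.println("Documents mentioning OmniFind: " + index.postings.get("OmniFind"));
        System.out.println("Rows written: " + table.rows.size());
    }
}
```

Keeping the destinations behind a single interface means another output, such as an application program, could be added without changing the analysis step.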

The diagram above, based on an IBM original, shows how a UIMA-based search product might be applied. Here, it intercedes between the different data sources (and, possibly, different annotators) on the left and two output streams on the right. One output goes via an OmniFind search index and appears as search results. The other stream goes into a database or data warehouse, using SQL. It could perhaps appear alongside other structured data in a standard report. The input sources can include forms, tables, invoices, CRM files and ERP files, as well as text.

IBM’s plans for UIMA

IBM has been working on UIMA since 2001. There is a working group for the framework, which IBM jointly leads with DARPA, the US Government’s military research organization. Other participants include Sloan Kettering, Mayo Clinic, BBN Technologies, MITRE and SAIC (Object Sciences). Universities like Stanford, Carnegie Mellon and Columbia are using UIMA in advanced courses and research projects.

As well as this assembled brainpower, IBM has enlisted a group of 16 software suppliers who are collaborating on the standard. They include Attensity, ClearForest, Inquira, SAS, SPSS and others, who are providing text analysis modules. Factiva and QL2 will be offering products that supply the data for analysis. Software from Cognos, Kana, Siebel and others will be able to accept the outputs from UIMA-based systems. Some companies, such as Attensity and ClearForest, will be releasing compliant products this quarter.

IBM feels that adopting the UIMA framework will make those suppliers’ products more saleable and will help systems integrators. Complementary, and even competing, products from multiple vendors will be able to communicate with one another and be more easily put to use.

Typical applications for text analytical systems include customer support, ecommerce, media monitoring, competitive intelligence, fraud detection, and research and intelligence. Sectors such as automotive, durable and consumer goods, commercial and retail banking, insurance and government agencies are among the primary targets.

There is a Java-based software development kit (SDK) for UIMA available from IBM’s alphaWorks Web site. Towards the end of this year, IBM will make the UIMA framework itself open source. It proposes to start by placing the code on the SourceForge site for free download. Other programmers will be able to add to or amend the code later. (Earlier this year, IBM said it would be turning over 30 or so other projects to SourceForge.)

Assessment

In one of its August 8th press releases, IBM quotes Arthur Ciccolo, Department Group Manager for Information and Knowledge Management, IBM Research. He says, ‘UIMA provides, for the first time, true interoperability among different knowledge discovery, search, business intelligence and text analytics software.’

This is a grand vision, and needs a big player like IBM to make it possible. Publishing UIMA freely is a bold step and is an example of enlightened self-interest at work. IBM sees UIMA as important in helping the enterprise search market extend beyond its present niche. Search products can become essential components of a well-integrated and enterprise-wide information management architecture. IBM is keen to supply the components for this. Such an architecture would serve the needs of the ‘on demand’ business in all its guises, such as in performance management, compliance and business transformation.

UIMA, and the work and relationships around it, will help IBM mark out some of the growing information management market for itself and its collaborators. There will inevitably be ‘coopetition’ but less likelihood of a technical lockout, for buyers and sellers alike.

Much depends on whether other major suppliers of information management software will be willing to incorporate UIMA in their products. If they do, they and IBM will have made an important stride forward in building a new market. If they do not, which seems unlikely at this stage, no major damage would result but an opportunity would have been lost.