Content Enrichment and Search Enhancement
Adding Metadata and Tuning the Search Engine to Improve Information Findability
Business Issue

The Wildland Fire Lessons Learned Center (WFLLC), headquartered in Tucson, Arizona, serves the wildland fire community by providing a single reference repository for knowledge about how best to fight wildfires. According to the U.S. Fire Administration, the WFLLC supports more than one million firefighters nationally.

In October 2010, the WFLLC was using a first-generation knowledge management system (KMS), which consisted of a set of custom-developed, disconnected websites and web applications. An end-user survey conducted in early 2010 showed almost universal displeasure with the WFLLC website’s layout and navigation. End users complained that they had difficulty finding information.

Iknow was awarded a five-year contract to support the WFLLC’s knowledge management (KM) technology infrastructure. The objective of this contract was to upgrade the systems to a more robust and scalable platform built on leading commercial off-the-shelf (COTS) software products, and to provide ongoing support and maintenance.

In an earlier assignment under this contract, Iknow performed a comprehensive assessment of the WFLLC websites and identified many opportunities for improvement. The areas of greatest need included improving the websites’ user experience and adding functionality. In addition, the WFLLC’s existing website search product had been poorly implemented and needed to be reconfigured.

In this assignment, Iknow was asked to enrich the WFLLC’s content by adding new wildland fire-specific metadata and to improve the quality of the search results by upgrading and reconfiguring the WFLLC’s commercial search product.

Approach

The primary objectives of this project were:

  • Design a comprehensive taxonomy and a set of WFLLC-specific content metadata tags.
  • Store the new taxonomy and metatags in Smartlogic Semaphore. Semaphore is an enterprise semantic platform that augments traditional information management systems (such as search, content management systems, and business workflow engines) by adding advanced content classification, metadata, and navigation capabilities to deliver a more complete enterprise information management experience.
  • Develop a new content repository in Microsoft SharePoint, a content management system (CMS).
  • Migrate the WFLLC’s content from the current websites to SharePoint, remove outdated content, assign the new metatags to the WFLLC’s content, and store both the articles and their associated metadata together in the new CMS.
  • Configure the Coveo search engine to incorporate the new taxonomy, metadata, and content in the SharePoint CMS and implement the new functionality on the existing WFLLC websites.

Iknow performed the assignment in two major work streams: (1) Content Enrichment and (2) Search Engine Enhancement. The key steps were:

  • Content Enrichment Work Stream
  1. Project Preparation and Initiation
  2. Analyze the Existing Content and Perform Initial Processing
  3. Develop the WFLLC Taxonomy and Metadata
  4. Design the Content Tagging Process
  5. Implement the Content Tagging Process
  • Search Engine Enhancement Work Stream
  1. Validate the Coveo Enterprise Search (CES) Installation
  2. Design the CES Interfaces
  3. Implement the New UI Designs and Search Tuning

Commercial software products from three product vendors were used to create the overall solution. The products and short descriptions of their functionality are provided below.

  • Smartlogic Semaphore software was used for ontology model development and management, ontology-driven classification, and browsing search results. The specific software products purchased from Smartlogic were:
    • Ontology Manager: designed to allow multiple users to create, enhance, and browse all types of semantic models, whether they are lists, controlled vocabularies, taxonomies, thesauri, or ontologies. The software covers the full lifecycle of taxonomy development and maintenance. The license includes unlimited semantic visualization (SV) web clients (i.e., unlimited use on the WFLLC’s websites).
    • Semaphore’s Semantic Enhancement Server (SES): a high-speed XML-based index that allows developers to query an ontology or taxonomy in real time and to create and deliver topic maps, faceted search, visualization, topic pages, related content, and other user interface components.
    • Text Miner: used to automatically extract nouns, noun phrases, and other entity types from unstructured text content. The Advanced Language Pack provides entity extraction for a specific language; the WFLLC licensed the English language pack.
    • Semaphore Content Classification and Text Mining Server. Content classification is the process of analyzing a document and adding metadata “tags” that describe it. The tags are drawn from a taxonomy or other form of controlled vocabulary. The server’s modules include:
      • Classification Server. The enterprise scalable classification and text analysis processing engine.
      • Rule and Template Editor. A client tool to generate the rule base templates and build custom rules.
      • Rulebase Generator. The processing stream that generates the rule bases from the Semaphore model.
    • Semaphore for SharePoint is an integrated solution for Microsoft Office SharePoint Server 2007 or SharePoint 2010. This connector extends SharePoint by tightly integrating Semaphore’s automatic classification and taxonomy governance capabilities with SharePoint’s content management functionality.
  • Two Microsoft products were purchased and used in the Content Enrichment solution.
    • Microsoft SharePoint is a web application platform that provides a central location for storing content such as files and documents. This content can be accessed and modified in a web browser or through a client application (typically Microsoft Office) on a desktop or smartphone. SharePoint 2010 supports concurrent editing of documents with Office 2010.
    • Microsoft SQL Server is a relational database server. Its primary function is to store and retrieve data as requested by other software applications, whether they run on the same computer or on another computer across a network (including the Internet).
  • Coveo Enterprise Search was used on the WFLLC websites. The Search Enhancement portion of this project improved the search functionality of the Coveo product. Two Coveo products were used in the Content Enrichment solution.
    • Coveo Enterprise Search, version 6.5. Coveo Enterprise Search is a modular and scalable enterprise search platform that indexes information stored in various repositories throughout the enterprise.
    • SharePoint Connector. Coveo’s SharePoint Connector indexes information stored in SharePoint so that it can be searched alongside other information in Coveo’s search index. Coveo supports multiple SharePoint versions, including SharePoint 2010.
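
Semaphore generates its rule bases from the taxonomy model, and its actual rule format and API are not described here. The snippet below is therefore not Semaphore code. It is a minimal sketch of the general idea of taxonomy-driven tagging, built on a few assumptions: each concept carries a preferred label, alternate labels (synonyms), and an optional broader concept. A document receives a tag when any of a concept's labels appears in its text, and the tag's broader concepts are added as well. The names `Concept`, `TAXONOMY`, `build_rules`, and `classify`, and the sample concepts, are hypothetical illustrations, not the WFLLC taxonomy.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A taxonomy node (hypothetical structure, not Semaphore's model format)."""
    label: str
    synonyms: list[str] = field(default_factory=list)
    broader: str | None = None  # key of the parent concept, if any


# Hypothetical sample taxonomy, for illustration only.
TAXONOMY = {
    "Fire Behavior": Concept("Fire Behavior"),
    "Spotting": Concept("Spotting", ["spot fire", "spot fires"], broader="Fire Behavior"),
    "Safety": Concept("Safety"),
    "Entrapment": Concept("Entrapment", ["burnover", "entrapped"], broader="Safety"),
}


def build_rules(taxonomy):
    """Compile one case-insensitive, whole-word regex per concept from its labels."""
    rules = {}
    for key, concept in taxonomy.items():
        terms = [concept.label, *concept.synonyms]
        pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"
        rules[key] = re.compile(pattern, re.IGNORECASE)
    return rules


def classify(text, taxonomy, rules):
    """Return sorted tags whose rule matches the text, plus all of their broader concepts."""
    tags = set()
    for key, rule in rules.items():
        if rule.search(text):
            node = key
            # Walk up the hierarchy; stop early if this ancestor chain was already added.
            while node is not None and node not in tags:
                tags.add(node)
                node = taxonomy[node].broader
    return sorted(tags)
```

For example, `classify("Crews were entrapped after spot fires crossed the line.", TAXONOMY, build_rules(TAXONOMY))` returns `['Entrapment', 'Fire Behavior', 'Safety', 'Spotting']`. Production rule bases are generally richer than plain label matching. This sketch shows only how a controlled vocabulary can drive consistent tags.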

The solution was developed on Amazon Web Services and on Server Intellect, a third-party hosting provider. The overall technical architecture of the Content Enrichment and Search Enhancement Project is illustrated in the exhibit below. The exhibit shows the three types of functionality that are combined in the overall solution:

  • Taxonomy and Classification, provided by Smartlogic
  • Content Management and User Interface, provided by Microsoft
  • Search, provided by Coveo

[Exhibit: Content Enrichment/Search Enhancement Solution]

The four numbered arcs in the exhibit highlight the integration points between these products:

  1. The Semaphore taxonomy model and classification rules are used to tag the content in the SharePoint content repository.
  2. The Coveo SharePoint connector accesses the SharePoint content repository during content indexing.
  3. The Semaphore SharePoint connector provides enhanced search and browse functionality.
  4. The Coveo SharePoint connector provides enhanced search and facet functionality.
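
Once the connector has indexed the tagged SharePoint content (arcs 2 and 4), the metadata tags can drive facets beside the keyword results. The snippet below is not Coveo's API. It is a minimal sketch under assumed data: the documents, the `title` and `tags` fields, and the functions `search` and `facet_counts` are all hypothetical. Keyword matching is a simple case-insensitive substring test, and each facet count is the number of current hits that carry a given tag.

```python
from collections import Counter

# Hypothetical indexed documents: text fields plus the metadata tags applied during classification.
DOCS = [
    {"id": 1, "title": "Spot fires during burnout", "tags": ["Spotting", "Fire Behavior"]},
    {"id": 2, "title": "Entrapment review", "tags": ["Entrapment", "Safety"]},
    {"id": 3, "title": "Spot fire entrapment near ridge",
     "tags": ["Spotting", "Entrapment", "Safety", "Fire Behavior"]},
]


def search(docs, keyword, selected=()):
    """Keyword-match on title, then keep only documents carrying every selected facet value."""
    kw = keyword.lower()
    hits = [d for d in docs if kw in d["title"].lower()]
    return [d for d in hits if all(tag in d["tags"] for tag in selected)]


def facet_counts(hits):
    """Count how many of the current hits carry each tag (the numbers shown next to each facet)."""
    return Counter(tag for d in hits for tag in d["tags"])
```

Searching for `"spot"` returns documents 1 and 3, with facet counts of 2 for Spotting and Fire Behavior and 1 each for Entrapment and Safety. Selecting the Entrapment facet, as in `search(DOCS, "spot", selected=["Entrapment"])`, narrows the hits to document 3. The counts are then recomputed from the narrowed list, so users can keep drilling down.
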
Results

The WFLLC received a new taxonomy for the wildland fire domain, and all of the WFLLC’s content was richly tagged with appropriate metadata. The Coveo search engine was reconfigured to incorporate the new taxonomy and metadata. Several new search options were implemented, including keyword search, faceted search, advanced search, and browse options. The search-result ranking was tuned to return more accurate results.
