Content Search Guide

CHAPTER 4

Implementing Conceptual Search

This chapter walks you through the process flow for implementing Autonomy-based conceptual searching in your exteNd Director applications.

The following topics are covered:

 
Searching the CM repository: how the Search and Content Management APIs are integrated

exteNd Director comes with a data fetcher that allows you to conduct Autonomy-based searches exclusively on content and metadata stored in the exteNd Director Content Management (CM) repository. The CM subsystem communicates with the exteNd Director Dynamic Reasoning Engine (DRE) through the Search API, as illustrated in this process flow diagram:

[Figure: process flow between the CM subsystem, the Search API, and the exteNd Director DRE]

In this scenario, you implement conceptual search in your exteNd Director applications by using CM API classes that wrap the Search API. These wrapper classes provide methods for constructing and executing queries on content and metadata that reside in the CM repository and have been indexed by the exteNd Director DRE.

The CM fetcher performs the following search functions automatically:

 
Searching data sources other than the CM repository

To use Autonomy technology with exteNd Director to search data sources other than the CM repository, you must first resolve the legal and contractual issues with Autonomy, Inc., for example by purchasing additional fetcher products. You can then use the exteNd Director Search API to fetch and query content from the custom data source.

Alternatively, you can import your custom data into the CM repository and use the CM wrapper classes for implementing Autonomy-based conceptual search.

 
The process flow for implementing conceptual searching

The diagram below presents the recommended process flow for implementing support for Autonomy-based searching of content in the CM repository. Subsequent sections describe each of these tasks in detail.

[Figure: recommended process flow for implementing Autonomy-based searching of the CM repository]

 
Configuring your project and search environment

To use Autonomy-based conceptual search capabilities with the CM subsystem, you must configure your environment by:

For more information about these configuration tasks, see Configuring Your Environment for Conceptual Searching.

 
Developing application resources

The Search and CM subsystems provide APIs that allow you to develop application resources such as search components, JSP pages, and servlets for implementing Autonomy-based search functionality in your exteNd Director applications.

The Search API provides wrapper classes around the Autonomy API that you can use to fetch content from custom data sources, then construct and execute conceptual-style queries against this data.

The CM API wraps the Search API to provide classes and methods specifically for searching the CM repository.

 
Implementing search operations

This section describes search operations you can implement in exteNd Director applications.

Types of operations

Here are the key operations you can implement in your exteNd Director applications using the Search API:

Operation | Description | For more information
Fetch | Import structured and unstructured data into the query engine, where it is indexed for querying | See Fetching Content and Metadata
Query | Create and execute queries on indexed content and process the results | See Querying Content and Metadata and Search Query Types

For in-depth descriptions of how to use the Search classes and methods, along with illustrative code examples, see Fetching Content and Metadata, Querying Content and Metadata, and the API Reference.

The relationship between indexing and querying

The exteNd Director DRE queues its indexing jobs and executes them asynchronously. As a result, certain documents may take longer to index than others and, consequently, may not be available for querying immediately. Therefore, when implementing Autonomy-based searching, leave a time window that allows the indexing process to finish before you issue queries.
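
One way to leave that time window is to retry a lookup a few times with a pause between attempts. The following is a minimal, self-contained sketch of that idea and is not part of the exteNd Director API: the class IndexingWait, the method pollUntilPresent(), and the stand-in lookup in main() are all hypothetical names chosen for illustration. In a real component, the lookup would wrap whatever call runs your query.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper (not an exteNd Director class): retries a lookup
// until it returns a value or the allowed attempts are used up.
final class IndexingWait {

    private IndexingWait() {
    }

    // Calls the lookup up to maxAttempts times, sleeping for the given delay
    // between attempts. Returns the first non-empty result, or an empty
    // Optional if nothing shows up or the thread is interrupted.
    static <T> Optional<T> pollUntilPresent(Supplier<Optional<T>> lookup,
                                            int maxAttempts,
                                            Duration delay) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = lookup.get();
            if (result.isPresent()) {
                return result;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay.toMillis());
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and stop waiting.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for a real query: the document becomes visible on the third call.
        int[] calls = {0};
        Supplier<Optional<String>> fakeQuery =
                () -> ++calls[0] >= 3 ? Optional.of("doc-42") : Optional.<String>empty();

        Optional<String> hit = pollUntilPresent(fakeQuery, 5, Duration.ofMillis(10));
        System.out.println(hit.orElse("not indexed yet") + " after " + calls[0] + " attempts");
    }
}
```

Bounding the number of attempts keeps a slow indexing job from blocking the caller indefinitely; suitable values for the attempt count and delay depend on your content volume and are not specified by this guide.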

 
Programming practices

This section describes the best practices for using the Search API and CM API to develop application resources that implement Autonomy-based searching in your exteNd Director applications.

Using delegates

A delegate is a wrapper that hides the location of a service. The delegate model follows the J2EE Business Delegate pattern.

When you use a delegate, you do not need to know whether the service is using a local manager object or an EJB. The delegate initially attempts to instantiate a local manager. If this fails, it attempts to use an EJB instead. This approach allows developers to use the same code on clients and servers to instantiate services.
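
The fallback behavior described above can be sketched in plain Java. This is a generic illustration of the Business Delegate idea under stated assumptions, not the exteNd Director implementation: SearchService, SearchDelegate, and the two locator lambdas are hypothetical stand-ins for a local manager lookup and an EJB lookup.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical service contract; not part of the exteNd Director API.
interface SearchService {
    String describe();
}

// Hypothetical delegate: hides which implementation ends up serving the calls.
final class SearchDelegate implements SearchService {

    private final SearchService target;

    // Tries each locator in order and keeps the first service that can be obtained.
    SearchDelegate(List<Supplier<SearchService>> locators) {
        SearchService found = null;
        for (Supplier<SearchService> locator : locators) {
            try {
                found = locator.get();
            } catch (RuntimeException e) {
                // This location is unavailable; fall through to the next one.
                found = null;
            }
            if (found != null) {
                break;
            }
        }
        if (found == null) {
            throw new IllegalStateException("No search service available");
        }
        this.target = found;
    }

    @Override
    public String describe() {
        return target.describe();
    }

    public static void main(String[] args) {
        // Stand-in for a local manager that cannot be instantiated.
        Supplier<SearchService> local = () -> {
            throw new IllegalStateException("no local manager");
        };
        // Stand-in for a remote (EJB-style) service that is reachable.
        Supplier<SearchService> remote = () -> () -> "remote service";

        SearchService service = new SearchDelegate(Arrays.asList(local, remote));
        System.out.println(service.describe());
    }
}
```

The calling code only sees SearchService, so it stays the same whether the local or the remote implementation is selected.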

For more information about delegates, see the chapter on coding Java for exteNd Director applications in Developing exteNd Director Applications.

The CM API provides a delegate interface for managing search operations in the CM repository; the Search API provides delegates for managing search activities for custom data sources. When implementing Autonomy-based searching in exteNd Director applications, it is recommended that you use these delegates as follows:

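When the data source is | Use
CM repository | EbiContentMgmtDelegate
Custom | EbiDataFetcherDelegate (to fetch data) and EbiQueryEngineDelegate (to run queries)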

To learn how to use these objects, see Fetching Content and Metadata and Querying Content and Metadata.

Logic flow

Here is the recommended logic flow for implementing search in your exteNd Director application. Add this logic to the getComponentData() method of your component:

  1. Fetch data—that is, import content and metadata from your data source into the exteNd Director DRE for indexing.

    exteNd Director comes with a data fetcher for the CM repository. When you use this repository as your data source, the fetching process is done automatically.

    If you license other fetchers from Autonomy to work with outside data sources, you need to initiate the fetch process programmatically. Follow these guidelines:

    To search the CM repository:

    Specify how to synchronize the CM repository with the associated exteNd Director DRE database to ensure that all updates are imported and indexed in a timely manner. You can:

    To search data sources other than the CM repository:

      1. Work out legal and contractual issues with Autonomy, Inc. One option is to purchase additional fetcher products from Autonomy, Inc.

      2. Create a descriptor for each data fetcher, as described in Data fetcher descriptors.

      3. For each data source, instantiate an object that implements the EbiDataFetcherDelegate interface.

      4. Call the fetchData() method on each EbiDataFetcherDelegate object to import data from the data source into the associated exteNd Director DRE for indexing.

    For more information, see Fetching Content and Metadata.

  2. Instantiate a blank query object:

    If you want to search | Use
    CM repository | com.sssw.cm.factory.EboFactory.getQuery()
    Custom data sources | com.sssw.search.factory.EboFactory.getQuery()

  3. Set the query type.

    For descriptions of Autonomy-based query types, see Search Query Types.

  4. Specify the query string.

    For syntax, see Search Query Types.

  5. Set other parameters such as maximum number of results to return and relevance cut.

    TIP:   Set these parameters by calling methods on com.sssw.search.api.EbiQuery.

  6. Get an object for running the query:

    If you want to search | Do this
    CM repository | Get an object that implements the EbiContentMgmtDelegate interface, as described in Querying the CM repository.
    Custom data sources | Get an object that implements the EbiQueryEngineDelegate interface, as described in Querying custom data sources.

  7. Run the query.

  8. Process the results.

For details on implementing these steps, see Fetching Content and Metadata and Querying Content and Metadata.
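
To make the parameters in step 5 concrete, the following self-contained sketch mimics what a maximum result count and a relevance cut are presumed to do to a result set. It does not call the Search API: ResultTrimmer, Hit, and trim() are hypothetical names, and ordering by descending weight is an assumption made for the illustration. In the product, these limits are set on the query object rather than applied by your own code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not an exteNd Director class.
final class ResultTrimmer {

    // Stand-in for a query result: a document id and a relevance weight (0-100).
    static final class Hit {
        final String docId;
        final int weight;

        Hit(String docId, int weight) {
            this.docId = docId;
            this.weight = weight;
        }
    }

    private ResultTrimmer() {
    }

    // Keeps hits whose weight is at least relevanceCut, orders them from most
    // to least relevant, and returns at most maxNumResults of them.
    static List<Hit> trim(List<Hit> hits, int relevanceCut, int maxNumResults) {
        if (maxNumResults < 0) {
            throw new IllegalArgumentException("maxNumResults must not be negative");
        }
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : hits) {
            if (hit.weight >= relevanceCut) {
                kept.add(hit);
            }
        }
        kept.sort(Comparator.comparingInt((Hit hit) -> hit.weight).reversed());
        if (kept.size() > maxNumResults) {
            return new ArrayList<>(kept.subList(0, maxNumResults));
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("a", 95), new Hit("b", 60), new Hit("c", 85), new Hit("d", 90));
        // A relevance cut of 80 drops "b"; a maximum of 2 then keeps the two best hits.
        for (Hit hit : trim(hits, 80, 2)) {
            System.out.println(hit.docId + " " + hit.weight);
        }
    }
}
```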

Code example: querying the CM repository

The following code segment demonstrates how to construct and execute an Autonomy-based search query against the CM repository:

  ...
  //Instantiate a blank query object
  com.sssw.search.api.EbiQuery query = com.sssw.cm.factory.EboFactory.getQuery();

  //Specify the query type and the query text
  query.setQueryType(query.QUERY_TYPE_TEXT);
  query.setText("animal+mammal");

  //Select all columns
  query.selectAll();

  //OR ... Select individual columns, like doc id and title
  //query.select(com.sssw.cm.core.EbiCmConstants.DOCID);
  //query.select(com.sssw.cm.core.EbiCmConstants.TITLE);

  //Ask for a maximum of 50 results
  query.setMaxNumResults(50);

  //Ask for results that are at least 80% relevant
  query.setRelevanceCut(80);

  //Get the content manager delegate
  //(EbiContentMgmtDelegate is assumed to be imported in the enclosing class)
  EbiContentMgmtDelegate contentMgr = com.sssw.cm.client.EboFactory.getDefaultContentMgmtDelegate();

  //Run the query
  //(context is assumed to be available in the enclosing method, such as getComponentData())
  java.util.Iterator iterResults = contentMgr.runQuery(context, query, true).iterator();

  //Process results: print the id, title, content, relevance, and summary of each hit
  while (iterResults.hasNext()) {
     com.sssw.cm.api.EbiQueryResult res = (com.sssw.cm.api.EbiQueryResult)iterResults.next();
     System.out.println("DOCID:" + res.getID());
     System.out.println("TITLE:" + res.getTitle());
     String content = (res.getData() != null) ? new String(res.getData()) : "none";
     System.out.println("CONTENT:" + content);
     System.out.println("RELEVANCE:" + res.getIntegerProperty(res.PROP_DOC_WEIGHT));
     System.out.println("QUICK SUMMARY:" + res.getProperty(res.PROP_DOC_QUICK_SUMMARY));
  }
  ...

 
Interacting with the CM repository

If you plan to add and update content in the CM repository programmatically using the CM API, you also need to write components and related resources that implement this logic. To learn about all the ways to interact with the CM repository, see Updating content in the CM repository.

 
Packaging application resources

As you begin to develop your search application resources, you must make decisions about how to package them inside your exteNd Director project. Follow the guidelines in the chapter on using resource sets in Developing exteNd Director Applications.

After you have incorporated your custom resources in the exteNd Director project, you are ready to deploy the application to your application server, as described in Building, archiving, and deploying your application.

 
Building, archiving, and deploying your application

You build, archive, and deploy your application in exteNd Director just as you would any J2EE application.

For server-specific guidelines, see the chapter on deploying exteNd Director applications in Developing exteNd Director Applications.

 
Updating content in the CM repository

There are several ways to add and update content in the CM repository, each described in this section. You can use any combination of these methods as long as you include the appropriate subsystems in your exteNd Director project, as described in Determining your exteNd Director project configuration.

 
Updating content in the CMS Administration Console

The CMS Administration Console is part of the exteNd Director Web tier, a prebuilt Web application that provides a graphical user interface for creating, updating, and publishing content in the CM repository. When you create content in the CMS Administration Console, you can take advantage of the exteNd Director CM features designed to facilitate searching, including the ability to define and associate custom metadata with documents.

For more information, see the chapter on the CMS Administration Console in the Content Management Guide.

 
Updating content using the CM API

You use classes and methods in the CM API to write exteNd Director components, servlets, and JSP pages to create, update, and publish content in the CM repository.

For more information, see the chapter on managing documents in the Content Management Guide.

 
Creating and updating content in third-party applications

You can create and update content in third-party applications. Before you can perform conceptual searches on this content, you must take additional steps:

If your third-party application is a WebDAV-enabled client, you can use the exteNd Director WebDAV servlet to transfer content into the CM repository, as described in the chapter on using WebDAV clients with exteNd Director for collaborative authoring in the Content Management Guide.

IMPORTANT:   Third-party content imported into the CM repository via WebDAV is saved as system resources. A limitation of system resources is that you cannot define custom metadata for them. However, you can still search their content and standard metadata. This limitation does not apply if you use the exteNd Director content import utility or create your own WebDAV client using the client API provided by exteNd Director. You can associate custom metadata with content created in this type of custom-designed WebDAV client.

For more information, see the chapters on importing content and building your own WebDAV client in the Content Management Guide.

 
Testing queries

You can use the exteNd Director DRE Administration console to test your queries in isolation before you deploy your application.

For more information, see Testing queries.

 
Troubleshooting the search application

exteNd Director provides several techniques for debugging your search application and correcting commonly encountered problems.

For more information, see Troubleshooting the Conceptual Search Process.



Copyright © 2004 Novell, Inc. All rights reserved. Copyright © 1997, 1998, 1999, 2000, 2001, 2002, 2003 SilverStream Software, LLC. All rights reserved.