Content Search Guide
CHAPTER 4
This chapter takes you through the steps of the process flow for implementing Autonomy-based conceptual searching in your exteNd Director applications.
The following topics are covered:
exteNd Director comes with a data fetcher that allows you to conduct Autonomy-based searches exclusively on content and metadata stored in the exteNd Director Content Management (CM) repository. The CM subsystem communicates with the exteNd Director Dynamic Reasoning Engine (DRE) through the Search API, as illustrated in this process flow diagram:
In this scenario, you implement conceptual search in your exteNd Director applications by using CM API classes that wrap the Search API. These wrapper classes provide methods for constructing and executing queries on content and metadata that reside in the CM repository and have been indexed by the exteNd Director DRE.
The CM fetcher performs the following search functions automatically:
Imports data from the CM repository into the exteNd Director DRE for indexing (a process called fetching, described in Fetching Content and Metadata). Data must be fetched before it becomes available for searching.
Synchronizes the CM repository and the corresponding DRE database as you change content and metadata. You can control the mode and frequency of the synchronization process, as described in Synchronization mode and Operations that trigger immediate synchronization.
To use Autonomy technology with exteNd Director to search data sources other than the CM repository, you must first license additional fetcher products from Autonomy, Inc., which involves resolving the associated legal and contractual issues. You can then use the exteNd Director Search API to fetch and query content from the custom data source.
Alternatively, you can import your custom data into the CM repository and use the CM wrapper classes for implementing Autonomy-based conceptual search.
The diagram below presents the recommended process flow for implementing support for Autonomy-based searching of content in the CM repository. Subsequent sections describe each of these tasks in detail.
To use Autonomy-based conceptual search capabilities with the CM subsystem, you must configure your environment by:
See Configuring Your Environment for Conceptual Searching for detailed information about these configuration tasks.
The Search and CM subsystems provide APIs that allow you to develop application resources such as search components, JSP pages, and servlets for implementing Autonomy-based search functionality in your exteNd Director applications.
The Search API provides wrapper classes around the Autonomy API that you can use to fetch content from custom data sources, then construct and execute conceptual-style queries against this data.
The CM API wraps the Search API to provide classes and methods specifically for searching the CM repository.
This section describes search operations you can implement in exteNd Director applications.
Here are the key operations you can implement in your exteNd Director applications using the Search API:
For in-depth descriptions of how to use the Search classes and methods, along with illustrative code examples, see Fetching Content and Metadata, Querying Content and Metadata, and the API Reference.
The exteNd Director DRE queues its indexing jobs and executes them asynchronously. As a result, certain documents may take longer to index than others and, consequently, may not be available for querying immediately. Therefore, when implementing Autonomy-based searching, leave a time window that allows the indexing process to finish before you issue queries.
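One way to leave the time window described above is to retry a query a few times, pausing between attempts, instead of querying once right after an update. The following is a minimal sketch in plain Java, not exteNd Director code: `IndexingWait`, `queryWithRetry`, and the `Supplier` that stands in for the real query call are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper; not part of the exteNd Director API.
class IndexingWait {

    /**
     * Runs the supplied query up to maxAttempts times, sleeping delayMillis
     * between attempts, and returns the first non-empty result list.
     * Returns an empty list if every attempt comes back empty.
     */
    static <T> List<T> queryWithRetry(Supplier<List<T>> query,
                                      int maxAttempts,
                                      long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            List<T> results = query.get();
            if (results != null && !results.isEmpty()) {
                return results;
            }
            if (attempt < maxAttempts) {
                try {
                    // Give the asynchronous indexing queue time to catch up.
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    break;
                }
            }
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        // Simulated query: empty on the first two calls, then one hit.
        int[] calls = {0};
        Supplier<List<String>> simulated = () -> {
            calls[0]++;
            List<String> hits = new ArrayList<>();
            if (calls[0] >= 3) {
                hits.add("doc-1");
            }
            return hits;
        };
        List<String> found = queryWithRetry(simulated, 5, 10L);
        System.out.println(found + " after " + calls[0] + " attempts");
    }
}
```

In a real component the supplier would wrap the call that runs the query, and the attempt count and delay would be tuned to how long indexing typically takes in your environment.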
This section describes the best practices for using the Search API and CM API to develop application resources that implement Autonomy-based searching in your exteNd Director applications.
A delegate is a wrapper that hides the location of a service. The delegate model follows the J2EE Business Delegate pattern.
When you use a delegate, you do not need to know whether the service is using a local manager object or an EJB. The delegate initially attempts to instantiate a local manager. If this fails, it attempts to use an EJB instead. This approach allows developers to use the same code on clients and servers to instantiate services.
For more information about delegates, see the chapter on coding Java for exteNd Director applications in Developing exteNd Director Applications.
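The fallback behavior described above can be sketched in plain Java. Every type in this example (`SearchService`, `LocalSearchManager`, `RemoteSearchBean`, `SearchDelegate`) is a hypothetical stand-in invented for illustration; none of them are exteNd Director classes, and the real delegates may resolve their target differently.

```java
// Hypothetical stand-ins only; these are not exteNd Director classes.
class DelegateSketch {

    /** The service contract that callers program against. */
    interface SearchService {
        String describe();
    }

    /** Stand-in for a manager object that lives in the caller's JVM. */
    static class LocalSearchManager implements SearchService {
        LocalSearchManager(boolean available) {
            if (!available) {
                throw new IllegalStateException("no local manager in this JVM");
            }
        }
        public String describe() { return "local manager"; }
    }

    /** Stand-in for a remote EJB that offers the same service. */
    static class RemoteSearchBean implements SearchService {
        public String describe() { return "remote EJB"; }
    }

    /** Tries the local manager first and falls back to the remote bean. */
    static class SearchDelegate implements SearchService {
        private final SearchService target;

        SearchDelegate(boolean localAvailable) {
            SearchService resolved;
            try {
                resolved = new LocalSearchManager(localAvailable);
            } catch (IllegalStateException e) {
                resolved = new RemoteSearchBean();
            }
            this.target = resolved;
        }

        // Callers never learn which implementation answered.
        public String describe() { return target.describe(); }
    }

    public static void main(String[] args) {
        System.out.println(new SearchDelegate(true).describe());
        System.out.println(new SearchDelegate(false).describe());
    }
}
```

The point of the pattern is that calling code is identical in both cases: it holds a `SearchService` and does not care where the work is done.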
The CM API provides a delegate interface for managing search operations in the CM repository; the Search API provides delegates for managing search activities for custom data sources. When implementing Autonomy-based searching in exteNd Director applications, it is recommended that you use these delegates as follows:
When the data source is | Use |
---|---|
CM repository | |
Custom | |
To learn how to use these objects, see Fetching Content and Metadata and Querying Content and Metadata.
Here is the recommended logic flow for implementing search in your exteNd Director application. Add this logic to the getComponentData() method of your component:
Fetch data; that is, import content and metadata from your data source into the exteNd Director DRE for indexing.
exteNd Director comes with a data fetcher for the CM repository. When you use this repository as your data source, the fetching process is done automatically.
If you license other fetchers from Autonomy to work with outside data sources, you need to initiate the fetch process programmatically. Follow these guidelines:
If you want to search | Do this |
---|---|
CM repository | Specify how to synchronize the CM repository with the associated exteNd Director DRE database to ensure that all updates are imported and indexed in a timely manner. You can: |
Data sources other than the CM repository | |
Instantiate a blank query object:
If you want to search | Use |
---|---|
CM repository | com.sssw.cm.factory.EboFactory.getQuery() |
Custom data sources | com.sssw.search.factory.EboFactory.getQuery() |
For descriptions of Autonomy-based query types, see Search Query Types.
For syntax, see Search Query Types.
Set other parameters such as maximum number of results to return and relevance cut.
TIP: You need to call methods on com.sssw.search.api.EbiQuery.
Get an object for running the query:
If you want to search | Do this |
---|---|
CM repository | Get an object that implements the EbiContentMgmtDelegate interface, as described in Querying the CM repository. |
Custom data sources | Get an object that implements the EbiQueryEngineDelegate interface, as described in Querying custom data sources. |
For details on implementing these steps, see Fetching Content and Metadata and Querying Content and Metadata.
The following code segment demonstrates how to construct and execute an Autonomy-based search query against the CM repository:
    ...
    // Instantiate a blank query object
    com.sssw.search.api.EbiQuery query = com.sssw.cm.factory.EboFactory.getQuery();

    // Specify the query type
    query.setQueryType(query.QUERY_TYPE_TEXT);
    query.setText("animal+mammal");

    // Select all columns
    query.selectAll();
    // OR ... select individual columns, like doc id and title
    //query.select(com.sssw.cm.core.EbiCmConstants.DOCID);
    //query.select(com.sssw.cm.core.EbiCmConstants.TITLE);

    // Ask for a maximum of 50 results
    query.setMaxNumResults(50);
    // Ask for results that are at least 80% relevant
    query.setRelevanceCut(80);

    // Get the content manager delegate
    EbiContentMgmtDelegate contentMgr =
        com.sssw.cm.client.EboFactory.getDefaultContentMgmtDelegate();

    // Run the query
    Iterator iterResults = contentMgr.runQuery(context, query, true).iterator();

    // Process results
    while (iterResults.hasNext()) {
        com.sssw.cm.api.EbiQueryResult res =
            (com.sssw.cm.api.EbiQueryResult) iterResults.next();
        System.out.println("DOCID:" + res.getID());
        System.out.println("TITLE:" + res.getTitle());
        String content = (res.getData() != null) ? new String(res.getData()) : "none";
        System.out.println("CONTENT:" + content);
        System.out.println("RELEVANCE:" + res.getIntegerProperty(res.PROP_DOC_WEIGHT));
        System.out.println("QUICK SUMMARY:" + res.getProperty(res.PROP_DOC_QUICK_SUMMARY));
    }
    ...
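To make the combined effect of the maximum-results and relevance-cut settings concrete, here is a small plain-Java sketch that applies both limits to an in-memory list. It is an illustration only: the real filtering is done by the DRE, `RelevanceCutSketch` and `Hit` are hypothetical names, and the best-first ordering is an assumption rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in types; the real filtering happens inside the DRE.
class RelevanceCutSketch {

    static final class Hit {
        final String docId;
        final int weight; // relevance as a percentage, 0 to 100

        Hit(String docId, int weight) {
            this.docId = docId;
            this.weight = weight;
        }
    }

    /** Keeps hits with weight >= relevanceCut, best first, capped at maxResults. */
    static List<Hit> apply(List<Hit> hits, int relevanceCut, int maxResults) {
        List<Hit> kept = new ArrayList<>();
        for (Hit h : hits) {
            if (h.weight >= relevanceCut) {
                kept.add(h);
            }
        }
        // Assumed ordering: most relevant first.
        kept.sort(Comparator.comparingInt((Hit h) -> h.weight).reversed());
        if (kept.size() > maxResults) {
            return new ArrayList<Hit>(kept.subList(0, maxResults));
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("doc-a", 92));
        hits.add(new Hit("doc-b", 75));
        hits.add(new Hit("doc-c", 85));
        // Same limits as the query above: at most 50 results, at least 80% relevant.
        for (Hit h : apply(hits, 80, 50)) {
            System.out.println(h.docId + " " + h.weight);
        }
    }
}
```

With these sample values, doc-b falls below the cut of 80 and is dropped, while the cap of 50 has no effect because only two hits remain.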
If you plan to add and update content in the CM repository programmatically using the CM API, you also need to write components and related resources that implement this logic. To learn about all the ways to interact with the CM repository, see Updating content in the CM repository.
As you begin to develop your search application resources, you must make decisions about how to package them inside your exteNd Director project. Follow the guidelines in the chapter on using resource sets in Developing exteNd Director Applications.
After you have incorporated your custom resources in the exteNd Director project, you are ready to deploy the application to your application server, as described in Building, archiving, and deploying your application next.
You build, archive, and deploy your application in exteNd Director just as you would any J2EE application.
For server-specific guidelines, see the chapter on deploying exteNd Director applications in Developing exteNd Director Applications.
There are several ways to add and update content in the CM repository, each described in this section. You can use any combination of these methods as long as you include the appropriate subsystems in your exteNd Director project, as described in Determining your exteNd Director project configuration.
The CMS Administration Console is part of the exteNd Director Web tier, a prebuilt Web application that provides a graphical user interface for creating, updating, and publishing content in the CM repository. When you create content in the CMS Administration Console, you can take advantage of the exteNd Director CM features designed to facilitate searching, including the ability to define and associate custom metadata with documents.
For more information, see the chapter on the CMS Administration Console in the Content Management Guide.
You use classes and methods in the CM API to write exteNd Director components, servlets, and JSP pages to create, update, and publish content in the CM repository.
For more information, see the chapter on managing documents in the Content Management Guide.
You can create and update content in third-party applications. Before you can perform conceptual searches on this content, you must take additional steps:
Ensure that your third-party application produces content in a format that can be searched by the Search subsystem.
For information about configuring search formats, see Importable MIME types and Importable file extensions.
Import and publish the third-party content in the CM repository.
If your third-party application is a WebDAV-enabled client, you can use the exteNd Director WebDAV servlet to transfer content into the CM repository, as described in the chapter on using WebDAV clients with exteNd Director for collaborative authoring in the Content Management Guide.
IMPORTANT: Third-party content imported into the CM repository via WebDAV is saved as system resources. A limitation of system resources is that you cannot define custom metadata for them. However, you can still search their content and standard metadata. This limitation does not apply if you use the exteNd Director content import utility or create your own WebDAV client using the client API provided by exteNd Director. You can associate custom metadata with content created in this type of custom-designed WebDAV client.
For more information, see the chapters on importing content and building your own WebDAV client in the Content Management Guide.
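For orientation, a WebDAV transfer is at its core an HTTP PUT of the document to a collection URL. The sketch below builds such a request with the standard `java.net.http` API (Java 11 or later) without sending it. The servlet URL, the file name, and the `WebDavPutSketch` class are hypothetical placeholders, and authentication is omitted; this is not the exteNd Director WebDAV client API.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Generic HTTP sketch; not the exteNd Director WebDAV client API.
class WebDavPutSketch {

    /**
     * Builds (but does not send) a PUT request that would upload body as
     * fileName under collectionUrl. fileName must already be URL-safe.
     */
    static HttpRequest buildPut(String collectionUrl, String fileName, String body) {
        String base = collectionUrl.endsWith("/") ? collectionUrl : collectionUrl + "/";
        return HttpRequest.newBuilder(URI.create(base + fileName))
                .header("Content-Type", "text/html; charset=UTF-8")
                .PUT(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical URL; substitute the address of your own WebDAV servlet.
        HttpRequest request = buildPut(
                "http://localhost:8080/example/webdav/docs",
                "mammals.html",
                "<html><body>Mammals are warm-blooded animals.</body></html>");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To send the request, pass it to `java.net.http.HttpClient.send(...)` with a suitable body handler, after adding whatever credentials your server requires.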
You can use the exteNd Director DRE Administration console to test your queries in isolation before you deploy your application.
For more information, see Testing queries.
exteNd Director provides several techniques for debugging your search application and correcting commonly encountered problems.
For more information, see Troubleshooting the Conceptual Search Process.
Copyright © 2004 Novell, Inc. All rights reserved. Copyright © 1997, 1998, 1999, 2000, 2001, 2002, 2003 SilverStream Software, LLC. All rights reserved.