Content Search Guide

CHAPTER 6

Querying Content and Metadata

This chapter describes the content query process and explains how to implement querying in your exteNd Director search applications.

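The following topics are covered:

  - About querying
  - Types of queries you can run
  - Implementing querying for the CM repository
  - Implementing querying for custom data sources
  - Search query descriptors
  - Sorting query results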

 
About querying

Queries are structured expressions that you can use to search content from a data source. Autonomy-based search capabilities in exteNd Director allow you to query both content and metadata using a single query expression, rather than requiring you to write separate queries for each type of data.

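Querying metadata   In exteNd Director you can query two types of metadata: standard metadata (built-in document fields such as the title and the date created) and custom metadata (fields that you define).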

Querying content   You can query content only if it has been published.

 
Querying the CM repository

exteNd Director comes with a data fetcher that allows you to conduct Autonomy-based searches exclusively on content and metadata stored in the exteNd Director CM repository.

To query the CM repository, you use the CM subsystem in conjunction with the Search subsystem. The CM API provides classes that wrap the relevant search functions associated with the CM repository, as described in Implementing querying for the CM repository.

 
Querying custom data sources

To use Autonomy technology with exteNd Director to search data sources other than the CM repository, you must purchase additional data fetchers from Autonomy, Inc.

To query custom data sources, you must use Search API classes to instantiate a query object and run the query against the other data sources you are licensed to use, as described in Implementing querying for custom data sources.

Alternatively, you can import your custom data into the CM repository and use the CM wrapper classes for implementing Autonomy-based conceptual queries.

 
Types of queries you can run

The exteNd Director Search subsystem supports the following types of queries:

For detailed descriptions of each type of query, including syntax definitions and code examples showing how to specify each query type, see Search Query Types.

 
Implementing querying for the CM repository

To implement Autonomy-based conceptual and keyword search in your exteNd Director applications, you use CM API functionality that wraps the relevant Search APIs:

| This CM class | Does this | To this search class |
|---|---|---|
| com.sssw.cm.api.EbiContentMgmtDelegate.runQuery() | Wraps | com.sssw.search.api.EbiQueryEngineDelegate.runQuery() |
| com.sssw.cm.api.EbiQueryResult | Extends | com.sssw.search.api.EbiQueryResult |

The wrapper classes provide methods for constructing and running queries on content and metadata that reside in the CM repository and have been indexed by the exteNd Director Dynamic Reasoning Engine (DRE).

In addition, you can configure your environment to manage the processes of document fetching and querying, as described in Setting Search Options.

 
Key classes and interfaces for querying the CM repository

Key classes and interfaces for querying the CM repository include:

| Class or interface | Description | Package |
|---|---|---|
| EbiContentMgmtDelegate | Delegate for accessing objects that implement the EbiContentManager interface. NOTE: EbiContentManager is an interface that provides methods for accessing standard metadata, custom metadata, and content in the CM repository. | com.sssw.cm.api |
| EbiQuery | Interface that provides methods for constructing various types of Autonomy-based queries and setting query properties | com.sssw.search.api |
| EbiQueryResult | Interface that provides methods for processing the results of Autonomy-based queries of content and metadata in the CM repository | com.sssw.cm.api |
| EboFactory | Factory class that provides methods for getting content manager delegates | com.sssw.cm.client |
| EboFactory | Server-side factory class that provides methods for instantiating objects used by the CM subsystem | com.sssw.cm.factory |

 
Methods for querying the CM repository

This section describes the CM API methods you can use to query the CM repository in your exteNd Director applications.

Getting a content manager delegate

Here is the method for getting a content manager delegate:

  com.sssw.cm.client.EboFactory.getDefaultContentMgmtDelegate()

This method returns a content manager delegate associated with the default CM repository. The content manager delegate provides methods for running Autonomy-based queries on document content and metadata in this repository.

For information about why to use delegates, see Programming practices.

Instantiating query objects for the CM repository

Autonomy-based queries are based on the EbiQuery interface—an interface that resides in the Search API. To search content and metadata in the CM subsystem, you must instantiate a query object that not only implements this interface but also is associated with the CM repository. The CM API provides the method to use:

  com.sssw.cm.factory.EboFactory.getQuery()

Using this query object, you can call Search API methods to construct Autonomy-based queries and fine-tune search results, as described in Constructing queries for the CM repository.

Constructing queries for the CM repository

Here are key methods for constructing queries for the CM repository:

| Method | Description |
|---|---|
| com.sssw.search.api.EbiQuery.setQueryType() | Specifies the type of query you want to run. For more information, see Search Query Types. |
| com.sssw.search.api.EbiQuery.setQueryText() | Specifies the query string |
| com.sssw.search.api.EbiQuery.setMaxNumResults() | Sets the maximum number of results to return |
| com.sssw.search.api.EbiQuery.setRelevanceCut() | Sets the minimum relevance criteria for query results |

NOTE:   You use the same methods for constructing Autonomy-based queries for custom data sources. The difference is that you call these methods on a query object instantiated from a factory in the Search API, as described in Implementing querying for custom data sources.

Issuing queries against the CM repository

Here is the method for querying the CM repository:

  com.sssw.cm.api.EbiContentMgmtDelegate.runQuery()

This method runs a query that you construct using the com.sssw.search.api.EbiQuery interface and returns the results as a collection of objects that implements the com.sssw.cm.api.EbiQueryResult interface.

 
Code example: issuing an Autonomy-based query against the CM repository

The following code segment demonstrates how to instantiate a query object and run a query against the default CM repository, called Default:

  ...
  public void getComponentData( EbiPortalContext context, java.util.Map params ) throws com.sssw.fw.exception.EboUnrecoverableSystemException 
  {
     //Declare a string buffer
     StringBuffer sb = new StringBuffer();
  
     //Set the query string
     String queryString = "The+effect+of+the+recession+on+consumer+spending";
  
     try
     {
        //Create a blank query object
        com.sssw.search.api.EbiQuery query = com.sssw.cm.factory.EboFactory.getQuery();
  
        //Set query type to text
        query.setQueryType(query.QUERY_TYPE_TEXT);
  
        //Specify the query string; this is a conceptual query
        query.setText(queryString);
  
        //Ask for a maximum of 50 results
        query.setMaxNumResults(50);
  
        //Ask for results that are at least 80% relevant
        query.setRelevanceCut(80);
  
        //Ask to return all available document properties in the results
        query.selectAll();
  
        //Get the content manager delegate
        com.sssw.cm.api.EbiContentMgmtDelegate contentMgr = com.sssw.cm.client.EboFactory.getDefaultContentMgmtDelegate();
  
        //Run the query
        //The boolean argument in runQuery indicates whether results should be filtered
        java.util.Iterator iterResults = contentMgr.runQuery(context, query, true).iterator();
  
        //Process the results
        while (iterResults.hasNext())
        {
           com.sssw.cm.api.EbiQueryResult res = (com.sssw.cm.api.EbiQueryResult)iterResults.next();
           //Get document metadata
           String docTitle = res.getTitle();
           java.sql.Timestamp dateCreated = res.getDateCreated();
  
           //Get document content
           String docAbstract = res.getAbstract();
  
           //Add query result to the string buffer returned by the component
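           //Each property goes on its own line so that adjacent values do not run together
           sb.append("\n").append(docTitle).append("\n").append(dateCreated).append("\n").append(docAbstract).append("\n");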
        }
     }
     catch (Exception _E)
     {
        System.out.println ("Query failed");
        _E.printStackTrace();
     }
     //Set content type
     context.setContentType(com.sssw.portal.api.EbiComponentConstants.MIME_TYPE_HTML_UTF8);
  
     //Place the content into the context
     context.setComponentContent( sb.toString() );
  }
  ...

As you can see, this component retrieves both standard metadata—the document title and the date created—and content from the query results. By default, the exteNd Director DRE is configured to index both types of information. This behavior is controlled by two search options that are enabled by default in the CM subsystem configuration file:

The CM subsystem provides many other options that you can configure to customize your search environment, as described in Setting Search Options.

 
Implementing querying for custom data sources

The exteNd Director Search API provides wrapper classes around Autonomy APIs that provide methods for querying content in data sources other than the CM repository.

IMPORTANT:   To use Autonomy technology with exteNd Director to search other data sources, you must purchase additional data fetchers from Autonomy, Inc.

 
Key query classes and interfaces for querying custom data sources

Key classes and interfaces for querying custom data sources include:

| Class or interface | Description | Package |
|---|---|---|
| EbiQueryEngineDelegate | Delegate for accessing objects that implement the EbiQueryEngine interface, which provides methods for configuring the query engine and processing queries. NOTE: EbiQueryEngine is an interface that provides methods for interacting with the DRE. | com.sssw.search.api |
| EbiQuery | Interface that provides methods for constructing various types of queries and setting query properties | com.sssw.search.api |
| EbiQueryResult | Interface that provides methods for processing the results of queries executed by the Search subsystem | com.sssw.search.api |
| EboFactory | Factory class that provides methods for getting Search subsystem delegates such as EbiQueryEngineDelegate | com.sssw.search.client |
| EboFactory | Server-side factory class that provides methods for instantiating objects used by the Search subsystem, such as an EbiQuery object | com.sssw.search.factory |

 
Query methods

This section describes Search API methods that you can use for querying custom data sources in your exteNd Director applications.

Getting a query engine delegate

Here is the method for getting a query engine delegate:

  com.sssw.search.client.EboFactory.getQueryEngineDelegate()

This method returns an object that implements the EbiQueryEngineDelegate interface. Methods on this object can be used to configure the query engine and run queries.

For information about why to use delegates, see Programming practices.

Instantiating query objects for custom data sources

Autonomy-based queries are based on the EbiQuery interface that resides in the Search subsystem API. To search content and metadata in custom data sources, you must instantiate a query object that implements this interface. Here is the method to use:

  com.sssw.search.factory.EboFactory.getQuery()

Using this query object, you can call Search API methods to construct Autonomy-based queries and fine-tune search results.

Constructing queries for custom data sources

Here are key methods for constructing Autonomy-based queries for custom data sources:

| Method | Description |
|---|---|
| com.sssw.search.api.EbiQuery.setQueryType() | Specifies the type of query you want to run. For more information, see Search Query Types. |
| com.sssw.search.api.EbiQuery.setQueryText() | Specifies the query string |
| com.sssw.search.api.EbiQuery.setMaxNumResults() | Sets the maximum number of results to return |
| com.sssw.search.api.EbiQuery.setRelevanceCut() | Sets the minimum relevance criteria for query results |

NOTE:   You use the same methods for constructing Autonomy-based queries for the CM repository. The difference is that you call these methods on a query object instantiated from a factory in the CM API, as described in Implementing querying for the CM repository.

Issuing queries against custom data sources

Here is the method for issuing queries:

  com.sssw.search.api.EbiQueryEngineDelegate.runQuery()

This method runs a query that you construct using the com.sssw.search.api.EbiQuery interface and returns the results as a collection of objects that implements the com.sssw.search.api.EbiQueryResult interface.

 
Code example: issuing an Autonomy-based query against a custom data source

The following code segment presents the getComponentData() method of an exteNd Director component that implements the logic for issuing an Autonomy-based conceptual query against a custom data source:

  ...
  public void getComponentData( EbiPortalContext context, java.util.Map params ) throws com.sssw.fw.exception.EboUnrecoverableSystemException 
  {
     //Declare a string buffer
     StringBuffer sb = new StringBuffer();
  
     //Set the query string, using syntax for a conceptual query
     String queryString = "physician+specialty+orthopaedics";
  
     try
     {
        //Create a blank query object
        com.sssw.search.api.EbiQuery query = com.sssw.search.factory.EboFactory.getQuery();
  
        //Set query type to text
        query.setQueryType(query.QUERY_TYPE_TEXT);
  
        //Specify the query string; this is a conceptual query
        query.setText(queryString);
  
        //Ask for a maximum of 50 results
        query.setMaxNumResults(50);
  
        //Ask for results that are at least 80% relevant
        query.setRelevanceCut(80);
  
        //Ask to return all available document properties in the results
        query.selectAll();
  
        //Get the query engine delegate from the client factory
        com.sssw.search.api.EbiQueryEngineDelegate qe = com.sssw.search.client.EboFactory.getQueryEngineDelegate();
  
        //Run the query
        java.util.Iterator iterResults = qe.runQuery(context, query, null, true).iterator();
  
        //Process the results
        while (iterResults.hasNext())
        {
           com.sssw.search.api.EbiQueryResult res = (com.sssw.search.api.EbiQueryResult)iterResults.next();
  
           //Get document metadata
           String docTitle = res.getTitle();
           java.sql.Timestamp dateCreated = res.getDateCreated();
  
           //Get document abstract
           String docAbstract = res.getAbstract();
  
           //Add query result to the string buffer returned by the component
           sb.append("\n").append(docTitle).append("\n").append(dateCreated).append("\n").append(docAbstract).append("\n");
        }
     }
     catch (Exception _E)
     {
        System.out.println ("Query failed");
        if (m_log.isError())
           m_log.error(_E);
     }
     //Set content type
     context.setContentType(com.sssw.portal.api.EbiComponentConstants.MIME_TYPE_HTML_UTF8);
  
     //Place the content into the context
     context.setComponentContent( sb.toString() );
  }
  ...

 
Search query descriptors

You can construct search query descriptors as XML files that can be used to initialize the search query object. The XML for a search query definition must conform to the rules specified in search-query-def_4_0.dtd, a file that resides in the DTD folder within the SearchService.jar of your exteNd Director project library folder.

 
Advantages of using query descriptors

There are several advantages to initializing a query object programmatically from an XML query descriptor:

 
Query type element

Every search query definition contains an element for specifying the query type:

  <!ELEMENT search-query-def (text-query | fuzzy-query | get-all-query | suggest-query | name-search-query)>

In turn, each query type element provides properties for refining the query. For example, consider the text query element:

  <!ELEMENT text-query (query-text?, field-spec?, query-options?, selected-props?)>

Using this definition, you can construct a field query by defining a field specifier list in the field-spec property that indicates which metadata to search for the text defined in query-text.
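The field specifier syntax that appears in this chapter's examples (terms of the form fnameFIELD=*text* joined with +, accompanied by a Boolean expression over the same field names) can be assembled mechanically. The following is a minimal sketch of that string construction only. The class FieldSpecSketch and its methods are hypothetical helpers, not part of the Search API, and the sketch assumes that all fields are combined with AND.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: not part of the exteNd Director Search API.
class FieldSpecSketch {

    // Builds a field specifier list such as fnameTITLE=*report*+fnameCountry=*USA*
    static String buildSpecList(Map<String, String> fields) {
        StringJoiner joiner = new StringJoiner("+");
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            joiner.add("fname" + entry.getKey() + "=*" + entry.getValue() + "*");
        }
        return joiner.toString();
    }

    // Builds a Boolean expression such as fnameTITLE+AND+fnameCountry
    // (assumption: every field is combined with AND)
    static String buildBooleanExpr(Map<String, String> fields) {
        StringJoiner joiner = new StringJoiner("+AND+");
        for (String name : fields.keySet()) {
            joiner.add("fname" + name);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("TITLE", "report");
        fields.put("Country", "USA");
        System.out.println(buildSpecList(fields));
        System.out.println(buildBooleanExpr(fields));
    }
}
```

The two resulting strings correspond to the contents of the field-spec-list and field-boolean-expr elements of a query descriptor.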

Here are the XML definitions for other query types:

The Search API provides a method for setting query type at runtime—setQueryType()—that you call on the EbiQuery object.

For a detailed description of each type of query, including syntax definitions and code examples showing how to specify each query type, see Search Query Types.

 
Query options property

Each query type includes a query-options property that allows you to fine-tune query behavior. Here is the XML definition for query-options:

  <!ELEMENT query-options (
  	 batch-options?,
  	 date-range?,
  	 exclusions?,
  	 generate-quick-summary?,
  	 thesaurus-options?,
  	 max-num-results?,
  	 relevance-cut?,
  	 sort-by-date?,
  	 sort-by-relevance?,
  	 use-abs-weight?
  )>

For each of these options, the Search API provides methods that you can call on the EbiQuery object for setting options individually at runtime. Here is a description of each option:

| Query option | Description | Associated method |
|---|---|---|
| batch-options | Return the result set in batches of a particular size | setBatchOptions() |
| date-range | Search within the specified range of document creation dates | setDateRange() |
| exclusions | Exclude the specified documents from the query results | setExclusions() |
| generate-quick-summary | Generate quick summaries for each item in the result set | setGenerateQuickSummary() |
| thesaurus-options | Set up a thesaurus repository for thesaurus queries | setThesaurus() |
| max-num-results | Set the maximum number of results to return | setMaxNumResults() |
| relevance-cut | Set the relevance cut (the minimum similarity score) for query results | setRelevanceCut() |
| sort-by-date | Sort query results by date | setSortByDate() |
| sort-by-relevance | Sort query results by relevance | setSortByRelevance() |
| use-abs-weight | Return relevance scores as absolute weights rather than percentages | setUseAbsWeight() |
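To make the combined effect of relevance-cut and max-num-results concrete, here is a small self-contained model of what the two options mean for a result set. It is a sketch only: the Hit class and the apply() method are hypothetical, and the code says nothing about how the DRE applies these options internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of two query options; not part of the Search API.
class QueryOptionsSketch {

    // Stand-in for a query result: a title plus a relevance percentage
    static final class Hit {
        final String title;
        final int relevance;

        Hit(String title, int relevance) {
            this.title = title;
            this.relevance = relevance;
        }
    }

    // Keeps hits that score at least relevanceCut and stops once
    // maxNumResults hits have been collected
    static List<Hit> apply(List<Hit> hits, int relevanceCut, int maxNumResults) {
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : hits) {
            if (kept.size() >= maxNumResults) {
                break;
            }
            if (hit.relevance >= relevanceCut) {
                kept.add(hit);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("A", 95), new Hit("B", 60), new Hit("C", 85),
                new Hit("D", 80), new Hit("E", 90));
        // With a cut of 80 and a maximum of 3, hits A, C, and D are kept
        for (Hit hit : apply(hits, 80, 3)) {
            System.out.println(hit.title + " " + hit.relevance);
        }
    }
}
```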

 
Selected properties

The selected-props property for query types allows you to specify the document properties to return in the query results. Here is the XML definition for selected-props:

  <!ELEMENT selected-props (prop-name* | select-all)>

Using this definition, you can specify that your query return individual document properties or all available document properties.

In addition, you can call the following Search API methods on the EbiQuery object to specify document properties at runtime:

| Method | Description |
|---|---|
| select() | Select the specified document property to return in the query results |
| selectAll() | Return all available document properties in the query results |
| selectAlways() | Always return the specified document property in the query results |
| removeSelect() | Remove the specified property from the list of selected properties |

 
Example: defining a text query in XML

Here is an example of a text query defined in XML:

  <search-query-def>
  	 <text-query>
  	 	 <query-text><![CDATA[clinical+trials+diabetes+research]]></query-text>
  	 	 <field-spec>
  	 	 	 <field-spec-list><![CDATA[fnameTITLE=*report*+fnameCountry=*USA*]]></field-spec-list>
  	 	 	 <field-boolean-expr><![CDATA[fnameTITLE+AND+fnameCountry]]></field-boolean-expr>
  	 	 </field-spec>
  	 	 <query-options>
  	 	 	 <date-range>
  	 	 	 	 <date-from><![CDATA[11/01/2002]]></date-from>
  	 	 	 	 <date-to><![CDATA[11/02/2002]]></date-to>
  	 	 	 	 <date-pattern><![CDATA[dd/MM/yyyy]]></date-pattern>
  	 	 	 </date-range>
  	 	 	 <generate-quick-summary/>
  	 	 	 <max-num-results><![CDATA[50]]></max-num-results>
  	 	 	 <relevance-cut><![CDATA[70]]></relevance-cut>
  	 	 	 <sort-by-date/>
  	 	 	 <sort-by-relevance/>
  	 	 </query-options>
  	 	 <selected-props>
  	 	 	 <prop-name><![CDATA[AUTHOR]]></prop-name>
  	 	 	 <prop-name><![CDATA[TITLE]]></prop-name>
  	 	 	 <prop-name><![CDATA[CREATED]]></prop-name>	 
  	 	 </selected-props>
  	 </text-query>
  </search-query-def>

Based on this definition, this query is not merely a simple text query—or keyword search—but instead has the following characteristics:

| Characteristic | Description |
|---|---|
| Performs a conceptual search | The <query-text> element specifies a search string in the form of a conceptual query: word1+word2+word3+... wordN |
| Searches content and metadata (field query) | The <field-spec> element specifies that the application search all documents that contain the string report in their title fields and the string USA in their country fields |
| Searches within a specified time period | The <date-range> element restricts the search to documents created between January 11 and February 11, 2002 (the values 11/01/2002 and 11/02/2002 are read using the dd/MM/yyyy pattern given in <date-pattern>) |
| Specifies a maximum number of results | The <max-num-results> element specifies that a maximum of 50 results be returned |
| Specifies a relevance threshold | The <relevance-cut> element requests results that are at least 70% relevant |
| Returns specific document properties | The <selected-props> element indicates that the author, title, and date created should be returned for each document in the result set |

This sample XML query definition resides in search-query-def_4_0_sample.xml, located in the DTD folder within the SearchService.jar of your exteNd Director project library folder.
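Because the date-range element pairs its dates with a date-pattern, the same digits can denote different days depending on the pattern. The following sketch shows how the sample dates are read under dd/MM/yyyy. It assumes that date-pattern uses the standard Java pattern letters (dd for day of month, MM for month), which this chapter does not state explicitly, and it uses the JDK's java.time classes rather than any Search API call. DateRangeSketch is a hypothetical class name.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration; uses only JDK classes.
class DateRangeSketch {

    // Parses a date string using the pattern supplied alongside it
    static LocalDate parse(String date, String pattern) {
        return LocalDate.parse(date, DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        String pattern = "dd/MM/yyyy";
        // Day comes first under this pattern, so 11/01/2002 is 11 January 2002
        System.out.println(parse("11/01/2002", pattern)); // prints 2002-01-11
        System.out.println(parse("11/02/2002", pattern)); // prints 2002-02-11
    }
}
```

If you intend month-first dates, change the pattern (for example, to MM/dd/yyyy) rather than relying on the reader to guess the order.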

 
Example: initializing a query object from an XML descriptor

Here is sample code that initializes a query object from an XML query descriptor:

  ...
  //Instantiate a blank query object
  com.sssw.search.api.EbiQuery query = com.sssw.search.factory.EboFactory.getQuery();
  
  //Read in your query XML descriptor
  org.w3c.dom.Document queryDesc = com.sssw.fw.util.EboXmlHelper.getDocumentFromString(myInputStream);
  
  //Initialize the blank query object with data from the XML
  //descriptor
  query.fromXML(queryDesc.getDocumentElement());
  ...

The getDocumentFromString() method returns a DOM document, converted from a string that represents an XML document. In this case the string is the input argument myInputStream, which (despite its name) holds the XML text of the descriptor.
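If you want to inspect a descriptor before handing it to the query object, you can parse the same XML text with the standard JDK DOM parser. The sketch below is a stand-in for illustration only: it does not call EboXmlHelper or fromXML(), and the class DescriptorParseSketch and its methods are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper built on the JDK DOM parser only.
class DescriptorParseSketch {

    // Converts XML text into a DOM document
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed XML descriptor", e);
        }
    }

    // Returns the text of the first element with the given tag, or null if absent
    static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<search-query-def><text-query>"
                + "<query-text><![CDATA[clinical+trials]]></query-text>"
                + "<query-options><max-num-results><![CDATA[50]]></max-num-results></query-options>"
                + "</text-query></search-query-def>";
        Document doc = parse(xml);
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(firstText(doc, "query-text"));
        System.out.println(firstText(doc, "max-num-results"));
    }
}
```

The element returned by getDocumentElement() is the same kind of object that the sample code above passes to fromXML().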

 
Sorting query results

This section describes how to sort the results of Autonomy-based conceptual queries.

 
Sorting by date and then relevance

You can sort query results by date, relevance, or both. When you sort by both properties, the results are first sorted by date, then by relevance.

To sort by date and then relevance:

  1. Define a com.sssw.search.api.EbiQuery object.

  2. Call any of the following methods on that object:

    | Sorting factor | To | Call |
    |---|---|---|
    | Relevance | Enable sorting by relevance | setSortByRelevance(true) |
    | Relevance | Disable sorting by relevance | setSortByRelevance(false) |
    | Date | Enable sorting by date | setSortByDate(true) |
    | Date | Disable sorting by date | setSortByDate(false) |
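The ordering that results when both sort factors are enabled (date first, relevance as the tie-breaker) can be modeled with a two-level comparator. This is a sketch under assumptions: the Hit class is hypothetical, and the sort direction (newest documents first, then highest relevance first) is assumed because this section does not specify it.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of a two-level sort; not part of the Search API.
class SortOrderSketch {

    static final class Hit {
        final String title;
        final LocalDate created;
        final int relevance;

        Hit(String title, LocalDate created, int relevance) {
            this.title = title;
            this.created = created;
            this.relevance = relevance;
        }
    }

    // Assumed directions: newest date first, then highest relevance first
    static final Comparator<Hit> DATE_THEN_RELEVANCE =
            Comparator.comparing((Hit h) -> h.created).reversed()
                    .thenComparing(Comparator.comparingInt((Hit h) -> h.relevance).reversed());

    // Returns the titles in sorted order without modifying the input list
    static List<String> sortedTitles(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(DATE_THEN_RELEVANCE);
        List<String> titles = new ArrayList<>();
        for (Hit hit : copy) {
            titles.add(hit.title);
        }
        return titles;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("A", LocalDate.of(2002, 11, 1), 70),
                new Hit("B", LocalDate.of(2002, 11, 2), 60),
                new Hit("C", LocalDate.of(2002, 11, 2), 90));
        System.out.println(sortedTitles(hits)); // prints [C, B, A]
    }
}
```

Relevance only decides the order among documents that share the same date, which is why B precedes A even though A has the higher score.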

 
Sorting field query results

You can also sort results of field queries in ascending or descending order by a single parameter. The parameter can be the value of a standard metadata field or custom metadata field.

NOTE:   Standard metadata field names are listed in the [Fields] section of the DRE configuration file DirectorDRE.cfg, located at autonomy\engine in your exteNd Director installation directory.

To sort field query results:

  1. Before issuing a field query, make sure you configure your search environment to specify the types of metadata you want to search—standard metadata and/or custom metadata—as described in Setting Search Options.

  2. In a text editor, open the DRE configuration file DirectorDRE.cfg.

  3. Enable field sorting by setting the parameter FIELDSORT=1 in the [Server] section.

    NOTE:   If this parameter does not appear, add it to the file.

  4. Save and close the configuration file.

  5. Reset the DRE, as described in Resetting the DRE.

  6. Reindex the data, as described in Programming practices.

  7. Specify the sort parameter by appending one of these expressions to the field specifier list you created for your field query:

    | Sort expression | Description |
    |---|---|
    | &fsort=FIELDNAME | Sort in ascending order by the value of the field FIELDNAME |
    | &fsort=-FIELDNAME | Sort in descending order by the value of the field FIELDNAME |

    For example, suppose that in your CM repository you define a document type called Colleges, and two custom fields—Ranking and Location. If you want to find all colleges located in Massachusetts, sorted in descending order by rank, your field specifier should look like this:

      ...
      String fieldSpecList = "fnameDOCTYPENAME=*Colleges*+fnameLocation=*Massachusetts*+&fsort=-Ranking";
      String fieldBooleanExpr = "fnameDOCTYPENAME+AND+fnameLocation";
      query.setFieldSpecList(fieldSpecList, fieldBooleanExpr);
      ...
      
    

    For more information about constructing and implementing field queries, see Field queries.
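The sort expressions from step 7 differ only in the minus sign, so a small helper can append the right one to an existing field specifier list. The sketch below is illustrative: FieldSortSketch and withSort() are hypothetical names, and the helper appends the expression directly to the list. The printed example in this section shows a + before &fsort; whether that separator is required is not clear from the text, so adjust the helper if your DRE expects it.

```java
// Hypothetical helper: not part of the exteNd Director Search API.
class FieldSortSketch {

    // Appends &fsort=FIELDNAME (ascending) or &fsort=-FIELDNAME (descending)
    static String withSort(String fieldSpecList, String fieldName, boolean descending) {
        return fieldSpecList + "&fsort=" + (descending ? "-" : "") + fieldName;
    }

    public static void main(String[] args) {
        String spec = "fnameDOCTYPENAME=*Colleges*+fnameLocation=*Massachusetts*";
        // Descending order by the custom field Ranking
        System.out.println(withSort(spec, "Ranking", true));
        // Ascending order by the same field
        System.out.println(withSort(spec, "Ranking", false));
    }
}
```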



Copyright © 2004 Novell, Inc. All rights reserved. Copyright © 1997, 1998, 1999, 2000, 2001, 2002, 2003 SilverStream Software, LLC. All rights reserved.