Content Search Guide
CHAPTER 6
This chapter describes the content query process and explains how to implement querying in your exteNd Director search applications.
Queries are structured expressions that you can use to search content from a data source. Autonomy-based search capabilities in exteNd Director allow you to query both content and metadata using a single query expression, rather than requiring you to write separate queries for each type of data.
Querying metadata
In exteNd Director you can query two types of metadata:
Standard (basic) metadata is descriptive information about content that is automatically attached to every document. Examples of standard metadata are title, author, and creation date.
Custom (extension) metadata is application-specific information about content that you define in the Content Management (CM) subsystem as fields in document types.
Querying content
You can query content only if it has been published.
exteNd Director comes with a data fetcher that allows you to conduct Autonomy-based searches exclusively on content and metadata stored in the exteNd Director CM repository.
To query the CM repository, you use the CM subsystem in conjunction with the Search subsystem. The CM API provides classes that wrap the relevant search functions associated with the CM repository, as described in Implementing querying for the CM repository.
To use Autonomy technology with exteNd Director to search data sources other than the CM repository, you must purchase additional data fetchers from Autonomy, Inc.
To query custom data sources, you must use Search API classes to instantiate a query object and run the query against the other data sources you are licensed to use, as described in Implementing querying for custom data sources.
Alternatively, you can import your custom data into the CM repository and use the CM wrapper classes for implementing Autonomy-based conceptual queries.
The exteNd Director Search subsystem supports the following types of queries:
For detailed descriptions of each type of query, including syntax definitions and code examples showing how to specify each query type, see Search Query Types.
To implement Autonomy-based conceptual and keyword search in your exteNd Director applications, you use CM API functionality that wraps the relevant Search APIs:
The wrapper classes provide methods for constructing and running queries on content and metadata that reside in the CM repository and have been indexed by the exteNd Director Dynamic Reasoning Engine (DRE).
In addition, you can configure your environment to manage the processes of document fetching and querying, as described in Setting Search Options.
Key classes and interfaces for querying the CM repository include:
This section describes the CM API methods you can use to query the CM repository in your exteNd Director applications.
Here is the method for getting a content manager delegate:
com.sssw.cm.client.EboFactory.getDefaultContentMgmtDelegate()
This method returns a content manager delegate associated with the default CM repository. The content manager delegate provides methods for running Autonomy-based queries on document content and metadata in this repository.
For information about why to use delegates, see Programming practices.
Autonomy-based queries are based on the EbiQuery interface, an interface that resides in the Search API. To search content and metadata in the CM subsystem, you must instantiate a query object that not only implements this interface but also is associated with the CM repository. The CM API provides the method to use:
com.sssw.cm.factory.EboFactory.getQuery()
Using this query object, you can call Search API methods to construct Autonomy-based queries and fine-tune search results, as described in Constructing queries for the CM repository.
Here are key methods for constructing queries for the CM repository:
Method | Description |
---|---|
com.sssw.search.api.EbiQuery.setQueryType() | Specifies the type of query you want to run |
com.sssw.search.api.EbiQuery.setQueryText() | Specifies the query string |
com.sssw.search.api.EbiQuery.setMaxNumResults() | Sets the maximum number of results to return |
com.sssw.search.api.EbiQuery.setRelevanceCut() | Sets the minimum relevance criteria for query results |
NOTE: You use the same methods for constructing Autonomy-based queries for custom data sources. The difference is that you call these methods on a query object instantiated from a factory in the Search API, as described in Implementing querying for custom data sources.
Here is the method for querying the CM repository:
com.sssw.cm.api.EbiContentMgmtDelegate.runQuery()
This method runs a query that you construct using the com.sssw.search.api.EbiQuery interface and returns the results as a collection of objects that implements the com.sssw.cm.api.EbiQueryResult interface.
The following code segment demonstrates how to instantiate a query object and run a query against the default CM repository, called Default:
...
public void getComponentData( EbiPortalContext context, java.util.Map params )
    throws com.sssw.fw.exception.EboUnrecoverableSystemException
{
    //Declare a string buffer
    StringBuffer sb = new StringBuffer();
    //Set the query string
    String queryString = "The+effect+of+the+recession+on+consumer+spending";
    try {
        //Create a blank query object
        com.sssw.search.api.EbiQuery query = com.sssw.cm.factory.EboFactory.getQuery();
        //Set query type to text
        query.setQueryType(query.QUERY_TYPE_TEXT);
        //Specify the query string; this is a conceptual query
        query.setText(queryString);
        //Ask for a maximum of 50 results
        query.setMaxNumResults(50);
        //Ask for results that are at least 80% relevant
        query.setRelevanceCut(80);
        //Ask to return all available document properties in the results
        query.selectAll();
        //Get the content manager delegate
        EbiContentMgmtDelegate contentMgr =
            com.sssw.cm.client.EboFactory.getDefaultContentMgmtDelegate();
        //Run the query
        //The boolean argument in runQuery indicates whether results should be filtered
        Iterator iterResults = contentMgr.runQuery(context, query, true).iterator();
        //Process the results
        while (iterResults.hasNext()) {
            com.sssw.cm.api.EbiQueryResult res =
                (com.sssw.cm.api.EbiQueryResult)iterResults.next();
            //Get document metadata
            String docTitle = res.getTitle();
            java.sql.Timestamp dateCreated = res.getDateCreated();
            //Get document content
            String docAbstract = res.getAbstract();
            //Add query result to the string buffer returned by the component
            sb.append("\n").append(docTitle).append(dateCreated).append(docAbstract).append("\n");
        }
    }
    catch (Exception _E) {
        System.out.println ("Query failed");
        _E.printStackTrace();
    }
    //Set content type
    context.setContentType(com.sssw.portal.api.EbiComponentConstants.MIME_TYPE_HTML_UTF8);
    //Place the content into the context
    context.setComponentContent( sb.toString() );
}
...
As you can see, this component retrieves both standard metadata (the document title and the date created) and content from the query results. By default, the exteNd Director DRE is configured to index both types of information. This behavior is controlled by two search options that are enabled by default in the CM subsystem configuration file:
com.sssw.cm.fetch.process.content.repository name
See the description in Index document content?.
com.sssw.cm.fetch.process.metadata.repository name
See the description in Index standard document metadata?.
The CM subsystem provides many other options that you can configure to customize your search environment, as described in Setting Search Options.
The exteNd Director Search API provides wrapper classes around Autonomy APIs that provide methods for querying content in data sources other than the CM repository.
IMPORTANT: To use Autonomy technology with exteNd Director to search other data sources, you must purchase additional data fetchers from Autonomy, Inc.
Key classes and interfaces for querying custom data sources include:
This section describes Search API methods that you can use for querying custom data sources in your exteNd Director applications.
Here is the method for getting a query engine delegate:
com.sssw.search.client.EboFactory.getQueryEngineDelegate()
This method returns an object that implements the EbiQueryEngineDelegate interface. Methods on this object can be used to configure the query engine and run queries.
For information about why to use delegates, see Programming practices.
Autonomy-based queries are based on the EbiQuery interface that resides in the Search subsystem API. To search content and metadata in custom data sources, you must instantiate a query object that implements this interface. Here is the method to use:
com.sssw.search.factory.EboFactory.getQuery()
Using this query object, you can call Search API methods to construct Autonomy-based queries and fine-tune search results.
Here are key methods for constructing Autonomy-based queries for custom data sources:
Method | Description |
---|---|
com.sssw.search.api.EbiQuery.setQueryType() | Specifies the type of query you want to run |
com.sssw.search.api.EbiQuery.setQueryText() | Specifies the query string |
com.sssw.search.api.EbiQuery.setMaxNumResults() | Sets the maximum number of results to return |
com.sssw.search.api.EbiQuery.setRelevanceCut() | Sets the minimum relevance criteria for query results |
NOTE: You use the same methods for constructing Autonomy-based queries for the CM repository. The difference is that you call these methods on a query object instantiated from a factory in the CM API, as described in Implementing querying for the CM repository.
Here is the method for issuing queries:
com.sssw.search.api.EbiQueryEngineDelegate.runQuery()
This method runs a query that you construct using the com.sssw.search.api.EbiQuery interface and returns the results as a collection of objects that implements the com.sssw.search.api.EbiQueryResult interface.
The following code segment presents the getComponentData() method of an exteNd Director component that implements the logic for issuing an Autonomy-based conceptual query against a custom data source:
...
public void getComponentData( EbiPortalContext context, java.util.Map params )
    throws com.sssw.fw.exception.EboUnrecoverableSystemException
{
    //Declare a string buffer
    StringBuffer sb = new StringBuffer();
    //Set the query string, using syntax for a conceptual query
    String queryString = "physician+specialty+orthopaedics";
    try {
        //Create a blank query object
        com.sssw.search.api.EbiQuery query = com.sssw.search.factory.EboFactory.getQuery();
        //Set query type to text
        query.setQueryType(query.QUERY_TYPE_TEXT);
        //Specify the query string; this is a conceptual query
        query.setText(queryString);
        //Ask for a maximum of 50 results
        query.setMaxNumResults(50);
        //Ask for results that are at least 80% relevant
        query.setRelevanceCut(80);
        //Ask to return all available document properties in the results
        query.selectAll();
        //Get the query engine delegate
        EbiQueryEngineDelegate qe =
            com.sssw.search.client.EboFactory.getQueryEngineDelegate();
        //Run the query
        Iterator iterResults = qe.runQuery(context, query, null, true).iterator();
        //Process the results
        while (iterResults.hasNext()) {
            com.sssw.search.api.EbiQueryResult res =
                (com.sssw.search.api.EbiQueryResult)iterResults.next();
            //Get document metadata
            String docTitle = res.getTitle();
            java.sql.Timestamp dateCreated = res.getDateCreated();
            //Get document abstract
            String docAbstract = res.getAbstract();
            //Add query result to the string buffer returned by the component
            sb.append("\n").append(docTitle).append("\n").append(dateCreated).append("\n").append(docAbstract).append("\n");
        }
    }
    catch (Exception _E) {
        System.out.println ("Query failed");
        if (m_log.isError())
            m_log.error(_E);
    }
    //Set content type
    context.setContentType(com.sssw.portal.api.EbiComponentConstants.MIME_TYPE_HTML_UTF8);
    //Place the content into the context
    context.setComponentContent( sb.toString() );
}
...
You can construct search query descriptors as XML files that can be used to initialize the search query object. The XML for a search query definition must conform to the rules specified in search-query-def_4_0.dtd, a file that resides in the DTD folder within the SearchService.jar of your exteNd Director project library folder.
There are several advantages to initializing a query object programmatically from an XML query descriptor:
You set all desired options with one method call, rather than making individual calls to define specific query properties.
Every search query definition contains an element for specifying the query type:
<!ELEMENT search-query-def (text-query | fuzzy-query | get-all-query | suggest-query | name-search-query)>
In turn, each query type element provides properties for refining the query. For example, consider the text query element:
<!ELEMENT text-query (query-text?, field-spec?, query-options?, selected-props?)>
Using this definition, you can construct a field query by defining a field specifier list in the field-spec property that indicates which metadata to search for the text defined in query-text.
Here are the XML definitions for other query types:
<!ELEMENT fuzzy-query (query-text?, field-spec?, query-options?, selected-props?)>
<!ELEMENT get-all-query (field-spec?, query-options?, selected-props?)>
<!ELEMENT suggest-query (doc-list, suggest-options?, field-spec?, query-options?, selected-props?)>
<!ELEMENT name-search-query (query-text?, field-spec?, query-options?, selected-props?)>
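Because the query type is determined by the single child element of search-query-def, a descriptor can be inspected before it is handed to the Search API. The following is a minimal sketch that uses only the standard JDK DOM parser; the class QueryTypeSniffer and its method are hypothetical names introduced for illustration and are not part of the exteNd Director API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not an exteNd Director class.
class QueryTypeSniffer {

    // Returns the name of the first child element of the root element,
    // for example "text-query", or null if the root has no child element.
    static String queryTypeOf(String descriptorXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(descriptorXml)));
            Node child = doc.getDocumentElement().getFirstChild();
            // Skip whitespace text nodes and comments between elements
            while (child != null && child.getNodeType() != Node.ELEMENT_NODE) {
                child = child.getNextSibling();
            }
            return child == null ? null : child.getNodeName();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable query descriptor", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<search-query-def><get-all-query>"
                + "<selected-props><select-all/></selected-props>"
                + "</get-all-query></search-query-def>";
        System.out.println(queryTypeOf(xml));
    }
}
```

The sketch does not validate the descriptor against the DTD; it only reports which query type element is present.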
The Search API provides a method for setting query type at runtime, setQueryType(), that you call on the EbiQuery object.
For a detailed description of each type of query, including syntax definitions and code examples showing how to specify each query type, see Search Query Types.
Each query type includes a query-options property that allows you to fine-tune query behavior. Here is the XML definition for query-options:
<!ELEMENT query-options ( batch-options?, date-range?, exclusions?, generate-quick-summary?, thesaurus-options?, max-num-results?, relevance-cut?, sort-by-date?, sort-by-relevance?, use-abs-weight? )>
For each of these options, the Search API provides methods that you can call on the EbiQuery object for setting options individually at runtime. Here is a description of each option:
The selected-props property for query types allows you to specify the document properties to return in the query results. Here is the XML definition for selected-props:
<!ELEMENT selected-props (prop-name* | select-all)>
Using this definition, you can specify that your query return individual document properties or all available document properties.
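As an illustration of how the text-query, query-options, and selected-props definitions fit together, the sketch below assembles a small descriptor with the standard JDK DOM and transformer classes. The class DescriptorBuilder and its method names are hypothetical; element names and their order are taken from the DTD fragments shown in this chapter. Values are written as plain text rather than CDATA sections, which is equivalent XML for simple values such as these.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not an exteNd Director class.
class DescriptorBuilder {

    // Builds a text query descriptor that returns all document properties.
    static String buildTextQuery(String queryText, int maxResults, int relevanceCut) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("search-query-def");
            doc.appendChild(root);

            // Children follow the order given in the text-query definition
            Element textQuery = child(doc, root, "text-query");
            child(doc, textQuery, "query-text").setTextContent(queryText);

            Element options = child(doc, textQuery, "query-options");
            child(doc, options, "max-num-results").setTextContent(String.valueOf(maxResults));
            child(doc, options, "relevance-cut").setTextContent(String.valueOf(relevanceCut));

            // select-all is the alternative to listing prop-name elements
            Element props = child(doc, textQuery, "selected-props");
            child(doc, props, "select-all");

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build descriptor", e);
        }
    }

    // Creates an element, appends it to the parent, and returns it.
    private static Element child(Document doc, Element parent, String name) {
        Element e = doc.createElement(name);
        parent.appendChild(e);
        return e;
    }

    public static void main(String[] args) {
        System.out.println(buildTextQuery("clinical+trials+diabetes+research", 50, 70));
    }
}
```

The resulting string could then be converted to a DOM document and passed to the query object in the same way as a descriptor read from a file.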
In addition, you can call the following Search API methods on the EbiQuery object to specify document properties at runtime:
Here is an example of a text query defined in XML:
<search-query-def>
  <text-query>
    <query-text><![CDATA[clinical+trials+diabetes+research]]></query-text>
    <field-spec>
      <field-spec-list><![CDATA[fnameTITLE=*report*+fnameCountry=*USA*]]></field-spec-list>
      <field-boolean-expr><![CDATA[fnameTITLE+AND+fnameCountry]]></field-boolean-expr>
    </field-spec>
    <query-options>
      <date-range>
        <date-from><![CDATA[11/01/2002]]></date-from>
        <date-to><![CDATA[11/02/2002]]></date-to>
        <date-pattern><![CDATA[dd/MM/yyyy]]></date-pattern>
      </date-range>
      <generate-quick-summary/>
      <max-num-results><![CDATA[50]]></max-num-results>
      <relevance-cut><![CDATA[70]]></relevance-cut>
      <sort-by-date/>
      <sort-by-relevance/>
    </query-options>
    <selected-props>
      <prop-name><![CDATA[AUTHOR]]></prop-name>
      <prop-name><![CDATA[TITLE]]></prop-name>
      <prop-name><![CDATA[CREATED]]></prop-name>
    </selected-props>
  </text-query>
</search-query-def>
Based on this definition, this query is not merely a simple text query (or keyword search) but instead has the following characteristics:
This sample XML query definition resides in search-query-def_4_0_sample.xml, located in the DTD folder within the SearchService.jar of your exteNd Director project library folder.
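The date-range values in the sample are easy to misread, because the date-pattern dd/MM/yyyy puts the day first. The sketch below parses the two sample values with java.time to show which calendar dates they denote. It assumes the pattern letters carry their usual Java date-format meaning (dd = day of month, MM = month, yyyy = year); how the Search subsystem itself applies the pattern is not described in this chapter. The class name DateRangeCheck is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not an exteNd Director class.
class DateRangeCheck {

    // Parses a date string using a Java-style date pattern.
    static LocalDate parse(String value, String pattern) {
        return LocalDate.parse(value, DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        String pattern = "dd/MM/yyyy";
        // Day comes first, so these are 11 January and 11 February 2002
        System.out.println(parse("11/01/2002", pattern)); // prints 2002-01-11
        System.out.println(parse("11/02/2002", pattern)); // prints 2002-02-11
    }
}
```

Under that assumption, the sample range covers one month (11 January 2002 to 11 February 2002) rather than the two days a month-first reading would suggest.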
Here is sample code that initializes a query object from an XML query descriptor:
...
//Instantiate a blank query object
com.sssw.search.api.EbiQuery query = com.sssw.search.factory.EboFactory.getQuery();
//Read in your query XML descriptor
Document queryDesc = com.sssw.fw.util.EboXmlHelper.getDocumentFromString(myInputStream);
//Initialize the blank query object with data from the XML descriptor
query.fromXML(queryDesc.getDocumentElement());
...
The getDocumentFromString() method returns a DOM document, converted from a string that represents an XML document (in this case, the input argument myInputStream).
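Because getDocumentFromString() expects a string, a descriptor stored in a file or JAR resource must first be read into memory. The sketch below shows one way to do that with standard JDK classes; DescriptorReader and readAll are hypothetical names, and UTF-8 encoding of the descriptor is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not an exteNd Director class.
class DescriptorReader {

    // Reads the whole stream and decodes it as UTF-8 (assumed encoding).
    static String readAll(InputStream in) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = "<search-query-def><get-all-query/></search-query-def>"
                .getBytes(StandardCharsets.UTF_8);
        String xml = readAll(new ByteArrayInputStream(bytes));
        System.out.println(xml);
    }
}
```

The returned string is what would be passed as the argument to getDocumentFromString() in the sample above.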
This section describes how to sort the results of Autonomy-based conceptual queries.
You can sort query results by date, relevance, or both. When you sort by both properties, the results are first sorted by date, then by relevance.
To sort by date and then relevance:
Define a com.sssw.search.api.EbiQuery object.
Call any of the following methods on that object:
Sorting factor | To | Call |
---|---|---|
Relevance | Enable sorting by relevance | setSortByRelevance(true) |
Relevance | Disable sorting by relevance | setSortByRelevance(false) |
Date | Enable sorting by date | setSortByDate(true) |
Date | Disable sorting by date | setSortByDate(false) |
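To make the combined ordering concrete, the sketch below models "first by date, then by relevance" as a two-key comparison: relevance is consulted only when two results have the same date. The sorting itself is performed by the search engine, not by client code; the class Hit is a hypothetical stand-in for a query result, and the directions used here (newest first, most relevant first) are assumptions made for the example only.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustration only; none of these types belong to the Search API.
class SortOrderSketch {

    // Hypothetical stand-in for a query result
    static final class Hit {
        final String title;
        final LocalDate created;
        final int relevance;

        Hit(String title, LocalDate created, int relevance) {
            this.title = title;
            this.created = created;
            this.relevance = relevance;
        }
    }

    // Date is the primary key; relevance only breaks ties between equal dates.
    // Newest-first and highest-relevance-first are assumed directions.
    static final Comparator<Hit> BY_DATE_THEN_RELEVANCE =
            Comparator.comparing((Hit h) -> h.created).reversed()
                    .thenComparing(Comparator.comparingInt((Hit h) -> h.relevance).reversed());

    // Returns the titles in sorted order without modifying the input list.
    static List<String> sortedTitles(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(BY_DATE_THEN_RELEVANCE);
        List<String> titles = new ArrayList<>();
        for (Hit h : copy) {
            titles.add(h.title);
        }
        return titles;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("A", LocalDate.of(2002, 1, 11), 70),
                new Hit("B", LocalDate.of(2002, 2, 11), 60),
                new Hit("C", LocalDate.of(2002, 2, 11), 90));
        System.out.println(sortedTitles(hits)); // prints [C, B, A]
    }
}
```

B and C share a date, so relevance decides their order; A has the highest relevance of the older documents but still sorts last because date takes precedence.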
You can also sort results of field queries in ascending or descending order by a single parameter. The parameter can be the value of a standard metadata field or custom metadata field.
NOTE: Standard metadata field names are listed in the [Fields] section of the DRE configuration file DirectorDRE.cfg, located at autonomy\engine in your exteNd Director installation directory.
Before issuing a field query, make sure you configure your search environment to specify the types of metadata you want to search (standard metadata, custom metadata, or both), as described in Setting Search Options.
In a text editor, open the DRE configuration file DirectorDRE.cfg.
Enable field sorting by setting the parameter FIELDSORT=1 in the [Server] section.
NOTE: If this parameter does not appear, add it to the file.
Reset the DRE, as described in Resetting the DRE.
Reindex the data, as described in Programming practices.
Specify the sort parameter by appending one of these expressions to the field specifier list you created for your field query:
Sort expression | Description |
---|---|
 | Sort in ascending order by the value of the field FIELDNAME |
 | Sort in descending order by the value of the field FIELDNAME |
For example, suppose that in your CM repository you define a document type called Colleges and two custom fields, Ranking and Location. If you want to find all colleges located in Massachusetts, sorted in descending order by rank, your field specifier should look like this:
...
String fieldSpecList = "fnameDOCTYPENAME=*Colleges*+fnameLocation=*Massachusetts*+ &fsort=-Ranking";
String fieldBooleanExpr = "fnameDOCTYPENAME+AND+fnameLocation";
query.setFieldSpecList(fieldSpecList, fieldBooleanExpr);
...
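When a field query involves several fields, the two strings passed to setFieldSpecList() can be assembled from name and value pairs instead of being typed by hand. The following is a minimal sketch that follows the pattern visible in the samples in this chapter (fnameNAME=*value* terms joined with +, and field names joined with +AND+). The class FieldSpecSketch and its methods are hypothetical, and the sketch deliberately leaves out the sort expression, which is appended to the field specifier list separately as in the sample above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not an exteNd Director class.
class FieldSpecSketch {

    // Builds "fnameA=*x*+fnameB=*y*" from field name/value pairs.
    static String fieldSpecList(Map<String, String> fields) {
        List<String> terms = new ArrayList<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            terms.add("fname" + e.getKey() + "=*" + e.getValue() + "*");
        }
        return String.join("+", terms);
    }

    // Builds "fnameA+AND+fnameB", requiring every field to match.
    static String booleanAnd(Map<String, String> fields) {
        List<String> names = new ArrayList<>();
        for (String name : fields.keySet()) {
            names.add("fname" + name);
        }
        return String.join("+AND+", names);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("DOCTYPENAME", "Colleges");
        fields.put("Location", "Massachusetts");
        System.out.println(fieldSpecList(fields));
        // prints fnameDOCTYPENAME=*Colleges*+fnameLocation=*Massachusetts*
        System.out.println(booleanAnd(fields));
        // prints fnameDOCTYPENAME+AND+fnameLocation
    }
}
```

The sketch performs no escaping, so it is suitable only for field values that contain no characters with special meaning in the query syntax.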
For more information about constructing and implementing field queries, see Field queries.
Copyright © 2004 Novell, Inc. All rights reserved. Copyright © 1997, 1998, 1999, 2000, 2001, 2002, 2003 SilverStream Software, LLC. All rights reserved.