Content Search Guide

CHAPTER 7

Configuring the Dynamic Reasoning Engine for Specialized Searching

This chapter explains how to configure your exteNd Director Dynamic Reasoning Engine (DRE) to perform specialized search tasks.

The following topics are covered:

  • Searching for numbers

  • Searching in other languages

 

Searching for numbers

This section describes how to enable number searches using the DRE configuration file.

To enable searching for numbers:

  1. In a text editor, open the DRE configuration file DirectorDRE.cfg, located at autonomy\engine in your exteNd Director installation directory.

  2. Set the parameter INDEXNUMBERS=1.

    NOTE:   If this parameter does not appear, add it to the file.

  3. Delete the parameter DONTINDEXNUMBERS=1.

  4. Save and close the configuration file.

  5. Reset the DRE, as described in Resetting the DRE.

  6. Reindex the data, as described in Programming practices.
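If you manage several installations, steps 2 and 3 can be scripted. The following Python sketch is not part of exteNd Director. The function name enable_number_indexing is hypothetical, and the sketch assumes that DirectorDRE.cfg stores one NAME=value parameter per line. It sets INDEXNUMBERS=1 (adding the parameter where DONTINDEXNUMBERS was, or at the end of the file if neither appears) and removes DONTINDEXNUMBERS, leaving every other line unchanged.

```python
import re

# One NAME=value parameter per line (assumed layout of DirectorDRE.cfg).
_PARAM = re.compile(r"^\s*([A-Za-z0-9_]+)\s*=")


def enable_number_indexing(cfg_text: str) -> str:
    """Return cfg_text with INDEXNUMBERS=1 set and DONTINDEXNUMBERS removed.

    Hypothetical helper: parameter names are compared case-insensitively,
    and every line that is not one of these two parameters is kept as-is.
    """
    out = []
    have_index = False
    dont_pos = None  # position of the first DONTINDEXNUMBERS line removed
    for line in cfg_text.splitlines():
        m = _PARAM.match(line)
        name = m.group(1).upper() if m else None
        if name == "INDEXNUMBERS":
            if not have_index:  # step 2: set the parameter to 1
                out.append("INDEXNUMBERS=1")
                have_index = True
            continue  # drop any duplicate INDEXNUMBERS lines
        if name == "DONTINDEXNUMBERS":  # step 3: delete the parameter
            if dont_pos is None:
                dont_pos = len(out)
            continue
        out.append(line)
    if not have_index:
        # Step 2 NOTE: add the parameter if it does not appear. Put it where
        # DONTINDEXNUMBERS was, otherwise at the end of the file.
        out.insert(dont_pos if dont_pos is not None else len(out), "INDEXNUMBERS=1")
    return "\n".join(out) + "\n"
```

To use it, read DirectorDRE.cfg into a string, pass it through the function, and write the result back after keeping a copy of the original. You still need to reset the DRE and reindex the data (steps 5 and 6).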

 

Searching in other languages

This section describes how to configure the DRE for searching in other languages, including those that use a multibyte character set (MBCS). By default, the DRE is configured to process English-language data.

To search in other languages:

  1. Configure your search environment to import multibyte character set (MBCS) and other binary formats, as described in Importing MBCS and other binary formats below.

  2. Set language-specific configuration parameters, as described in Modifying language-specific configuration parameters.

  3. (Optional) Copy sentence-breaking files into the directory where the DRE resides, as described in Providing sentence-breaking files (optional).

  4. Reset the DRE, as described in Resetting the DRE.

  5. Reindex the data, as described in Forcing indexing.

 

Importing MBCS and other binary formats

This section describes how to configure your search environment to import multibyte character set (MBCS) and other binary formats into the DRE for indexing.

You enable MBCS support by configuring Autonomy Omnislave, a plug-in module that converts data from binary file formats so it can be indexed in the DRE.

The Omnislave configuration file is called omnislave.cfg and resides in autonomy\OmniSlaves in your exteNd Director installation directory. The omnislave.cfg file contains two types of sections:

Section type        Description
[CONFIGURATION]     Provides general configuration settings that apply to all [<file_format>] sections that appear below it
[<file_format>]     Defines settings for specific file formats that you want OmniSlaves to extract

In the following example, conversions are defined for Word (*.doc), RTF (*.rtf), and Microsoft PowerPoint (*.ppt) files:

  [Configuration]
  OmniConvertExtns0=*.doc
  OmniConvertLibraryCsvs0=wpconvdll.dll,wordconv.dll,rtfconv.dll
  OmniConvertConfigSectionCsvs0=WordPerfect,MSWord,Rtf
  OmniConvertExtns1=*.rtf
  OmniConvertLibraryCsvs1=rtfconv.dll
  OmniConvertConfigSectionCsvs1=Rtf
  OmniConvertExtns2=*.ppt
  OmniConvertLibraryCsvs2=pptconv.dll
  OmniConvertConfigSectionCsvs2=Ppt
  
  Logging=0
  LogAppend=TRUE
  LogMaxKBytes=500
  
  [MSWord]
  OutputCharSet=ASCII
  
  [Rtf]
  
  [Ppt]
  OutputCharSet=ASCII
  StopList=pptconv.dat
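The [Configuration] keys in this example appear to form numbered groups: OmniConvertExtnsN names a file pattern, OmniConvertLibraryCsvsN lists conversion libraries, and OmniConvertConfigSectionCsvsN lists the [<file_format>] sections they use. The Python sketch below reads that structure with the standard configparser module. It is an illustration only: conversion_map is a hypothetical name, and pairing the Nth library with the Nth section name is an assumption drawn from the example, not documented Omnislave behavior.

```python
import configparser
from typing import Dict, List, Tuple


def conversion_map(cfg_text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each file pattern in [Configuration] to (library, section) pairs.

    Sketch only: assumes the numbered keys start at 0 and run without gaps,
    and that the Nth library pairs with the Nth section name.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. OmniConvertExtns0
    parser.read_string(cfg_text)
    # The text writes [CONFIGURATION], the example [Configuration]; accept both.
    name = next((s for s in parser.sections() if s.upper() == "CONFIGURATION"), None)
    if name is None:
        raise KeyError("no [Configuration] section")
    conf = parser[name]
    result = {}
    n = 0
    while f"OmniConvertExtns{n}" in conf:
        libraries = [x.strip() for x in conf[f"OmniConvertLibraryCsvs{n}"].split(",")]
        sections = [x.strip() for x in conf[f"OmniConvertConfigSectionCsvs{n}"].split(",")]
        result[conf[f"OmniConvertExtns{n}"]] = list(zip(libraries, sections))
        n += 1
    return result
```

Run against the example above, it maps *.doc to the WordPerfect, MSWord, and Rtf sections. Comparing those names with the sections actually defined in the file is one way to spot a referenced section, such as WordPerfect here, that has no [<file_format>] block yet.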

To enable support for MBCS and other binary formats:

  1. Open omnislave.cfg in a text editor.

  2. Create a [<file_format>] section for each file format you want Omnislave to convert for indexing.

  3. In each [<file_format>] section, add an OutputCharSet parameter and set it to the character set into which you want Omnislave to convert files of that format.

    Choose one of these character set constants:

    For example, to search Word documents in Traditional Chinese, add the following [MSWord] section to the Omnislave configuration file (if an [MSWord] section already exists, set its OutputCharSet parameter instead):

      [MSWord]
      OutputCharSet=CHINESETRADITIONAL
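Steps 2 and 3 can also be applied with a script that edits omnislave.cfg line by line, so that comments and layout elsewhere in the file are preserved. The Python sketch below is hypothetical (set_output_charset is not an Omnislave tool) and assumes section names match case-insensitively. It replaces an existing OutputCharSet value, adds the parameter under an existing section header, or appends a new section. The charset argument should be one of the Omnislave character set constants, such as ASCII or CHINESETRADITIONAL from the examples in this section.

```python
import re

_HEADER = re.compile(r"^\s*\[([^\]]+)\]\s*$")
_CHARSET = re.compile(r"^\s*OutputCharSet\s*=", re.IGNORECASE)


def set_output_charset(cfg_text: str, section: str, charset: str) -> str:
    """Set OutputCharSet=<charset> in [section] of an omnislave.cfg text.

    Hypothetical line-based helper: other lines are left untouched, section
    names match case-insensitively (an assumption), and a missing section
    is appended at the end of the file.
    """
    lines = cfg_text.splitlines()
    new_line = f"OutputCharSet={charset}"
    start = None
    for i, line in enumerate(lines):
        m = _HEADER.match(line)
        if m and m.group(1).strip().lower() == section.lower():
            start = i
            break
    if start is None:  # step 2: create the [<file_format>] section
        lines += ["", f"[{section}]", new_line]
        return "\n".join(lines) + "\n"
    end = start + 1  # the section runs until the next [header] or end of file
    while end < len(lines) and not _HEADER.match(lines[end]):
        end += 1
    for i in range(start + 1, end):
        if _CHARSET.match(lines[i]):  # step 3: replace an existing value
            lines[i] = new_line
            break
    else:  # step 3: add the parameter right under the header
        lines.insert(start + 1, new_line)
    return "\n".join(lines) + "\n"
```

For the Traditional Chinese example, call set_output_charset(text, "MSWord", "CHINESETRADITIONAL") and write the returned text back to omnislave.cfg.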
    

 

Modifying language-specific configuration parameters

You modify language-specific search parameters in the DRE configuration file DirectorDRE.cfg, located at autonomy\engine\ in your exteNd Director installation directory.

To modify language-specific parameters in the DRE:

  1. Open DirectorDRE.cfg in a text editor.

  2. Set the CharConv parameter to the value for the language you want the DRE to use:

    Language                                            Value
    European                                            0 (default)
    Japanese                                            1
    Korean                                              2
    Simplified Chinese                                  4
    Traditional Chinese                                 5
    Traditional Chinese indexed as Simplified Chinese   6
    Eastern European                                    7
    Russian WINANSI                                     8
    Russian KOI8                                        9
    Hebrew                                              10
    Greek                                               11
    Swedish                                             12

  3. Set the TermSize parameter to specify the maximum number of characters for any term in the DRE:

    Language                         Value
    English and European languages   10 (default)
    German                           30
    Japanese                         30
    Korean                           40

  4. (Optional) Set the StripLanguage parameter to select the language to use when reducing terms to their stems (for example, reducing "running" to "run"):

    Option                             Value
    English                            0
    Conversion from UK to US English   1
    No stripping                       2
    German                             3
    Italian                            4
    Russian                            5
    Advanced English                   6
    Spanish                            7
    Dutch                              8
    Advanced German                    9
    French                             10
    Greek                              11
    Swedish                            12
    Danish                             13
    Portuguese                         14
    Advanced Spanish                   15
    Norwegian                          16

    NOTE:   Use the advanced settings for English (6) and German (9) when possible. Exception: if you indexed content into the DRE with StripLanguage set to 0 or 1 (English) or 3 (German), you must use that same setting when you send queries to the DRE.
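The tables in steps 2 through 4 are lookup tables, so the lines to add to DirectorDRE.cfg can be generated from them. In this Python sketch the dictionaries are transcribed from those tables, and language_params is a hypothetical helper. It does not check whether a TermSize value suits the chosen language, only that it is positive.

```python
from typing import List, Optional

# Values transcribed from the CharConv table in step 2.
CHAR_CONV = {
    "European": 0,
    "Japanese": 1,
    "Korean": 2,
    "Simplified Chinese": 4,
    "Traditional Chinese": 5,
    "Traditional Chinese indexed as Simplified Chinese": 6,
    "Eastern European": 7,
    "Russian WINANSI": 8,
    "Russian KOI8": 9,
    "Hebrew": 10,
    "Greek": 11,
    "Swedish": 12,
}

# Values transcribed from the StripLanguage table in step 4.
STRIP_LANGUAGE = {
    "English": 0, "UK to US English": 1, "No stripping": 2, "German": 3,
    "Italian": 4, "Russian": 5, "Advanced English": 6, "Spanish": 7,
    "Dutch": 8, "Advanced German": 9, "French": 10, "Greek": 11,
    "Swedish": 12, "Danish": 13, "Portuguese": 14, "Advanced Spanish": 15,
    "Norwegian": 16,
}


def language_params(char_conv: str, term_size: int,
                    strip_language: Optional[str] = None) -> List[str]:
    """Build the NAME=value lines for steps 2 to 4 (StripLanguage is optional).

    Hypothetical helper; raises KeyError for a name not in the tables.
    """
    if term_size < 1:
        raise ValueError("TermSize must be a positive number of characters")
    lines = [f"CharConv={CHAR_CONV[char_conv]}", f"TermSize={term_size}"]
    if strip_language is not None:
        lines.append(f"StripLanguage={STRIP_LANGUAGE[strip_language]}")
    return lines
```

For example, language_params("Japanese", 30, "No stripping") returns ["CharConv=1", "TermSize=30", "StripLanguage=2"], which matches the Japanese settings listed under Providing sentence-breaking files (optional).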

 

Providing sentence-breaking files (optional)

When you use languages that do not separate words with spaces, you must specify appropriate delimiters. exteNd Director provides language-specific sentence-breaking files on your product CD; copy them into the directory where the DRE resides, autonomy\engine in your exteNd Director installation directory. The following sections describe the sentence-breaking files and the DRE configuration settings required for each language that does not delimit words with spaces.

Traditional Chinese

The required sentence-breaking files are:

Platform   Sentence-breaking files   Location on CD

NT         • chinesebreaking.dll     Autonomy\MBCS\chinese_nt_1_0_3.zip
           • big5togb.txt
           • wordlist.txt
           • chineseconvlist.txt

UNIX       • chinesebreaking.so      Autonomy/MBCS/chinese_solaris_1_0_3.tar.Z
           • big5togb.txt
           • wordlist.txt
           • chineseconvlist.txt
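After you extract the archive into autonomy\engine, you may want to confirm that every listed file arrived before you reset the DRE. The following Python sketch checks a directory against a file list. The names missing_files and TRADITIONAL_CHINESE_NT are hypothetical, and the list is transcribed from the NT row above; substitute the list for your platform and language.

```python
from pathlib import Path
from typing import Iterable, List

# Transcribed from the NT row of the Traditional Chinese table above.
TRADITIONAL_CHINESE_NT = [
    "chinesebreaking.dll",
    "big5togb.txt",
    "wordlist.txt",
    "chineseconvlist.txt",
]


def missing_files(engine_dir: str, required: Iterable[str]) -> List[str]:
    """Return the entries of `required` that are not files under engine_dir.

    Entries may use either \\ or / and may name a subdirectory (for example
    the Japanese \\dic\\tag.attr), so both separators are normalized first.
    """
    base = Path(engine_dir)
    missing = []
    for name in required:
        parts = [p for p in name.replace("\\", "/").split("/") if p]
        if not base.joinpath(*parts).is_file():
            missing.append(name)
    return missing
```

An empty result means every listed file is present; it does not confirm that the files are the right versions.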

The required language-specific configuration settings are:

DRE configuration parameter   Value
CharConv                      5
TermSize                      40
StripLanguage                 2

Simplified Chinese

The required sentence-breaking files are:

Platform   Sentence-breaking files   Location on CD

NT         • chinesebreaking.dll     Autonomy\MBCS\chinese_nt_1_0_3.zip
           • big5togb.txt
           • wordlist.txt
           • chineseconvlist.txt

UNIX       • chinesebreaking.so      Autonomy/MBCS/chinese_solaris_1_0_3.tar.Z
           • big5togb.txt
           • wordlist.txt
           • chineseconvlist.txt

The required language-specific configuration settings are:

DRE configuration parameter   Value
CharConv                      4
TermSize                      40
StripLanguage                 2

Japanese

The required sentence-breaking files are:

Platform   Sentence-breaking files     Location on CD

NT         • japanesebreaking.dll      Autonomy\MBCS\japanese_nt_2_0_5.zip
           • \dic\tag.attr
           • \dic\tag.counter
           • \dic\tag.index
           • \dic\tag.mrph
           • \dic\tag.string
           • \dic\tag.table
           • jtag.dll
           • jtag.ini
           • jtag_at.dll
           • japaneseconvlist.txt

UNIX       • japanesebreaking.sl       Autonomy/MBCS/japanese_solaris_2_0_5.tar.Z
           • /dic/system/jtag.attr
           • /dic/system/jtag.hash
           • /dic/system/jtag.id
           • /dic/system/jtag.mrph
           • /dic/system/jtag.offset
           • /dic/system/jtag.table
           • /dic/system/jtag.trie
           • jtag.ini
           • libcodeconv.sl
           • libjtag_at.sl
           • libjtag.sl
           • japaneseconvlist.txt

The required language-specific configuration settings are:

DRE configuration parameter   Value
CharConv                      1
TermSize                      30
StripLanguage                 2

Korean

The required sentence-breaking files are:

Platform   Sentence-breaking files   Location on CD

NT         • koreanbreaking.dll      Autonomy\MBCS\korean_nt_1_0_1.zip
           • koreanconvlist.txt
           • Koma.dll
           • HanTag.dll
           • main.dat
           • prob.dat
           • main.fst
           • prob.fst
           • pos.nam
           • tag.nam
           • tagout.nam
           • connection.txt
           • stopposnam.txt
           • tagname.txt

UNIX       • koreanbreaking.so       Autonomy/MBCS/korean_solaris_1_0_1.tar.Z
           • koreanconvlist.txt
           • main.dat
           • prob.dat
           • main.fst
           • prob.fst
           • pos.nam
           • tag.nam
           • tagout.nam
           • connection.txt
           • stopposnam.txt
           • tagname.txt

The required language-specific configuration settings are:

DRE configuration parameter   Value
CharConv                      2
TermSize                      40
StripLanguage                 2
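The four settings tables above differ only in CharConv and TermSize; all of them use StripLanguage=2 (no stripping). The Python sketch below compares the parameters in a DirectorDRE.cfg text with those presets and reports any that differ or are missing. The preset dictionary is transcribed from the tables; preset_mismatches is a hypothetical helper, and it assumes one NAME=value parameter per line.

```python
import re
from typing import Dict, Optional, Tuple

# Transcribed from the four settings tables in this section.
PRESETS = {
    "Traditional Chinese": {"CharConv": "5", "TermSize": "40", "StripLanguage": "2"},
    "Simplified Chinese": {"CharConv": "4", "TermSize": "40", "StripLanguage": "2"},
    "Japanese": {"CharConv": "1", "TermSize": "30", "StripLanguage": "2"},
    "Korean": {"CharConv": "2", "TermSize": "40", "StripLanguage": "2"},
}

_PARAM = re.compile(r"^\s*([A-Za-z0-9_]+)\s*=\s*(.*?)\s*$")


def preset_mismatches(cfg_text: str, language: str) -> Dict[str, Tuple[str, Optional[str]]]:
    """Compare DirectorDRE.cfg text with the preset for `language`.

    Returns {parameter: (expected, found)} for every parameter that differs;
    found is None if the parameter is absent. Hypothetical helper: names are
    matched case-insensitively, and this function lets a later NAME=value
    line override an earlier one.
    """
    found = {}
    for line in cfg_text.splitlines():
        m = _PARAM.match(line)
        if m:
            found[m.group(1).lower()] = m.group(2)
    return {
        name: (expected, found.get(name.lower()))
        for name, expected in PRESETS[language].items()
        if found.get(name.lower()) != expected
    }
```

For example, a file containing only CharConv=1 and TermSize=10 yields {"TermSize": ("30", "10"), "StripLanguage": ("2", None)} for Japanese, showing that TermSize must be raised and StripLanguage added.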



Copyright © 2004 Novell, Inc. All rights reserved. Copyright © 1997, 1998, 1999, 2000, 2001, 2002, 2003 SilverStream Software, LLC. All rights reserved.