Novell Doc: Novell Vibe 3.2 Administration Guide - Understanding and Configuring Search Functionality

4.12 Understanding and Configuring Search Functionality

Novell Vibe contains various features that control how items are indexed and how searches are executed on the Vibe site. As the Vibe administrator, you can configure the settings for these features. The default configurations are optimal for most Vibe sites.

These features are supported only with Vibe folder entries that contain English or other Western European languages. (For a list of supported languages, see Section 4.12.6, Supported Languages for Indexing and Searching for the Root Form of Words.)

4.12.1 Removing Frequently Used Words That Have No Inherent Meaning

Vibe removes frequently used words that have no inherent meaning when items are indexed and when users perform a search. Examples of such words are a, an, the, in, on, and so forth.

This also applies when users enclose a search phrase in quotation marks. For example, a search for “sell the products” returns all of the following: sell their products, sell with products, sell the products, and so forth. However, it does not return sell products, because the phrase still requires a word between sell and products.
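The quoted-phrase behavior above can be sketched in a few lines of Python. This is a minimal illustration, not Vibe's implementation: the stop word set is a small assumed subset, and the functions `analyze` and `phrase_matches` are hypothetical names. The sketch assumes that a removed word still leaves a gap at its position, which is consistent with the example in the text.

```python
# Illustrative subset of stop words; the list that Vibe actually uses may differ.
STOP_WORDS = {"a", "an", "the", "in", "on", "their", "with"}


def analyze(text):
    """Return (position, word) pairs, dropping stop words but keeping their gaps."""
    tokens = text.lower().split()
    return [(pos, word) for pos, word in enumerate(tokens) if word not in STOP_WORDS]


def phrase_matches(query, document):
    """True if the query's remaining words occur in the document
    at the same relative positions (gaps included)."""
    query_terms = analyze(query)
    doc_terms = set(analyze(document))
    if not query_terms:
        return False
    first_pos, first_word = query_terms[0]
    for pos, word in doc_terms:
        if word != first_word:
            continue
        offset = pos - first_pos
        if all((p + offset, w) in doc_terms for p, w in query_terms):
            return True
    return False


for candidate in ("sell their products", "sell with products", "sell products"):
    print(candidate, phrase_matches("sell the products", candidate))
```

Because only the gap is remembered, any stop word in the middle position satisfies the phrase, while a document with no word in that position does not.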

You can customize the words that Vibe considers to have no inherent meaning. For more information, see lucene.indexing.stopwords.file.path.

This functionality is enabled by default. For information on how to modify the default settings, see Section 4.12.5, Modifying Configuration Settings.

4.12.2 Searching for Various Forms of the Same Word

When Vibe indexes a word, it indexes the root form of the word. Likewise, when users perform a search for a word, Vibe searches for the root form of the word and returns all matches. For example, performing a search on the word research returns all forms of the word, including researching, researched, and researches. Likewise, searching for the word researching returns results for research, researches, and so forth.
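As a rough illustration of root-form matching, the following Python sketch strips a few common English suffixes. It is a deliberately simplified toy, not the stemming algorithm that Vibe uses; the suffix list and the function name `toy_stem` are assumptions made for this example.

```python
# A toy suffix stripper; real stemmers apply many more rules than this.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Reduce a word to an approximate root by removing one known suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


forms = ["research", "researching", "researched", "researches"]
print({form: toy_stem(form) for form in forms})
```

When the same reduction is applied both at index time and at search time, every form in the list is compared by its shared root, which is why a search for any one form can return the others.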

This functionality is enabled by default. For information on how to modify the default settings, see Section 4.12.5, Modifying Configuration Settings.

4.12.3 Searching for Words That Contain Accents

When Vibe indexes a word, it indexes the word without accents, regardless of whether the word originally contains accents. Likewise, when users perform a search for a word, Vibe searches for the word without accents, regardless of whether the user uses accents during the search. This means that when a user performs a search on the word cliché, Vibe returns results for the word cliché and cliche. Vibe also returns both forms of the word if the user performs a search on the word cliche.
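The effect of removing accents on both sides of a search can be approximated in Python with the standard `unicodedata` module. This is a sketch under the assumption that folding means stripping combining marks; the folding that Vibe performs may cover additional characters, and the function names are hypothetical.

```python
import unicodedata


def fold_accents(text):
    """Decompose characters and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query, indexed_word):
    """Compare the query and the indexed word after folding both."""
    return fold_accents(query).lower() == fold_accents(indexed_word).lower()


print(fold_accents("cliché"))
print(matches("cliche", "cliché"), matches("cliché", "cliche"))
```

Because both the indexed word and the query are folded the same way, it does not matter which side originally carried the accent.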

This functionality is enabled by default. For information on how to modify the default settings, see Section 4.12.5, Modifying Configuration Settings.

4.12.4 Increasing the Number of Words That Are Indexed for Each Document

Vibe indexes all the words (and their variations) in documents that are uploaded to the Vibe site. In most cases, the default settings are sufficient to index documents that contain tens of pages.

If very large documents (hundreds or thousands of pages each) are stored on your Vibe site, parts of those documents might not be returned in search results. This can happen when a document is too large to be indexed in its entirety. In this case, you might want to increase the number of words that are indexed for each document. Be aware that increasing the number of words that are indexed uses more system resources (RAM and disk space).

You can increase the number of words that are indexed for each document that is added to the Vibe site. For information on how to do this, see Section 4.12.5, Modifying Configuration Settings.
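The following Python sketch shows why text near the end of a very large document can be missing from search results. It assumes that the limit keeps the first terms of a document in order and ignores the rest; the function name `searchable_terms` and the tiny limit are hypothetical and chosen only to keep the example short.

```python
def searchable_terms(document_terms, max_field_length):
    """Return the set of terms that would be indexed under the given limit."""
    return set(document_terms[:max_field_length])


document = ["budget", "forecast", "summary", "appendix", "glossary"]

# With a limit of 3, only the first three terms can be found by a search.
indexed = searchable_terms(document, max_field_length=3)
print("budget" in indexed, "appendix" in indexed)

# Raising the limit makes the later terms searchable, at the cost of a larger index.
indexed = searchable_terms(document, max_field_length=5)
print("appendix" in indexed)
```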

4.12.5 Modifying Configuration Settings

The configuration file that you modify on the Vibe server depends on your Lucene setup: whether you have a single Lucene Index Server that is located on the same server as Vibe, or whether your Lucene Index Server is located on a remote server or you have multiple Lucene Index Servers.

For more information about the various ways that you can set up the Lucene Index Server, see Changing Your Lucene Index Server Configuration in the Novell Vibe 3.2 Installation Guide.

To modify configuration settings for the Search feature:

1. Change to the following directory:

   Linux:	/opt/novell/teaming/apache-tomcat/webapps/ssf/WEB-INF/classes/config
   Windows:	c:\Program Files\Novell\Teaming\apache-tomcat\webapps\ssf\WEB-INF\classes\config

2. Open the ssf.properties file in a text editor if you have a single Lucene Index Server that is located on the same server as Vibe.

   or

   Open the lucene-server.properties file in a text editor if your Lucene Index Server is located on a remote server or you have multiple Lucene Index Servers.

3. Scroll down to locate the line for the search functionality that you want to change.

   For information on each configuration setting that you can modify for searching on the Vibe site, see Configuration Settings for Search Features.

4. Copy that line to the clipboard of your text editor.
5. Depending on which file you are modifying (as described in Step 2), make a backup copy of the corresponding ssf-ext.properties file or lucene-server-ext.properties file. These files are located in the same directory as the ssf.properties file and the lucene-server.properties file.
6. Open either the ssf-ext.properties file or the lucene-server-ext.properties file.
7. Scroll to the end of the ssf-ext.properties file or the lucene-server-ext.properties file, then paste the line you copied.
8. Edit the setting for the appropriate search functionality as needed.
9. Save and close the ssf-ext.properties file or the lucene-server-ext.properties file.
10. Close the ssf.properties file or the lucene-server.properties file without saving.
11. Stop and restart Vibe to put the modified search settings into effect for your Vibe site.
12. Re-index the Vibe site, as described in Section 24.4, Rebuilding the Lucene Index.
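The file-editing part of the procedure above (locating a setting, backing up the -ext file, and appending the override) could be scripted roughly as follows. This is a sketch only: the helper name `add_override` and the `.bak` backup suffix are assumptions, and the script does not restart Vibe or re-index the site, so those steps remain manual.

```python
import shutil
from pathlib import Path


def add_override(base_file, ext_file, setting, new_value):
    """Append setting=new_value to the -ext properties file after backing it up.

    The base file is only read, never modified.
    """
    base_file, ext_file = Path(base_file), Path(ext_file)

    # Confirm that the setting exists in the base file before overriding it.
    base_lines = base_file.read_text(encoding="latin-1").splitlines()
    known = any(
        line.split("=", 1)[0].strip() == setting
        for line in base_lines
        if "=" in line and not line.lstrip().startswith("#")
    )
    if not known:
        raise KeyError(f"{setting} not found in {base_file.name}")

    # Make a backup copy of the -ext file next to the original.
    if ext_file.exists():
        shutil.copy2(ext_file, ext_file.with_name(ext_file.name + ".bak"))

    # Append the override at the end of the -ext file.
    with ext_file.open("a", encoding="latin-1") as handle:
        handle.write(f"\n{setting}={new_value}\n")
```

A call such as `add_override("ssf.properties", "ssf-ext.properties", "lucene.max.fieldlength", 200000)`, run from the configuration directory, would leave ssf.properties untouched and add one line to the end of ssf-ext.properties. The value 200000 is only an example.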

Configuration Settings for Search Features

The following tables show the configuration settings that you can modify for the various search features. Most search features have a lucene.indexing setting and a corresponding lucene.searching setting. Both settings must be configured in order to produce the desired functionality.
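Because the indexing and searching settings come in pairs, it can help to check that an override file changes both halves. The Python sketch below flags any lucene.indexing or lucene.searching setting whose counterpart is missing or has a different value. It assumes that paired settings should carry the same value (the guide only says that both must be configured), and the function names are hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def unpaired_settings(settings):
    """Return the indexing/searching keys whose counterpart is missing or different."""
    prefixes = ("lucene.indexing.", "lucene.searching.")
    problems = []
    for key, value in settings.items():
        for own, other in (prefixes, prefixes[::-1]):
            if key.startswith(own):
                twin = other + key[len(own):]
                if settings.get(twin) != value:
                    problems.append(key)
    return sorted(problems)


overrides = parse_properties(
    "lucene.indexing.stemming.enable=false\n"
    "lucene.searching.stemming.enable=true\n"
)
print(unpaired_settings(overrides))
```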

Table 4-2 Removing Frequently Used Words

Setting	Function
lucene.indexing.stopwords.enable	Enables or disables the functionality that removes frequently used words that have no inherent meaning when items are added to the index. For more information, see Section 4.12.1, Removing Frequently Used Words That Have No Inherent Meaning. By default, the value is true (enabled).
lucene.indexing.stopwords.file.charset	If you have provided your own file that contains frequently used words that you want to be ignored (as described in lucene.indexing.stopwords.file.path), you can change the default character encoding of the file that contains the new words. By default, the value is UTF-8.
lucene.indexing.stopwords.file.path	Enables you to point to a file that you create that contains your own list of words that you want Vibe to ignore when items are added to the index. This file should be in a directory where it does not get overwritten or removed during an upgrade. If you are running Vibe in a clustered environment, this should be a directory that is accessible to and shared by all Vibe nodes. You must specify the full path to the file. Each line of the file should contain only one word. All words in the file must be in lowercase. By default, there is no file path specified, and Vibe defaults to a list of common words that are not normally useful when performing a search, such as a, in, this, and so forth.
lucene.searching.stopwords.enable	Enables or disables the functionality that removes frequently used words that have no inherent meaning when users perform a search. For more information, see Section 4.12.1, Removing Frequently Used Words That Have No Inherent Meaning. By default, the value is true (enabled).
lucene.searching.stopwords.file.charset	If you have provided your own file that contains frequently used words that you want to be ignored (as described in lucene.indexing.stopwords.file.path), you can change the default character encoding of the file that contains the new words. By default, the value is UTF-8.
lucene.searching.stopwords.file.path	Enables you to point to a file that you create that contains your own list of words that you want Vibe to ignore when performing a search. This file should be in a directory where it does not get overwritten or removed during an upgrade. If you are running Vibe in a clustered environment, this should be a directory that is accessible to and shared by all Vibe nodes. You must specify the full path to the file. Each line of the file should contain only one word. All words in the file must be in lowercase. By default, there is no file path specified, and Vibe defaults to a list of common words that are not normally useful when performing a search, such as a, in, this, and so forth. If you leave all three search features enabled (removing frequently used words, searching for various forms of the same word, and searching for words that contain accents), and you want to specify words to ignore that contain accents, you must specify both forms of the word (with and without the accents).
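The rules for a custom word file described in the table above (one word per line, lowercase, a known character encoding, and both accented and unaccented forms when all three search features are enabled) can be applied with a short Python helper. This is a sketch; the function names and the example file name are hypothetical, and the accent handling is an approximation that strips combining marks.

```python
import unicodedata
from pathlib import Path


def fold_accents(word):
    """Return the word with combining accent marks removed."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def write_stopwords_file(path, words, charset="utf-8"):
    """Write one lowercase word per line, adding the unaccented form of each word."""
    entries = []
    for word in words:
        word = word.strip().lower()
        if not word or len(word.split()) != 1:
            raise ValueError(f"expected a single word, got {word!r}")
        for form in (word, fold_accents(word)):
            if form not in entries:
                entries.append(form)
    Path(path).write_text("\n".join(entries) + "\n", encoding=charset)
    return entries
```

For example, `write_stopwords_file("custom-stopwords.txt", ["The", "más"])` writes the, más, and mas on separate lines. The full path to the resulting file would then be given in both the lucene.indexing.stopwords.file.path and lucene.searching.stopwords.file.path settings.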

Table 4-3 Searching for Various Forms of the Same Word

Setting	Function
lucene.indexing.stemming.enable	Enables or disables the functionality that indexes various forms of the same word. For more information, see Section 4.12.2, Searching for Various Forms of the Same Word. By default, the value is true (enabled).
lucene.indexing.stemming.stemmer.names	Allows you to specify the language that you want Vibe to use when indexing the root form of words. For more information, see Section 4.12.2, Searching for Various Forms of the Same Word. By default, the language is English. For information about which languages are available, see Section 4.12.6, Supported Languages for Indexing and Searching for the Root Form of Words.
lucene.searching.stemming.enable	Enables or disables the functionality that allows users to search for various forms of the same word. For more information, see Section 4.12.2, Searching for Various Forms of the Same Word. By default, the value is true (enabled).
lucene.searching.stemming.stemmer.names	Allows you to specify the language that you want Vibe to use when searching for the root form of words. For more information, see Section 4.12.2, Searching for Various Forms of the Same Word. By default, the language is English. For information about which languages are available, see Section 4.12.6, Supported Languages for Indexing and Searching for the Root Form of Words.

Table 4-4 Searching for Words That Contain Accents

Setting	Function
lucene.indexing.asciifolding.enable	Enables or disables the functionality that indexes words with accents as well as the same word without the accents. For more information, see Section 4.12.3, Searching for Words That Contain Accents. By default, the value is true (enabled).
lucene.searching.asciifolding.enable	Enables or disables the functionality that allows users to search for words with accents as well as the same word without the accents. For more information, see Section 4.12.3, Searching for Words That Contain Accents. By default, the value is true (enabled).

Table 4-5 Increasing the Number of Words That Are Indexed for Each Document

Setting	Function
lucene.max.fieldlength	Designates the maximum number of terms that are indexed for each document that is uploaded to Vibe. Be aware that the number of terms per document can be much higher than the number of words in the document (because the stemming and ASCII folding capabilities enable various forms of each word in the document to be indexed). Only terms that are indexed appear in search results. For more information, see Section 4.12.4, Increasing the Number of Words That Are Indexed for Each Document. By default, the value is 100000.
doc.conversion.size.threshold	Designates the maximum size of a document that can be indexed in the Vibe site. Depending on your specific environment and needs, you might need to modify this setting in addition to the lucene.max.fieldlength setting. For more information, see Section 4.12.4, Increasing the Number of Words That Are Indexed for Each Document. By default, the value is 31457280 (about 30 MB).
doc.max.text.extraction.size.threshold	Designates the maximum size of a document that can be indexed in the Vibe site after the document has undergone file conversion. Depending on your specific environment and needs, you might need to modify this setting in addition to the lucene.max.fieldlength setting. For more information, see Section 4.12.4, Increasing the Number of Words That Are Indexed for Each Document. By default, the value is 1048576 (about 1 MB).
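The two size thresholds in the table above are large numbers that are easy to mistype. Assuming the values are byte counts (which matches the note that 1048576 is about 1 MB), the following Python sketch converts between megabytes and the values used in the properties file. The function names are hypothetical.

```python
BYTES_PER_MB = 1024 * 1024


def megabytes_to_bytes(megabytes):
    """Convert a size in megabytes to the byte count used in the settings."""
    return int(megabytes * BYTES_PER_MB)


def bytes_to_megabytes(size_in_bytes):
    """Convert a byte count from the settings back to megabytes."""
    return size_in_bytes / BYTES_PER_MB


print(bytes_to_megabytes(31457280))  # default for doc.conversion.size.threshold
print(bytes_to_megabytes(1048576))   # default for doc.max.text.extraction.size.threshold
print(megabytes_to_bytes(60))        # an example value for a larger limit
```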

4.12.6 Supported Languages for Indexing and Searching for the Root Form of Words

By default, when Vibe indexes a word, it indexes the root form of the word. Likewise, when users perform a search for a word, Vibe searches for the root form of the word and returns all matches. (For more information, see Section 4.12.2, Searching for Various Forms of the Same Word.)

The default language of the Vibe site is irrelevant with regard to indexing and searching for the root form of words; Vibe detects the language for each individual entry when it performs the indexing and search.

You can configure Vibe to use any of the following languages when indexing and searching for the root form of words (the default is English):

Danish
Dutch
English
Finnish
French
German
German2 (This is a modified version of German that handles umlaut characters differently: it appends an e after vowels that would otherwise have an umlaut. For example, ä becomes ae, ö becomes oe, and ü becomes ue.)
Hungarian
Italian
Norwegian
Porter (This is for the English language; this option simply indexes and searches in a different way.)
Portuguese
Romanian
Russian
Spanish
Swedish
Turkish
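The umlaut handling described for the German2 option in the list above can be illustrated with a simple character mapping in Python. This sketch shows only the letter substitution, assuming the pairs ä/ae, ö/oe, and ü/ue; it is not the German2 stemmer itself, and the function name is hypothetical.

```python
# Map each umlaut vowel to the plain vowel followed by e.
UMLAUT_MAP = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"})


def expand_umlauts(word):
    """Lowercase the word and replace umlaut vowels with their two-letter form."""
    return word.lower().translate(UMLAUT_MAP)


for word in ("Müller", "schön", "Käse"):
    print(word, expand_umlauts(word))
```

With a mapping like this, a word typed with ue and the same word typed with ü reduce to one spelling before they are compared.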