Improving Search Results through Intelligent Indexing

You can improve the accuracy of your search results by following these indexing guidelines:

When defining and creating your indexes, start with the highest-level Web Site URLs and File System Paths possible.

If content is showing up in your search results that you don't want included, try removing some paths or URLs from your defined indexes. Also, try excluding specific subdirectories that you know or suspect might contain content that you don't want searched.

If you've indexed too many file types and cluttered your search results, try removing file types that you don't want indexed by using the Extensions to Exclude option on the Define Index page.

Use the Robots META tag in your Web site's content.

Exclude documents or specific sections of documents, including headers, footers, and navigation bars.

Excluding Documents from Being Indexed

One way to improve search results is to control which content is indexed, so that irrelevant documents do not crowd out relevant ones.

Using the Extensions to Exclude Option

You can use the Extensions to Exclude option to direct Web Search to ignore specific file types. For example, if you don't want Word or PowerPoint documents to be included in search results, you would enter DOC and PPT in the Extensions to Exclude field. When these document types are encountered during an indexing job, Web Search skips over them.
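The effect of an exclusion list can be pictured with a short Python sketch. This is only an illustration of the idea, not how Web Search is implemented: the function name `is_skipped`, the sample paths, and the case-insensitive comparison are all assumptions made for the example.

```python
from pathlib import PurePosixPath

# Hypothetical exclusion list, mirroring the DOC and PPT example above.
EXCLUDED = {"doc", "ppt"}


def is_skipped(path: str, excluded: set[str] = EXCLUDED) -> bool:
    """Return True if the file's extension is on the exclusion list."""
    # suffix includes the leading dot (".DOC"); strip it and ignore case
    # (case-insensitive matching is an assumption of this sketch).
    ext = PurePosixPath(path).suffix.lstrip(".").lower()
    return ext in excluded


# Made-up sample paths for illustration.
paths = ["/docs/report.DOC", "/docs/slides.ppt", "/docs/index.html"]
indexed = [p for p in paths if not is_skipped(p)]
print(indexed)  # ['/docs/index.html']
```

In this sketch, anything whose extension is not listed is still indexed, which is why a long or open-ended set of unwanted types is easier to handle with an include list instead.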

Using the Extensions to Include Option

As mentioned above, you can use the Extensions to Exclude option to direct Web Search to ignore specific file types. However, if you can't specify all of the extensions to exclude, use the Extensions to Include option and specify all acceptable file extensions. A typical list would specify HTM, HTML, PDF, TXT, and DOC.

HINT: When entering extensions in the Extensions to Include or Extensions to Exclude field, separate each extension with a space or a hard return. Avoid using commas. For example:

htm html pdf txt doc
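To see why spaces and hard returns are interchangeable separators while commas are not, consider the following Python sketch of how such a field might be read. It is an assumption-laden illustration rather than Web Search's actual parsing: `parse_extensions` and `is_included` are hypothetical helpers, and treating extensions as case-insensitive is assumed.

```python
def parse_extensions(field_text: str) -> set[str]:
    """Split on any whitespace (spaces or line breaks) and normalize case."""
    return {token.lower() for token in field_text.split()}


def is_included(filename: str, included: set[str]) -> bool:
    """Return True if the file's extension is on the include list."""
    # A file with no dot has no extension and is treated as excluded here.
    _, dot, ext = filename.rpartition(".")
    return bool(dot) and ext.lower() in included


# A mix of spaces and a line break yields the same list.
included = parse_extensions("htm html pdf\ntxt doc")
print(sorted(included))                       # ['doc', 'htm', 'html', 'pdf', 'txt']
print(is_included("manual.PDF", included))    # True
print(is_included("archive.zip", included))   # False
```

With whitespace splitting, an entry such as `htm,html` would be read as the single extension `htm,html` and would match nothing, which is one plausible reason to avoid commas.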

Using the Robots META Tag

Another effective way of controlling what Web Search indexes is to use the Robots META tag, which you insert into the content being indexed.

When a Web-based search engine encounters a document containing the Robots META tag, the search engine will do as the META tag instructs.

There are several values you can specify in the Robots META tag:

NOINDEX: Indicates that the document is not to be indexed.

NOFOLLOW: Indicates that hypertext links in the document are not to be crawled.

INDEX: Indicates that the document can be indexed.

FOLLOW: Indicates that hypertext links in the document should be crawled.

ALL: Indicates that the document can be indexed and all links can be crawled.

NONE: Indicates that the document is not to be indexed and that hypertext links are not to be crawled.

To include the Robots META tag, use this syntax:

<META name="Robots" content="value, optional_value">
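As a rough illustration of how a crawler might interpret this tag, the Python sketch below reads the Robots META tag from a page and reduces its values to two decisions: whether to index the document and whether to follow its links. The class and function names are hypothetical, and the default of allowing both when no tag is present is an assumption; Web Search's own behavior may differ.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from a Robots META tag, if present."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Values are comma-separated; compare them case-insensitively.
            self.directives |= {
                v.strip().upper() for v in content.split(",") if v.strip()
            }


def robots_policy(html: str) -> tuple[bool, bool]:
    """Return (index_allowed, follow_allowed); assume both if no tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    index_allowed = not ({"NOINDEX", "NONE"} & found)
    follow_allowed = not ({"NOFOLLOW", "NONE"} & found)
    return index_allowed, follow_allowed


page = '<html><head><META name="Robots" content="NOINDEX, NOFOLLOW"></head></html>'
print(robots_policy(page))  # (False, False)
```

In this sketch, `content="NOINDEX, NOFOLLOW"` has the same effect as `content="NONE"`, and a page with no Robots META tag is treated the same as one marked `ALL`.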

Using the Robots Comment Tag

You can also use the Robots Comment tag to exclude specific sections of HTML documents from your search results. For example, you might not want such sections as repetitive headers, footers, navigation bars, and server-side includes to be indexed.

HINT: You can also place these tags at the top and bottom of all include files so these sections never get indexed when part of a larger document.

To mark the content that Web Search should skip while indexing, do the following:

At the point in your HTML document where you want Web Search to begin skipping content while indexing, enter the following tag:

Just after the content you want skipped, enter the following tag:

Save your changes and index (or reindex) the content.