Troubleshooting the NetWare Server

This section provides troubleshooting suggestions for typical NetWare server problems such as the following:


Resolving Abends

The NetWare operating system is very resilient, but errors can occur. Serious problems are usually accompanied by abend (abnormal end) messages. When an abend message appears, either NetWare or the CPU has detected a critical error condition and started the NetWare fault handler. NetWare uses abends to ensure the integrity of operating system data.

When a server abends, users might not be able to log in to the server, workstations might not be able to read from or write to the server, and an abend message usually appears on the System Console or Logger screen of the server console. If the NetWare auto-recovery mechanism is enabled (default), NetWare might restart the server automatically or suspend the offending process, depending on the nature of the abend.

If there is no abend message on the console, no ABEND.LOG file in SYS:SYSTEM, and no number in brackets within the System Console prompt, but users still can't access the server, see Monitoring and Resolving Communication Problems. If there is no abend message but the console is frozen so that you cannot enter commands, see Server Console Hangs.


Understanding What Happens When You Get an Abend

When the server abends, it displays an abend message similar to the following:

Abend: SERVER-5.xx-message_number message_string
ADDITIONAL INFORMATION: message

The Additional Information section states the probable cause of the abend. It indicates where the problem occurred and gives the name of any NLM associated with the abend. This information helps you determine how to resolve the abend.

The abend message, along with additional information, is saved in the ABEND.LOG file on drive C:. As soon as the server is restarted, the ABEND.LOG file is moved to SYS:SYSTEM.

You can respond to the abend manually or have the server respond automatically.

When you respond manually, the server determines the nature of the abend and displays the appropriate response option on the screen, along with additional options for bringing down the server or executing a core dump. You must execute an option to respond to the abend.

When the server responds automatically, it executes the appropriate response without intervention.

IMPORTANT:  Sometimes an abend (or a faulty NLM program) can cause the server console to stop functioning. In this case, the abend message is not displayed and you cannot enter commands at the console prompt.

After a server failure, we recommend turning off the computer's power and restarting it rather than just exiting to the DOS prompt (C:\NWSERVER) and typing SERVER again.


Responding to the Abend Manually

The default method of responding to an abend is automatic. (See Responding to the Abend Automatically.)

To respond manually to abends, change the following SET parameter (Error Handling category) to the value shown:

AUTO RESTART AFTER ABEND = 0

This SET parameter controls what the server does after an abend. See the online help for a description of each value.

When an abend occurs, the server displays a short list of options appropriate to the nature of the abend. To respond to the abend, you must execute one of the options by typing the first letter of the option.

The following options could be displayed. Note that several of the options have the same first letter (such as R, S, or X). In a given abend situation, the option list will include only one option for any given first letter.

When the server restarts, it moves the ABEND.LOG file from the DOS partition to the SYS:SYSTEM directory.


Responding to the Abend Automatically

You can require the server to respond automatically to abends. Two automatic responses are possible.

Use the following SET parameter to specify how long the server waits after an abend before attempting to shut down and restart the computer:

AUTO RESTART AFTER ABEND DELAY TIME = minutes

To set the parameter values, use the SET command or MONITOR at the server console or NetWare Remote Manager from a workstation.

The Developer Option parameter is in the Miscellaneous category.

The Auto Restart After Abend and Auto Restart After Abend Delay Time parameters are in the Error Handling category.

All parameters can be set in the STARTUP.NCF file.

Because the server responds to the abend automatically, you might not know when an abend has occurred. Therefore, you should periodically check the ABEND.LOG file or the Profiling and Debug Information screen in NetWare Remote Manager (look for Suspended by Abend Recovery status).


Insufficient Packet Receive Buffers, No ECB Available Count Errors

The ECB (event control block) counter is incremented when a device sends a packet to your NetWare server but no packet receive buffer is available. This means that the server has dropped a packet.

The server allocates more packet receive buffers after each incident until it reaches its maximum limit (the Maximum Packet Receive Buffers setting).

If you are using an EISA busmaster board (such as the NE3200™ board), you will probably need to increase both the minimum and maximum number of packet receive buffers.

For procedures on setting the Minimum Packet Receive Buffers and Maximum Packet Receive Buffers parameters, see SET > Communications Parameters in Utilities Reference.

No ECB Available Count messages can also indicate that the driver is not configured correctly or that the Topology Specific Module (TSM) and the Hardware Specific Module (HSM) are incompatible. This value is maintained by the TSM.NLM program.

If the ECB count is increasing and all the packet receive buffers are in use, take a core dump (see Creating a Core Dump) and contact Novell Technical Support.
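The escalation rule above can be expressed as a small decision helper. The following Python sketch is illustrative only and is not a NetWare utility: the function name, its arguments, and the returned strings are assumptions, and the counter readings are expected to be values you noted by hand from the server's LAN statistics.

```python
def ecb_triage(no_ecb_counts, buffers_in_use, max_buffers):
    """Suggest a next step from successive No ECB Available Count readings.

    no_ecb_counts  -- readings of the counter, oldest first (hypothetical input)
    buffers_in_use -- packet receive buffers currently allocated
    max_buffers    -- Maximum Packet Receive Buffers setting
    """
    # The counter is "increasing" if the newest reading exceeds the oldest.
    increasing = len(no_ecb_counts) >= 2 and no_ecb_counts[-1] > no_ecb_counts[0]
    # All buffers are in use once the allocation has reached the maximum.
    exhausted = buffers_in_use >= max_buffers

    if increasing and exhausted:
        return "take a core dump and contact support"
    if increasing:
        return "below the maximum; keep monitoring"
    return "no action needed"
```

The middle case reflects the statement that the server keeps allocating buffers after each incident until it reaches the maximum, so a rising count alone is not yet a reason to escalate.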


Resolving Slow Server Response

To diagnose slow server response problems, identify whether the following conditions exist:

To resolve slow server response problems, perform the following actions:


Server Console Hangs

If the server console locks up so that you cannot enter commands, but there is no abend message on the System Console or Logger screen, follow these steps to troubleshoot the problem. If there is an abend message on the screen, see Resolving Abends.

  1. Verify whether you can toggle among console screens.

    If yes, the problem might be caused by high server utilization. See High Utilization Statistics. If no, continue with the following steps.

  2. Verify whether the server console hangs when you unload a specific NLM.

    If yes, the NLM is probably the source of the problem. Contact the NLM vendor.

  3. Make sure you are using the latest disk and LAN drivers, BIOS, and firmware.

    If not, update disk and LAN drivers. For information on NetWare drivers, see Keeping Your Servers Patched.

  4. Verify whether the server console hangs after you mount the last volume.

    If yes, the network board might not be seated correctly or might not be configured correctly. Check the board and its configuration and correct any problems.

  5. Verify whether you can break into the debugger by pressing Shift+Shift+Alt+Esc on the system console keyboard.

  6. If the console is locked, you can't toggle among screens, and you can't enter the debugger, contact Novell Technical Support or your computer vendor to learn how to generate a nonmaskable interrupt to shut down the server.

If the problem still occurs, follow the troubleshooting steps in Using a Troubleshooting Methodology; search the Novell Knowledgebase; and contact a Novell Support Provider.


High Utilization Statistics

Network performance is a key concern for network administrators and for Novell as well. Unfortunately, sometimes there is confusion about performance indicators and what their statistics mean.

For example, the idea that processor utilization is the key performance indicator for NetWare is much too simple. Some network administrators are concerned when the CPU Utilization health status in NetWare Remote Manager or the Utilization value in MONITOR's General Information screen approaches 100%, on the assumption that the higher the percentage, the worse NetWare's performance is. This is entirely false.


What Is Normal?

Consider first what the Utilization value represents: the average of the server's total processing capacity that was used during the last second (update interval). The remainder of the capacity was spent in the idle loop process. In other words, it is an indication of how much of that time the processor had something to do. A high utilization value means that NetWare is using that percentage of the processor's capacity and wastes less time doing nothing.

Some processes make efficient use of the processor and as a result might cause 100% utilization. This type of utilization is entirely appropriate. Most of the time, when utilization moves up to 100%, it means that the thread is using the processor efficiently. It might stay at 100% for a couple of minutes; this is normal.

It is not normal, however, when the utilization is at 100% for 15 to 20 minutes or more, when connections are dropped, or when server performance deteriorates noticeably. High utilization with these conditions indicates a problem. If you're not seeing these conditions, your utilization might be normal, even when it's at 100%.
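The distinction between a short burst and a sustained plateau can be sketched in Python. This is a minimal illustration, not a NetWare tool: the function name and the one-sample-per-minute input are assumptions, and the 15-minute default simply mirrors the lower bound given above.

```python
def sustained_high_utilization(samples, threshold=100, min_minutes=15):
    """Return True if utilization stays at or above `threshold` for at
    least `min_minutes` consecutive samples (assumed: one sample per minute)."""
    run = 0  # length of the current unbroken run of high samples
    for value in samples:
        if value >= threshold:
            run += 1
            if run >= min_minutes:
                return True
        else:
            run = 0  # any dip below the threshold resets the run
    return False
```

A two-minute spike to 100% returns False here, which matches the description of normal behavior. Dropped connections or noticeably degraded performance indicate a problem regardless of what this check reports.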

How do you know what is normal for your server? You will recognize problems if you "baseline" your server. Know what is normal and know the difference between a cosmetic problem and a true performance problem. (You can test for a cosmetic problem by loading or unloading any NLM; this will cause the processor information to be recalculated.)


What Are the Most Common Solutions?

Before troubleshooting high utilization problems, make sure that you have followed the steps in Using a Troubleshooting Methodology. Check the Novell Support Connection Web site for NetWare patches or updated NLM programs. Available patches will contain fixes for reported high utilization problems related to the actual code of the operating system and eDirectory.

However, a number of high utilization conditions can still result from problems with configuration, levels of NLM programs, and tuning issues.

One of the first things you might want to do is discover the NLM program and threads that are using the CPU. To do this, complete the following steps:

  1. Access NetWare Remote Manager.

  2. Click the Profile/Debug link in the navigation frame.

  3. Click the Profile CPU Execution by NLM link.

  4. Note the parent NLM program and threads that are taking the longest execution time.

  5. If possible, unload the offending NLM program to see if the problem disappears.

You can also use the following list of issues to help you resolve problems.

The items in the list are categorized, but are otherwise in no particular order. The list represents the collective experience of Novell Support representatives. We recommend that you review each item, using each to carefully analyze your system. Except for problems new to NetWare 6, you will be able to resolve the problem on your own in almost every case.


Operating System Issues


Storage Devices and Adapter Issues


Memory Issues


Novell eDirectory Issues


Client Issues

If the problem still occurs, follow the troubleshooting steps in Using a Troubleshooting Methodology; review tips in TID 10011512 "Troubleshooting High Utilization"; search the Novell Knowledgebase for High Utilization; and contact a Novell Support Provider.


Disk Errors

To resolve disk I/O, disk space, and mirroring problems, see the following sections:


Resolving General Server Disk I/O Errors

To resolve a general disk I/O error on the server, try one or more of the following remedies:

If you have tried all the preceding suggestions without success, contact your Novell Support Provider or the drive manufacturer.


Resolving Server Disk Space Problems

To resolve an insufficient disk space error, do one or more of the following:


Mirrored Partitions Do Not Remirror Automatically

When mirrored partitions become unsynchronized, they should resynchronize automatically. If partitions do not resynchronize, complete the following steps:

  1. In ConsoleOne, browse and select the tree you want to manage, and then click the Partition Disk Management icon.

  2. Enter the eDirectory tree and context and server information.

  3. Click Properties > Media > Mirror > Resync.

  4. If the partitions still do not resynchronize, you must re-create the mirrored set.

    1. Determine which disk partition has the data you want to save and mirror.

    2. Delete the other disk partitions.

    3. Create new partitions in place of the ones you deleted.

    4. Mirror the partition containing data to the new partitions.

    For information about mirroring, see Creating a Partition in the Novell Storage Services Administration Guide.

If the problem still occurs, follow the troubleshooting steps in Using a Troubleshooting Methodology, search the Novell Knowledgebase, and contact a Novell Support Provider.


Mirroring Takes a Long Time

If partitions are very large, mirroring can sometimes take several hours to complete; this is normal. The following might help to speed the mirroring process:

If the problem still occurs, follow the troubleshooting steps in Using a Troubleshooting Methodology, search the Novell Knowledgebase, and contact a Novell Support Provider.


Mirroring Stops Just Before It Is Finished

Sometimes, the mirroring process proceeds without error but stops at 99% completion. To troubleshoot the problem, do the following:

If the problem still occurs, follow the troubleshooting steps in Using a Troubleshooting Methodology, search the Novell Knowledgebase, and contact a Novell Support Provider.


Resolving Disk Error Problems When a Traditional Volume Is Mounting

To diagnose problems when disk errors occur while a Traditional volume is mounting, identify whether the following conditions exist:

To resolve problems when disk errors occur while a volume is mounting, do the following:


Resolving Server Memory Problems

To troubleshoot different kinds of server memory problems, to resolve memory leaks, and to resolve memory problems by freeing memory, see the following sections:


NetWare Doesn't Recognize All the Memory in the Server

Use the following steps to find the source of the problem.

  1. Verify whether you are using the NetWare memory manager or an external memory manager. Does CONFIG.SYS or AUTOEXEC.BAT include a DOS=HIGH statement or commands to load memory managers or DOS device drivers? For example, is there a command to load HIMEM.SYS or EMM386.EXE? Both are memory managers.

    Comment out these statements from CONFIG.SYS or delete CONFIG.SYS altogether. Comment out these statements from AUTOEXEC.BAT. (To comment out a command, type REM and a space at the beginning of the command line.)

    If there is a memory manager in the server, NetWare relies upon the memory manager to determine the amount of available memory instead of registering the memory itself. Some memory managers in older computers cannot recognize more than 64 MB of memory. DOS device drivers take memory away from NetWare's memory pool.

    Make sure Windows 95 is not being used to boot the server. Windows 95 autoloads memory managers.

  2. Make sure the server BIOS is current.

    An out-of-date BIOS might be reporting the wrong amount of memory. If a newer version is available, update the BIOS.
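The edit described in Step 1 (prefixing a line with REM and a space) can be sketched in Python for reviewing a copy of CONFIG.SYS or AUTOEXEC.BAT on a workstation. This is an illustration under stated assumptions: the function name is hypothetical, and the patterns cover only the examples named in the text (DOS=HIGH, HIMEM.SYS, EMM386.EXE), so other memory managers or DOS device drivers still need to be checked by hand.

```python
import re

# Patterns for the statements named in the text; anything else is left alone.
PATTERNS = [
    re.compile(r"^\s*DOS\s*=\s*HIGH", re.IGNORECASE),
    re.compile(r"HIMEM\.SYS", re.IGNORECASE),
    re.compile(r"EMM386\.EXE", re.IGNORECASE),
]

def comment_out_memory_managers(lines):
    """Return a copy of `lines` with matching statements prefixed by 'REM '."""
    result = []
    for line in lines:
        already_commented = line.lstrip().upper().startswith("REM ")
        if not already_commented and any(p.search(line) for p in PATTERNS):
            result.append("REM " + line)
        else:
            result.append(line)
    return result

# Hypothetical CONFIG.SYS contents; the paths are examples only.
sample = ["DEVICE=C:\\DOS\\HIMEM.SYS", "DOS=HIGH", "FILES=40"]
cleaned = comment_out_memory_managers(sample)
```

Lines that are already commented out are left unchanged, so the function can be run on the same file more than once.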

If the problem still occurs, follow the troubleshooting steps in Using a Troubleshooting Methodology, search the Novell Knowledgebase, and contact a Novell Support Provider.


Resolving Server Memory Leaks

A memory leak means that an NLM or set of NLM programs has requested memory from the server, but has not returned the memory when finished with it. Over time, the amount of available memory decreases until eventually the server generates memory error messages. The memory leak might be slow or fast depending on the amount of memory requested each time.

If you reboot the server, the memory is returned to the memory pool, and the low memory error messages stop temporarily, until the memory leak ties up enough memory to generate the error messages again.

To see if the server has a memory leak, restart the server and then monitor memory statistics (Total Cache Buffers) over time. If the statistics change even though traffic hasn't increased and no new applications have been installed on the server, use the following steps to find the source of the problem.

  1. Load all the latest patches on the server.

    Server patches are available from Novell's Support Web Site and other locations. See Applying Patches for a list of sources.

  2. Restart the server to free memory and establish a baseline for memory use.

  3. View the memory statistics for the module:

    1. Access NetWare Remote Manager.

    2. Click the List Modules link in the navigation frame.

    3. Sort the list for memory usage by clicking the Alloc Memory button.

    4. Click the value link for allocated memory for each module name you suspect might be the source of the leak.

      Under normal conditions, modules such as SERVER.NLM, NSS.NLM and DS.NLM are usually at the top of the list.

    5. Print this page and use it as the baseline as you monitor the module's memory use over time.

  4. Repeat Step 3 for each NLM you suspect might be the source of a memory leak.

  5. (Conditional) If the memory error messages occur again, repeat Step 3 to view the memory statistics for each suspected NLM. Note whether memory use increased substantially for any of the modules.

    If there is a memory leak, one or more modules will show a large increase in the Bytes in Use value.

  6. When you discover the source of the memory leak, contact the module vendor to tell them about the problem. If possible, update the module or remove the module from the server.
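The baseline comparison in Steps 3 through 5 can be sketched in Python. This is a minimal illustration, not a NetWare feature: it assumes you have transcribed the Bytes in Use value for each module into two dictionaries by hand, and the function name, the growth ratio of 2.0, and the module EXAMPLE.NLM are all hypothetical.

```python
def find_growing_modules(baseline, current, min_growth_ratio=2.0):
    """Return (module, before, after) tuples for modules whose bytes in use
    grew by at least `min_growth_ratio`, largest absolute growth first."""
    suspects = []
    for module, before in baseline.items():
        after = current.get(module)
        if after is None or before <= 0:
            continue  # module unloaded since the baseline, or no usable baseline
        if after / before >= min_growth_ratio:
            suspects.append((module, before, after))
    suspects.sort(key=lambda item: item[2] - item[1], reverse=True)
    return suspects

# Hypothetical readings taken after restart (baseline) and after errors recur.
baseline = {"SERVER.NLM": 40_000_000, "NSS.NLM": 25_000_000, "EXAMPLE.NLM": 1_000_000}
current = {"SERVER.NLM": 41_000_000, "NSS.NLM": 26_000_000, "EXAMPLE.NLM": 9_000_000}
suspects = find_growing_modules(baseline, current)
```

A ratio is used rather than an absolute size because large modules such as SERVER.NLM normally sit at the top of the list without leaking. Choose a threshold that suits your own baseline.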


Freeing Server Memory Temporarily

To free server memory temporarily (until you can add more memory to the server), do one or more of the following:


Resolving Memory Errors When a Traditional Volume Is Mounting

To diagnose problems when memory errors occur while a volume is mounting, identify whether the following conditions exist:

To resolve problems when memory errors occur while a volume is mounting, perform the following actions or ensure that the following conditions exist:


Server Displays Memory Error Messages

Typical memory error messages include the following:

If any of these conditions exist, use the following steps to find the source of the problem:

  1. Make sure the server is not loading a memory manager or DOS device drivers.

    Check the AUTOEXEC.BAT and CONFIG.SYS files to make sure no memory managers, such as HIMEM.SYS or EMM386.EXE, are being loaded, and that there is no DOS=HIGH statement in either file. Make sure no DOS device drivers are being loaded.

    Comment out these statements from CONFIG.SYS or delete CONFIG.SYS altogether. Comment out these statements from AUTOEXEC.BAT. (To comment out a command, type REM and a space at the beginning of the command line.)

    If there is a memory manager in the server, NetWare relies upon the memory manager to determine the amount of available memory instead of registering the memory itself. Some memory managers cannot recognize more than 64 MB of memory. DOS device drivers take memory away from NetWare's memory pool.

    Make sure Windows 95 is not being used to boot the server. Windows 95 autoloads memory managers.

  2. Make sure the server BIOS is current.

    An out-of-date BIOS might be reporting the wrong amount of memory. Update the BIOS if a newer version is available.

  3. Verify that the Reserved Buffers Below 16 MB SET parameter (Memory category) is set to 300 or higher.

    For older drivers, increase the value to 300 or higher, especially if there is a CD-ROM or tape device that needs memory below 16 MB.

  4. Make sure memory is being registered automatically.

    Manually registering memory can cause memory fragmentation. Some old system boards might require you to register memory manually, but the better solution is to upgrade to a newer board so that NetWare will register the memory automatically.

    If memory has been registered manually, reboot the server to free memory and do not manually register memory again. Upgrade the system board if necessary.

  5. Verify whether memory errors occur when a traditional volume is mounting.

    If yes, the server might be low on memory.

    To free memory temporarily, see Freeing Server Memory Temporarily. To solve the problem, add more RAM.

  6. Verify whether the average LRU Sitting Time (in NetWare Remote Manager or MONITOR) is more than 15 minutes during peak work hours.

    If no, the server might be low on memory.

    To free memory temporarily, see Freeing Server Memory Temporarily. To use the LRU Sitting Time to tune memory, see Tuning File Cache in the Server Memory Administration Guide. To solve the problem, add more RAM.

  7. Check for memory leaks.

    Do the LRU Sitting Time and Long Term Cache Hits gradually decline over time, even when network traffic has not increased and no new applications have been installed on the server?

    If yes, the server might have a memory leak. See Resolving Server Memory Leaks.

If the problem still occurs, follow the troubleshooting steps in Using a Troubleshooting Methodology, search the Novell Knowledgebase, and contact a Novell Support Provider.


Resolving Locked Device Errors

To resolve a locked device error, try one or more of the following:

If you have tried all of the above without success, contact a Novell Support Provider or the drive manufacturer.


Resolving Event Control Block Allocation Errors

Event control block allocation system messages can occur when you first start the server or after the server has been running for some time.

These messages indicate that the server was unable to acquire sufficient packet receive buffers, usually called event control blocks (ECBs). Running out of ECBs is not a fatal condition. However, it can indicate either a LAN or server problem.

Servers that run for several days where high loads occur in peaks might exceed the set maximum number of ECBs, causing the system to generate ECB system messages.

If these situations are caused by occasional peaks in the memory demand, you should probably maintain your current maximum ECB allocation and allow the message to be generated at those times.

Otherwise, if your server memory load is very high and you receive frequent ECB allocation errors, try setting your maximum ECB allocation higher. Use the following SET command in the STARTUP.NCF file:

SET MAXIMUM PACKET RECEIVE BUFFERS=number

Memory allocated for ECBs cannot be used for other purposes.

The minimum number of buffers available for the server can also be set in the STARTUP.NCF file with the following command:

SET MINIMUM PACKET RECEIVE BUFFERS=number


Resolving Server Console Command Problems

To diagnose server console command problems, identify whether the following conditions exist:

To resolve server console command problems, do the following:


Resolving Keyboard Locking Problems When Copying Files from CD-ROM

To diagnose keyboard locking problems when copying files from CD-ROM, identify whether the following conditions exist.

If you have a CD-ROM device that shares a SCSI bus with a disk subsystem containing volumes that network operating system installation files are copied to (typically volume SYS:), your keyboard might lock while loading drivers or copying files to the volume. The following figure shows possible configuration conflicts.


Possible SCSI channel conflicts during a NetWare installation

Remove the CD-ROM device drivers that you used to set up the CD-ROM drive as a DOS device from your CONFIG.SYS file. This will avoid possible conflicts when the Operating System CD is mounted as a NetWare volume.

To resolve keyboard locking problems when copying files from the CD, use the following procedure:

  1. Press Alt+Esc until you are at the console prompt.

  2. Enter DOWN.

  3. Using a text editor, remove the CD-ROM device drivers from your CONFIG.SYS file.

  4. Save the updated CONFIG.SYS file.

  5. Using a text editor, remove any references to the CD-ROM drivers from your AUTOEXEC.BAT file.

  6. Save the updated AUTOEXEC.BAT file.

  7. Reboot the server by pressing Ctrl+Alt+Del.

  8. (Conditional) If the server doesn't boot automatically from the AUTOEXEC.BAT file, change to the subdirectory where the SERVER.EXE file and other boot files are located (the default is C:\NWSERVER), and enter the following at the DOS prompt:

    SERVER

  9. (Conditional) If you are using ASPI device drivers (for example, for an Adaptec* controller), you need to enter one of the following commands:

    AHAxxxx

    where xxxx specifies the Adaptec board number

    or

    ASPICD

    or

    CDNASPI

  10. At the console prompt, enter NWPA.

  11. (Optional) At the console prompt, enter CD DEVICE LIST.

    A list appears with numbers associated with all the devices on your network. Determine which number is the volume number.

  12. At the console prompt, enter

    CD9660.NSS

    CD MOUNT volume_name|number

  13. At the console prompt, enter NWCONFIG.


