B — Understanding Client Failover


If a client's preferred CDS or security server is unavailable, the client automatically locates (fails over to) an available replica. This section provides performance expectations for client failover in a standard Windows NT 4.0 environment, and describes how other environmental factors influence the speed and success of client failover.

This section describes:

B.1 Situations that Trigger Failover
B.2 Requirements for Successful Failover
B.3 Failover Test Environment
B.4 Failover Test Results
B.5 Factors that Affect Failover
B.6 Application Server Failover
B.7 Responding to Loss of Service

B.1 Situations that Trigger Failover

If a CDS or security server is unavailable when one of the following actions occurs, the client fails over to a backup server:

Client logs into DCE
Client credentials expire and client requests refreshed credentials
Client makes a request of an application server, and the application server contacts the security server to verify the client's credentials
Client performs a directory service lookup
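The pattern common to all four situations can be sketched as a loop that tries the preferred server first and then each replica. This is an illustrative sketch only, not PC-DCE code: the function name, the exception type, and the server names are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllServersUnavailable(Exception):
    """Raised when neither the preferred server nor any replica answered."""


def call_with_failover(servers: Sequence[str], request: Callable[[str], T]) -> T:
    """Try each server in order and return the first successful result.

    `servers` lists the preferred server first, followed by its replicas.
    `request` is any callable that raises ConnectionError when the server
    it is given cannot be used.
    """
    failures = []
    for server in servers:
        try:
            return request(server)
        except ConnectionError as exc:
            # Remember why this server failed, then move on to the next one.
            failures.append(f"{server}: {exc}")
    raise AllServersUnavailable("; ".join(failures))


def fake_lookup(server: str) -> str:
    # Hypothetical stand-in for a directory lookup; only the replica answers.
    if server == "cds-master":
        raise ConnectionError("server not running")
    return f"answer from {server}"
```

Calling `call_with_failover(["cds-master", "cds-replica"], fake_lookup)` returns the replica's answer after the master raises `ConnectionError`.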

B.2 Requirements for Successful Failover

To prepare for successful client failover, you must ensure that replicas are maintained for security servers and CDS servers (see Section 5.4 on page 69 for information about creating replicas).

By default, a backup CDS server replicates only the root directory; you should change the default replica configuration to maintain all of the additional directories that will be needed if the master is unavailable. Once you create directories on your backup server for all directories on the master, the backup directories are automatically synchronized on a periodic basis (the default is once per hour).

See Section 5.5 on page 70 for instructions on changing the default CDS replica configuration.

B.3 Failover Test Environment

Review the information in this section for general expectations about failover times. Then, refer to Section B.5 on page 126 to see if failover in your environment is subject to any additional factors.

The failover statistics in this section were obtained under the following test conditions:

Windows NT 4.0 environment with Service Pack 3.
Servers and replicas local to the network.
One system housing the preferred CDS server and master security server, and another system housing the replica CDS and security servers.
Registry keys set to their defaults.
RPC_SUPPORTED_PROTSEQS environment variable used to set the environment to TCP-only for TCP readings and UDP-only for UDP readings.
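As a sketch of the last condition, the following helper builds a process environment restricted to one protocol. The values `ncacn_ip_tcp` and `ncadg_ip_udp` are the standard DCE RPC protocol sequence names for TCP and UDP; this section does not state the exact values used in the tests, so treat them as an assumption. The helper name is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

# Standard DCE RPC protocol sequence names (assumed to be the values used).
PROTSEQS = {"tcp": "ncacn_ip_tcp", "udp": "ncadg_ip_udp"}


def protocol_only_env(protocol: str,
                      base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment limited to a single protocol.

    `protocol` is "tcp" or "udp". `base` defaults to the current process
    environment; the original mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    env["RPC_SUPPORTED_PROTSEQS"] = PROTSEQS[protocol]
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching a client command for a TCP-only or UDP-only reading.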

The information in the CDS cache and in the pe_site file has a major impact on whether failover is required for PC-DCE operations, and on the speed of failover (see Section B.5.1 on page 127 and Section B.5.2 on page 127). The tests in this section indicate failover readings when these files either do not exist or are not current.

B.4 Failover Test Results

The following sections include test statistics for the failover conditions described in Section B.3:

B.4.1 Failover when the CDS or Security Server is Not Running
B.4.2 Failover when the Server System is Unreachable

B.4.1 Failover when the CDS or Security Server is Not Running

These tests apply to scenarios in which the preferred CDS server or master security server is not running, but the system that houses them is up and running.

In either a TCP or UDP environment, failover to the backup servers is immediate (within a few seconds) for the following scenarios:

Client dce_login
CDS lookup
Refresh of client tickets (credentials) during an application session

Rapid failover occurs because the Windows 2000 or Windows NT endpoint mapper on the master server system immediately informs the client that the server is unavailable. The client does not need to wait for the protocol timeout period before contacting the backup server system.

B.4.2 Failover when the Server System is Unreachable

These tests apply to scenarios in which the client is unable to contact the system that houses the preferred CDS and master security servers (for example, the system is disconnected from the network or has been powered off). Under these circumstances, the client cannot reach the endpoint mapper on that system, so failover is not immediate.

B.4.2.1 Client Performs a CDS Lookup

If the preferred CDS server is unreachable when the client needs to perform a lookup (for example, a dcecp cell show command or an attempt to locate an application server):

TCP — Failover to the backup CDS server in a TCP environment takes approximately 1 minute 30 seconds.

The client then caches the backup CDS server as master. Should the new master become unreachable, failover to the original master may take up to 5 minutes, depending upon the TCP state of the connection.
UDP — Under UDP, failover to the backup CDS server takes approximately 3 minutes 40 seconds.

B.4.2.2 Client Logs into DCE

Upon login, the client runtime uses CDS to locate the security server, then refers to the pe_site file, unless you have configured the environment to use pe_site exclusively (see Section B.5.2).

If you are not using the pe_site file exclusively, failover to log into the backup security server takes:

TCP — Approximately 2 minutes 15 seconds.
UDP — Approximately 1 minute 13 seconds.
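For planning purposes, the approximate readings in Sections B.4.2.1 and B.4.2.2 can be collected into a small lookup table. The figures are the ones reported above, converted to seconds; the dictionary keys and the function name are hypothetical labels chosen for this sketch. The login figures apply only when pe_site is not used exclusively.

```python
# Approximate failover times, in seconds, when the server system is unreachable.
FAILOVER_SECONDS = {
    ("cds_lookup", "tcp"): 1 * 60 + 30,   # 1 minute 30 seconds
    ("cds_lookup", "udp"): 3 * 60 + 40,   # 3 minutes 40 seconds
    ("dce_login", "tcp"): 2 * 60 + 15,    # 2 minutes 15 seconds
    ("dce_login", "udp"): 1 * 60 + 13,    # 1 minute 13 seconds
}


def faster_protocol(operation: str) -> str:
    """Return the protocol with the shorter measured failover time."""
    tcp = FAILOVER_SECONDS[(operation, "tcp")]
    udp = FAILOVER_SECONDS[(operation, "udp")]
    return "tcp" if tcp < udp else "udp"
```

Note that the faster protocol differs by operation: TCP was faster for CDS lookups in these tests, while UDP was faster for login.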

B.5 Factors that Affect Failover

Environmental factors include:

B.5.1 Cache Contents
B.5.2 PE_Site File Use
B.5.3 Endpoint Mappers
B.5.4 Registry Keys
B.5.5 Replicas Across a WAN Link

Client failover for application servers is affected by a different set of factors, and is discussed separately in Section B.6 on page 128.

B.5.1 Cache Contents

CDS lookups are affected by whether or not the client has already stored an application server's bindings in its cache. For example, if an application session is in progress, the client has already obtained the application server's bindings from the CDS server. At this point, if the CDS server fails, the application session can still continue.
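The effect of cached bindings can be illustrated with a minimal sketch. This is not the CDS cache implementation: the class, the exception, and the lookup callable are hypothetical, and a real cache also handles expiry and refresh.

```python
class DirectoryUnavailable(Exception):
    """Raised by the lookup callable when the directory server is down."""


class BindingCache:
    """Resolve server names, consulting the directory only on a cache miss."""

    def __init__(self, directory_lookup):
        self._lookup = directory_lookup   # callable: name -> binding string
        self._cache = {}

    def resolve(self, name):
        # A cached binding is returned without contacting the directory,
        # so a session that already has its bindings survives an outage.
        if name in self._cache:
            return self._cache[name]
        binding = self._lookup(name)      # may raise DirectoryUnavailable
        self._cache[name] = binding
        return binding
```

Once a name has been resolved, later calls to `resolve` for that name succeed even if the lookup callable starts raising `DirectoryUnavailable`; only names that were never cached are affected.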

B.5.2 PE_Site File Use

The pe_site file is a list of security servers and their associated bindings. By default, the client runtime looks up bindings for a security server by using CDS. If the preferred CDS server is unavailable, the runtime will look up the location of a security server in the pe_site file.

If the pe_site file is up-to-date, a dce_login will succeed immediately even though a security server may be down, because the bindings of an available security server are obtained from the pe_site file without first contacting the unavailable server.

You can force the runtime to use pe_site exclusively, rather than CDS. If you do so, an unavailable CDS server will never be an issue for client login, because the client runtime will never contact CDS. It will use the security server bindings already stored in pe_site. This could save time in the event that a preferred CDS server is unavailable (as long as the pe_site file is accurate).

Keep in mind that, although the dce_update process monitors server status and sorts available servers to the top of the pe_site list, failover will take longer when pe_site includes multiple servers, and the first available server is not towards the top of the list.
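The following sketch shows why list order matters. It does not describe how dce_update is implemented; the function names and server names are hypothetical, and availability is supplied by the caller.

```python
def sort_available_first(servers, is_available):
    """Return servers with the available ones moved to the front.

    sorted() is stable, so servers keep their relative order within the
    available group and within the unavailable group.
    """
    return sorted(servers, key=lambda server: not is_available(server))


def attempts_until_success(servers, is_available):
    """Count how many servers are tried, in list order, before one answers.

    Returns None if no server in the list is available.
    """
    for attempts, server in enumerate(servers, start=1):
        if is_available(server):
            return attempts
    return None
```

With a list of three servers in which only the last is available, a client walking the unsorted list makes three attempts, while a client walking the sorted list makes one. Each failed attempt adds its own delay, which is why a stale ordering lengthens failover.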

Refer to the PC-DCE Overview Guide for more information about the pe_site file.

B.5.3 Endpoint Mappers

The presence of an operating system endpoint mapping service in addition to PC-DCE's endpoint mapping service has a major impact on the speed of client failover. When an operating system endpoint mapping service is in use on the server system, the client contacts the server and the endpoint mapper immediately responds that the server is down. This allows the client to quickly fail over to the replica server.

Windows 2000 and Windows NT 4.0 include their own endpoint mapper, so if you are running either of these systems you can expect a faster failover than if you are running some of the UNIX operating systems, such as AIX or Solaris.

When no endpoint mapping service is in use, and in the event that PC-DCE is down, the protocol timeout period must pass before the client knows it must move on to a replica. For TCP, the timeout period is two minutes; for UDP, the timeout period is 45 seconds.
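The difference between an immediate refusal and a protocol timeout can be modeled with a short sketch. This is a simplified model under stated assumptions: the state labels and function name are hypothetical, a refusal is treated as costing no time, and each silent server is assumed to cost exactly one protocol timeout. The measured times in Section B.4.2 do not match this simple sum, so treat it as an illustration of the mechanism only.

```python
# Protocol timeout periods quoted above, in seconds.
PROTOCOL_TIMEOUT_SECONDS = {"tcp": 120, "udp": 45}


def estimated_wait(states, protocol, refusal_seconds=0):
    """Estimate seconds spent before the first usable server is reached.

    `states` lists the servers in the order they are tried:
      "up"      - the server answers
      "refused" - an endpoint mapper reports at once that the server is down
      "silent"  - nothing answers, so the client waits for the timeout
    Returns None if no server in the list is up.
    """
    elapsed = 0
    for state in states:
        if state == "up":
            return elapsed
        if state == "refused":
            elapsed += refusal_seconds
        elif state == "silent":
            elapsed += PROTOCOL_TIMEOUT_SECONDS[protocol]
        else:
            raise ValueError(f"unknown state: {state!r}")
    return None
```

For example, `estimated_wait(["refused", "up"], "tcp")` adds no delay, while `estimated_wait(["silent", "up"], "tcp")` adds the full two-minute TCP timeout.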

B.5.4 Registry Keys

Modifications that you make to registry keys can affect client failover time. For example, you can:

Change the time period that PC-DCE waits for the security server to initialize before failing over to another security server (SecdWaitTimeout).
Change the time period that PC-DCE waits for the CDS server to initialize before failing over to another CDS server (CdsdWaitTimeout).
Increase the frequency at which dce_update solicits and caches CDS clearinghouses and security server information (CDSUpdateInterval, SECUpdateInterval). The more frequent the solicitations, the fresher the cache. However, note that frequent solicitations incur more network traffic.

For information about modifying registry keys, see Appendix A on page 117.

B.5.5 Replicas Across a WAN Link

Failover to a replica located across a WAN link is subject to the additional delays that may be incurred by the WAN link (for example, the link may have slower response times or bottleneck conditions).

B.6 Application Server Failover

In order for failover to occur for application servers, you must ensure that more than one application server is available to the client.

B.7 Responding to Loss of Service

If a primary server is permanently unavailable, then you must take steps to create a new primary server. DCE does not automatically create a new primary server. Section 5.5.4.3 describes how to reconfigure a CDS backup server as a primary server, and Section 5.6.1 describes how to promote a backup security server to master server.

If a master security server goes down, promote the backup server to master as soon as possible. It is not possible to write to backup security servers, so processes such as changing passwords cannot occur until the server is promoted to master or the original master comes back up.

If your credentials expire before you promote the backup to master, you may not be able to log in to the new master to return it to backup status when the original master comes back online.


To make comments or ask for help, contact support@entegrity.com.

Copyright © 2003 Entegrity Solutions Corporation & its subsidiaries.

All rights reserved.