B — Understanding Client Failover




If a client's preferred CDS or security server is unavailable, the client automatically locates (fails over to) an available replica. This section provides performance expectations for client failover in a standard Windows NT 4.0 environment, and describes how other environmental factors influence the speed and success of client failover.

This section describes:

B.1 Situations that Trigger Failover
B.2 Requirements for Successful Failover
B.3 Failover Test Environment
B.4 Failover Test Results
B.5 Factors that Affect Failover
B.6 Application Server Failover
B.7 Responding to Loss of Service

B.1 Situations that Trigger Failover

If a CDS or security server is unavailable when one of the following actions occurs, the client fails over to a backup server:

B.1.1 Situations That Do Not Trigger Failover

This section discusses situations in which failover may appear to occur but in fact does not. Understanding PC-DCE behavior in these situations will help you develop good failover strategies and accurate expectations of failover times in actual failover situations.

B.1.1.1 Full Client Startup and the CDS Server

When the CDS server is unavailable, full client startup does not represent a true failover scenario. Upon startup, a full client writes to the master CDS server to provide its location. If the client cannot contact the master CDS server, it will try to use the backup CDS server. However, it does not truly fail over to the backup CDS server. Rather, the client continues trying to contact the master CDS server, and has partial functionality during this time.

Full client startup writes to CDS only if the full client configuration is different from the previous full client configuration (such as when a DHCP server assigns it a different IP address).

If the client's IP address is unchanged (this may not be the case if you are running DHCP), the client should still be able to perform most operations, such as status-type operations like dcecp server ping. However, the client will be unable to run applications that need to write to the CDS namespace.

B.1.1.2 Interaction Between Application and Servers

If you are using a DCE application when the CDS server goes down, this is not a failover scenario. If an application session is in progress, the client has already obtained the application server's bindings from the CDS server. At this point, if the CDS server fails, the application session can still continue.

If you are using an application over the network when the security server goes down, you can continue to use the application while your credentials are good (by default, credentials are good for two hours at a time).

If the master security server is unavailable, failover does occur halfway through the credentials expiration interval, when the client contacts the security server to refresh the credentials. This process occurs automatically and is invisible to the user, unless a backup security server cannot be found.
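As a rough illustration of this timing, the following Python sketch computes when a client would attempt to refresh its credentials. It assumes the refresh happens at exactly half of the credential lifetime and uses the two-hour default mentioned above. The function name and constant are hypothetical and are not part of PC-DCE.

```python
from datetime import datetime, timedelta

# Default credential lifetime as stated in this section.
DEFAULT_LIFETIME = timedelta(hours=2)

def refresh_time(login_time, lifetime=DEFAULT_LIFETIME):
    """Return the assumed refresh point: halfway through the lifetime."""
    return login_time + lifetime / 2

login = datetime(2003, 1, 1, 9, 0)
print(refresh_time(login))  # 2003-01-01 10:00:00
```

Under these assumptions, a user who logs in at 9:00 would first notice a missing master security server at about 10:00, and only if no backup security server can be found.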

Once your credentials expire, any process for which you require tickets will fail. When a client makes a request of an application server, the application server contacts the security server to see if the client's credentials are sufficient to fulfill the request. This is also a failover scenario.

B.2 Requirements for Successful Failover

To prepare for successful client failover, you must ensure that replicas are maintained for security servers and CDS servers (see Section 5.4 on page 69 for information about creating replicas).

By default, a backup CDS server replicates only the root directory; you should change the default replica configuration to maintain all of the additional directories that will be needed if the master is unavailable. Once you create directories on your backup server for all directories on the master, the backup directories are automatically synchronized on a periodic basis (the default is once per hour).

See Section 5.5 on page 70 for instructions on changing the default CDS replica configuration.

B.3 Failover Test Environment

Review the information in this section for general expectations about failover times. Then, refer to Section B.5 on page 126 to see if failover in your environment is subject to any additional factors.

The failover statistics in this section were obtained under the following test conditions:

The information in the CDS cache and in the pe_site file has a major impact on whether failover is required for PC-DCE operations, and on the speed of failover (see Section B.5.1 on page 127 and Section B.5.2 on page 127). The tests in this section indicate failover readings when these files either do not exist or are not current.

B.4 Failover Test Results

The following sections include test statistics for the failover conditions described in Section B.3:

B.4.1 Failover when the CDS or Security Server is Not Running
B.4.2 Failover when the Server System is Unreachable

B.4.1 Failover when the CDS or Security Server is Not Running

These tests apply to scenarios in which the preferred CDS server or master security server is not running, but the system that houses them is up and running.

In either a TCP or UDP environment, failover to the backup servers for the following scenarios is immediate (within a few seconds).

Rapid failover occurs because the Windows 2000 or Windows NT endpoint mapper on the master server system immediately informs the client that the server is unavailable. The client does not need to wait for the protocol timeout period before contacting the backup server system.

B.4.2 Failover when the Server System is Unreachable

These tests apply to scenarios in which the client is unable to contact the system that houses the preferred CDS and master security servers (for example, the system is disconnected from the network or has been powered off). Under these circumstances, the endpoint mapper is not running, so failover is not immediate.

B.4.2.1 Client Performs a CDS Lookup

If the preferred CDS server is unreachable when the client needs to perform a lookup (for example, a dcecp cell show command or an attempt to locate an application server):

B.4.2.2 Client Logs into DCE

Upon login, the client runtime uses CDS to locate the security server, then refers to the pe_site file, unless you've configured the environment to use pe_site exclusively. See Section B.5.2.

If you are not using the pe_site file exclusively, failover to log into the backup security server takes:

B.5 Factors that Affect Failover

Environmental factors include:

B.5.1 Cache Contents
B.5.2 PE_Site File Use
B.5.3 Endpoint Mappers
B.5.4 Registry Keys
B.5.5 Replicas Across a WAN Link

Client failover for application servers is affected by a different set of factors, and is discussed separately in Section B.6 on page 128.

B.5.1 Cache Contents

CDS lookups are affected by whether or not the client has already stored an application server's bindings in its cache. For example, if an application session is in progress, the client has already obtained the application server's bindings from the CDS server. At this point, if the CDS server fails, the application session can still continue.

B.5.2 PE_Site File Use

The pe_site file is a list of security servers and their associated bindings. By default, the client runtime looks up bindings for a security server by using CDS. If the preferred CDS server is unavailable, the runtime will look up the location of a security server in the pe_site file.

If the pe_site file is up-to-date, a dce_login will succeed immediately although the security server may be down, because the information will be obtained from the pe_site file without needing to contact the security server.

You can force the runtime to use pe_site exclusively, rather than CDS. If you do so, an unavailable CDS server will never be an issue for client login, because the client runtime will never contact CDS. It will use the security server bindings already stored in pe_site. This could save time in the event that a preferred CDS server is unavailable (as long as the pe_site file is accurate).
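The lookup order described above can be sketched in Python as follows. This is a simplified model, not the PC-DCE runtime: the function name, the pe_site_only flag, the use of ConnectionError to signal an unavailable CDS server, and the binding strings are all assumptions made for illustration.

```python
def locate_security_server(cds_lookup, pe_site_entries, pe_site_only=False):
    """Return a security server binding using the order described in the text.

    cds_lookup: hypothetical callable that returns a binding or raises
                ConnectionError when the preferred CDS server is unavailable.
    pe_site_entries: list of bindings read from the pe_site file.
    pe_site_only: when True, CDS is never contacted.
    """
    if not pe_site_only:
        try:
            return cds_lookup()
        except ConnectionError:
            pass  # CDS unavailable: fall back to the pe_site file
    if pe_site_entries:
        return pe_site_entries[0]
    raise LookupError("no security server binding found")

def cds_down():
    raise ConnectionError("preferred CDS server unavailable")

print(locate_security_server(cds_down, ["sec_backup_binding"]))  # sec_backup_binding
```

With pe_site_only=True the CDS call is skipped entirely, which is why an unavailable CDS server cannot delay login in that configuration.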

Keep in mind that although the dce_update process monitors server status and sorts available servers to the top of the pe_site list, failover takes longer when pe_site includes multiple servers and the first available server is not near the top of the list.
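To see why list position matters, the following Python sketch counts how many unavailable entries would be tried before the first available one. It assumes the entries are tried strictly from top to bottom, which this section implies but does not state. The function name and server names are hypothetical.

```python
def attempts_before_success(servers, is_available):
    """Count unavailable entries tried before the first available one.

    Returns None when no entry in the list is available.
    """
    for skipped, server in enumerate(servers):
        if is_available(server):
            return skipped
    return None

pe_site = ["sec1", "sec2", "sec3"]  # hypothetical pe_site order
running = {"sec3"}                  # only the last entry is up
print(attempts_before_success(pe_site, running.__contains__))  # 2
```

Each skipped entry adds its own delay, so in this model an available server in third position costs two failed attempts before login can proceed.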

Refer to the PC-DCE Overview Guide for more information about the pe_site file.

B.5.3 Endpoint Mappers

The presence of an operating system endpoint mapping service in addition to PC-DCE's endpoint mapping service has a major impact on the speed of client failover. When an operating system endpoint mapping service is in use on the server system, the client contacts the server and the endpoint mapper immediately responds that the server is down. This allows the client to quickly fail over to the replica server.

Windows 2000 and Windows NT 4.0 include their own endpoint mapper, so if you are running either of these systems you can expect a faster failover than if you are running some of the UNIX operating systems, such as AIX or Solaris.

When no endpoint mapping service is in use, and in the event that PC-DCE is down, the protocol timeout period must pass before the client knows it must move on to a replica. For TCP, the timeout period is two minutes; for UDP, the timeout period is 45 seconds.
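The effect of the endpoint mapper on waiting time can be approximated with a small Python sketch. It is a simplified model under stated assumptions: the "immediate" case is represented as 0 seconds, the timeout values are the ones given in this section, and the function and parameter names are hypothetical. Measured failover times in a real environment may differ.

```python
# Protocol timeout periods as stated in this section, in seconds.
PROTOCOL_TIMEOUT_SECONDS = {"tcp": 120, "udp": 45}

def wait_before_failover(protocol, system_reachable, os_endpoint_mapper):
    """Approximate seconds before the client moves on to a replica.

    If the server system is reachable and runs an operating system
    endpoint mapper, the client learns immediately that the server is
    down (modeled as 0). Otherwise the protocol timeout must pass.
    """
    if system_reachable and os_endpoint_mapper:
        return 0
    return PROTOCOL_TIMEOUT_SECONDS[protocol.lower()]

print(wait_before_failover("tcp", True, True))    # 0
print(wait_before_failover("tcp", False, True))   # 120
print(wait_before_failover("udp", True, False))   # 45
```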

B.5.4 Registry Keys

Modifications that you make to registry keys can affect client failover time. For example, you can:

For information about modifying registry keys, see Appendix A on page 117.

B.5.5 Replicas Across a WAN Link

Failover to a replica located across a WAN link is subject to the additional delays that may be incurred by the WAN link (for example, the link may have slower response times or bottleneck conditions).

B.6 Application Server Failover

In order for failover to occur for application servers, you must ensure that more than one application server is available to the client.

B.7 Responding to Loss of Service

If a primary server is permanently unavailable, then you must take steps to create a new primary server. DCE does not automatically create a new primary server. Section 5.5.4.3 describes how to reconfigure a CDS backup server as a primary server, and Section 5.6.1 describes how to promote a backup security server to master server.

If a master security server goes down, promote the backup server to master as soon as possible. It is not possible to write to backup security servers, so processes such as changing passwords cannot occur until the server is promoted to master or the original master comes back up.

If your credentials expire before you promote the backup to master, then when the original master comes back online you may not be able to log in to the new master to return it to backup status.







To make comments or ask for help, contact support@entegrity.com.

Portions of this document were derived from materials provided by Compaq Computer Corporation. Copyright © 1998-2003 Compaq Computer Corporation.

Copyright © 2003 Entegrity Solutions Corporation & its subsidiaries.

All rights reserved.