Overview
The platform provides redundancy for servers and data acquisition, so you can deploy fault-tolerant systems, along with synchronization tools that keep the solution consistent. This page explains how to configure server redundancy and data redundancy, adjust server parameters, and manage solutions in redundant setups. It also covers online and offline changes, hot starts, runtime modifications, and client configuration in each redundancy scenario.
Our platform has built-in features to deploy redundant systems, both at the server level and at the data acquisition level, allowing for the deployment of fault-tolerant systems in an easy and reliable process. Solution synchronization is a concern when using redundancy, but the platform has tools to simplify it by facilitating effective collaborative engineering.
Server Redundancy Types
There are two basic types of server redundancy approaches, “Hot-Hot” and “Hot-Standby”.
In most cases, the “Hot-Standby” is the preferred way to implement fault-tolerant systems.
Hot-Hot Redundancy
For a Hot-Hot scenario, two servers and all server modules are running at all times. A custom application (usually the TRemoteClient driver) handles the synchronization of the operator inputs between the servers. When the application is feeding remote SQL databases or exchanging data with third-party systems, custom code in the application must ensure that data is not written twice.
The Hot-Hot configuration is used when the process does not allow any switching time from the primary to the secondary server. Therefore, there is no concept of an “active server.” Both servers are fully active at all times. The remote client's displays can access the solution from either server. To create Hot-Hot scenarios, you need to have two servers running, utilize the TRemoteClient to synchronize any necessary data between the servers, and employ scripts to enable or disable parts of the running application as required by the duplicated operation.
The disadvantage of Hot-Hot implementations is that they require custom solution engineering, especially when integrating with other applications. Data acquisition from field devices is also duplicated, which puts extra load on the device network and can result in different timestamps at each station.
Decisions such as how to handle operators at both stations issuing commands at the same time, and how to integrate remote SQL databases and file systems without duplicating data, must be made at the application level.
For these reasons, the Hot-Standby configuration is the preferred option when implementing fault-tolerant servers. The Hot-Hot scenario is mainly applied when the server activation time of the hot-standby, usually from 1 second to a few seconds, is not acceptable for the process. Otherwise, Hot-Standby has a simpler configuration, requires no custom engineering, and automatically guarantees data consistency among the servers.
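The Hot-Hot requirement that data must not be written twice can be illustrated with an idempotent write. The following is only a minimal sketch in Python, not platform code: an in-memory SQLite database stands in for the remote SQL database, and the table name `events`, the `event_id` key, and both function names are hypothetical. It assumes both servers can derive the same identifier for the same event.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the stand-in database and create the (hypothetical) events table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " event_id TEXT PRIMARY KEY,"
        " tag TEXT NOT NULL,"
        " value REAL NOT NULL)"
    )
    return conn


def record_event(conn, event_id, tag, value):
    """Insert the event once; a second write with the same event_id is ignored.

    Returns True if the row was written, False if it already existed.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO events (event_id, tag, value) VALUES (?, ?, ?)",
        (event_id, tag, value),
    )
    conn.commit()
    return cur.rowcount == 1
```

If both servers call `record_event` with the same `event_id`, only the first call stores a row, so the duplicated operation does not duplicate data.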
Hot-Standby Redundancy
In this scenario, there are two servers that have the solution loaded, but only one server (the ACTIVE one) is running all the modules. The STANDBY server has the modules in a PAUSED state and is receiving data synchronization from the ACTIVE computer to update its local memory but is not executing the tasks.
Redundancy is automatically implemented by the platform, using a simple configuration dialog. There is no need for custom programming or custom applications. Any solution created as a standalone solution can be deployed as a fault-tolerant pair with no engineering required, even when connected to third-party systems.
The servers communicate over WCF TCP/IP, which can be encrypted and can use any WCF feature. The data exchange uses a Tatsoft algorithm developed from years of field experience. It synchronizes the data with a publisher-subscriber model that sends data only by exception (events and data changes), which keeps the exchange efficient and reliable.
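The idea of sending data only by exception can be sketched as follows. This is a generic Python illustration of the publisher-subscriber pattern, not the Tatsoft algorithm itself; the class and method names are hypothetical.

```python
from typing import Any, Callable, Dict, List


class ExceptionPublisher:
    """Forwards a tag value to subscribers only when it differs from the last value sent."""

    def __init__(self) -> None:
        self._last_sent: Dict[str, Any] = {}
        self._subscribers: List[Callable[[str, Any], None]] = []

    def subscribe(self, callback: Callable[[str, Any], None]) -> None:
        """Register a callback that receives (tag, value) on every change."""
        self._subscribers.append(callback)

    def update(self, tag: str, value: Any) -> bool:
        """Publish the value if it changed; return True when something was sent."""
        if tag in self._last_sent and self._last_sent[tag] == value:
            return False  # unchanged value: nothing goes over the wire
        self._last_sent[tag] = value
        for callback in self._subscribers:
            callback(tag, value)
        return True
```

In this sketch, repeated updates with an unchanged value generate no traffic, which is the property that keeps synchronization load proportional to the rate of change rather than to the number of tags.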
Most importantly, the redundancy is not an external module; it is “redundancy-to-the-core.” Our platform was designed from the ground up to have built-in, kernel-level support for fault-tolerant applications.
When the STANDBY server detects that the ACTIVE computer is down, it changes the modules from the PAUSE state to the RUN state, starting with the real-time database's states and values according to how they have been received from the synchronization.
The detection of the Active Server is usually done by using a watchdog message on the same TCP/IP channel that is being used to exchange synchronization data, but custom switch determinations can be added if needed. The “Connection Timeout” parameter in the redundancy configuration defines the inactive time of the Active computer that triggers the standby computer to go live; the typical value is from one to ten seconds, depending on the network and servers.
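The watchdog behavior described above can be modeled in a few lines. This is a simplified Python sketch under stated assumptions, not the platform's implementation: the class name, the `PAUSED` and `RUN` labels, and the injectable clock are hypothetical, and any incoming message on the channel is treated as proof that the active computer is alive.

```python
import time


class StandbyWatchdog:
    """Tracks activity of the active server and promotes the standby after a timeout."""

    def __init__(self, connection_timeout, clock=time.monotonic):
        self.connection_timeout = connection_timeout  # seconds
        self._clock = clock
        self._last_seen = clock()
        self.state = "PAUSED"

    def on_message(self):
        """Call whenever a watchdog or synchronization message arrives."""
        self._last_seen = self._clock()

    def poll(self):
        """Switch modules from PAUSED to RUN once the active server has been silent too long."""
        silent_for = self._clock() - self._last_seen
        if self.state == "PAUSED" and silent_for >= self.connection_timeout:
            self.state = "RUN"
        return self.state
```

A shorter timeout gives a faster switch but raises the chance of a false takeover on a slow or congested network, which is why the value should be tested against the actual servers and network.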
Redundancy Configuration
The Hot-Standby redundancy is configured in the Runtime → Startup → Redundancy Enabled dialog.
Server Redundancy Configuration
The key parameters to configure are:
ConnectionTimeout: Watchdog time, in seconds. The standby computer uses this time to monitor the activity of the active computer and to trigger the automatic switch. The switch between active and standby can also be requested by command. In that case the switch is immediate, and only the time to change the modules from Pause to Run on the standby computer is needed. That time typically ranges from one to five seconds, but it depends heavily on the specific Solution configuration, the server computers, and the network.
ConnectionRetry: Number of connection retries, default is one.
Primary and Secondary IP/Computer names: The IP addresses or DNS names of the computers that will work together as a fault-tolerant pair. They run the exact same Solution configuration, and either one can be ACTIVE or STANDBY at any given time. The only differences between the computer designated as the Primary and the one designated as the Secondary are:
- Primary precedence: If both servers start at the same time and a conflict arises over which one should be ACTIVE, the role is assigned to the Primary computer.
- On Primary Startup: This option applies when the Primary computer starts while the Secondary is the ACTIVE server. You can specify whether the Secondary stays as the ACTIVE server, or whether the Primary takes over the ACTIVE state after it finishes its initialization, which automatically changes the Secondary computer to the STANDBY role.
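The two rules above can be summarized as a small decision function. This is a hypothetical Python sketch for illustration only: the function name and the `peer_state` labels are not platform identifiers, and the rule that a server becomes ACTIVE when its peer is down is an assumption based on the failover behavior described earlier.

```python
def startup_role(is_primary, peer_state, take_over_on_primary_startup=False):
    """Return the role ("ACTIVE" or "STANDBY") a starting server should assume.

    peer_state is one of "down", "starting", or "active" (hypothetical labels).
    """
    if peer_state == "down":
        # Assumption: with no peer available, the starting server runs the modules.
        return "ACTIVE"
    if peer_state == "starting":
        # Primary precedence: a simultaneous start is resolved in favor of the Primary.
        return "ACTIVE" if is_primary else "STANDBY"
    if peer_state == "active":
        # On Primary Startup: the Primary may take over after it finishes initializing.
        if is_primary and take_over_on_primary_startup:
            return "ACTIVE"
        return "STANDBY"
    raise ValueError(f"unknown peer state: {peer_state!r}")
```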
Based on these settings, a custom command line for TStartup.exe is created and shown in the configuration dialog. When you deploy a redundant configuration, use that command line in Windows AutoStart, or in the Windows Service configuration if you are running the platform as a Windows Service.
You can configure application redundancy by configuring two computers to be used as servers. One computer will be the primary server and the other will be the secondary or hot standby. If the primary computer or the connection to the computer fails, the system automatically fails over to the secondary computer.
If you selected HMI as the Product Family, the redundancy configuration is not available.
To configure redundancy:
- Go to Runtime → Startup → Redundancy.
- Enter or select the information, as needed.
Field | Description |
---|---|
Enable Configuration | Select to enable the redundancy configuration. |
Primary Server IP and Port | Enter the IP address and port of the primary server. |
Secondary Server IP and Port | Enter the IP address and port of the secondary server. |
On Primary Startup | Select the option you want. |
Replication | Select how to handle historian replication. The options are AlarmHistorian, TagHistorian, and Retentive. |
Connection Timeout | Connection timeout, in seconds. If it is reached, the system switches to the secondary server. |
Startup Command Line | Read-only field populated based on the fields above. Click Copy to Clipboard to copy the command for use. |
Rich Client Command | Read-only field populated based on the fields above. Click Copy to copy the command for use. |
Smart Client URL | Read-only field populated based on the fields above. Click Copy to copy the URL for use. |
HTML5 Client URL | Read-only field populated based on the fields above. Click Copy to copy the URL for use. |
Data Redundancy Configuration
When running fault-tolerant applications, the Alarm Archiving server and the Historian Archiving server can be located on the same computer as the platform's servers or on a third computer dedicated to archiving, such as a Microsoft SQL Server or Oracle server. The “Historian Replication” configuration provides support for many scenarios.
When the Historian is on a remote database machine, there is no need to enable replication on the platform. Since only one computer is ACTIVE, either the Primary or Secondary will write to the external database. In this case, you define “no replication” for the configuration.
Note that the external database can still be a fault-tolerant cluster that uses the database redundancy tools, but the cluster is viewed by the Solution as just one external connection.
If you are running either the Alarm database or the HistorianTags database on the same computer as the platform's servers, you need to enable the respective replication option.
Server Parameters
When using redundancy, there is a set of parameters that can be passed to both TStartup.exe and TServer.exe to specify their behavior. The command line is created automatically on Runtime → Startup → Redundancy, but you can customize it directly as needed. These are the parameters used:
- /ip1: <Primary Server Name or IP>
- /ip2: <Secondary Server Name or IP>
- /Port1: <Port number of primary, default is 3101>
- /Port2: <port number for secondary, default is 3101>
- /Solution: <full path of the Solution file>
- /connectionTimeout: <watch-dog timeout in seconds, accepts decimal points>
- /username: <startup user>
- /redundancy (has no parameters; it just needs to be included to enable redundancy)
- /autoswitch (has no parameters; if included, the Primary takes over as the Active node if the Secondary was acting as Active)
- /TimeAutoSwitch: <number of seconds the Primary waits before becoming active if the autoswitch option is enabled. Usually set to 60 seconds.>
- /SolutionIPPath: <IP>;<Path of the Solution on the remote server>
The SolutionIPPath is used by the system to allow one station to automatically update the Solution in the redundant pair when doing online Solution changes and HotStart commands. Example: /SolutionIPPath:192.168.0.1;C:\FactoryStudio\Solutions\test.tproj
The TimeAutoSwitch value applies when you are using the /autoswitch option. In this scenario, when the computer designated as the Primary starts, it will "auto switch" from standby to active after starting. It is important that the switch happens only after the process has had time to receive all the synchronization data from the active computer. Usually, 60 seconds is enough for that, but you should increase the setting for large Solutions or slow networks.
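When the command line is customized by hand or generated by a deployment script, assembling it from named values helps avoid typos. The helper below is a hypothetical Python sketch, not a platform tool. It uses only the parameters listed above and writes them without a space after the colon, as in the examples on this page. It does not handle quoting of paths that contain spaces.

```python
from typing import Optional


def build_startup_command(
    solution: str,
    ip1: str,
    ip2: str,
    connection_timeout: float,
    port1: int = 3101,
    port2: int = 3101,
    username: Optional[str] = None,
    autoswitch: bool = False,
    time_auto_switch: int = 60,
    solution_ip_path: Optional[str] = None,
    executable: str = "TStartup.exe",
) -> str:
    """Assemble a redundancy command line from the documented parameters."""
    args = [
        executable,
        f"/Solution:{solution}",
        "/redundancy",
        f"/ip1:{ip1}",
        f"/ip2:{ip2}",
        f"/Port1:{port1}",
        f"/Port2:{port2}",
        f"/connectionTimeout:{connection_timeout}",
    ]
    if username:
        args.append(f"/username:{username}")
    if autoswitch:
        # TimeAutoSwitch only matters together with /autoswitch.
        args += ["/autoswitch", f"/TimeAutoSwitch:{time_auto_switch}"]
    if solution_ip_path:
        # Expected form: <IP>;<Path of the Solution on the remote server>
        args.append(f"/SolutionIPPath:{solution_ip_path}")
    return " ".join(args)
```

For example, `build_startup_command(r"C:\FactoryStudio\Solutions\test.tproj", "192.168.0.1", "192.168.0.2", connection_timeout=5, autoswitch=True)` produces a command line that ends with `/autoswitch /TimeAutoSwitch:60`. Compare the result against the command line generated by the configuration dialog before using it in production.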
Solution Modification on Redundant System
The centralized Solution configuration makes it easy to keep a Solution synchronized on both servers. All the Solution settings are only in one of two files: the .tproj or .trun file. You just need to make sure the file is the same on both computers when deploying your Solution.
Note that the switch time depends on many factors: pending operations, the Solution size, the computers, and the network. The total switch time is typically measured in seconds, but you need to test your specific scenario to determine the right values for the connection timeout and retry parameters.
You can set up your own procedure to synchronize the two files, or use the platform's automated methods for hot-standby configuration.
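If you use your own synchronization procedure, one simple check is to compare a hash of the Solution file on both computers. The following Python sketch is an illustration only. The function names are hypothetical, and it assumes both files are reachable as file paths from the machine running the check, for example through a network share.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def solutions_match(primary_path, secondary_path):
    """True when both Solution files have identical contents."""
    return file_digest(primary_path) == file_digest(secondary_path)
```

A mismatch only shows that the files differ. It does not show which copy is newer, so the procedure still needs a rule for choosing the reference file.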
Online changes
Option 1: Local IPC acting as the Primary Server
- Open the Engineering application and use the configuration tools to connect to a Server running a Solution, with the Online configuration checkbox enabled. The Server can be on the same computer as the configuration tools or on a remote computer.
- In this case, every change you make to the Solution is applied immediately to the running application.
- The runtime property @Server.UpdateSolutionOnInactiveServer will propagate online changes on the active server to the backup server.
Option 2: For online changes, you should use the dbsln file instead of dbrun and do the following steps:
- Connect the Engineering tools to the STANDBY computer.
- Make the modifications online, or click the Hot Start button to apply previously made changes.
- In the application (in an administrator display), execute the command Server.SwitchToStandby(), so that the computer with the new Solution becomes active.
- Finally, trigger the property Server.UpdateSolutionOnInactiveServer, which applies the changes to the other computer.
- If you want to return the Active state to the original computer, run the Server.SwitchToStandby() method again.
Offline changes
The command line parameter /SolutionIPPath option allows you to define a verification path. When the application starts, it verifies the Solution configuration against the reference provided on the remote computer.
For offline changes:
- Make your Solution changes on another computer, and create the solution file to be installed for production use.
- Stop the runtime on the Standby computer, copy the solution file, and start the runtime again.
- Switch the Active role to the updated server, and stop the runtime on the other system.
- Either use the /SolutionIPPath parameter so that the Standby computer gets the configuration from the server automatically when starting, or copy the file to the second computer.
- Start the runtime.
Hot Start
You can make changes to a Solution while it is running, even if you are not connected to it. Applying all the new changes at once, without stopping the application, is called "hot-swapping". To do this, make your Solution changes, connect to the server, and click the Hot Start button on the Runtime → Diagnostics page.
Runtime changes
It is possible to update Solutions without using the engineering tools. The method @Server.LoadSolutionVersion(<Solution>) dynamically loads a new Solution configuration. The call can be made from the application itself or from an external .NET application connected to the server.
Client configuration based on Redundancy scenario
The platform's clients are easy to configure in any redundancy scenario. No programming or advanced configuration is required. You just need to start the client station with the right parameters, as shown on Runtime → Startup. Example: TRichClient.exe /ip1:192.168.1.1 /ip2:192.168.1.2 /connectiontimeout:5
Clients do not need any Solution files installed on their machines. Client computers get all the Solution information from the server. The RichClient only needs the platform software installed on the client computer. When using the SmartClient, the only thing needed is Internet Explorer installed on the client computer.
This is how it works: When the client starts (either the RichClient or SmartClient), it tries to connect to the ip1 computer. If that computer is not found, the client switches to the ip2 computer. If the client loses its connection with the server, it automatically tries to switch to the other server of the redundant pair without interrupting the operator's work.
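The connection order described above (ip1 first, then ip2) can be sketched as a short failover loop. This is a generic Python illustration using plain TCP sockets, not the client's actual protocol. The function name is hypothetical, and the `connect` parameter exists only so the logic can be exercised without a network.

```python
import socket


def connect_with_failover(servers, timeout, connect=socket.create_connection):
    """Try each (host, port) in order and return (address, connection) for the first success.

    servers: sequence of (host, port) tuples, e.g. the ip1 and ip2 computers.
    timeout: seconds to wait for each connection attempt.
    """
    last_error = None
    for address in servers:
        try:
            return address, connect(address, timeout=timeout)
        except OSError as exc:
            last_error = exc  # remember the failure and fall through to the next server
    raise ConnectionError(f"no server reachable: {last_error}")
```

A client modeled this way would call the function again after losing its connection, which gives the same ip1-then-ip2 behavior on reconnect. The port 3101 used in a call such as `connect_with_failover([("192.168.1.1", 3101), ("192.168.1.2", 3101)], timeout=5)` is the default listed in the server parameters.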
Redundancy Interaction over Client
Clients can visualize critical redundancy information and even make a server switch using runtime properties and methods.
Switching the active server
There is a method included in the platform: @Server.SwitchToStandby()
That method forces the Active server to hand over control to the Standby server. If the standby server is not running, the command fails and execution stays on the same computer. If you want to use this feature, create a protected display or button in an application screen.
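The switch semantics can be captured in a tiny model, which is useful when designing the operator display around it. This Python sketch is hypothetical and does not call the platform. It only mirrors the described rule that the switch succeeds when the standby is running and otherwise leaves the active server unchanged.

```python
class RedundantPair:
    """Minimal model of a Primary/Secondary pair and the switch-to-standby request."""

    def __init__(self, standby_running, active="Primary"):
        self.active = active
        self.standby_running = standby_running

    def switch_to_standby(self):
        """Hand control to the other server; return False and change nothing if it is not running."""
        if not self.standby_running:
            return False
        self.active = "Secondary" if self.active == "Primary" else "Primary"
        return True
```

Calling `switch_to_standby()` twice on a healthy pair returns the Active state to the original computer.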
Visualizing redundancy status
The following are the most used properties and methods:
- @Server.IsPrimary // True if the active server is the Primary
- @Server.IsSecondary // True if the active server is the Secondary
- @Server.IsStandByActive // True if the standby computer is up and running
- @Server.IsSwitchToPrimaryEnabled // True when the configuration specifies that the Primary is always the Active server
- @Server.RedundancyPendingObjects // Number of objects pending synchronization
- @Server.UpdateSolutionOnInactiveServer // Allows online changes and HotStart to replicate to the standby computer
- @Server.SwitchToStandby() // Requests the active server to switch to standby