This topic describes how to deploy and implement a cluster of Conjur servers to provide high availability and cloud-friendly, global distribution with low latency. A Conjur cluster is configured in a master-standby-follower architecture.
A cluster contains the following components:
- One active master.
- One or more standby masters, with at least two required if you want to implement an auto-failover cluster. A standby is a replicated master, ready to take over as master.
- One or more followers, with at least two recommended. Followers are also replicas of the master, configured to service authentication, authorization, and read requests. Typical deployments use load balancers to handle traffic to the followers.
The standbys continuously replicate the Conjur database from the active master, using PostgreSQL streaming replication.
Synchronous replication ensures a completely up-to-date standby at all times; asynchronous replication may lag behind the master. A healthy cluster has at least one standby configured as synchronous, and enabling synchronous replication requires at least 2 standbys in the cluster.
Most traffic to Conjur is read traffic. Followers scale horizontally and are typically configured behind a load balancer to handle all types of read requests, including authentication, permission checks, and secret fetches. Write operations requested on a follower are delegated to the master.
Master-to-follower replication is asynchronous. As a recommended practice, connect followers to the master through a load balancer; this avoids reconfiguring the followers whenever a standby is promoted to master.
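The read/write split described above can be sketched as a simple client-side router. The host names and the round-robin policy are illustrative assumptions, not Conjur APIs; in practice the load balancer performs this role.

```python
# Toy sketch of the traffic pattern: reads are spread across a follower
# pool (as a load balancer would), writes always go to the master.
from itertools import cycle

MASTER = "conjur-master.example.com"          # hypothetical host names
FOLLOWERS = cycle(["follower-1.example.com",
                   "follower-2.example.com"])

def route(method: str) -> str:
    """Return the host that should serve an HTTP request."""
    # Read-only verbs can be served by any follower; anything that
    # changes state must be handled by (or delegated to) the master.
    if method.upper() in ("GET", "HEAD"):
        return next(FOLLOWERS)
    return MASTER

print(route("GET"))   # one of the followers
print(route("PUT"))   # the master
```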
Benefits of high availability
Replication is the heart of Conjur high availability, ensuring that a complete copy of the Conjur data is present on each machine. Specifically, data is transferred from the active master to the standbys and followers using PostgreSQL streaming replication. Streaming replication uses a continuous connection between the 'upstream' master and each 'downstream' standby or follower. In the absence of network problems, transfer from the master to the downstream servers is virtually instantaneous.
Standbys may optionally be designated as 'synchronous'. When this feature is enabled, PostgreSQL automatically selects exactly one standby as the synchronous standby. In this situation, PostgreSQL ensures that each 'write' transaction on the active master is also committed to the synchronous standby before the response is returned to the client. This configuration is called '2-safe', because each transaction is always safely written to at least 2 machines.
Besides data durability, the advantage of a synchronous standby is that it never lags behind the active master, so no data is lost if the active master fails. The disadvantage is that a failure of either the active master or the synchronous standby causes client 'write' operations to block and eventually time out. Therefore, a synchronous configuration must always include at least 2 standbys, so that if one standby fails, the other takes over the synchronous role.
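The '2-safe' behavior above can be simulated in a few lines. This is a conceptual sketch, not Conjur or PostgreSQL internals: the write is acknowledged only after both the master and the synchronous standby commit, and a failed synchronous standby makes writes block.

```python
# Minimal simulation of synchronous ('2-safe') replication.
class Node:
    def __init__(self, name):
        self.name = name
        self.log = []          # committed transactions
        self.up = True

    def commit(self, txn):
        if not self.up:
            # A down node cannot commit, so the write blocks/times out.
            raise TimeoutError(f"{self.name} is down; write blocks")
        self.log.append(txn)

def synchronous_write(master, sync_standby, txn):
    master.commit(txn)         # commit locally first
    sync_standby.commit(txn)   # must also land on the sync standby
    return "ack"               # only now is the client answered

master, standby = Node("master"), Node("standby-1")
assert synchronous_write(master, standby, "tx1") == "ack"
assert master.log == standby.log == ["tx1"]   # both copies are identical

standby.up = False             # a failed sync standby blocks writes...
try:
    synchronous_write(master, standby, "tx2")
except TimeoutError:
    pass                       # ...which is why a second standby is needed
```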
If there is a network outage, or if an unusually large amount of data needs to be replicated, asynchronous standbys and followers can 'lag' behind the master: there is a delay between a transaction committing on the master and appearing on the downstream server. The amount of replication delay, if any, can be determined by examining the health check on the master and on the standby or follower. Replication delay is also shown on the Conjur UI Cluster Health page.
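As a sketch of how replication delay might be computed from health-check output, the snippet below compares the master's current write-ahead-log position with the downstream's replay position. The JSON field names are illustrative assumptions; consult your Conjur version's health endpoint documentation for the actual response shape.

```python
# Compute replication lag from hypothetical health-check JSON.
import json

sample = json.loads("""
{
  "ok": true,
  "replication_status": {
    "pg_current_xlog_location_bytes": 120000,
    "pg_last_xlog_replay_location_bytes": 119000
  }
}
""")

status = sample["replication_status"]
# Lag is the gap between what the master has written and what the
# downstream server has replayed (field names are illustrative).
lag_bytes = (status["pg_current_xlog_location_bytes"]
             - status["pg_last_xlog_replay_location_bytes"])
print(f"replication lag: {lag_bytes} bytes")  # prints: replication lag: 1000 bytes
```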
The master and all followers generate audit records that capture all activity. A follower forwards its audit events to the master.
All traffic within the cluster is secured by verified TLS (HTTPS, LDAPS, or PostgreSQL with mutual TLS). Mutual TLS on the replication connection is particularly important because it allows replication ports to be opened across data centers without exposing the database to unauthenticated connections. Each Conjur server has an SSL certificate issued by a common certificate authority (CA). During setup, self-signed certificates are generated and configured. These certificates can be replaced with certificates issued by your organization.
To create a new follower or standby, a seed file of information from the master is required. An authorized administrator generates the seed file on the master, copies it, and unpacks it on the new server. The seed file contains sensitive information, including configuration settings, SSL certificates and private keys, and data encryption keys. Be sure to restrict access to seed files and protect the information.
The server data keys and SSL private keys can be encrypted using a master key. When the server keys are encrypted, no plaintext keys are stored on the server hard disk or included in the seed files. Conjur supports Hardware Security Module (HSM) and Amazon Key Management Service (KMS) integrations for master key encryption.
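The key-wrapping idea above can be illustrated with a toy sketch: the data key that protects server secrets is stored only in encrypted (wrapped) form, so no plaintext key sits on disk, and the master key (held in an HSM or KMS) unwraps it in memory at unlock time. XOR stands in for a real cipher purely for illustration; this is not Conjur's actual cryptography or key format.

```python
# Toy illustration of master-key (envelope) encryption.
import secrets

def xor(data: bytes, key: bytes) -> bytes:
    # Stand-in for a real cipher; XOR with a same-length key only.
    return bytes(d ^ k for d, k in zip(data, key))

master_key = secrets.token_bytes(32)      # held in an HSM or KMS, never on disk
data_key = secrets.token_bytes(32)        # encrypts the Conjur data

wrapped_key = xor(data_key, master_key)   # the only form safe to store on disk

# At 'unlock', the master key unwraps the data key in memory only.
assert xor(wrapped_key, master_key) == data_key
```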
We recommend that client machines be configured to direct their requests to a load balancer that sits in front of the Conjur followers. The load balancer can use the Conjur built-in health check to route traffic to healthy machines. Use health-checking DNS in front of the load balancer(s) for even better locality and availability.
For best practices on load balancing web services, contact your load balancer vendor.
The evoke utility
The evoke command-line utility is used for configuring, backing up, and restoring Conjur Enterprise servers.
The utility is installed on every Conjur server and supports the following tasks. For more information, see Tools > evoke utility.
- High availability cluster setup and management:
- configure an unconfigured Conjur instance as a master, standby, or follower
- generate seeds for the creation of a follower or standby from a Conjur master
- create an auto-failover cluster
- create backups and restore a Conjur server from a backup
- Master key management:
- lock (encrypt) and unlock (decrypt) server keys
In this section: