Proxmox - Implement High Availability Clustering

Posted Sep 30, 2018

By harryvasanth

2 min read

Intro

High Availability (HA) clustering in Proxmox ensures that virtual machines (VMs) and containers remain operational even if a node in the cluster fails. This guide covers advanced concepts and step-by-step instructions for setting up a Proxmox HA cluster with shared storage, fencing, and testing failover scenarios.

Step 1: Prerequisites for HA Clustering

Before setting up an HA cluster, ensure the following:

Minimum Three Nodes: A minimum of three nodes is recommended to maintain quorum and avoid split-brain scenarios.
Shared Storage: Use NFS, iSCSI, or Ceph for shared storage to allow VMs to migrate between nodes seamlessly.
Redundant Network: Ensure at least two bonded network interfaces for cluster communication and storage traffic.
Fencing Devices: Fencing is mandatory to isolate failed nodes and prevent data corruption.

Example Command to Update Nodes:

apt-get update && apt-get dist-upgrade -y

Step 2: Creating a Proxmox Cluster

Step 2.1: Create the Cluster

On the first node, create the cluster:

pvecm create my-cluster

Step 2.2: Add Nodes to the Cluster

On additional nodes, join the cluster:

pvecm add <IP_of_first_node>

Verify the cluster status:

pvecm status

Step 3: Configuring Shared Storage

Shared storage is critical for HA functionality. In this example, we use NFS.

Step 3.1: Configure NFS on the Storage Server

On the NFS server:

  
mkdir -p /export/proxmox
chmod 777 /export/proxmox
echo "/export/proxmox *(rw,sync,no_subtree_check)" >> /etc/exports
exportfs -a
systemctl restart nfs-server

Step 3.2: Mount NFS on Proxmox Nodes

On each Proxmox node:

  
mkdir -p /mnt/nfs
echo "<NFS_SERVER_IP>:/export/proxmox /mnt/nfs nfs defaults 0 0" >> /etc/fstab
mount -a

Add the NFS storage in the Proxmox web interface under Datacenter > Storage.

Step 4: Enabling High Availability

Step 4.1: Enable HA for VMs

Navigate to Datacenter > HA > Resources.
Add a VM or container to HA using the web interface or CLI:
1 ha-manager add vm:<VMID>

Step 4.2: Set Resource Priorities

Set priorities for HA resources to determine failover order:

ha-manager set vm:<VMID> --priority <PRIORITY>

Step 5: Testing Failover

Testing ensures that your HA setup works as expected.

Step 5.1: Simulate Node Failure

Power off one of the nodes hosting an HA-enabled VM:

poweroff

Check if the VM migrates automatically to another node:

ha-manager status

Step 5.2: Monitor Quorum

Ensure quorum is maintained:

pvecm status | grep Quorum

Step 6: Advanced Configuration

6.1 Fencing Setup

Fencing isolates failed nodes to prevent data corruption.

Example Fencing Command:

fence_node <NODE_NAME>

6.2 Cluster Maintenance

Before rebooting a node, stop the HA manager:

systemctl stop pve-ha-lrm pve-ha-crm

After reboot, restart services:

systemctl start pve-ha-lrm pve-ha-crm

Step 7: Monitoring and Troubleshooting

Use these commands for monitoring and troubleshooting:

Check cluster status:
1 pvecm status
View HA resource status:
1 ha-manager status

Debug logs:

  
journalctl -u pve-ha-lrm -u pve-ha-crm -f

Conclusion

Proxmox High Availability Clustering ensures minimal downtime for critical workloads by automatically migrating VMs during node failures. Regularly monitor your setup and test failover scenarios to ensure reliability in production environments.

proxmox

This post is licensed under CC BY 4.0 by the author.