Checkpoint CPU optimizations
We recently upgraded some of our WAN links from 1Gbps to 10Gbps to decrease backup transfer times between our two data centers. Traffic between the sites is encrypted by Checkpoint gateways running on physical open servers. The upgrade involved installing 10Gbps Intel NICs in those open servers. Once all the pieces were in place, I started testing with iperf3.
My initial iperf3 TCP results showed a maximum throughput of around 650Mbps. Something was limiting my ability to push more traffic between the data centers. I started with the Checkpoint VPN open server gateway at the primary site. Using top in expert mode, I found that none of the CPUs were maxed out, although one core climbed to 85% while my tests were running. I noted it and moved to the next device in the link. At the other site, the open server was maxing out one CPU core at 100%. The two open servers run on different hardware with different CPU clock speeds, so it made sense that the one with the lower clock speed was maxed out. What I didn’t understand was why the other CPU cores weren’t being utilized.
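The same single-core pattern can be checked outside of top by comparing two snapshots of /proc/stat. The following is a minimal sketch in Python: the snapshot values are made up for illustration (they are not measurements from these gateways), and the 95% threshold and the function names are my own assumptions.

```python
def parse_proc_stat(text):
    """Return {core: (busy_ticks, total_ticks)} for each per-core 'cpuN' line."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and anything that is not a per-core line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Columns: user nice system idle iowait irq softirq steal
        ticks = [int(v) for v in fields[1:9]]
        total = sum(ticks)
        idle = ticks[3] + ticks[4]  # idle + iowait count as not busy
        cores[fields[0]] = (total - idle, total)
    return cores


def per_core_busy(before, after):
    """Busy percentage per core between two /proc/stat snapshots."""
    first, second = parse_proc_stat(before), parse_proc_stat(after)
    usage = {}
    for core, (busy0, total0) in first.items():
        busy1, total1 = second[core]
        d_total = total1 - total0
        usage[core] = 100.0 * (busy1 - busy0) / d_total if d_total else 0.0
    return usage


# Hypothetical snapshots: cpu0 spends the whole interval in softirq, cpu1 is mostly idle.
BEFORE = """\
cpu  200 0 200 1600 0 0 0 0
cpu0 100 0 100 800 0 0 0 0
cpu1 100 0 100 800 0 0 0 0
"""
AFTER = """\
cpu  250 0 250 2500 0 0 1000 0
cpu0 100 0 100 800 0 0 1000 0
cpu1 150 0 150 1700 0 0 0 0
"""

usage = per_core_busy(BEFORE, AFTER)
hot = [core for core, pct in usage.items() if pct >= 95.0]  # assumed threshold
print(usage, hot)
```

One saturated core next to mostly idle ones, as in this made-up sample, is the signature I was seeing: the work is not being spread across cores.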
The Checkpoint expert-mode command “fw ctl affinity -l” showed that each NIC was assigned to a single core, with the firewall kernel instances mapped to individual cores of their own. A top session showed that a ksoftirqd kernel thread was maxing out the single CPU core assigned to the 10G NIC interface. After reading through Checkpoint’s performance tuning administration guide, I discovered that I would have to enable multi-queue to give a NIC more than one CPU core. The “cpmq get -a” command in expert mode verified that my Intel 10G NIC supported multi-queue and could be assigned up to 16 rx queues. I had eight cores available on each open server. I unassigned a firewall kernel instance from a CPU core to free that core up for the 10G NIC. I then enabled multi-queue on the interface and assigned two NIC queues to two CPU cores. The reconfiguration required two reboots of the gateway: the first to remove the firewall kernel instance from one core, and the second to enable multi-queue.
Once everything was set up, I was able to push 1Gbps using iperf3 across the VPN tunnel, and top showed that the NIC queues were utilizing two cores. Now I have to upgrade our ESXi host LAG, as I’m running into the 1Gbps limit of a single link within the LAG. If I run into any Checkpoint CPU utilization issues, I can always create more NIC queues.
Running the cpmq get -v command now shows the following:
Active ixgbe interfaces:
The rx_num for ixgbe is: 2 (default)
multi-queue affinity for ixgbe interfaces:
irq | cpu | queue
210 0 TxRx-0
218 2 TxRx-1
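To check the queue placement from a script rather than by eye, the affinity rows can be pulled out of this output. This is a minimal sketch in Python that assumes the output looks exactly like the text pasted above; the function name is my own, and other versions may format the table differently.

```python
CPMQ_OUTPUT = """\
Active ixgbe interfaces:
The rx_num for ixgbe is: 2 (default)
multi-queue affinity for ixgbe interfaces:
irq | cpu | queue
210 0 TxRx-0
218 2 TxRx-1
"""


def parse_queue_affinity(text):
    """Extract rows of the form '<irq> <cpu> <queue>' from cpmq get -v output."""
    rows = []
    for line in text.splitlines():
        # Treat the pipe separators in the header as whitespace.
        parts = line.replace("|", " ").split()
        # A data row has exactly three fields and starts with two integers.
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            rows.append({"irq": int(parts[0]), "cpu": int(parts[1]), "queue": parts[2]})
    return rows


queues = parse_queue_affinity(CPMQ_OUTPUT)
queue_cpus = sorted(row["cpu"] for row in queues)
print(queues)
print("Queues are spread over CPUs:", queue_cpus)
```

For the output above, this reports the two queues on CPU 0 and CPU 2, matching what top showed during the test.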
Running fw ctl affinity -l now shows the following:
eth3: CPU 0
fw_0: CPU 5
fw_1: CPU 7
fw_2: CPU 3
fw_3: CPU 1
fw_4: CPU 4
fw_5: CPU 6
Interface eth4: has multi queue enabled
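Putting the two outputs together gives a per-core picture of what runs where. The following is a rough sketch in Python, not a Checkpoint tool: it assumes the “name: CPU n” line format shown above, takes the eth4 queue placement from the cpmq get -v output, and uses the eight cores mentioned earlier. The names in it are hypothetical.

```python
import re

AFFINITY_OUTPUT = """\
eth3: CPU 0
fw_0: CPU 5
fw_1: CPU 7
fw_2: CPU 3
fw_3: CPU 1
fw_4: CPU 4
fw_5: CPU 6
Interface eth4: has multi queue enabled
"""

# eth4 queue placement, copied from the cpmq get -v output (queue name -> CPU).
QUEUE_CPUS = {"TxRx-0": 0, "TxRx-1": 2}
TOTAL_CORES = 8  # each open server has eight cores


def cores_in_use(affinity_text, queue_cpus):
    """Map each CPU number to the interfaces, fw instances, and queues on it."""
    usage = {}
    for line in affinity_text.splitlines():
        # Only handles single-CPU lines such as 'fw_0: CPU 5'.
        match = re.match(r"^(\S+): CPU (\d+)$", line.strip())
        if match:
            usage.setdefault(int(match.group(2)), []).append(match.group(1))
    for queue, cpu in queue_cpus.items():
        usage.setdefault(cpu, []).append(queue)
    return dict(sorted(usage.items()))


usage = cores_in_use(AFFINITY_OUTPUT, QUEUE_CPUS)
shared = [cpu for cpu, names in usage.items() if len(names) > 1]
unused = [cpu for cpu in range(TOTAL_CORES) if cpu not in usage]

for cpu, names in usage.items():
    print(f"CPU {cpu}: {', '.join(names)}")
print("Shared cores:", shared, "Unused cores:", unused)
```

With the values above, every core has an assignment, and CPU 0 carries both eth3 and the first eth4 queue. Adding more NIC queues later would therefore mean taking cores away from firewall kernel instances again.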