Category Archives: Virtualization

Backup VMware ESXi Host

Backup using vim-cmd

To ensure that the configuration of the target ESXi host is synchronized with persistent storage, run the following command:

vim-cmd hostsvc/firmware/sync_config

To back up the ESXi configuration, run this command:

vim-cmd hostsvc/firmware/backup_config

The command will produce a link for downloading the configBundle.tgz archive.

Note that you have to replace the asterisk in the provided link with your IP/FQDN. Alternatively, access the backup file in the /scratch/downloads directory, where it is stored as configBundle-HostFQDN.tgz.

Example output

~ # vim-cmd hostsvc/firmware/backup_config
Bundle can be downloaded at : http://*/downloads/52a0a904-27aa-02ad-a6d9-ad629a51b012/configBundle-ESX2.CORP.LOCAL.tgz
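
Once the asterisk is replaced with the host's IP/FQDN, the bundle can be downloaded with a browser or any HTTP client. A quick sketch using wget and the FQDN from the output above (adjust to your host):

wget http://ESX2.CORP.LOCAL/downloads/52a0a904-27aa-02ad-a6d9-ad629a51b012/configBundle-ESX2.CORP.LOCAL.tgz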

Restore using vim-cmd

Before taking the first step, ensure that the ESXi version, build number, and UUID of the target host match the version, build number, and UUID of the ESXi configuration that needs to be recovered.

Then, connect to the target ESXi host via SSH and put the host into maintenance mode:

esxcli system maintenanceMode set --enable yes

or

vim-cmd hostsvc/maintenance_mode_enter

Use an SCP client to copy the archive with the ESXi configuration (configBundle-xxxx.tgz) to a directory on the target ESXi host, such as /tmp.
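
For example, from a workstation (a hypothetical local path, reusing the host FQDN from the backup example; adjust both):

scp ./configBundle-ESX2.CORP.LOCAL.tgz root@ESX2.CORP.LOCAL:/tmp/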

Rename the configBundle-xxxx.tgz file to configBundle.tgz:

mv /tmp/configBundle-ESX2.CORP.LOCAL.tgz /tmp/configBundle.tgz

Recover the ESXi configuration:

vim-cmd hostsvc/firmware/restore_config /tmp/configBundle.tgz

The ESXi host will restart automatically.

Exit maintenance mode:

esxcli system maintenanceMode set --enable no

or

vim-cmd hostsvc/maintenance_mode_exit

Configuration mismatch: The virtual machine cannot be restored because the snapshot was taken with VHV enabled.

While migrating a VM from one ESXi host to another, I received the following message:

Configuration mismatch: The virtual machine cannot be restored because the snapshot was taken with VHV enabled. To restore, set vhv.enable to true

Check and confirm that /etc/vmware/config on both hosts contains vhv.enable = "TRUE".

This modification on the mismatched host requires a reboot.
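
A minimal way to check, and if necessary add, the setting over SSH (a sketch; the echo simply appends the line to the config file):

# Show any existing vhv setting
grep -i vhv /etc/vmware/config
# Append the setting if it is missing
echo 'vhv.enable = "TRUE"' >> /etc/vmware/config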


Enable nested virtualization for VM in Hyper-V

Nested virtualization on Intel processors has been available in Hyper-V for some time; on AMD processors it is available on Windows 10 build 19636 and later. As far as I understand, AMD support will be officially available as part of Windows 11 and Windows Server 2022, both expected in the second half of 2021.

Enable Nested Virtualization

Using PowerShell, get the name of the virtual machine you intend to enable nested virtualization on.

get-vm

Now, with the name of the VM (which must be powered off), enable nested virtualization.

Set-VMProcessor -VMName 'Windows 11' -ExposeVirtualizationExtensions $True

Confirm Nested Virtualization is Enabled

To confirm nested virtualization is enabled for this VM, or any other VM, use the following PowerShell command.

(Get-VM 'proxmox ve 7.2' | Get-VMProcessor).ExposeVirtualizationExtensions

Case of Dead Path on ESXi

I had 8 paths go down to a dead state on an ESXi host. The paths used the MRU policy over Fibre Channel to a storage array. One path still worked, and it was configured with the Round Robin (RR) policy.
I knew this wasn't a physical issue; it had to be a software/configuration issue on my host, because:

  • No storage array errors
  • Additional hosts in the cluster had no problems
  • One path still worked from the HBA

Looking at the log (/var/log/vmkernel.log), I searched for one of the LUN identifiers, in my case ":L30", which was one of the dead paths. This turned up an error from the NMP plugin about an invalid command.
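
A quick way to pull those entries out of the log (using the LUN identifier from my case):

grep ":L30" /var/log/vmkernel.log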
The next step was to figure out what the NMP path selection details were and compare them against a working host.

esxcli storage nmp device list |grep "Path Selection Policy:" |sort |uniq -c

I saw nothing out of the ordinary.
Apparently the storage array did not like the use of Round Robin, so I removed the SATP claim rule:

esxcli storage nmp satp rule remove -V IBM -M "^1746*" -P VMW_PSP_RR -s VMW_SATP_ALUA
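
To double-check that the rule is gone and force the paths to be re-evaluated, something like the following can be run afterwards (a sketch; the rescan touches every adapter):

# Confirm the IBM rule no longer appears
esxcli storage nmp satp rule list | grep -i IBM
# Rescan all HBAs so the paths get reclaimed
esxcli storage core adapter rescan --all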

Storage paths are happy now.

A general system error occurred: esxi1: CHAP setting not compatible.

During datastore creation in vSphere using the Nimble vCenter plugin, I got the following error:
A general system error occurred: esxi1: CHAP setting not compatible. hba=vmhba33
In the vSphere Client I went to my ESXi host, then Configuration, Storage Adapters, iSCSI Software Adapter. Looking at the vmhba33 Properties and then CHAP…, I saw a CHAP setting left over from an old Drobo system I had connected at one point.
Changing the option to Do not use CHAP resolved my issue.  I’m not using CHAP on my lab setup.
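
The same setting can also be inspected and cleared from the ESXi shell; a rough sketch using esxcli (adapter name from my case):

# Show the current CHAP configuration for the software iSCSI adapter
esxcli iscsi adapter auth chap get --adapter=vmhba33
# Disable CHAP on that adapter
esxcli iscsi adapter auth chap set --adapter=vmhba33 --direction=uni --level=prohibited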

Intel Xeon E7540 replaced with X7560

Recently I've been dealing with CPU RDY of about 5% across two ESXi hosts. I load-balanced as best I could given the necessary workloads and was able to get CPU RDY down to about 3-4%, which still isn't great.
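
For anyone wanting to track the same numbers, CPU ready can be watched live in esxtop or captured in batch mode for a before/after comparison; a sketch (the interval and sample count are arbitrary):

# 60 samples at 10-second intervals, saved for later review
esxtop -b -d 10 -n 60 > /tmp/cpu-rdy-baseline.csv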

I made the decision to grab a few Intel Xeon X7560 processors to replace the existing Intel Xeon E7540 processors and help alleviate some of the pressure. The main reason was moving from the 6-core E7540 to the 8-core X7560.

The specs are pretty close, including the same Nehalem family for my EVC needs. The clock speed is a bit faster on the X7560 at 2.27GHz vs the E7540 at 2.0GHz.

I swapped out 4 in total; each ESXi host is currently licensed for 2 socket VMware vSphere Enterprise Plus.
Here’s to hoping this gets me under 2% CPU RDY! I’ll attach a graph next week after I’ve had time to let things run and show a (hopeful) improvement.

Performance tuning iSCSI Round Robin policies in ESXi for Nimble storage

Here's an ESXi console script to loop through each Nimble eui.* device and set IOPS=0 and BYTES=0 (per Nimble recommendations).

for x in `esxcli storage nmp device list | awk '/Nimble iSCSI Disk/{print $7}' | sed -e 's/(//' -e 's/)//'`; do
echo $x
esxcli storage nmp psp roundrobin deviceconfig set -d $x -t bytes -B 0;
esxcli storage nmp psp roundrobin deviceconfig set -d $x -t iops -I 0 ;
esxcli storage nmp psp roundrobin deviceconfig get -d $x;
done

Note: If you change the order above and set bytes after iops, then the policy will be based on bytes and not IOPS.
To reset defaults, use the following script on the ESXi host console:

for x in `esxcli storage nmp device list | awk '/Nimble iSCSI Disk/{print $7}' | sed -e 's/(//' -e 's/)//'`; do
echo $x
esxcli storage nmp psp roundrobin deviceconfig set -d $x -t bytes -B 10485760;
esxcli storage nmp psp roundrobin deviceconfig set -d $x -t iops -I 1000 ;
esxcli storage nmp psp roundrobin deviceconfig set -d $x -t default;
esxcli storage nmp psp roundrobin deviceconfig get -d $x;
done

To make sure this survives a reboot, you can add an SATP claim rule:

esxcli storage nmp satp rule add --psp=VMW_PSP_RR --satp=VMW_SATP_ALUA --vendor=Nimble --psp-option="policy=iops;iops=0"

Note that if you previously configured a user-defined SATP rule for Nimble volumes that simply uses the Round Robin PSP (per the Nimble VMware best practices guide), you will first need to remove that simpler rule before you can add the rule above; otherwise you will get an error that a duplicate user-defined rule exists. The command to remove the simpler rule is:

esxcli storage nmp satp rule remove --psp=VMW_PSP_RR --satp=VMW_SATP_ALUA --vendor=Nimble
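
To confirm which user-defined rules are currently in place, the rule list can be filtered for Nimble:

esxcli storage nmp satp rule list | grep -i Nimble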

IBM System X3850 Disable Processor Power Management

In order to work around the issue, processor power management has to be disabled in the system UEFI and in the vSphere Client.
To change power policies using server UEFI settings:

  1. Turn on the server.
    Note: If necessary, connect a keyboard, monitor, and mouse to the console breakout cable and connect the console breakout cable to the compute node.
  2. When the prompt ‘Press <F1> Setup’ is displayed, press F1 and enter UEFI setup. Follow the instructions on the screen.
  3. Select System Settings -> Operating Modes and set it to 'Custom Mode', then set the UEFI settings as follows:
    Choose Operating Mode <Custom>
    Memory Speed <Max Performance>
    Memory Power Management <Disabled>
    Proc Performance States <Disabled>
    C1 Enhanced Mode <Disabled>
    QPI Link Frequency <Max Performance>
    QPI Link Disable <Enable All Links>
    Turbo Mode <Enable>
    CPU C-States <Disable>
    Power/Performance Bias <Platform Controlled>
    Platform Controlled Type <Maximum Performance>
    Uncore Frequency Scaling <Disable>

  4. Press the Escape key 3 times, and save the settings.
  5. Exit Setup and restart the server so that UEFI changes take effect.

Next, change power policies using the vSphere Client:

  1. Select the host from the inventory, then click the Manage tab and then the Settings tab.
  2. In the left pane under Hardware, select Power Management.
  3. Click Edit on the right side of the screen.
  4. The Edit Power Policy Settings dialog box appears.
  5. Choose 'High performance' and confirm the selection by clicking 'OK'.
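
Alternatively, the same High performance policy can be applied from the ESXi shell; a minimal sketch using the advanced CpuPolicy option (verify the option name on your ESXi build):

# Set the host power policy to High Performance
esxcli system settings advanced set --option=/Power/CpuPolicy --string-value="High Performance"
# Confirm the current value
esxcli system settings advanced list --option=/Power/CpuPolicy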

My Notes and Benchmarks on VMware Flash Read Cache

I’ve spent some time exploring and studying the use and configuration of VMware Flash Read Cache (vFRC) and its benefits.  These are my notes.

On a guest virtual machine, vFRC is configured in the virtual disk configuration area. The virtual machine needs to be on version 10 hardware, and vSphere needs to be at minimum version 5.5.
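
Once a host is set up, the SSDs claimed for virtual flash and any configured caches can be listed from the ESXi shell; a sketch (assumes ESXi 5.5+ with the virtual flash resource configured):

# SSD devices backing the virtual flash resource
esxcli storage vflash device list
# vFRC caches created for virtual disks
esxcli storage vflash cache list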

Benchmarks

I took a baseline benchmark of a simple Windows Server 2016 virtual machine that had a thin-provisioned 20GB disk using DiskSpd (the successor to SQLIO). The virtual machine disk is connected to an IBM DS3400 LUN with 4 x 300GB 15k RPM disks in RAID-10.
Baseline Virtual Machine

  • OS: Windows Server 2016
  • vCPU: 1
  • vRAM: 4GB
  • SCSI Controller: LSI Logic SAS
  • Virtual Disk:  40GB thin provisioned
  • Virtual machine hardware:  vmx-10
  • Virtual Flash Read Cache: 0

Some notes before running a test: the testing is geared toward SQL workloads, and different SQL workload types generate different types of I/O.
DiskSpd test

diskspd.exe -c30G -d300 -r -w0 -t8 -o8 -b8K -h -L E:\testfile.dat

Results of testing.

Command Line: diskspd.exe -c30G -d300 -r -w0 -t8 -o8 -b8K -h -L E:\testfile.dat
Input parameters:
        timespan:   1
        -------------
        duration: 300s
        warm up time: 5s
        cool down time: 0s
        measuring latency
        random seed: 0
        path: 'E:\testfile.dat'
                think time: 0ms
                burst size: 0
                software cache disabled
                hardware write cache disabled, writethrough on
                performing read test
                block size: 8192
                using random I/O (alignment: 8192)
                number of outstanding I/O operations: 8
                thread stride size: 0
                threads per file: 8
                using I/O Completion Ports
                IO priority: normal
Results for timespan 1:
*******************************************************************************
actual test time:       301.18s
thread count:           8
proc count:             1
CPU |  Usage |  User  |  Kernel |  Idle
-------------------------------------------
   0|  99.23%|   7.05%|   92.18%|   0.77%
-------------------------------------------
avg.|  99.23%|   7.05%|   92.18%|   0.77%
Total IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |      7940784128 |       969334 |      25.14 |    3218.43 |    2.471 |    22.884 | E:\testfile.dat (30GB)
     1 |      8152604672 |       995191 |      25.81 |    3304.28 |    2.401 |    22.211 | E:\testfile.dat (30GB)
     2 |      8116256768 |       990754 |      25.70 |    3289.55 |    2.408 |    22.080 | E:\testfile.dat (30GB)
     3 |      8180006912 |       998536 |      25.90 |    3315.38 |    2.394 |    22.936 | E:\testfile.dat (30GB)
     4 |      8192147456 |      1000018 |      25.94 |    3320.30 |    2.395 |    22.569 | E:\testfile.dat (30GB)
     5 |      8283185152 |      1011131 |      26.23 |    3357.20 |    2.375 |    21.607 | E:\testfile.dat (30GB)
     6 |      7820320768 |       954629 |      24.76 |    3169.60 |    2.508 |    21.745 | E:\testfile.dat (30GB)
     7 |      7896784896 |       963963 |      25.00 |    3200.59 |    2.479 |    21.981 | E:\testfile.dat (30GB)
-----------------------------------------------------------------------------------------------------
total:       64582090752 |      7883556 |     204.49 |   26175.34 |    2.428 |    22.258
Read IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |      7940784128 |       969334 |      25.14 |    3218.43 |    2.471 |    22.884 | E:\testfile.dat (30GB)
     1 |      8152604672 |       995191 |      25.81 |    3304.28 |    2.401 |    22.211 | E:\testfile.dat (30GB)
     2 |      8116256768 |       990754 |      25.70 |    3289.55 |    2.408 |    22.080 | E:\testfile.dat (30GB)
     3 |      8180006912 |       998536 |      25.90 |    3315.38 |    2.394 |    22.936 | E:\testfile.dat (30GB)
     4 |      8192147456 |      1000018 |      25.94 |    3320.30 |    2.395 |    22.569 | E:\testfile.dat (30GB)
     5 |      8283185152 |      1011131 |      26.23 |    3357.20 |    2.375 |    21.607 | E:\testfile.dat (30GB)
     6 |      7820320768 |       954629 |      24.76 |    3169.60 |    2.508 |    21.745 | E:\testfile.dat (30GB)
     7 |      7896784896 |       963963 |      25.00 |    3200.59 |    2.479 |    21.981 | E:\testfile.dat (30GB)
-----------------------------------------------------------------------------------------------------
total:       64582090752 |      7883556 |     204.49 |   26175.34 |    2.428 |    22.258
Write IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |               0 |            0 |       0.00 |       0.00 |    0.000 |       N/A | E:\testfile.dat (30GB)
     1 |               0 |            0 |       0.00 |       0.00 |    0.000 |       N/A | E:\testfile.dat (30GB)
     2 |               0 |            0 |       0.00 |       0.00 |    0.000 |       N/A | E:\testfile.dat (30GB)
     3 |               0 |            0 |       0.00 |       0.00 |    0.000 |       N/A | E:\testfile.dat (30GB)
     4 |               0 |            0 |       0.00 |       0.00 |    0.000 |       N/A | E:\testfile.dat (30GB)
     5 |               0 |            0 |       0.00 |       0.00 |    0.000 |       N/A | E:\testfile.dat (30GB)
     6 |               0 |            0 |       0.00 |       0.00 |    0.000 |       N/A | E:\testfile.dat (30GB)
     7 |               0 |            0 |       0.00 |       0.00 |    0.000 |       N/A | E:\testfile.dat (30GB)
-----------------------------------------------------------------------------------------------------
total:                 0 |            0 |       0.00 |       0.00 |    0.000 |       N/A
  %-ile |  Read (ms) | Write (ms) | Total (ms)
----------------------------------------------
    min |      0.068 |        N/A |      0.068
   25th |      0.261 |        N/A |      0.261
   50th |      0.274 |        N/A |      0.274
   75th |      0.305 |        N/A |      0.305
   90th |      0.413 |        N/A |      0.413
   95th |      3.097 |        N/A |      3.097
   99th |     57.644 |        N/A |     57.644
3-nines |    198.563 |        N/A |    198.563
4-nines |    995.725 |        N/A |    995.725
5-nines |   1896.496 |        N/A |   1896.496
6-nines |   1954.282 |        N/A |   1954.282
7-nines |   1954.318 |        N/A |   1954.318
8-nines |   1954.318 |        N/A |   1954.318
9-nines |   1954.318 |        N/A |   1954.318
    max |   1954.318 |        N/A |   1954.318

The important part of this shows that at about 204MB/s throughput and roughly 26k IOPS, the average latency was around 2.4ms.

thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev
----------------------------------------------------------------------------------------
total:       64582090752 |      7883556 |     204.49 |   26175.34 |    2.428 |    22.258

Here is a view from my monitoring software, essentially validating the latency.
A good starting point for SQL workload testing would be something like:

diskspd -b8K -d30 -o4 -t8 -h -r -w25 -L -Z1G -c20G D:\iotest.dat > DiskSpeedResults.txt

At this point, I just need to get the SSD installed on the host and test VMware Flash Read Cache.
To be continued…