Saturday, June 25, 2011

A quick primer on Changed Block Tracking (CBT)

We are about to implement a new backup solution that is based on Symantec NetBackup 7, and - like any modern VMware backup solution - it leverages a very cool feature named Changed Block Tracking (CBT) that was introduced in vSphere 4.0 to enable efficient block-level incremental backups.

Since CBT has been around for a while, there are numerous good articles about it (see the references below). I will not just reproduce them here, but summarize the most important facts you need to know when you come into contact with it for the first time.

1. How does CBT work and what is it good for?
If CBT is enabled for a virtual disk, the VMkernel will create an additional file (named ...-ctk.vmdk) in the same directory, in which it stores a map of all the virtual disk's blocks. Once a block is changed it will be recorded in this map file. This way the VMkernel can easily tell a backup application which blocks of a file have changed since a certain point in time. The application can then perform an incremental backup by saving only these changed blocks.
CBT is also used by Storage VMotion, which is able to move a virtual machine's disk files from one datastore to another while the VM is running.
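To illustrate how a backup application consumes this map: the vSphere API exposes it through the QueryChangedDiskAreas method. Here is a minimal Python sketch using the pyVmomi SDK (VMware's Python bindings for the vSphere API); the connection details and the VM name are placeholders, error handling is omitted, and it assumes a snapshot already exists (backup tools create one before reading the disk):

from pyVim.connect import SmartConnect
from pyVmomi import vim

# Placeholder connection details.
si = SmartConnect(host="vcenter.example.com", user="administrator", pwd="secret")
content = si.RetrieveContent()

# Find the VM by name via a container view of the inventory.
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "myVM")

# Pick the first virtual disk; its device key identifies it in the query.
disk = next(d for d in vm.config.hardware.device
            if isinstance(d, vim.vm.device.VirtualDisk))

# changeId="*" returns all allocated areas (a full backup). For an
# incremental backup, pass the changeId recorded during the previous run.
offset = 0
while offset < disk.capacityInKB * 1024:
    info = vm.QueryChangedDiskAreas(snapshot=vm.snapshot.currentSnapshot,
                                    deviceKey=disk.key, startOffset=offset,
                                    changeId="*")
    for area in info.changedArea:
        print("changed: start=%d, length=%d" % (area.start, area.length))
    offset = info.startOffset + info.length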

2. How do you enable CBT?
CBT is enabled per virtual disk, and VMware's KB1031873 describes how to do this by editing a VM's advanced configuration parameters through the VI client. Unfortunately this requires the VM to be powered off. However, you can also change the setting while the VM is running by using an appropriate script like the one published here. To make the change effective, you then need to perform a so-called stun/unstun cycle on the VM (e.g. power off/on, suspend/resume, or create and delete a snapshot).
It is important to know that CBT is not enabled by default, because it introduces a small overhead in virtual disk processing.
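If powering the VM off is not an option and you do not want to rely on a third-party script, the reconfiguration can also be done through the vSphere API. Here is a minimal pyVmomi sketch (connecting and looking up the vm object work as in the example above; the snapshot name is arbitrary):

from pyVim.task import WaitForTask
from pyVmomi import vim

# Enable CBT; this reconfiguration is accepted while the VM is running.
spec = vim.vm.ConfigSpec(changeTrackingEnabled=True)
WaitForTask(vm.ReconfigVM_Task(spec=spec))

# The setting only becomes effective after a stun/unstun cycle. Creating
# and deleting a throwaway snapshot is the least disruptive way to get one.
WaitForTask(vm.CreateSnapshot_Task(name="cbt-stun", memory=False, quiesce=False))
WaitForTask(vm.snapshot.currentSnapshot.RemoveSnapshot_Task(removeChildren=False))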

3. How do CBT and snapshots play together?
When you create a snapshot of a VM's virtual disk, an additional ctk file is created for the snapshot's delta disk. Once the snapshot is deleted, the delta ctk is merged with the base ctk, just like the delta disk is merged with the base disk.

4. Important notes and references

  • KB1020128: Changed Block Tracking (CBT) on virtual machines
  • KB1031873: Enabling Changed Block Tracking (CBT) on virtual machines
  • While an application backs up a VM using CBT the VM cannot be vMotioned: KB2001004
  • Inconsistency resolved in vSphere 4.0 U3 and vSphere 4.1: KB1021607
  • KB1031106: Virtual machine freezes temporarily during snapshot removal on an NFS datastore in a ESX/ESXi 4.1 host
  • Eric Siebert on CBT: A detailed introduction
  • Additions by Duncan Epping: Even more details...

Saturday, June 18, 2011

How to hide unused FlexNICs

When I configured an HP Blade Enclosure with VirtualConnect modules for the first time, I stumbled over an issue that has probably bothered most people doing this, especially if they run ESX(i) on the blade servers:

The BL620c G7 blade servers we are using have four built-in 10Gbit ports, and each of them can be partitioned into up to four so-called FlexNICs (or FlexHBAs for FCoE if you use them together with FlexFabric VirtualConnect modules like we do). The overall 10Gbit bandwidth of one port is split among its FlexNICs in a configurable way. You could e.g. have four FlexNICs with 2.5Gbit each, two with 6 and 4Gbit, or any combination of one to four FlexNICs with their bandwidths adding up to 10Gbit.
To the OS (e.g. VMware ESXi) that is installed on the blade server, each FlexNIC appears as a separate PCI device. So an ESX(i) host installed on a BL620c G7 can have up to 16 NICs. Cool, eh?

However, we did not really want to make heavy use of that feature and divided the first two 10Gbit ports into a 4Gbit FlexHBA and a 6Gbit FlexNIC each. The third and fourth ports we even configured as single 10Gbit FlexNICs.

Now, the problem is that every 10Gbit port will show up as four PCI devices even if you have configured fewer than four FlexNICs for it. Even if you have not partitioned it at all, but use it as a single 10Gbit NIC, it will show up as four NICs, with the unconfigured ones being displayed as disconnected!
In our case we ended up with ESXi seeing (and complaining about) 10 disconnected NICs. Since we monitor the blades with HP Insight Manager, it also constantly warned us about the disconnected NICs.

So, we looked for a way to get rid of the unused FlexNICs. If we had Windows running directly on the blades this would have been easy: we would just disable the devices, and Windows (and also HP Insight Manager) would not be bothered by them. However, in ESX(i) you cannot simply disable a device ... but you can configure it for "VMDirectPath":

PCI Passthrough configuration of a BL620c G7
This dialog can be found in the Advanced Hardware Settings of a host's configuration. What does it do?
With VMDirectPath you can make a host's PCI device available to a single VM. It will be passed through to the VM, and the guest OS will then be able to see and use that device in addition to its virtual devices.
This way it is possible to present a physical device to a VM that you normally would not be able to add.

In the dialog shown above you configure which devices are available for VMDirectPath (also called PCI passthrough). You can then add any of the selected devices to the hardware of individual VMs.
We really did not want to do the latter ... but there is one desirable side effect of this configuration: a device that is configured for VMDirectPath becomes invisible to the VMkernel. And this is exactly what we wanted to achieve for the unused FlexNICs!

So we configured all unused FlexNICs for VMDirectPath, and they were no longer displayed as (disconnected) vmnics. If you want to do the same, you need to know which PCI device a vmnic corresponds to. In the screenshot I posted you will notice that the vmnic name is displayed in brackets for some of the PCI devices, but not for all of them. So it can be hard to figure out which devices need to be selected, but it's worth it!
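For what it's worth, both the vmnic-to-PCI mapping and the passthrough flag are also exposed through the vSphere API, so the whole exercise can be scripted. A rough pyVmomi sketch (the host object lookup is omitted, and the PCI address is only an example; enabling passthrough typically requires a host reboot to take effect):

from pyVmomi import vim

# Print which PCI address belongs to which vmnic, to tell the configured
# FlexNICs apart from the unused ones.
for pnic in host.config.network.pnic:
    print("%s -> PCI %s (driver %s)" % (pnic.device, pnic.pci, pnic.driver))

# Mark an unused device for VMDirectPath / PCI passthrough.
cfg = vim.host.PciPassthruConfig(id="0000:06:00.2", passthruEnabled=True)
host.configManager.pciPassthruSystem.UpdatePassthruConfig([cfg])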

Saturday, June 11, 2011

Using Converter: Hot or cold clone?

Ulli Hankeln (aka continuum) recently started an interesting thread in the VMware communities asking what cloning method people prefer with VMware Converter.
Some time ago we started a fairly large server virtualization project (several hundred servers) and asked ourselves exactly this question. After some discussions we identified a set of evaluation criteria and judged both methods against them:

1. Risk of data loss
This is a clear point for cold cloning, because it makes the machine completely unavailable and unchangeable during the conversion process. With a hot clone it is possible that data on the machine changes during the conversion and is then not replicated to the virtual clone.

2. Operations complexity
This, in contrast, is a clear point for hot cloning: the process is fully automated through the Converter GUI. For cold cloning you need to boot the server from a CD. If you are lucky you can do that remotely by using HP iLO or similar remote control adapters. But you will always stumble over some servers that you need to visit physically to mount the boot CD.

3. Operations success rate and possible errors
What can go wrong with a cold clone? You might not even be able to start it, because you cannot boot the server through iLO (or similar means) and do not have physical access to the system (at the time you need it) to mount the CD. If that is not a problem, the second possible error is that the boot CD does not include the necessary drivers for the conversion: you need a working mass storage driver and a working network driver, but the CD only includes a limited set of drivers that might not be suitable if the server you are going to virtualize has fairly modern hardware. It is possible, though, to rebuild the boot CD and add additional drivers to it, but this is an extra step and takes extra time and effort.
What can go wrong with a hot clone? You will always be able to start the hot clone, but in some situations it will fail halfway through or at the end. Possible errors are e.g. corrupt file systems or bugs in Converter. In such a case troubleshooting is difficult, although you will find some hints in the VMware KB. Fortunately, this does not happen too often - in our experience, in clearly fewer than 5% of all cases.
So, what works better? It very much depends on your setup, the types of hardware you have, the OSs you are virtualizing and the applications you are running in them.
For us hot cloning wins on this criterion, but we were not able to decide that until we had already started our virtualization project and made good progress with it.

4. VMware support
In recent releases of Converter (including 5.0, currently in beta) VMware has dropped support for cold cloning. I guess the reason is that they wanted to get rid of maintaining the boot CD and instead concentrate on improving the hot conversion process. And they really did that, e.g. by adding incremental data synchronization after the first data copy run.
So, this is a win for hot cloning. Anyway, it is still possible to use cold cloning with the current releases of Converter, because they support importing the images of third-party cloning tools (like Symantec Ghost and Acronis TrueImage). Converter's reconfiguration process will make these images bootable in the virtual clone by injecting the necessary mass storage drivers.

Conclusions: In our project we have a "hot cloning first" policy. To mitigate the risk of data loss we stop all unneeded services on the source machine (like application and database services) before starting the process. On machines that are used interactively (like Windows Terminal Servers) we disable logons.
Only if the hot clone fails do we try a cold clone as the next step.

As stated earlier, your mileage may vary. What do you think? Like always comments are welcome.

Wednesday, June 8, 2011

VMware Converter 5.0 beta available

VMware has released a beta of the upcoming new version 5.0 of Converter. Check this community for details.
According to the preliminary release notes the new version will support the upcoming vSphere 5.0, but also earlier versions down to ESX 3.0 and vCenter 2.5.
But the most important and best news is that Converter 5 will properly handle disk alignment (see my previous post about the lack of this in earlier versions)!

Sunday, June 5, 2011

Two things to know about VMware Converter

VMware Converter is a tool to convert physical machines (either online or via an existing backup image) to VMware virtual machines. It is available as a free stand-alone version and in a vCenter-integrated version.
Like others we are using it a lot for virtualizing existing physical servers, just because it's free and/or comes pre-installed with vCenter. It also does a pretty good job and is well supported by VMware, but ...
you should be aware of two issues when using Converter more than occasionally.

1. Windows 2000 support
With the latest versions (stand-alone Converter 4.3 and the vCenter 4.1-integrated one) VMware dropped support for converting Windows 2000 machines (see the notes about supported guest OSs in the Release Notes). The really bad thing about this is that it does not simply tell you so when you try to convert a Windows 2000 machine, but instead throws an error message about not being able to install the Converter agent on the target computer. It looks like it tries to install the Windows XP version of the agent, which fails.
At first glance this does not look like a big problem, because older versions of Converter still support Windows 2000. If you run vSphere 4.1 you can use the stand-alone Converter 4.0.1 to convert Windows 2000 machines by connecting to the vCenter 4.1 server or directly to an ESX(i) 4.1 host. We have done this a lot and it always worked. However, if you look carefully at the Release Notes of Converter 4.0.1 you will notice that it only supports vSphere 4.0 as the virtualization platform, not vSphere 4.1.
We asked VMware support how we - as a vSphere 4.1 customer - are supposed to convert a Windows 2000 machine using Converter in a way that is fully supported by VMware. Here are the instructions (it's only one possible way, but you will get the idea):
a) Install an ESX(i) 4.0 host and add it to an existing vCenter 4.1 instance
b) Use the Stand-alone Converter 4.0.1 to connect to this ESX(i) 4.0 host and convert the Windows 2000 machine
c) Migrate the virtualized Windows 2000 machine to an ESX(i) 4.1 host (either cold or by VMotion)
That's a bit cumbersome, isn't it? Anyway, as stated above you can also use the stand-alone converter 4.0.1 to connect directly to vSphere 4.1. It is not officially supported, but seems to work quite well.

2. Disk alignment
If you care about storage performance then you want your VMFS volumes and your guest OS partitions to be aligned. There are a lot of good explanations of what disk alignment is and why it is important. My personal favorite is on Duncan Epping's blog.
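The arithmetic behind it is simple: a partition is aligned if its starting byte offset is a whole multiple of the storage array's block or stripe size. A quick Python illustration (the 64 KB boundary is just a common example value; check what your array actually uses):

SECTOR = 512              # bytes per disk sector
BOUNDARY = 64 * 1024      # example array stripe/block size: 64 KB

def is_aligned(start_sector):
    # Aligned means the byte offset is a whole multiple of the boundary.
    return (start_sector * SECTOR) % BOUNDARY == 0

print(is_aligned(63))     # False: the classic Windows 2003/XP default
print(is_aligned(128))    # True:  64 KB offset, commonly used for aligned partitions
print(is_aligned(2048))   # True:  1 MB offset, the Windows 2008 default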
Now, the big issue is that VMware Converter does not align the guest OS partitions in the target virtual machine. Although VMware has been pointing out the importance of disk alignment for a long time (see e.g. this ESX3 white paper), they have still - as of version 4.3 - not built this capability into their own Converter product.
So, if you are serious about disk performance and are planning a large virtualization project, you may want to consider alternatives to VMware Converter. There are other commercial products available that do proper disk alignment; one example is Quest vConverter.

Update (2011-09-01): Good news. Today VMware released Converter 5.0, which is now able to handle disk alignment!

Sunday, May 29, 2011

Network troubleshooting, Part III: A real life example (Broadcom NICs dropping packets)

Recently we had a strange problem inside a Linux VM: an rsync job that was used to copy data from a local disk to an NFS-mounted share reproducibly failed during the data copy with a "broken pipe" error message.

Using the methods I wrote about in Part I and Part II of this little troubleshooting series (and some trial and error for sure) we found out that the issue would only occur if the VM was using a certain type of physical NIC, the HP NC371i (with a Broadcom BCM5709 chipset).
Later we also discovered corresponding vmkernel.log messages like these:

... vmkernel: 36:02:06:55.923 cpu5:6816883)WARNING: Tso: 545: TSO packet with one segment.
... vmkernel: 36:02:06:56.325 cpu5:7129949)WARNING: Tso: 545: TSO packet with one segment.
... vmkernel: 36:02:06:57.128 cpu4:6816885)WARNING: Tso: 545: TSO packet with one segment.
... vmkernel: 36:02:06:57.128 cpu4:6816885)WARNING: LinNet: map_pkt_to_skb: This message has repeated 640 times: vmnic1: runt TSO packet (tsoMss=1448, frameLen=1514)

Enough evidence to open a support call with VMware... The outcome was that there is a known problem with the bnx2 driver (which is used for this type of NIC): it drops TSO packets that are below a certain minimum size it expects. The issue only occurs with some of the Broadcom chipsets that this driver can handle. The BCM5709 was not on the list before we opened our case, but it looks like it is also affected.

By the way, TSO stands for TCP segmentation offload and is used to offload the segmentation of large TCP packets to the NIC's hardware. A good thing, if it works flawlessly.
The obvious workaround is to disable TSO by using the appropriate driver options. You could disable it on the host's physical Broadcom NICs, but this would mean sacrificing the performance benefits of TSO for all VMs using these NICs.
We did not do that, because none of the other VMs had any problems with TSO. Instead we decided to disable TSO only inside the Linux VM that had the problem (in a Linux guest this can be done with ethtool, e.g. ethtool -K eth0 tso off). This solved the issue for us.

Saturday, May 28, 2011

Network troubleshooting, Part II: What physical switch port does the pNIC connect to?

Once you have found out which physical NIC (pNIC) a VM is actually using (see my previous post), you may want to check the external switch port that this pNIC connects to (Is it properly configured? What about its error counters?). Okay, but which switch port do you need to check?
It is considered good data center practice to have every connection from every server's pNIC to the switch ports carefully documented. Do you? Do you trust the documentation? Is it up to date? If yes, you are fine and can stop reading here...

If you want to be sure, and if you use Cisco switches in your data centers, then there is a much more reliable way to track these connections: the Cisco Discovery Protocol (CDP). On Cisco devices it is enabled by default, and the switch periodically broadcasts interface information to the devices attached to its ports (like your ESX hosts).
By default, ESX(i) (version 3.5 and above) will receive these broadcasts and display the information they contain through the VI client. In the Hosts and Clusters view, select Networking on the Configuration tab of a host. This will display your virtual switches with their physical uplinks (vmnic0, vmnic1, etc.). Now click on the little speech bubble next to one of the physical adapters and a window like the following will pop up:

CDP information shown in the VI client

You can find a lot of useful information here. The Device ID is the name of the Cisco switch, and the Port ID shows the number/name of the switch module and the port number on that module. So you can tell your network admins exactly which switch port they need to check.
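Incidentally, the same CDP details are available through the vSphere API, which is handy if you want a cabling report for all hosts instead of clicking through the VI client. A small pyVmomi sketch (the host object lookup is omitted again):

# The network "hints" of a host contain, among other things, the CDP
# information received on each pNIC.
ns = host.configManager.networkSystem
devices = [p.device for p in host.config.network.pnic]
for hint in ns.QueryNetworkHint(device=devices):
    cdp = hint.connectedSwitchPort
    if cdp:
        print("%s -> switch %s, port %s" % (hint.device, cdp.devId, cdp.portId))
    else:
        print("%s -> no CDP information received" % hint.device)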

If CDP information is not available for a physical adapter, the pop-up window will tell you so. Possible reasons: you don't use Cisco switches, you have CDP broadcasts disabled on them, or the ESX(i) host's interfaces are not in CDP listen mode.

For more detailed information on CDP and how to configure it in ESX see the VMware KB: Cisco Discovery Protocol (CDP) network information.