Showing posts with label vSphere 5.

Saturday, July 14, 2012

VMware vCenter Update 1a and ESXi 5.0 Build 768111

On July 12th, 2012 VMware released Update 1a for vCenter and important patches for ESXi 5.0.

What's new and fixed in vCenter Update 1a:
  • This is the first update that is also available for the vCenter (Linux based) appliance
  • For the appliance it switches the embedded DB2 database to VMware's own vPostgres database.
  • The problem with excessive memory consumption of the tomcat6 service (KB2013890) has been resolved.
  • The problem with HA restarts failing for VMs that were migrated with Storage VMotion (KB2013639) is fixed.
  • For a full list of fixes, see the release notes.
What's new in the ESXi 5.0 Patch Release ESXi500-201207001:

Here is a link to the full ESXi 5.0 update bundle: ESXi500-201207001.



Thursday, June 28, 2012

Explaining the CX + vSphere5 + VAAI support riddle

I recently blogged about the oddity of EMC CX arrays no longer being supported with VAAI on vSphere 5.0, and - like many others - I wondered what the real reasons behind this were. Today Chad Sakac (EMC) solved the riddle in a webcast covering a VNX engineering update and - well - exactly this issue.

If you missed it: he just published the presentation that he showed on his blog. Here is a summary:

Tuesday, June 19, 2012

How to list vSphere 5.0 HA restarts with PowerCLI

VMware HA usually does a good job of restarting VMs in case of an ESX(i) host failure. Imagine this happened at night - you come into the office the next morning and want to know which VMs were restarted (or possibly failed to restart) because of the HA event.

You can find this out by looking at the vCenter event log, but if this happened several hours (or even longer) ago, the vSphere client will no longer display the associated events. Even if the events are still displayed, you will have a hard time finding them and compiling a list of restarted VMs.

To cope with this situation I wrote a small PowerCLI script that searches the vCenter event log, finds the relevant entries and prints the list of VMs that were restarted during the latest HA event. It works with vSphere 5.0 only. Here it is:
param(
[string]$vcenter = "localhost",
[int]$last = 24,
[switch]$help = $false
)

$maxevents = 250000

"`nScript to generate list of successful and failed VM restart attempts after an HA host failure."
if ($help) {
"Optional parameters:
-help: display this help
-vcenter servername: connect to vCenter server servername (default is localhost)
-last n: analyze events from the last n hours (default is 24)
"
exit
} else {
"(Use -help for list of parameters)"
}

$stop = get-date
$start = $stop - (New-TimeSpan -Hours $last)

if (!(get-pssnapin -name "VMware.VimAutomation.Core" -ErrorAction SilentlyContinue )) { add-pssnapin "VMware.VimAutomation.Core" }

write-host "`nConnecting to vCenter server $vcenter ..."
Connect-VIServer $vcenter | out-null

write-host "`nGetting all events from $start to $stop (max. $maxevents) ..."
$events = Get-VIEvent -Start $start -Finish $stop -MaxSamples $maxevents

write-host Got $events.Length events ...

write-host -nonewline "`nSearching for host failure events ..."
$ha = @()
$events | where-object { $_.EventTypeID -eq "com.vmware.vc.HA.DasHostFailedEvent" } | foreach { $ha += $_ }
write-host (" found " + $ha.Length + " event(s).")
if ($ha.Length -eq 0) {
write-host "`nNo host failure events found in the last $last hours."
write-host "Use parameter -last to specify number of hours to look back.`n"
exit
} else {
write-host ("`nLatest host failure event was " + $ha[0].ObjectName + " at " + $ha[0].CreatedTime + ".")
}

$events = $events | where-object { $_.CreatedTime -ge $ha[0].CreatedTime }

write-host "`nList of successful VM restarts:"
$events | where-object { $_.EventTypeID -eq "com.vmware.vc.ha.VmRestartedByHAEvent" } | foreach {
write-host $_.CreatedTime: $_.ObjectName
}

write-host "`nList of failed VM restarts:"
$failures = @{}
$events | where-object { $_.FullFormattedMessage -like "vSphere HA stopped trying*" } | foreach {
$vmname = $_.FullFormattedMessage.Split(" ")[6]
if (!($failures.ContainsKey($vmname))) {
$failures.Add($vmname,$_.CreatedTime)
write-host $_.CreatedTime: $vmname
}
}

Disconnect-VIServer -Force -Confirm:$false
The script takes three optional parameters:
  • -vcenter servername: Connect to the vCenter server named servername (default is localhost)
  • -last n: Analyze events of the last n hours (default is 24). If, for example, the HA event happened during a weekend and you run this script on Monday, you may need to raise this to as much as 72. Please note: The higher n is, the longer the script will run and the more memory it will consume!
  • -help: display help on parameters
You will usually want to specify at least the vCenter server name. If you have only one, you can of course hard-code it into the script as the default value (instead of localhost) in line 2.

How does it work?

As a first step the script reads all events of the last n (default: 24) hours and searches them for host failure events (event id "com.vmware.vc.HA.DasHostFailedEvent", see line 36). If there were multiple host failures in this time frame it will only look at the latest one and discard all earlier events (line 46).

To find out what VMs were restarted the script looks for events of id "com.vmware.vc.ha.VmRestartedByHAEvent". It will print the time stamps of these events and the names of the VMs that were restarted (line 49 to 51).

Finally, the script looks for event messages that start with the text "vSphere HA stopped trying". These events are logged when HA has failed to restart a VM several times in a row. With vSphere 5, HA tries very hard and repeatedly to restart a VM, so you might see this event multiple times for each failing VM. The script records the failing VMs in a hash table in order to print each name only once, together with the time stamp of the latest failed restart attempt.
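To make the three filtering steps easier to follow outside of PowerCLI, here is a minimal Python sketch of the same logic. It is not part of the original script: it assumes the events have already been exported into a list of dicts with the hypothetical keys type_id, created, object_name and message, and it picks the latest host failure by its timestamp instead of relying on the newest-first order of Get-VIEvent.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Event IDs and message prefix as used by the PowerCLI script above.
HOST_FAILED = "com.vmware.vc.HA.DasHostFailedEvent"
VM_RESTARTED = "com.vmware.vc.ha.VmRestartedByHAEvent"
GAVE_UP_PREFIX = "vSphere HA stopped trying"


def analyze_ha_events(
    events: List[dict],
) -> Tuple[Optional[dict], List[Tuple[datetime, str]], Dict[str, datetime]]:
    """Return (latest host failure, restarted VMs, failed VMs).

    Each event is a dict with the hypothetical keys 'type_id',
    'created' (datetime), 'object_name' and 'message'.
    """
    host_failures = [e for e in events if e["type_id"] == HOST_FAILED]
    if not host_failures:
        return None, [], {}

    # Step 1: keep only events from the latest host failure onwards.
    latest = max(host_failures, key=lambda e: e["created"])
    recent = [e for e in events if e["created"] >= latest["created"]]

    # Step 2: successful restarts, oldest first.
    restarted = sorted(
        (e["created"], e["object_name"])
        for e in recent
        if e["type_id"] == VM_RESTARTED
    )

    # Step 3: VMs that HA gave up on, one entry per VM with the latest time.
    # Note: startswith() is case-sensitive, unlike PowerShell's -like.
    failed: Dict[str, datetime] = {}
    for e in recent:
        if e["message"].startswith(GAVE_UP_PREFIX):
            vm_name = e["message"].split(" ")[6]  # 7th word, as in the script
            if vm_name not in failed or e["created"] > failed[vm_name]:
                failed[vm_name] = e["created"]
    return latest, restarted, failed
```

Like the PowerCLI version, taking the seventh word of the message as the VM name will not work for VM names that contain spaces.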


Thursday, May 24, 2012

How to disable Storage I/O Control for an unavailable datastore

If you run your VMs on FC-based storage, you will occasionally need to unmap a storage LUN from one or more ESX(i) hosts (e.g. when retiring a storage array or reorganizing your storage layout). It is important to do this the right way, using the procedures that are documented in
The procedure is very complex for vSphere 4.1, but fortunately it has become much easier with vSphere 5.0.

What happens if you fail to follow these procedures and just unpresent the LUN on the storage array, so that the hosts can no longer access it? In this case the ESX(i) hosts will detect an APD (all paths down) condition, and ESX(i) 4.1 hosts in particular can become very unhappy about this (see KB1016626 and KB1030980). Again, ESXi 5.0 hosts are much more resilient to APD conditions: they will eventually turn them into PDL (permanent device loss, see KB2004684) conditions and will fully recover from the LUN loss after a rescan of the HBAs ... unless you have Storage I/O Control (SIOC) enabled on the lost datastores ... and this is exactly what happened to us today :-( The vmkernel.log files were flooded with the following messages, because SIOC kept trying to access the lost datastore:

Permanent Device Loss (PDL) with SIOC enabled
Now the problem is: You cannot just disable SIOC on a lost datastore using the vSphere client - you should have done this before unmapping the storage LUN!

One way to recover from this situation is to reboot the affected hosts. However, I really wanted to save the time needed to put all the hosts into maintenance mode and reboot them one after the other. So I looked for a way to forcibly unmount the datastore directly on the host via esxcli or similar. None of the ways documented in the VMware KB articles and docs worked in my case, but I finally stumbled upon this wonderful blog post by William Lam:
Does SIOC actually require Enterprise Plus and vCenter?
It is a bit old and refers to ESX(i) 4.1, but it is still valid for ESXi 5.0! Here William describes a way to enable (and disable!) SIOC for a datastore directly on a host without using (even without having available!) a vCenter server.
And it is really easy: All you need to know is the device ID of the datastore/LUN. In ESXi 5.0 you can find it with the command
   # esxcli storage vmfs extent list
It will list all datastores with their labels and device IDs (starting with naa.). And then you can use the following vsish command to disable SIOC for the device
   # vsish -e set /storage/scsifw/devices/<naa-id>/iormState 1496
That's it! After a few seconds the affected datastore disappeared from the list of mounted datastores in the vSphere client, and the vmkernel.log error messages stopped as well.
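If several datastores are affected, the device IDs can be picked out of the esxcli output programmatically. The following Python sketch is an illustration only: it assumes that the output of esxcli storage vmfs extent list has been copied to a machine with Python, that it uses the fixed-width layout with a dashed separator line that esxcli prints for tables, and that the hypothetical parameter wanted_volumes holds the labels of the lost datastores. It only builds the vsish command shown above; it does not run anything on the host.

```python
import re
from typing import Dict, Iterable, List


def parse_esxcli_table(output: str) -> List[Dict[str, str]]:
    """Parse a fixed-width esxcli table (header line, dashed line, rows)."""
    lines = [line.rstrip() for line in output.splitlines() if line.strip()]
    header, dashes, rows = lines[0], lines[1], lines[2:]
    # Each run of dashes marks the start of a column; a column extends to
    # the start of the next one, so values containing spaces stay intact.
    starts = [m.start() for m in re.finditer(r"-+", dashes)]
    bounds = list(zip(starts, starts[1:] + [None]))
    columns = [header[a:b].strip() for a, b in bounds]
    return [
        {col: row[a:b].strip() for col, (a, b) in zip(columns, bounds)}
        for row in rows
    ]


def vsish_disable_commands(output: str, wanted_volumes: Iterable[str]) -> List[str]:
    """Build (but do not run) the SIOC-disabling vsish command per volume."""
    wanted = set(wanted_volumes)
    return [
        "vsish -e set /storage/scsifw/devices/%s/iormState 1496" % row["Device Name"]
        for row in parse_esxcli_table(output)
        if row["Volume Name"] in wanted
    ]
```

Splitting on the dashed line rather than on whitespace is a deliberate choice here, because datastore labels may contain spaces.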

Please note: vsish is a powerful but largely undocumented utility to query and set VMkernel parameters. It is only available in an ESXi local or remote shell. William has quite a few posts about it on his virtuallyGhetto blog. This time it saved us the trouble and time of rebooting a whole cluster of hosts ...

Thursday, May 10, 2012

Updated: What's the deal with EMC CX arrays not supporting VAAI with vSphere 5.0?

(If you have read this post before then skip to the bottom for recent updates)

We are in the process of upgrading our production environment to vSphere 5.0 U1. A new vCenter 5.0 server is already in place, and we attached the production ESXi hosts (that are still on 4.1 U2) to it.
The next step is to upgrade the hosts. Since this is a qualified production environment running more than 1,500 virtual servers, I became a bit paranoid about hardware and software/firmware compatibility and decided to double-check whether the environment is fully supported (or whether we should upgrade any firmware first). I was pretty confident that we were on the safe side, because we never had any issues with vSphere 4.1 and always kept the environment up to date.

But then I stumbled upon this VMware KB article: EMC CX and VNX Firmware and ESX requirements for vStorage APIs for Array Integration (VAAI) support. It states that ESXi 5.0 does not support VAAI on our Clariion CX4 arrays, even though they run the latest FLARE code:


Array series / firmware | vSphere 4.1 | vSphere 5.0
CX4 Series, FLARE 30/29/28+ | VAAI Supported | Not Supported
VNX Series, OE 31 or later * | VAAI Supported | VAAI Supported

I checked the test hosts that we had already upgraded, and they showed VAAI as supported on the CX4 LUNs in the vSphere client. However, the KB article recommends disabling VAAI on these hosts ...

I quickly searched the Internet for relevant posts and statements and browsed through the VMware Community forums. There I found this post where people complain about the VAAI status being shown as unsupported. It looks like there are dependencies on the LUN size (smaller or larger than 2TB) and on the array's failover mode (it sounds like you need to use ALUA / failover mode = 4, which we are already using). But from this post I get the impression that everything has been fixed with the latest 30.x FLARE code releases.

We have now contacted both VMware and EMC for clarification, and I will keep this post updated with any new information. In the meantime I ask everyone using ESXi 5.0 with EMC CX storage to comment on this post: whether you are using VAAI, whether you have any issues with it, and what FLARE code you have on the arrays. By the way, you can check the status of each VAAI primitive for your storage LUNs by running the following esxcli command:
   esxcli storage core device vaai status get
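With many LUNs the output of this command gets long. The following Python sketch summarizes it per device. It assumes the ESXi 5.0 output layout, i.e. a non-indented device ID line followed by indented lines such as "ATS Status: supported"; other builds may format the output differently. In this output, "Clone" is the full copy (XCopy) primitive.

```python
from typing import Dict, List


def parse_vaai_status(output: str) -> Dict[str, Dict[str, str]]:
    """Turn 'esxcli storage core device vaai status get' output into a dict.

    Assumes a non-indented device ID line followed by indented
    'Key: value' lines for that device.
    """
    devices: Dict[str, Dict[str, str]] = {}
    current = None
    for line in output.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            current = line.strip()  # device ID, e.g. naa.6006...
            devices[current] = {}
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            devices[current][key.strip()] = value.strip()
    return devices


def supported_primitives(status: Dict[str, str]) -> List[str]:
    """Primitives reported as 'supported' (Clone = full copy / XCopy)."""
    return sorted(
        key[: -len(" Status")]
        for key, value in status.items()
        if key.endswith(" Status") and value == "supported"
    )
```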

Thanks in advance for any helpful comments!


Update (2012-05-20): I got some feedback from other users, VMware and EMC on this issue, and used my best Google-Fu to get related information about vSphere 5, VAAI, VMFS-5 and/or EMC CX support. Here are the results (somewhat unsorted):
Regarding my specific issue (lack of official support for VAAI with EMC CX arrays and ESXi 5.0) I finally got one really detailed and helpful comment from an EMC representative: EMC tested the VAAI functionality of ESXi 5.0 with their EMC CX4 arrays (I specifically asked for the CX4). It has been "tested as functional", but it only passed the certification tests of the ATS and Zero primitives, but failed the certification test of the XCopy primitive.
I wonder why this happened, because VAAI was fully supported with ESXi 4.1 (i.e. it must have also passed the XCopy test), and with ESXi 5.0 there were no changes to the XCopy primitive (at least I could not find any information about such a change).

Anyway, this is probably the reason why EMC and VMware do not officially support VAAI with ESXi 5.0 on CX arrays. From a customer's point of view this is disappointing, but from EMC's point of view the decision is understandable, because they want to focus on their current products and not spend support resources on somewhat outdated arrays like the CX ones. However, according to this EMC representative it may be possible to reach an individual support agreement with EMC and VMware (EMC calls this an RPQ = Request for Product Qualification) if you nevertheless need or want an official support statement for whatever reason.

I also had the idea of disabling only the XCopy primitive to be on the safe side, but I don't want to do this globally, because we also use a fully VAAI supported VNX array with our production hosts. So I was glad to find a link to the promising KB article KB2012967 (see list above), but this link currently doesn't work. I will ask VMware about this ...



Update (2012-06-20): In the meantime an EMC representative confirmed to me that all three VAAI primitives work fine with vSphere 5.0 and CX arrays, and that they even support it, but you need to file an RPQ with them in order to get formal support.

We have this configuration (VAAI turned on for both our CX4 and VNX arrays) running with vSphere 5.0 in production for about 6 weeks now without any problems.

The link to VMware KB article KB2012967 still doesn't work, but the article is now listed as "[Archived]" in the reference section of KB1033665 ...





Update (2012-07-11): It turned out that you no longer need an RPQ with EMC. They now officially support vSphere 5's VAAI implementation on the CX4. The combination is listed in their latest Simple Support Matrix for VMware vSphere 5 (Powerlink account needed for download):
Snippet from the EMC Simple Support Matrix for vSphere 5 (of July 2012)
VMware, however, still does not officially support it (KB2008822 is unchanged as of today). In another blog post I have already explained the reason for this.







Friday, March 16, 2012

[Release] vSphere 5.0 Update 1 etc.

Yesterday VMware published minor updates to their Cloud Infrastructure suite products including vCenter and ESXi 5.0 Update 1. The news is already well covered, e.g. here in the VMware vSphere blog by Duncan Epping.

It looks like there are only a few new features in there. For me the most exciting one is that VAAI Thin Provisioning Block Reclaim/UNMAP is back, though not inline but on demand. However, the list of bug fixes is very long, and this is good! I have the feeling that many enterprise customers that are still on vSphere 4.x were waiting for this update and will now tackle the migration to vSphere 5.0 more seriously (following the conservative rule "Don't trust a .0 release"). At least this is what we were thinking ...

Monday, February 20, 2012

About the VMware Tools of ESXi 5.0 and why you should install them on vSphere 4.x

There is a rather new VMware KB article available that describes an interesting problem with the VMware Tools version of ESX(i) 4.1 Update 2: If the clock resolution of a Windows VM has been changed from the default, then the VMware Tools service will continually consume 15% CPU in a 1-vCPU VM (7% in a 2-vCPU VM, and so on).
We have seen this problem on a few of our VMs; it looks like there are certain Windows applications around that change the clock resolution and thus cause the problem. Detailed background information about the Windows clock resolution (and why it is not a good idea to change it) is available from Microsoft.

The resolution documented in the KB article is to downgrade the VMware Tools to an earlier version or - and this is probably surprising for most of us - to install the VMware Tools version of ESXi 5.0 instead.
This reminds us of the fact that VMware has changed their VMware Tools support policy with the introduction of vSphere 5.0: The VMware Product Interoperability Matrixes now include a selection for the VMware Tools, and it shows that the Tools of ESXi 5.0 are "interoperable" not only with ESXi 5.0 but also with ESX(i) 4.1 and even ESX(i) 4.0:

VMware Tools Interoperability Matrix
 ... whereas earlier versions were only interoperable with the corresponding ESX(i) version.

So, if you are still on vSphere version 4.1 or 4.0 and are planning to upgrade to vSphere 5 sooner or later then you can start deploying the VMware Tools of ESXi 5.0 now, and avoid the effort of future tools upgrades.
You can download the latest version of the ESXi 5.0 tools here at packages.vmware.com.

If you run a manual custom installation of the ESXi 5.0 tools in a Windows VM you will notice that there are some new components included:
VMware Tools 5.0 components default selection
The default selection of components (this is what you get when doing an automatic install or upgrade) is now more suitable for VMware ESXi than it was with earlier versions of the tools, but it still includes two components that are useful for VMware Workstation and completely useless when running ESXi: the Record/Replay Driver and the Audio Driver. Earlier versions of the Tools would also install the Shared Folders component by default, although it is also only useful with VMware Workstation.

A last hint: There is still another "feature" in the VMware Tools package for Windows that I personally find very annoying: Once you have installed the Tools you are by default not able to modify or repair the installation through the "Add or Remove Programs" control panel applet. To fix this find the GUID key for the VMware Tools package in the registry under
   HKLM\Software\Microsoft\Windows\CurrentVersion\Uninstall
and change the NoModify and NoRepair values there to 0.
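If you have to do this on many VMs, the two registry changes can be packaged as a .reg file. This Python sketch generates one; the product GUID is a placeholder that you have to look up on your own system first, and writing the file as UTF-16 mirrors what regedit itself exports. Please review the generated file before importing it with regedit.

```python
# Registry key from the post; the product GUID must be looked up per system.
UNINSTALL_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall"


def make_reg_file(product_guid: str) -> str:
    """Return .reg text that sets NoModify and NoRepair to 0 for one product."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[%s\\%s]" % (UNINSTALL_KEY, product_guid),
        '"NoModify"=dword:00000000',
        '"NoRepair"=dword:00000000',
        "",
    ]
    return "\r\n".join(lines)


def write_reg_file(path: str, product_guid: str) -> None:
    # UTF-16 with a BOM, like the .reg files that regedit exports.
    with open(path, "w", encoding="utf-16", newline="") as f:
        f.write(make_reg_file(product_guid))
```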

Wednesday, January 25, 2012

How to use ESXi 5 as an NTP server - OR - How to permanently add custom firewall rules?

Recently my attention was caught by a question posted to the VMware Community forums that sounds odd at first sight: Is it possible to configure ESXi 5.0 to act as an NTP server?

I wondered why anyone would want to do this. On the one hand it is not recommended to use ESXi for anything other than the task it was designed for: being a hypervisor. On the other hand it is not recommended to run a VM as an NTP server, because exact timekeeping can be quite a challenge in VMs, as they do not have a real hardware clock timer. So, should you run a physical box just for NTP? Small shops that have reached 100% virtualization run only ESXi on their remaining physical servers. So I can understand people considering an exception and wanting to run an ESXi host as an NTP server - it is a very lightweight service anyway ...

Now back to the question ..., and the answer is: Yes, it is. In fact it is very easy to do, because once you have configured ESXi 5.0 to act as an NTP client, it will also automatically act as an NTP server! The NTP daemon (/sbin/ntpd) does both at the same time, and its configuration file (/etc/ntp.conf) even allows any other machine to query it by default. There is only one hurdle: the ESXi 5.0 firewall.
By default it blocks the port for incoming NTP queries (UDP port 123). We need to create a custom firewall extension to open that port. KB2005304 explains how to do this. Basically you need to create a custom XML configuration file in the directory /etc/vmware/firewall, e.g. /etc/vmware/firewall/ntpd.xml with the following contents:

<!-- Firewall configuration information for NTP Daemon -->
<ConfigRoot>
  <service>
      <id>NTP Daemon</id>
      <rule id='0000'>
          <direction>inbound</direction>
          <protocol>udp</protocol>
          <porttype>dst</porttype>
          <port>123</port>
      </rule>
      <enabled>false</enabled>
      <required>false</required>
  </service>
</ConfigRoot>

(Take care when you copy or modify this: The XML tags are case sensitive!)
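If you need a similar rule for another service or port, the XML can also be generated instead of typed by hand, which avoids case errors in the tags. This Python sketch reproduces the structure of the ntpd.xml example (without the leading comment); the function name and its parameters are my own and not part of any VMware tool.

```python
import xml.etree.ElementTree as ET


def firewall_ruleset_xml(service_id: str, port: int, protocol: str = "udp",
                         direction: str = "inbound") -> str:
    """Build a one-rule firewall XML in the layout of the ntpd.xml example.

    Tag names are written exactly as in the example, since they are
    case sensitive. The service starts disabled, as in the example.
    """
    root = ET.Element("ConfigRoot")
    service = ET.SubElement(root, "service")
    ET.SubElement(service, "id").text = service_id
    rule = ET.SubElement(service, "rule", id="0000")
    ET.SubElement(rule, "direction").text = direction
    ET.SubElement(rule, "protocol").text = protocol
    ET.SubElement(rule, "porttype").text = "dst"
    ET.SubElement(rule, "port").text = str(port)
    ET.SubElement(service, "enabled").text = "false"
    ET.SubElement(service, "required").text = "false"
    ET.indent(root, space="  ")  # pretty-printing needs Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Save the output as /etc/vmware/firewall/ntpd.xml on the host.
    print(firewall_ruleset_xml("NTP Daemon", 123))
```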

Then load the new configuration by running the following command inside an ESXi shell:
  esxcli network firewall refresh

After that you can see the custom firewall rule in the firewall properties dialog of the vSphere client:

Custom "NTP Daemon" firewall rule
 Enable the rule, and you are done ...
... until the next reboot of the host, because user-defined XML firewall configurations are not persistent across ESXi host reboots. The KB article that describes this problem also includes a workaround: put the XML file on a shared datastore and modify the /etc/rc.local boot script to copy the file to the correct location on every reboot.

This works, but I personally consider it an ugly hack, because the modification is not inherent in the system but introduces a dependency on an external resource (the datastore). So I created a VIB file that you can install on ESXi and that will permanently add the XML file to the system.
Run the following commands inside an ESXi shell to install the VIB file:

   esxcli software acceptance set --level CommunitySupported
   esxcli software vib install -v http://files.v-front.de/fwenable-ntpd-1.2.0.x86_64.vib

The first command is needed for ESXi to accept the custom VIB, because it does not include a trusted signature file. The second command will download and install the VIB file (Note: you can also download the file with a browser, store it on a local datastore and reference the local file in the install command).
The installation will not require the host to be in maintenance mode and it will be immediately effective without the need to reboot the host! It will also automatically reload the firewall rules, so the only step left is to enable the rule in the vSphere client.
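Once the rule is enabled, you can check from another machine whether the host really answers NTP queries. The following Python sketch sends a minimal SNTP client request (packet layout as in RFC 4330) to UDP port 123 and reads the transmit timestamp and leap indicator from the reply. The script and host names in the usage line are placeholders.

```python
import socket
import struct
import sys
import time

NTP_TO_UNIX = 2208988800  # seconds from 1900-01-01 (NTP epoch) to 1970-01-01


def build_sntp_request() -> bytes:
    # First byte: LI = 0, version = 3, mode = 3 (client) -> 0x1B; rest zero.
    return b"\x1b" + bytes(47)


def transmit_time(packet: bytes) -> float:
    """Server transmit timestamp (bytes 40-47) converted to Unix time."""
    if len(packet) < 48:
        raise ValueError("NTP reply too short")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_TO_UNIX + fraction / 2**32


def is_synchronized(packet: bytes) -> bool:
    # Leap indicator 3 (top two bits) means the server clock is unsynchronized.
    return (packet[0] >> 6) != 3


def query_ntp(host: str, timeout: float = 2.0) -> bytes:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_sntp_request(), (host, 123))
        reply, _ = sock.recvfrom(512)
    return reply


if __name__ == "__main__" and len(sys.argv) > 1:
    reply = query_ntp(sys.argv[1])
    print("synchronized:", is_synchronized(reply))
    print("offset to local clock: %+.3f s" % (transmit_time(reply) - time.time()))
```

Usage: python ntpcheck.py esxi01.example.com. A timeout usually means that the firewall rule is not enabled yet; a reply flagged as unsynchronized may simply mean that ntpd on the host has not yet synchronized with its own upstream servers.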

By the way, I created this VIB file with a new and improved version of my TGZ2VIB5 script that I currently work on. Once I have finished this new version and made it available here I will also post a detailed description of how I created the VIB file.

Friday, January 13, 2012

Undocumented parameters for ESXi 5.0 Active Directory integration

Since vSphere version 4.1 it is possible to integrate an ESXi host into a Microsoft Active Directory (AD). After the host is joined to the domain you can assign permissions to AD groups and users by connecting directly to the host with the vSphere client.
Instructions on how to do this (with ESXi 5.0) are available e.g. here in the VMware Online Documentation.

I first looked at AD integration when vSphere 4.1 was released and found one really annoying drawback that ruled it out for our environment: When an ESXi 4.1 host is joined to a domain, it will automatically (and repeatedly!) look up an AD group called "ESX Admins", and as soon as it finds this group, it will grant it Administrator permissions on the ESXi host. The real problem here is that the name of this AD group ("ESX Admins") is hard-coded and cannot be configured.
This may be a nice feature for small environments - you just need to create this group, add the necessary people to it, and you are done. But in the enterprise environment of a large company with lots of different sites, IT teams and vSphere installations, but only one Active Directory, you cannot assume that all ESXi hosts in the company are managed by the same group of people.

When vSphere 5.0 was released I looked at the release notes and documentation to find out if this drawback was removed, but I did not find any positive information. Tests I did also showed that an ESXi 5.0 host behaves the same way, looks up the "ESX Admins" group and adds it with Administrator permissions.

However, recently I stumbled over the following when browsing the advanced configuration parameters of an ESXi 5.0 host:
Configuring the "ESX Admins" group
Yes, with ESXi 5.0 it is possible to change the name of the AD group that is automatically added by setting the advanced configuration option Config.HostAgent.plugins.hostsvc.esxAdminsGroup. You can even disable this functionality completely by setting the option Config.HostAgent.plugins.hostsvc.esxAdminsGroupAutoAdd to false.
I searched for this again in the VMware documentation and the Knowledge Base, but did not find it mentioned anywhere. So it looks like this is completely undocumented at the time of writing, but it works as expected (I could not resist trying it out immediately)!

Thursday, October 27, 2011

Update: ESXi 5.0 on HP G7 blades, now a Go!

About three weeks ago I reported on Emulex firmware problems that prevented the use of ESXi 5.0 on HP G7 blade hardware. This has now been fixed, sort of ...

HP has now updated the advisory that describes the issue and published an updated firmware that fixes the VLAN handling problems with ESXi 5.0 if it is used together with the be2net driver 4.0.355.1.

Be sure that you read the release notes of the firmware! It looks like it is an emergency/workaround release that leaves many issues unresolved. A firmware version that you can really trust for production will probably be available mid-November.

Update (2011-12-09): HP and Emulex published the final version of the OneConnect firmware (4.0.360.15a) on Nov 19th. VMware's KB2007397 also lists the recommended drivers to use with this firmware for both ESXi 4.1 and 5.0.

Update (2012-03-09): HP has published yet another firmware update on March 5th. Download version 4.0.360.15b. The previous link has become invalid.

Update (2012-04-16): Please refer to my HP & VMware links page to find the download for the latest version of the firmware.

Wednesday, October 26, 2011

VMware finally released the Open Source Code of vSphere 5.0!

Great news! Today VMware finally made the vSphere v5.0 Open Source code archives available for download.

Why is that important?

Since the release of VMware's ESXi 5.0 (Aug 24, 2011) many people are asking for the development of drivers for hardware devices that are not supported by ESXi 5.0 out-of-the-box.

ESXi device drivers are based on Linux device drivers (which led to the persistent misunderstanding that ESXi itself is based on Linux), but the stock Linux driver code must be modified in a specific way to be compatible with ESXi.

With past versions of ESXi (up to 4.1) it was possible to study and reproduce these required modifications, because VMware published the source code of the ESXi device drivers (the original Linux code plus their modifications). The reason for this is that most Linux drivers are licensed under the GPL (General Public License), and the GPL requires that derived works are also published under the GPL and their source code is made freely available (aka the "Copyleft" principle).

Now that VMware has also published the Open Source code of ESXi 5.0 (including the device drivers that it contains), it will be possible (or at least much easier) to develop custom ESXi 5.0 drivers for devices that are not officially supported by VMware.


Saturday, October 1, 2011

Currently a No-Go: ESXi 5.0 on HP G7 blades

Back in May I reported on problems with ESXi 4.1 and the Emulex OneConnect CNA that is built into HP's G7 blade servers.
If you now try to install ESXi 5.0 on such hardware, you will have a strong déjà vu: The be2net driver that is available right now for ESXi 5.0 is not really functioning due to "VLAN tagging issues". HP has published an advisory on this stating that an updated driver (that should fix these issues) is "currently in the certification process" and will be made available in "Q4 2011".

Okay, I won't update our production hosts to ESXi 5.0 that soon anyway, but I just wanted to install it on some spare blades for testing and evaluation. Too bad ... waiting for a fix again ...

Update (2011-10-27):
HP has now updated the advisory and published an updated firmware that fixes the VLAN handling problems with ESXi 5.0 if it is used together with the be2net driver 4.0.355.1.
Be sure that you read the release notes of the firmware! It looks like it is an emergency/workaround release that leaves many issues unresolved. A firmware version that you can really trust for production will probably be available mid-November.

Update (2012-04-16):
In the meantime it looks like all problems have been fixed with newer firmware and driver versions. Please refer to this newer post of mine!

Unable to assign license after installing a server with the HP ESXi 5.0 ISO

With the availability of vSphere 5.0 HP published a customized ESXi installation ISO for HP servers.
There have been reports that this build includes an annoying bug: HP has included a license file with wrong permissions. This can cause errors once you want to assign your own license to the host.

You can fix this by removing the offending license package with the following command (to be executed on the ESXi host directly after it has been installed):

  esxcli software vib remove -n hp-esx-license --no-live-install

and reboot the host. The command will remove the HP license and restore the original state of the host being in evaluation mode.

HP has also published an advisory describing the problem and providing a way to update their license package to fix the problem.


Thursday, August 25, 2011

The anatomy of the ESXi 5.0 installation CD - and how to customize it

1. Introduction

With vSphere 5 VMware introduced the Auto Deploy Server and the Image Builder, which allow you to customize the ESXi installation ISO with partner-supplied driver and tools packages.
The Image Builder is a PowerShell snap-in that comes with the latest version of the PowerCLI package. It allows you to add software packages to a predefined set of packages (a so-called ImageProfile) and even lets you create an installation ISO from such a baseline, making it easier than ever to customize the ESXi installation.

However, doing this is not a straight-forward task. It requires a working installation of the Powershell, plus the PowerCLI software, access to the offline-bundle that makes up the base installation (which is not included with the free version of ESXi!), a custom driver in VIB format, and some guidance on what Powershell-cmdlets you need to use to add the custom driver package and build an ISO from it.
Developers of custom drivers have to supply their packages in VIB format, and building such a package is not trivial and costs extra effort (compared to a simple OEM.TGZ file).

I wondered if it is still possible to customize the ESXi 5.0 install ISO with a simple OEM.TGZ file like you can do with ESXi 4.1, e.g. with my ESXi-Customizer script. And yes, it is possible - but it's very different now! I want to provide some background information here on how this works:

2. The contents of the ESXi 5.0 installation ISO

First let's have a look at the root directory of the ESXi 5.0 install ISO:

Contents of the ESXi 5.0 install CD root directory
Unlike the ESXi 4.1 ISO, you can see lots of ISO9660-compatible file names here (all capitals and 8.3 format). You can guess from their names that the files with the V00 (and V01, V02, etc.) extensions are device driver archives. The original type of these files is VGZ, short for VMTAR.GZ. That means that they are gzip'ed vmtar files.

vmtar is a VMware proprietary variant of tar, and you need the vmtar-tool to pack and unpack vmtar archives. It is part of ESXi 5.0 and also ESXi 4.x. Other files have the extensions TGZ and T00 (like TOOLS.T00). These files are gzip'ed standard tar files that the boot loader can also handle. Good.
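The naming rules described above can be expressed as a small Python helper for scripting around an extracted ISO. This is a sketch based only on the extensions mentioned in this post (V00, V01, ... and TGZ/T00); treating any T followed by two digits as a gzip'ed tar is my assumption, and the file names in the checks are examples. The helper can also verify that a file really is a complete gzip stream, but it cannot look inside vmtar archives, for which you still need the vmtar tool.

```python
import gzip
import re


def classify_iso_file(name: str) -> str:
    """Guess the archive type of an ESXi 5.0 ISO root file from its 8.3 name."""
    stem, _, ext = name.upper().rpartition(".")
    if not stem:
        return "other"
    if re.fullmatch(r"V\d\d", ext):
        return "vgz"  # gzip'ed vmtar: needs the vmtar tool to unpack
    if ext == "TGZ" or re.fullmatch(r"T\d\d", ext):
        return "tgz"  # gzip'ed standard tar
    return "other"


def is_valid_gzip(data: bytes) -> bool:
    """True if data starts with the gzip magic number and decompresses fully."""
    if data[:2] != b"\x1f\x8b":
        return False
    try:
        gzip.decompress(data)
    except (OSError, EOFError):  # corrupt or truncated stream
        return False
    return True
```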

Comparing it with the ESXi 4.1 media you will notice that there is no ddimage.bz2 file any more. In earlier versions of ESXi this was a compressed image that is written to the installation target disk and contains the whole installed ESXi system. You could even write this image to a USB key drive to produce a bootable ESXi system without ever booting the install CD. You cannot do this with ESXi 5.0 any more. However, this makes customizing the install CD easier, because you no longer need to add a second copy of your oem.tgz file to this system image.

There are also files named ISOLINUX.BIN and ISOLINUX.CFG in the ISO root, which means that ESXi 5.0 still uses the isolinux boot loader to make the installation CD bootable. If you look into ISOLINUX.CFG, you will see that it references the file BOOT.CFG, and BOOT.CFG in turn references all the VGZ and TGZ files:
Contents of the BOOT.CFG file
A second copy of the BOOT.CFG file is in the directory \EFI\BOOT. The ESXi 5.0 install ISO (and ESXi 5.0 itself) was built to boot not only with a standard x86 BIOS, but also with newer (U)EFI firmware. Just one thing to remember: if you change one BOOT.CFG, you had better make the same change to the other.
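Keeping the two BOOT.CFG copies in sync is easy to script. The sketch below assumes that the module list is a single `modules=` line whose entries are separated by ` --- ` (check this against your own BOOT.CFG first). The helper names are hypothetical, and `module` must be written in the same style as the existing entries (for example with or without a leading `/`).

```python
from pathlib import Path

SEPARATOR = " --- "  # assumed separator between entries on the modules= line


def add_module(cfg_text: str, module: str) -> str:
    """Return BOOT.CFG text with `module` appended to the modules= line."""
    lines = cfg_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("modules="):
            entries = [e.strip() for e in line[len("modules="):].split(SEPARATOR)]
            if module not in entries:  # adding the same module twice is a no-op
                entries.append(module)
            lines[i] = "modules=" + SEPARATOR.join(entries)
            break
    else:
        raise ValueError("no modules= line found in BOOT.CFG")
    return "\n".join(lines) + "\n"


def patch_both_boot_cfgs(iso_root: Path, module: str) -> None:
    """Apply the same change to the BIOS and the (U)EFI copy of BOOT.CFG."""
    # File name case depends on how the ISO was extracted; adjust if needed.
    for cfg in (iso_root / "BOOT.CFG", iso_root / "EFI" / "BOOT" / "BOOT.CFG"):
        cfg.write_text(add_module(cfg.read_text(), module))
```

For example, `patch_both_boot_cfgs(Path("esxi-iso"), "/oem.tgz")` would patch an ISO extracted to a (hypothetical) directory named `esxi-iso`. Making `add_module` idempotent means running the script twice does not list the same file twice.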

Now let's have a closer look at a driver VGZ package.

3. What's in a driver's vgz-file?

As mentioned before, you need the vmtar tool to look into a VGZ file. Since it is only included in ESXi itself, you need access to an installed copy of ESXi (either 4.1 or 5.0). Luckily you can install ESXi 4.1 (and also 5.0!) inside a VMware Workstation 7 VM.
I did this by creating a VM of type "ESX Server 4" with typical settings except for the size of the virtual disk (2 GB is enough for ESXi), and installed ESXi 5.0 in it. During installation the driver files from the CD root are uncompressed and copied to the directory /tardisks, so this is where you can find them again. After enabling the local shell (luckily still available with 5.0) I logged in and was finally able to look inside such a driver archive and unpack it using the vmtar tool:
Unpacking NET-E100.V00 with the vmtar tool
So there are basically three files in the archive:

1. The driver binary module (without a file name extension, e1000 in this example) that will be unpacked to the well-known location /usr/lib/vmware/vmkmod.

2. A text file that maps PCI device IDs to the included driver:
Contents of /etc/vmware/driver.map.d/e1000.map
3. Another text file that maps PCI IDs to vendor and device descriptive names:
Contents of /usr/share/hwdata/driver.pciids.d/e1000.ids
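Packing these three files into a tgz with the directory layout shown above can be done with Python's standard `tarfile` module. This is a minimal sketch: `build_driver_tgz` is a hypothetical helper, it assumes that the module and the two mapping files are all named after the driver (as in the e1000 example), and it uses default file permissions, which may need adjusting.

```python
import io
import tarfile


def build_driver_tgz(fileobj, driver: str, module: bytes,
                     map_text: str, ids_text: str) -> None:
    """Write a gzip'ed tar with the three driver files to a binary file object."""
    # Member paths mirror the locations shown above (relative, no leading '/').
    members = {
        f"usr/lib/vmware/vmkmod/{driver}": module,
        f"etc/vmware/driver.map.d/{driver}.map": map_text.encode(),
        f"usr/share/hwdata/driver.pciids.d/{driver}.ids": ids_text.encode(),
    }
    with tarfile.open(fileobj=fileobj, mode="w:gz") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
```

To write a file on disk, call it inside `with open("oem.tgz", "wb") as f:` and pass `f` as the first argument. Taking a file object rather than a path keeps the helper easy to test in memory.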

It is good to know that the PCI ID mapping files are now separated by driver. ESXi 4.1 has a single pci.ids file and a single simple.map file for all drivers, which creates the risk of conflicting copies of these files if you merge multiple OEM drivers into the image.

Now it looks easy to add a custom driver to the install CD: just create a tgz file containing the three files mentioned above, copy it to the ISO root directory and add its name to the two BOOT.CFG files. And yes, this does indeed work for the CD boot! The custom driver will be loaded and you will be able to install ESXi ... but the installation routine will not copy the tgz file to the installation target, and when you boot the installed system for the first time it will behave like a regular install without the custom driver.

So, there is more to it...

4. The image database IMGDB.TGZ

There is a file named IMGDB.TGZ in the root directory of the CD that is also listed in the BOOT.CFG files and has the following contents:
Unpacking the IMGDB.TGZ file
It contains files that will be unpacked to the directory /var/db/esximg. For each driver (or other software package) an XML file is created in the vibs subdirectory. There are many more of these files than shown here (I shortened the output with "..."); one example is net-e1000--925314997.xml for the e1000 driver. Let's look into this file:
The contents of net-e1000--925314997.xml
The XML file contains information about the package, including possible dependencies on other packages and a list of all included files. Its file name ("net-e1000--925314997.xml") consists of the name element plus a (probably) unique number with 9 or 10 digits. The list of payloads is the list of included archive files (either of type vgz or tgz); in most cases it's just one. The name of a payload is limited to 8 characters ("net-e100" in this case) and is the name of the corresponding file in the CD's root directory. The extension of this file is expected to be ".v00" if the file is of type vgz and ".t00" if it is of type tgz. If there are name conflicts with other packages, the number in the extension is incremented, e.g. the payload file for the e1000e driver is "net-e100.v01".
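The file naming rule just described can be sketched as a small function. It is a hypothetical helper that follows the observed convention (at most 8 characters of the name, ".v00" for vgz or ".t00" for tgz, number counted up on conflicts); it is not taken from VMware's installer, and it assumes that all names in `taken` are stored in lowercase.

```python
def payload_filename(payload_name: str, payload_type: str, taken: set) -> str:
    """Pick the ISO root file name for a payload and record it in `taken`.

    Hypothetical helper following the rules above: at most 8 characters of
    the name, extension v00 for vgz or t00 for tgz, and the number in the
    extension counted up while the name is already in use.
    """
    letters = {"vgz": "v", "tgz": "t"}
    if payload_type not in letters:
        raise ValueError(f"unsupported payload type: {payload_type!r}")
    stem = payload_name[:8].lower()
    for n in range(100):
        candidate = f"{stem}.{letters[payload_type]}{n:02d}"
        if candidate not in taken:  # `taken` holds lowercase file names
            taken.add(candidate)
            return candidate
    raise RuntimeError(f"no free extension left for {stem!r}")


taken = set()
print(payload_filename("net-e1000", "vgz", taken))   # net-e100.v00
print(payload_filename("net-e1000e", "vgz", taken))  # net-e100.v01
```

The second call reproduces the e1000e example: both names truncate to "net-e100", so the counter moves on to v01.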

Then there is the host image profile XML file in the directory /var/db/esximg/profiles. In our example this is the file ESXi-5.0.0-381646-standard1293795055. Let's look into this one:

... ... ... (many more <vib></vib> entries omitted) ... ... ...
Contents of the host image profile XML file
Here we find a list of all vib packages that make up the currently installed system. Please note that the vib-id of a package strictly corresponds to the element values in the associated vib xml file (see the previous picture). It is composed in the following way:
<vendor>_<type>_<name>_<version>
So, for example, the vib-id element of the net-e1000 driver is
VMware_bootbank_net-e1000_8.0.3.1-2vmw.0.0.383646
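The composition rule can be expressed directly in code. This is a sketch: both function names are hypothetical, and `vib_id_from_xml` assumes that the vendor, type, name and version values are direct child elements of the vib xml file's root element, which you should verify against a real file before relying on it.

```python
import xml.etree.ElementTree as ET


def vib_id(vendor: str, vib_type: str, name: str, version: str) -> str:
    """Compose a vib-id as <vendor>_<type>_<name>_<version>."""
    return "_".join((vendor, vib_type, name, version))


def vib_id_from_xml(xml_text: str) -> str:
    """Build the vib-id from a vib xml file.

    Assumption: vendor, type, name and version are direct child elements of
    the root element; adjust the lookups if your files are structured differently.
    """
    root = ET.fromstring(xml_text)
    parts = []
    for tag in ("vendor", "type", "name", "version"):
        value = root.findtext(tag)
        if value is None:
            raise ValueError(f"vib xml has no <{tag}> element")
        parts.append(value.strip())
    return vib_id(*parts)


print(vib_id("VMware", "bootbank", "net-e1000", "8.0.3.1-2vmw.0.0.383646"))
# VMware_bootbank_net-e1000_8.0.3.1-2vmw.0.0.383646
```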

The payload names that are listed in the image profile file are the same as in the individual vib xml files, with the exception that here the exact file names (e.g. "net-e100.v00") are listed rather than just the file type (vgz or tgz).

Conclusion: If we want to add a custom driver to the install CD, we need to do the following in addition to the steps described in section 3: modify the contents of IMGDB.TGZ by adding a vib xml file for the driver (similar to net-e1000...xml) and by updating the contained image profile file to include the driver as an additional <vib> entry.
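Since IMGDB.TGZ is a standard gzip'ed tar file, it can be rewritten with Python's `tarfile` module. The sketch below (the helper name `rebuild_imgdb` is hypothetical) copies all existing members and adds or replaces the ones passed in `updates`; it deliberately leaves generating the XML content itself to you. The member names are an assumption: list them with `getnames()` first and use exactly the same path prefix for any new vib xml file.

```python
import io
import tarfile


def rebuild_imgdb(src, dst, updates: dict) -> None:
    """Copy a tgz from file object `src` to `dst`, adding/replacing members.

    `updates` maps member names (e.g. the new vib xml file or the modified
    image profile file) to their complete new contents as bytes.
    """
    remaining = dict(updates)
    with tarfile.open(fileobj=src, mode="r:gz") as old:
        with tarfile.open(fileobj=dst, mode="w:gz") as new:
            for info in old.getmembers():
                if info.name in remaining:
                    data = remaining.pop(info.name)
                    info.size = len(data)  # keep owner/mode, swap the content
                    new.addfile(info, io.BytesIO(data))
                elif info.isfile():
                    new.addfile(info, old.extractfile(info))
                else:
                    new.addfile(info)  # directories etc.: header only
            for name, data in remaining.items():  # members that did not exist yet
                info = tarfile.TarInfo(name)
                info.size = len(data)
                new.addfile(info, io.BytesIO(data))
```

Typical use: open the original with `open("IMGDB.TGZ", "rb")`, a new file with `open("IMGDB.new.tgz", "wb")`, and pass both together with a dict holding the new vib xml file and the edited profile file.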

There is another particular XML element in both the vib files and the image profile file that we need to take care of: the <acceptancelevel>. VMware distinguishes four different acceptance levels: VMwareCertified, VMwareAccepted, PartnerSupported and CommunitySupported; in the XML files they are coded as certified, vmware, partner and community. The names are pretty self-explanatory, and one can easily guess that certified is stricter than vmware, which is stricter than partner, which in turn is stricter than community. In other words: if the host image profile is of acceptance level certified, only packages of the same acceptance level can be part of it. If it is of acceptance level vmware, only VMware certified and VMware accepted packages can be installed. If it is of acceptance level partner (and this is the default!), partner supported packages can be installed in addition to that. The least restrictive level is community, which accepts all four types of packages.
My expectation is that custom drivers for whitebox hardware are community supported (unless they are published by a hardware vendor). However, if the driver's vib file contains the acceptance level community, the image profile's acceptance level must also be changed to community. Otherwise the installation of the package will fail.
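The ordering rule fits in a few lines. This is a hedged sketch of the logic as described here, not VMware's implementation: it uses the lowercase codes certified, vmware, partner and community, and the function names are hypothetical.

```python
# Acceptance levels as coded in the XML files, from strictest to least strict.
LEVELS = ["certified", "vmware", "partner", "community"]


def package_allowed(profile_level: str, package_level: str) -> bool:
    """True if a package may be part of a profile with the given level."""
    return LEVELS.index(package_level) <= LEVELS.index(profile_level)


def adjusted_profile_level(profile_level: str, package_levels) -> str:
    """Least strict level needed so that all given packages are accepted."""
    return max([profile_level, *package_levels], key=LEVELS.index)


print(package_allowed("partner", "community"))           # False
print(adjusted_profile_level("partner", ["community"]))  # community
```

Adding a community-supported driver to the default partner profile therefore means lowering the profile to community, while a vmware-level package leaves the profile unchanged.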

5. Can we automate it?

Yes, we can! The latest version of ESXi-Customizer automates all the steps described here to add custom drivers in tgz format to an ESXi 5.0 install ISO. You only need to feed it a tgz file that contains the three files listed in section 3 of this post.

Please note: Packages made for earlier ESXi versions will not work with ESXi 5.0, not only because the directory structure has changed, but also because the earlier versions' driver modules won't be loaded by the new version! And - at the time of this writing - there are probably no oem.tgz-style driver packages available that are compatible with ESXi 5.0!
Hopefully, this will change soon. If you are looking for a driver for a device that does not work out of the box with ESXi 5.0, check the Unofficial Whitebox HCL at vm-help.com.


Saturday, August 6, 2011

vSphere 5: release date rumors and licensing changes

From what I have heard, the originally targeted release date for VMware's vSphere 5 was August 5th. That date has passed without a release, and there are now rumors that it will be released on August 22nd (see source)...
I don't know why it is being delayed. One possible reason is the change in licensing that was announced on August 3rd (see VMware's Power of Partnership Blog). With the unveiling of vSphere 5 on July 12th VMware introduced a new licensing method based on vRAM (the amount of RAM allocated to running VMs), which led to a storm of protest among customers and partners, especially because of the low amount of vRAM per physical CPU that was originally communicated. With the announcement above VMware has doubled this entitlement for most vSphere editions, and they also capped the accountable vRAM for a single VM at 96 GB (even if it has more RAM than that).
This will definitely help to speed up the adoption of vSphere 5 ... once it is released.

Update (2011-08-23): Okay, still nothing... So it will probably happen on Friday (August 26th), just before VMworld 2011 (starting on Monday, August 29th).

Update (2011-08-25): It is out now, the official release date was August 24th. Customers with subscription go here to download. The free ESXi version is available here.


Thursday, July 14, 2011

vSphere 5 licensing - check your environment now to see how it affects you

There has been a lot of ranting about the new licensing model of vSphere 5 (see my previous post), because certain customers (specifically those with a very high RAM per CPU ratio, which is more and more common with recent server hardware) will need to buy more vSphere 5 licenses to cover their vRAM usage than they had vSphere 4 licenses before.

Before you start complaining yourself, check your environment now to find out how it will affect you. There are a number of PowerCLI scripts available for doing this. I personally like LucD's the most; get it here: http://www.lucd.info/2011/07/13/query-vram/.

For my production environment it outputs the following:

  vCenter        : [MyVC-FQDN]
  vRAMConfigured : 2732.2
  vRAMUsed       : 2624.8
  vRAMEntitled   : 6000
  LicenseType    : vSphere 4 Enterprise Plus

Note that the used vRAM is lower than the configured vRAM, because it only takes into account the total RAM of all running VMs (and I also have some that are powered off).
The current version of the script also counts only the assigned licenses. However, if you have spare licenses that are currently unassigned, they will also add to your vRAM entitlement once they are upgraded to vSphere 5 (I have asked Luc to fix that, so maybe there will be a new version of his script soon).
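For a rough back-of-the-envelope check, the three values above can also be computed from a plain list of VMs. This is a simplified sketch, not LucD's script: the `VM` record and the `vram_summary` function are hypothetical, the per-license entitlement is a parameter (the initially announced value for Enterprise Plus was 48 GB, and VMware later changed the entitlements), `cpu_licenses` should include any unassigned spare licenses, and the optional per-VM cap models the 96 GB limit announced in August 2011.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class VM:
    name: str
    memory_gb: float
    powered_on: bool


def vram_summary(vms: Iterable[VM], cpu_licenses: int, gb_per_license: float,
                 per_vm_cap_gb: Optional[float] = None) -> dict:
    """Pooled vRAM figures for all VMs managed by one vCenter (simplified)."""
    vms = list(vms)
    configured = sum(vm.memory_gb for vm in vms)
    # Only powered-on VMs count towards used vRAM, optionally capped per VM.
    used = sum(
        vm.memory_gb if per_vm_cap_gb is None else min(vm.memory_gb, per_vm_cap_gb)
        for vm in vms
        if vm.powered_on
    )
    entitled = cpu_licenses * gb_per_license
    # Additional CPU licenses needed just to cover the used vRAM.
    extra = max(0, math.ceil(used / gb_per_license) - cpu_licenses)
    return {"configured": configured, "used": used,
            "entitled": entitled, "extra_licenses": extra}


vms = [VM("web", 8, True), VM("test", 16, False), VM("db", 128, True)]
print(vram_summary(vms, cpu_licenses=2, gb_per_license=48, per_vm_cap_gb=96))
# {'configured': 152, 'used': 104, 'entitled': 96, 'extra_licenses': 1}
```

Because entitlements are pooled per vCenter, the VM list should cover every host managed by that vCenter instance rather than a single host.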

Anyway, as you can see, I am lucky with the new licensing model and would still have plenty of unused vRAM in my pool if I upgraded today.

Update: There is now an even better script available by Virtu-Al: It can also handle ESX versions earlier than 4.1, looks for unassigned licenses and has a nice HTML output:
  http://www.virtu-al.net/2011/07/14/vsphere-5-license-entitlements/
It is referenced in an official VMware Blog post that tries to better explain the new licensing model and the motivation behind it.

Tuesday, July 12, 2011

VMware raised the bar - Announcement of vSphere 5 and other new products

Today VMware made some major announcements of new products and product versions that are planned to be available in Q3 2011 (see original press release):

vSphere 5 includes the following improvements and new features (compared to vSphere 4.1):
  • Improved VM scalability (up to 32 vCPUs and 1 TB RAM) and performance (3x to 4x I/O improvements)
  • New and improved HA architecture (easier to set up and more scalable)
  • Auto Deploy: on-the-fly deployment of ESXi hosts through PXE boot
  • Profile-driven storage: allows you to define classes of storage (distinguished e.g. by performance and availability) and tie VMs to them by defining "storage policies"
  • Storage DRS: Automatic initial storage placement and balancing of VMs
  • vSphere 5 hosts are ESXi only. No more classic ESX (as previously announced)
  • Change in licensing model: vSphere 5 will still be licensed per physical CPU socket, but introduces another component: vRAM, which is the amount of RAM configured for VMs. Each CPU license entitles you to use a specific amount of vRAM (depending on the vSphere edition, e.g. 48 GB for Enterprise Plus). vRAM entitlements can be pooled among all hosts managed by a vCenter instance. For details see the new Licensing Whitepaper.
vCenter Site Recovery Manager (SRM) 5 introduces "vSphere Replication" (a.k.a. host-based mirroring) and a new "automatic failback" feature. For details see VMware's official product page.

The new vSphere Storage Appliance turns the local hard disks in your ESXi hosts into mirrored, highly available NFS datastores. This way you can use VMotion, DRS and HA without the need for additional shared storage hardware. See the product overview and the technical whitepaper.

vShield 5 introduces new sensitive data discovery and intrusion detection capabilities.

vCloud Director 1.5 now supports fast provisioning with linked clones (a feature that was already available with the Lab Manager product, which is now superseded by vCloud Director) and supports Microsoft SQL Server as its database.

Sunday, July 3, 2011

Raising the bar, Part V - vSphere 5 is near

If you look at VMware's homepage these days you will notice an announcement of a live event on July 12th. It is titled "Raising the bar, Part V".
You don't need to be a visionary to figure out that this can only mean that VMware will announce the long-awaited new major release of its virtualization platform: vSphere 5.

This does not necessarily mean that vSphere 5 will become generally available on July 12th. However, once it is available I will post a list of at least its most important new features. So, stay tuned!