Showing posts with label BL620c.

Thursday, December 15, 2011

HP Virtual Connect firmware update - can you do this online?

I don't know the answer to this question, but I'm trying to find this out ...

We have two HP c7000 enclosures with Virtual Connect FlexFabric modules to connect to external Cisco Ethernet switches and Brocade FC switches. Both enclosures are fully loaded with 8x BL620c G7 blade servers running ESXi 4.1 Update 2.
Right now we are still able to completely evacuate an enclosure when we want to do maintenance (mainly firmware upgrades) on it, because we have stretched two clusters across both enclosures and each cluster uses no more than 50% of its capacity.

However, given our current VM growth rate we will soon reach a point where this is no longer possible (without purchasing and deploying a third enclosure). So I'm currently testing and looking for ways to do an online Virtual Connect firmware upgrade without interrupting network and SAN connectivity. With all the redundancy in the enclosure this should be possible, and an HP engineer I recently talked to confirmed that it is indeed possible using HP's Virtual Connect Support Utility (VCSU); he pointed me to its manual for instructions.
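For reference, the VCSU call for such an update should look roughly like this - a sketch only, written from memory: the exact option names and the "odd-even" activation order (which is supposed to take down only one module of each redundant pair at a time) need to be verified against the VCSU manual for your version, and address and credentials are placeholders.

# update the VC modules of the enclosure through its Onboard Administrator
vcsu -a update -i <OA-IP-address> -u <OA-username> -p <OA-password> -l <path-to-VC-firmware-package> -oe odd-even -of odd-even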

I remember that I already tried this method a while ago. I no longer know which firmware and tool versions I used for that test, but it was not very successful: although I followed the instructions, I saw ping timeouts of up to 15 seconds during the upgrade process (I was pinging the hosts' VMkernel addresses).
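If you want to reproduce such a test yourself, a simple way to measure the outage is to log pings with timestamps from any Linux box - a minimal sketch, with the address being a placeholder for a host's VMkernel IP:

# prefix every ping reply (or error) with a timestamp to see how long the gap was
ping 192.168.10.21 | while read line; do echo "$(date '+%H:%M:%S') $line"; done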

I just started a thread in the VMTN forums to get some input from others. Has anyone done this successfully? Is there anything non-obvious to check or configure before trying this? Please share your experience by posting to the VMTN thread or leaving a comment here. Thanks!

Once I have found a working method I will of course update this post!

Update (2011-12-21): I found it ... Read about it in my next post!

Wednesday, October 12, 2011

HP Virtual Connect profile not applied ...

When I recently rebooted one of our BL620c G7 blades running ESXi 4.1, I found that the server had lost network connectivity after the reboot.
A quick check on the console revealed that the Virtual Connect profile defined for that blade had not been applied: the MAC addresses of the NICs had not been overwritten with the virtual addresses from the Virtual Connect profile.
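If you want to check this quickly on your own hosts: from the ESXi Tech Support Mode console (assuming you have it enabled), the following command lists all vmnics with their current MAC addresses, which you can compare against the addresses defined in the server's Virtual Connect profile:

# list all vmnics with driver, link state and MAC address
esxcfg-nics -l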

I tried powering the blade off and on again and re-assigning the Virtual Connect profile multiple times, all to no avail ... Then I had the idea that it might be related to the blade's iLO board, and - yes, indeed - after resetting the iLO3 board of the blade the Virtual Connect profile was properly applied and all was fine again.
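By the way, you do not need physical access to reset the iLO. Assuming SSH access to the iLO3 is enabled, something like the following should do it (SMASH CLP syntax, which may differ slightly between iLO firmware versions - check the iLO3 CLI guide; host name and user name are placeholders):

ssh Administrator@ilo-blade01.example.local
# at the iLO command line: reset the management processor itself
cd /map1
reset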

While later searching hp.com for related information I stumbled upon Customer Advisory c02820591, which describes an issue with the Virtual Connect profile being lost upon an iLO3 reset. That is not exactly the issue I had, and the advisory also states that the problem is fixed with iLO3 firmware version 1.20, which is already installed on our iLOs. However, the advisory confirmed my assumption that the Virtual Connect profile is applied by the iLO board.

So, if you have similar problems try resetting the iLO-board before you start pulling your hair out, or the blade out of the chassis ...

Saturday, June 18, 2011

How to hide unused FlexNICs

When I configured an HP Blade Enclosure with Virtual Connect modules for the first time I stumbled upon an issue that has probably bothered most people doing this, especially if they run ESX(i) on the blade servers:

The BL620c G7 blade servers we are using have four built-in 10Gbit ports, and each of them can be partitioned into up to four so-called FlexNICs (or FlexHBAs for FCoE if you use them together with FlexFabric Virtual Connect modules like we do). The overall 10Gbit bandwidth of a port is split among its FlexNICs in a configurable way: you could, for example, have four FlexNICs with 2.5Gbit each, two with 6Gbit and 4Gbit, or any combination of one to four FlexNICs whose bandwidth adds up to 10Gbit.
For the OS (e.g. VMware ESXi) installed on the blade server, each FlexNIC appears as a separate PCI device. So an ESX(i) host installed on a BL620c G7 can have up to 16 NICs. Cool, eh?

However, we did not really want to use too much of that feature and divided each of the first two 10Gbit ports into a 4Gbit FlexHBA and a 6Gbit FlexNIC. The third and fourth ports we even configured as single 10Gbit FlexNICs.

Now, the problem is that every 10Gbit port shows up as four PCI devices even if you have configured fewer than four FlexNICs for it. Even if you have not partitioned it at all, but use it as a single 10Gbit NIC, it will show up as four NICs, with the unconfigured ones displayed as disconnected!
In our case we ended up with ESXi seeing (and complaining about) 10 disconnected NICs. Since we monitor the blades with HP Insight Manager, it also constantly warned us about the disconnected NICs.

So, we thought about a method to get rid of the unused FlexNICs. If we had Windows running directly on the blades this would have been easy: We would just disable the devices and Windows (and also HP Insight Manager) would not be bothered by them. However, in ESX(i) you cannot just disable a device ... but you can configure it for "VMDirectPath":

[Screenshot: PCI Passthrough configuration of a BL620c G7]
This dialog can be found in the Advanced Hardware Settings of a host's configuration. What does it do?
With VMDirectPath you can make a host's PCI device available to a single VM. It will be passed through to the VM, and the guest OS will then be able to see and use that device in addition to its virtual devices.
This way it is possible to present a physical device to a VM that you normally would not be able to add.

In the dialog shown above you configure which devices are available for VMDirectPath (also called PCI Passthrough). You can then add all the selected devices to the hardware of individual VMs.
We really did not want to do the latter ... but there is one desirable side effect of this configuration: a device that is configured for VMDirectPath becomes invisible to the VMkernel. And this is exactly what we wanted to achieve for the unused FlexNICs!

So we configured all unused FlexNICs for VMDirectPath, and they were no longer displayed as (disconnected) vmnics. If you want to do the same, you need to know which PCI device a vmnic corresponds to. In the screenshot I posted you will notice that the vmnic name is displayed in brackets for some of the PCI devices, but not for all of them. So it can be hard to figure out which devices need to be selected, but it's worth it!
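One thing that makes the mapping easier: from the Tech Support Mode console, a command like the following should print the PCI address next to the vmkernel name of every NIC, which you can then match against the entries in the passthrough dialog (vmkchdev should be available on ESXi 4.x, but treat this as a hint and verify it on your own hosts):

# list PCI devices together with the vmkernel device name they map to
vmkchdev -l | grep vmnic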

Thursday, May 26, 2011

Updated be2net driver fixes issues with G7 blades

When we started to deploy our HP ProLiant BL620c G7 blade servers we stumbled upon some issues with the driver (be2net) for the built-in FlexNIC adapters. They are documented in the VMware KB:
We followed the recommendations in these articles and updated the be2net driver to version 2.102.554.0. However, we still experienced hangs of the ESXi hosts and network outages whenever a host was rebooted or had its dvS connections reconfigured.
These hangs were accompanied by VMkernel.log messages like this one:

... vmkernel: 10:06:11:06.193 cpu0:4153)WARNING: CpuSched: 939: world 4153(helper11-0) did not yield PCPU 0 for 2993 msec, refCharge=5975 msec, coreCharge=6374 msec,

After opening a support call with VMware we finally found out that these problems were caused by improper handling of VLAN hardware offloading in the be2net driver, and that they only occur when you are using distributed virtual switches (dvS) like we did.
So, after reconfiguring the blade hosts with virtual standard switches (vSS) the problem went away.

Since then we had been waiting for a fixed be2net driver (from Emulex) so that we could return to the dvS. We really did not want to abandon this option, because it offers some benefits over the standard switch (load-based teaming of the physical uplinks and Network I/O Control).

Today the waiting finally ended. Emulex has finished the fixed driver, and it is available here:
VMware ESX/ESXi 4.x Driver CD for Emulex OneConnect 10Gb Ethernet Controller
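If you install the driver with the vSphere CLI instead of Update Manager, the procedure should look roughly like this (put the host into maintenance mode first; host name and bundle file name are placeholders, and the exact vihostupdate options are worth double-checking against the readme on the driver CD):

# install the offline bundle from a vSphere CLI installation, then reboot the host
vihostupdate.pl --server esxi-host01 --username root --install --bundle offline-bundle.zip
# after the reboot, verify the loaded driver version from the console
ethtool -i vmnic0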

Update (18. Jul 2011): In the meantime VMware has made two new KB articles available that reference the problems described here and the new driver:
In the latter one it is also recommended to update the NIC's firmware. The current version (as of today) is available from HP as a bootable ISO file. Thanks to makö for pointing this out in this post's comments.