Friday, December 11, 2009

Microsoft's Hyper-V R2 vs. VMware's vSphere: A cost comparison

Caveats

Without actual, real-world workloads and operating systems, I decided to take a simplistic approach to this comparison by keeping things relatively streamlined. For example, you'll notice that: in both the VMware and Microsoft cases, the total workload is 150 virtual machines; every virtual machine is running Windows Server 2008 Standard; and every host is identically configured with dual processors.

My goal was to develop a reasonable, consistent baseline and then just work the numbers to see how things would fall out. Further, all of the pricing you see is list; no volume discounts are applied because that information varies too much between verticals and between licensing programs.

These calculations do not include hardware costs; this is a software-only affair. This comparison does not attempt to include maintenance costs, either, as these vary by support level. The Microsoft pricing does include some software maintenance, but only because it's included in the list price. Note that only the Data Center (DC) edition of Windows is compared against the various editions of vSphere. To use any other Windows version in an enterprise virtualization platform would simply not make sense and would drive pricing through the roof due to Microsoft's general DC-only virtualization policies.

In addition, when it comes to server virtualization, my organization (Westminster College) is a VMware vSphere/ESX shop, but I am also keeping my eye on Hyper-V. I stayed as unbiased as possible in this cost comparison.

Cost comparison

Table A shows you this comparison, along with two costing options. Below the table, I provide more detail.

Table A



License model. The method by which the product is licensed. This is either per host or per processor.

List price. This is the published list price for each component based on the license model. For example, for vSphere, the list price is per processor, as indicated in the row above.

VMs/host. How many virtual machines will run on each physical host? Originally, I gave an advantage to vSphere due to the product's memory overcommitment capabilities, among other things. I feel that VMware's features give it a significant capacity edge over Hyper-V. However, I decided to keep the comparison as streamlined and consistent as possible. In order to do so, I needed to keep the capacity information consistent. You will note that pricing Option 2 shows what could happen when you take into account vSphere's increased VM density.

Total VMs. How many total virtual machines will be hosted by the infrastructure?

Req'd hosts. Based on the inputs from the VMs/host and Total VMs rows, how many hosts are needed to support this environment?

Host processors. How many processors are installed in each physical host? Since some products are licensed on a per-processor/per-socket basis, this is important.

VM OS. What operating system will run inside each virtual machine in the environment? For the sake of comparison, I indicate that every VM will run Windows Server Standard. In the real world, this is highly unlikely, but it does provide a good baseline software cost comparison.

Management. What management tool or tools will be used to manage the virtual infrastructure? For vSphere, vCenter is your choice. For Microsoft, there are various options, but at the density levels in the table, I priced Microsoft's Server Management Suite Data Center, which is pretty expensive and licensed per processor. There are other management options for Hyper-V, so don't take what you see here as the final word. Consider the issue of management software carefully and buy only what you need. Note that this suite includes only CALs for individual management tools, such as Operations Manager; you still need to separately license the Operations Manager server. The same holds true if you decide to use Configuration Manager and/or Data Protection Manager. Those CALs are also included in the suite, but the server license is a separate deal.

Management license model. As is the case with the hypervisor, the management software is licensed either per host or per processor.

Management list price. What is the list price for the management software based on the license model?

Windows Server. How much does Windows Server Standard cost?

Hypervisor Costs. What is the total cost of the hypervisor software based on the inputs?

Windows Server Costs. At first glance, how much do the individual Windows Server Standard virtual instances cost? There is a $0 amount in the Hyper-V column because Windows Data Center includes unlimited virtual instances of Windows Server, any edition. If you were to run this infrastructure on Hyper-V on Windows Server Enterprise, you'd pay a whole lot for those virtual instances since only four are included with an Enterprise license.

Management Costs. How much does the management software cost? From everything I've read, the Microsoft management product is licensed per physical host, thus making the management costs go through the roof.

Total. What is the total cost of the solution?

Option 1
This is an extremely important and cost saving option… and it's legal. Several years ago, Microsoft made the following statement regarding the unlimited virtualized instances allowance in the Windows Server Data Center license:

"Licensing does not depend on which virtualization technology is used. With all processors in a server licensed for Windows Server 2003 R2, Datacenter Edition, you can run one instance of the software in a physical operating system environment and an unlimited number of instances in virtual operating system environments. With VMWare GSX Server or SWsoft Virtuozzo, this means you can run one physical instance plus unlimited virtual instances. With VMWare ESX Server, it means you can run unlimited virtual instances because there is no need for a physical instance."

This means that you can run any hypervisor you want and, as long as you also buy a Windows Server DC license to apply to that physical host, you can run as many Windows Server instances as you like on that server. In effect, you're buying software you'll never use (Windows Server DC), but if you do the math, you can see that having the ability to run as many copies of Windows as you like on a virtual host can quickly add up to a lot of savings. For a dual-processor server, a DC license would cost about $6,000. It would take only two virtual Enterprise instances or six virtual Standard instances to hit the breakeven point. Since this example assumes 10 VMs per host, there are a lot of savings.
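To make the breakeven arithmetic concrete, here is a minimal Python sketch. The DC price of $2,999 per processor is the figure used in this article. The per-instance prices for Standard ($1,029) and Enterprise ($3,999) are assumed values for illustration only and do not come from the table, so substitute your own list prices. The function name is hypothetical.

```python
import math

# Price from the article: Windows Server DC, per processor.
DC_PRICE_PER_PROC = 2999
# ASSUMED per-instance list prices, for illustration only.
STANDARD_PRICE = 1029
ENTERPRISE_PRICE = 3999


def breakeven_instances(dc_price_per_proc, processors, instance_price):
    """Smallest number of individually licensed instances whose combined
    cost meets or exceeds the DC license cost for the whole host."""
    dc_cost = dc_price_per_proc * processors
    return math.ceil(dc_cost / instance_price)


if __name__ == "__main__":
    print("Standard instances to break even:",
          breakeven_instances(DC_PRICE_PER_PROC, 2, STANDARD_PRICE))
    print("Enterprise instances to break even:",
          breakeven_instances(DC_PRICE_PER_PROC, 2, ENTERPRISE_PRICE))
```

With these assumed prices, the sketch lands on six Standard or two Enterprise instances for a dual-processor host, which is in line with the breakeven point described above.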

Option 2
This option shows the possible effect of vSphere's increased VM density.

Final notes

Again, the values I placed in the table are only for comparison. I'd expect to see higher densities on VMware hosts, which quickly brings down the cost. As you eliminate VMware hosts due to density increases, you get just about $7,000 back per server ($3,495 x 2 processors), and you also save on the associated Windows DC license (about $6,000, or $2,999 x 2 processors). For example, suppose you increase the density to 15 VMs per server under vSphere Enterprise Plus. In this case, your total cost would drop from $199,815 to $134,875, which isn't too far off the Hyper-V cost. If you need fewer features, you can license vSphere Standard or Advanced and actually save money on the VMware solution.
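If you'd rather work the density numbers in code than in a spreadsheet, the following Python sketch reproduces the two totals quoted above. The per-processor prices ($3,495 for vSphere Enterprise Plus and $2,999 for Windows DC) come from this article. The flat $4,995 management amount is an assumption: it is simply the remainder that makes both quoted totals reconcile, so treat it as a placeholder for whatever your management line item actually is. Function and parameter names are hypothetical.

```python
import math


def required_hosts(total_vms, vms_per_host):
    """Hosts needed to carry the workload, rounding up to whole hosts."""
    return math.ceil(total_vms / vms_per_host)


def vsphere_total(total_vms, vms_per_host, procs_per_host=2,
                  hypervisor_per_proc=3495,   # from the article
                  windows_dc_per_proc=2999,   # from the article
                  management_flat=4995):      # ASSUMED flat management cost
    """Software-only list cost for the vSphere plus Windows DC option."""
    licensed_procs = required_hosts(total_vms, vms_per_host) * procs_per_host
    hypervisor = licensed_procs * hypervisor_per_proc
    windows = licensed_procs * windows_dc_per_proc
    return hypervisor + windows + management_flat


if __name__ == "__main__":
    for density in (10, 15):
        print(density, "VMs/host:",
              required_hosts(150, density), "hosts,",
              "total $", vsphere_total(150, density))
```

Under those assumptions, 150 VMs at 10 VMs per host needs 15 hosts and totals $199,815, while 15 VMs per host needs 10 hosts and totals $134,875.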

If you want to play around with the numbers in this cost comparison, the table is available as a downloadable Excel spreadsheet.

Friday, December 4, 2009

VMware Esxcfg command in detail

Esxcfg-firewall

Description: Configures the service console firewall ports
Syntax: esxcfg-firewall <options>

Options:

-q

Lists current settings

-q <service>

Lists settings for the specified service

-q incoming|outgoing

Lists settings for non-required incoming/outgoing ports

-s

Lists known services

-l

Loads current settings

-r

Resets all options to defaults

-e <service>

Allows specified service through the firewall (enables)

-d <service>

Blocks specified service (disables)

-o <port, tcp|udp,in|out,name>

Opens a port

-c <port, tcp|udp,in|out>

Closes a port previously opened by -o

-h

Displays command help

--allowIncoming

Allow all incoming ports

--allowOutgoing

Allow all outgoing ports

--blockIncoming

Block all non-required incoming ports (default value)

--blockOutgoing

Block all non-required outgoing ports (default value)


Default Services:

AAMClient

Added by the vpxa RPM: Traffic between ESX Server hosts for VMware High Availability (HA) and EMC Autostart Manager – inbound and outbound TCP and UDP Ports 2050 – 5000 and 8042 – 8045

activeDirectorKerberos

Active Directory Kerberos - outbound TCP Ports 88 and 464

CIMHttpServer

First-party optional service: CIM HTTP Server - inbound TCP Port 5988

CIMHttpsServer

First-party optional service: CIM HTTPS Server - inbound TCP Port 5989

CIMSLP

First-party optional service: CIM SLP - inbound and outbound TCP and UDP Ports 427

commvaultDynamic

Backup agent: Commvault dynamic – inbound and outbound TCP Ports 8600 – 8619

commvaultStatic

Backup agent: Commvault static – inbound and outbound TCP Ports 8400 – 8403

ftpClient

FTP client - outbound TCP Port 21

ftpServer

FTP server - inbound TCP Port 21

kerberos

Kerberos - outbound TCP Ports 88 and 749

LicenseClient

FlexLM license server client - outbound TCP Ports 27000 and 27010

nfsClient

NFS client - outbound TCP and UDP Ports 111 and 2049 (0 – 65535)

nisClient

NIS client - outbound TCP and UDP Ports 111 (0 – 65535)

ntpClient

NTP client - outbound UDP Port 123

smbClient

SMB client - outbound TCP Ports 137 – 139 and 445

snmpd

SNMP services - inbound UDP Port 161 and outbound UDP Port 162

sshClient

SSH client - outbound TCP Port 22

sshServer

SSH server - inbound TCP Port 22

swISCSIClient

First-party optional service: Software iSCSI client - outbound TCP Port 3260

telnetClient

Telnet client - outbound TCP Port 23

TSM

Backup agent: IBM Tivoli Storage Manager – inbound and outbound TCP Ports 1500

veritasBackupExec

Backup agent: Veritas BackupExec – inbound TCP Ports 10000 – 10200

veritasNetBackup

Backup agent: Veritas NetBackup – inbound TCP Ports 13720, 13732, 13734, and 13783

vncServer

VNC server - Allow VNC sessions 0-64: inbound TCP Ports 5900 – 5964

vpxHeartbeats

vpx heartbeats - outbound UDP Port 902


Note: You can configure your own services in the file /etc/vmware/firewall/services.xml

esxcfg-firewall examples:
Enable ssh client connections from the Service Console:
# esxcfg-firewall -e sshClient
Disable the Samba client connections:
# esxcfg-firewall -d smbClient
Allow syslog outgoing traffic:
# esxcfg-firewall -o 514,udp,out,syslog
Turn off the firewall:
# esxcfg-firewall --allowIncoming
# esxcfg-firewall --allowOutgoing
Re-enable the firewall:
# esxcfg-firewall --blockIncoming
# esxcfg-firewall --blockOutgoing
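The -o and -c options take a comma-separated argument, and it is easy to get the order wrong. As a rough sketch, the Python helpers below assemble and sanity-check those command lines on an admin workstation before you paste them into the service console. The helper names and the validation rules (for example, rejecting spaces in the name) are my own assumptions, not part of esxcfg-firewall itself; the script only builds strings and does not run anything.

```python
def _check(port, protocol, direction):
    """Validate the shared parts of an -o / -c argument."""
    port = int(port)
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    if protocol not in ("tcp", "udp"):
        raise ValueError("protocol must be 'tcp' or 'udp'")
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    return port


def open_port_command(port, protocol, direction, name):
    """Build 'esxcfg-firewall -o <port,tcp|udp,in|out,name>'."""
    port = _check(port, protocol, direction)
    # Helper-level rule (an assumption): keep the name free of separators.
    if not name or "," in name or " " in name:
        raise ValueError("name must be non-empty, without commas or spaces")
    return "esxcfg-firewall -o {},{},{},{}".format(port, protocol, direction, name)


def close_port_command(port, protocol, direction):
    """Build 'esxcfg-firewall -c <port,tcp|udp,in|out>'."""
    port = _check(port, protocol, direction)
    return "esxcfg-firewall -c {},{},{}".format(port, protocol, direction)


if __name__ == "__main__":
    print(open_port_command(514, "udp", "out", "syslog"))
    print(close_port_command(514, "udp", "out"))
```

The first printed line matches the syslog example above; the second is the corresponding command to close that port again.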


Esxcfg-nics

Description: Prints a list of physical network adapters along with information on the driver, PCI device, and link state of each NIC. You can also use this command to control a physical network adapter's speed and duplexing.
Syntax: esxcfg-nics <options> [nic]

Options:

-s <speed>

Set the speed of this NIC to one of 10/100/1000/10000. Requires a NIC parameter.

-d <duplex>

Set the duplex of this NIC to one of 'full' or 'half'. Requires a NIC parameter.

-a

Set speed and duplex automatically. Requires a NIC parameter.

-l

Print the list of NICs and their settings.

-r

Restore the NICs configured speed/duplex settings. (Internal use only)

-h

Displays command help


esxcfg-nics examples:
Set the speed and duplex of a NIC (vmnic2) to 100/Full:
esxcfg-nics -s 100 -d full vmnic2
Set the speed and duplex of a NIC (vmnic2) to auto-negotiate:
esxcfg-nics -a vmnic2


Esxcfg-vswitch

Description: Creates and updates virtual machine (vswitch) network settings
Syntax: esxcfg-vswitch <options> [vswitch[:ports]]

Options:

-a

Add a new virtual switch.

-d

Delete the virtual switch.

-l

List all the virtual switches.

-L <pnic>

Set pnic as an uplink for the vswitch.

-U <pnic>

Remove pnic from the uplinks for the vswitch.

-p <portgroup>

Specify a portgroup for operation. Use ALL for operation to work on all portgroups

-v <vlan id>

Set VLAN ID for portgroup specified by -p. 0 would disable the VLAN.

-c

Check to see if a virtual switch exists. Program outputs a 1 if it exists, 0 otherwise.

-A <name>

Add a new portgroup to the virtual switch.

-D <name>

Delete the portgroup from the virtual switch.

-C <name>

Check to see if a portgroup exists. Program outputs a 1 if it exists, 0 otherwise.

-r

Restore all virtual switches from the configuration file (Internal use only)

-h

Displays command help



esxcfg-vswitch examples:

Add a pnic (vmnic2) to a vswitch (vSwitch1):
esxcfg-vswitch -L vmnic2 vSwitch1
Remove a pnic (vmnic3) from a vswitch (vSwitch0):
esxcfg-vswitch -U vmnic3 vSwitch0
Create a portgroup (VM Network 3) on a vswitch (vSwitch1):
esxcfg-vswitch -A "VM Network 3" vSwitch1
Assign a VLAN ID (3) to a portgroup (VM Network 3) on a vswitch (vSwitch1):
esxcfg-vswitch -v 3 -p "VM Network 3" vSwitch1
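The individual examples above are usually run as a sequence when you build a new vswitch from scratch. The Python sketch below strings together the -a, -L, -A and -v/-p options from the table into an ordered list of commands. It only generates text for review; the function name and the 0 to 4095 VLAN range check are assumptions of this sketch.

```python
import shlex


def vswitch_setup_commands(vswitch, uplink, portgroup, vlan_id=0):
    """Return the esxcfg-vswitch commands to create a vswitch, attach an
    uplink, add a portgroup and tag it with a VLAN ID (0 disables the VLAN)."""
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN ID must be between 0 and 4095")
    pg = shlex.quote(portgroup)  # quote names that contain spaces
    return [
        "esxcfg-vswitch -a {}".format(vswitch),
        "esxcfg-vswitch -L {} {}".format(uplink, vswitch),
        "esxcfg-vswitch -A {} {}".format(pg, vswitch),
        "esxcfg-vswitch -v {} -p {} {}".format(vlan_id, pg, vswitch),
    ]


if __name__ == "__main__":
    for command in vswitch_setup_commands("vSwitch1", "vmnic2", "VM Network 3", 3):
        print(command)
```

Printing the list first and running the commands by hand keeps a typo in a portgroup name from turning into a half-configured vswitch.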


Esxcfg-vswif

Description: Creates and updates service console network settings. This command is used if you cannot manage the ESX Server host through the VI Client because of network configuration issues.
Syntax: esxcfg-vswif <options> [vswif]

Options:

-a

Add vswif, requires IP parameters. Automatically enables interface.

-d

Delete vswif.

-l

List configured vswifs.

-e

Enable this vswif interface.

-s

Disable this vswif interface.

-p

Set the portgroup name of the vswif.

-i <x.x.x.x> or DHCP

The IP address for this vswif or specify DHCP to use DHCP for this address.

-n <x.x.x.x>

The IP netmask for this vswif.

-b <x.x.x.x>

The IP broadcast address for this vswif. (not required if netmask and ip are set)

-c

Check to see if a virtual NIC exists. Program outputs a 1 if the given vswif exists, 0 otherwise.

-D

Disable all vswif interfaces. (WARNING: This may result in a loss of network connectivity to the Service Console)

-E

Enable all vswif interfaces and bring them up.

-r

Restore all vswifs from the configuration file. (Internal use only)

-h

Displays command help.


Note: You can set the Service Console default gateway by editing the /etc/sysconfig/network file or through the VI Client under Configuration, DNS & Routing.

esxcfg-vswif examples:
Change your Service Console (vswif0) IP and Subnet Mask:
esxcfg-vswif -i 172.20.20.5 -n 255.255.255.0 vswif0
Add a Service Console (vswif0):
esxcfg-vswif -a vswif0 -p "Service Console" -i 172.20.20.40 -n 255.255.255.0
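The -b option is not required when the IP address and netmask are set because the broadcast address follows directly from those two values. As a quick illustration, this Python sketch derives it with the standard ipaddress module; the function name is hypothetical and the addresses are the ones used in the examples above.

```python
import ipaddress


def vswif_settings(ip, netmask):
    """Derive the network and broadcast addresses from an IP and netmask."""
    iface = ipaddress.ip_interface("{}/{}".format(ip, netmask))
    return {
        "ip": str(iface.ip),
        "netmask": str(iface.netmask),
        "network": str(iface.network.network_address),
        "broadcast": str(iface.network.broadcast_address),
    }


if __name__ == "__main__":
    for key, value in vswif_settings("172.20.20.5", "255.255.255.0").items():
        print(key, "=", value)
```

For 172.20.20.5 with a 255.255.255.0 netmask, the derived broadcast address is 172.20.20.255.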


Esxcfg-route

Description: Sets or retrieves the default VMkernel gateway route
Syntax: esxcfg-route <options> [<network> [<netmask>] <gateway>]
<network> can be specified in 2 ways: as a single argument in <network>/<mask> format or as a <network> <netmask> pair.
<gateway> is either an IP address or 'default'

Options:

-a

Add route to the VMkernel, requires network address (or 'default') and gateway IP address.

-d

Delete route from the VMkernel, requires network address (or 'default').

-l

List configured routes for the Service Console.

-r

Restore route setting to configured values on system start. (Internal use only)

-h

Displays command help


esxcfg-route examples:

Set the VMkernel default gateway route:
esxcfg-route 172.20.20.1
Add a route to the VMkernel:
esxcfg-route -a default 255.255.255.0 172.20.20.1
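Since the network can be written either as <network>/<mask> or as a <network> <netmask> pair, it helps to be able to convert between the two forms. Here is a small Python sketch using the standard ipaddress module. The function names and the 192.168.10.0 network are made up for illustration, and the command string is only assembled, not executed.

```python
import ipaddress


def to_cidr(network, netmask):
    """'192.168.10.0', '255.255.255.0' -> '192.168.10.0/24'."""
    return str(ipaddress.ip_network("{}/{}".format(network, netmask)))


def to_pair(cidr):
    """'192.168.10.0/24' -> ('192.168.10.0', '255.255.255.0')."""
    net = ipaddress.ip_network(cidr)
    return str(net.network_address), str(net.netmask)


def add_route_command(cidr, gateway):
    """Build an 'esxcfg-route -a' command in the single-argument form."""
    ipaddress.ip_address(gateway)  # raises ValueError on a malformed gateway
    return "esxcfg-route -a {} {}".format(to_cidr(*to_pair(cidr)), gateway)


if __name__ == "__main__":
    print(to_cidr("192.168.10.0", "255.255.255.0"))
    print(to_pair("192.168.10.0/24"))
    print(add_route_command("192.168.10.0/24", "172.20.20.1"))
```

Note that ip_network rejects a network address with host bits set (for example 192.168.10.5/24), which catches a common typo before it reaches the host.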


Esxcfg-vmknic

Description: Creates and updates VMkernel TCP/IP settings for VMotion, NAS, and iSCSI
Syntax: esxcfg-vmknic <options> [[portgroup]]

Options:

-a

Add a VMkernel NIC to the system, requires IP parameters and portgroup name.

-d

Delete VMkernel NIC on given portgroup.

-e

Enable the given NIC if disabled.

-D

Disable the given NIC if enabled.

-l

List VMkernel NICs.

-i <x.x.x.x>

The IP address for this VMkernel NIC. Setting an IP address requires that the -n option be given in same command.

-n <x.x.x.x>

The IP netmask for this VMkernel NIC. Setting the IP netmask requires that the -i option be given in the same command.

-r

Restore VMkernel TCP/IP interfaces from configuration file. (Internal use only)

-h

Displays command help


esxcfg-vmknic examples:

Add a VMkernel NIC and set the IP and subnet mask:
esxcfg-vmknic -a "VM Kernel" -i 172.20.20.19 -n 255.255.255.0

Thursday, December 3, 2009

What is Changed Block Tracking in vSphere?


CBT is a new feature in vSphere that can keep track of the blocks of a virtual disk that have changed since a certain point in time. This is extremely useful for backup and replication applications that can use this information to greatly improve incremental backup and replication times. Without CBT, these applications have to figure out changed blocks on their own, so being able to get this information for free using the vStorage application programming interfaces is extremely valuable to them.

CBT is not really part of the vStorage APIs but is a new feature of the VMkernel that is built into the storage stack. The CBT feature can be accessed by third-party applications as part of the vStorage APIs for Data Protection. Applications can use the API to query the VMkernel to return the blocks of data that have changed on a virtual disk since the last backup operation. You can use CBT on any type of virtual disk, thick or thin and on any datastore type except for physical mode Raw Device Mappings. This includes both NFS and iSCSI datastores.

Besides requiring vSphere, a prerequisite for using CBT is that a virtual machine must be using version 7 virtual hardware. This is the default in vSphere, although you can still choose the old version 4 hardware that was used in VMware Infrastructure 3. If you upgraded a host from ESX 3 to ESX 4, you must also upgrade the virtual hardware of the VMs to version 7 to use this feature.



The CBT feature is disabled by default because using it adds a very small amount of overhead. However, this overhead is a small price to pay for the great efficiencies that come from enabling it. This feature is not global and can be enabled on only the select VMs that you want to take advantage of this functionality. It can be enabled either through the vSphere client or by using the SDK. To enable it in the vSphere client, you need to add a configuration parameter to each VM using the steps below:

1. Power off the VM; this is necessary to add a configuration parameter. Edit the settings of the VM, select the Options tab, then under Advanced, General click the Configuration Parameters button.


2. Next, click the Add Row button. You first need to add a general parameter for the VM to enable the feature and then add additional ones for each virtual disk that you wish to enable it on. For the general parameter enter "ctkEnabled" for the name and "true" for the value.


3. For each virtual disk you need to add "scsi#:#.ctkEnabled" for the name where the # signs should be replaced by the controller/disk number for each disk. Usually this is 0:0 for the first disk, then 0:1 for the second disk, etc. Also use "true" for the value for this parameter.


4. Click OK when you are done to save it.
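If a VM has several disks, it is easy to miss one when typing the rows by hand. As a minimal sketch, this Python function generates the name/value pairs described in the steps above for a list of controller/disk numbers. The function name is hypothetical; the parameter names are the ones given in steps 2 and 3.

```python
def cbt_parameters(disks):
    """Return the configuration parameters that enable CBT.

    disks is an iterable of (controller, disk) number pairs,
    for example [(0, 0), (0, 1)] for scsi0:0 and scsi0:1.
    """
    params = {"ctkEnabled": "true"}  # general parameter for the VM
    for controller, disk in disks:
        params["scsi{}:{}.ctkEnabled".format(controller, disk)] = "true"
    return params


if __name__ == "__main__":
    for name, value in cbt_parameters([(0, 0), (0, 1)]).items():
        print(name, "=", value)
```

The output is one row per parameter, ready to be entered with the Add Row button.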

An alternate method to enable CBT is using the SDK. Many backup applications that support CBT will automatically do this for you. For details on using the SDK method see this VMware tech note which describes how to use VirtualMachineConfigSpec and ReconfigVM_Task methods to accomplish this programmatically. Typically you will not want to enable this unless you have a specific application that can utilize this feature such as VMware Data Recovery or Veeam Backup & Replication.

Once enabled, a VM must go through what is called a stun/unstun cycle for it to take effect. This cycle happens during certain VM operations, including power on/off, suspend/resume and create/delete snapshot. During this cycle a VM's disks are reopened, which allows a change tracking filter to be inserted into the storage stack for that VM. You might wonder where CBT stores the information about changed blocks for a virtual disk. It does this in a special "-ctk.vmdk" file that is created in each VM's home directory for each virtual disk that it is enabled on.


The size of this file is fixed and does not grow beyond its initial size unless you increase the size of a virtual disk. The size of this file will vary based on the size of the virtual disk: approximately 0.5 MB for every 10 GB of virtual disk size. Inside this file the state of each block is stored for tracking purposes using sequence numbers that can tell applications if a block has changed or not. One of these files will exist for each virtual disk that CBT is enabled on.
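To get a feel for how little space the tracking files take, here is a rough Python estimate based on the approximate ratio given above (0.5 MB per 10 GB). It is only an approximation, and the function name and sample disk sizes are made up.

```python
def ctk_file_size_mb(disk_size_gb, mb_per_10_gb=0.5):
    """Approximate size of a -ctk.vmdk file for a virtual disk of the given size."""
    return disk_size_gb / 10 * mb_per_10_gb


if __name__ == "__main__":
    for size_gb in (40, 100, 500):  # hypothetical virtual disk sizes
        print("{} GB disk -> about {} MB".format(size_gb, ctk_file_size_mb(size_gb)))
```

By this estimate, even a 500 GB virtual disk needs only about 25 MB for its tracking file.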

CBT is a great feature that really improves the efficiency and speed of virtual machines' backup, restore and replication operations in vSphere. Several backup applications have already taken advantage of this new feature and are reporting greatly improved incremental backup times and being able to achieve near continuous data protection because of it.

Special thanks to John Troyer and Jon Bock from VMware and Anton Gostev from Veeam for taking the time to help me better understand the vStorage APIs and CBT.

Posted by: Eric Siebert

Eric Siebert, vSphere on http://itknowledgeexchange.techtarget.com/

Monday, November 2, 2009

Common Fault issues in VMware Infrastructure

1007819
Dealing with an unresponsive virtual machine

1007802
Confirming a virtual machine is unresponsive

1007805
Locating virtual machine log files on an ESX host

1000674
Can't stop or kill virtual machine

1004344
Identifying causes of not being able to power cycle ESX Server virtual machines

1007808
Ensuring a virtual machine is not inaccessible due to a VMware VirtualCenter issue

1007813
Troubleshooting a virtual machine that has become unresponsive because of an ESX host

1007814
Troubleshooting a virtual machine that is unresponsive because of configuration issues

1007818
Dealing with unresponsive guest OS issues

1007808
Ensuring a virtual machine is not inaccessible due to a VMware VirtualCenter issue

1003895
Stopping, starting, or restarting the VirtualCenter Server service

1002687
Virtual machine stops responding in a Power On state in VirtualCenter

7114568
Cannot Power on Virtual Machines, "Not enough licenses installed to perform operation" Error Message

1004592
VMware VirtualCenter console displays handshake error.

    

1007813
Troubleshooting a virtual machine that has become unresponsive because of an ESX host

1003751
Verifying that ESX virtual machine storage is accessible

1004144
ESX Server virtual machines stop responding due to shared storage connectivity issues

1003755
Verifying sufficient free disk space for an ESX virtual machine

1003564
Investigating disk space on an ESX host

1003659
Identifying shared storage issues with ESX 3.x

1006791
Server stops responding and shows errors on a purple screen

10051
Virtual machine does not power on because of missing or locked files

1003690
Ensuring your hardware is functioning correctly

1004005
[Internal] Third Party System Management agents in the Service Console

1007814
Troubleshooting a virtual machine that is unresponsive because of configuration issues

1005734
Troubleshooting a virtual machine that stops responding or fails when the CD-ROM entry is ATAPI

1001637
Virtual machine does not power on and there is high CPU reservation

1002025
Virtual machine stops responding during backup

1002836
Why snapshot removal can stop a virtual machine for long time

1003164
Guest stops responding after connecting a USB CD-ROM

    

1007818
Dealing with unresponsive guest OS issues

1004007
Investigating operating system disk space

1007577
Do not use guest OS performance tools to monitor virtual machine performance

1007866
Using Windows Event Viewer to identify the cause of an unresponsive or failed virtual machine

1004764
Unable to shutdown Windows using Shutdown Guest option

     

Brought to you by VMwarewolf.com VMware and Virtualization Technical Discussions from a VMware Technical Support Engineer

Friday, September 4, 2009

Resolving VMware ESX problems without pulling the plug


Differences between virtual machines and physical servers highlight the unique challenges of resolving virtual machine issues. On a physical server you can always pull the power plug as a last resort before restarting a server. But this strategy may not work on virtual machines, which only have virtual power switches. There are, however, a few toolkits available that either help prevent problems, or make your troubleshooting process easier. I'll discuss several of these in this tip, and give you step-by-step instructions on how to fix various common problems.

VMware Tools

The first set of tools you want to familiarize yourself with is VMware Tools. VMware Tools is a set of enhanced drivers and applications that installs on your virtual machine's (VM's) operating system. As a best practice, you should make a habit of always installing VMware Tools to ensure the optimal performance and stability of your VM. Also, double check to make sure that you're running the latest version of VMware Tools after you install any upgrades to ESX (incidentally, some ESX patches will also require updates to VMware Tools). There is a column in the Virtual Machine view in the VMware Infrastructure Client (VI Client) that will show the VMware Tools status of every VM and whether it is OK, out of date or not installed.

Virtual machine file types

As part of the troubleshooting process, you'll need to understand all the various file types involved with fixing a possible problem. Let's review the files associated with a virtual machine:

  • .nvram file – This file contains the CMOS/BIOS for the VM.
  • .vmdk files – These are the disk files that are created for each virtual hard drive in your VM. There are three different types of files that use the vmdk extension:
    • *-flat.vmdk file - This is the actual raw disk file that is created for each virtual hard drive.
    • *.vmdk file – This is the disk descriptor file which describes the size and geometry of the virtual disk file.
    • *-delta.vmdk file - This is the differential file created when you take a snapshot of a VM (also known as a REDO log).
  • .vmx file – This file is the primary configuration file for a virtual machine. When you create a new virtual machine and configure the hardware settings for it that information is stored in this file.
  • .vswp file – This is the VM swap file (earlier ESX versions had a per host swap file) and is created to allow for memory overcommitment on an ESX server.
  • .vmss file – This file is created when a VM is put into Suspend (pause) mode and is used to save the suspend state.
  • .log file – This is the file that keeps a log of the virtual machine activity and is useful in troubleshooting virtual machine problems.
  • .vmxf file – This is a supplemental configuration file in text format for virtual machines that are in a team.
  • .vmsd file – This file is used to store metadata and information about snapshots.
  • .vmsn file - This is the snapshot state file, which stores the exact running state of a virtual machine at the time you take that snapshot.

Log files
Once you understand VM file types, you'll want to become very familiar with log files. Log files are the best method for troubleshooting problems with virtual machines. It's the first place you should check when problems occur.

The most important file is the vmware.log file. This is the main log file for the VM on the ESX server, and is located in the working directory for the VM. vmware.log is always the current working log for the VM, and older log files are numbered incrementally, e.g., vmware-1.log.
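When a VM's working directory has been around for a while, a plain alphabetical listing puts vmware-10.log ahead of vmware-2.log. The Python sketch below picks out the log files from a directory listing and orders them with the current log first and the numbered logs in numeric order. It assumes the lowercase file names vmware.log and vmware-N.log; the function name and sample listing are hypothetical.

```python
import re

_ROTATED = re.compile(r"vmware-(\d+)\.log")


def sort_vm_logs(filenames):
    """Return only the VM log files: vmware.log first, then vmware-N.log by N."""
    def key(name):
        if name == "vmware.log":
            return (0, 0)
        match = _ROTATED.fullmatch(name)
        if match:
            return (1, int(match.group(1)))
        return None  # not a VM log file

    logs = [name for name in filenames if key(name) is not None]
    return sorted(logs, key=key)


if __name__ == "__main__":
    listing = ["vmware-10.log", "web01.vmx", "vmware.log", "vmware-2.log", "vmware-1.log"]
    print(sort_vm_logs(listing))
```

Files that are not logs, such as the .vmx file in the sample listing, are simply left out of the result.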

You should also check /var/log/vmkernel and /var/log/vmware/hostd.log on the ESX host for any errors that may be related to the problem you are experiencing with your VM. Sometimes, restarting the hostd service (service mgmt-vmware restart) on the ESX host will resolve quirky problems with virtual machines. For more common problems, there are more specific techniques that will likely resolve your problem; I'll go over these next.

Problem: Can't shut down a virtual machine

Let's say you cannot shut down a VM using the VM power controls. You can use command line methods to try to manually kill your stuck VM. Several methods for doing this are listed below. Employ these methods only as a last resort, short of restarting your ESX host.

  • The first option you should always try is the command line equivalent to using the VI Client, which is the vmware-cmd command.
    • Log in to the service console
    • Type "vmware-cmd -l" to get a list of all VMs and their paths
    • You can check the VM state by typing "vmware-cmd //.vmx getstate"
    • To forcibly stop the VM, type "vmware-cmd //.vmx stop hard"
    • Check the VM state again; it should now be off
    • Type "vmware-cmd //.vmx start" to power on the VM
  • The second option is to try and manually kill the VM's process by finding its process identifier (pid) and issuing the kill command to terminate it.
    • Log in to the service console
    • Type "vmware-cmd -l" to get a list of all VMs and their paths
    • You can check the VM state by typing "vmware-cmd //.vmx getstate"
    • Type "ps -ef | grep "
    • The second column is the pid of the vmkload_app process of the virtual machine; you can also type "ps -eaf" to see all running processes
    • Type "kill -9 "
    • Check the VM state again; it should now be off
    • Type "vmware-cmd //.vmx start" to power on the VM
  • The last option is to use the vm-support command to try and force the VM to shut down.
    • Log in to the service console
    • Get the vmid of the VM you want to kill by typing "vm-support -x" or "cat /proc/vmware/vm/*/names"
    • Kill the VM and generate core dumps and logs by typing "vm-support -X "
    • You will be prompted if you want to include a screenshot of the VM, send an NMI to the VM and send an abort command to the VM. You must answer yes to the abort question to kill the VM. The entire process will take about 5-10 minutes to run. It will create a tar archive in the directory.
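For the second option, the step that trips people up is reading the pid out of the ps listing, since the grep process itself also shows up in the results. As an illustration, this Python sketch filters saved "ps -ef" output for a VM name and returns the second column of each matching line, skipping the header and the grep line. The sample output, VM name and paths are made up for the example, and the function name is hypothetical.

```python
# Made-up sample of "ps -ef" output; real listings will differ.
SAMPLE = """\
UID        PID  PPID  C STIME TTY          TIME CMD
root      2215     1  0 Sep03 ?        00:00:12 /usr/lib/vmware/bin/vmkload_app /usr/lib/vmware/bin/vmware-vmx /vmfs/volumes/ds1/web01/web01.vmx
root      3310  3200  0 10:02 pts/0    00:00:00 grep web01
"""


def find_pids(ps_output, vm_name):
    """Return the pids (second column) of processes whose line mentions vm_name."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header row and anything too short to be a process line.
        if len(fields) < 8 or not fields[1].isdigit():
            continue
        # Skip the grep process that was used to search the listing.
        if fields[7] == "grep":
            continue
        if vm_name in line:
            pids.append(int(fields[1]))
    return pids


if __name__ == "__main__":
    print(find_pids(SAMPLE, "web01"))
```

Always double check the pid against the full process line before issuing a kill, since a short VM name can match more than one process.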

Problem: Can't power on a virtual machine

Another common problem may be that you cannot power on a VM. This can happen if the host server does not have enough resources for the VM to use. For example, if the VM has a memory reservation set and the ESX host does not have enough physical memory to meet the reservation, then it cannot power on the VM. If this happens you can either remove the memory reservation from the VM and migrate it to another host with more free physical memory, or you can free up physical memory on the existing host.

Also, when a VM is powered on it needs to create a vswp file in the working directory of the VM on the ESX host that is equal to the amount of RAM assigned to the VM (minus any memory reservations). If there is not sufficient disk space on your ESX host, then you will also not be able to power on the VM. A workaround is to set a memory reservation equal to the amount of RAM assigned to the VM so the vswp file will be 0 bytes in size. It's important, however, to always take care to leave additional disk space on your VMFS volumes for things like logs, swap files and snapshots.
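The swap file rule is simple enough to check ahead of time. The Python sketch below computes the expected vswp size as assigned RAM minus the memory reservation and compares it with the free space on the datastore. It is a simplification that ignores the other files a VM needs (logs, snapshots and so on), and the function names and sample numbers are hypothetical.

```python
def vswp_size_mb(assigned_ram_mb, reservation_mb=0):
    """Expected swap file size: RAM assigned to the VM minus its memory reservation."""
    if not 0 <= reservation_mb <= assigned_ram_mb:
        raise ValueError("reservation must be between 0 and the assigned RAM")
    return assigned_ram_mb - reservation_mb


def has_room_for_vswp(assigned_ram_mb, reservation_mb, free_datastore_mb):
    """True if the datastore has enough free space for the swap file alone."""
    return vswp_size_mb(assigned_ram_mb, reservation_mb) <= free_datastore_mb


if __name__ == "__main__":
    print(vswp_size_mb(4096, 1024))             # partial reservation
    print(vswp_size_mb(4096, 4096))             # full reservation: 0-byte swap file
    print(has_room_for_vswp(4096, 0, 2048))     # not enough free space
```

A full reservation brings the swap file down to zero, which is the workaround described above, at the cost of tying up that much physical memory on the host.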

Problem: Virtual machine encountering boot errors due to OS corruption
If a VM is having problems while booting due to operating system corruption or faulty configuration, a good way to deal with this is to add its virtual disk to another working VM so you can access the drive and make any needed repairs. To repair the VM, you should make sure the problem VM is powered off. Next add an additional drive to a working VM and browse to the problem VM's disk file. Boot the working VM; you can now access the drive of the problem VM to make any changes or corrections. When you are done remove the drive from the working VM, add it back to the problem VM and try booting it again.

Problem: General virtual machine OS issues

For troubleshooting problems with the VM's operating system, I create a toolkit of ISO files that contain helpful troubleshooting applications that I can quickly mount on a VM's CD-ROM and use (or boot from) to make repairs to a VM. A few of the ISO files I use include:

  • Sysinternals utilities - Great utilities for troubleshooting Windows server problems.
  • Gparted – A Linux-based disk partition editor.
  • Knoppix - A Linux-based live CD with many tools and applications.
  • Ultimate Boot CD - A live CD with many system repair and testing tools.
  • UBCD4Win - A Windows-based live CD with many system repair and testing tools.

Conclusions
These are just a few of the problems and techniques that you will use when troubleshooting virtual machine problems. The information in this article should help you the next time you experience a problem with a troublesome VM.