08-30-2010 07:12 AM
Last week we had trouble in our SAN environment: total loss of LUN access
on 20 virtual machines running on VMware ESX 3.5.
I need to understand what happened.
On VMware we have 6 ESX hosts; each host sees 5 datastores, so 5 LUNs.
The ESX hosts share the same LUNs, and the VMs run on them.
For a hardware test, one change was made in our environment:
We took one blade / ESX server, put in 1 hard disk running Windows 2003 ( up to date with MS patches ), then
we booted Windows 2003 ( very bad idea ). We have zoning per port, so Windows 2003
was able to see the 5 VMware LUNs. To summarize, we replaced one hard disk running ESX with one hard disk running Windows 2003.
After a few minutes, the ESX hosts stopped working: they were unable to access the LUNs.
The VMs crashed ( 20 VMs down in the production environment ).
We recovered by completely stopping the ESX hosts and starting them again ( without Windows 2003 !!! ).
The LUNs were not signed by Windows 2003 ( the VMs were OK after the ESX restart ).
What happened? Does Windows 2003 lock the LUNs? Does it log out or shut down the initiators' access to the LUNs?
So did ESX see the LUNs as locked?
On VMware we saw these messages:
- WARNING: SCSI: 2933: CheckUnitReady on vmhba1:1:0 returned I/O error 0x0/0x2 sk 0x2 asc 0x0 ascq 0x0
- StorageMonitor: 196: vmhba1:1:0:0 status = 2/0 0x2 0x4 0x2
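
To read the two messages above more easily, here is a minimal Python sketch that pulls out the sense key, ASC and ASCQ and looks them up in a small table taken from the SCSI standard. It assumes that the three trailing values on the StorageMonitor line are sense key, ASC and ASCQ in that order; the function name `decode_sense` and the partial tables are only for illustration, not part of any VMware tool. It only translates the codes; it does not say why the LUN reported them.

```python
import re

# Partial list of SCSI sense keys (from the SCSI Primary Commands standard).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}

# A few ASC/ASCQ pairs relevant to "not ready" conditions (partial table).
ADDITIONAL_SENSE = {
    (0x00, 0x00): "NO ADDITIONAL SENSE INFORMATION",
    (0x04, 0x00): "LOGICAL UNIT NOT READY, CAUSE NOT REPORTABLE",
    (0x04, 0x01): "LOGICAL UNIT IS IN PROCESS OF BECOMING READY",
    (0x04, 0x02): "LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED",
    (0x04, 0x03): "LOGICAL UNIT NOT READY, MANUAL INTERVENTION REQUIRED",
}

HEX = r"(0x[0-9a-fA-F]+)"
# Form used by the "SCSI:" warning: sk 0x2 asc 0x0 ascq 0x0
LABELLED = re.compile(rf"sk {HEX} asc {HEX} ascq {HEX}")
# Form used by "StorageMonitor:": status = 2/0 0x2 0x4 0x2
# (assumption: the three hex values are sense key, ASC, ASCQ)
MONITOR = re.compile(rf"status = \S+ {HEX} {HEX} {HEX}")


def decode_sense(line):
    """Return (sense key text, additional sense text), or None if no match."""
    match = LABELLED.search(line) or MONITOR.search(line)
    if match is None:
        return None
    sk, asc, ascq = (int(group, 16) for group in match.groups())
    return (
        SENSE_KEYS.get(sk, "UNKNOWN SENSE KEY"),
        ADDITIONAL_SENSE.get((asc, ascq), "not in this partial table"),
    )
```

Under that assumption, both messages carry sense key 0x2 (NOT READY), and the StorageMonitor line adds ASC/ASCQ 0x4/0x2, which the standard lists as "logical unit not ready, initializing command required".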
VMware support doesn't understand it; the LUN was simply no longer ready.
We had 5 ESX hosts and 1 Windows 2003 server in the same zone as the NetApp array.
How can we work out what happened?
If an ESX host and a Windows 2003 server are in the same zone and can access
the same LUN, does Windows 2003 starting up or shutting down affect the LUN? Does
it log out of the LUN?
Thanks for any information.
08-30-2010 07:25 AM
First of all, as I have mentioned in my previous posts, SAN admins can hardly troubleshoot chassis servers. Only the vendor can.
From my practical experience: I do the zoning and map the LUN to the ESX servers. Normally, with Windows, the LUNs are visible after a reboot. With ESX, it is not possible to reboot whenever you wish. But what I do is unpublish the LUN and then publish it again, and it becomes visible. In extreme cases I have to remove the mapping, remove the zoning, then redo the zoning and map the LUNs again, and they appear.
Neither I nor the VM people understand why. They only say they are not able to see the LUNs after I have done my work. Then I do it again as I described above.
This is again my practical experience. Perhaps the VM people or the NetApp people can shed some light on it.