Occasional Contributor
Posts: 6
Registered: ‎12-30-2014
Accepted Solution

SilkWorm 4900 won't boot from hda2 but will boot from hda1

Hello fellow SAN admins, and season's greetings to you all.

Since this is my first post here, let me introduce myself.

 

My name is Christos and I am a UNIX and storage/backup admin at a small ISP in my country, Greece.

 

The other day, we had a power loss in one of our datacenters.

After power came back, one of our 4900 switches was left with all its LEDs amber.

 

Here is the printenv:

 

> printenv
AutoLoad=yes
ENET_MAC=00051E03D339
InitTest=MEM()
LoadIdentifiers=Fabric Operating System;Fabric Operating System
OSLoadOptions=quiet;quiet
OSLoader=ATA()0x91950;ATA()0xee77
OSRootPartition=hda2;hda1
SkipWatchdog=yes

 

The switch could not boot with these settings.

Here is the output from the console:

> boot ATA()0x91950
Booting "Manually selected OS" image.
Entry point at 0x01000000 ...

Linux/PPC load: 
Uncompressing Linux...done.
Now booting the kernel
Linux version 2.4.19 (swrel@sith) (gcc version 2.95.3 20010112 (prerelease)) -n #1 Tue Oct 3 21:06:25 PDT 2006
cpld_init: platform (44) not supported
Brocade Silkworm port (C) 2002 MontaVista Software, Inc. (source@mvista.com)
On node 0 totalpages: 65536
zone(0): 4096 pages.
zone (0): min(32), low(160), high (256)
zone(1): 61440 pages.
zone (1): min(480), low(2400), high (3840)
zone(2): 0 pages.
Kernel command line: 
Set up jiffies counter to wrap in 0 seconds.
Calibrating FIT timer... running at 3058 Hz. [TSR_FP=1]
Calibrating delay loop... 599.65 BogoMIPS
Memory: 253236k available (1784k kernel code, 1100k data, 76k init, 0k highmem)
Dentry cache hash table entries: 32768 (order: 6, 262144 bytes)
Inode cache hash table entries: 16384 (order: 5, 131072 bytes)
Mount-cache hash table entries: 4096 (order: 3, 32768 bytes)
Buffer-cache hash table entries: 16384 (order: 4, 65536 bytes)
Page-cache hash table entries: 65536 (order: 6, 262144 bytes)
POSIX conformance testing by UNIFIX
PCI: Probing PCI hardware
Unknown bridge resource 2: assuming transparent
PCI: moved device 00:01.0 resource 2 (101) to 1400
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
Journalled Block Device driver loaded
SGI XFS with no debug enabled
i2c-core.o: i2c core module version 2.6.3 (20020322)
i2c-dev.o: i2c /dev entries driver module version 2.6.3 (20020322)
i2c-proc.o version 2.6.3 (20020322)
pty: 256 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
ttyS00 at 0xfdfea300 (irq = 1) is a 16550A
ttyS01 at 0xfdfe9200 (irq = 0) is a 16550A
PPC 405 watchdog driver v0.5. (Timer driven)
IBM gpio driver version 07.25.02
GPIO #0 at 0xd1000700
SWBD Platform Driver v1.0: [type 44, rev 1].
Config Silkworm 
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
cpld_ide_init_hwif_ports: dp=0, cp=0, hw=c0365f08
read cpld_data 0x81
Silkworm CE-2 CPLD ATA interface configured [CPLD version 1]
cpld_ide_init_hwif_ports: dp=0, cp=0, hw=c0365f08
cpld_ide_init_hwif_ports: dp=0, cp=0, hw=c0365f08
cpld_ide_init_hwif_ports: dp=0, cp=0, hw=c0365f08
cpld_ide_init_hwif_ports: dp=0, cp=0, hw=c0365f08
cpld_ide_init_hwif_ports: dp=0, cp=0, hw=c0365f08
cpld_ide_init_hwif_ports: dp=0, cp=0, hw=c0365f08
cpld_ide_init_hwif_ports: dp=0, cp=0, hw=c0365f08
hda: SMART CF, ATA DISK drive
ide0 at 0xd30151f0-0xd30151f7,0xd30153f6 on irq 28
hda: 1022112 sectors (523 MB) w/1KiB Cache, CHS=1014/16/63
Partition check:
 hda: hda1 hda2
RAMDISK driver initialized: 16 RAM disks of 50000K size 1024 blocksize
loop: loaded (max 8 devices)
eth0: ZMII bridge in RMII mode
eth0: Phy @ 0x1, type BCM5221 (0x004061e4)
eth0: IBM OCP 10/100 Mbps ethernet: EMAC0, MAC 00:05:1e:03:d3:39
eth0: Tx/Rx Interrupt mitigation (1500 pps)
eth1: ZMII bridge in RMII mode
eth1: Read error on PHY 0x02, register 2
eth1: PHY 0x02 not found
ATA polled-mode panic dumper on char-major-252.
silkworm: Using SWBD34 flash configuration
Creating 6 MTD partitions on "Boot flash":
0x00000000-0x00020000 : "bootenv0: boot environment (1)"
0x00020000-0x00040000 : "bootenv1: boot environment (2)"
0x00040000-0x00200000 : "prom0: boot prom (1)"
0x00200000-0x003c0000 : "prom1: boot prom (2)"
0x003c0000-0x003e0000 : "unused"
0x003e0000-0x00400000 : "bootsel: boot prom selector"
mtdchar: write-caching enabled
iic0: IBM on-chip iic adapter module 2003.15.08
iic1: IBM on-chip iic adapter module 2003.15.08
cpld_write_register: Cannot access CPLD. Initialization needed
cpld_write_register: Cannot access CPLD. Initialization needed
iic1: Registered I2C mux callback for SWBD44
M41T11 Real-time-clock Driver v1.1
m41t11: Called to probe for bus IIC-0
m41t11: I2C Real-Time-Clock detected on iic0 addr 0x68
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 2048 buckets, 16Kbytes
TCP: Hash tables configured (established 16384 bind 16384)
Linux IP multicast router 0.06 plus PIM-SM
ip_tables: (C) 2000-2002 Netfilter core team
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
RAMDISK: ext2 filesystem found at block 0
RAMDISK: Loading 2603 blocks [1 disk] into ram disk... done.
Freeing initrd memory: 2603k freed
VFS: Mounted root (ext2 filesystem).
Attempting to find a root file s hda:ystem on hda2... hda1
 hda2
 hda: hda1 hda2
kjournald starting.  Commit interval 5 seconds
EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
EXT3 FS 2.4-0.9.17, 10 Jan 2002 on ide0(3,2), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
 hda: hda1 hda2
 hda: hda1 hda2
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Trying to move old root to /initrd ... okay
Freeing unused kernel memory: 76k init
INIT: version 2.78 booting
EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
EXT3 FS 2.4-0.9.17, 10 Jan 2002 on ide0(3,2), internal journal
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.17, 10 Jan 2002 on ide0(3,1), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
INIT: Entering runlevel: 3
eth0: Link status change: Link Down.
eth0: Link status change: Link Up. 100 Mbps Full duplex Auto (autonegotiation complete).

INITCP: MB CPLD Vers: 0x81 DC CPLD Vers: 0x82 Image ID: 0x1f
/bin/mknod: wrong number of arguments
Try `/bin/mknod --help' for more information.
/bin/mknod: wrong number of arguments
Try `/bin/mknod --help' for more information.
/bin/mknod: wrong number of arguments
Try `/bin/mknod --help' for mor

 

After that, I swapped hda2 with hda1 in OSRootPartition (OSRootPartition=hda1;hda2) and the switch now boots normally.

The system is coming up, please wait...
Read board ID of 0x88 from addr 0x23
Read extended model ID of 0x1d from addr 0x22
Matched board/model ID to platform index 8

Read board ID of 0x88 from addr 0x23
Read extended model ID of 0x1d from addr 0x22
Matched board/model ID to platform index 11
Checking system RAM - press any key to stop test

Checking memory address: 00100000

System RAM test using Default POST RAM Test succeeded.

Press escape within 4 seconds to enter boot interface.

1) Start system.
2) Recover password.
3) Enter command shell.

Option? 3

Boot PROM password has not been set.
> setenv OSRootPartition=hda1;hda2
> saveenv
> boot ATA()0x91950
Booting "Manually selected OS" image.
Entry point at 0x01000000 ...

Linux/PPC load: 
Uncompressing Linux...done.
Now booting the kernel
Linux version 2.4.19 (swrel@sith) (gcc version 2.95.3 20010112 (prerelease)) -n #1 Tue Oct 3 21:06:25 PDT 2006
cpld_init: platform (44) not supported
Brocade Silkworm port (C) 2002 MontaVista Software, Inc. (source@mvista.com)
On node 0 totalpages: 65536
zone(0): 4096 pages.
zone (0): min(32), low(160), high (256)
zone(1): 61440 pages.
zone (1): min(480), low(2400), high (3840)
zone(2): 0 pages.
Kernel command line: 
Set up jiffies counter to wrap in 0 seconds.
Calibrating FIT timer... running at 3058 Hz. [TSR_FP=1]
Calibrating delay loop... 599.65 BogoMIPS
Memory: 253236k available (1784k kernel code, 1100k data, 76k init, 0k highmem)
Dentry cache hash table entries: 32768 (order: 6, 262144 bytes)
Inode cache hash table entries: 16384 (order: 5, 131072 bytes)
Mount-cache hash table entries: 4096 (order: 3, 32768 bytes)
Buffer-cache hash table entries: 16384 (order: 4, 65536 bytes)
Page-cache hash table entries: 65536 (order: 6, 262144 bytes)
POSIX conformance testing by UNIFIX
PCI: Probing PCI hardware
Unknown bridge resource 2: assuming transparent
PCI: moved device 00:01.0 resource 2 (101) to 1400
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
Journalled Block Device driver loaded
SGI XFS with no debug enabled
i2c-core.o: i2c core module version 2.6.3 (20020322)
i2c-dev.o: i2c /dev entries driver module version 2.6.3 (20020322)
i2c-proc.o version 2.6.3 (20020322)
pty: 256 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
ttyS00 at 0xfdfea300 (irq = 1) is a 16550A
ttyS01 at 0xfdfe9200 (irq = 0) is a 16550A
PPC 405 watchdog driver v0.5. (Timer driven)
IBM gpio driver version 07.25.02
GPIO #0 at 0xd1000700
SWBD Platform Driver v1.0: [type 44, rev 1].
Config Silkworm 
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
cpld_ide_init_hwif_ports: dp=0, cp=0, hw=c0365f08
read cpld_data 0x81
Silkworm CE-2 CPLD ATA interface configured [CPLD version 1]
cpld_ide_init_hwif_ports: dp=0, cp=0, hw=c0365f08
cpld_ide_init_hwif_ports: dp=0, cp=0, hw=c0365f08
cpld_ide_init_hwif_ports: dp=0, cp=0, hw=c0365f08
cpld_ide_init_hwif_ports: dp=0, cp=0, hw=c0365f08
cpld_ide_init_hwif_ports: dp=0, cp=0, hw=c0365f08
cpld_ide_init_hwif_ports: dp=0, cp=0, hw=c0365f08
cpld_ide_init_hwif_ports: dp=0, cp=0, hw=c0365f08
hda: SMART CF, ATA DISK drive
ide0 at 0xd30151f0-0xd30151f7,0xd30153f6 on irq 28
hda: 1022112 sectors (523 MB) w/1KiB Cache, CHS=1014/16/63
Partition check:
 hda: hda1 hda2
RAMDISK driver initialized: 16 RAM disks of 50000K size 1024 blocksize
loop: loaded (max 8 devices)
eth0: ZMII bridge in RMII mode
eth0: Phy @ 0x1, type BCM5221 (0x004061e4)
eth0: IBM OCP 10/100 Mbps ethernet: EMAC0, MAC 00:05:1e:03:d3:39
eth0: Tx/Rx Interrupt mitigation (1500 pps)
eth1: ZMII bridge in RMII mode
eth1: Read error on PHY 0x02, register 2
eth1: PHY 0x02 not found
ATA polled-mode panic dumper on char-major-252.
silkworm: Using SWBD34 flash configuration
Creating 6 MTD partitions on "Boot flash":
0x00000000-0x00020000 : "bootenv0: boot environment (1)"
0x00020000-0x00040000 : "bootenv1: boot environment (2)"
0x00040000-0x00200000 : "prom0: boot prom (1)"
0x00200000-0x003c0000 : "prom1: boot prom (2)"
0x003c0000-0x003e0000 : "unused"
0x003e0000-0x00400000 : "bootsel: boot prom selector"
mtdchar: write-caching enabled
iic0: IBM on-chip iic adapter module 2003.15.08
iic1: IBM on-chip iic adapter module 2003.15.08
cpld_write_register: Cannot access CPLD. Initialization needed
cpld_write_register: Cannot access CPLD. Initialization needed
iic1: Registered I2C mux callback for SWBD44
M41T11 Real-time-clock Driver v1.1
m41t11: Called to probe for bus IIC-0
m41t11: I2C Real-Time-Clock detected on iic0 addr 0x68
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 2048 buckets, 16Kbytes
TCP: Hash tables configured (established 16384 bind 16384)
Linux IP multicast router 0.06 plus PIM-SM
ip_tables: (C) 2000-2002 Netfilter core team
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
RAMDISK: ext2 filesystem found at block 0
RAMDISK: Loading 2603 blocks [1 disk] into ram disk... done.
Freeing initrd memory: 2603k freed
VFS: Mounted root (ext2 filesystem).
Attempting to find a root file s hda:ystem on hda1... hda1
 hda2
 hda: hda1 hda2
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.17, 10 Jan 2002 on ide0(3,1), internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
 hda: hda1 hda2
 hda: hda1 hda2
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Trying to move old root to /initrd ... okay
Freeing unused kernel memory: 76k init
INIT: version 2.78 booting
EXT3 FS 2.4-0.9.17, 10 Jan 2002 on ide0(3,1), internal journal
kjournald starting.  Commit interval 5 seconds
EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
EXT3 FS 2.4-0.9.17, 10 Jan 2002 on ide0(3,2), internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
INIT: Entering runlevel: 3
eth0: Link status change: Link Down.
eth0: Link status change: Link Up. 100 Mbps Full duplex Auto (autonegotiation complete).

INITCP: MB CPLD Vers: 0x81 DC CPLD Vers: 0x82 Image ID: 0x1f


Fabric OS (brocsw1b)


brocsw1b console login: uptime: 2424; sysc_qid: 0
2014/12/30-13:50:49, [HAM-1004], 118,, INFO, SilkWorm4900, Processor rebooted - Reboot

SNMP Research SNMP Agent Resident Module Version 15.3.1.4 
Copyright 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001 SNMP Research, Inc.
sysctrld: all services Standby
Services starting a COLD recovery
2014/12/30-13:51:02, [ZONE-1022], 119,, INFO, brocsw1b, The effective configuration has changed to fb1swa.  
sec0: Security is initializing........
sysctrld: all services Active
Starting LBIST on slot 0, id 27...

Completed xsvf execution successfully.
Lbist completed in 0.062782 seconds on slot 0. 
POST1: Started running Tue Dec 30 13:51:07 GMT-3 2014
POST1: Test #1 - Running turboramtest
POST1: Script PASSED with exit status of 0 Tue Dec 30 13:51:10 GMT-3 2014 took (0:0:3)
POST2: Started running Tue Dec 30 13:51:11 GMT-3 2014
POST2: Test #1 - Running portloopbacktest (SERDES)
POST2: Test #2 - Running minicycle (SERDES)
POST2: Test #3 - Running minicycle (BI LINKS FE_BI->CORE_BI)
POST2: Test #4 - Running minicycle (BI LINKS CORE_BI->FE_BI)
POST2: Running diagshow
POST2: Script PASSED with exit status of 0 Tue Dec 30 13:53:06 GMT-3 2014 took (0:1:55)
2014/12/30-13:53:07, [BL-1000], 120,, INFO, brocsw1b, Initializing Ports...
2014/12/30-13:53:08, [BL-1001], 121,, INFO, brocEnabling switch...
sw1b, Port Initialization Completed.

 

My question is: how can I fix hda2?

Is there a way to copy the good root from hda1 to hda2?

Will Linux tools (dd, for instance) work?

 

Thanks in advance.

Occasional Contributor
Posts: 8
Registered: ‎12-27-2014

Re: SilkWorm 4900 won't boot from hda2 but will boot from hda1

Hi Christos,

 

Try the firmwarecommit command. It syncs both partitions. BUT: save all config and licenses first. If the CF card inside the chassis is having problems, in the worst case the second partition could fail as well...

 

Kind Regards,

 

meisterdausi

aka Jochen

Occasional Contributor
Posts: 6
Registered: ‎12-30-2014

Re: SilkWorm 4900 won't boot from hda2 but will boot from hda1

Thanks Jochen!

That's exactly what I needed.

However, I have another question now:

The file system on hda2 needs fsck. I searched (as root), but the command was nowhere to be found!

 

Can you help me with this as well?

 

Cheers.

 

 

 

Occasional Contributor
Posts: 8
Registered: ‎12-27-2014

Re: SilkWorm 4900 won't boot from hda2 but will boot from hda1

Hi Christos,

 

I think fsck is only available via the command shell. So you need to reboot the switch and hit Escape when the dialog appears.

Then choose option 3 for the command shell; from there you can mount hda2 and run fsck.

Is the commit telling you that the filesystem has issues?

 

Regards,

 

Jochen

Occasional Contributor
Posts: 8
Registered: ‎12-27-2014

Re: SilkWorm 4900 won't boot from hda2 but will boot from hda1

Hi Christos,

 

I started my 4900 (IBM 2005-B64) just now, and I see you don't have to reboot/restart the box!

Just run fsck /dev/hda2 on the filesystem, but it shouldn't be in use while you do...

The switch here is telling me that this could cause severe filesystem damage! So use at your own risk :-D

 

Greetings,

 

Jochen

Occasional Contributor
Posts: 6
Registered: ‎12-30-2014

Re: SilkWorm 4900 won't boot from hda2 but will boot from hda1

It's not wise to fsck a mounted filesystem. You'll most likely corrupt it.

In my case, however, I've booted from hda1, so I can unmount hda2 at will.

Alas, there's no fsck command (the binary) in the system.

 

brocsw1b:root> df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/root             241M  123M  106M  54% /
/dev/hda2             241M  122M  107M  53% /mnt
brocsw1b:root> umount /dev/hda2
brocsw1b:root> df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/root             241M  123M  106M  54% /
brocsw1b:root> mount /dev/hda2 /mnt
brocsw1b:root> df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/root             241M  123M  106M  54% /
/dev/hda2             241M  122M  107M  53% /mnt
brocsw1b:root> find / -name "fsck"
brocsw1b:root> find / -name "*fsck"
brocsw1b:root> find / -name "*fsck*"

Maybe FOS 5.2.0a lacks the fsck command?

brocsw1b:admin> firmwareshow
Primary version: 	v5.2.0a
Secondary version: 	v5.2.0a

 

 

 

Occasional Contributor
Posts: 6
Registered: ‎12-30-2014

Re: SilkWorm 4900 won't boot from hda2 but will boot from hda1

Unfortunately, firmwarecommit didn't help.

The switch still refuses to boot from hda2 :(

 

Occasional Contributor
Posts: 8
Registered: ‎12-27-2014

Re: SilkWorm 4900 won't boot from hda2 but will boot from hda1

Hmmm...

 

My switch is now running v6.0.0, so it's possible that the command is not implemented in your version...

Does your whole fabric run on v5.x? Maybe it's possible to do a firmware update and get the commands to repair the filesystem?

 

Best regards,

 

Jochen

Occasional Contributor
Posts: 6
Registered: ‎12-30-2014

Re: SilkWorm 4900 won't boot from hda2 but will boot from hda1

Well Jochen, my whole fabric runs on 5.2.0a and 5.1.0a, so no luck there.

 

However, since it's New Year's Eve and nothing is really happening in terms of workload, I thought there was nothing better to do than FOS troubleshooting :)

 

It turned out that the problem was the /usr/bin/cut binary, which is called from within the /etc/rc.d/init.d/fabos script at various points in order to create the /dev entries, e.g.:

rmajor=`grep raslog /proc/devices | cut -d ' ' -f 1`

This $rmajor variable is then used to create /dev/raslog:

/bin/mknod -m 666 /dev/raslog c $rmajor 0

But because cut was corrupted, it core dumped and $rmajor stayed empty, so mknod was called with too few arguments (the "wrong number of arguments" errors in the console output above) and the whole init script was failing....
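The effect of an empty $rmajor can be reproduced off the switch. What follows is only a minimal sketch, assuming a POSIX shell: the sample /proc/devices text, the major number 253, and the helper names count_args and make_node_cmd are all made up for illustration, and the mknod command is echoed rather than executed. It is not the actual fabos script.

```shell
# Hypothetical sample of /proc/devices; the major number 253 is made up.
devices='Character devices:
  1 mem
253 raslog'

# Hypothetical helper: report how many arguments a command would receive.
count_args() { echo "$#"; }

# With a working cut, rmajor holds a number and mknod gets 6 arguments.
rmajor=253
count_args -m 666 /dev/raslog c $rmajor 0

# When cut crashes, the backticks yield an empty string. The unquoted
# $rmajor then disappears and only 5 arguments are left.
rmajor=
count_args -m 666 /dev/raslog c $rmajor 0

# Hypothetical guard: same grep | cut pipeline as in the thread, but the
# mknod command is only produced when a major number was actually found.
make_node_cmd() {
    major=$(printf '%s\n' "$devices" | grep "$1" | cut -d ' ' -f 1)
    if [ -z "$major" ]; then
        echo "skip: no major number for $1" >&2
        return 1
    fi
    echo "/bin/mknod -m 666 /dev/$1 c $major 0"
}

make_node_cmd raslog
```

A guard like this would not have repaired the corrupted binary, but it would have turned the repeated mknod usage errors into one message naming the device that could not be resolved.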

 

What I did was rename /usr/bin/cut to /usr/bin/cut_CORRUPTED and copy the binary over from the other partition, the one that boots OK. Voila!
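Since both partitions report the same firmware version, other damaged binaries could in principle be found by comparing the two trees file by file. This is a rough sketch only: it assumes cmp is present on the switch (not verified for FOS 5.2.0a, where fsck turned out to be missing), that hda2 is mounted on /mnt as in the df output earlier in the thread, and that identical firmware versions mean identical files. The function name find_damaged is made up.

```shell
# Hypothetical helper: print the name of every regular file in good_dir
# whose counterpart in suspect_dir is missing or has different content.
find_damaged() {
    good_dir=$1
    suspect_dir=$2
    for f in "$good_dir"/*; do
        [ -f "$f" ] || continue          # skip directories and the like
        name=${f##*/}                    # strip the directory part
        # cmp -s prints nothing and returns non-zero on any difference.
        if ! cmp -s "$f" "$suspect_dir/$name" 2>/dev/null; then
            echo "$name"
        fi
    done
}

# Example call for the layout in this thread (hda1 as root, hda2 on /mnt):
# find_damaged /usr/bin /mnt/usr/bin
```

Any name it prints is only a candidate for the rename-and-copy treatment described above; a file can legitimately differ between partitions, so each hit still needs a look.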

 

My switch now boots from both partitions, although the second one is surely not in its best shape, but the redundancy is there.

 

I hope this post will be of benefit to some fellow SAN admin.

 

BTW: What type/brand is the CF inside these things?
