Fibre Channel (SAN)

Occasional Contributor
Posts: 6
Registered: ‎08-24-2016
Accepted Solution

Problem understanding NTP with logical switches

Hello,

 

I read the SAN admin guide, the command reference guide, and some posts on the forum, but I have difficulty understanding how to configure NTP on logical switches.

 

 

This is my SAN architecture.

 

Two identical SAN networks with no connection between them. Each of them is composed of:

- 2 DCX, the first on site 1, the second on site 2 (with an ISL trunk between them)

- a dozen SAN switches connected to the first DCX on site 1

- a dozen SAN switches connected to the second DCX on site 2

 

On each switch, and DCX too, there is the default logical switch (context 128) and another logical switch (context 10).

 

This will evolve, but for now all ports are in logical switch 10 (FID10 = context 10).

 

So each switch in FID128 considers itself alone in its fabric and, consequently, the fabric principal.

Do I have to set tsclockserver to my NTP server IP (and NOT LOCL) on all switches in FID128? And tstimezone?

 

In FID10, I am not sure which solution is better:

- only the current principal switch with tsclockserver set to the NTP server IP, and the others on LOCL (what happens if this principal switch goes down?). tstimezone on which switch?

- only the DCX (two on each site) with 0x01 priority; do I have to enter the NTP server IP address on both? And the other switches with 0xFF (never principal switch) and tsclockserver LOCL. Where do I have to configure tstimezone? On all switches, or only on those where the NTP server IP address has been configured?

- other solution?

 

I am not sure of what is the best practice.

 

Thanks for your answer.

Brocade Moderator
Posts: 36
Registered: ‎03-29-2010

Re: Problem understanding NTP with logical switches

Short answer: Yes, you do.

 

Longer answer can be found in the Admin guide, page 27:

 

All switches in the fabric maintain the current clock server value in nonvolatile memory. By default,
this value is the local clock server (LOCL) of the principal or primary FCS switch. Changes to the
clock server value on the principal or primary FCS switch are propagated to all switches in the
fabric.
In a Virtual Fabric, all the switches in the fabric must have the same NTP clock server configured.
This includes any Fabric OS v6.2.0 or earlier switches in the fabric. This ensures that time does not
go out of sync in the logical fabric. It is not recommended to have LOCL in the server list.
When a new switch enters the fabric, the time server daemon of the principal or primary FCS switch
sends out the addresses of all existing clock servers and the time to the new switch. When a switch
with Fabric OS v6.1.0 or later enters the fabric, it stores the list and the active servers.

NOTE
In a Virtual Fabric, multiple logical switches can share a single chassis. Therefore, the NTP server
list must be the same across all fabrics.

 

You have Virtual Fabrics configured (logical switch FID 10). Therefore, you must set tsclockserver in each of your logical switches: use the setcontext command to enter that logical switch, then run tsclockserver with EXACTLY THE SAME IP addresses or DNS names as on the fabric principal or FCS switch.

 

Example: tsclockserver "chronos.cru.fr; canon.inria.fr; 192.93.2.20"

 

Note the literal quotes, and note the semicolon separators. Do this for ALL switches in the fabric. Yes, you must set your time zone too!
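
As a small illustration of the format described above (one quoted argument, entries separated by semicolons), here is a minimal Python sketch that builds the command line from a list of servers. The function name `build_tsclockserver_command` is made up for this example and is not part of FOS; the sketch only formats a string and does not talk to a switch.

```python
def build_tsclockserver_command(servers):
    """Return a tsclockserver command line for the given NTP servers.

    Hypothetical helper: it only reproduces the quoting and the
    semicolon separators shown in the example above.
    """
    # Drop surrounding whitespace and skip empty entries.
    cleaned = [s.strip() for s in servers if s.strip()]
    if not cleaned:
        raise ValueError("at least one NTP server is required")
    # A ';' or '"' inside a name would break the quoted list.
    if any(";" in s or '"' in s for s in cleaned):
        raise ValueError("server names must not contain ';' or '\"'")
    return 'tsclockserver "{}"'.format("; ".join(cleaned))


print(build_tsclockserver_command(
    ["chronos.cru.fr", "canon.inria.fr", "192.93.2.20"]))
```

Building the string in one place makes it easier to guarantee that every logical switch receives exactly the same list.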

 

Best of luck,

doc

Any and all information provided by me is for entertainment value and should not be relied upon as a guaranteed solution or warranty of merchantability. All systems and all networks are different and unique. If you have a concern about data loss, or network disconnection, please open a TAC service request for service through Brocade, or through your OEM equipment provider. If this provided you with a solution to this issue, please mark it with the button at the bottom "Accept as solution".

Occasional Contributor
Posts: 6
Registered: ‎08-24-2016

Re: Problem understanding NTP with logical switches

Hi doc,

 

Thank you for your answer.

 

I read everything about date, tsclockserver, NTP and principal switch in the admin guide, including what you quoted (page 74 of the FOS admin guide 7.2.0).

 

In date:

In a Virtual Fabric, there can be a maximum of eight logical switches per Backbone. Only the
default switch in the chassis can update the hardware clock.

 

And in other places there are some unclear explanations of how NTP is managed in Brocade SAN switches.

 

 

To summarize (I know well how the tsclockserver command works):

- the addresses of my NTP servers in all contexts (default and 10) of all switches, so NO LOCL

- tstimezone in all contexts of all switches

PROBLEM: when I enter this value, it displays "System Time Zone change will take effect at next reboot", but I must avoid any interruption of service, so I can't reboot... And I do not understand why this requires a reboot. So will it never be applied?

 

Another problem: a few switches of the same FC networks, in a particular perimeter, have no access to the NTP server, and this will not change, for security reasons. What about their date and time? Will they be updated through the fabric with the LOCL parameter? And I suppose I must also configure the same tstimezone.

 

Thanks again for your detailed explanation, really appreciated.

 

Regards,

Ludo

Brocade Moderator
Posts: 121
Registered: ‎03-29-2011

Re: Problem understanding NTP with logical switches

Hi,

 

You can ignore the message about the next reboot; the change will be applied without a reboot. Note that tstimezone is a per-switch property, unlike tsclockserver (which is propagated through the fabric when the command is entered).

 

The FC switches at the perimeter of your network, which have no network access to the NTP server, will still be updated with the correct time.

Time is updated as follows: the current principal switch (or FCS switch) in each fabric queries the configured NTP servers in sequence every 64 seconds. The principal switch then distributes the time in band (over FC-CT, the Fibre Channel Common Transport) to all other switches in the fabric. If the principal switch is the default switch, it also updates the drift file / RTC.

So, ensure that your configured principal switches (you have configured one of the central / new switches as a 'backup' principal switch, too) are able to reach the list of NTP servers.

 

As doc mentioned, the tsclockserver setting is distributed to all switches in the fabric when the command is executed. This is needed in case the principal switch goes down or the fabric is segmented: the new principal switch can then continue to distribute the current time in band in the fabric.
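
To make the mechanism described in this post easier to picture, here is a toy Python model. It is only a sketch of the behaviour as explained in this thread, not of the FOS time server daemon: the function names, the switch names, the documentation-range IP addresses and the integer "clock" values are all invented, and the 64-second polling interval is left out.

```python
LOCL = "LOCL"


def pick_clock_source(servers, reachable):
    """Return the first configured server that is reachable, else LOCL."""
    for server in servers:
        if server != LOCL and server in reachable:
            return server
    return LOCL


def distribute_time(principal, switches, servers, reachable_from,
                    ntp_time, local_clock):
    """Toy model: only the principal queries NTP, then every switch
    in the fabric receives the principal's time in band."""
    source = pick_clock_source(servers, reachable_from.get(principal, set()))
    # With no reachable server the principal falls back to its own clock.
    now = ntp_time if source != LOCL else local_clock[principal]
    return source, {switch: now for switch in switches}


# Hypothetical fabric: the same server list is stored on every switch,
# but only the two DCX can actually reach an NTP server over the LAN.
switches = ["dcx1", "dcx2", "edge1"]
servers = ["192.0.2.10", "192.0.2.11"]
reachable_from = {"dcx1": {"192.0.2.11"}, "dcx2": {"192.0.2.10"}, "edge1": set()}
local_clock = {"dcx1": 1000, "dcx2": 1003, "edge1": 990}

print(distribute_time("dcx1", switches, servers, reachable_from, 2000, local_clock))
print(distribute_time("edge1", switches, servers, reachable_from, 2000, local_clock))
```

In this model the isolated edge switch still gets NTP-derived time while a DCX is principal. If it ever became principal itself, the whole fabric would follow its local clock, which is the reason for making sure the likely principal switches can reach the servers.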


Any and all information provided by me is not reviewed, approved or endorsed by Brocade and is provided solely as a convenience for Brocade customers. All systems and all networks are different and unique. If you have a service affecting network problem, please open a TAC service request for service through Brocade, or through your OEM equipment provider. If this provided you with a solution to this issue, please mark it with the button at the bottom "Accept as solution"
Occasional Contributor
Posts: 6
Registered: ‎08-24-2016

Re: Problem understanding NTP with logical switches

Hi Martin,

 

Thank you for your equally detailed answer.


Martin.Sjölin wrote:
You can ignore the message about the next reboot; the change will be applied without a reboot. Note that tstimezone is a per-switch property, unlike tsclockserver (which is propagated through the fabric when the command is entered).

Very clear, thank you.


Martin.Sjölin wrote:
The FC switches at the perimeter of your network, which have no network access to the NTP server, will still be updated with the correct time.

Crystal clear, thanks again. I will configure these switches with tsclockserver LOCL and the correct tstimezone.


Martin.Sjölin wrote:
Time is updated as follows: the current principal switch (or FCS switch) in each fabric queries the configured NTP servers in sequence every 64 seconds. The principal switch then distributes the time in band (over FC-CT, the Fibre Channel Common Transport) to all other switches in the fabric. If the principal switch is the default switch, it also updates the drift file / RTC.

So, ensure that your configured principal switches (you have configured one of the central / new switches as a 'backup' principal switch, too) are able to reach the list of NTP servers.


That is what I understood at first, thank you for the explanation.

If I understand your message correctly, and it is what I thought before, only some switches (at least two, a principal and a backup) in EACH network and EACH fabric/context (default logical switch 128 and logical switch 10 here) must be configured with tsclockserver pointing to a valid and reachable NTP server. The others would use LOCL.


doc wrote:
You have Virtual Fabrics configured (logical switch FID 10). Therefore, you must set tsclockserver in each of your logical switches: use the setcontext command to enter that logical switch, then run tsclockserver with EXACTLY THE SAME IP addresses or DNS names as on the fabric principal or FCS switch.

 

Example: tsclockserver "chronos.cru.fr; canon.inria.fr; 192.93.2.20"


I think I misunderstood doc's message (I am not fluent in English), because I thought every switch must be configured with the IP addresses of the NTP servers.


Martin.Sjölin wrote:

As doc mentioned, the tsclockserver setting is distributed to all switches in the fabric when the command is executed. This is needed in case the principal switch goes down or the fabric is segmented: the new principal switch can then continue to distribute the current time in band in the fabric.


So is it a correct solution to give the highest priority (value 0x01) to my "master/backup" switches in each network and in each logical switch/context (128 and 10), along with the NTP server addresses, and 0xFF to the others, along with LOCL?

 

Thank you again.

 

Regards,

Ludo

 

Brocade Moderator
Posts: 121
Registered: ‎03-29-2011

Re: Problem understanding NTP with logical switches

Hi Ludo,

 

Normally we configure all switches with the NTP server(s) via the tsclockserver CLI command. In fact, entering the tsclockserver command distributes the list of NTP servers to all switches in the fabric. That way, in case of segmentation or other issues in the FC network, all switches have a possible clock source. So, please ensure that tsclockserver is set to the NTP list on all switches, even though the NTP servers are only queried from a principal switch and cannot even be reached from the perimeter switches.

 

Quote from the CMD guide:

 

"All switches in the fabric maintain the current clock server IP address in nonvolatile memory. By default, this value is LOCL, that is, the local clock of the Principal or the Primary FCS switch is the default clock server. Changes to the clock server IP addresses on the Principal or Primary FCS switch are propagated to all switches in the fabric."

 

regards


Valued Contributor
Posts: 521
Registered: ‎03-20-2011

Re: Problem understanding NTP with logical switches

I don't see why you would not want to set up the correct tsclockserver and tstimezone on all the switches (I mean all physical and all logical ones) in your fabric(s). It is not difficult, especially if you can use some SSH scripting. This way you make sure that all the clocks are in sync, which may save you a lot of time when troubleshooting.
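
To give an idea of what such scripting could start from, here is a minimal Python sketch that only builds the list of commands to run, one pair per switch and per logical switch. The switch names, FIDs, server addresses and the `build_plan` function are assumptions made for illustration. How the commands are actually delivered (SSH client, entering each context with setcontext, and so on) is deliberately left out.

```python
from itertools import product


def build_plan(switches, fids, ntp_servers, tz_hours, tz_minutes=0):
    """Return (switch, fid, command) tuples covering every logical switch.

    Hypothetical helper: it formats commands only and sends nothing.
    """
    clock_cmd = 'tsclockserver "{}"'.format("; ".join(ntp_servers))
    # Same "hours,minutes" form as the tstimezone usage shown in this thread.
    tz_cmd = "tstimezone {},{}".format(tz_hours, tz_minutes)
    plan = []
    for switch, fid in product(switches, fids):
        plan.append((switch, fid, clock_cmd))
        plan.append((switch, fid, tz_cmd))
    return plan


# Hypothetical inventory: two chassis, default FID 128 plus FID 10.
plan = build_plan(
    switches=["dcx-site1", "dcx-site2"],
    fids=[128, 10],
    ntp_servers=["192.0.2.10", "192.0.2.11"],
    tz_hours=1,
)
for switch, fid, command in plan:
    print("{} FID{}: {}".format(switch, fid, command))
```

Reviewing the printed plan before sending anything is a cheap way to check that no chassis or context was forgotten.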

 

I have seen so many weird SAN issues in my life, and some of them even involved the abandoned default FID 128 having no SAN connections at all. So yes, it is essential that the unused default FID also has a correct clock, not least because it updates the hardware clock of the chassis.

 

If your LAN-isolated switches become SAN-segmented as well, and therefore lose the SAN-distributed clock from the principal switches, they will only show a message like "NTP server is not reachable, using LOCL", but will keep good clock values until the SAN merges back. I hope the hardware clock is not too inaccurate in modern devices.

 

A note about the reboot after setting tstimezone. As you know, FOS is based on Linux, so the TZ variable is set in the environment of all processes. When you change the system default TZ, there is no way to change the TZ of processes that are already running (unless some of them handle a custom signal that makes them reread the configuration or something like that, but I'm not sure this is implemented in any of the FOS processes/daemons). So the only way to switch to the new TZ value is to restart the process. But most FOS processes are not restartable; you might have seen cases where the sudden death of a process is handled by restarting the entire CP. Therefore a restart of the CP is essentially required to switch completely to the new TZ setting after using the tstimezone command. BUT! You don't have to restart the entire switch to restart the CP. On DCX-like switches, you can do a dual hafailover to make sure that both CPs are restarted with the new TZ. On the smaller switches, you can do an hareboot with the same effect.

Brocade Moderator
Posts: 121
Registered: ‎03-29-2011

Re: Problem understanding NTP with logical switches

Hi,

 

I agree with alexey.stepanov that it is better to configure the NTP servers on all switches to ensure that time is synchronized for troubleshooting. The tsclockserver command only needs to be run once in each fabric (the NTP list is distributed to all other switches at that time), but tstimezone must be run on each of them. Note the following from the help for tstimezone:

 

     Time zone is used in computing local time for error reporting
     and logging. An incorrect time zone setup does not affect the
     switch operation in any way.

     System services started during the switch boot reflect a
     time zone change only at the next reboot.

 

For the user, the time zone change is effective immediately:

 

SW6510_1:FID128:admin> date
Thu Aug 25 12:50:59 Localtime 2016

 

SW6510_1:FID128:admin> firmwareshow
Appl     Primary/Secondary Versions
------------------------------------------
FOS      v7.4.1c
         v7.4.1c

 

SW6510_1:FID128:admin> tstimezone
Time Zone Hour Offset: 1
Time Zone Minute Offset: 0

SW6510_1:FID128:admin> fabriclog -s
Time Stamp      Input and *Action                           S, P   Sn,Pn  Port  Xid
===================================================================================
Switch 0; Thu Aug 18 13:51:23 2016 GMT-1 (GMT+1:00)
13:51:23.372464 *Fss Init                                   NA,NA  NA,NA  NA    NA
13:51:23.374389 *Initiate State (max_port=200)              NA,NA  F2,NA  NA    NA
13:51:23.441162 Expd1 0x00000000 00000000 0000ffff ffffffff F2,NA  F2,NA  0     NA
13:51:32.797486 Rcv FSS_RECOV_COLD                          F2,NA  F2,NA  NA    NA
13:51:32.797929 D-port Offline Skip Cnt 1(inst = 1)         F2,NA  F2,NA  NA    NA

 

SW6510_1:FID128:admin> tstimezone 2,0
System Time Zone change will take effect at next reboot

 

SW6510_1:FID128:admin> fabriclog -s
Time Stamp      Input and *Action                           S, P   Sn,Pn  Port  Xid
===================================================================================
Switch 0; Thu Aug 18 14:51:23 2016 GMT-2 (GMT+2:00)
14:51:23.352050 *Fss Init                                   NA,NA  NA,NA  NA    NA
14:51:23.353974 *Initiate State (max_port=200)              NA,NA  F2,NA  NA    NA
14:51:23.420748 Expd1 0x00000000 00000000 0000ffff ffffffff F2,NA  F2,NA  0     NA
14:51:32.777072 Rcv FSS_RECOV_COLD                          F2,NA  F2,NA  NA    NA
14:51:32.777515 D-port Offline Skip Cnt 1(inst = 1)         F2,NA  F2,NA  NA    NA

 

Interestingly, when checking the running processes (/proc/<pid>/environ), only webtools / apache had TZ set in their environment on this 7.4.1c switch.
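
The one-hour shift between the two fabriclog listings above is what you would expect if the stored instant stays the same and only the display offset changes. A minimal Python sketch of that idea follows, assuming the first log entry corresponds to 12:51:23 UTC (derived from 13:51:23 at an offset of +1 hour). The `render` function is invented for this example and says nothing about how FOS formats its logs internally.

```python
from datetime import datetime, timedelta, timezone


def render(instant_utc, hours, minutes=0):
    """Format a fixed UTC instant using an hour/minute zone offset."""
    zone = timezone(timedelta(hours=hours, minutes=minutes))
    return instant_utc.astimezone(zone).strftime("%Y-%m-%d %H:%M:%S")


# Assumed UTC instant of the first fabriclog entry shown above.
first_entry = datetime(2016, 8, 18, 12, 51, 23, tzinfo=timezone.utc)

print(render(first_entry, 1))  # display offset before the tstimezone change
print(render(first_entry, 2))  # display offset after "tstimezone 2,0"
```

Only the rendering changes; the underlying instant is untouched, which is consistent with the help text saying that a wrong time zone does not affect switch operation.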


Occasional Contributor
Posts: 6
Registered: ‎08-24-2016

Re: Problem understanding NTP with logical switches

Hi Martin,


Martin.Sjölin wrote:

Normally we configure all switches with the NTP server(s) via the tsclockserver CLI command. In fact, entering the tsclockserver command distributes the list of NTP servers to all switches in the fabric. That way, in case of segmentation or other issues in the FC network, all switches have a possible clock source. So, please ensure that tsclockserver is set to the NTP list on all switches, even though the NTP servers are only queried from a principal switch and cannot even be reached from the perimeter switches.

 

Quote from the CMD guide:

 

"All switches in the fabric maintain the current clock server IP address in nonvolatile memory. By default, this value is LOCL, that is, the local clock of the Principal or the Primary FCS switch is the default clock server. Changes to the clock server IP addresses on the Principal or Primary FCS switch are propagated to all switches in the fabric."

 

regards


OK, thank you. So in the end: tsclockserver with my NTP server IP addresses, plus tstimezone, on all switches and all logical switches, even when some switches can't reach the servers.

Clear and simple, good!

 

 

Hi Alexey,


alexey.stepanov wrote:

I don't see why you would not want to set up the correct tsclockserver and tstimezone on all the switches (I mean all physical and all logical ones) in your fabric(s). It is not difficult, especially if you can use some SSH scripting. This way you make sure that all the clocks are in sync, which may save you a lot of time when troubleshooting.


I never said I would not want to set up tsclockserver and tstimezone on all switches. I am used to the FOS CLI and SSH.

I said the documentation is not clear about the tsclockserver configuration.

 

It says that a (principal) switch updates its date and time from the NTP server, but not how the other switches must be configured.

Logically (for me), a switch with an NTP server IP address would update its date and time by itself, and from the principal switch in the fabric when configured otherwise (LOCL). But it appears it does not work like that.

Likewise, it seems logical to me not to configure an NTP server IP address on equipment that can't reach this server. But again, it appears it does not work like that.

And I need to understand these points to know how to configure priorities so as to limit the IP addresses that need access to the NTP servers.

So my questions here are not about what I want or don't want to do, but about how it works, so that I can configure every device and all logical switches properly.


alexey.stepanov wrote:

I have seen so many weird SAN issues in my life, and some of them even involved the abandoned default FID 128 having no SAN connections at all. So yes, it is essential that the unused default FID also has a correct clock, not least because it updates the hardware clock of the chassis.

 

If your LAN-isolated switches become SAN-segmented as well, and therefore lose the SAN-distributed clock from the principal switches, they will only show a message like "NTP server is not reachable, using LOCL", but will keep good clock values until the SAN merges back. I hope the hardware clock is not too inaccurate in modern devices.

 

A note about the reboot after setting tstimezone. As you know, FOS is based on Linux, so the TZ variable is set in the environment of all processes. When you change the system default TZ, there is no way to change the TZ of processes that are already running (unless some of them handle a custom signal that makes them reread the configuration or something like that, but I'm not sure this is implemented in any of the FOS processes/daemons). So the only way to switch to the new TZ value is to restart the process. But most FOS processes are not restartable; you might have seen cases where the sudden death of a process is handled by restarting the entire CP. Therefore a restart of the CP is essentially required to switch completely to the new TZ setting after using the tstimezone command. BUT! You don't have to restart the entire switch to restart the CP. On DCX-like switches, you can do a dual hafailover to make sure that both CPs are restarted with the new TZ. On the smaller switches, you can do an hareboot with the same effect.


Thank you very much for all these really interesting and useful explanations!

 

I agree with you about the correct clock on the default FID; it is also why I am here trying to understand how it works on Brocade switches. I want to minimize the risk of "weird issues"...

 

Yes, I have a lot of these "NTP server is not reachable, using LOCL" messages on the LAN-isolated switches. That is why I would like to use LOCL, to get rid of them in the logs. Knowing they will never have access to these NTP server addresses (if someday they do have access to an NTP server, it will be another one), what is the benefit of configuring an unreachable NTP server address instead of LOCL? Does LOCL not allow the switch to synchronize its date and time with the principal switch over Fibre Channel?

 

Thanks again for your tstimezone explanation. I learned really useful information again. I know a little about CP behaviour on DCX, but was completely unaware of hareboot.

 

Regards.

Occasional Contributor
Posts: 6
Registered: ‎08-24-2016

Re: Problem understanding NTP with logical switches

Thank you Martin.

 

I think I missed something that you and other contributors said (and that is in the documentation). The tsclockserver setting is propagated through the fabric, so whether I configure LOCL or an NTP server IP address, the configuration will be the same on all switches in the fabric.

That was my mistake; I understand your messages better now.

 

Thank you again.

 

Regards.
