05-12-2016 11:59 AM
OK, to carbon date myself, this question actually started when we were a McData shop. To preface my question: I am not talking about pushing a port over a long distance; I fully understand that we need to adjust buffer credits based on the distance, link speed, and frame size.
Way back when we had McData directors, they had enough intelligence to automatically set the buffer credits to 64 on an F-port if a storage device was detected. The explanation was that this supported a fan-in ratio of 8:1: we could have 8 host connections, each with its buffer credits set to 8, all talking to a single storage port with its buffer credits set to 64. The thought was that the storage port should never run out of credits, so data flow would not be interrupted on the storage side. This appeared to work well; we never had any indications of buffer-credit-zero problems or congestion on storage ports.
OK, time moves on and we replaced our McData hardware with Brocade DCXs, which by default set all F-ports to 8 credits. Continuing the successful practice of the past, we bumped the buffer credits up to 64 on all of our storage ports, and again this appears to be working well. To validate our thinking, we compared the buffer-credit-zero numbers for the same storage port with the buffer credits set to the default value of 8 and then to the elevated value of 64. The higher value dramatically reduced the buffer-credit-zero counts, which one would assume means the data is flowing much better.
Using the portbuffershow command in the CLI, we can clearly see that the storage ports are taking advantage of the elevated buffers, peaking at around 40 buffers in use. Repeating the portbuffershow command shows that the number of buffers actually in use is dynamic and changes with the I/O load. Again, keeping the fan-in ratio around 8:1, things appeared to be working very well, and buffer-credit-zero counts were greatly reduced.
The only purpose of buffer credits is flow control. Flow control applies to every Fibre Channel connection regardless of distance, whether across town or across the computer room. The less we interrupt the flow (buffer credit zero), the better the performance. And if the buffers are available, why not use them?
Due to a company merger our SAN support team has grown, and some of the new members of the team don't agree with the idea of increasing buffer credits on storage ports; they believe the credits should be kept at the default value of 8. Their way of thinking is that the only time one would adjust buffer credits is for a port spanning a long distance. So with that in mind, I would like to get the thoughts of other SAN administrators. Should we leave the buffer credits at the defaults the hardware shipped with, or does it make sense to continue the "tuned" approach that we have seen work well in the past?
05-17-2016 06:55 AM
By default, a Brocade SAN switch does indeed reserve 8 credits for each F-Port. However, a device can request any number of credits when it sends its FLOGI, and the switch will always comply with the request and assign the requested credits.
Per the Brocade Command Reference:
"Use this command to display the current long distance buffer information for the ports in a port group."
Meaning 'portbuffershow' is only intended for long distance ports.
To see the number of credits actually assigned to a non-long-distance port, look at the 'fe' type line in the output of: portloginshow
admin> portloginshow 3
Type PID World Wide Name credit df_sz cos
fe 3a0300 02:9f:00:11:0d:00:00:00 16 2048 c scr=0x3
ff 3a0300 02:9f:00:11:0d:00:00:00 12 2048 c d_id=FFFFFC
This device requested 16 credits when it logged in and the switch granted them, but 'portbuffershow' will still display '8', because port 3 is not a long distance port.
Regarding buffer-credit-zero counts (tim_txcrd_z):
The tim_txcrd_z counter can be somewhat misleading, as it does not measure whether there was any impact associated with the out-of-buffer condition; only that the port was being well utilized and was out of free buffers at the moment the poll came in. It has a polling cycle of 2.5 microseconds (400k polls/sec) and will increment if there are no free buffers in the pool at the moment the poll arrives. This is quite different from a port being out of buffers when it actually needs one to start transmitting a new frame.
Frames are transmitted serially out of a port, and a port can only actively transmit the contents of one frame at a time. If a port is busy transmitting the contents of its last buffer when the tim_txcrd_z poll comes in, the counter will increment, because at that moment there were no free buffers. However, provided a previously transmitted buffer is freed up before the current frame transmission completes, there is no impact, and the fact that the tim_txcrd_z counter incremented is only saying the port is being well utilized.
At 8Gb/sec it takes 2.148 usecs to transmit a 2148-byte FC frame (17,184 bits/frame divided by 8,000,000,000 bits/sec).
The tim_txcrd_z poll comes in every 2.5 usecs, independently of FC frame timing. Since the poll clock and FC frame transmit times are not synchronized, polling the same frame transmission twice will occur frequently.
With 4Gb ports and their slower transmission rate, double polling becomes the norm, and it gets worse at 2Gb, and so on.
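A quick sketch of the serialization arithmetic above, comparing frame transmit time against the 2.5 usec polling interval (the 2148-byte frame size and polling interval are the figures quoted in this post):

```python
FRAME_BYTES = 2148   # full-size FC frame, as used in the post
POLL_US = 2.5        # tim_txcrd_z polling interval in microseconds

def serialization_us(rate_gbps: float, frame_bytes: int = FRAME_BYTES) -> float:
    """Time to clock one frame onto the wire at the given data rate."""
    bits = frame_bytes * 8
    return bits / (rate_gbps * 1000)   # Gb/s -> bits per microsecond

for rate in (8, 4, 2):
    t = serialization_us(rate)
    # A frame whose transmit time exceeds one polling interval can be
    # "caught" by more than one poll while still being transmitted.
    print(f"{rate} Gb/s: {t:.3f} us per frame, "
          f"~{t / POLL_US:.1f} polling intervals")
```

At 8 Gb/s one frame fits inside a single polling interval (2.148 us < 2.5 us), while at 4 Gb/s and below a single frame spans more than one poll, which is why double polling becomes the norm on slower ports.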
The tim_txcrd_z counter is useful, but it only tells part of the story. If there were actual impact due to an out-of-buffer condition, we would see frame discards, link timeouts, and so on.
When you increase the buffer allocation on an F-Port and see the tim_txcrd_z counter incrementing less, it's likely because the attached device will only use the number of credits it requested during FLOGI. There might be more credits manually allotted on the switch's F-Port, but the device's N-Port does not know this, so it will not use them. The device has its own default buffer credit count, and this does not change just because the switch side has more.
Additionally, at 8Gb/sec, one FC frame of 2148 bytes occupies 0.6444 km of cable from starting bit to ending bit. Even with the default of 8 buffer credits, provided the cable length is less than 5.1 km, you would never have all 8 buffers outstanding, since the switch cannot fit that many frames on the wire end-to-end at the same time.
At 4Gb/sec, a frame occupies 1.2888 km, so even fewer frames fit on a cable of the same length.
05-17-2016 01:46 PM - edited 05-17-2016 01:49 PM
Hi T** (I assume it's you),
we had some cases together over the years and I highly value your opinion (and generosity, when I think about the home-brewed :-) ), and I have learned from you, but there are some statements in your post I can't agree with.
There is a difference between buffers and credits. Each port has buffers: fast memory for storing frames, one frame per buffer. For some ports (for example switch ports) it's possible to change their number by assigning more buffers from a pool of "memory slots" to that port. For most device ports it's not possible.
In the FLOGI, the device port tells the switch its number of buffers. The switch ASIC has a register for each port storing the number of buffers of the attached device port. That value is called the credits (or buffer credits, or buffer-to-buffer credits). It's just a number, or rather a counter, starting at the number of buffers the device port has in hardware.
In the FLOGI accept, the switch port tells the device port its own number of buffers. The device port then sets its credit counter for the switch port to that amount.
Just an example:
A storage device has 100 buffers per port. It will tell the switch it has 100 buffers. The switch will put 100 into the credit counter, and you will see that in portloginshow (and portregshow). The switch is then able to send 100 frames to the storage device without needing an R_RDY to come back.
As a standard switch port, the FLOGI accept sent back to the storage device will contain the information that the switch has 8 buffers for the storage device's frames. These 8 buffers can be seen in portbuffershow. The device then knows it can send 8 frames to the switch without waiting for an R_RDY.
The two buffer counts don't need to be the same.
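The asymmetric credit exchange in the example above can be sketched as two independent counters, one per direction, each initialized to the OTHER side's advertised buffer count (a minimal model, not real switch firmware):

```python
class PortCredits:
    """One direction of buffer-to-buffer flow control."""

    def __init__(self, peer_buffers: int):
        # Counter starts at the peer's advertised buffer count (from FLOGI).
        self.credits = peer_buffers

    def send_frame(self) -> bool:
        if self.credits == 0:
            return False          # out of credit: must wait for an R_RDY
        self.credits -= 1         # one peer buffer is now occupied
        return True

    def receive_r_rdy(self) -> None:
        self.credits += 1         # peer freed one of its buffers

# FLOGI: storage advertises 100 buffers; switch F-port advertises 8.
switch_to_storage = PortCredits(peer_buffers=100)
storage_to_switch = PortCredits(peer_buffers=8)

# The switch can burst 100 frames toward the storage port...
assert all(switch_to_storage.send_frame() for _ in range(100))
assert not switch_to_storage.send_frame()   # ...then it must wait

# ...while the storage device may only have 8 frames outstanding.
assert all(storage_to_switch.send_frame() for _ in range(8))
assert not storage_to_switch.send_frame()
```

Note that each counter costs the switch nothing but a register, which is why it can happily accept any buffer count a device advertises.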
So in your example, the device requested 16 credits (or rather advertised its 16 buffers). The switch will of course use them if R_RDYs are not coming back before the credit counter reaches 0.
If you increase the buffers (!) of a switch port using portcfgfportbuffers, the switch port will have more "memory slots" assigned to it, and this will surely be reflected in portbuffershow, without the port needing to be a long-distance port.
This will also increase the credit counter in the device's HBA.
A switch can easily agree to any number of buffers advertised by a device port, because for the switch port it's just a counter; it does not consume resources on the switch side. If someone were to build an HBA with 500,000 buffers, the switch would assign 500,000 credits to the F-Port. Credits, not buffers!
Following your explanation, the whole field of statistics would be nonsense. Yes, there is a fixed sample interval, and without using rounded numbers (like 8,000,000,000 bits/sec) it's even closer to 2.5 µs for the transmission of a full-sized frame. On the other hand, the average frame size is usually much smaller than 2k; a lot of smaller frames are sent as well. So yes, it could be that the credit counter dropped to 1 or 2 and back to 0 between two samples. But we are talking about very big numbers here, so we can use statistical methods to approximate the time spent without credits. It's just not probable that you have bad luck all the time, every time.
In addition, tim_txcrd_z will only be increased if BOTH (!) conditions are true:
- the credit counter is 0
- there is at least 1 frame waiting to be sent out.
So for every tim_txcrd_z increment there actually was a frame waiting, even if only for a very short time.
Given that frame discards (after the hold time) or even link timeouts (E_D_TOV?) take ages in Fibre Channel terms, you can of course have a performance problem long before you see any of those.
"When you increase the buffer allocation on an F-Port and see the tim_txcrd_z counter incrementing less, it's likely because the attached device will only use the number of credits it requested during FLOGI. There might be more credits manually allotted on the switch's F-Port, but the device's N-Port does not know this, so it will not use them. The device has its own default buffer credit count, and this does not change just because the switch side has more."
In my eyes this is clearly wrong.
1) The device will send frames according to the buffer information (and therefore according to the credits) it got from the FLOGI accept.
2) If you give a switch's F-port more buffers, the device will know that from the FLOGI accept and it will use them.
3) Your first sentence ("When...") doesn't even make sense. It basically says: if you change A and then X improves, it's likely because A didn't change. ??!?
I didn't recalculate the length of a full frame at 8G; I assume yours is correct. But:
1) You would need twice the buffer credits to span the 5.1 km, because the R_RDYs have to travel the same distance back to you while you still need credit to keep sending frames.
2) The average frame size is usually much smaller, and even a 150-byte frame occupies a full frame buffer while being much shorter on the link.
3) There is additional time needed (to process and forward the frame) before the R_RDY for an incoming frame is sent out. For example, if the switch wants to send out a frame itself, it will do that first, before caring about sending R_RDYs back. (That's the reason there is a tim_rdy_pri counter, for situations when the R_RDYs wait a long time and have to be prioritized in order to relieve credit starvation on the other side.)
Taking all this into consideration, 8 buffers is okay for cable lengths in the low three digits of meters. But for longer ones I would always recommend using portcfgfportbuffers.
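The points above can be rolled into one rough rule of thumb: a credit is tied up for the frame's one-way flight time PLUS the R_RDY's return trip, so sustaining line rate needs roughly round-trip time divided by frame serialization time in credits. A sketch under stated assumptions (~5 us/km propagation in fiber, full-size frames only, switch processing delay ignored):

```python
import math

def credits_for_distance(km: float, rate_gbps: float,
                         frame_bytes: int = 2148,
                         prop_us_per_km: float = 5.0) -> int:
    """Rough credits needed to keep the link busy over a given distance.

    Assumes full-size frames and ignores the R_RDY processing delay,
    both of which push the real requirement higher.
    """
    serialization_us = frame_bytes * 8 / (rate_gbps * 1000)
    round_trip_us = 2 * km * prop_us_per_km     # frame out + R_RDY back
    return max(1, math.ceil(round_trip_us / serialization_us))

print(credits_for_distance(0.1, 8))   # computer-room link: well under 8
print(credits_for_distance(10, 8))    # metro-distance link: far over 8
```

This is consistent with the conclusion above: the default of 8 is comfortable for links measured in hundreds of meters, while multi-kilometer links need portcfgfportbuffers (or long-distance modes).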
For the original post: it's an interesting question whether too many buffers could be a problem. I could imagine that it's possible to create "imbalances" in the presence of slow-drain devices, or bottlenecks due to sub-optimal fan-in/fan-out ratios. But I haven't thought that through to the end so far and would rather see the opinions of others here in the forum.
05-20-2016 12:52 AM - edited 05-20-2016 12:55 AM
I don't think you need many more buffers than your physical optical link can hold. This is based on the link length AND the frame size, and I think some people tend to underestimate the second part of the equation. A typical Oracle write is 8 KB, so it takes 4 full-size frames, but in addition it also takes 3 small frames with SCSI commands and responses. If I'm not mistaken, the MS SQL block size is 4 KB, so the ratio would be even worse. So yes, I'd say that "extra" buffers are always good to have, to accommodate the smaller frames, but not because of "fan-in/fan-out" stuff.
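The frames-per-I/O arithmetic above can be sketched as follows (the 2048-byte payload per full frame and the 3 small command/transfer-ready/response frames per write are the assumptions taken from the post):

```python
MAX_PAYLOAD = 2048   # bytes of data payload per full-size FC frame

def frames_for_write(io_bytes: int, overhead_frames: int = 3) -> tuple[int, int]:
    """Return (data_frames, total_frames) for one SCSI write.

    overhead_frames models the small command/XFER_RDY/response frames,
    each of which still occupies a full receive buffer.
    """
    data_frames = -(-io_bytes // MAX_PAYLOAD)   # ceiling division
    return data_frames, data_frames + overhead_frames

print(frames_for_write(8192))   # Oracle-style 8 KB write -> (4, 7)
print(frames_for_write(4096))   # 4 KB write -> (2, 5): worse overhead ratio
```

Since every small frame consumes a whole buffer despite occupying far less of the wire, the buffer count needed per I/O is higher than the raw byte count suggests.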
07-07-2016 05:26 PM
Seb is right. Remember that credit notification (not negotiation) is flow-dependent and uni-directional. Increasing the number of buffers on an F-Port simply allows the attached device to SEND more frames without waiting for an R_RDY primitive, not the other way around. The Condor (1/2/3) and GoldenEye ASICs never had "send buffers", so all frames are stored on the ingress side of a switch.
If you have a lot of READ traffic in a bursty pattern, increasing the number of buffer credits may reduce some latency, as you have some "wiggle room" between 8 and 64 in your case. That's the reason the McDatas in the "old days" reserved 64 buffers: to cope with the lack of available credits given the relatively limited real estate in the HBAs back then. Today's HBAs are far more capable of offloading frames into the OS, especially with the latest generations of PCIe and NVMe-capable devices.
I wrote a number of articles related to buffers and credits over here.