The mainframe blogs that I have written in the past have typically been about technology and how our customers can better use their FICON capabilities. But on this occasion I am going to provide a more personal blog about my own 45 year journey through this historical era of the Mainframe and what being a Mainframe Practitioner has meant to me. The reason that you might want to read about this journey is that you will discover that I had to reinvent myself a number of times during all of these years. I made changes to my career in order to continue to challenge myself professionally and to continue to provide value to my employers. Change is not bad. It is usually uncomfortable being taken out of one’s comfort zone but the end result is a better, more talented and valuable you. If you can embrace change, as I have, then your experiences in computer technology and, in particular, the mainframe world, will be absolutely amazing. Also in this reading, for those of you with some decades of experience, you’ll relate to the way things were at various times in our industry – it is kind of fun to remember it all.
April 7 is a momentous occasion in the history of computing. It marks the 50th anniversary of IBM's announcement of the System/360 and 50 years of IBM mainframe computing. Brocade has been right there in partnership with IBM for over 25 of those fifty years, introducing innovations in I/O and storage networking that have helped change the world.
The 30 largest banks in the world run their mission critical mainframe business workloads on a Brocade FICON SAN infrastructure. Why does the world's financial system run on Brocade FICON SANs?
When working in any industry, there are many technical names, terms and acronyms that are used constantly. For those of us in IT one of those familiar terms is “firmware”. As in most professional organizations, I have found that when firmware is being discussed the conversation is about “what it does” rather than “what it is”. We often take for granted that, since firmware just seems to be part of today’s intelligent devices, everyone knows exactly what it is. However, I suspect that is not always the case, so read on to find out more about “what it is” with regard to switched Fibre Channel protocols.
Is SDN really a new idea, or has the mainframe already been doing this in some form for many years?
My last blog posting discussed the 23 July 2013 set of IBM System z announcements, and how those announcements encourage channel consolidation by IBM System z customers. One of the items I briefly covered was Dynamic Channel Path Management (DCM) for FICON. This blog post will discuss DCM in more detail, and make the case why you should seriously consider implementing DCM.
DCM support for native FICON channels was originally introduced in z/OS V1.11 with support for a single, intermediate FICON switching device between the FICON channel and storage control units. The recent IBM z/OS V2.1 announcement significantly enhanced DCM for FICON: it now supports FICON channel path connections through two intermediate FICON switching devices. In other words, with z/OS V2.1, DCM for FICON supports cascaded FICON configurations.
This really is an important step in the evolution of System z I/O architectures. With DCM for FICON now supporting cascaded configurations, it will be much easier to use a smaller number of channels (channel consolidation) and optical fiber connections for FICON I/O, particularly for multi-site installations that rely on cascaded FICON. Remember, z/OS V2.1 also introduced support for up to 24K subchannels per FICON channel...
On 23 July IBM made a series of hardware, operating system and software announcements for the zEnterprise platforms. IBM also announced the eagerly awaited new business class zEnterprise with the zBC12 announcement. There were several parts of each announcement that pertained to I/O and channel technology. One common denominator of each of these parts was that they 1) encourage a movement towards consolidation of channels on the host, and 2) encourage avoiding direct attached FICON channels and adopting switched FICON architectures.
The first channel subsystem enhancement IBM announced was very big news: increased addressing with up to 24k subchannels per channel (port) for the FICON Express features. To help facilitate growth as well as continuing to enable server consolidation, IBM now supports up to 24k subchannels per FICON Express channel (channel path identifier - CHPID). End users will now be able to define more devices per FICON channel, which includes primary, secondary, and alias devices. The maximum number of subchannels across all device types addressable within an LPAR remains at 63.75k for subchannel set 0 and 64k-1 for subchannel sets 1 and higher. This support is exclusive to the zEC12 and the zBC12 and applies to the FICON Express8S, FICON Express8, and FICON Express4 features when defined as CHPID type FC. This is supported by z/OS, z/VM, and Linux on System z...
I recently returned from a week long business trip to South Africa. I typically go to South Africa every 12-15 months to meet with our OEM partners, and our FICON/mainframe customers. Brocade has a great two man team in South Africa with Nick Pateman and Carlos Isidro. On this most recent trip I met with several FICON customers as well as IBM, HDS, and EMC. Our meetings consisted of a general Brocade FICON update: how our new technology enhancements with FOS 7.0 and 7.1 work, a product roadmap discussion, and usually some more technical Q&A. I will be going back in late November to teach our FICON architect certification course.
One of the things I have noticed since my first trip to South Africa in early 2008 is the growth in the IT industry. The three OEM partners I work with most often (IBM, HDS, EMC) are all experiencing tremendous growth in their business. The same is true for Brocade. This growth from our OEMs is not just in South Africa; it is across the entire African continent. They already have, or are opening, offices and facilities in Kenya, Morocco, Namibia, Nigeria, Tanzania, Senegal, and Cameroon. IBM is even apparently opening a mainframe skills center in Johannesburg to cover the continent and train new mainframers. Africa is the next great growth market for the IT industry, and with that it is the next great growth market for mainframes, mainframe attached storage, and SAN. It could even be thought of as the last frontier of IT. And the growth is being driven from South Africa.
Here are some interesting facts:
1) Over the past decade, Africa’s real GDP grew by 4.7% a year, on average—twice the pace of its growth in the 1980s and 1990s.
2) By 2009, Africa’s collective GDP of $1.6 trillion was roughly equal to Brazil’s or Russia’s.
3) Telecom companies in Africa have added 316 million subscribers—more than the entire U.S. population—since 2000.
4) According to UN data, Africa offers a higher return on investment than any other emerging market.
How important is Africa to IBM? According to an article in the February 16, 2013 issue of The Economist:
1) Since mid-2011 it has set up shop in Angola, Mauritius and Tanzania, as well as Senegal.
2) IBM now boasts a presence in more than 20 of Africa’s 54 countries.
3) Last August IBM opened a research lab in Nairobi, one of only 12 in the world.
4) And between February 5th and 7th 2013, Ginni Rometty, IBM chief executive, and all who report directly to her met dozens of African customers, actual and prospective, in Johannesburg and the Kenyan capital. It was, Mrs Rometty said, the first time the whole top brass had assembled outside New York since she became the boss just over a year ago.
According to a February 21, 2013 Bloomberg Business Week article titled "For IBM, Africa Is Risky and Rife With Opportunity":
"IBM’s global revenue dipped 2.3 percent, to $104.5 billion, in 2012, about the same level it was in 2008. Of that, sales out of Africa kicked in about $400 million and are forecast to more than double and surpass $1 billion in 2015... That’s faster growth than IBM saw in India, where it started a push in 1992 and surpassed $1 billion in revenue in 2007, the person said."
Here is a slide from a recent IBM Investor Briefing:
Over the past three years, IBM has gained new mainframe customers in Senegal, Cameroon, Ethiopia, Nigeria, and Namibia. Financial institutions in Africa are rapidly realizing what the rest of the world has known since April 1964: the mainframe is the best platform to run mission critical business applications.
Africa truly is the next great frontier for IBM mainframes, as well as mainframe storage and its connectivity. And, it's being driven out of South Africa. Brocade is right there.
I look forward to future trips to South Africa, and meeting many of the new mainframers in other African nations.
Greetings from Munich, Germany. I am here this week at the EMEA IBM System z Technical University. This event is IBM's primary System z technical conference in Europe each year. Brocade is proud to be the Gold Sponsor at this year's event. We have a prominent booth, are giving three presentations, and have several people at the event including yours truly.
Why is this important? It's just the latest example of the deep commitment Brocade has to IBM System z, and our mutual mainframe customers.
Brocade's CEO, Lloyd Carney recently had a Q&A session/interview with Enterprise Executive Magazine. The interview appears as the cover story in the May/June issue of Enterprise Executive. The article had three really important points that I felt needed to be shared and highlighted. They sum up our experience, and commitment to IBM System z and our mutual customers very well.
Which brings me back to the Munich IBM System z Technical University. Brocade recognized the importance of this event. It's the primary IBM System z conference in Europe in 2013. We're supporting IBM and the System z platform as a Gold Sponsor. We're meeting with customers, giving presentations, offering advice, and more. We view this as very important. When we talk the talk and say System z is important to Brocade, we walk the walk, so to speak. Actions speak more loudly than words. For another European example, Brocade is a member of Guide Share Europe.
How important is IBM System z and its customers to Cisco? I expected a huge Cisco presence here given their recent new product announcement with their MDS 9710 directors.
How important is IBM System z and its customers to Cisco? I wanted to ask someone here from Cisco, but unfortunately that was not possible. You see, IBM System z and its customers are so important to Cisco, that Cisco did not show up here. Yes, that is right, Cisco did not show up at the most important IBM System z Conference in Europe this year. No booth, no presentations, not even any attendees here to learn. I almost felt like I was involved in a Where's Waldo episode entitled "Where's Cisco"?
Let me close this post by contrasting that absence with one of Lloyd's quotes:
"Brocade is proud that our FICON directors account for more than 80 percent of the installed storage area networking infrastructure in these global enterprises. They’re our most important and valued customers, and it’s an honor that we’re trusted for their mission-critical infrastructure."
Auf Wiedersehen from Munich!
So here I am at the end of my last day at this year’s EMC World conference. Yes, EMC World 2013 goes on for another three quarters of a day tomorrow but I fly back home to Atlanta tomorrow morning. My brain is full, my legs are wobbly from walking between meeting rooms and the Solutions Pavilion, and my nether regions ache from sitting in conference chairs – you know what I mean, you have been there yourselves!
Yesterday, as my blog pointed out, began on a sad note for me. In contrast, I had a really good day and wonderful experience with an attendee at the conference today. I met this gentleman on the first day of the conference and, as I related in my first blog, we set up a schedule at that point for a meeting to be held today. That meeting took place and, once again, my very good friend and exceptional professional AJ Casamento led the discussion. This business professional, Blair, is a key officer in a company that is a premier systems integrator, located in California, that provides quality business automation, computer network and security solutions, remote and on-site support, and cost-effective IT outsourcing for small to mid-sized businesses. They do business on several continents...
Hello from day 2 at EMC World!
I had a sad experience right away this morning. On the elevator on the way down to the conference area I ran into a guy from a very well-known company in California. We were the only people on the elevator and both of us were wearing EMC World badges. I introduced myself and then we shook hands. I mentioned that I was one of millions of people who had connected to their website. He said thanks and then said “but it isn’t hosted on Brocade switching.” I asked him why and he said that Brocade’s presence was few and far between, from his perspective, at his company location and that our competitor was there all the time. Then they were offered a deal that was hard to refuse and, as far as he knew, Brocade was never in the hunt for the business. Out of sight, out of mind I suspect. Sad, sad, sad.
By nature I am a hunter. Most of you probably know what that means. Marketing and sales are often broken up into personality groups known as hunters and farmers. The breakdown is that there are people good at finding new opportunities (hunters) and people good at nurturing current relationships (farmers). Of the two, I am a hunter. And when hunters hear that an opportunity has gotten away from us, we are always sad. It is one thing to lose an opportunity in a fair competition – but to lose because we never showed up to the tournament – that is hard. No person and no company can be everywhere all of the time. And when there are conflicts it is critical to use resources to target the more important and valuable of those opportunities even at the expense of the others. But that does not make it any easier to lose an opportunity. This was not a mainframe opportunity, so I was never involved and would not have become involved, except for my accidental meeting with this gentleman on an elevator. I have already spoken to one of my upper management people about this and I will speak with others today. Lost is lost but we need to reflect on the reason why we lost and see if there are things we can do going forward to avoid as many situations like this as possible. In my mind all Brocadians must take responsibility for finding and nurturing prospects and customers and making sure that, ultimately, they feel like a part of the Brocade family. That is just one of the many ways in which we excel as a company and become trusted advisers to our customers and partners. I guess I am just on my soap box because I am competitive by nature and hate to lose so much. Enough said.
Just thought I'd post a little about my experiences at EMC World 2013 being held in Las Vegas, Nevada this week at the Venetian Hotel. Today was day 1 of the conference and the halls were crowded with technology leaders and users from all over the world. I do not know the actual numbers but I heard that as many as 15,000 participants are here to learn and network and visit the Expo to see and touch the latest computer technology in the world.
I am here as a member of the Brocade contingent attending and supporting this conference. I was invited to attend as a subject matter expert (SME) in mainframes and FICON and, in particular, to meet and work with that specialty area of our customer set. Of course, I am available to any customer that I can assist, but I really do like working with our mainframe customers.
As a value-add to this conference, Brocade is hosting meetings between customers and Brocade management and leading Brocade technologists. It is a very well-orchestrated effort and all of the main participants are running around with a report that tells them what customer they will be meeting with and what the purpose of the meeting is all about. It is all about getting the right people in front of a customer to discuss their needs and requirements and begin a process that will lead to a mutually beneficial outcome...
This blog post is based on an article on the same subject I recently wrote for Enterprise Tech Journal Magazine. The article appeared in slightly different format in the March/April 2013 issue.
The introduction of the FICON I/O protocol to the mainframe I/O subsystem provided the ability to process data rapidly and efficiently. As a result of two main changes that FICON made to the mainframe channel I/O infrastructure, the requirements for a new Resource Measurement Facility (RMF) record came into being. The first change was that unlike ESCON, FICON uses buffer credits to account for packet delivery and provide a data flow control mechanism. The second change was the introduction of FICON cascading, and the long distance capabilities it introduced, which was not as practical with ESCON.
Similar to the ESCON directors that preceded them, FICON directors and switches have a feature called Control Unit Port (CUP). Among the many functions of the CUP feature is an ability to provide host control functions such as blocking and unblocking ports, safe switching, and in-band host communication functions such as port monitoring and error reporting. Enabling CUP on FICON directors, while also enabling RMF 74 subtype 7 (RMF 74-7) records for your z/OS system, yields a new RMF report called the “FICON Director Activity Report”. Data is collected for each RMF interval if FCD is specified in your ERBRMFnn parmlib member and the IECIOSnn member in SYS1.PARMLIB specifies FICON STATS=YES. RMF will format one of these reports per interval for each FICON director that has CUP enabled and the parmlib options specified. If you are using FICON Virtual Fabrics and have your FICON directors partitioned into multiple, smaller logical switches, you will have one RMF 74-7 report per switch address (per logical switch). The report is produced on the interval that is set for RMF, along with the other RMF reports. In essence, it captures a snapshot of the data and counters over a time interval, such as 20 minutes. When troubleshooting, you often need to run these reports more than once, and with different interval lengths, to determine if there is a trend.
This RMF report is often overlooked but contains very meaningful data concerning FICON I/O performance, in particular frame pacing delay. Frame pacing delay has been around since fibre channel SAN was first implemented in the late 1990s by our open systems friends. But until the increased use of cascaded FICON, its relevance in the mainframe space has been completely overlooked.
The article “Performance Troubleshooting Using the RMF Device Activity Report” (Oct/Nov 2012 Enterprise Tech Journal) continued a series of articles I have been writing on System z I/O performance. As a quick review, that article assumed there was an application Service Level Agreement (SLA)/Service Level Objective (SLO) for transaction response time that wasn't being met. It then went through one of the key RMF reports used in mainframe I/O performance management/troubleshooting: SMF 74-1, the RMF Direct Access Device Activity report. This report contains response time information and information on the various components of response time. It can be used to further narrow down what may be the root cause of the problem and provide a good idea of what other RMF reports we should check. This article continues where that discussion left off, and will examine the RMF 74-7 record, the RMF FICON Director Activity Report.
Figure 1 illustrates our environment and where the various I/O related RMF reports fit. Let us assume that our review and analysis of the RMF Device Activity Report showed that the specific components of response time that are outside of our normal parameters are PEND and CONN. We wish to determine if our FICON SAN could be the component of our infrastructure causing PEND and/or CONN to be abnormally high.
Figure 2 shows an example of a FICON Director Activity Report. It should be noted that this report is vendor agnostic, meaning that it will contain the same fields and information regardless of which FICON director vendor manufactured the director. It also is the same for any model director/switch from a given vendor.
The fields of critical interest contained in the report are:
Other data in the report include:
For additional detailed information, please review Resource Measurement Facility Report Analysis, SC33-7991-12.
What is frame pacing and what is the difference between frame pacing and frame latency?
Frame Pacing is an FC-4 application data exchange level measurement and/or throttling mechanism. It uses buffer credits to provide a flow control mechanism for FICON to assure delivery of data across the FICON fabric. When all buffer credits for a port are exhausted a frame pacing delay can occur. Frame Latency, on the other hand, is a frame delivery level measurement. It is somewhat akin to measuring frame friction. Each element that handles the frame contributes to this latency measurement (CHPID port, switch/Director, storage port adapter, link distance, etc.). Frame latency is the average amount of time it takes to deliver a frame from the source port to the destination port.
If frame pacing delay is occurring then the buffer credits have reached zero on a port for 1 or more intervals of 2.5 microseconds. Data transmission by the port reporting frame pacing delay ceases until a credit has been added back to the buffer credit counter kept by that port. Frame pacing delay causes unpredictable performance delays. These delays generally result in elongated FICON CONNect time and/or elongated PEND times that show up on the volumes attached to these links. Therefore, when you see abnormally high PEND and CONN metrics, particularly in a multi-site cascaded FICON architecture, one of the first places to look at should be the RMF 74-7 records.
Figure 2 shows an example of a “clean” FICON Director Activity Report, meaning that none of the ports are reporting frame pacing issues. All ports show an AVG FRAME PACING of “0”. This is the ideal. Note that this is a report from a non-cascaded FICON director (there are no ports connected to another SWITCH). Next, let’s look at a potential problem with frame pacing.
Figure 3 is an example of another RMF 74-7 report from a different FICON director. As you can see, ports (PORT ADDR) 27, 29, 2E, 5E, and 5F are all reporting frame pacing issues of varying magnitudes. In other words, these ports all stopped transmitting for “x” intervals of 2.5 microsecond duration. This happened because each of these ports, for some reason, had an indication that the port it was connected to had run out of buffer credits. Recall that for two ports connected together, the transmit half of each port is in constant communication with the receive half of its partner port. In the buffer-credit-based Fibre Channel flow control mechanism used by FICON storage networks, the transmit half keeps track of how many buffer credits its partner port’s receive half has remaining. When this counter reaches “0”, frame transmission temporarily stops. This, in a nutshell, is what is reflected in the AVG FRAME PACING field.
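To make the credit counter concrete, here is a minimal sketch of the transmit side of one link. It is only a toy model: the class name, the tick-based timing, and the credit counts are all made up for illustration, and a "tick" is not the 2.5 microsecond RMF interval. The one real-world name is R_RDY, the Fibre Channel primitive the receiver sends to hand a credit back.

```python
class TransmitPort:
    """Transmit half of a port, tracking the partner receiver's free buffers."""

    def __init__(self, max_credits):
        self.max_credits = max_credits  # credits granted by the partner (assumed value)
        self.credits = max_credits      # counter kept by the transmit half
        self.frames_sent = 0
        self.stalled_ticks = 0          # ticks spent waiting with zero credits

    def try_send(self):
        """Send one frame if a credit is available; otherwise record a stall."""
        if self.credits == 0:
            self.stalled_ticks += 1
            return False
        self.credits -= 1
        self.frames_sent += 1
        return True

    def receive_r_rdy(self):
        """The partner freed a buffer and returned one credit."""
        self.credits = min(self.credits + 1, self.max_credits)


def simulate(max_credits, ticks, r_rdy_every):
    """Offer one frame per tick; the partner returns a credit every r_rdy_every ticks."""
    port = TransmitPort(max_credits)
    for tick in range(1, ticks + 1):
        port.try_send()
        if tick % r_rdy_every == 0:
            port.receive_r_rdy()
    return port


# A slow receiver: it frees one buffer for every three frames offered.
port = simulate(max_credits=4, ticks=12, r_rdy_every=3)
print(port.frames_sent, port.stalled_ticks, port.credits)
```

Once the initial credits are used up, the sender can only move as fast as the receiver returns them, and every tick it spends waiting is the toy equivalent of frame pacing delay.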
The question then becomes: is every non-zero value of AVG FRAME PACING bad? The answer is the usual “it depends”. Remember, you are likely looking closely at the RMF 74-7 reports because you are doing some performance troubleshooting to determine what is causing an abnormally high response time. You noticed higher than normal PEND or CONN time. Most of us do not look at the RMF 74-7 report as a first step. If the ports associated with the devices exhibiting abnormally high PEND and/or CONN times are showing non-zero values for AVG FRAME PACING, you likely have narrowed down the problem. As a next step, you should do some trend analysis by looking at a series of the RMF 74-7 reports for this specific FICON director to determine whether the frame pacing issue occurs consistently and whether the values are getting worse.
As a personal rule of thumb, I look for AVG FRAME PACING values that are >100. Rarely have I seen AVG FRAME PACING values <100 be cause for concern, although a non-zero value still deserves scrutiny and a look at the trend to see if it becomes worse. Values greater than 100, such as those shown by ports 29, 2E, and especially 5F in this example, deserve further analysis. Port 5F is attached to a control unit (CU), so this is likely a slow drain device issue on the storage host/fibre adapter. This is when we would turn to the ESS Link Statistics report for further analysis (which will be the subject of the next article in this series).
The one exception to the aforementioned rule of thumb is when the port exhibiting AVG FRAME PACING >0 is an interswitch link (ISL) connecting cascaded FICON directors. An example is illustrated in Figure 4. Port (address) 25 shows an AVG FRAME PACING value of 570. Since we know that this port is attached to a port on another FICON director, there are more troubleshooting options available outside of RMF, such as using the FICON director management software and/or Command Line Interface (CLI). It may be something as simple as not having enough buffer credits configured on the port attached to port 25 in this example, in which case the port buffer credit configuration can be altered. Since ISLs are typically used for remote data replication such as PPRC or XRC, any frame pacing delay is cause for concern.
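The rule of thumb and its ISL exception can be sketched as a small triage helper. This is only an illustration of the reasoning above, not an RMF facility: the function name, the verdict labels, and every sample value except port 25's 570 are hypothetical.

```python
RULE_OF_THUMB = 100  # personal threshold from the text, not an IBM or Brocade limit


def classify_port(avg_frame_pacing, is_isl):
    """Return 'clean', 'watch', or 'investigate' for one port's AVG FRAME PACING."""
    if avg_frame_pacing <= 0:
        return "clean"
    # Any pacing delay on an ISL is cause for concern, whatever its size.
    if is_isl or avg_frame_pacing > RULE_OF_THUMB:
        return "investigate"
    # Non-zero but small: keep an eye on the trend across intervals.
    return "watch"


# (port address, AVG FRAME PACING, attached to another switch?)
sample_ports = [
    ("25", 570, True),   # ISL value quoted in the text
    ("26", 8, True),     # hypothetical ISL with a small non-zero value
    ("27", 12, False),   # hypothetical
    ("5F", 940, False),  # hypothetical value for the CU-attached port
    ("30", 0, False),    # hypothetical clean port
]

report = {addr: classify_port(pacing, isl) for addr, pacing, isl in sample_ports}
for addr, verdict in report.items():
    print(f"port {addr}: {verdict}")
```

Running the same classification over several consecutive intervals is one simple way to do the trend analysis suggested earlier.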
This article has explained how you can use the FICON Director Activity Report (RMF 74-7) to drill down further into an I/O performance problem. It provides a valuable way to narrow down the potential root cause(s), but is not a one stop place to completely solve the problem you are troubleshooting. Problems such as slow drain devices need further examination which can be done using the ESS Link Statistics Report. Fabric contention issues such as frame pacing delay being exhibited on ISLs should also be examined further with the IBM z/OS I/O health check mechanism. That and the ESS Link Statistics report will be discussed in greater detail in future articles.
I look forward to hearing your questions, comments and concerns. Thanks for reading!
Buffer-to-buffer credit management affects performance over distances; therefore, allocating a sufficient number of buffer credits for long-distance traffic is essential to performance. With FOS 7.1 Brocade introduced an enhancement to our FOS Extended Fabrics feature: the portCfgLongDistance CLI command now includes the option to configure the number of buffers by using the -frameSize option along with the -distance option. Prior to FOS 7.1, the only option was the -distance option.
Buffer Credit Background and Review
To prevent a target device (either host or storage) from being overwhelmed with frames, the Fibre Channel architecture provides flow control mechanisms based on a system of credits. Each of these credits represents the ability of the device to accept additional frames. If a recipient issues no credits to the sender, no frames can be sent. Pacing the transport of subsequent frames on the basis of this credit system helps prevent the loss of frames and reduces the frequency of entire Fibre Channel sequences needing to be retransmitted across the link.
Because the number of buffer credits available for use within each port group is limited, configuring buffer credits for extended links may affect the performance of the other ports in the group used for core-to-edge connections. You must balance the number of long-distance ISL connections and core-to-edge ISL connections within a switch. Buffer-to-buffer (BB) credit flow control is implemented to limit the amount of data that a port may send, and is based on the number and size of the frames sent from that port. Buffer credits represent finite physical-port memory. Within a fabric, each port may have a different number of buffer credits. Within a connection, each side may have a different number of buffer credits...
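To see why distance and frame size both matter, here is a rough back-of-the-envelope sketch. It is not how portCfgLongDistance computes its allocation. It simply assumes that keeping a link busy takes enough credits to cover one round trip, that light travels through fiber at about 5 microseconds per km, and that a full-size Fibre Channel frame is 2148 bytes. The function name and the sample link values are hypothetical.

```python
import math

FIBER_DELAY_US_PER_KM = 5.0  # approximate one-way propagation delay in fiber
FULL_FRAME_BYTES = 2148      # full-size Fibre Channel frame including overhead


def credits_needed(distance_km, throughput_mb_per_s, frame_bytes=FULL_FRAME_BYTES):
    """Estimate the credits needed to keep a link streaming over a distance.

    A credit stays in use from the moment a frame is sent until the R_RDY
    comes back, which is roughly one round trip on a long link.
    """
    round_trip_us = 2 * distance_km * FIBER_DELAY_US_PER_KM
    frame_time_us = frame_bytes / throughput_mb_per_s  # 1 MB/s equals 1 byte per microsecond
    return math.ceil(round_trip_us / frame_time_us)


# Hypothetical 50 km link at a nominal 800 MB/s (8 Gbps Fibre Channel).
print(credits_needed(50, 800))                    # full-size frames
print(credits_needed(50, 800, frame_bytes=1024))  # smaller average frame size
```

Under these assumptions the 50 km link needs about 187 credits with full-size frames and about 391 with 1024-byte frames, which shows why a frame size input gives a better estimate than distance alone.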
Brocade started its Brocade Certified Architect for FICON (BCAF) certification in summer 2008. The BCAF is part of the very successful Brocade Certification program, and is the only FICON/mainframe networking certification program in the industry. After the initial administrative work was completed, a team of our mainframe focused experts met in a secluded, secure location to develop the learning objectives, and met again one month later to do the item writing (question writing) for the certification exam. Once the exam was written, it was time to develop the training course and supporting material to help people prepare to pass the certification exam. A "beta" exam was made available for the truly hard core FICON fanatics out there. The "beta" version of a certification exam includes all the questions that were written. The results of the "beta" exam are used to determine which questions may be too difficult/too easy/confusing/ambiguous. In other words, it is used to weed out some questions. Following the "beta", two forms of the exam are published and made available for the public to take at Pearson Vue testing centers. In summer 2010 we repeated the process to update the exam to reflect new technologies in the marketplace: both Brocade technology as well as IBM mainframe technology. For a certification to be worthwhile, it needs to be kept current in this manner...
A line of demarcation is defined as: A line defining the boundary of a buffer zone or area of limitation. A line of demarcation may also be used to define the forward limits of disputing or belligerent forces after each phase of disengagement or withdrawal has been completed. The term is commonly used to denote a temporary geopolitical border, often agreed upon as part of an armistice or ceasefire. The most famous line of demarcation in recent history is the Military Demarcation Line, also known as the Armistice Line, which forms the border between North Korea and South Korea. There also was the late Libyan leader Muammar Gaddafi's ironically named "Line of Death". But enough history, you get the point. The more important thing is: How does this relate to your data center?
I meet with many of our customers across the globe, and one thing that the vast majority have in common is this: there are lines of demarcation that exist to separate the responsibility for the various teams who manage the mainframe, FICON SAN, mainframe storage, and the network. If you are running Linux on System z, there are likely more, but that is another story (future post). I'd like to focus the rest of this post on the line of demarcation that exists in many business continuity architectures. That is the line of demarcation between the team(s) that manage the mainframe/mainframe storage/ FICON directors-channel extension, and the team that manages the network for cross site connectivity. I like to call this the Storage-Network Line of Demarcation...
Greetings! I recently returned from a 3 week long around the world business trip to Brazil and Australia. I had a wonderful time meeting with many of our valued customers and OEM partners and would like to publicly thank them for their tremendous hospitality. I also would like to thank my Brazilian and Australian Brocade colleagues for their wonderful hospitality. In the rest of today's blog post I would like to talk about the zEC12, DCX 8510, cars and tires.
On 28 August 2012 IBM announced the zEnterprise EC12 (aka zEC12). For those of you who have read the full IBM announcement you're well aware that it is quite an impressive machine in terms of performance, scalability, and management. I'd like to focus on the channel subsystem enhancements introduced with the zEC12. They all deal with a topic very important to many of you (and near and dear to yours truly) and that is performance of the FICON environment...Read more...
Prior to yours truly taking a short vacation, the community in which I live (Gahanna, OH) was struck by a severe storm on June 29. The storm was quite destructive, leaving many without electricity for several days. Gahanna made the national news. Fortunately, there were very few deaths/injuries. However, the damage was extensive, particularly to the electric power transmission system/grid. Many were without electricity for over a week, and without air conditioning in the middle of a heat wave with temperatures of 100+ degrees F. The media started talking about how unreliable our existing electrical grid was. In hot weather, rolling brownouts are common because our power grid is built on outdated technology with many old power poles and above ground wiring that can't handle the workload, or the wind in a storm. Power outages are more common, and of longer duration than they were 5 years ago. In short, our electrical grid built on outdated transmission components no longer had the reliability, availability, serviceability (RAS) and performance required by customers.
Kind of sounds like mainframe customers' state of the art TS7700 Grid business continuity solution that still uses now outdated Catalyst 6500 series switches/routers for the IP transmission of the cross site data replication. Let's talk about how Brocade can help you build a more reliable, available, scalable and higher performance TS7700 Grid solution using our MLXe router. And oh, by the way I think you will like the simplicity of the solution, particularly in managing the hardware components.
Background: The IBM Virtualization Engine TS7700 family is the latest IBM virtual tape technology. It is a follow on to the IBM Virtual Tape Server (VTS), which was initially introduced to the mainframe market in 1997. The IBM VTS also had peer-to-peer (PtP) VTS capabilities. PtP VTS was a multi-site capable business continuity/disaster recovery (BC/DR) solution. In a nutshell, PtP VTS was to tape what PPRC was to DASD. PtP VTS data transmission was originally via ESCON, then FICON, and finally TCP/IP. Today, the TS7700 offers a similar functionality, known as a TS7700 Grid. A TS7700 Grid refers to two or more physically separate TS7700 clusters connected to one another by means of a customer-supplied TCP/IP network. The TCP/IP infrastructure connecting a TS7700 Grid is known as the Grid Network. The grid configuration is used to form a disaster recovery solution and provide remote logical volume replication. The clusters in a TS7700 Grid can, but do not need to be, geographically dispersed. In a multiple-cluster grid configuration, two TS7700 Clusters are often located within 100 km of one another, while the remaining clusters can be located more than 1,000 km away. This provides both a highly available and redundant regional solution while also providing a remote disaster recovery solution outside of the region. For a more detailed, extensive discussion of the TS7700 and TS7700 Grid, please reference this IBM Redbook.
The TS7700 Virtualization Engine uses the TCP/IP protocol for moving data between each cluster. Bandwidth is a key factor that affects throughput for the TS7700 Virtualization Engine. Other key factors that can affect throughput include:
1) Latency between the TS7700 Virtualization Engines
2) Network switch capabilities
3) Network efficiency (packet loss, packet sequencing, and bit error rates)
4) Inter-switch link capabilities (flow control, buffering, and performance)
5) Flow control to pace the data from the TS7700 Virtualization Engines
The TS7700 Virtualization Engine attempts to drive the network links at the full line rate, which may exceed the capabilities of the network infrastructure. The TS7700 Virtualization Engine supports IP flow control frames so that the network can pace the rate at which the TS7700 Virtualization Engine attempts to drive the network. The best performance is achieved when the TS7700 Virtualization Engine is able to match the capabilities of the underlying network, resulting in fewer dropped packets. When the system exceeds the network capabilities, packets are lost. This causes TCP to stop, resync, and resend data, resulting in a much less efficient use of the network. In summary, latency between the sites is the primary factor. However, packet loss due to bit error rates or insufficient network capabilities can cause TCP to resend data, thus multiplying the effect of the latency.
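To get a feel for how packet loss multiplies the effect of latency, here is a rough sketch using the well-known Mathis et al. approximation for the throughput ceiling of a single TCP stream. This is a textbook model brought in purely for illustration, not anything taken from the TS7700 documentation; the function name, the 1460-byte segment size, and the sample latency and loss figures are all assumptions.

```python
import math


def tcp_throughput_ceiling_mbps(mss_bytes, rtt_ms, loss_rate):
    """Approximate ceiling for one TCP stream, in Mbit/s (Mathis et al. model).

    mss_bytes : TCP maximum segment size (assumed value, e.g. 1460)
    rtt_ms    : round-trip time between the sites, in milliseconds
    loss_rate : fraction of packets lost, strictly between 0 and 1
    """
    if rtt_ms <= 0 or not 0 < loss_rate < 1:
        raise ValueError("rtt_ms must be > 0 and loss_rate must be in (0, 1)")
    c = math.sqrt(3 / 2)  # constant from the simplified model
    rtt_s = rtt_ms / 1000.0
    bytes_per_second = c * mss_bytes / (rtt_s * math.sqrt(loss_rate))
    return bytes_per_second * 8 / 1_000_000


if __name__ == "__main__":
    # Hypothetical latency and loss figures, chosen only to show the trend.
    for rtt_ms in (1, 10, 20):
        for loss in (1e-6, 1e-4):
            mbps = tcp_throughput_ceiling_mbps(1460, rtt_ms, loss)
            print(f"RTT {rtt_ms:>2} ms, loss {loss:.0e}: ~{mbps:,.0f} Mbit/s per stream")
```

In this model the ceiling is inversely proportional to round-trip time and to the square root of the loss rate, so doubling the latency halves the ceiling, and the same loss rate costs far more throughput on a long link than on a short one.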
Brocade's role in TS7700 Grid solutions
IBM customers who have implemented, or are considering implementing a TS7700 Grid solution for their mainframe environment, are typically very concerned about reliability, availability, serviceability (RAS) and performance. That is the primary reason why the vast majority of these same IBM customers have implemented Brocade FICON directors, such as the Brocade DCX 8510, for their mainframe storage connectivity. Also, many IBM TS7700 Grid customers previously used IBM PtP VTS. Prior to PtP VTS using IP based replication, it used ESCON or FICON and hence the end user required channel extension technology. This channel extension technology for PtP VTS was typically a Brocade device, such as the Brocade USD-X.
Fast forward to the present and the current TS7700 Grid solution. Most customers are using a switch/router for the TCP/IP based data replication that is just as old (and now outdated) as the USD-X: the Cisco Catalyst 6500 series. Fortunately, as many end users are quickly finding out, Brocade offers an IP switch/router, the Brocade MLXe, that offers better RAS and performance than these old Catalyst 6500s. The Brocade MLXe offers such a high level of performance that the most data intensive organization in the world, CERN (European Organization for Nuclear Research), recently standardized on it. This has resonated with many mainframe customers who are starting to implement the MLX for their TS7700 Grid solution. Let's take a look at an example below.
This diagram represents the "after" environment. The "before" environment was much more complicated, with lower performing, older technology hardware. The "before" environment consisted of IBM System z9 and System z10 mainframes, older DASD and VTS. The storage and extension network consisted of more, smaller Brocade M6140 FICON directors, stand alone legacy CNT (Brocade) Edge 3000 FCIP extension switches, Cisco Catalyst 6509s, and DWDM hardware from yet another vendor.
What is not obvious from the above diagram is that the DCX-8510 FICON directors also have the Brocade FX8-24 FCIP extension blade in one or more slots. So in this solution, the customer greatly simplified their technical support in going from 3 vendors to 1 (Brocade). They consolidated hardware footprint and lowered operating costs by moving to the newer, more energy efficient hardware. They improved the end to end performance of the entire environment, and in the process improved the efficiency of cross site network usage. As the most expensive Total Cost of Ownership (TCO) cost component in a DR/BC solution is network bandwidth, they can expect to save additional costs in terms of their cross site network bandwidth requirements.
Last, but certainly not least, this new solution allowed the customer to go from using four management platforms in the "before" environment, to one management platform as you can see noted in the diagram. Brocade Network Advisor (BNA) allows you to manage the Brocade FICON directors/switches, the FCIP extension blades/switches, and the Brocade IP switches and routers all from a single management tool, a "single pane of glass". This makes it far simpler to manage the day to day operations of this environment, not to mention coordinate management software upgrades.
All in all, a nice high performance, TCO saving, clean and simple solution to protect your data.
Watch for my next blog post when we'll take this solution a step further and discuss incorporating DASD replication and extension into this. As always, thanks for reading. Feel free to comment or ask questions, or follow me and my mainframe connectivity tweets on Twitter. My handle is @DrSteveGuendert.
Over the weekend I realized that I have violated one of the cardinal rules of blogging. It has been 2 months since our last blog post. That is an eternity in the era of social media where we're bombarded with tweets and Facebook updates continually. We're not going to go that long between posts again (actually I am going to do another post later today after this one). There are many interesting things going on here at Brocade when it comes to the mainframe connectivity part of our business, and there are many interesting things happening in the mainframe world in general that merit discussion. More on those later, back to the topic of this post: Mainframes and Social Media.
We started our efforts in Mainframe related Social Media here at Brocade with this blog. Thanks to you, our loyal readers and subscribers, this blog typically gets over 20,000 views per post within 2 weeks of a post "going live". This blog actually gets the most views of any of the blogs at Brocade. Thanks to you, this has led to bigger things. Two months ago we launched the Mainframe Solutions Community. Earlier this month we renamed it the Mainframe and FICON Solutions Community. Why the rename? We decided we did not want people to make the assumption that by Mainframe Solutions we meant only FICON products such as FICON directors and switches. Brocade also offers the best distance extension solutions for mainframe end users, such as the Brocade FX8-24 extension blade and the Brocade 7800 switch. Brocade also offers the highest performing network routing and switching products, such as the Brocade MLX, for your core networks, or for business continuity solutions such as the IBM TS7700 Grid.Read more...
I just returned to the office after two weeks in China. I was in Shanghai and Beijing to work with my good friend, IBMer Dennis Ng. Dennis and I, as mentioned in my last blog post, started working on a 2 day joint IBM-Brocade mainframe I/O and FICON performance/performance management training workshop. We started putting these together this past December. Dennis and I taught these together at IBM's Shanghai office, and then again at the IBM Beijing office. As these were the first two sessions, we did them for an internal audience (IBMers and Brocadians). We had great attendance (full classrooms) at both sessions. They were very successful sessions. We received great feedback from the attendees. There was a great deal of interaction with the students with many questions. Following the 2 day training session in each city, Dennis and I visited with a very large Chinese bank (mutual customer whose name will remain confidential) and worked with them directly.
Given this article, yet another doom and gloom article discussing the "threat of the looming mainframe skills shortage", our 2 weeks in China gave me reason to smile. I will admit that I am noticing the things discussed in the article as I meet with customers and our OEM partners around the world. And yes, there does seem to be a "graying effect". However, what made me smile was that the average experience level of our attendees in Beijing and Shanghai was 4 years of mainframe experience. For the vast majority, they were in their first job post college graduation...Read more...
I have been visiting with customers in Asia and Europe recently who are in the midst of doing infrastructure and/or storage refresh in their data centers. Typically the results from upgrading your fibre channel infrastructure is good news for all involved but I have found one instance where these technology refreshes might not provide the value that the customer is seeking.
FICON performance is gated by a number of factors with one of those major factors being the maximum link speed that each connection point will accept. As you know, if one end of an optical cable is 8Gbps but the other end of that cable is 4Gbps then the link is obviously auto negotiated down to 4Gbps. Usually this is not a problem, it is just a single link after all.
And if the customer (wisely) is upgrading their switched-FICON infrastructure to keep pace with their storage capabilities then our current switching products will transmit data at a maximum of either 8Gbps or 16Gbps. Our DCX 8510 16Gbps Director family is becoming very popular for technical refreshes but, of course, there are no DASD storage arrays that connect at 16Gbps – the fastest currently being 8Gbps. So those 16Gbps connections will auto negotiate down to 8Gbps or 4Gbps – the highest common link rate that both ends of the optical cable and their SFPs can provide. Of course, this is just what you would expect to have happen.
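The "highest common link rate" rule is easy to express in a few lines. The sketch below is only an illustration: the function name is made up, and the sets of speeds each optic supports are assumptions for the example, so check the data sheets for your own SFPs.

```python
def negotiated_rate_gbps(end_a_rates, end_b_rates):
    """Return the highest link rate (in Gbps) that both ends of a cable support."""
    common = set(end_a_rates) & set(end_b_rates)
    if not common:
        raise ValueError("the two ends share no common link rate")
    return max(common)


if __name__ == "__main__":
    # Assumed speed sets, for illustration only.
    director_16g_port = {4, 8, 16}
    dasd_8g_port = {2, 4, 8}
    dasd_4g_port = {1, 2, 4}

    print(negotiated_rate_gbps(director_16g_port, dasd_8g_port))  # 8
    print(negotiated_rate_gbps(director_16g_port, dasd_4g_port))  # 4
```

Each link negotiates on its own, which is why a path can end up with a different speed on every hop between the CHPID and the storage port.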
Before I touch on my concern let me lay just a little bit of groundwork.
DASD, regardless of vendor, is well known and widely used in mainframe shops. On DASD storage the typical use case is about 90% read I/O and about 10% write I/O. Every shop is different and that is not my point anyway. It is my experience that there is still a lot of 2Gbps and 4Gbps DASD in mainframe shops but many mainframe enterprises are realizing that they need to upgrade their DASD to 8Gbps performance.
And, although it is not always architected or thought out very well, across all of the links that make up the total path between CHPIDs and storage, we should never have the target of the I/O exchange to be slower than the initiator of that I/O exchange. I will show you diagrams of what I mean very soon.
So what is my concern?
Below is a graphic representing what I consider to be a good deployment for a switched-FICON I/O infrastructure.
This is actually the ideal model for DASD since most DASD applications are 90% read and 10% write. So, in the case where the CHPID is reading DASD data, the "drain" on the I/O path will be the 8Gb CHPID and the "source" on the I/O path is the 4Gb storage port. The 4G source port (DASD) simply cannot send data fast enough to overrun the 8G drain (8G CHPID). Even if the DASD is upgraded to 8Gbps ports, the source will still not be able to overrun the drain. (And yes I know this is a simple picture as in reality there is a lot of fan in – fan out that could be taking place.)
What concerns me is that customers have decided to upgrade DASD arrays to 8Gbps even if the mainframe CHPIDs are still at only 4Gbps (FICON Express4). I have spoken with several customers where that has occurred. So what does that look like?
It actually works very similarly, regardless of whether cascaded links are in use or not, but cascaded links will create a worse scenario for what I am discussing with you than switched fabrics without cascaded links. We will see that in a few paragraphs.
But my point here is that this is potentially a very poorly performing infrastructure!
In this case the "drain" on the I/O path is the 4Gbps CHPID and the "source" on the I/O path is now an 8Gbps storage port. In this simple example configuration the I/O Source can out-perform the I/O Drain. Even without ISLs this can cause local connectivity back pressure towards the highly utilized CHPID. When you include ISL links the problem potentially becomes even worse. Regardless, the 4Gbps CHPID (actually its switch port) now has the potential to become a slow draining device.
Since the 4Gbps port on the local switch cannot keep up with the 8Gbps rate of the data that is being sent to it, the switch port servicing the 4Gbps CHPID will begin placing the data frames in its buffer credit queue (BCQ). Backpressure begins to build up within the infrastructure for access to that switch port.
The buffer credit queue on the switch egress port leading to the 4G CHPID will fill up. Of course, other local switch ports that have I/O frames bound for that very busy 4G CHPID switch port will have to save as many frames in their own BCQ as possible and then finally stop trying to transmit data until buffer credits become available for them.
In this case, once the switch egress port to the 4G CHPID finally fills up its BCQ, that switch port cannot receive any additional data. However, the 8Gbps data flow continues. So now it is the ISL ingress port’s turn to start having problems.
The ISL ingress port is on that same local switch as the 4G CHPID switch port. Since it is transmitting the DASD data to the now full queued up CHPID switch port, it will have to start filling up its own BCQ. It will slowly pass frames from its BCQ to the CHPID switch port BCQ as buffer credits become available. However, when the ISL ingress BCQ fills up – well, that is when really bad things start to happen.
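The sequence described above, where the egress queue fills first and the ISL ingress queue fills next, can be sketched with a toy simulation. Everything here is an assumption made for illustration: the function name, the one-frame-per-tick rates, and the queue depths are invented, and a real director's buffer credit handling (including virtual channels) is far more involved.

```python
def simulate_slow_drain(source_rate, drain_rate, egress_depth, isl_depth, max_ticks=1000):
    """Toy model of back pressure building behind a slow CHPID.

    source_rate  : frames per tick the storage side offers to the ISL ingress port
    drain_rate   : frames per tick the CHPID link can take from the egress port
    egress_depth : frames the CHPID-facing egress port can queue
    isl_depth    : frames the ISL ingress port can queue
    Returns (tick the egress queue first fills, tick the ISL queue first fills);
    either value is None if that queue never fills within max_ticks.
    """
    egress_q = 0
    isl_q = 0
    egress_full_at = None
    isl_full_at = None
    for tick in range(1, max_ticks + 1):
        # 1. The CHPID link drains what it can from the egress queue.
        egress_q -= min(drain_rate, egress_q)
        # 2. The ISL ingress port forwards frames only as egress space frees up.
        moved = min(isl_q, egress_depth - egress_q)
        isl_q -= moved
        egress_q += moved
        # 3. The storage side sends until the ISL ingress queue has no room left.
        isl_q += min(source_rate, isl_depth - isl_q)
        if egress_full_at is None and egress_q == egress_depth:
            egress_full_at = tick
        if isl_full_at is None and isl_q == isl_depth:
            isl_full_at = tick
        if egress_full_at is not None and isl_full_at is not None:
            break
    return egress_full_at, isl_full_at


if __name__ == "__main__":
    # An "8G" source feeding a "4G" drain, modeled as 2 frames vs 1 frame per tick.
    print(simulate_slow_drain(source_rate=2, drain_rate=1, egress_depth=4, isl_depth=4))
    # Matched speeds: neither queue ever fills.
    print(simulate_slow_drain(source_rate=1, drain_rate=1, egress_depth=4, isl_depth=4))
```

With the mismatched rates the egress queue fills a few ticks before the ISL ingress queue does, which is the order of events described above. With matched rates both results come back as None because frames leave as fast as they arrive.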
ISLs are used to transmit I/O exchanges for many different storage devices and CHPID ports. If an ISL BCQ fills up then it affects not only the slow draining device data flow but all of the data flows for all of the other CHPID-storage port pairs that use that ISL (or trunking) link.
At this point we have BCQs all over the place on the local switch that are negatively impacting throughput and performance on local switch ports. Some of the local storage might want to send data to local CHPIDs other than the one that is causing the problem. Unfortunately, if they are storage ports that are also servicing the highly utilized 4G CHPID switch egress port, then their BCQs will be full so that those storage ports cannot transmit data to anyone.
That is backpressure at work in the infrastructure causing more and more problems for throughput and performance.
Of course other storage ports on the same DASD array, not transmitting data to the slow draining 4G CHPID, and therefore not filling up their BCQ, would still be transmitting frames. So performance and throughput become erratic on a port-by-port basis.
But the worst situation here is that I/O queuing is now impacting all of the I/O traffic flow (from many storage ports on both switches) that is attempting to use the ISL link (or trunk) that has now used up all of its BCQ and cannot transmit any more data. And it won’t transmit any more data until one or more buffer credits become available. Very inconsistent and erratic performance might now occur across the entire fabric and not just on the local switching device. Some or all of the ISL links (or trunks) are becoming congested and backpressure becomes intense across the entire fabric.
Keep in mind that this is a simple example. There are many things that I am ignoring in this blog in an effort to keep this posting simple – things like virtual channels and protocol intermix environments.
The real probability in this example (and in many shops worldwide) is that all of the mainframe CHPIDs are 4Gbps and are trying to service 8Gbps DASD. The problems become orders-of-magnitude worse at that point than the picture that I’ve painted above!
So I think that there are one or two things that an enterprise can do to keep away from this kind of trouble.
The best course of action would be to upgrade their FICON Express channel cards to match the maximum link rate of any of their storage ports. If storage uses 8Gbps ports then FICON Express8 or FICON Express8S should be deployed on the mainframe. Of course the FICON/FCP switching infrastructure elements also need to match the link rate capabilities of the storage and the CHPIDs. This helps the enterprise derive the full value of their investment in their technology refresh. Of course, some customers are utilizing earlier mainframe models that do not support FICON Express8/8S. If that is the case, and refreshing your mainframe is not possible, then my next suggestion is all that I can offer at this time as a way to overcome the backpressure issues that you will face.
The second, and in my opinion poor, course of action would be to manually set the higher storage port link rate to match the slower CHPID link rate. This would keep the source I/O ports from overrunning the target I/O ports. The FICON/FCP switching infrastructure elements would then just auto negotiate to meet the demands of the attached ports. This might be a good temporary remedy until you have time to deploy higher speed CHPIDs but it should not be considered as a permanent remedy. Of course even as a temporary remedy it has its problems since it is not a trivial task to change port speed and each port will take an outage as you adjust its link speed.
But if an enterprise is going to solve its problem by downgrading port speed, just what value does buying that new storage bring to that enterprise?
All things considered, I hope that you are not faced with the scenario that I have described above. But if you are having unintentional performance and throughput problems after upgrading your DASD farm then maybe this article has helped explain what is going on and what you can do about it.
And if you are just considering upgrading, or in the initial process of upgrading, DASD to 8G and were not thinking about making sure your CHPID link rates match and also that your FICON switching infrastructure matches, then maybe I have helped you keep your enterprise in tip-top shape.
I hope so.
In chapter two of this new blog series of mine I am going to tackle what I consider to be a data center FOLLIE.
Cables (multi-mode or single mode) and Optics (SFPs) go together, hand in glove. Short wave SFPs are always attached to Multi-Mode (MM) cables while Long wave SFPs are always attached to Single Mode (SM) cables. Short wave SFPs will not work with SM cables and Long wave SFPs will not work with MM cables.
Less expensive MM cables are not built to the same rigid and exact requirements as their more expensive cousins, the SM cables. As a result, light in an MM cable suffers far more modal dispersion, and at some distance down the cable the signal becomes so corrupted that it is unreadable. So MM cables have distance restrictions based on the speed of the link.
On the other hand, a Brocade switching device using long wave SFPs and SM cables (OS1) can easily transmit signals as far as 25km (15.5 miles) from point-to-point at 8Gbps. The SM core has a smaller diameter and the light traveling through it suffers much less modal dispersion so it provides a much longer distance link than MM can provide.
So at what distance does MM modal dispersion occur? Well, that depends on the signal source (speed) and the type of MM cable used as well as the condition of that cable and all of its connections. For our example we will assume that the MM cable is in top shape and the cable connections are good and tight.
When the physics of light through a FC cable comes into play it dictates that for a specific link rate a valid light signal can only be received at up to a maximum specified distance from the transmitter. For example, a link running at 4Gbps can send a short wave signal across an OM2 MM cable as far as 492 feet (150 meters). When using the better OM3 MM cable, a link running at 4Gbps can send a short wave signal as far as 1,247 feet (380 meters). And if using the latest OM4 MM cable a link running at 4Gbps can send a short wave signal as far as 1,312 feet (400 meters).
When you look at those numbers it is obvious that the type of MM cable helps to determine the longest distance reach that a MM cable has between two connection points.
And this is where the FOLLY is coming into play.
From about 1997 to 2007 customers had a choice of OM1 (62.5 micron, 200 MHz·km) or OM2 (50 micron, 500 MHz·km) multi-mode cables. OM1 was a holdover from ESCON and was quickly replaced by OM2. OM2 was developed for 2Gbps FICON and was usually orange in color (although there are no standards for cable coloring). But then 10Gbps came into the market. It actually came out ahead of 4Gbps. And everyone knew that 8Gbps was not too far off on the horizon. So to give customers adequate distance at these new, higher speeds, new MM cables (OM3 and then later OM4) were developed.
Another rule of thumb (ROT) that applies to MM cables is that as link speed doubles, the distance across that cable is cut in half (not quite true but close enough for an ROT). And we can see that distance decreasing as link speed increases with the little table below:
When your eyes travel down the columns it is easy to see that as speed increases the distance that the data (frames) can be sent decreases.
It is also easy to correlate what happens when upgrades are made to your cable farm and newer and better cables carry the data frames. For example, 4Gbps utilizing an OM2 cable (max of 492 feet) has less distance capability than 8Gbps utilizing OM4 cables (max of 623 feet). OM3 and OM4 are superior to OM2 at carrying data frames for longer distances as the link rate increases.
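To see the rule of thumb at work, here is a small sketch that applies "double the speed, halve the distance" to the 4Gbps figures quoted earlier (150 meters on OM2, 400 meters on OM4). The function name is made up, and the results are rule-of-thumb estimates only, not specification limits, so always check the published distances for your optics and cable.

```python
def rot_reach_m(known_reach_m, known_gbps, target_gbps):
    """Estimate reach at another link speed, assuming reach scales inversely with speed.

    This is only the rule of thumb from the text; real specified distances differ.
    """
    if known_gbps <= 0 or target_gbps <= 0:
        raise ValueError("link speeds must be positive")
    return known_reach_m * known_gbps / target_gbps


if __name__ == "__main__":
    # 4Gbps reach figures quoted in this post: OM2 = 150 m, OM4 = 400 m.
    for cable, reach_at_4g in (("OM2", 150), ("OM4", 400)):
        for speed in (8, 16):
            estimate = rot_reach_m(reach_at_4g, 4, speed)
            print(f"{cable} at {speed}Gbps: roughly {estimate:g} m by the rule of thumb")
```

For OM4 at 8Gbps the rule of thumb gives 200 meters, while the figure quoted above is 623 feet (about 190 meters), which is the "not quite true but close enough" part of the rule.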
So both speed and quality of cable play a role in distance connectivity across a FC link.
Next up in the technology pipe will be 16Gbps. 16Gbps is already available from Brocade in its switching products but there are no hosts and no storage at those speeds -- yet. And, of course, 32Gbps host/storage is probably only 4 or 5 years away.
The FOLLY that I want to draw your attention to is that with technology getting 2x faster about every 18-24 months, customers are having to either rip and replace MM cables fairly often or they are having to reposition peripheral equipment closer to the server so that the lesser cable distance for higher speed links on the same old MM cables can still be made to work. Lots of effort and lots of waste. And at some point Multi-Mode is just simply not going to get the job done.
I would urge you to consider replacing your old Multi-mode cables with Single Mode cables which are then going to last you for 4-8 or 10 years. Of course there is a cost.
Multi-mode cabling is less expensive than Single mode cabling and short wave SFPs are less expensive than long wave SFPs. But how many times do you have to rip and replace MM cables before the savings from the first rip and replace are wiped out, so that by the 3rd or 4th rip and replace you have spent much more than SM cables would have cost? Add to that all of the manual effort, project management, and potential risk to your environment at each and every rip and replace.
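The break-even question can be put into a few lines of arithmetic. The cost figures below are entirely hypothetical placeholders (relative units, not real prices), and the function name is invented; plug in your own quotes for cable, optics, and labor.

```python
def refreshes_until_sm_is_cheaper(mm_cost_per_refresh, sm_one_time_cost):
    """Return the MM rip-and-replace count at which cumulative MM spend exceeds SM.

    mm_cost_per_refresh : cost of one multi-mode cable/short wave refresh
    sm_one_time_cost    : cost of a single move to single mode/long wave
    """
    if mm_cost_per_refresh <= 0 or sm_one_time_cost <= 0:
        raise ValueError("costs must be positive")
    refreshes = 1
    while refreshes * mm_cost_per_refresh <= sm_one_time_cost:
        refreshes += 1
    return refreshes


if __name__ == "__main__":
    # Hypothetical relative costs: each MM refresh = 40, one SM deployment = 100.
    print(refreshes_until_sm_is_cheaper(40, 100))  # 3
```

With these made-up numbers the third MM refresh is the one that pushes total spend past the one-time SM cost, and that is before counting the labor, project management, and risk of each refresh.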
I really do understand that budgets are tight but I would certainly urge you to make an investment in long wave SFPs and single mode cables and just remove that hassle right out of your life. It is a FOLLY to spend good money time and time again knowing it will have to be ripped out and replaced in only a few years' time. Around the world the vast majority of the mainframe customers I am visiting with have decided to go to all long wave SFPs and SM cables in their shops or they are currently considering it. They realize that making this kind of an investment will drive much more value over time with far less risk within their I/O infrastructure in the long term than anything that multi-mode and short wave will allow them to ever do.
OK, I am off of my soapbox now. Thanks for reading my rant.
In chapter one of this new blog series of mine I am going to discuss a FICON FABLE first.
As you know, FICON Channel Features provide the System z with Fibre Channel port connectivity for FICON storage and FCP storage. Each of these connectivity cards has 1 (very old 1G FICON), 2 (old 1G, early 2G and new PCIe 8G) or 4 (last 2G, all 4G, first 8G) channel ports (CHPIDs) on the blade. On a channel card all of the CHPIDs will contain either long wave optics (SFPs) or short wave optics but never a combination of both. And on a System z there is basically no cost difference between long wave channel cards and short wave channel cards. That is one of the principal reasons that the vast majority of System z mainframes are ordered with FICON channel cards that contain long wave optics since there is more benefit to be gained from long wave connectivity without paying any additional acquisition cost over what short wave costs.
So those are the basics of mainframe channel cards. So what is the FABLE that I want to make you knowledgeable about?...Read more...
Happy February?? I thought perhaps I missed two months, I never would have thought it would be 60 degrees (Fahrenheit) in Columbus, OH on February 1st. Anyway....Read more...