F5 Certification – Concepts

F5 certification bridges the gap between Networking and Advanced Application Layer Stack. It takes about 8-12 months to develop a test. I was fortunate to be part of the Item Development Workshop (IDW) for the F5 201v2 exam and wanted to share some of what I learned during the IDW.

Key Development Concepts utilized during the IDW:

Reliability: Consistent and precise questions.

Fairness: Does not put any group at a disadvantage.

Validity: Accurately and appropriately measures what is relevant.

Reliability is not just related to the individual items but the exam as a whole. In essence, reliability of an exam is measured by the consistency of an individual’s score over multiple attempts, assuming the individual’s ability hasn’t changed substantially over the many attempts.

Validity is similarly not just about an item but the overall exam. Validity is how well the outcomes of the exam match the exam’s stated purpose.

Minimal Competency:

Minimally Qualified Candidate (MQC) is someone who meets the minimum requirements defined by the syllabus. A rough definition of MQC is that of an MQC Lawyer who may or may not have the skills to become a Supreme Court Judge but society is comfortable with them practicing law as the MQC Lawyer satisfies widely accepted qualification standards.

Cognitive Complexity and Difficulty:

The difference between low and high difficulty is knowing pi as 3.14 versus 3.1415926535. Cognitive Complexity is using pi to find the area of a circle with a specific radius. Based on the blueprint of the exam, up to the 3xx level exams, only Remember (R) and Understand/Apply (U/A) were used extensively in the topics. Analyze/Evaluate (A/E) shows up more in the 4xx level exam. The cognitive complexity for each topic is provided in the blueprint of the exam.

Cognitive Complexity:

  1. Remember (R)
  2. Understand / Apply (U/A)
  3. Analyze / Evaluate (A/E)
  4. Create (C)

Remember:

The Remember cognitive complexity generally tests rote memorization and information retrieval. There is a general preference against Remember (R) questions. So, instead of asking a question to list the TCP flags, a question that requires understanding of TCP flags in order to answer the question is preferred.

Understand/Apply:

U/A is utilized to test application of concepts within standard operations. U/A requires an understanding of processes and the ability to pick the right process to solve a problem while being able to compare multiple processes.

Analyze/Evaluate:

A/E tests the ability to integrate new information with existing information to provide answers. This includes diagnosing a problem and understanding the relationship between concepts and how one concept influences others.

Create:

Create (C) tests the ability to create new products/solutions by utilizing new or existing concepts.

Items:

Items are the questions and possible options that show up in the exam. An item consists of a Stem and Options. Stem is a combination of Problem statement and Question Statement. Options can be Distractor options or Key option(s). The Key option(s) is the right answer in the item.

Remember requires only a Question statement. A Problem statement is required for U/A, A/E and C. An ideal MQC should be able to determine the Key without having to read the options. This is one of the reasons why time is an important aspect in differentiating the competence of exam takers: others may have to check all the options, which means they spend more time per question and may run short on time.

The Stems were constructed with positive words (Positive Construction). Almost all the stems eschew negative words like NOT, NEVER, EXCEPT that could potentially lead to a wrong answer as the candidate may miss the key negative word while reading the Stem.

Each item is intended to focus, as much as possible, on a single trait rather than multiple traits. A trait is a subtopic within the blueprint. A higher-complexity question could involve multiple traits. The item is intended to be congruent with the cognitive complexity and content identified in the exam blueprint, without introducing irrelevant variance that is not required to answer the question.

Response options (Distractor options and Key options) should be similar in terms of length and logic in order to prevent the option from being an obvious wrong/right answer.

Delayed ACK and Nagle’s Algorithm

In this article, I am taking a shot at explaining the interaction between Delayed ACK and Nagle’s algorithm and how this can add latency during a TCP session that requires transmission of small packets.

MSS:

Maximum Segment Size or MSS denotes the maximum amount of data that can be sent in a single “Segment”, the unit used at the TCP (transport) layer of the OSI model. Default MSS is 536 Bytes. Default MTU is 576 Bytes.

MTU = MSS + 20B (TCP Header) + 20B (IP Header)
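As a quick sanity check of the formula above, here is a minimal Python sketch. The helper names are mine, and the 20B header sizes assume no TCP or IP options are in use.

```python
TCP_HEADER = 20  # bytes, assuming no TCP options
IP_HEADER = 20   # bytes, assuming no IP options


def mtu_from_mss(mss: int) -> int:
    """MTU = MSS + TCP header + IP header."""
    return mss + TCP_HEADER + IP_HEADER


def mss_from_mtu(mtu: int) -> int:
    """Inverse of the formula: the payload left after both headers."""
    return mtu - TCP_HEADER - IP_HEADER
```

With the default MSS of 536 Bytes this gives the default MTU of 576 Bytes.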

TCP Transmission:

RFC 1122 talks about the conditions under which data can be transmitted in TCP implementations. Data is transmitted when any of the following 3 conditions is met.

1. Immediately, if a full MSS or more can be sent.

2. {[No unacknowledged data] && 
    [(PSH flag is set) || 
     (Buffered data > 1/2*(SND Window))]}

3. {(PSH flag set) && (Override timeout expired)}

The 1/2*(SND Window) threshold is implementation dependent and can differ across operating systems and between versions of the same operating system. The override timeout is roughly 200ms; this value can also vary between operating systems. Note that the ACK Number represents bytes, not packets.
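The three transmit conditions can be written down as a small predicate. This is only a sketch of the rules as listed above, not how any particular TCP stack implements them; the SenderState fields and the function name are hypothetical.

```python
from dataclasses import dataclass

DEFAULT_MSS = 536  # bytes, the default MSS quoted earlier


@dataclass
class SenderState:
    buffered: int           # bytes waiting in the send buffer
    unacked: int            # bytes in flight, not yet acknowledged
    snd_window: int         # current send window in bytes
    psh: bool               # PSH flag is set on the buffered data
    override_expired: bool  # override timer (roughly 200ms) has fired


def may_transmit(s: SenderState, mss: int = DEFAULT_MSS) -> bool:
    """Return True when any of the three conditions holds."""
    # 1. A full MSS or more can be sent.
    if s.buffered >= mss:
        return True
    # 2. No unacknowledged data, and PSH is set or the buffer
    #    exceeds half of the send window.
    if s.unacked == 0 and (s.psh or s.buffered > s.snd_window / 2):
        return True
    # 3. PSH is set and the override timeout has expired.
    return s.psh and s.override_expired
```

For example, 400B buffered behind 500B of unacknowledged data does not go out until the override timer fires, even with PSH set.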

Delayed ACK:

Delayed ACK helps in avoiding “Silly-Window-Syndrome” (SWS) at the Receiver. The Receiver will delay sending an ACK in response to received data when all 3 of the following conditions match.

1. Fewer than 2 packets / 2*MSS have been received.

2. The Receiver has no data to send.

3. The Delayed ACK timer has not yet expired.

In the UNIX world, 2*MSS of data has to be received by the Receiver before it sends an ACK; in the Windows world, 2 packets of any size have to be received by the Receiver before it sends an ACK.
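Here is a rough Python sketch of the Delayed ACK decision, including the UNIX versus Windows difference described above. The function and parameter names are made up for illustration, and real stacks have more inputs than this.

```python
def should_delay_ack(segments_received: int, bytes_received: int,
                     receiver_has_data: bool, timer_expired: bool,
                     mss: int = 536, unix_style: bool = True) -> bool:
    """Return True when the Receiver would hold back its ACK."""
    if unix_style:
        # UNIX-style rule: wait for 2*MSS worth of data.
        enough_received = bytes_received >= 2 * mss
    else:
        # Windows-style rule: wait for 2 packets of any size.
        enough_received = segments_received >= 2
    # The ACK is delayed only while all three conditions hold.
    return (not enough_received
            and not receiver_has_data
            and not timer_expired)
```

Two 500B packets are enough to trigger an ACK under the Windows-style rule, but not under the UNIX-style rule with the default MSS, since 1000B is less than 2*536B.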

Nagle’s Algorithm (RFC896):

The goal of Nagle’s algorithm is to lower the number of small packets exchanged during a TCP session. This helps in avoiding “Silly-Window-Syndrome” (SWS) at the Transmitter.

Nagle’s algorithm can be summarized as follows:

A. If there is unacknowledged data 
   (i.e., data in flight > 0 Bytes), 
   new data is buffered.

B. If data to be sent is less than MSS, it is 
   buffered till the data to be sent is 
   greater than or equal to MSS.

Problem:

Under the right conditions, points 1-3 outlined under Delayed ACK and points A-B outlined under Nagle’s Algorithm will freeze the interaction between the sender and the receiver for the duration of the timeout, which is roughly 200ms. This is often seen in applications that rely on smaller packet sizes.

A Simple Scenario:

Sender is a client machine that updates the Receiver with information. Receiver could be some kind of a data warehouse which stores information on financial transactions. In this case, the Sender has data to send to the Receiver and the Receiver acknowledges the data received and does not transmit any data to the client in response other than a simple ACK.

During the course of a TCP session, the Sender has just sent 500B of data to the Receiver, which matches condition A outlined in Nagle’s algorithm. The application on the Sender side then moves 400B of data to the TCP stack. At this point, the Sender has not yet received an ACK from the Receiver, and the next 400B of data meets condition B outlined in Nagle’s algorithm because it is smaller than the MSS.

Thus, the 400B of data will be buffered until either of the following conditions is met:

A. ACK is received from Receiver for the 
   previously sent 500B of data.

B. Application sends the TCP stack more data 
   that will push the existing buffered data 
   (400B) more than MSS i.e., application needs 
   to send 136B or more to the TCP stack in 
   order to push the buffered data to or beyond
   the MSS limit.

On the Receiver side, the Receiver will refrain from sending an ACK after receiving the first 500B of data because all of the conditions 1-3 outlined under Delayed ACK are matched.

1. Only 1 packet of 500B (less than MSS) has been 
   received.

2. Receiver does not have any data to transmit, 
   other than ACK.

3. Delayed ACK timer has not yet expired.

Sender keeps the 400B of data buffered. Receiver will not ACK the previously sent 500B of data. Effectively, communication between the Sender and the Receiver freezes until the Delayed ACK timer expires. This timeout is usually around 200ms, though it varies across OS implementations.
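The scenario can be reduced to a tiny Python model. This is a sketch under simplifying assumptions: it uses the UNIX-style 2*MSS rule at the Receiver, a fixed 200ms Delayed ACK timer, and ignores network round-trip time. The function name and parameters are hypothetical.

```python
MSS = 536             # default MSS in bytes
DELAYED_ACK_MS = 200  # typical Delayed ACK timeout


def stall_ms(first_send: int, second_write: int, extra_write: int = 0) -> int:
    """Milliseconds the second write waits before it can leave the Sender.

    first_send   -- bytes already in flight (unacknowledged)
    second_write -- bytes the application hands to TCP next
    extra_write  -- further bytes the application writes right after
    """
    pending = second_write + extra_write
    # Nagle: data in flight and less than one MSS pending means buffer.
    nagle_holds = first_send > 0 and pending < MSS
    if not nagle_holds:
        return 0
    # Receiver: less than 2*MSS received and nothing to send back,
    # so the ACK is withheld until the Delayed ACK timer fires.
    ack_delayed = first_send < 2 * MSS
    return DELAYED_ACK_MS if ack_delayed else 0
```

With 500B in flight and a 400B write, the model reports a 200ms stall. Writing another 136B pushes the buffer to the MSS and removes the stall, which matches condition B above.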

For further understanding, I would highly recommend the YouTube video on this subject by Hansang Bae.

F5 iRule – URI Masking

Requirement:

Client sends request to http://xyz.com/

Server needs to process http://xyz.com/append but client should only see http://xyz.com/ i.e., the URI  /append should not be visible to the client.

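when HTTP_REQUEST {
    # Rewrite the root URI before the request reaches the server.
    # Adjust the host string to match the Host header clients send.
    if { ([HTTP::host] equals "www.xyz.com") and ([HTTP::uri] eq "/") } {
        HTTP::uri "/append"
    }
}
when HTTP_RESPONSE {
    # Hide /append from the client in any redirect issued by the server.
    if { [HTTP::header values Location] contains "/append" } {
        HTTP::header replace Location [string map {/append /} [HTTP::header value Location]]
    }
}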

The F5 will complete the following steps using the iRule provided above:

F5 will rewrite the URI of the incoming request from “/” to “/append”.

F5 will replace “/append” with “/” in the Location header of the response from the server to the client.
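To try out the two rewrite steps away from a BIG-IP, here is a rough Python equivalent of the string handling. It is not an iRule and the function names are made up; it only mirrors the comparison and the replace-all behaviour of "string map".

```python
def rewrite_request_uri(host: str, uri: str, site_host: str) -> str:
    """Mimic the HTTP_REQUEST step: "/" becomes "/append" for one host."""
    if host == site_host and uri == "/":
        return "/append"
    return uri


def rewrite_location(location: str) -> str:
    """Mimic the HTTP_RESPONSE step: replace every "/append" with "/"."""
    if "/append" in location:
        return location.replace("/append", "/")
    return location
```

One thing this makes easy to see: a Location value such as http://www.xyz.com/append/page would come back with a double slash, so the mapping is best suited to redirects that end in /append.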

Reference:

Mask URI – Devcentral Thread

Github

F5 – SSL Cert Expiration

K14318 – Identifying expired certs and certs about to expire in 30 days.

K15288 – Email reminder for cert expiration.

A few one-liners from bash to identify the cert expiration date:

Identifying the expiration date from the certificate name:

~ # tmsh list sys file ssl-cert domain.crt | grep expiration
    expiration-date 1505951999
    expiration-string "Sep 20 23:59:59 2017 GMT"

 

Identifying the Client SSL profile for a certificate:

~ # tmsh list ltm profile client-ssl one-line | grep domain.crt | awk '{print $3,$4}'
    client-ssl CLIENTSSL-domain.com

 

Identifying the Virtual Server from Client SSL profile:

~ # tmsh list ltm virtual one-line | grep CLIENTSSL-domain.com | awk '{print $2,$3}'
    virtual VS-10.10.10.10-Public

 

Identifying the expiration date for cert associated with VS:

~ # echo | openssl s_client -connect 10.10.10.10:443 2> /dev/null | openssl x509 -noout -dates
notBefore=Nov 21 00:00:00 2016 GMT
notAfter=Nov 22 23:59:59 2017 GMT
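If you want to act on these dates in a script, a small Python sketch like the one below can parse both formats shown above (the epoch value from tmsh and the notAfter line from openssl). The function names are mine, and it assumes an English/C locale for the month abbreviation.

```python
from datetime import datetime, timezone


def parse_not_after(line: str) -> datetime:
    """Parse a 'notAfter=Nov 22 23:59:59 2017 GMT' line into a UTC datetime."""
    value = line.split("=", 1)[1].strip()
    parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y GMT")
    return parsed.replace(tzinfo=timezone.utc)


def from_epoch(expiration_date: int) -> datetime:
    """Convert the tmsh 'expiration-date' epoch value into a UTC datetime."""
    return datetime.fromtimestamp(expiration_date, tz=timezone.utc)


def days_until(expiry: datetime, now: datetime) -> int:
    """Whole days left before expiry (negative once expired)."""
    return (expiry - now).days
```

For example, from_epoch(1505951999) gives Sep 20 23:59:59 2017 UTC, which agrees with the expiration-string printed by tmsh.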

 

F5 Virtual Server – Order of Precedence

The VS order of precedence differs with code version and the tm.continuematching db variable. The tm.continuematching db variable is set to false by default; hence, a lower precedence VS does not handle the traffic if a higher precedence VS exists in a disabled state. If the traffic has to be handled by the lower precedence VS when the higher precedence VS is disabled, set this db variable to true:

11.x Code Version:

(tmos)# modify /sys db tm.continuematching value true
(tmos)# save /sys config

9.x – 10.x Code Version:

bigpipe db TM.ContinueMatching true
bigpipe save all

The order of precedence for VS processing in different code versions is provided below.

Order of Precedence for code version: 9.2 – 11.2.x

<address>:<port>
 <address>:*
 <network>:<port>
 <network>:*
 *:<port>
 *:*

Order of Precedence for code version: 11.3 and later

Order  Destination        Source             Service port
1      <host address>     <host address>     <port>
2      <host address>     <host address>     *
3      <host address>     <network address>  <port>
4      <host address>     <network address>  *
5      <host address>     *                  <port>
6      <host address>     *                  *
7      <network address>  <host address>     <port>
8      <network address>  <host address>     *
9      <network address>  <network address>  <port>
10     <network address>  <network address>  *
11     <network address>  *                  <port>
12     <network address>  *                  *
13     *                  <host address>     <port>
14     *                  <host address>     *
15     *                  <network address>  <port>
16     *                  <network address>  *
17     *                  *                  <port>
18     *                  *                  *
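The 11.3 and later ordering boils down to comparing destination first, then source, then port. Here is a minimal Python sketch of that idea. It is my own model and not F5 code: the tuple layout and function names are assumptions, it does not rank two networks with different mask lengths against each other, and it ignores disabled virtual servers and tm.continuematching.

```python
from ipaddress import ip_address, ip_network


def _addr_rank(spec: str, addr: str):
    """0 = host match, 1 = network match, 2 = wildcard, None = no match."""
    if spec == "*":
        return 2
    net = ip_network(spec, strict=False)
    if ip_address(addr) not in net:
        return None
    return 0 if net.prefixlen == net.max_prefixlen else 1


def _port_rank(spec: str, port: int):
    """0 = specific port match, 1 = wildcard, None = no match."""
    if spec == "*":
        return 1
    return 0 if int(spec) == port else None


def select_virtual(virtuals, dst: str, src: str, port: int):
    """Pick the name of the best match from (name, dest, source, port) tuples."""
    best = None
    for name, vdst, vsrc, vport in virtuals:
        key = (_addr_rank(vdst, dst), _addr_rank(vsrc, src),
               _port_rank(vport, port))
        if None in key:
            continue  # this virtual server does not match the packet
        # Lower tuples win: destination, then source, then port.
        if best is None or key < best[0]:
            best = (key, name)
    return best[1] if best else None
```

Note how a host-address VS listening on any port (row 6) still beats a network VS on the exact port (row 11), because destination is compared first.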

F5 – Bleeding Active Connections

Scenario:

A Virtual Server is load balancing connections to a pool with 2 pool members. During a maintenance window, one of the two pool members is disabled and its maintenance is completed, followed by the same process for the other pool member.

However, as the users make continuous API calls every 5 seconds, the existing TCP connection never bleeds out. Even after waiting for 24 hours, connections still exist on the disabled pool member.

Solution:

By default, F5 makes the load balancing decision when the 1st HTTP request within a TCP connection is received. Subsequent HTTP requests within the TCP connection are sent to the same pool member as the very 1st HTTP request.

By enabling a OneConnect profile with a /32 netmask (255.255.255.255), we were able to force the F5 to make a load balancing decision for every HTTP request instead of its default behavior.

The OneConnect profile used along with the disabled or forced-offline setting will move the connection from the disabled pool member to the active pool member.
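The difference between the two behaviours is easy to see in a toy Python model of a single keep-alive connection. This is purely illustrative: the function is hypothetical, round robin stands in for whatever load balancing method the pool uses, and persistence is not modelled.

```python
def route_requests(members, disable, disable_at, total, per_request):
    """Return the pool member that serves each request on one connection.

    members     -- pool member names in load-balancing order
    disable     -- member that gets disabled during the connection
    disable_at  -- index of the first request sent after the disable
    total       -- number of HTTP requests sent on the connection
    per_request -- True models a load balancing decision per HTTP request
    """
    enabled = list(members)
    chosen = None
    served = []
    for i in range(total):
        if i == disable_at:
            enabled.remove(disable)
        # Default behaviour decides once; OneConnect-style decides each time.
        if chosen is None or per_request:
            chosen = enabled[i % len(enabled)]
        served.append(chosen)
    return served
```

With per_request set to False, every request stays on the member picked for the 1st request, even after it is disabled. With per_request set to True, requests stop landing on the disabled member as soon as it is disabled.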

Reference Link.

Sub-Domain Delegation GTM/DNS

 

Let’s say that you have domain.com hosted with a 3rd party DNS provider and you would like to set up GTM (BIG-IP DNS) load balancing by utilizing Sub-Domain Delegation.

In this scenario, there are 2 GTMs, one in each DC (DC-1 & DC-2). The basic setup has been completed and the GTMs are in a common sync-group.

Create A-Records for the 2 GTM using their Listener IP addresses:

 gtm1.wip.domain.com. IN A 100.100.100.100
 gtm2.wip.domain.com. IN A 200.200.200.200

gtm1 and gtm2 exist in DC-1 and DC-2 respectively, and 100.100.100.100 & 200.200.200.200 are the listener IP addresses configured on gtm1 and gtm2.

Delegate the sub-domain to the GTM using NS Records:

 wip.domain.com. IN NS gtm1.wip.domain.com.
 wip.domain.com. IN NS gtm2.wip.domain.com.

Use CNAME records:

www.domain.com. IN CNAME www.wip.domain.com.

The above DNS records (A, NS & CNAME) will be added to the zone for domain.com at the 3rd party DNS provider. Any request for www.domain.com will be sent to the 3rd party DNS provider, which will then resolve it to www.wip.domain.com because of the CNAME, and that name will be handled by the GTMs because of the NS & A records.
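The resolution chain can be traced with a small Python sketch that treats the provider's records as a dictionary. This is a toy model and not a DNS resolver; the PROVIDER_ZONE name and the function are made up, and the records are the ones listed above.

```python
PROVIDER_ZONE = {
    ("www.domain.com.", "CNAME"): ["www.wip.domain.com."],
    ("wip.domain.com.", "NS"): ["gtm1.wip.domain.com.", "gtm2.wip.domain.com."],
    ("gtm1.wip.domain.com.", "A"): ["100.100.100.100"],
    ("gtm2.wip.domain.com.", "A"): ["200.200.200.200"],
}


def delegation_path(name, zone=PROVIDER_ZONE):
    """Return (final name, listener IPs of the delegated name servers)."""
    # Follow the CNAME if there is one, otherwise keep the name as-is.
    target = zone.get((name, "CNAME"), [name])[0]
    # Walk up the name one label at a time looking for an NS delegation.
    labels = target.split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        ns_names = zone.get((candidate, "NS"))
        if ns_names:
            # The A records tell the resolver where the GTM listeners are.
            ips = [ip for ns in ns_names for ip in zone.get((ns, "A"), [])]
            return target, ips
    return target, []
```

Calling delegation_path("www.domain.com.") returns www.wip.domain.com. together with both GTM listener addresses, which is where the query for the wide IP ends up.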

SOL277 – Sub-domain delegation.

OneConnect & HTTP Requests

This is a copy/paste of a Q&A in DevCentral. I didn’t change it as it is quite descriptive and gets the point across.

Current Setup:

We are using Cookie Insert method for session persistence. So LTM adds “BigipServer*” Cookie in the http response header with value as encoded IP address and port name. Subsequent requests from the client (in our case browser) will have this cookie in the request header and this helps LTM to send the request to same server. This LTM cookie’s expiry is set to session, so this cookie will be cleared when we close the browser or we expire it using iRule.

Use Case:

We have set of servers configured as pool members serving traffic to users who are logged in. During release time, we will release the code to new set of servers and add those servers also to the LTM pool. LTM will now have servers with both old code as well as new code. We disable all servers which has the old code so that LTM routes only the requests which already has “BigipServer*” Cookie value pointing to those servers. This will not interrupt the users who are already logged in and doing some work. All new requests (new users) will be load balanced to any of the active servers which has new code. We will ask our already logged in users to logout and login back again once they are done with the current work. We have an iRule configured to expire the LTM cookie during logout, so our expectation is that users will be connected to new servers when they are logging in again.

Problem:

Even though the iRule expires the LTM cookie during logout and the cookie is not present in the request header of the login, users are still routed to the same disabled server when they are logging in again. Ideally, LTM should have load balanced the request to any of the active servers.

Root Cause:

Upon analyzing this further with network traffic, we found that, whenever the browser has a persistent TCP connection open with LTM after logout, the browser uses that existing TCP connection for sending the login request. LTM routes this login request to the same disabled server which handled the previous request even though the LTM cookie is not present in the request header. If we close the TCP connection manually after logout (using CurrPorts or some other tool), the browser establishes a new connection with LTM during login and LTM load balances this request to any active server. One option for us is to send “Connection: close” in the response header during logout, but the browser may hold multiple persistent TCP connections (I have seen a browser holding even three connections) and hence closing a single TCP connection will not help. The other option is to close the browser, but we don’t have that choice for reasons I cannot explain here (trust me).

SOLUTION:

Try using the following:

  1. OneConnect Profile in VS with netmask of /32.
  2. Action on Service Down in the Pool set to Reselect.

(1) will force the load balancing decision to be made for every HTTP request instead of the default of the lb decision being made only for the 1st HTTP request within a TCP connection.

(2) will force the HTTP Request to be sent to a new pool member when the selected member is down as the load balancing decision is made for every HTTP request instead of the very 1st HTTP request within a persistent/keep-alive connection.

Keep-alive Connection and Persistent Connection refer to the same feature provided by HTTP/1.1, where a single TCP connection is used to send multiple HTTP requests.

 

Adding a Blade to a Viprion

Normally, when you add the new blade, the current master blade will sync its configuration onto the new blade. Make sure that the existing blade is master. Back up all relevant configuration, both on the device and off the device, before adding the new blade.

Make sure that the blades are the same model; see K16992.

Look at K13965 🙂 to identify the master blade.

Considerations when moving blade between chassis: K10541271

F5 Code Upgrade Steps

This is a rough template of F5 Code Upgrade steps that could be of help for your maintenance work.

  1. Before performing any F5 code upgrade, make sure that the “Service Check Date” on the device is AFTER the License Check Date for the new code version as listed here in SOL7727
  2. Upload the new code to the partition that you prefer on the F5.
  3. cpcfg to the new code version location – Example: cpcfg HD1.2

    Although “cpcfg HD1.x” has worked most of the time, I would recommend backing up the .UCS file in a remote location and also saving a copy in “/shared/tmp/<UCS File>”. After saving the UCS file in the “/shared/tmp/” location, you can utilize “load /sys ucs <path/to/UCS> no-license” to load the configuration as noted in SOL12880

  4. Reboot. This will take about 5-10 minutes for Hotfix updates and about 15-20 minutes when migrating to a major code version.
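The date check in step 1 is a simple comparison that is handy to script before the maintenance window. Here is a minimal Python sketch; the function name is mine, and it assumes that a Service Check Date equal to the License Check Date is acceptable (please confirm against SOL7727).

```python
from datetime import date


def license_ok_for_upgrade(service_check_date: date,
                           license_check_date: date) -> bool:
    """True when the device's Service Check Date is on or after the
    License Check Date of the target code version (assumed rule)."""
    return service_check_date >= license_check_date
```

If this returns False, the license would need to be reactivated before moving on to step 2.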

Recommended maintenance window is about 1 hour. This could change depending on any application level testing that you would like to incorporate within your maintenance window.

Reference:

F5 Code Upgrade – 10.x to 11.x