F5 iControl REST

F5 provides the iControl REST API as part of its automation toolkit, and it is a powerful way to automate F5 management. F5 introduced the iControl REST API in code version 11.5, and 11.6 is the first major code version with a relatively stable release. However, 11.6 does not support remote authentication such as TACACS+. To use the iControl REST API with remote authentication, it is important to run a 12.x code version. F5 programmability training documentation and related information are available here.
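As a rough illustration of what a REST call against a BIG-IP looks like, the sketch below builds (but does not send) a request for the software version. The host name, credentials and the choice of the /mgmt/tm/sys/version endpoint are assumptions made for this example, and Basic authentication is only one of the options available.

```python
import base64
import urllib.request


def build_version_request(host: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request for the BIG-IP software version without sending it."""
    # Assumed endpoint for this sketch; iControl REST resources live under /mgmt/tm/.
    url = f"https://{host}/mgmt/tm/sys/version"
    # HTTP Basic authentication: base64("user:password").
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


# Hypothetical host and credentials, for illustration only.
req = build_version_request("bigip.example.com", "admin", "secret")
print(req.get_method(), req.full_url)
```

Sending the request would be a call to urllib.request.urlopen(req) against a reachable device with a trusted certificate.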

GTM Code Upgrade

These are a few quick checks that are useful during GTM code upgrade maintenance. As part of the preparatory work, check the license “service check date” as per K7727.

Before starting the code upgrade and after the code upgrade, the following can be utilized to check the status of the devices:

From tmsh:

(/Common)(tmos)# show sys software
(/Common)(tmos)# show gtm server | grep -e "Gtm::" -e "Availability" -e "State"

From bash:

/shared/bin/big3d -v

From another client machine:

dig @<GTM1_IP> <WIP_FQDN>
dig @<GTM2_IP> <WIP_FQDN>

Just after the code is upgraded, make sure to run the big3d_install commands as per K13312. This will help to make sure that all the devices run the latest big3d version.

F5 GTM – DNS Query Processing Order

When a DNS query arrives at an F5 GTM/DNS, it is processed in the following order.

1 – The DNS query is processed by the Listener.

2 – If the Recursion Desired (RD) flag is set in the incoming query and the DNS Profile associated with the Listener has “Process Recursion Desired” enabled, the following is done:

a. DNS iRule

b. DNSSEC Key Processing

c. DNS Express

d. DNS Profiles

3 – If the Recursion Desired (RD) flag is set in the incoming query and the DNS Profile associated with the Listener has “Process Recursion Desired” disabled, the query is considered “Un-handled” and is dispatched according to the “Unhandled Query Action” set in the DNS Profile.

4 – DNS Cache is used to handle any DNS query that doesn’t match BIG-IP GTM/DNS or DNS Express records.

Reference: K14510

Thoughts on F5 Deployment

This is a simplified checklist for GTM & LTM deployment based on my experience.

Don’t deploy GTM in HA pair in a single DC:

GTM devices work in a synchronization group across geographic regions. If you deploy GTM-1 in DC-1 and GTM-2 in DC-2, these 2 GTM devices will serve as an Active-Active HA pair for most deployments.

There is no reason to have GTM-1A & GTM-1B in DC-1 and GTM-2A & GTM-2B in DC-2, where the A & B devices are part of an HA pair. That design (Option-B) is overkill.

[Figure: F5_GTM_Deployment]

GTM to LTM VS Health Monitors:

Let the LTM monitor its pool members. The GTM will obtain the status of the LTM pool members from the LTM. GTM directly monitoring the LTM pool members or LTM VS is not required in most cases.

Don’t deploy LTM in Active-Active state:

There is a lot of room for flexibility with this statement. I haven’t found Active-Active pairs to be reliable in production environments. Such deployments are rare in the industry, so a bug that only shows up in Active-Active mode is more likely to affect your deployment than one in the Active-Standby setup, which is widely deployed and stable.

Code Upgrade Checklist:

  1. Check license reactivation date before code upgrade.
  2. Check to make sure that the code you are utilizing is not End of Support.
  3. Save the configuration on the box and outside the box whenever upgrading code.
  4. Save the license number on the box in a /var/tmp file and off the box.

Code Version:

You are better off picking one of the Long Term Stable code version releases listed in K5903.

Route Domains:

Route domains are F5’s version of VRF. Although they are easy to deploy, they can be a pain to troubleshoot because every IP address carries a “%<route domain ID>” suffix.

iRule Vs LTM Policy:

My personal preference is always iRule as it provides greater granular control and flexibility. However, LTM Policy may be better optimized and can provide lower latency for the same function. LTM Policy can be an effective substitute for simple iRules.

Pool & Pool Members:

As a rule of thumb, all the members in a pool should have the same port. For example: POOL_WEB_80. This naming convention tells you that this is a pool of web servers that are listening on port 80 even without looking into the individual pool members. Having multiple members listening on a common port as part of a single pool helps in the long run to scale the configuration without adding significant operational complexity.

Naming Convention:

When creating a custom VS, Pool, Monitor, etc., I recommend starting the name with capital letters. For example:

POOL_Web_80
VS_domain.com_80
VS_domain.com_443
MON_HTTP_80

The default configuration elements, like the tcp profile and the http monitor, tend to be lower case in F5. The capital letters help to distinguish user-created (custom) profiles from default profiles, which are all lower case.

Custom & Default profiles:

NEVER use default configuration elements. Create custom configuration elements with the default configuration elements as the parent profile. Whenever a configuration change is required, always change only the custom profiles. This will prevent someone from accidentally changing the default profiles while working on the F5.

F5 TMM Crash

We were using a DNS VS listening on port 53 but configured to handle TCP protocol as shown here:

ltm virtual /Common/VS_DNS {
 destination /Common/10.10.10.10:53
 ip-protocol tcp
 mask 255.255.255.255
 pool /Common/pool_dns
 profiles {
 /Common/tcp { }
 }
 source 0.0.0.0/0
 source-address-translation {
 pool /Common/SNAT-10.10.10.10
 type snat
 }
 translate-address enabled
 translate-port enabled
 }

An iRule was using RESOLV::lookup against the configured TCP VS. RESOLV::lookup uses UDP requests, and since the VS was configured to handle only TCP, the F5 crashed and generated a core file.

According to F5 Engineers, bug alias 570575 is associated with this condition, where RESOLV::lookup against a TCP Virtual Server causes the F5 to crash and generate a core file in /var/core/.

The workaround involved using “ip-protocol” as “any” and “profiles” as “fastL4” for a configuration that looks like this:

ltm virtual /Common/VS_DNS {
 destination /Common/10.10.10.10:53
 ip-protocol any
 mask 255.255.255.255
 pool /Common/pool_dns
 profiles {
 /Common/fastL4 { }
 }
 source 0.0.0.0/0
 source-address-translation {
 pool /Common/SNAT-10.10.10.10
 type snat
 }
 translate-address enabled
 translate-port enabled
 }

 

F5 Certification – Concepts

F5 certification bridges the gap between Networking and Advanced Application Layer Stack. It takes about 8-12 months to develop a test. I was fortunate to be part of the Item Development Workshop (IDW) for F5 201v2 exam and wanted to share some of the information I learned during the IDW.

Key Development Concepts utilized during the IDW:

Reliability: Consistent and precise questions.

Fairness: Does not put any group at a disadvantage.

Validity: Accurately and appropriately measures what is relevant.

Reliability is not just related to the individual items but the exam as a whole. In essence, reliability of an exam is measured by the consistency of an individual’s score over multiple attempts, assuming the individual’s ability hasn’t changed substantially over the many attempts.

Validity is similarly not just about an item but about the overall exam. Validity is how well the outcomes of the exam meet the proposed purpose of the exam.

Minimal Competency:

Minimally Qualified Candidate (MQC) is someone who meets the minimum requirements defined by the syllabus. A rough definition of MQC is that of an MQC Lawyer who may or may not have the skills to become a Supreme Court Judge but society is comfortable with them practicing law as the MQC Lawyer satisfies widely accepted qualification standards.

Cognitive Complexity and Difficulty:

The difference between low and high difficulty is the difference between knowing pi as 3.14 and knowing it as 3.1415926535. Cognitive Complexity is using pi to find the area of a circle with a specific radius. Based on the blueprint of the exam, up to the 3xx level exams, only Remember and U/A were used extensively in the topics. A/E shows up more in the 4xx level exam. The cognitive complexity for each topic is provided in the blueprint of the exam.

Cognitive Complexity:

  1. Remember (R)
  2. Understand / Apply (U/A)
  3. Analyze / Evaluate (A/E)
  4. Create (C)

Remember:

The Remember cognitive complexity generally tests rote memorization and information retrieval. There is a general preference against Remember (R) questions. So, instead of asking a question to list the TCP flags, a question that requires understanding of TCP flags in order to answer the question is preferred.

Understand/Apply:

U/A is utilized to test application of concepts within standard operations. U/A requires an understanding of processes and the ability to pick the right process to solve a problem while being able to compare multiple processes.

Analyze/Evaluate:

A/E tests the ability to integrate new information with existing information to provide answers. It requires diagnosing a problem and understanding the relationship between concepts and how one concept influences other concepts.

Create:

Create (C) tests the ability to create new products/solutions by utilizing new or existing concepts.

Items:

Items are the questions and possible options that show up in the exam. An item consists of a Stem and Options. Stem is a combination of Problem statement and Question Statement. Options can be Distractor options or Key option(s). The Key option(s) is the right answer in the item.

Remember requires only a Question statement. A Problem statement is required for U/A, A/E and C. An ideal MQC should be able to determine the Key without having to read the options. This is one of the reasons why time is an important aspect in differentiating the competence of exam takers: others may have to check all the options, which means they will spend more time per question and may run short on time.

The Stems were constructed with positive words (Positive Construction). Almost all the stems eschew negative words like NOT, NEVER, EXCEPT that could potentially lead to a wrong answer as the candidate may miss the key negative word while reading the Stem.

As much as possible, each item is intended to focus on a single trait rather than multiple traits. A trait is a subtopic that is utilized within the blueprint. Higher complexity questions could test multiple traits. The item is intended to be congruent with the cognitive complexity & content identified in the exam blueprint without introducing any irrelevant variance that is not required to answer the question.

Response options (Distractor options and Key options) should be similar in terms of length and logic in order to prevent the option from being an obvious wrong/right answer.

Delayed ACK and Nagle’s Algorithm

In this article, I am taking a shot at explaining the interaction between Delayed ACK and Nagle’s algorithm and how this could add latency during a TCP session that requires transmission of small packets.

MSS:

Maximum Segment Size (MSS) is the largest amount of data that can be carried in a single “Segment” at the TCP layer. Default MSS is 536 Bytes. Default MTU is 576 Bytes.

MTU = MSS + 20B (TCP Header) + 20B (IP Header)
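The relationship above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the minimum 20-byte TCP and IPv4 headers (no options); the 1500-byte Ethernet MTU is an extra example value that does not come from the text above.

```python
TCP_HEADER_BYTES = 20  # minimum TCP header, no options
IP_HEADER_BYTES = 20   # minimum IPv4 header, no options


def mss_from_mtu(mtu: int) -> int:
    """Return the MSS implied by an MTU, per MTU = MSS + 20B + 20B."""
    return mtu - TCP_HEADER_BYTES - IP_HEADER_BYTES


print(mss_from_mtu(576))   # default MTU from the text
print(mss_from_mtu(1500))  # common Ethernet MTU (assumption for illustration)
```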

TCP Transmission:

RFC 1122 talks about the conditions under which data can be transmitted in TCP implementations. Data is transmitted when any of the following 3 conditions is met.

1. Immediately, if a full MSS or more can be sent.

2. {[No unacknowledged data] && 
    [(PSH flag is set) || 
     (Buffered data > 1/2*(SND Window))]}

3. {(PSH flag set) && (Override timeout expired)}

1/2*(SND Window) is implementation dependent and can differ across operating systems and between versions of the same operating system. Override timeout is roughly 200ms. This value can also change between operating systems. ACK Number represents bytes and not packets.
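The three transmit conditions listed above can be written as a single predicate. This is a sketch of the summary as given in this article, not of any particular TCP stack; the function and parameter names are made up for the example.

```python
def can_transmit(buffered: int, mss: int, unacked: int,
                 psh: bool, snd_window: int, override_expired: bool) -> bool:
    """Return True if any of the three transmit conditions is met."""
    # 1. A full MSS (or more) is ready to go.
    if buffered >= mss:
        return True
    # 2. Nothing is in flight AND (PSH is set OR more than half the send window is buffered).
    if unacked == 0 and (psh or buffered > snd_window / 2):
        return True
    # 3. PSH is set AND the override timeout has expired.
    if psh and override_expired:
        return True
    return False


# 400B buffered with 500B still unacknowledged: held until the override timeout expires.
print(can_transmit(400, 536, 500, True, 65535, False))
print(can_transmit(400, 536, 500, True, 65535, True))
```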

Delayed ACK:

Delayed ACK helps in avoiding “Silly-Window-Syndrome” (SWS) at the Receiver. The Receiver will delay sending an ACK in response to data received when all the 3 conditions match.

1. When fewer than 2 packets / 2*MSS have been received.

2. When the Receiver has no data of its own to send.

3. When the Delayed ACK timer has not yet expired.

In the UNIX World, 2*MSS has to be received by the Receiver in order for it to send an ACK, and in the Windows World, 2 packets of any size have to be received by the Receiver in order for it to send an ACK.
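The Delayed ACK decision, including the UNIX versus Windows difference described above, can be sketched as follows. The function name and the unix_style switch are illustrative assumptions, not taken from any operating system's implementation.

```python
def should_delay_ack(segments_received: int, bytes_received: int, mss: int,
                     has_data_to_send: bool, timer_expired: bool,
                     unix_style: bool = True) -> bool:
    """Return True while the Receiver keeps holding back its ACK."""
    if unix_style:
        # UNIX-style: an ACK is triggered once 2*MSS worth of data has arrived.
        threshold_reached = bytes_received >= 2 * mss
    else:
        # Windows-style: an ACK is triggered once 2 packets of any size have arrived.
        threshold_reached = segments_received >= 2
    # The ACK is delayed only while all three conditions hold.
    return (not threshold_reached) and (not has_data_to_send) and (not timer_expired)


# One 500B packet, nothing to send back, timer still running: the ACK is delayed.
print(should_delay_ack(1, 500, 536, has_data_to_send=False, timer_expired=False))
```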

Nagle’s Algorithm (RFC896):

The goal of Nagle’s algorithm is to lower the number of small packets exchanged during a TCP session. This helps in avoiding “Silly-Window-Syndrome” (SWS) at the Transmitter.

Nagle’s algorithm can be summarized as follows:

A. If there is unacknowledged data 
   (i.e., data in flight > 0 Bytes), 
   new data is buffered.

B. If data to be sent is less than MSS, it is 
   buffered till the data to be sent is 
   greater than or equal to MSS.
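Read together, points A and B mean that small writes are held back only while earlier data is still unacknowledged. A minimal sketch of that combined reading, with a hypothetical function name:

```python
def nagle_should_buffer(unacked_bytes: int, pending_bytes: int, mss: int) -> bool:
    """Return True if Nagle's algorithm holds the pending data in the send buffer."""
    # A: data is in flight and not yet acknowledged.
    data_in_flight = unacked_bytes > 0
    # B: the pending data is smaller than one full segment.
    smaller_than_mss = pending_bytes < mss
    return data_in_flight and smaller_than_mss


print(nagle_should_buffer(500, 400, 536))  # small write while 500B is unacknowledged
print(nagle_should_buffer(0, 400, 536))    # nothing in flight, so it can be sent
```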

Problem:

Under the right conditions, the 1-3 points outlined under Delayed ACK and A-B points outlined in Nagle’s Algorithm will freeze the interaction between the sender and the receiver for the duration of timeout which is roughly 200ms. This is often seen in applications that rely on smaller packet sizes.

A Simple Scenario:

Sender is a client machine that updates the Receiver with information. Receiver could be some kind of a data warehouse which stores information on financial transactions. In this case, the Sender has data to send to the Receiver and the Receiver acknowledges the data received and does not transmit any data to the client in response other than a simple ACK.

During the course of a TCP session, the Sender has just sent 500B of data to the Receiver, and this matches condition A outlined in Nagle’s algorithm. The application at the Sender side then moves 400B of data to the TCP stack. At this point, the Sender has not yet received an ACK from the Receiver, and the next 400B of data is smaller than the MSS, which meets condition B outlined in Nagle’s algorithm.

Thus, the 400B of data will be buffered until one of the following happens:

A. ACK is received from Receiver for the 
   previously sent 500B of data.

B. Application sends the TCP stack more data 
   that will push the existing buffered data 
   (400B) more than MSS i.e., application needs 
   to send 136B or more to the TCP stack in 
   order to push the buffered data to or beyond
   the MSS limit.

On the Receiver side, the Receiver will refrain from sending an ACK after receiving the first 500B of data because all of the conditions 1-3 outlined under Delayed ACK are met.

1. Only 1 packet of 500B (less than MSS) has been 
   received.

2. Receiver does not have any data to transmit, 
   other than ACK.

3. Delayed ACK timer has not yet expired.

Sender keeps the 400B of data buffered. Receiver will not ACK the previously sent 500B of data. Effectively, there will be a communication freeze between the Sender and the Receiver till the timeout expires. Usually this timeout is 200ms in different OS implementations.
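The scenario can be replayed as a toy simulation. This is only a sketch: it assumes the 536-byte default MSS and the roughly 200ms timer mentioned above, steps time in 10ms increments, and uses simplified sender and receiver rules with hypothetical names.

```python
MSS = 536                     # default MSS from the text
DELAYED_ACK_TIMEOUT_MS = 200  # approximate timer value from the text


def sender_buffers(unacked_bytes: int, pending_bytes: int) -> bool:
    """Nagle: hold small data while earlier data is unacknowledged."""
    return unacked_bytes > 0 and pending_bytes < MSS


def receiver_delays_ack(bytes_received: int, elapsed_ms: int) -> bool:
    """Delayed ACK (UNIX-style) for a Receiver that has no data of its own to send."""
    return bytes_received < 2 * MSS and elapsed_ms < DELAYED_ACK_TIMEOUT_MS


def stall_duration_ms(first_write: int, second_write: int, step_ms: int = 10) -> int:
    """Return how long both sides wait on each other before something moves."""
    elapsed = 0
    while sender_buffers(first_write, second_write) and receiver_delays_ack(first_write, elapsed):
        elapsed += step_ms
    return elapsed


print(stall_duration_ms(500, 400))  # the scenario above
print(stall_duration_ms(500, 536))  # a full MSS is sent right away
print(MSS - 400)                    # extra bytes needed to reach the MSS
```

In this model the stall only ends when the Delayed ACK timer runs out, which matches the freeze described above.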

For further understanding, I would highly recommend this YouTube video on the subject by Hansang Bae.

F5 iRule – URI Masking

Requirement:

Client sends request to http://xyz.com/

Server needs to process http://xyz.com/append but client should only see http://xyz.com/ i.e., the URI  /append should not be visible to the client.

when HTTP_REQUEST {
    # Rewrite the root URI so that the server processes /append
    if { ([HTTP::host] equals "www.xyz.com") and ([HTTP::uri] eq "/") } {
        HTTP::uri "/append"
    }
}
when HTTP_RESPONSE {
    # Hide /append from the client in any Location header sent by the server
    if { [HTTP::header value Location] contains "/append" } {
        HTTP::header replace Location [string map {/append /} [HTTP::header value Location]]
    }
}

The F5 will complete the following steps using the iRule provided above:

F5 will add URI “/append” to the incoming request.

F5 will replace “/append” with “/” in the response from the server to the client.
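To make the two rewrites easier to reason about off-box, here is a rough Python model of the same logic. It is only an illustration of the string handling, not something that runs on the F5, and the function names are made up for the example.

```python
MASKED_PATH = "/append"


def rewrite_request_uri(host: str, uri: str) -> str:
    """Model of the HTTP_REQUEST event: only the root URI of www.xyz.com is rewritten."""
    if host == "www.xyz.com" and uri == "/":
        return MASKED_PATH
    return uri


def rewrite_location(location: str) -> str:
    """Model of the HTTP_RESPONSE event: replace every "/append" with "/"."""
    return location.replace(MASKED_PATH, "/")


print(rewrite_request_uri("www.xyz.com", "/"))
print(rewrite_location("http://www.xyz.com/append"))
print(rewrite_location("http://www.xyz.com/append/page"))
```

The last call shows a side effect worth knowing about: a Location that continues past /append ends up with a double slash after the substitution.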

Reference:

Mask URI – Devcentral Thread

Github

F5 – SSL Cert Expiration

K14318 – Identifying expired certs and certs about to expire in 30 days.

K15288 – Email reminder for cert expiration.

A few one-liners from bash to identify the cert expiration date:

Identifying the expiration date from the certificate name:

~ # tmsh list sys file ssl-cert domain.crt | grep expiration
    expiration-date 1505951999
    expiration-string "Sep 20 23:59:59 2017 GMT"
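The expiration-date field is a Unix epoch timestamp, so it can be converted and compared off-box. The sketch below uses the sample value from the output above; the helper names and the reference date are assumptions for illustration.

```python
from datetime import datetime, timezone


def expiry_from_epoch(expiration_date: int) -> datetime:
    """Convert the tmsh expiration-date (epoch seconds) to a UTC datetime."""
    return datetime.fromtimestamp(expiration_date, tz=timezone.utc)


def days_until_expiry(expiration_date: int, now: datetime) -> int:
    """Return the number of whole days left before the certificate expires."""
    return (expiry_from_epoch(expiration_date) - now).days


expiry = expiry_from_epoch(1505951999)
print(expiry.strftime("%b %d %H:%M:%S %Y GMT"))

# Hypothetical "today" used for the comparison.
reference = datetime(2017, 8, 21, 23, 59, 59, tzinfo=timezone.utc)
print(days_until_expiry(1505951999, reference))
```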

 

Identifying the Client SSL profile for a certificate:

~ # tmsh list ltm profile client-ssl one-line | grep domain.crt | awk '{print $3,$4}'
    client-ssl CLIENTSSL-domain.com

 

Identifying the Virtual Server from Client SSL profile:

~ # tmsh list ltm virtual one-line | grep CLIENTSSL-domain.com | awk '{print $2,$3}'
    virtual VS-10.10.10.10-Public

 

Identifying the expiration date for cert associated with VS:

~ # echo | openssl s_client -connect 10.10.10.10:443 2> /dev/null | openssl x509 -noout -dates
notBefore=Nov 21 00:00:00 2016 GMT
notAfter=Nov 22 23:59:59 2017 GMT

 

F5 Virtual Server – Order of Precedence

The VS order of precedence differs with code version and the tm.continuematching db variable. This db variable is set to false by default, so a lower-precedence VS does not handle the traffic if a higher-precedence VS exists in a disabled state. If the traffic has to be handled by the lower-precedence VS when the higher-precedence VS is disabled, set this db variable to true:

11.x Code Version:

(tmos)# modify /sys db tm.continuematching value true
(tmos)# save /sys config

9.x – 10.x Code Version:

bigpipe db TM.ContinueMatching true
bigpipe save all

The order of precedence for VS processing for different code versions is provided below.

Order of Precedence for code version: 9.2 – 11.2.x

<address>:<port>
<address>:*
<network>:<port>
<network>:*
*:<port>
*:*

Order of Precedence for code version: 11.3 and later

Order  Destination        Source             Service port
1      <host address>     <host address>     <port>
2      <host address>     <host address>     *
3      <host address>     <network address>  <port>
4      <host address>     <network address>  *
5      <host address>     *                  <port>
6      <host address>     *                  *
7      <network address>  <host address>     <port>
8      <network address>  <host address>     *
9      <network address>  <network address>  <port>
10     <network address>  <network address>  *
11     <network address>  *                  <port>
12     <network address>  *                  *
13     *                  <host address>     <port>
14     *                  <host address>     *
15     *                  <network address>  <port>
16     *                  <network address>  *
17     *                  *                  <port>
18     *                  *                  *
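The 11.3 and later table follows a simple pattern: destination specificity is compared first, then source specificity, then the service port. The sketch below reproduces the Order column from that pattern. It only models the three classes shown in the table (host, network, wildcard), and the class, function and VS names are assumptions for illustration.

```python
from typing import NamedTuple


class VirtualServer(NamedTuple):
    name: str
    destination: str  # "host", "network" or "*"
    source: str       # "host", "network" or "*"
    port: str         # "port" or "*"


ADDRESS_RANK = {"host": 0, "network": 1, "*": 2}
PORT_RANK = {"port": 0, "*": 1}


def precedence(vs: VirtualServer) -> int:
    """Return the Order value (1 = matched first) for a virtual server."""
    # 6 rows per destination class, 2 rows per source class, 1 row per port class.
    return ADDRESS_RANK[vs.destination] * 6 + ADDRESS_RANK[vs.source] * 2 + PORT_RANK[vs.port] + 1


candidates = [
    VirtualServer("VS_wildcard", "*", "*", "*"),
    VirtualServer("VS_network_80", "network", "*", "port"),
    VirtualServer("VS_host_80", "host", "*", "port"),
]
for vs in sorted(candidates, key=precedence):
    print(precedence(vs), vs.name)
```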