What is Static Code Analysis?

Static Code Analysis, also called Static Application Security Testing (SAST) or source code analysis, is a white-box testing process that analyzes and evaluates a program’s code for vulnerabilities while it is being developed. In recent years, the demand for software companies to deliver highly secure products has made developers increasingly concerned about security-related threats in their products’ source code. The purpose of such an analysis is to identify the loopholes in a software system that could compromise its security once it is deployed into the customer’s network.

Static Application Security Testing can occur very early in the software development life cycle because it does not require the application to be executable. It helps developers identify vulnerabilities in the initial stages of development and solve them before the application is complete.

Static Analysis 

Static Code Analysis gives developers an understanding of possible errors in their code, so they can remove any loopholes before proceeding to the next stage of development. Reported errors are typically presented alongside the code that causes them, which makes navigation easier. SAST tools can also provide in-depth guidance on each issue and how it can be fixed.

What Comprises a Static Code Analysis?

Static analysis involves a set of rules and methods used to analyze the source code of a program and establish criteria to check its correctness and reliability. It reviews the source code without executing it and reveals a wide variety of information such as the structure of the model used, performance modeling and optimization techniques, control flow, syntax accuracy, fault localization, assertion discovery, clone detection, debugging, application reliability and many more.

But given the variety of the information, this process is very extensive, requiring multiple developers and a lot of time and resources to complete the testing manually. Depending upon the type of information that the developer wants to obtain, there are multiple Static Code Analysis techniques:

Interface Analysis

This static code analysis focuses on risks associated with user interaction with the program. It covers the user interface structure and any errors associated with the user interaction model. With the help of this analysis, developers can accurately identify any risks or errors in how the interface and interaction are presented to the user.

Control Analysis

This static flow analysis works on finding errors and loopholes in the control flow of the calling structures used in the source code. The sequence of calling these processes or functions is analyzed along with their associated conditions. The control transfers are mapped through this static analysis technique, and any liabilities are identified. 

Fault Analysis

Fault analysis utilizes logic to determine faulty or inoperable parts of the source code. Such faulty code can lead to possible vulnerabilities in the system. These risks are identified and prevented through static code analysis by analyzing the applied conditions in the code.

Data Analysis

Data analysis collects information about the objects a program uses, such as its data structures. This analysis ensures that the code performs accurate operations and that it adequately utilizes the defined data. Static analysis using this technique helps maintain the accuracy, definition, and context of data: it checks whether the program uses that data correctly or exposes any vulnerabilities.
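For illustration, here is a small, hypothetical C snippet containing the kind of data-handling defects that a data-oriented static analysis would typically report, an uninitialized variable and an out-of-bounds array access, without ever running the program:

#include <stdio.h>

int main(void) {
    int values[4] = {1, 2, 3, 4};
    int sum;                          /* defect 1: read before it is initialized */
    for (int i = 0; i <= 4; i++) {    /* defect 2: i == 4 reads past the end of values */
        sum += values[i];
    }
    printf("sum = %d\n", sum);
    return 0;
}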

Types of Static Code Analysis

Static code analysis can be done at any stage of the software’s development cycle, but performing it at earlier stages reduces the cost, time, and risk of detecting these errors later. Static flow analysis is traditionally done manually, but the process is time and resource intensive. For more extensive programs, performing manual analysis becomes almost impossible when bound by time constraints. Such rigorous testing also requires the analyzer to know proper code security measures and techniques of static analysis. Furthermore, even if the manpower is increased to allow faster testing, it is still prone to many errors.

Modern-day developers have access to static code analysis tools that automate the process of this security analysis, making it much faster and easier. Both types are explored below:

Manual Static Analysis

Manual static code analysis is performed by human reviewers, and this time-consuming process can only cover a limited number of issues at a time. Developer reviews and third-party reviews are the two main types of manual static analysis, depending on whether the program is being developed by an individual or an organization.

In a personal review, the programmer manually performs the static code analysis to evaluate the code for errors and risks. In a team or organization, the analysis is instead performed by one or more people other than the original developer. For a more professional manual analysis, the developer can even hire third parties to perform it.

Self-analysis performed by the developer involves analyzing the code as it is written and structuring a proper review process after the code is written. A third-party analysis is a formal approach that provides:

  • Documentation of all the defects found.
  • An estimate of rework effort.
  • Any suggestions to improve the control or data flow of the code. 

Automated Static Analysis

Automated static code analysis is performed with the help of tools. They consistently check the code for errors in real-time. This type of analysis does not require the developer to know the static code analysis techniques. It follows a set of rules predefined to the tools to find any errors, risks, and loopholes in the code. These tools drastically reduce the human labor and time required to perform static analysis. 

These tools analyze the code and provide report feedback highlighting the major flaws, errors, and security issues found in the code. By automating the analysis process, the developer now only has to check the identified errors for false positives and negatives and correct the genuine errors manually.

Conclusion

Static Code Analysis analyzes and evaluates a code during development to check for security risks. It’s an increasingly significant process, due to rising client concern for high security products. That’s why we’ve discussed what happens in it, what its types are, and how they function. We’ve also written a number of articles on other essential analysis tools such as lexical and data flow analysis, so to secure your systems further, be sure to check those out.

Static and Dynamic Code Analysis

Code analysis is the testing of source code for the purpose of finding errors and vulnerabilities before a program is distributed to the customer or made open to the market. The code is written by developers in text editors and then handed to automated tools, which analyze it for possible errors and loopholes.

Static Code Analysis vs Dynamic Code Analysis

The output of a source code analysis consists of high-level information about the program, including metrics identifying program properties, function invocation rates, call graphs, and other automated representations. Based on these outputs, two types of code analysis can be characterized: one focuses on code structure and code understanding, and the other on the code’s behavior during the program’s execution. To summarize, Static Analysis is concerned with code properties and source code metrics, while Dynamic Analysis is concerned with execution metrics.

Static Code Analysis

Static Code Analysis is classified as a white-box testing technique performed at the early stages of program development. It highlights vulnerabilities in the source code without executing it. Some ways to perform static analysis are Data Flow Analysis and Taint Analysis.
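To make this concrete, here is a small, hypothetical C program with the kind of flaw taint analysis looks for: untrusted input read from the user flows, without validation, into a sensitive sink (here, a command executed by the shell):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char name[64];
    char cmd[128];

    if (fgets(name, sizeof name, stdin) == NULL)   /* taint source: user input */
        return 1;
    snprintf(cmd, sizeof cmd, "ls %s", name);      /* tainted data propagates into cmd */
    system(cmd);                                   /* sink: potential command injection */
    return 0;
}

A data flow or taint analysis can follow the value of name from the fgets call to the system call and flag that path as a command-injection risk, all without running the program.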

Modern Static Code Analysis is performed by tools that automate the process of finding security flaws; the developer then reviews the results and takes the proper corrective actions. These tools can be integrated into commonly used development environments and build pipelines to perform real-time analysis while the program is being developed, which reduces the time spent performing a separate code analysis. This real-time analysis is more practical than finding vulnerabilities in the final stages of the Software Development Life Cycle.

Advantages

Static Code Analysis, when performed by an analyst who properly understands the tool and its rules, offers a variety of benefits:

  • Thorough Code Analysis:

Static Code Analysis checks the complete code from start to finish, even the code that is not used by the program. This ensures that all errors, bugs and vulnerabilities are identified.

  • Test According to Manually Set Rules:

Performing Static Code Analysis according to preset rules can still provide adequate results. But it is best to personalize the rules according to your project to lower the rate of false positives and get a more accurate result. 

  • No Need for Execution:

Static Code Analysis does not require the program to be a finalized product or an executable code. This analysis assists the developer in the development process by performing real-time analysis of the code. 

  • Highlight Exact Error and its Location:

This analysis can point out the exact code causing the error or bug. Additionally, it provides a short synopsis of what that error could be, making it easy to solve. 

  • Reduced Cost and Time Consumption:

Performing this analysis at the early stages of the SDLC helps resolve risks early. You therefore avoid the costly and time-consuming process of tackling these errors at the final stages.

Disadvantages

  • False Positives:

Performing Static Code Analysis without proper configuration can result in false positives being reported as errors, resulting in more time consumed to get to the more critical errors.

  • Not Concerned with the Runtime Environment: 

Static code analysis is applied without running the application, focusing mainly on the structure rather than the code’s operations or the program’s performance factor. 

  • Not All Languages are Supported:

The static code analysis tool might not support the language used in your program.  

  • Can be Time Consuming:

If the analysis is not performed along the development process and is performed separately, the tools can take a lot of time to finalize an analysis of the complete code.

Dynamic Code Analysis

Dynamic code analysis is designed to test an executable application for any vulnerabilities that can compromise the application’s security or proper working. This analysis simulates an actual runtime environment to identify errors associated with compile time and runtime processes.

When performing this analysis, tools attack the executed program from multiple angles with potentially malicious inputs. All negative responses are reported as errors and vulnerabilities. Dynamic Analysis is performed at the testing phase of the SDLC as it requires an application to run for it to be tested.
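As a sketch of the kind of input-dependent defect this uncovers (the program below is made up for illustration), consider code whose failure only appears when it is actually executed with an unexpected value:

#include <stdio.h>

int main(void) {
    int scores[5] = {90, 80, 70, 60, 50};
    int i;

    printf("Which score (0-4)? ");
    if (scanf("%d", &i) != 1)
        return 1;
    printf("score = %d\n", scores[i]);   /* i = 7 or i = -1 reads out of bounds at run time */
    return 0;
}

A dynamic analysis run that feeds the program hostile or boundary values (such as -1 or 100) triggers the out-of-bounds access and reports it, whereas the problem never manifests for well-behaved inputs.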

Advantages

  • Simulate the actual runtime environment

Dynamic Analysis can simulate a realistic deployment environment to analyze how the program will react to real-world attacks and potential exploits. This reduces damages caused by any errors found after the application is deployed into the market.

  • Can test the application without Code

Dynamic Analysis can be performed on any executable as it is concerned with the black box testing of the application. 

  • Can help Identify False Negatives in any Other Code Analysis

This analysis can point out any errors overlooked by other code analyses that are performed without executing the application. 

  • Detect Liabilities that Other Analysis Can Not: 

Since Dynamic Analysis detects vulnerabilities in a running application, it can detect loopholes in the application that are difficult to detect in source code. Some vulnerabilities, like memory allocation issues and configuration issues, are only visible when the code is active. These tools can also detect issues within third-party dependencies and libraries.

Disadvantages

  • Only points out the problem:

Dynamic analysis is not concerned with the source code, so it cannot identify structural errors in the code. Furthermore, even when a problem is found, its source and cause are not pinpointed, so developers have to find the root of the problem manually.

  • Time Consuming:

The process is time-consuming and can take a huge effort to perform the initial analysis. After the analysis is done, the process of finding the cause of vulnerabilities and correcting them can involve backtracking, as it can only be performed after the application is complete. 

  • False Positives:

Like any code analysis, Dynamic Analysis can also result in many false positives, which need to be manually assessed by the analyzer. 

  • Difficult to Set Up:

Setting up an analysis depending on your project can be difficult. It might require expert help to perform an accurate, custom analysis. 

The Comparison

  1. Both types detect defects. The difference lies in the development stage at which they can be applied.
  2. Dynamic code analysis can miss errors relating to code properties and source code metrics, while static analysis cannot identify vulnerabilities relating to execution metrics.
  3. Dynamic analysis tests the code by executing it, while static analysis reviews the code without running it.
  4. Both analyses can work independently but should be used to complement each other for the best security analysis.
  5. Static code analysis works best as part of code review, while dynamic code analysis is performed as automated testing to generate real-time performance data.
  6. When performing dynamic code analysis after static analysis, it is best to focus on factors such as performance, logic, and security validation, as these are not covered by static code analysis.
  7. In a development process, once the code issues found through static code analysis are resolved, the program can be tested by real-time execution through dynamic analysis to remove the remaining sources of vulnerabilities.

Conclusion

Code analysis tests source code to detect errors before a program is distributed. It is divided into two types: static code analysis, which identifies vulnerabilities in code without executing it, and dynamic code analysis, which simulates an actual runtime to identify errors. Here, we’ve discussed how they work and what their advantages and disadvantages are, so that you can determine which one you need to use. Good luck!

Data Flow Analysis

Data Flow Analysis (DFA) is a type of static analysis. It is a global optimization technique that provides information about the flow of data along the paths of a program’s execution, with the goal of computing that information at each and every program point. The most important characteristic of Data Flow Analysis is its “flow sensitivity”. Flow sensitivity means that DFA takes into account both the absolute order of the instructions within the program and their relative order, which is crucial to the analysis’s outcome.

This flow sensitivity matters because of dependency relations: for example, an instruction that comes second in absolute order may depend on a variable that was defined in the first instruction. Because of the importance of these dependence relations, Data Flow Analysis (DFA) is usually represented in a format that helps not only to visualize the entire program’s flow paths but also to represent the interrelations and ordering of the different instructions/statements.


Conventionally a format known as data flow graphs (DFG) is used in Data Flow Analysis in which the entire program is considered as a flow graph where all the possible outcomes of a statement can be visualized, and it becomes possible to describe how a specific statement shifts the program to an alternate pathway.

Constituents of Data Flow Analysis

Different constituents of DFA help in the optimization of the program in a specific way. These are as follows:

Available Expression Analysis

Available Expression Analysis is a forward analysis that analyzes data end-to-end, from start to finish. The goal of applying available expression analysis is to find out if, at a given program point, previously computed values or expressions can be reused along every path instead of recomputing the expressions again. 

It is a useful optimization technique that prevents recomputing statements/expressions, i.e. it removes common subexpressions from the program/code. Common subexpressions occur fairly often, and eliminating them avoids unnecessary complexity and redundant work. Consider the following block of code:

x=a+b+p

d=q+r

y=a+b+d

By observing carefully, it is clear that the expression (a+b) has already been computed in the first equation. The reappearance of (a+b) in the third equation qualifies as a common subexpression, provided that the values of the variables “a” and “b” remain constant. The available expression analysis works to optimize the program by eliminating common subexpressions such as this (a+b) by substituting a single variable that stores the computed value for the common subexpressions.
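As a sketch of the optimization (the temporary name t is arbitrary), and assuming a and b are not redefined between the first and third statements, the block can be rewritten so the shared value is computed only once:

t = a + b

x = t + p

d = q + r

y = t + d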

Reaching Definition Analysis

It is also a type of forward analysis and is usually applied in cases of program optimization that involves one of the following scenarios:

  1. Elimination of dead code from Def-Use (DU) and Use-Def (UD) chains: 

A typical example of the Def-Use chain:

[Figure: a Def-Use chain in which only the variable x is live at the termination point, while y, z, and m are dead definitions]

By closely observing this DU chain, it can be concluded that only the variable “x” is active/live at the termination point, whereas all the other assignments, i.e. y, z and m, are regarded as dead code and should be removed through reaching definition analysis in order to optimize this program.

  2. Copy Propagation:

The concept of copy propagation is related to that of common subexpressions. If dx: x = y is the only definition of x that reaches a point p, and y is not redefined between dx and p, then copy propagation is the replacement of the variable x at p with the variable y. For example, given:

            x = y

            a = x + z

then x in the second statement can be replaced with y. The advantage of this optimization is that if every use of “x” gets replaced with “y”, the definition of x becomes useless and can be removed. Even if not every use is replaced, it still simplifies the code, with fewer instructions, leading to optimization and increased running speed.
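A small sketch (with made-up variables) of copy propagation followed by dead code elimination:

x = y

a = x + z

b = x * 2

After propagating the copy, both uses of x are rewritten to use y, which leaves the definition x = y dead so that it can be removed:

a = y + z

b = y * 2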

Live Variable Analysis

Live variable analysis is used in the calculation of live variables, which are variables containing a value that might be required in the future. For example, consider a block of the following algorithm:

d = 6

b = 10

a = f(d * b)


In the above example, b and d are live variables between line 2 and 3.

Liveness information is propagated backwards along the edges of the control flow graph:

in[b] = fb(out[b])

Join node: a node with multiple successors.

Meet operator (applied at a join node):

            out[b] = in[s1] ∪ in[s2] ∪ … ∪ in[sn], where

            s1, …, sn are all successors of b

Consider the following iterative algorithm, which computes live variables and illustrates a few properties associated with liveness in a general data flow analysis framework:

input: CFG = (N, E, Entry, Exit)

// Boundary condition
in[Exit] = ∅

// Initialization for the iterative algorithm
for each basic block B other than Exit
    in[B] = ∅

// Iterate
while (changes to any in[] occur) {
    for each basic block B other than Exit {
        out[B] = ∪ in[s], over all successors s of B
        in[B] = fB(out[B])    // in[B] = use[B] ∪ (out[B] − def[B])
    }
}
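Below is a minimal, illustrative C sketch of this iterative algorithm. It assumes a small hard-coded CFG and at most 32 variables, so that a set of variables fits into a single unsigned bitmask; the successor lists and the use/def sets are made up purely for demonstration:

#include <stdio.h>

#define NBLOCKS 4   /* hypothetical basic blocks B0..B3, where B3 is Exit */
#define MAXSUCC 2

typedef unsigned Set;   /* bit i set => variable i is in the set */

static int succ[NBLOCKS][MAXSUCC] = {   /* -1 means no successor */
    {  1, -1 },   /* B0 -> B1 */
    {  2,  3 },   /* B1 -> B2 or B3 */
    {  1, -1 },   /* B2 -> B1 (loop back edge) */
    { -1, -1 }    /* B3 = Exit */
};

/* use[B] and def[B] per block; variables: bit 0 = a, bit 1 = b, bit 2 = c */
static Set use_[NBLOCKS] = { 0x0, 0x3, 0x4, 0x1 };
static Set def_[NBLOCKS] = { 0x3, 0x4, 0x2, 0x0 };

int main(void) {
    Set in[NBLOCKS] = {0}, out[NBLOCKS] = {0};
    int changed = 1;

    while (changed) {                            /* iterate until a fixed point */
        changed = 0;
        for (int b = 0; b < NBLOCKS; b++) {
            if (b == NBLOCKS - 1)                /* boundary condition: skip Exit */
                continue;
            Set o = 0;
            for (int k = 0; k < MAXSUCC; k++)    /* out[B] = union of in[s] */
                if (succ[b][k] >= 0)
                    o |= in[succ[b][k]];
            Set i = use_[b] | (o & ~def_[b]);    /* transfer function fB */
            if (i != in[b] || o != out[b])
                changed = 1;
            in[b] = i;
            out[b] = o;
        }
    }

    for (int b = 0; b < NBLOCKS; b++)
        printf("B%d: in = 0x%x, out = 0x%x\n", b, in[b], out[b]);
    return 0;
}

Representing each set as a bitmask keeps the union (|) and difference (& ~) operations of the transfer function cheap, which is why real data flow frameworks commonly use bit vectors.
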
Properties of Live Variables

The properties of live variables in a general framework of a dataflow analysis in relation to the above algorithm are:

  • The direction of live variable analysis is backwards:
in[b] = fb(out[b])

out[b] = ∪ in[s], over all successors s of b
  • The transfer function associated with live variables is:
fb(x) = use[b] ∪ (x − def[b])
  • The boundary condition for live variables in the data flow analysis framework is:
in[Exit] = ∅
Types of Liveness

Each node of the flowgraph has an associated set of live variables. Variable liveness is of two types:

  • Semantic liveness: A variable x is semantically live at a node n if there is some execution sequence starting at n whose behavior is affected by the changing value of x. It is usually undecidable and depends on the execution behavior of the application.
  • Syntactic liveness: A variable x is syntactically live at a node if there is a path from that node to the exit of the flowgraph along which x may be used before being redefined. It is decidable and depends on the syntactic structure of the program/code.

Conclusion

Data Flow Analysis is a form of static analysis that evaluates the data flow at each program point. It’s a flow-sensitive process, so it considers both the order of instructions within the program and their relative orders. That has significance for dependency relations. DFA optimizes a program through expression analysis, reaching definition analysis, and live variable analysis. All of these are crucial steps for static analysis that identifies errors in the code. That’s why we discussed them here. We also have articles on other analysis types, such as lexical and taint analysis. For more help, give them a read!

Lexical Analysis

The very first stage of the compiler design process is lexical analysis. The source code, written in the form of sentences, is fed into a lexer. The lexer transforms the input, which is originally just a string of characters, into a list of distinct tokens, including string and number constants, variable names, and programming language keywords. While doing so, the lexical analyzer eliminates any whitespace and source code comments.

An error is produced if a token is determined to be invalid by the lexical analyzer. The source code for a lex program is made up of a table of regular expressions and their related actions, which are expressed as C code fragments. Therefore, if an identifier is discovered in the program, the action corresponding to identifiers is performed; in this situation, some details might be added to the symbol table. If a keyword like ‘if’ is detected, a different course of action is pursued.

The syntax analyzer and lexical analyzer collaborate frequently. When the syntax analyzer requests a token, the lexical analyzer pulls characters from the source code, checks them against the legal token patterns, and then sends the token on.

Roles of Lexical Analyzer

The Lexical Analyzer completes the following tasks:

  • Aids in locating the tokens in the symbol table.
  • Eliminates blank lines and comments from the original program.
  • Relates error messages to the program’s source.
  • Expands the macros if they are present in the source program.
  • Reads the input characters of the source program.

Tokens

Lexemes are described as a token’s collection of (alphanumeric) characters. Every lexeme must conform to a set of preset rules in order to be recognized as a legitimate token. By using a pattern, grammar rules establish these rules. What constitutes a token is described by a pattern, and these patterns are defined using regular expressions.

Programming language tokens include words like “keywords,” “constants,” “identifiers,” “strings,” “numbers,” “operators,” and “punctuation symbols.”

After reading the input, this analyzer is in charge of sending tokens back to your program. By avoiding dealing with the specific characters that make up the input, you may focus on understanding its meaning. For instance, a straightforward lexical analyzer may provide tokens that are numbers that are formatted correctly. The input would then not need to be validated by your application before you tried to convert it.

It is possible to define precisely what values the tokens may take using regular expressions. Some tokens, such as if, else, and for, are merely keywords. Others, such as identifiers, can consist of any combination of letters and numbers as long as they don’t match a keyword and don’t begin with a digit. 

Here is an example of code the analyzer might receive:

#include <stdio.h>

int maximum(int x, int y) {
    // This will compare 2 numbers
    if (x > y)
        return x;
    else {
        return y;
    }
}

It will create the following tokens:

Lexeme      Token
int         Keyword
maximum     Identifier
(           Operator
int         Keyword
x           Identifier
,           Operator
int         Keyword
y           Identifier
)           Operator
{           Operator
if          Keyword
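To show roughly how such tokens are produced, here is a minimal, hand-written lexer sketch in C. It is illustrative only (real lexers are usually generated from regular-expression specifications); it recognizes keywords, identifiers, integer constants, and single-character operators, and skips whitespace:

#include <ctype.h>
#include <stdio.h>
#include <string.h>

static const char *keywords[] = { "if", "else", "int", "return", "for", "while" };

static int is_keyword(const char *s) {
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(s, keywords[i]) == 0)
            return 1;
    return 0;
}

static void lex(const char *src) {
    const char *p = src;
    char buf[64];

    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {              /* skip whitespace */
            p++;
        } else if (isalpha((unsigned char)*p) || *p == '_') {
            size_t n = 0;                              /* identifier or keyword */
            while ((isalnum((unsigned char)*p) || *p == '_') && n < sizeof buf - 1)
                buf[n++] = *p++;
            buf[n] = '\0';
            printf("%-10s %s\n", buf, is_keyword(buf) ? "Keyword" : "Identifier");
        } else if (isdigit((unsigned char)*p)) {
            size_t n = 0;                              /* integer constant */
            while (isdigit((unsigned char)*p) && n < sizeof buf - 1)
                buf[n++] = *p++;
            buf[n] = '\0';
            printf("%-10s Number\n", buf);
        } else {
            printf("%-10c Operator\n", *p);            /* everything else: one-char operator */
            p++;
        }
    }
}

int main(void) {
    lex("int maximum(int x, int y) { if (x > y) return x; else return y; }");
    return 0;
}

Running it on the snippet passed to main prints one lexeme per line together with its token class, much like the table above.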

Lexical Errors

Lexical errors are character sequences that cannot be translated into any legitimate token. Some relevant details about lexical errors:

  • Lexical mistakes are not very prevalent, but a scanner should handle them
  • Identifiers, operators, and keywords that are misspelled are regarded as lexical mistakes.
  • A lexical error typically results from the occurrence of an illegal character, usually at the start of a token.

Error Recovery in Lexical Analyzer

Here are a handful of the most popular error recovery methods:

  • Removing one character from the remaining input.
  • In panic mode, successive characters are discarded until a well-formed token is found.
  • Inserting the missing character into the remaining input.
  • Replacing one character with another.
  • Transposing two adjacent characters.

Token specifications

Let’s examine how language theory handles the following words.

Alphabets

An alphabet is any finite set of symbols. Examples include the binary alphabet {0, 1}, the decimal digits {0, 1, 2, …, 9}, the hexadecimal alphabet {0–9, A–F}, and the English alphabet {a–z, A–Z}.

Strings

A string is any finite sequence of alphabet symbols (characters). The total number of symbols in a string is its length; for example, the string “tutorialspoint” has a length of 14, written |tutorialspoint| = 14. The empty string, a string with no symbols and zero length, is denoted by ε.

Language

A finite set of strings over some alphabet is referred to as a language. Computer languages are treated as finite sets, and set operations can be applied to them theoretically. Finite languages can be described by regular expressions.

Regular Expressions

Only a limited number of valid strings, tokens, and lexical items specific to the language at hand must be scanned and identified by the lexical analyzer. It looks for the pattern that the language rules have established.

By specifying a pattern for finite strings of symbols, regular expressions can express finite languages. Regular grammar is the grammar that regular expressions define. A regular language is a language that conforms to regular grammar.

For describing patterns, a regular expression is a crucial notation. Regular expressions act as names for a collection of strings since each pattern matches a particular set of strings. Regular languages can be used to describe programming language tokens. An illustration of a recursive definition is the definition of regular expressions.

Operations

The main language operations include:

Union of two languages L and M:

L ∪ M = { s | s is in L or s is in M }

Concatenation of two languages L and M:

LM = { st | s is in L and t is in M }

Kleene closure of a language L:

L* = language L used zero or more times

Using regular expressions to represent a language’s legitimate tokens

The representation of a language’s valid tokens as regular expressions looks like this.

If x is a regular expression, then:

x* denotes zero or more occurrences of x,

i.e. it can generate ε, x, xx, xxx, and so on.

x+ denotes one or more occurrences of x,

i.e. it can generate x, xx, xxx, … (equivalently, x.x*).

x? denotes at most one occurrence of x,

i.e. it can generate either {x} or {ε}.

[a-z] is all lower-case letters of the English alphabet.

[A-Z] is all upper-case letters of the English alphabet.

[0-9] is all decimal digits.
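As a small sketch (the token names are made up for illustration), the tokens of a simple language could be specified with regular expressions such as:

identifier = [A-Za-z_] [A-Za-z0-9_]*

number     = [0-9]+

keyword    = if | else | for | while

whitespace = ( space | tab | newline )+

A lexical analyzer built from such a specification matches the longest possible lexeme at each position of the input and reports the corresponding token.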

 

Advantages of Lexical Analyzer

  • Programs like compilers, which take the parsed data from a programmer’s code to build a binary executable, use the lexical analyzer method.
  • Web browsers utilize it to format and display a web page using data that has been processed from JavaScript, HTML, and CSS.
  • You can create a specialized processor for the task that is potentially more effective with the aid of a separate lexical analyzer.

Moreover, the following aspects make a separate lexical analyzer preferable to doing everything in the parser:

  • The design’s simplicity: By stripping out whitespace and comments early, it simplifies both the lexical analysis and the syntactic analysis processes.
  • To make compilers more effective: a separate, well-tuned lexer helps increase compiler efficiency.
  • Specialization: specialized techniques can be applied to enhance the lexical analysis process.
  • Portability: only the scanner needs to communicate with the outside world.
  • Greater portability: quirks relating to input devices are confined to the lexer.

Disadvantages of Lexical Analyzer

  • You must invest a lot of time reviewing the source code and splitting it into tokens.
  • Compared to PEG or EBNF rules, some regular expressions are relatively difficult to understand.
  • The lexer and its token descriptions require more work to build and test.
  • Lexer tables must be created, and tokens must be produced, adding more runtime overhead.

Conclusion

Lexical analysis feeds source code into a lexer, which transforms the input into tokens. Here, we discussed the lexical analyzer, its main roles, and its advantages and disadvantages. Furthermore, we explained lexical errors and how to recover from them. For more guidance on similar processes crucial for your secure software development, check out our other articles on the topic.

Control Flow Graph

A control-flow graph represents the flow of control between the blocks of a program or application during execution. Developed by Frances E. Allen, a control-flow graph is primarily used in static analysis, where it is responsible for demonstrating the flow inside a program unit. The following are a few characteristics of a control-flow graph that will help you understand it better.

  1. It is entirely process-oriented 
  2. It pinpoints all the possible paths that you can take during a program execution
  3. It comprises edges to show paths and nodes to show blocks

Just as a phoneme is the smallest unit of language capable of changing meaning, a basic block is the simplest unit of control flow in any program. It is formed when different operations are sequenced together so that they execute as one unit; if an operation falters or raises an issue, the block stops executing. A basic block is a branch-free code sequence that begins with a labeled operation. Control enters a block at its first operation and exits at a branch, jump, or similar operation, which is the block’s signature last operation. Two special blocks can be distinguished easily:

  1. Entry Block: Point of entry of control in the control flow graph.
  2. Exit Block: Where the control leaves the block. 

The control-flow graph is also denoted as “cfg,” which is quite common in coding and programming. As you know, a basic block only operates when the entire sequence of operations is executed together. All the operational blocks incorporated in a flow diagram, including the start node, end node, and the flow between the nodes, comprise a control-flow graph. A cfg has a node for every basic block, while for every possible transfer between these blocks, the cfg has an edge. Now that you know the relationship between a cfg and a block, let us look at how they help in coding.

Determining the Leader

A basic block is simply a collection of three-address code statements. The only way for control to enter the block is through its first line, termed the “leader.” There is no possibility of a jump or goto within the block, as control exits only through the last statement. All the statements between the first and the last are part of the basic block. This is how you can determine the leader of a basic block:

  • The first statement of the program is always a leader.
  • The target statement of a conditional or unconditional goto is also a leader.
  • The statement immediately following a goto statement is also a leader. (Take care not to confuse this with the line directly after the goto’s target label.)

So far, you have learned to determine the leader in many ways. However, one of the most common questions after determining all the leaders is what to do with the statements between the leaders. The following breakdown will help answer this question in a very simple manner. 

  1. The first statement is always a leader, call it L1 (for understanding purposes)
  2. The target of a conditional or unconditional goto is also a leader, L2
  3. The statement that comes immediately after a goto statement is also a leader, L3

All the statements between the L1 and L2 will be a part of the first basic block. Similarly, the statements between L2 and L3 will be a part of the second basic block, and so on.
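As a small worked sketch (the statements are made up), consider the following goto-style code; the leaders found by the rules above are marked in the comments:

s = 0;                     /* statement 1: first statement, so it is a leader          */
i = 0;                     /* statement 2                                              */
L1: if (i >= n) goto L2;   /* statement 3: target of the goto in statement 6, leader   */
s = s + i;                 /* statement 4: follows a conditional goto, leader          */
i = i + 1;                 /* statement 5                                              */
goto L1;                   /* statement 6                                              */
L2: return s;              /* statement 7: target of the goto in statement 3, leader   */

The leaders are statements 1, 3, 4, and 7, so the basic blocks are {1, 2}, {3}, {4, 5, 6}, and {7}, and the control-flow graph has one node per block with an edge for each possible transfer between them.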

Variations in Control Flow Graphs

There can be many variations in the control-flow graph based on the statements and loops in question. However, the principle at heart remains the same. Let us look at some of the different types of images and loops to help you better understand the concept of flow-graph.

If-else: The if-else statement is used when a single condition selects between two operations: one branch runs when the condition holds, and the other runs when it does not. If-else statements are useful when the block you are working on cannot be executed together with the alternative. Using this statement, you have the option of an otherwise case attached to the given condition. A simple example of the if-else statement and its function is given below.

In C language, if you want to determine whether the number is odd or even, you can simply use the following method:

#include <stdio.h>

int main() {
    int number = 0;
    printf("Enter a number: ");
    scanf("%d", &number);
    if (number % 2 == 0) {
        printf("%d is an even number", number);
    }
    else {
        printf("%d is an odd number", number);
    }
    return 0;
}

The respective output will be seen as:

Enter a number: 8

8 is an even number

Enter a number: 7

7 is an odd number 

The control flow graph of the if-else statement would be something like:

[Figure: control flow graph of the if-else program above]

As you can see, the execution process becomes relatively simpler with the if-else statement because it does not restrict the block to a singular option. Programmers prefer this ease of use and incorporate this in coding regularly. 

The control-flow graph works similarly for other loops and statements such as while, do-while, and so on.

Conclusion

Control flow graphs are used in static analysis to represent the flow of control between a program’s basic blocks during execution. Cfgs and basic blocks are helpful components in coding, and we’ve discussed their benefits, how they work, and some of their variations. We hope this helps you in your secure software development. For more advice and information, take a look at our other articles about the related analytical processes, such as lexical, data flow, and taint analysis.