A regular expression (regex) is a string of characters that defines a search pattern. In the security world, regular expressions play a crucial role as a security measure. They are used across multiple infrastructure layers of an organization to implement the following protections.
Malware detection
Regex rules can customize malware detectors so that they identify and isolate dangerous file content by searching for known patterns in files and databases. This is one of the most common uses of regex in security.
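As an illustration of signature-style detection, the sketch below checks raw file bytes against a small set of compiled patterns. The signature names and patterns are hypothetical examples chosen for this sketch. Real detectors such as YARA rules or antivirus engines rely on far larger, curated rule sets.

```python
import re

# Hypothetical signatures for illustration only; real detectors use
# large, curated rule sets maintained by security vendors.
SIGNATURES = {
    # PHP web-shell idiom: executing base64-decoded code
    "php-eval-base64": re.compile(rb"eval\s*\(\s*base64_decode\s*\(", re.IGNORECASE),
    # PowerShell launched with a long encoded command
    "powershell-encoded": re.compile(
        rb"powershell(?:\.exe)?\s+-enc(?:odedcommand)?\s+[A-Za-z0-9+/=]{20,}",
        re.IGNORECASE,
    ),
}


def scan(content: bytes) -> list[str]:
    """Return the names of all signatures that match the given bytes."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(content)]


if __name__ == "__main__":
    sample = b"cmd /c powershell -enc SQBFAFgAIAAoAE4AZQB3AC0ATwBiAGoAZQBjAHQA"
    print(scan(sample))  # ['powershell-encoded']
```

Patterns are compiled once and matched against bytes, not text, because file contents may not be valid text in any encoding.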
Validating user input
Accepting any user input opens an application to a wide array of vulnerabilities, such as SQL injection. Regex can validate this input against an expected format and reject anything that does not match, as one defense against possible attacks.
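A minimal allowlist-style sketch follows. It assumes a hypothetical rule that usernames are 3 to 20 ASCII letters, digits, or underscores. Using `fullmatch` means the entire input must fit the pattern, not just a prefix of it.

```python
import re

# Hypothetical rule for this sketch: 3-20 ASCII letters, digits, or underscores.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,20}")


def is_valid_username(value: str) -> bool:
    # fullmatch requires the whole string to match, so trailing junk is rejected.
    return USERNAME_RE.fullmatch(value) is not None


print(is_valid_username("alice_01"))          # True
print(is_valid_username("alice' OR '1'='1"))  # False
```

Validation like this complements parameterized queries, which remain the primary defense against SQL injection.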
Firewall rules
Regex can give system firewalls an added boost. It is used to create custom rules that block, for example, requests from specific IP addresses or from known malicious agents.
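The sketch below shows how such custom rules might be expressed as (field, pattern) pairs checked against each request. The rule set is hypothetical. 203.0.113.0/24 is a documentation-only address range used as a stand-in for a blocked network, and the user-agent pattern targets two well-known scanning tools.

```python
import re

# Hypothetical rule set: each rule pairs a request field with a pattern.
# Note: \d{1,3} also accepts values above 255; real rules should use CIDR checks.
BLOCK_RULES = [
    ("ip", re.compile(r"^203\.0\.113\.\d{1,3}$")),
    ("user_agent", re.compile(r"sqlmap|nikto", re.IGNORECASE)),
]


def is_blocked(request: dict[str, str]) -> bool:
    """Return True if any rule matches its corresponding request field."""
    return any(
        pattern.search(request.get(field, "")) for field, pattern in BLOCK_RULES
    )


print(is_blocked({"ip": "203.0.113.7", "user_agent": "Mozilla/5.0"}))  # True
print(is_blocked({"ip": "198.51.100.1", "user_agent": "sqlmap/1.7"}))  # True
```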
Many regex patterns are popular for their utility in matching and locating text. However, an incorrectly composed regex can introduce vulnerabilities that allow attackers to siphon sensitive information.
Some of the most basic tools that use regex include:
- sed: a stream editor that finds and replaces text. Given a pattern and the text to replace it with, it substitutes the matches
- grep: a filter tool that prints only the lines matching a regular expression, so a large amount of text can be reduced to just the wanted results
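The same two operations can be sketched in Python with the standard `re` module. Here `re.search` plays the role of grep's line filter and `re.sub` the role of sed's substitution. The log lines below are made-up sample data.

```python
import re

# Made-up sample log for illustration.
log = (
    "INFO user=alice login ok\n"
    "WARN user=bob password=hunter2 login failed\n"
    "INFO user=carol logout\n"
)

# grep-style: keep only matching lines, similar to: grep 'WARN'
warnings = [line for line in log.splitlines() if re.search(r"WARN", line)]

# sed-style: replace every match, similar to:
# sed 's/password=[^ ]*/password=****/g'
redacted = re.sub(r"password=[^ ]*", "password=****", log)

print(warnings)
print(redacted)
```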
Regex quantifiers specify how many instances of the preceding character, group, or character class must appear in a string for a match. For example, the {n,m} quantifier matches the preceding element between n and m times, where n and m are integers, while *, + and ? are shorthand for {0,}, {1,} and {0,1}.
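A few quick checks make the quantifier rules concrete. The small `matches` helper is introduced only for this sketch and tests whether the whole string matches the pattern.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


# {n,m}: the preceding element must appear between n and m times.
print(matches(r"\d{2,4}", "123"))     # True
print(matches(r"\d{2,4}", "12345"))   # False
# *, + and ? are shorthand for {0,}, {1,} and {0,1}.
print(matches(r"ab*c", "ac"))         # True  (zero b's allowed)
print(matches(r"ab+c", "ac"))         # False (at least one b needed)
print(matches(r"colou?r", "color"))   # True
# After parentheses, the quantifier applies to the whole group.
print(matches(r"(ha){3}", "hahaha"))  # True
```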
Open redirect filter bypass
Vulnerabilities caused by a faulty regex also include open redirect filter bypass. In this example, the app prevents open redirects by requiring the redirect URL to meet specific criteria:
- The URL must contain “sample.com/”
- The URL must end in an image extension, such as .jpg or .png
A regex that is too permissive, such as one relying on the “.*” pattern, allows too much room in user input, because “.*” matches any sequence of characters. Such an open filter exposes the system to vulnerabilities. This can be overcome with a stricter pattern that leaves less room for arbitrary input.
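To show the difference, here is a sketch with two hypothetical patterns for the criteria above: a permissive one built around “.*” and a stricter one that must match the entire URL from the trusted origin onward. Neither is presented as the exact filter discussed here.

```python
import re

# Hypothetical permissive filter: ".*" lets anything surround "sample.com/".
PERMISSIVE = re.compile(r".*sample\.com/.*\.(?:jpg|png)")

# Hypothetical stricter filter: fixed origin, restricted path characters,
# and an image extension at the very end.
STRICT = re.compile(r"https://sample\.com/[A-Za-z0-9_/-]+\.(?:jpg|png)")


def permissive_ok(url: str) -> bool:
    return PERMISSIVE.search(url) is not None


def strict_ok(url: str) -> bool:
    # fullmatch anchors the pattern at both ends of the string.
    return STRICT.fullmatch(url) is not None


attack = "https://evil.example/sample.com/logo.png"
print(permissive_ok(attack))                            # True: bypassed
print(strict_ok(attack))                                # False
print(strict_ok("https://sample.com/images/logo.png"))  # True
```

In practice, parsing the URL (for example with `urllib.parse.urlsplit`) and comparing the host exactly is often more robust than a regex alone.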
Matching Email Addresses
The rules for valid email addresses are quite extensive, which makes regexes for them harder to write. No regex validates every possible email address perfectly, but a simple pattern can get you on your way. Such a pattern is only where the expression starts: it can then be customized, and it has to be tested against lists of valid and invalid email addresses.
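A deliberately simple starting point might look like the sketch below. The pattern is an assumption for illustration, not a complete RFC 5322 validator; for example, it still accepts consecutive dots in the local part. The loop shows the recommended habit of testing against both valid and invalid samples.

```python
import re

# Simple starting point (not RFC 5322 complete): local part, "@",
# dot-separated domain labels, and a top-level domain of 2+ letters.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def looks_like_email(value: str) -> bool:
    return EMAIL_RE.fullmatch(value) is not None


# Test the pattern against lists of valid and invalid examples.
valid = ["alice@example.com", "bob.smith+tag@mail.example.co.uk"]
invalid = ["alice@", "@example.com", "alice@example", "alice example@example.com"]
for address in valid + invalid:
    print(address, looks_like_email(address))
```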
A general regex to match URLs can be built by following the standard format of a URL. Such an expression matches components including:
- Query strings (preceded by ?)
- Fragments (preceded by #)
This is not a foolproof expression by any means. It is meant to act as a starting point on which further modifications can be made as needed.
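One possible starting point is sketched below, with named groups for each URL component, including the query string and fragment. The pattern is a hypothetical illustration that only accepts http and https URLs. It does not handle details such as user info or IPv6 hosts.

```python
import re

# Hypothetical starting-point URL pattern with a named group per component.
URL_RE = re.compile(
    r"(?P<scheme>https?)://"
    r"(?P<host>[A-Za-z0-9.-]+)"
    r"(?::(?P<port>\d{1,5}))?"
    r"(?P<path>/[^?#]*)?"
    r"(?:\?(?P<query>[^#]*))?"   # query string, preceded by ?
    r"(?:#(?P<fragment>.*))?"    # fragment, preceded by #
)

m = URL_RE.fullmatch("https://example.com:8080/docs/page?lang=en&v=2#intro")
if m:
    print(m.group("host"), m.group("query"), m.group("fragment"))
    # example.com lang=en&v=2 intro
```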
Regex can also enforce password criteria, including the use of special characters. To enforce the presence of certain characters, such as a number, a positive lookahead (?=…) is used. An expression built from several lookaheads can ensure that the password contains:
- At least one uppercase letter [A-Z]
- At least one lowercase letter [a-z]
- At least one number [0-9]
- At least one special character
- At least 8 total characters
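The five requirements above can be combined into a single pattern, one lookahead per rule. This sketch assumes "special character" means any character that is not an ASCII letter or digit, so a space also counts.

```python
import re

# Each (?=...) lookahead checks one requirement without consuming input;
# the final .{8,} enforces the minimum length.
PASSWORD_RE = re.compile(
    r"(?=.*[A-Z])"          # at least one uppercase letter
    r"(?=.*[a-z])"          # at least one lowercase letter
    r"(?=.*[0-9])"          # at least one digit
    r"(?=.*[^A-Za-z0-9])"   # at least one special (non-alphanumeric) character
    r".{8,}"                # at least 8 characters in total
)


def is_strong(password: str) -> bool:
    return PASSWORD_RE.fullmatch(password) is not None


print(is_strong("Passw0rd!"))  # True
print(is_strong("password1"))  # False: no uppercase or special character
```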
SSRF blacklist protection
Server-side request forgery (SSRF) occurs when an attacker can make the server send requests on the attacker's behalf. In effect, the attacker forges requests that appear to come from the server itself. This privileged position can give access to a secure internal network, bypassing firewalls that would block the attacker's own requests.
For example, a legitimate image request passes the image's URL in an img parameter. The website tries to prevent SSRF by rejecting img parameters whose URLs match a blacklist, using a corresponding regex.
This pattern is intended to check all user input against a blacklist of local IP addresses, and requests are denied if a match is found. However, this is not a foolproof method. Another address to consider is “0.0.0.0”, which can also be used to reach the local machine. If the blacklist does not cover it, the protection can be bypassed with a request that uses that address instead.
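The sketch below reproduces the bypass with a hypothetical blacklist pattern covering 127.0.0.1 and localhost. It then shows a sturdier alternative that uses Python's `ipaddress` module, in place of a regex, to reject loopback, private, link-local, and unspecified addresses. It remains a sketch: hostnames still need to be resolved and the resulting addresses re-checked before a request is made.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Hypothetical blacklist pattern covering only 127.0.0.1 and localhost.
BLACKLIST_RE = re.compile(
    r"^https?://(?:127\.0\.0\.1|localhost)(?:[:/]|$)", re.IGNORECASE
)


def blacklist_allows(url: str) -> bool:
    return BLACKLIST_RE.search(url) is None


print(blacklist_allows("http://127.0.0.1/admin"))  # False: blocked
print(blacklist_allows("http://0.0.0.0/admin"))    # True: bypass


def host_is_safe(url: str) -> bool:
    """Reject literal IPs that are loopback, private, link-local, or unspecified."""
    host = urlsplit(url).hostname
    if host is None or host.lower() == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP: the hostname must still be resolved and re-checked.
        return True
    return not (ip.is_loopback or ip.is_private or ip.is_link_local or ip.is_unspecified)


print(host_is_safe("http://0.0.0.0/admin"))  # False
```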
Limitations of Regular Expression Testers
As with every other process, regex has its limitations. Software security testing, which locates vulnerabilities and guards against them, is an essential aspect of development; however, it should not be done with limitless expectations of the tools in use.
Testing software for potential vulnerabilities focuses on identifying hidden errors. While white-box testing of this kind lets you uncover many errors, it is limited to the errors it is designed to look for. The approach is never 100% foolproof, as it gives no information about errors that remain undiscovered.
Cannot Guarantee Functionality
Additionally, while software testing is intended to improve the final functionality of the product for the end user, it is never a guarantee. It can identify improper functioning, but it should by no means be taken to mean that all possible errors have been identified and resolved. Even exhaustive, rigorous testing cannot predict with absolute certainty how the product will function.
Clash of Resources
The clash of resources in software testing adds another major limitation to its role and effectiveness. Thorough testing requires an adequate time frame and attentive planning to achieve its goal. However, budget priorities make this difficult, and the usual compromise is a more hurried testing plan that stays within budget limits. Quality goals that would considerably improve functionality require time and effort that such constraints cannot provide.
Incomplete Testing Protocol
Because testing can only be conducted against the system requirements, not beyond them, the testing protocol is inherently incomplete. Errors outside those requirements cannot be detected, and placing unrealistic expectations on testing tools wastes both time and effort. Meanwhile, the testing process remains inadequate to fulfill the security demands of the project.
In the same vein, the time, budget and effort needed are often miscalculated relative to what the project and its testing protocols require. This ends up costing the project, as the tool cannot be used to its full potential, nor can rigorous testing be carried out. At the same time, projects often depend too heavily on testing tools, treating them as a guarantee of secure and sufficient testing. This leaves potential vulnerabilities in the system that are overlooked and never tested for.
Impossible to Test Every Aspect
The greatest limitation of any test protocol is that it can never be exhaustive. Every last cent of the budget and every hour of the schedule could be used up without ever running out of testing scenarios. It is not possible to test every path, or every valid and invalid input, so the system always remains potentially open to vulnerabilities. Even if the very last bug were found and resolved, there would be no way to know or confirm it, leaving further progress dependent on assumptions and incomplete testing.
Without absolute proof, no claim can be made that the product is correct, because it is not possible to test every formal specification and confirm that it holds. As a result, testing remains limited by the constraints of the testing tools and by the testers' ability to design sufficient testing scenarios.
A regular expression is a string of characters that defines a search pattern. Regular expressions are extremely helpful for security because they can be used across multiple infrastructure layers of a company. Here, we’ve explained how they work, what they do, and what their limitations are. For more guidance, be sure to check out our other articles on secure software development.