Semgrep - More Than Just a Glorified Grep
Written by Nikos Vaggalis   
Tuesday, 26 May 2020

Introducing a tool to search through code for flaws where plain regexes fall flat and using Static Application Security Testing would be overkill.

Semgrep proclaims itself as:

"a tool for easily detecting and preventing bugs and anti-patterns in your codebase. It combines the convenience of grep with the correctness of syntactical and semantic search".

It isn't just a glorified grep, though. It occupies a space somewhere in between grep and a SAST tool - more expressive than grep, but not as hard to tweak and learn as a SAST.

A good example of what it can do beyond simple grepping for a pattern is looking for a file handle that is opened but never closed. That is, I want to know that after a $FILE = open(...), somewhere later in the flow of the code there's also a $FILE.close().

In this case grep would fail, partly because the match has to span multiple lines, where regular expressions will only take you so far, but also because grep can't work in the broader context and follow the flow of the code.
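To make the limitation concrete, here is a minimal Python sketch of a grep-style, line-by-line search. The two snippets, the regex and the helper name grep_open are all made up for illustration. The point is that the search finds the open(...) line in both snippets and has no way of telling which one goes on to close the file.

```python
import re

# Two hypothetical snippets: one leaks the handle, the other closes it.
LEAKY = """\
f = open("data.txt")
data = f.read()
"""

CLOSED = """\
f = open("data.txt")
data = f.read()
f.close()
"""

# A grep-style pattern for a line of the form `name = open(...)`.
OPEN_RE = re.compile(r"^\s*(\w+)\s*=\s*open\(.*\)\s*$")


def grep_open(source):
    """Return the variable names assigned from open(...), one per matching line."""
    return [m.group(1) for line in source.splitlines()
            if (m := OPEN_RE.match(line))]


# Both snippets produce the same hit, so the leak is indistinguishable.
print(grep_open(LEAKY))
print(grep_open(CLOSED))
```

A line-oriented match sees each line in isolation, so the presence or absence of a later close() never enters the picture.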

The simplest, but still extremely powerful, construct that Semgrep offers in this case is the ellipsis operator. With a rule written in YAML like the following, it can look for the missing $FILE.close() call:

rules:
  - id: open-never-closed
    patterns:
      - pattern: $FILE = open(...)
      - pattern-not-inside: |
          $FILE = open(...)
          ...
          $FILE.close()
    message: "file object opened without corresponding close"
    languages: [python]
    severity: ERROR

This rule looks for files that are opened but never closed. It accomplishes this by looking for the open(...) pattern and not a following close() pattern.

The $FILE metavariable ensures that the same variable name is used in the open and close calls. The ellipsis operator allows for any arguments to be passed to open and any sequence of code statements in-between the open and close calls.

We don't care how open is called or what happens up to a close call, we just need to make sure close is called.
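The logic of the rule can be approximated in a few lines of Python using the standard ast module. This is only a rough sketch of the idea, not how Semgrep is implemented. The helper names and the sample functions are invented for illustration, and it takes shortcuts such as comparing line numbers instead of following the real control flow.

```python
import ast


def is_open_assignment(node):
    """True for statements shaped like `NAME = open(...)`."""
    return (isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
            and isinstance(node.value, ast.Call)
            and isinstance(node.value.func, ast.Name)
            and node.value.func.id == "open")


def closed_name(node):
    """Return NAME for calls shaped like `NAME.close()`, otherwise None."""
    if (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "close"
            and isinstance(node.func.value, ast.Name)):
        return node.func.value.id
    return None


def open_never_closed(source):
    """List (function, variable, line) for opens with no later close on that name."""
    findings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        nodes = list(ast.walk(func))
        closes = [(closed_name(n), n.lineno) for n in nodes if closed_name(n)]
        for n in nodes:
            if not is_open_assignment(n):
                continue
            name = n.targets[0].id
            # Same variable name, closed on a later line: treat as properly closed.
            if not any(c == name and line > n.lineno for c, line in closes):
                findings.append((func.name, name, n.lineno))
    return findings


# Hypothetical code under test: only `leaky` forgets to close its file.
SAMPLE = '''
def leaky(path):
    f = open(path)
    return f.read()

def tidy(path):
    f = open(path)
    data = f.read()
    f.close()
    return data
'''

print(open_never_closed(SAMPLE))  # reports only the open() inside leaky
```

Even this toy version has to parse the code and reason about names and ordering, which is what the metavariable and the ellipsis give you for free in a Semgrep rule.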

Another example provided is looking for lines with a call to setcookie() while catering for all instances of the function, since it can accept a variable number of arguments.

Semgrep's rules can be as simple as $X == $X, which looks for a false equality, where both sides of the comparison are the same expression and the coder actually meant to compare against something else, but they can also be more complex, like the example already gone through.
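To get a feel for what a $X == $X rule catches, here is a small Python sketch that does something similar with the standard ast module. It is an approximation for illustration only, not Semgrep's own matching engine. The function name self_comparisons and the sample code are hypothetical, and ast.unparse needs Python 3.9 or later.

```python
import ast


def self_comparisons(source):
    """Return (line, expression) for every `X == X` comparison in the source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Compare)
                and len(node.ops) == 1
                and isinstance(node.ops[0], ast.Eq)
                # Operands are compared by structure, so spacing does not matter.
                and ast.dump(node.left) == ast.dump(node.comparators[0])):
            hits.append((node.lineno, ast.unparse(node)))
    return hits


# Hypothetical buggy code: the first comparison is always true.
SAMPLE = '''
def same_node(a, b):
    if a.id == a.id:
        return True
    return a.name == b.name
'''

print(self_comparisons(SAMPLE))
```

Because the comparison is made on the parsed expression rather than on text, the same check covers any expression on either side of the ==, which is the role the $X metavariable plays in the Semgrep pattern.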

What's even better is that there's a whole rule registry where you can find all sorts of rules to check your code against in the supported languages of Python, JavaScript, Go and Java.

Example rules on Java:

Detected a potential path traversal. A malicious actor could control the location of this file, to include going backwards in the directory with '../'. To address this, ensure that user-controlled variables in file paths are sanitized. You may also consider using a utility method such as:
to only retrieve the file name from the path.

Rule Pattern:

- pattern-either:
  - pattern: |
      $RETURNTYPE $FUNC (..., @PathParam(...) $TYPE $VAR, ...) {
        new File(..., $VAR, ...);
  - pattern: |-
      $RETURNTYPE $FUNC (..., $TYPE $VAR, ...) {
        new File(..., $VAR, ...);

On JavaScript:

User controlled data in 'yaml.load()' function can result in Remote Code Injection.

Rule Pattern:

- pattern-inside: |
    var $X = require('js-yaml');
- pattern: |

and so on.

The great thing with such a collaborative registry is that you can leverage the expertise of people writing rules in their domain of knowledge and submitting them to the registry for others to import and reuse. Of course, the rules are customizable too.

Apart from writing rules for finding security bugs, Semgrep can also be used to enforce code-specific patterns and best practices, and to scan PRs for vulnerabilities.

At the HELLA Security conference, Drew Dennison of r2c, the maintainer of the tool, demonstrated its power by running it live against Apache's Libcloud GitHub repo with the pattern $X == $X, which found an actual bug in the codebase! He then had to open a PR to notify the maintainers of the repo.

You can do it yourself too and scan your GitHub repos through Semgrep's Live Editor and its Scan option. In there you can also play with examples and rules to get a feel for it.

I actually ran a scan of my own Android repo against Java rule sets including java.lang.correctness. After a few nail-biting moments the results came out clean. What a relief!

General language rules aside, there are also domain-specific rules you can use, such as those under java.spring. This means that when an expert in that domain writes a rule and submits it to the registry, you can take advantage of it right away.

The tool is distributed as binaries for macOS and as a script for Ubuntu. For all others there's also a Docker image.

So open source, better than grep and simpler than a SAST and with a much nicer price tag - free!


More Information

Semgrep on GitHub

Live editor

Rule registry

Apache LibCloud


Related Articles

EU Bug Bounty - Software Security as a Civil Right
Exposing The Most Frequent Mistakes In Programming



Last Updated ( Tuesday, 26 May 2020 )