Continuous integration

CI/CD integration

In this chapter, we will walk you through the process of integrating Semgrep into your GitHub repository as part of your continuous integration (CI) and continuous deployment (CD) pipeline.

We recommend integrating Semgrep with GitHub Actions using the following approach:

  1. Schedule a full Semgrep scan of the main branch with a broad set of Semgrep rules (e.g., p/default).
  2. Run diff-aware scans on pull requests, using a fine-tuned set of rules that yields high-confidence, true-positive results.
  3. Once your Semgrep implementation is mature, configure Semgrep to block the PR pipeline when there are unresolved Semgrep findings.
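One way to phase in step 3 is to let the PR scan report findings without failing the job at first, and to remove that allowance once the rules are tuned. The sketch below assumes GitHub Actions' step-level continue-on-error key and the returntocorp/semgrep container image used later in this chapter; the workflow name and the ruleset are placeholders. Note that a failing job blocks merging only if its check is marked as required in the repository's branch protection settings.

```yaml
# Hypothetical workflow for the rollout phase of the PR scan.
name: Semgrep PR scan (rollout)
on:
  pull_request: {}
jobs:
  semgrep-pr:
    runs-on: ubuntu-latest
    container:
      image: returntocorp/semgrep
    steps:
      - uses: actions/checkout@v4
      - run: semgrep ci
        # Report findings without failing the job while the rules are being tuned.
        # Remove this line once the setup is mature so that findings fail the job.
        continue-on-error: true
        env:
          SEMGREP_RULES: p/default
```

Removing continue-on-error is the only change needed to switch from reporting to blocking, which keeps the rollout easy to review.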

Understanding Semgrep CI configuration options

Familiarize yourself with the available environment variables and their default values by reviewing the Configuration reference. The following are key points to note:

  • Semgrep checks for new versions by default, as controlled by the SEMGREP_ENABLE_VERSION_CHECK variable.
  • By default, Semgrep sets a five-minute timeout for each individual Git command that Semgrep runs (SEMGREP_GIT_COMMAND_TIMEOUT).
  • By default, Semgrep spends at most 30 seconds running each rule on each file (SEMGREP_TIMEOUT) and skips a file after three rules have timed out on it (--timeout-threshold).
  • The SEMGREP_RULES environment variable defines the rules used by Semgrep. You can specify multiple rule sources by separating them with a space.
  • By default, the CI process fails if findings are detected but passes if internal errors occur. For more information, see Passing or failing the CI job.
  • See the example job that uploads findings to GitHub Advanced Security Dashboard.
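As a sketch of where these options go in a workflow, the step below sets several of the variables and flags discussed above. It assumes that your Semgrep version accepts --timeout-threshold and --no-suppress-errors for semgrep ci (check semgrep ci --help); the timeout values restate the defaults described above, and the rulesets are placeholders.

```yaml
# Hypothetical step fragment; place it under a job's `steps:` key.
# --timeout-threshold 3: skip a file after three rules time out on it.
# --no-suppress-errors: fail the job on internal Semgrep errors, not only on findings.
- run: semgrep ci --timeout-threshold 3 --no-suppress-errors
  env:
    # Multiple rule sources, separated by spaces.
    SEMGREP_RULES: p/default p/javascript
    # Maximum time in seconds for running a single rule on a single file.
    SEMGREP_TIMEOUT: "30"
```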

Adding custom Semgrep rules to CI/CD

To use your own custom rules in addition to the standard rulesets (such as p/default or p/javascript) passed in the SEMGREP_RULES variable, follow the steps below:

  1. If your custom Semgrep rules directory is in the same repository as the scanned code, pass the directory path in the SEMGREP_RULES variable (e.g., SEMGREP_RULES: p/default custom-semgrep-rules-dir/).

  2. If your custom Semgrep rules are in another private repository, do the following:

    a. Generate an access token for the repository with Semgrep rules. Grant only the minimum scopes necessary (e.g., a fine-grained token for that repository with read-only access to its contents).

    b. Add the generated access token as a secret (e.g., SEMGREP_RULES_TOKEN) to the repository where the workflow runs.

    c. Add a second actions/checkout step to the job, after the main source code checkout, with:

    • The repository name
    • Personal access token (PAT) used to fetch the repository
    • Relative path to place the repository

    d. Pass the path to the directory with custom Semgrep rules in the SEMGREP_RULES environment variable.

    If your repository with custom rules is publicly available, skip the steps that create and store the PAT (a and b) and do not pass a token in the checkout step.

    For example:

    # Set up an environment variable containing the name of the private repository with custom Semgrep rules
    env:
      SEMGREP_PRIVATE_RULES_REPO: semgrep-private-rules
    steps:
      # Main checkout of the repository source code
      - name: Checkout main repository
        uses: actions/checkout@v4
      # Checkout of the repository with custom Semgrep rules
      - name: Checkout private custom Semgrep rules
        uses: actions/checkout@v4
        with:
          repository: ${{ github.repository_owner }}/${{ env.SEMGREP_PRIVATE_RULES_REPO }} # organization-name/semgrep-private-rules
          token: ${{ secrets.SEMGREP_RULES_TOKEN }} # Configured PAT
          path: ${{ env.SEMGREP_PRIVATE_RULES_REPO }} # Relative path to place the repository
      # ...
      - run: semgrep ci
        env:
          # Pass the directory with the checked-out Semgrep rules repository
          SEMGREP_RULES: ${{ env.SEMGREP_PRIVATE_RULES_REPO }}
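For the same-repository case in step 1, a single checkout is enough because the rules directory is part of the checked-out source. A minimal sketch, assuming the placeholder directory name custom-semgrep-rules-dir/ from step 1 sits at the repository root:

```yaml
# Hypothetical job fragment: custom rules stored in the scanned repository.
steps:
  # One checkout fetches both the source code and the custom rules directory.
  - name: Checkout main repository
    uses: actions/checkout@v4
  - run: semgrep ci
    env:
      # Registry ruleset plus the local rules directory, separated by a space.
      SEMGREP_RULES: p/default custom-semgrep-rules-dir/
```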
    

GitHub integration steps

Follow these steps to integrate Semgrep with your GitHub repository:

  1. Create a semgrep.yml file in the .github/workflows directory of the repository you want to scan.
  2. Copy the code snippet below into the semgrep.yml file. This workflow defines two jobs:
    • The first job:
      • Runs on a schedule (once per month).
      • Runs when a pull request is merged into the main/master branch.
      • Runs when there is a direct push to the main/master branch.
      • Uses the broad p/default Semgrep ruleset.
    • The second job:
      • Runs only for pull requests.
      • Uses multiple security-related rulesets.
# Define the name of this GitHub Actions workflow.
name: Semgrep
on:
  # Run the workflow on pull_request events for diff-aware scanning.
  pull_request: {}
  # Run the workflow on push events to mainline branches to report all findings.
  push:
    branches: ["master", "main"]
  # Schedule the workflow to run periodically using cron syntax.
  schedule:
    - cron: '0 0 1 * *' # Schedule Semgrep to run once per month (at 00:00 on day-of-month 1).
# Define the jobs that run as part of this workflow.
jobs:
  # Define the first job for scheduled scanning and mainline branch scanning.
  semgrep-schedule:
    # Define the conditions for running this job. Run on schedule, push to master/main, or merged PR.
    # Skip runs triggered by Dependabot to avoid permission issues.
    if: ((github.event_name == 'schedule' || github.event_name == 'push' || github.event.pull_request.merged == true)
        && github.actor != 'dependabot[bot]')
    # Name this GitHub Actions job.
    name: Semgrep default scan
    # Define the environment in which the job runs.
    runs-on: ubuntu-latest
    container:
      # Use a Docker image with Semgrep pre-installed.
      image: returntocorp/semgrep
    # Set up an env variable - the name of the (private) repository with custom Semgrep rules
    # env:
      # SEMGREP_PRIVATE_RULES_REPO: semgrep-private-rules
    steps:
      # Use the GitHub Actions Checkout step to fetch the project source code.
      - name: Checkout main repository
        uses: actions/checkout@v4
      # In case you have a (private) repository with custom Semgrep rules:
      # - name: Checkout custom Semgrep rules
      #   uses: actions/checkout@v4
      #   with:
      #     repository: ${{ github.repository_owner }}/${{ env.SEMGREP_PRIVATE_RULES_REPO }}
      #     token: ${{ secrets.SEMGREP_RULES_TOKEN }} # If the repository is private
      #     path: ${{ env.SEMGREP_PRIVATE_RULES_REPO }}
      # Execute the "semgrep ci" command within the Semgrep Docker container.
      - run: semgrep ci
        env:
          # Set the SEMGREP_RULES environment variable to specify which rules Semgrep should use.
          # Use common security-related rulesets for this job (starting with `p/`)
          # or use a directory with your custom rules from the current repository (such as `semgrep-rules/`).
          # To add more rule sources, list them on separate lines directly below p/default,
          # using the same indentation as p/default. For example:
          #   semgrep-rules/ (a directory in the current repository with your custom rules)
          #   ${{ env.SEMGREP_PRIVATE_RULES_REPO }} (the checked-out custom Semgrep rules repository)
          SEMGREP_RULES: >
            p/default
  # Define the second job for scanning pull requests.
  semgrep-pr:
    # Define the conditions for running this job. Run only within pull requests, excluding Dependabot PRs.
    if: (github.event_name == 'pull_request' && github.actor != 'dependabot[bot]')
    # Name this GitHub Actions job.
    name: Semgrep PR scan
    # Define the environment in which the job runs.
    runs-on: ubuntu-latest
    container:
      # Use a Docker image with Semgrep pre-installed.
      image: returntocorp/semgrep
    steps:
      # Fetch project source with GitHub Actions Checkout.
      - uses: actions/checkout@v4
      # Execute the "semgrep ci" command within the Semgrep Docker container.
      - run: semgrep ci
        env:
          # Set the SEMGREP_RULES environment variable to specify which rules Semgrep should use.
          # Use common security-related rulesets for this job (starting with `p/`)
          # or use a directory with your custom rules from the current repository (such as `semgrep-rules/`).
          # To add a custom rules directory, list it on a separate line below the last ruleset,
          # using the same indentation (for example, semgrep-rules/).
          SEMGREP_RULES: >
            p/cwe-top-25
            p/owasp-top-ten
            p/r2c-security-audit
            p/javascript
            p/trailofbits

This configuration ensures that your codebase is scanned regularly for potential issues and that new code introduced through pull requests is thoroughly checked for security vulnerabilities.
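The configuration section mentions uploading findings to the GitHub Advanced Security dashboard. A minimal sketch of such a job follows. It assumes the github/codeql-action/upload-sarif action (v3) and Semgrep's --sarif output flag; the job name semgrep-sarif is hypothetical, and the job would go under jobs: in the semgrep.yml workflow above.

```yaml
# Hypothetical job; add it under `jobs:` in semgrep.yml (indented accordingly).
semgrep-sarif:
  # Report all findings on scheduled runs and pushes to mainline branches.
  if: (github.event_name == 'schedule' || github.event_name == 'push')
  name: Semgrep SARIF upload
  runs-on: ubuntu-latest
  permissions:
    contents: read          # Check out the source code.
    security-events: write  # Upload results to GitHub code scanning.
    actions: read           # Typically needed only in private repositories.
  container:
    image: returntocorp/semgrep
  steps:
    - uses: actions/checkout@v4
    # Write findings in SARIF format to a file; the step still fails if there are findings.
    - run: semgrep ci --sarif > semgrep.sarif
      env:
        SEMGREP_RULES: p/default
    # Upload the SARIF file even when the scan step failed because of findings.
    - name: Upload SARIF file
      uses: github/codeql-action/upload-sarif@v3
      if: always()
      with:
        sarif_file: semgrep.sarif
```

The if: always() condition matters because a scan with findings exits with a failure, which would otherwise skip the upload step.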

This content is licensed under a Creative Commons Attribution 4.0 International license.