Katherine Lim
- Apr 30, 2023

How to find leaked secrets in Git repositories

Cybersecurity matters more than ever as cyber threats become more frequent and sophisticated. With more businesses and individuals relying on the internet for their day-to-day operations, cyber-attacks can result in significant financial loss, reputational damage, and legal liability. It is therefore essential for organizations to implement robust cybersecurity measures to protect their assets and information.

Those responsible for information technology and security, such as CISOs, CTOs, and engineers, must be vigilant in identifying potential vulnerabilities and implementing measures to prevent cyber-attacks. One critical aspect of cybersecurity is secrets management, which involves managing critical components such as passwords, API keys, and certificates to secure an organization’s applications, infrastructure, and data. Failure to manage secrets effectively can result in unauthorized access, exposing organizations to potential cyber-attacks.

A common mistake that engineers make is storing credentials such as API keys and passwords in source code control. This error can expose sensitive data to malicious actors, as seen in Uber’s data breach, where hackers stole data on 57 million Uber users and licence details of 600,000 drivers after an AWS API key was inadvertently checked into a GitHub repository. Uber paid the hackers a $100,000 ransom, and a further $148 million to settle civil litigation over its failure to disclose the breach.

To prevent such incidents, it is crucial to implement a robust secrets management strategy that ensures the confidentiality, integrity, and availability of secrets throughout their lifecycle. In the following sections, we will discuss the importance of secrets management in more detail and provide insights into best practices for implementing a secrets management strategy.

The Problem

Engineers need credentials to automate their work with applications, infrastructure, and data. Secrets such as passwords should be stored in secrets management systems, where they can be retrieved securely.

The problem occurs when credentials, especially passwords, are written into source code files. When that source code is committed to a local source code control system (SCCS) such as Git and then pushed to a remote SCCS such as GitHub, the secrets can be inadvertently leaked.
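As a minimal sketch of the alternative, the snippet below looks a credential up at run time instead of embedding it in the source file. The helper `load_secret` and the variable name `PAYMENTS_API_KEY` are hypothetical; it assumes the value has been injected into the process environment by whichever secrets management system is in use.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when an expected secret is not present in the environment."""


def load_secret(name, environ=None):
    """Return the secret stored under `name`, or fail loudly if it is absent.

    `environ` defaults to os.environ; passing a dict makes the function testable.
    """
    env = os.environ if environ is None else environ
    value = env.get(name, "")
    if not value:
        raise MissingSecretError(f"secret {name!r} is not set")
    return value


# Anti-pattern: API_KEY = "<literal secret>" written in a file that gets committed.
# Preferred: the value lives outside the repository and is looked up at run time.
# A dict stands in for the real environment here (hypothetical name and value).
api_key = load_secret("PAYMENTS_API_KEY", {"PAYMENTS_API_KEY": "example-value"})
```

Failing loudly when the variable is missing avoids the temptation to fall back to a hard-coded default.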

The Solution

To find credentials in source code, there are three steps:

1. Choose a scanning solution.
2. Scan source code before committing it to source code control.
3. Scan source code already in source code control. Credentials may already have been leaked even if they were later deleted, because source code control retains the secrets in the commit history.
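The third step matters because a later commit that deletes a secret does not erase it. The sketch below illustrates this under simplifying assumptions: it takes patch text of the kind produced by `git log -p --all` and reports key-like strings on both added and removed lines. The single pattern (an AWS access key ID as `AKIA` plus 16 characters) is a deliberately simplified assumption; the scanners discussed in this article use far more detectors. The key in the sample is the placeholder value used in AWS documentation.

```python
import re

# Assumed, simplified pattern: "AKIA" followed by 16 upper-case letters or digits.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def secrets_in_patch(patch_text):
    """Yield (commit, sign, match) for key-like strings in `git log -p` style text.

    Removed ("-") lines are checked as well as added ("+") lines, because a
    removed line still records the secret in the repository history.
    """
    commit = None
    for line in patch_text.splitlines():
        if line.startswith("commit "):
            commit = line.split()[1]
        elif line.startswith(("+++", "---")):
            continue  # file headers, not file content (a simplification)
        elif line.startswith(("+", "-")):
            for match in AWS_KEY_ID.findall(line):
                yield commit, line[0], match


# Hypothetical history: the key is added in one commit and "removed" in the next.
SAMPLE = """\
commit 1111111
+aws_key = "AKIAIOSFODNN7EXAMPLE"
commit 2222222
-aws_key = "AKIAIOSFODNN7EXAMPLE"
+aws_key = os.environ["AWS_ACCESS_KEY_ID"]
"""

for commit, sign, key in secrets_in_patch(SAMPLE):
    print(commit, sign, key)
```

Both commits are reported, which is why deleting the line is not enough and the credential still needs to be revoked.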

Remediating credentials found in source code is not covered in this article, but options worth investigating include BFG Repo-Cleaner and git filter-branch.

GitHub already scans public repositories for credentials issued by a select number of service providers, e.g., Amazon Web Services (AWS), Atlassian, Azure, and Google Cloud. When a secret is detected, the service provider is notified and will typically revoke the secret, and the owner(s) of the repository will be alerted. For enterprise users of GitHub, secret scanning is an option that comes with a GitHub Advanced Security (GHAS) license.

Organisations can also select from a range of other products to scan for credentials, for example:

Git-secrets from AWS Labs
Repo-supervisor from Auth0
TruffleHog from Truffle Security

Git-secrets

Git-secrets primarily scans for AWS keys. Other providers can be added, but the only provider supported out of the box is AWS. It provides pre-commit hooks for Git and runs inside Git, so it prevents developers and engineers from committing secrets to a Git repository.
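To illustrate the pre-commit hook mechanism in general terms, here is a minimal sketch in Python. It is not how git-secrets itself is implemented; the rule set is a single simplified pattern chosen as an assumption, and the function names are hypothetical. The idea is only that the hook inspects the staged content and returns a non-zero exit status, which makes Git abort the commit.

```python
import re
import subprocess
import sys

# Assumed, simplified rule; a real tool ships and maintains its own patterns.
PATTERNS = {"aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b")}


def find_violations(staged):
    """Return sorted (path, line_number, rule) tuples for matches in staged text.

    `staged` maps a file path to the content that is about to be committed.
    """
    hits = []
    for path, text in staged.items():
        for number, line in enumerate(text.splitlines(), start=1):
            for rule, pattern in PATTERNS.items():
                if pattern.search(line):
                    hits.append((path, number, rule))
    return sorted(hits)


def staged_files():
    """Read the staged version of each added, copied or modified file."""
    names = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    return {
        name: subprocess.run(
            ["git", "show", f":{name}"],
            capture_output=True, text=True, errors="replace", check=True,
        ).stdout
        for name in names
    }


def main():
    violations = find_violations(staged_files())
    for path, number, rule in violations:
        print(f"{path}:{number}: possible {rule}", file=sys.stderr)
    return 1 if violations else 0  # non-zero exit status blocks the commit


# Saved as an executable .git/hooks/pre-commit script, the file would end with:
#     sys.exit(main())
```

Keeping the matching logic in `find_violations` separate from the Git calls makes the rules easy to test without a repository.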

Repo-supervisor

Repo-supervisor runs using a GitHub repository webhook. Therefore, it can scan GitHub Pull Requests to prevent committing secrets. It can also be used from the command line to scan a Git repository directory for secrets. Only JSON, JavaScript and YAML files are supported.

TruffleHog

TruffleHog also uses a pattern-based approach and has over 700 credential detectors. It supports scanning GitHub, GitLab, file systems, and S3. One of its features is being able to scan an entire GitHub org.

How to scan a GitHub org for leaked secrets using TruffleHog

The following steps walk through scanning an entire GitHub org with TruffleHog.

Step 1

Configure a GitHub Personal Access Token (PAT). Other types of GitHub token cannot be allocated enough permissions to perform the scan; a PAT is required to access https://api.github.com/user.

1. Navigate to https://github.com/settings/tokens or via profile photo > Settings > Developer settings > Personal access tokens > Tokens (classic) > Generate new token

2. Choose a name for the token that identifies it as being used for TruffleHog, and select the following permissions: repo, read:org and gist. Save the generated token for the next step.
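Before starting a scan that may run for hours, it can be worth confirming that the token carries the expected permissions. The sketch below assumes a classic PAT, for which the GitHub API lists the granted scopes in the `X-OAuth-Scopes` response header. The helper names are hypothetical, and the comparison is literal, so a broader scope that implies a required one is not recognised.

```python
import urllib.request

# The permissions listed in the step above.
REQUIRED_SCOPES = {"repo", "read:org", "gist"}


def missing_scopes(scopes_header, required=REQUIRED_SCOPES):
    """Return the required scopes that are absent from an X-OAuth-Scopes value."""
    granted = {
        scope.strip() for scope in (scopes_header or "").split(",") if scope.strip()
    }
    return sorted(set(required) - granted)


def fetch_scopes_header(token):
    """Call https://api.github.com/user and return the X-OAuth-Scopes header."""
    request = urllib.request.Request(
        "https://api.github.com/user",
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.headers.get("X-OAuth-Scopes", "")


# Example with a header value as the API might return it (no network call):
print(missing_scopes("repo, gist"))
```

In real use, `missing_scopes(fetch_scopes_header(token))` should return an empty list before the scan is started, with the token itself read from the environment rather than typed into the script.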

Step 2

Choose where to run TruffleHog. Scanning an established GitHub org with more than a hundred repositories varying in size can take hours. For an initial Proof of Concept, a modern laptop with a reliable network connection to GitHub should be sufficient. To implement a scheduled scan, run the TruffleHog container in Kubernetes or a Virtual Machine instance.

TruffleHog can be installed from Homebrew on a Mac:

brew install trufflesecurity/trufflehog/trufflehog

or run via Docker:

docker run -it --rm ghcr.io/trufflesecurity/trufflehog:latest [arguments]

Mount the local directory using -v "$PWD:/pwd" if you need to reference local files, e.g., for config or output.
Add --platform linux/arm64 if you are on an M1 Mac to use the native image for better performance.
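For a scheduled, unattended run, the Docker options above can be assembled programmatically. The following is a minimal sketch with a hypothetical helper, `docker_command`; it leaves out the interactive `-it` flags on the assumption that no terminal is attached, and it passes the TruffleHog arguments through unchanged.

```python
import shlex

IMAGE = "ghcr.io/trufflesecurity/trufflehog:latest"


def docker_command(arguments, workdir=None, arm64=False):
    """Build the `docker run` argument list for a non-interactive TruffleHog run.

    arguments: the TruffleHog arguments, e.g. ["github", "--org=innablr"].
    workdir:   optional local directory to mount at /pwd for config or output.
    arm64:     request the linux/arm64 image, e.g. on an M1 Mac.
    """
    command = ["docker", "run", "--rm"]
    if arm64:
        command += ["--platform", "linux/arm64"]
    if workdir:
        command += ["-v", f"{workdir}:/pwd"]
    return command + [IMAGE] + list(arguments)


# Print the command for inspection; shlex.join quotes each argument for the shell.
print(shlex.join(docker_command(["--help"], workdir="/tmp/scan", arm64=True)))
```

Building the command as a list rather than a single string means it can be handed to `subprocess.run` without going through a shell, so arguments containing spaces or quotes are passed safely.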

Step 3

Run the TruffleHog scan. To scan a GitHub org, here’s an example command:

trufflehog github --token="ghp_XXXXX" --org=innablr --only-verified

The example command scans the “innablr” org and reports only verified keys, that is, credentials that TruffleHog has confirmed are still valid.

An org scan can produce many results. JSON output is easier to process with tools like “jq” to filter out false positives, and it can be saved to a JSON file to be viewed later:

trufflehog github --token="ghp_XXXXX" --org=innablr --only-verified --json > trufflehog-output.json
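As an alternative to jq, the saved file can be summarised with a few lines of Python. This sketch assumes the file contains one JSON object per line and that each finding carries `DetectorName`, `Verified` and `SourceMetadata.Data.Github.repository` fields; those field names are assumptions about the output format and should be checked against the file your TruffleHog version produces.

```python
import json
from collections import Counter


def load_findings(lines):
    """Parse line-delimited JSON, skipping blank lines and anything unparseable."""
    findings = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate stray non-JSON lines in the file
        if isinstance(record, dict) and "DetectorName" in record:
            findings.append(record)
    return findings


def summarise(findings):
    """Count verified findings per (repository, detector) pair."""
    counts = Counter()
    for finding in findings:
        if not finding.get("Verified"):
            continue
        metadata = finding.get("SourceMetadata") or {}
        github = (metadata.get("Data") or {}).get("Github") or {}
        counts[(github.get("repository", "unknown"), finding["DetectorName"])] += 1
    return counts


def summarise_file(path):
    """Summarise a saved output file such as trufflehog-output.json."""
    with open(path, encoding="utf-8") as handle:
        return summarise(load_findings(handle))
```

Calling `summarise_file("trufflehog-output.json")` returns a counter whose `most_common()` method lists the repositories and detectors with the most findings first, which helps decide where to start remediation.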

Conclusion

Sensitive data like credentials can be inadvertently leaked by an unintended commit to a Git repository hosted in the cloud on systems like GitHub. Leaked credentials can be used by an attacker to compromise an organisation’s network, or sold on to other cyber criminals. Use a secret scanning solution to find those secrets before they are exposed, and scan existing repositories for secrets hidden in the commit history. Finally, remediate by ensuring that the secrets are deleted from the source code and revoked from use. Source code can be considered one of the foundations of DevOps Engineering. Innablr can help you to rapidly build a secure cloud foundation; get in touch to discuss more.

Innablr prioritises security in all of its Cloud and DevOps consulting services. With the increasing threat of cyberattacks and data breaches, it is essential to implement robust security measures to protect sensitive information. As a result, Innablr’s team of experts ensures that security is integrated into every step of the process, from initial planning to deployment and ongoing maintenance. By prioritising security, Innablr helps its clients mitigate risks and ensure that their data and systems are protected at all times.
