Searching git repositories for secrets
Getting .git content inadvertently exposed on websites
There's a good article about this here.
The simple approach is:
wget --mirror -I .git http://website/.git/
Manually searching for secrets
Clone the repo to get a local copy of all the refs
git clone <address>
If you are retrieving content from git directly, mirror the repo instead to get a local copy of all the refs. This gives you a raw (bare) copy of the repository, which can contain additional things not seen in a normal clone as shown above
git clone --mirror <address>
Show the commit history
git log
List all the changed files in a commit
git diff-tree --no-commit-id --name-only -r [commit_id]
List all the files present in the repository at the time of a commit
git ls-tree --name-only -r [commit_id]
Show files changed between commits
git diff --name-only [commit_id_1]..[commit_id_2]
Show the contents of a file from a particular commit
git show [commit_id]:[filename]
Search for file contents across all commits (add --no-pager after git to avoid piping the output through less or similar)
git grep -n 'search pattern' $(git rev-list --all)
If the above is performed on a repository with too many commits to pass in one command, you can limit the search to the first 10000 commits returned by git rev-list like so
git grep -n 'search pattern' $(git rev-list --all | awk 'NR >=1 && NR <= 10000')
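The awk approach above only covers one slice of the history. As a rough sketch of how the whole history could be covered in batches from Python, the following splits the commit list into fixed-size groups and runs git grep once per group. The function names batched and grep_all_commits are made up for illustration and are not part of the helper code described later. The sketch assumes git is on the PATH.

```python
import subprocess
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def grep_all_commits(pattern: str, repo: str = ".", size: int = 10000) -> str:
    """Run `git grep -n` over every commit, one batch of commits at a time."""
    revs = subprocess.run(
        ["git", "-C", repo, "rev-list", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    chunks = []
    for batch in batched(revs, size):
        result = subprocess.run(
            ["git", "-C", repo, "--no-pager", "grep", "-n", "-e", pattern, *batch],
            capture_output=True, encoding="utf-8", errors="replace", check=False,
        )
        # git grep exits with 1 when nothing matched; anything higher is an error.
        if result.returncode > 1:
            raise subprocess.CalledProcessError(
                result.returncode, result.args, result.stdout, result.stderr
            )
        chunks.append(result.stdout)
    return "".join(chunks)
```

Calling grep_all_commits('search pattern') from inside a repository should return the same kind of output as the shell command, just gathered batch by batch. If the operating system still rejects the argument list, lower the batch size.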
Searching across the repository for a file with a particular name
git rev-list --all | xargs -I '{}' git ls-tree --full-tree -r '{}' | grep target_filename.txt | sort -u
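The same search could be sketched in Python as below. This is an illustration only: matching_entries and find_file are hypothetical names, git is assumed to be on the PATH, and the match is a plain substring test on the path. The shell version instead applies a grep regular expression to the whole ls-tree line.

```python
import subprocess
from typing import List


def matching_entries(ls_tree_output: str, name: str) -> List[str]:
    """Return the unique ls-tree lines whose path contains `name`, sorted."""
    found = set()
    for line in ls_tree_output.splitlines():
        # Each line looks like "<mode> <type> <object>\t<path>".
        _meta, tab, path = line.partition("\t")
        if tab and name in path:
            found.add(line)
    return sorted(found)


def find_file(name: str, repo: str = ".") -> List[str]:
    """List every distinct version of a file called `name` in any commit."""
    revs = subprocess.run(
        ["git", "-C", repo, "rev-list", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    listings = []
    for rev in revs:
        listings.append(subprocess.run(
            ["git", "-C", repo, "ls-tree", "--full-tree", "-r", rev],
            capture_output=True, encoding="utf-8", errors="replace", check=True,
        ).stdout)
    return matching_entries("".join(listings), name)
```

Because the object hash is part of each line, a file whose contents changed between commits appears once per distinct version. Each version can then be inspected by passing the object hash from the line to git show.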
Helper code
Git repositories can be analysed from a security perspective to find secrets in older commits, if you can obtain a locally cloned copy of the repository.
There is some helper code to automate some of the commands needed to do this here.
The code does not do all the work for you and is meant to be used as part of a workflow - it's intended to be imported (e.g. with exec) and used from an interactive Python session (e.g. IPython).
The code is essentially providing a wrapper around some of the following commands, making it easier to manage the output that's generated, find the bits you’re interested in and present the information in a usable fashion:
- Search the local git repository in the present working directory for a given pattern across all commits and branches
git grep -in 'search_string' $(git rev-list --all)
- Show the contents of a given file of a given revision
git show <revision>:<file>
At a high level the code runs the git command for you (so it needs to be installed and its path set in the “git” global variable), and searches all the branches and commits for a list of strings as specified in the global “words” variable. The results are placed in a structure that you can search through to manually eliminate entries that don't interest you. There are default settings for both of these, but you can change the values to suit.
The analysis process that the code is meant to enable looks like this:
- Run git_secrets_grabber('PATH'). This returns a hash that has an entry for each repository discovered under the parent directory (you can give a parent path that contains multiple local git repositories as subdirectories if you wish), with child entries for each search term. The child entries beneath each search word are raw, parsed and search, which contain different views of the same data: raw is the raw output from the “git grep” command, parsed is the data split by line and field, including information about commits and filenames, and search is the unique matching lines only.
- Look at the “search” output for each search word, identify the lines you are interested in, pick a unique part of each line of interest, and add those to a list in a new “results” key for each repo:word hash entry. This essentially eliminates lines that are not of any security concern, i.e. false positives. There will likely be a few.
- Once you’re done, process the modified results hash with the gitsearch_results_parser function. It creates a new output structure with the relevant information about each of the lines you are interested in: which line number in which file in which commit. This is mainly intended to be a storage format; there are helper functions to output the results in a few useful ways.
- The results at any stage of this process can be saved out to disk with either the json_file_write_gz or json_file_write functions (and read back in with the read equivalents). The “gz” functions make a smaller (compressed) output file, but make it harder to check the output in a text editor (if you were so inclined to view it that way).
- The output from gitsearch_results_parser in step 3 can be further parsed with the git_results_file_data function or other code to display the results in various ways.
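To make the raw, parsed and search views more concrete, here is a minimal sketch of how “git grep -n” output searched across revisions could be split up. It is not the helper code itself. The Match type and the build_views and locate functions are hypothetical names, and the layout of the real structures may differ. It relies on git grep printing each hit as revision:path:line number:content.

```python
from typing import Dict, List, NamedTuple


class Match(NamedTuple):
    commit: str
    path: str
    line_number: int
    text: str


def parse_grep_output(raw: str) -> List[Match]:
    """Split raw `git grep -n <pattern> <revs>` output into fields."""
    parsed = []
    for line in raw.splitlines():
        # Split at most three times so colons in the content survive.
        parts = line.split(":", 3)
        if len(parts) != 4 or not parts[2].isdigit():
            continue  # skip notices such as "Binary file ... matches"
        commit, path, number, text = parts
        parsed.append(Match(commit, path, int(number), text))
    return parsed


def build_views(raw: str) -> Dict[str, object]:
    """Build three views of one search: raw text, fields, unique lines."""
    parsed = parse_grep_output(raw)
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    search = list(dict.fromkeys(m.text.strip() for m in parsed))
    return {"raw": raw, "parsed": parsed, "search": search}


def locate(parsed: List[Match], fragments: List[str]) -> List[Match]:
    """Keep only matches whose text contains one of the chosen fragments."""
    return [m for m in parsed if any(f in m.text for f in fragments)]
```

The locate function stands in for the manual filtering step: given the unique fragments you picked from the search view, it returns the commit, file and line number of every occurrence. Paths that themselves contain a colon would be split incorrectly by this sketch.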
Some examples, where the variable gr contains the output from gitsearch_results_parser:
- Show files for each repo containing secrets
print('\n'.join(sorted(gr[repo].keys())))
- Show matching lines and line numbers with content matching the search terms, to check you are outputting the correct results
git_results_file_data(gr)
- Show a breakdown of matches and commits
git_results_file_data(gr, contents=['matches', 'commits'])
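For the save and reload step in the process described earlier, a compressed JSON round trip can be sketched with the standard library alone. The names write_json_gz and read_json_gz are hypothetical stand-ins used for illustration. The signatures and behaviour of the helper's own json_file_write_gz and its read equivalent are not shown here and may differ.

```python
import gzip
import json
from typing import Any


def write_json_gz(path: str, data: Any) -> None:
    """Serialise `data` as JSON into a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        json.dump(data, handle)


def read_json_gz(path: str) -> Any:
    """Load JSON data back from a gzip-compressed text file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```

Note that JSON has no tuple or set type, so tuples come back as lists and sets need converting to lists before saving.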